The Problem

Modern embeddings are architecturally broken for production AI.

They are hardware-dependent, non-reproducible, impossible to diff, and expensive to store at scale. Float values can shift in their low-order bits when you switch hardware, so two embeddings of the same text on different machines can produce different bytes.
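
One well-known source of this drift is that floating-point addition is not associative: summing the same numbers in a different order, as different SIMD widths, thread counts, or GPU kernels may do, can change the low-order bits. A minimal Python illustration, independent of any embedding model:

```python
import struct

# Same three numbers, two summation orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# Serialize each result as a little-endian IEEE 754 double.
bytes_a = struct.pack("<d", left_to_right)
bytes_b = struct.pack("<d", right_to_left)

print(left_to_right == right_to_left)  # False
print(bytes_a == bytes_b)              # False
```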

There is no git for semantic state.

As systems grow larger, more distributed, and more persistent, these limitations compound. Memory becomes unstable. Routing becomes probabilistic. Storage and transmission become increasingly costly. The field has been working around this with heuristics, versioned indexes, and expensive re-indexing cycles.

The issue is not performance. It is representation. SEMQ fixes the root cause.

Traditional embedding

FP32 dense vector

[-0.321, 0.552, -0.113, 0.984, -0.447, 0.218, -0.032, 0.761, ...]

Traditional high-dimensional FP32 embedding (768 floats).

[+3, -1, +2, 0, -2, +1, 0, +3, ...]

Same meaning, smaller symbolic representation.

Why it matters

Dense FP32 vectors balloon storage and query costs. SEMQ keeps the semantic meaning while shrinking payloads so indexes stay light and queries fly.

Less I/O per lookup
Smaller indexes & RAM footprint
Cheaper scaling at every tier
Directional semantics preserved within a bounded error
Drop-in for existing embedding pipelines
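
To get a feel for the payload difference, here is a back-of-the-envelope sketch. The FP32 figure follows directly from 768 floats at 4 bytes each. The code-size figure rests on two assumptions that are not stated on this page: one code per component pair, and 8 possible directions per code. The function names are illustrative only.

```python
def fp32_bytes(dims):
    """Size of a dense FP32 vector: 4 bytes per component."""
    return dims * 4

def code_bytes(dims, bins):
    """Size of a bit-packed code sequence (assumed: one code per pair)."""
    pairs = dims // 2
    bits_per_code = (bins - 1).bit_length()  # bits needed to index `bins` values
    return (pairs * bits_per_code + 7) // 8  # round up to whole bytes

print(fp32_bytes(768))     # 3072
print(code_bytes(768, 8))  # 144 under the assumptions above
```

Real savings depend on the actual code width and container overhead, so treat the second number as an order-of-magnitude estimate.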

Determinism

Same input, same output

The same input produces exactly the same output across any architecture.

Portability

Stable file format

.semq is a stable file format. Save on Mac, load on Linux, bit-identical.
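
Why integer codes travel well can be sketched with a toy container. The layout below is hypothetical and is not the real .semq format; it only shows that fixed-width integers written with an explicit byte order serialize to the same bytes on any platform.

```python
import struct

MAGIC = b"DEMO"       # hypothetical tag, not the real .semq header
HEADER = "<4sBI"      # little-endian: tag, bins (uint8), code count (uint32)

def dumps(codes, bins=8):
    """Serialize signed codes as int8 values behind a fixed 9-byte header."""
    header = struct.pack(HEADER, MAGIC, bins, len(codes))
    body = struct.pack(f"<{len(codes)}b", *codes)
    return header + body

def loads(blob):
    """Inverse of dumps(); returns (bins, codes)."""
    magic, bins, count = struct.unpack_from(HEADER, blob, 0)
    if magic != MAGIC:
        raise ValueError("not a demo code file")
    offset = struct.calcsize(HEADER)
    codes = list(struct.unpack_from(f"<{count}b", blob, offset))
    return bins, codes

blob = dumps([3, -1, 2, 0, -2, 1, 0, 3])
print(loads(blob) == (8, [3, -1, 2, 0, -2, 1, 0, 3]))  # True
```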

Diffability

Version like code

99.97% per-dimension agreement across independent snapshots — run semq diff between model versions just like you would with source code.
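
What a per-dimension diff could look like, as a minimal sketch. This is not the implementation behind semq diff; diff_codes is a hypothetical helper that compares two equal-length code sequences position by position.

```python
def diff_codes(old, new):
    """Return (changed positions, fraction of positions that agree)."""
    if len(old) != len(new):
        raise ValueError("code sequences must have the same length")
    changed = [(i, a, b) for i, (a, b) in enumerate(zip(old, new)) if a != b]
    agreement = 1.0 - len(changed) / len(old) if old else 1.0
    return changed, agreement

old = [3, -1, 2, 0, -2, 1, 0, 3]
new = [3, -1, 2, 1, -2, 1, 0, 3]
changed, agreement = diff_codes(old, new)
print(changed)    # [(3, 0, 1)]
print(agreement)  # 0.875
```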

Under the hood

How It Works

A topology-preserving projection.

SEMQ encodes pairs of vector components as angular coordinates on the unit circle. Each angle is a compact summary of a local relationship within the embedding. Taken together, the angle sequence is an address in semantic space that is independent of floating-point precision or hardware rounding.
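
The idea can be sketched in a few lines of Python. This is a toy reading of the description above, not SEMQ's actual algorithm: encode_pairs, decode_pairs, and the choice of 8 angular bins are all assumptions made for illustration.

```python
import math

def encode_pairs(vec, bins=8):
    """Map consecutive component pairs to signed angle codes in [-bins/2, bins/2)."""
    if len(vec) % 2:
        raise ValueError("expected an even number of components")
    step = 2 * math.pi / bins
    half = bins // 2
    codes = []
    for x, y in zip(vec[0::2], vec[1::2]):
        theta = math.atan2(y, x)                # angle of the pair, in [-pi, pi]
        k = round(theta / step)                 # nearest bin index
        codes.append((k + half) % bins - half)  # wrap +half onto -half
    return codes

def decode_pairs(codes, bins=8):
    """Turn each code back into a unit-length (x, y) pair at the bin center."""
    step = 2 * math.pi / bins
    out = []
    for k in codes:
        out.extend((math.cos(k * step), math.sin(k * step)))
    return out

print(encode_pairs([1.0, 0.0, 0.0, 1.0, -1.0, 0.0, 0.0, -1.0]))  # [0, 2, -4, -2]
```

Only the direction of each pair survives the round trip; its length is dropped, which matches the description of collapsing magnitude.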

The encoding is proven to preserve pairwise similarity to within a configurable error bound. As the number of angle dimensions increases, the bound tightens.
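
For intuition about a configurable bound, consider a toy quantizer in which the knob is angular resolution: snapping an angle to the nearest of `bins` evenly spaced directions moves it by at most half a bin, that is π / bins. This is an assumption-laden illustration, not the bound SEMQ proves, and the function names are hypothetical.

```python
import math

def quantize_angle(theta, bins):
    """Snap an angle to the nearest multiple of 2*pi/bins."""
    step = 2 * math.pi / bins
    return round(theta / step) * step

def max_quantization_error(bins, samples=10_000):
    """Largest observed |theta - quantized theta| over a sweep of [-pi, pi)."""
    worst = 0.0
    for i in range(samples):
        theta = -math.pi + 2 * math.pi * i / samples
        worst = max(worst, abs(theta - quantize_angle(theta, bins)))
    return worst

for bins in (8, 16, 32):
    # The observed error stays at or below the half-bin limit pi / bins.
    print(bins, max_quantization_error(bins) <= math.pi / bins + 1e-9)
```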

0.321 -0.113 0.761

FP32 vector

High-dimensional floats

θ

Angular projection

Pairs → unit-circle angles

+3 -1 +2 0 -2 +1 +3 0 -1

Semantic address

Compact, stable, diffable

Angular encoding

Component pairs projected onto the unit circle as compact angles.

Bounded error

Pairwise similarity preserved within a configurable ε bound.

Hardware-free

No floating-point or rounding dependencies. Identical across any machine.

Systems design

Infrastructure Impact

A new primitive for AI systems.

Symbolic angles compose naturally with existing infrastructure. They are small enough to store in a database column, stable enough to use as cache keys, and structured enough to diff across model versions. SEMQ provides the semantic state layer that slots under your existing AI stack.

Storage

Fits in a database column

Caching

Stable exact-match cache keys

Versioning

Diff across model versions

Observability

Loggable structured sequences
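
A minimal sketch of the storage and caching points, using only the Python standard library. The table name, column names, and helper functions are hypothetical; the one-byte-per-code packing is an assumption, not SEMQ's encoding.

```python
import hashlib
import sqlite3

def to_blob(codes):
    """Pack signed codes as one two's-complement byte each."""
    return bytes(c & 0xFF for c in codes)

def from_blob(blob):
    """Inverse of to_blob(): bytes above 127 map back to negative codes."""
    return [b - 256 if b > 127 else b for b in blob]

def cache_key(codes):
    """Exact-match key: identical codes always hash to the same digest."""
    return hashlib.sha256(to_blob(codes)).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (cache_key TEXT PRIMARY KEY, code BLOB NOT NULL)")

codes = [3, -1, 2, 0, -2, 1, 0, 3]
conn.execute("INSERT INTO docs VALUES (?, ?)", (cache_key(codes), to_blob(codes)))

row = conn.execute(
    "SELECT code FROM docs WHERE cache_key = ?", (cache_key(codes),)
).fetchone()
restored = from_blob(row[0])
print(restored == codes)  # True
```

Because the key is derived from integers rather than floats, the same codes produce the same key on any machine.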

Where SEMQ sits

Your AI application

LLM agents, search UI, pipelines

semantic queries & results

Vector DB / Search index

Pinecone, Weaviate, pgvector, Qdrant

compact angle codes

SEMQ semantic state layer

YOU ARE HERE
encode · decode · diff · cache · version

raw FP32 vectors

Embedding model

OpenAI, Cohere, sentence-transformers

What SEMQ Does: A New Embedding Domain

SEMQ doesn't just compress embeddings – it converts them into a symbolic angular domain, collapsing redundant magnitude and preserving semantic direction.

continuous · euclidean · high-dimensional

Traditional FP32 Space (Euclidean)

In the usual embedding space, vectors live as high-dimensional floating-point numbers. Magnitude (scale) and direction are mixed, which distorts distances and adds redundant information.

  • High-dimensional FP32 vectors with large memory footprint.
  • Magnitude influences distance, even when it's not semantically relevant.
  • Storage, bandwidth and vector DB cost grow quickly with scale.

SEMQ Symbolic Angular Domain

SEMQ collapses magnitude and keeps only the semantic direction, converting vectors into compact symbolic angular codes that are easier to store, index and analyze.

  • Magnitudes are collapsed; direction (angle) is preserved.
  • Embeddings become discrete symbolic codes instead of raw floats.
  • Sharper semantic clusters and new indexing strategies become possible.
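
Collapsing magnitude can be illustrated with a toy encoder: rescaling a vector by a positive factor changes every float but leaves the pair angles, and therefore the codes, unchanged. The encoder below is a hypothetical sketch (8 angular bins, one code per component pair), not SEMQ's actual scheme.

```python
import math

def encode_pairs(vec, bins=8):
    """Toy encoder: one signed angle code per consecutive component pair."""
    step = 2 * math.pi / bins
    half = bins // 2
    codes = []
    for x, y in zip(vec[0::2], vec[1::2]):
        k = round(math.atan2(y, x) / step)      # nearest angular bin
        codes.append((k + half) % bins - half)  # wrap +half onto -half
    return codes

vec = [-0.321, 0.552, -0.113, 0.984, -0.447, 0.218, -0.032, 0.761]
scaled = [10.0 * x for x in vec]

print(vec == scaled)                              # False: the floats differ
print(encode_pairs(vec) == encode_pairs(scaled))  # True: the codes do not
```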

Before: continuous high-dimensional floats. Now: compact symbolic angles that keep the same meaning.

Foundations

Why This Is a New Domain

SEMQ is not just another compression trick. It introduces a symbolic angular domain for embeddings — similar to how JPEG introduced a transform domain for images.

01

Numeric Domain

FP32 / FP16 / INT8 embeddings live as dense numeric vectors, one stored value per component. Magnitude and direction are mixed; distances depend on both, even when magnitude is not semantically important.

  • High memory footprint
  • Continuous floats, hard to index symbolically
  • No clear patterns at the code level
02

Hash Domain

Hashing methods (like SimHash) map vectors into compact binary signatures. They are fast and efficient, but lossy by design and not rehydratable back into a meaningful vector.

  • Great for approximate similarity
  • Not invertible, hard to recover geometry
  • Codes are opaque, not structured as angles
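
For comparison, a minimal sketch of the random-hyperplane variant of SimHash: each bit records which side of a random hyperplane the vector falls on. The bit count and seed are arbitrary choices for illustration.

```python
import random

def simhash(vec, n_bits=16, seed=0):
    """Binary signature: one sign bit per random hyperplane."""
    rng = random.Random(seed)  # fixed seed so every call uses the same planes
    sig = 0
    for _ in range(n_bits):
        plane = [rng.gauss(0.0, 1.0) for _ in vec]
        dot = sum(p * x for p, x in zip(plane, vec))
        sig = (sig << 1) | (dot >= 0)
    return sig

def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")

vec = [-0.321, 0.552, -0.113, 0.984, -0.447, 0.218, -0.032, 0.761]
same_direction = [2.0 * x for x in vec]
opposite = [-x for x in vec]

print(hamming(simhash(vec), simhash(same_direction)))  # 0
print(hamming(simhash(vec), simhash(opposite)))        # 16
```

The signature supports similarity estimates through Hamming distance, but the bits alone do not say where on each hyperplane's side the vector sits, which is why the geometry cannot be recovered.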
03

SEMQ Angular Domain

SEMQ discards redundant magnitude and keeps semantic direction, encoding embeddings as symbolic angular codes. Compact, rehydratable, and both geometric and symbolic.

  • Magnitude collapsed, angle preserved
  • Discrete symbolic codes (alphabet of directions)
  • Rehydratable via LUTs, compatible across models
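
Rehydration through a lookup table can be sketched as follows. The table layout is an assumption (8 directions, signed codes from -4 to 3), and build_lut and rehydrate are hypothetical names; the point is only that decoding becomes a table lookup with no trigonometry at query time.

```python
import math

def build_lut(bins=8):
    """Precompute the unit-circle point for every code value."""
    step = 2 * math.pi / bins
    half = bins // 2
    return {k: (math.cos(k * step), math.sin(k * step)) for k in range(-half, half)}

def rehydrate(codes, lut):
    """Expand each code into its (x, y) pair by lookup."""
    out = []
    for k in codes:
        out.extend(lut[k])
    return out

lut = build_lut()
vec = rehydrate([0, 2, -4], lut)
print([round(v, 6) + 0.0 for v in vec])  # [1.0, 0.0, 0.0, 1.0, -1.0, 0.0]
```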

Analogy

From Raw Pixels to JPEG — From FP32 to SEMQ

For images, raw pixels were never the final form. JPEG introduced a transform domain where compression and storage become efficient without breaking visual meaning. Pixels are no longer the unit of storage — frequency coefficients are.

For embeddings, FP32 vectors don't need to be the final form either. SEMQ introduces a symbolic angular domain where the unit of storage is a compact code representing direction, not raw floating-point magnitudes.

JPEG is to pixels what SEMQ aims to be to embeddings — a new representation space where compression, indexing, and analysis make much more sense.

For infrastructure

  • Embeddings cheaper to store and query at massive scale
  • Move part of your retrieval stack to symbolic operations
  • Edge and mobile devices carry more semantics with fewer bits
  • New index types and hardware instructions become possible

For research & products

  • Opens a new space of algorithms in a symbolic angular domain
  • Enables interpretability and pattern mining over codes
  • Decouples semantic representation from floating-point formats
  • Can be layered on top of any existing embedding model

SEMQ's core idea is simple but powerful: if meaning lives in the angle between vectors, then embeddings deserve their own angular, symbolic representation domain — not just smaller floats.