The Problem
Modern embeddings are architecturally broken for production AI.
They are hardware-dependent, non-reproducible, impossible to diff, and expensive to store at scale. Floating-point results can shift in their low-order bits when you switch hardware, so two embeddings of the same text computed on different machines can produce different bytes.
There is no git for semantic state.
As systems grow larger, more distributed, and more persistent, these limitations compound. Memory becomes unstable. Routing becomes probabilistic. Storage and transmission become increasingly costly. The field has been working around this with heuristics, versioned indexes, and expensive re-indexing cycles.
The issue is not performance. It is representation. SEMQ fixes the root cause.
Traditional embedding
FP32 dense vector
[-0.321, 0.552, -0.113, 0.984, -0.447, 0.218, -0.032, 0.761, ...]
Traditional high-dimensional FP32 embedding (768 floats).
[+3, -1, +2, 0, -2, +1, 0, +3, ...]
Same meaning, smaller symbolic representation.
Why it matters
Dense FP32 vectors balloon storage and query costs. SEMQ keeps the semantic meaning while shrinking payloads so indexes stay light and queries fly.
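As a rough illustration of the payload difference, the arithmetic below compares a 768-float FP32 embedding with a hypothetical angular encoding. Both the one-code-per-pair layout and the 4 bits per code are assumptions chosen for the example, not published SEMQ parameters.

```python
# Back-of-the-envelope payload comparison (illustrative numbers only).
DIMS = 768              # floats per embedding, as in the example above
BYTES_PER_FP32 = 4

CODES = DIMS // 2       # ASSUMPTION: one angle code per pair of components
BITS_PER_CODE = 4       # ASSUMPTION: 16 angular bins per code

fp32_bytes = DIMS * BYTES_PER_FP32
code_bytes = (CODES * BITS_PER_CODE + 7) // 8   # round up to whole bytes

print(fp32_bytes)               # 3072
print(code_bytes)               # 192
print(fp32_bytes / code_bytes)  # 16.0
```

Under these assumed settings the payload is 16 times smaller. The real ratio depends on how many bits each code actually uses.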
Determinism
Same input, same output
The same input produces exactly the same output on any architecture.
Portability
Stable file format
.semq is a stable file format. Save on Mac, load on Linux, bit-identical.
Diffability
Version like code
Independent snapshots show 99.97% per-dimension agreement. Run semq diff between model versions just as you would with source code.
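To make "per-dimension agreement" concrete, here is a minimal sketch of a diff over two code sequences. The function name code_diff and its return shape are hypothetical and are not the semq diff command. It only shows the kind of comparison that discrete codes make possible.

```python
from typing import List, Sequence, Tuple

def code_diff(a: Sequence[int], b: Sequence[int]) -> Tuple[float, List[Tuple[int, int, int]]]:
    """Return (agreement ratio, [(index, old_code, new_code), ...])."""
    if len(a) != len(b):
        raise ValueError("code sequences must have the same length")
    # Positions where the two snapshots disagree.
    changes = [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    agreement = (1.0 - len(changes) / len(a)) if a else 1.0
    return agreement, changes

old = [3, -1, 2, 0, -2, 1, 0, 3]
new = [3, -1, 2, 1, -2, 1, 0, 3]
agreement, changes = code_diff(old, new)
print(agreement)  # 0.875
print(changes)    # [(3, 0, 1)]
```

Because codes are small integers, the diff is exact: a position either changed or it did not, with no tolerance threshold to tune.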
Under the hood
How It Works
A topology-preserving projection.
SEMQ encodes pairs of vector components as angular coordinates on the unit circle. Each angle is a compact summary of a local relationship within the embedding. Taken together, the angle sequence is an address in semantic space that is independent of floating-point precision or hardware rounding.
The encoding is proven to preserve pairwise similarity to within a configurable error bound. As the number of angle dimensions increases, the bound tightens.
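The pairing step can be sketched as follows. This is an illustration under stated assumptions, not the SEMQ encoder: the function name encode_angles, the choice of consecutive components as pairs, and the uniform quantization into 16 bins are all hypothetical. With uniform bins, the angular error per pair is at most pi / bins, which is one way a configurable bound that tightens with resolution could arise.

```python
import math
from typing import List, Sequence

def encode_angles(vec: Sequence[float], bins: int = 16) -> List[int]:
    """Map consecutive pairs (x, y) to quantized angle codes in [0, bins)."""
    if len(vec) % 2 != 0:
        raise ValueError("vector length must be even")
    codes = []
    for i in range(0, len(vec), 2):
        theta = math.atan2(vec[i + 1], vec[i])            # angle of the pair
        frac = (theta % (2 * math.pi)) / (2 * math.pi)    # fraction of a full turn
        codes.append(int(frac * bins + 0.5) % bins)       # nearest bin, wrapping at a full turn
    return codes

vec = [1.0, 0.0, 0.0, 1.0, -1.0, 0.0, 0.0, -1.0]
print(encode_angles(vec))  # [0, 4, 8, 12]
```

Note that this sketch still calls floating-point atan2, so a pair lying almost exactly on a bin boundary could in principle round differently across platforms. A bit-identical encoder would need an exactly specified decision rule at that step.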
FP32 vector
High-dimensional floats
Angular projection
Pairs → unit-circle angles
Semantic address
Compact, stable, diffable
Angular encoding
Component pairs projected onto the unit circle as compact angles.
Bounded error
Pairwise similarity preserved within a configurable ε bound.
Hardware-free
No floating-point or rounding dependencies. Identical across any machine.
Systems design
Infrastructure Impact
A new primitive for AI systems.
Symbolic angles compose naturally with existing infrastructure. They are small enough to store in a database column, stable enough to use as cache keys, and structured enough to diff across model versions. SEMQ provides the semantic state layer that slots under your existing AI stack.
Storage
Fits in a database column
Caching
Stable exact-match cache keys
Versioning
Diff across model versions
Observability
Loggable structured sequences
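One way to turn a code sequence into an exact-match cache key is to hash its bytes. The sketch below assumes codes are integers in [0, bins) with at most 256 bins. The helper name cache_key and the one-byte alphabet prefix are illustrative choices, not part of SEMQ.

```python
import hashlib
from typing import Sequence

def cache_key(codes: Sequence[int], bins: int = 16) -> str:
    """Derive a deterministic key from angle codes in [0, bins)."""
    if not 1 <= bins <= 256 or any(not 0 <= c < bins for c in codes):
        raise ValueError("codes must lie in [0, bins) with 1 <= bins <= 256")
    # Prefix with the alphabet size so codes from different alphabets never collide by accident.
    payload = bytes([bins - 1]) + bytes(codes)
    return hashlib.sha256(payload).hexdigest()

key_a = cache_key([0, 4, 8, 12])
key_b = cache_key((0, 4, 8, 12))
key_c = cache_key([0, 4, 8, 13])
print(key_a == key_b)  # True
print(key_a == key_c)  # False
```

The same approach does not work for raw floats, where a one-bit difference in any component produces a different key.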
Where SEMQ sits
Your AI application
LLM agents, search UI, pipelines
semantic queries & results
Vector DB / Search index
Pinecone, Weaviate, pgvector, Qdrant
compact angle codes
SEMQ semantic state layer
YOU ARE HERE
raw FP32 vectors
Embedding model
OpenAI, Cohere, sentence-transformers
What SEMQ Does: A New Embedding Domain
SEMQ doesn't just compress embeddings – it converts them into a symbolic angular domain, collapsing redundant magnitude and preserving semantic direction.
Traditional FP32 Space (Euclidean)
In the usual embedding space, vectors live as high-dimensional floating-point numbers. Magnitude (scale) and direction are mixed, which distorts distances and adds redundant information.
- High-dimensional FP32 vectors with large memory footprint.
- Magnitude influences distance, even when it's not semantically relevant.
- Storage, bandwidth and vector DB cost grow quickly with scale.
SEMQ Symbolic Angular Domain
SEMQ collapses magnitude and keeps only the semantic direction, converting vectors into compact symbolic angular codes that are easier to store, index and analyze.
- Magnitudes are collapsed; direction (angle) is preserved.
- Embeddings become discrete symbolic codes instead of raw floats.
- Sharper semantic clusters and new indexing strategies become possible.
Before: continuous high-dimensional floats. Now: compact symbolic angles that keep the same meaning.
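The "magnitude collapsed, direction preserved" property can be checked with a small experiment: scaling a vector by a positive constant should leave its angle codes unchanged. The helper direction_codes below is a hypothetical stand-in that quantizes consecutive pairs into 16 uniform bins. It is an assumption for illustration, not the SEMQ encoder.

```python
import math

def direction_codes(vec, bins=16):
    """Quantize the angle of each consecutive (x, y) pair into one of `bins` codes."""
    codes = []
    for i in range(0, len(vec) - 1, 2):
        theta = math.atan2(vec[i + 1], vec[i]) % (2 * math.pi)
        codes.append(int(theta / (2 * math.pi) * bins + 0.5) % bins)
    return codes

v = [0.3, 0.4, -1.2, 0.5]
scaled = [10 * x for x in v]      # same direction, ten times the magnitude

print(direction_codes(v) == direction_codes(scaled))  # True
```

The angle of a pair depends only on the ratio of its components, so a common positive scale factor cancels out before quantization.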
Foundations
Why This Is a New Domain
SEMQ is not just another compression trick. It introduces a symbolic angular domain for embeddings — similar to how JPEG introduced a transform domain for images.
Numeric Domain
FP32 / FP16 / INT8 embeddings live as continuous real-valued vectors. Magnitude and direction are mixed; distances depend on both, even when magnitude is not semantically important.
- High memory footprint
- Continuous floats, hard to index symbolically
- No clear patterns at the code level
Hash Domain
Hashing methods (like SimHash) map vectors into compact binary signatures. They are fast and efficient, but lossy by design and not rehydratable back into a meaningful vector.
- Great for approximate similarity
- Not invertible, hard to recover geometry
- Codes are opaque, not structured as angles
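For contrast, here is a minimal sketch of the random-hyperplane form of SimHash: each signature bit records which side of a random hyperplane the vector falls on. The bit count and seed are arbitrary choices for the example.

```python
import random
from typing import Sequence

def simhash(vec: Sequence[float], n_bits: int = 16, seed: int = 0) -> int:
    """Sign-of-random-projection signature, one bit per hyperplane."""
    rng = random.Random(seed)   # fixed seed: every vector is tested against the same hyperplanes
    sig = 0
    for _ in range(n_bits):
        plane = [rng.gauss(0.0, 1.0) for _ in vec]
        dot = sum(p * x for p, x in zip(plane, vec))
        sig = (sig << 1) | (1 if dot >= 0 else 0)
    return sig

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")

v = [0.3, 0.4, -1.2, 0.5]
doubled = [2 * x for x in v]
print(hamming(simhash(v), simhash(doubled)))  # 0
```

Each bit keeps only a sign, so the signature supports similarity estimates through Hamming distance but carries no per-component angle that could be mapped back to a vector.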
SEMQ Angular Domain
SEMQ discards redundant magnitude and keeps semantic direction, encoding embeddings as symbolic angular codes. Compact, rehydratable, and both geometric and symbolic.
- Magnitude collapsed, angle preserved
- Discrete symbolic codes (alphabet of directions)
- Rehydratable via LUTs, compatible across models
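Rehydration through a lookup table could look like the sketch below: each code indexes a precomputed (cos, sin) pair, which expands the code sequence back into a float vector. The names build_lut and rehydrate, the 16-bin alphabet, and the uniform bin centers are assumptions for illustration. The actual SEMQ tables are not described here.

```python
import math
from typing import List, Sequence, Tuple

def build_lut(bins: int = 16) -> List[Tuple[float, float]]:
    """Precompute the unit-circle point for each code."""
    return [(math.cos(2 * math.pi * k / bins), math.sin(2 * math.pi * k / bins))
            for k in range(bins)]

def rehydrate(codes: Sequence[int], lut: Sequence[Tuple[float, float]]) -> List[float]:
    """Expand each code into its (x, y) pair; every pair has unit length."""
    out: List[float] = []
    for c in codes:
        out.extend(lut[c])
    return out

lut = build_lut(16)
vec = rehydrate([0, 4, 8, 12], lut)
print([round(x, 6) + 0.0 for x in vec])  # [1.0, 0.0, 0.0, 1.0, -1.0, 0.0, 0.0, -1.0]
```

The rehydrated vector approximates direction only. The original per-pair magnitudes are not stored in the codes, so they are not recovered.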
Analogy
From Raw Pixels to JPEG — From FP32 to SEMQ
For images, raw pixels were never the final form. JPEG introduced a transform domain where compression and storage become efficient without breaking visual meaning. Pixels are no longer the unit of storage — frequency coefficients are.
For embeddings, FP32 vectors don't need to be the final form either. SEMQ introduces a symbolic angular domain where the unit of storage is a compact code representing direction, not raw floating-point magnitudes.
JPEG is to pixels what SEMQ aims to be to embeddings — a new representation space where compression, indexing, and analysis make much more sense.
For infrastructure
- Embeddings cheaper to store and query at massive scale
- Move part of your retrieval stack to symbolic operations
- Edge and mobile devices carry more semantics with fewer bits
- New index types and hardware instructions become possible
For research & products
- Opens a new space of algorithms in a symbolic angular domain
- Enables interpretability and pattern mining over codes
- Decouples semantic representation from floating-point formats
- Can be layered on top of any existing embedding model
SEMQ's core idea is simple but powerful: if meaning lives in the angle between vectors, then embeddings deserve their own angular, symbolic representation domain — not just smaller floats.