SEMQ

The Problem

Modern AI systems rely on continuous vector embeddings as their semantic foundation.

While effective for short-term similarity, continuous representations lack structural guarantees. They drift over time, are expensive to persist at scale, and couple semantic meaning to magnitude and precision.

As systems grow larger, more distributed, and more persistent, these limitations compound. Memory becomes unstable. Routing becomes probabilistic. Storage and transmission become increasingly costly.

The issue is not performance. It is representation.

Embedding size estimator (interactive): compares the estimated size per embedding and the estimated cost per 1M queries of each format against the FP32 baseline (100%), the baseline cost reference for a vector database at FP32 precision.

Traditional embedding: FP32 dense vector (768 floats)

[-0.321, 0.552, -0.113, 0.984, -0.447, 0.218, -0.032, 0.761, ...]

SEMQ symbolic code

[+3, -1, +2, 0, -2, +1, 0, +3, ...]

Same meaning, smaller symbolic representation.
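The size gap follows directly from the number of bits stored per dimension. The sketch below computes bytes per 768-dimensional embedding for common formats. The last entry is an assumption for illustration: seven symbols (-3 to +3, as in the example code) packed into 3 bits each. It is not a published SEMQ figure, and it ignores index overhead and any per-vector metadata.

```python
# Bits per dimension for each format. The last entry is a hypothetical
# symbolic code: 7 symbols (-3..+3) packed into 3 bits, not an official SEMQ value.
BITS_PER_DIM = {
    "FP32": 32,
    "FP16": 16,
    "INT8": 8,
    "3-bit symbolic (hypothetical)": 3,
}


def embedding_bytes(dims: int, bits_per_dim: int) -> int:
    """Bytes for one embedding, rounded up to a whole byte, with no metadata."""
    return (dims * bits_per_dim + 7) // 8


def report(dims: int = 768) -> dict:
    """Print and return (bytes, percent of FP32) for every format."""
    baseline = embedding_bytes(dims, BITS_PER_DIM["FP32"])
    rows = {}
    for name, bits in BITS_PER_DIM.items():
        size = embedding_bytes(dims, bits)
        rows[name] = (size, 100.0 * size / baseline)
        print(f"{name:>30}: {size:5d} bytes  ({rows[name][1]:5.1f}% of FP32)")
    return rows


if __name__ == "__main__":
    report()
```

For 768 dimensions this gives 3,072 bytes for FP32, 1,536 for FP16, 768 for INT8 and 288 for the hypothetical 3-bit code (about 9.4% of FP32).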

Why it matters

Dense FP32 vectors balloon storage and query costs. SEMQ keeps the semantic meaning (the direction of each vector) while shrinking payloads, so indexes stay light and queries run faster.

Less I/O per lookup

Smaller indexes & RAM

Cheaper scaling

Directional semantics preserved

What SEMQ Does: A New Embedding Domain

SEMQ doesn't just compress embeddings; it converts them into a symbolic angular domain, collapsing redundant magnitude while preserving semantic direction.

Traditional FP32 Space (Euclidean)

In the usual embedding space, each vector is a high-dimensional array of floating-point numbers. Magnitude (scale) and direction are mixed, so distances are distorted whenever magnitude carries no semantic signal, and the representation stores redundant information.

High-dimensional FP32 vectors with large memory footprint.
Magnitude influences distance, even when it's not semantically relevant.
Storage, bandwidth and vector DB cost grow quickly with scale.
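A quick way to see magnitude leaking into distance: two vectors that point in exactly the same direction but differ in length are far apart under Euclidean distance, yet identical under cosine similarity. The sketch uses NumPy and toy 3-dimensional vectors chosen only for illustration.

```python
import numpy as np


def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    """Straight-line distance: sensitive to both direction and magnitude."""
    return float(np.linalg.norm(a - b))


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: depends only on the angle between a and b."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


a = np.array([1.0, 2.0, 3.0])
b = 2.0 * a  # same direction, twice the magnitude

print(f"euclidean = {euclidean(a, b):.4f}")  # equals the length of a
print(f"cosine    = {cosine(a, b):.4f}")
```

This prints `euclidean = 3.7417` (the length of `a`, sqrt(14)) and `cosine = 1.0000`: the vectors mean the same direction, but Euclidean distance reports them as different.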

SEMQ Symbolic Angular Domain

SEMQ collapses magnitude and keeps only the semantic direction, converting vectors into compact symbolic angular codes that are easier to store, index and analyze.

Magnitudes are collapsed; direction (angle) is preserved.
Embeddings become discrete symbolic codes instead of raw floats.
Sharper semantic clusters and new indexing strategies become possible.
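SEMQ's actual encoding rule is not described here, so the following is a hypothetical stand-in that only demonstrates the two properties listed above: magnitude is discarded, and each component becomes one symbol from a small alphabet (the integers -3 to +3 used in the example code). The function name `encode` and the `levels` parameter are illustrative assumptions.

```python
import numpy as np


def encode(v: np.ndarray, levels: int = 3) -> np.ndarray:
    """Hypothetical symbolic encoder (illustrative only, not SEMQ's published method).

    Divides by the largest absolute component, so the result depends only on
    the vector's direction, then rounds each component to an int8 symbol
    in [-levels, +levels].
    """
    v = np.asarray(v, dtype=np.float64)
    peak = np.max(np.abs(v))
    if peak == 0:
        raise ValueError("a zero vector has no direction")
    return np.rint(v / peak * levels).astype(np.int8)  # magnitude collapsed here


rng = np.random.default_rng(0)
v = rng.normal(size=768)

code = encode(v)
print(code[:8])                               # first eight symbols
print(np.array_equal(code, encode(8.0 * v)))  # same direction, 8x the length
```

The second line prints `True`: scaling the input changes its magnitude but not its code, which is what "magnitudes are collapsed" means in this toy setting. Only positive scaling is covered; negating a vector flips every symbol.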

Before: continuous high-dimensional floats. Now: compact symbolic angles that keep the same meaning.

Why This Is a New Domain

SEMQ is not just another compression trick. It introduces a symbolic angular domain for embeddings, similar to how JPEG introduced a transform domain for images. The key shift is moving from raw numeric values to compact symbolic angles that still preserve meaning.

1. Numeric Domain

FP32 and FP16 embeddings are continuous real-valued vectors; INT8 embeddings are quantized approximations of the same vectors. In all three, magnitude and direction are mixed, and distances depend on both even when magnitude is not semantically important.

• High memory footprint.
• Continuous (or finely quantized) values, hard to index symbolically.
• No clear patterns at the code level.
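To make the INT8 case concrete: a common scheme is symmetric per-vector quantization, which stores int8 values plus one floating-point scale. The magnitude does not disappear; it moves into that scale. This is a generic sketch of the technique (not any particular vector database's implementation), applied to the eight values of the example FP32 vector shown earlier.

```python
import numpy as np


def quantize_int8(v: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-vector INT8 quantization: int8 values plus one float scale."""
    v = np.asarray(v, dtype=np.float32)
    peak = float(np.max(np.abs(v)))
    if peak == 0.0:
        return np.zeros(v.shape, dtype=np.int8), 1.0
    scale = peak / 127.0  # the vector's magnitude is carried by this scale
    q = np.clip(np.rint(v / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original float vector."""
    return q.astype(np.float32) * scale


v = np.array([-0.321, 0.552, -0.113, 0.984, -0.447, 0.218, -0.032, 0.761], dtype=np.float32)
q, scale = quantize_int8(v)
error = np.max(np.abs(dequantize_int8(q, scale) - v))

print(q)
print(f"scale={scale:.6f}, max abs error={error:.6f}")
```

Doubling `v` leaves `q` unchanged and doubles `scale`, so both pieces must be stored and the format still couples direction (the codes) with magnitude (the scale).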

2. Hash Domain

Hashing methods (like SimHash) map vectors into compact binary signatures. They are fast and efficient, but lossy by design and not rehydratable back into a meaningful vector.

• Great for approximate similarity.
• Not invertible, hard to recover geometry.
• Codes are opaque, not structured as angles.
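For comparison, here is a minimal sketch of the random-hyperplane form of SimHash (Charikar's scheme, as commonly applied to vectors): each bit records which side of a random hyperplane the vector falls on, and the fraction of differing bits estimates the angle between two vectors. The dimensions, bit count and seed are arbitrary choices for illustration.

```python
import numpy as np


def simhash(v: np.ndarray, planes: np.ndarray) -> np.ndarray:
    """One bit per random hyperplane: True if v lies on its non-negative side."""
    return planes @ v >= 0


def estimated_angle(sig_a: np.ndarray, sig_b: np.ndarray) -> float:
    """P(bit differs) = angle / pi, so angle ~= pi * hamming / n_bits."""
    hamming = np.count_nonzero(sig_a != sig_b)
    return float(np.pi * hamming / sig_a.size)


rng = np.random.default_rng(42)
dims, n_bits = 768, 256
planes = rng.normal(size=(n_bits, dims))  # random hyperplane normals

a = rng.normal(size=dims)
b = a + 0.5 * rng.normal(size=dims)       # a noisy neighbour of a

true_angle = float(np.arccos(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
est_angle = estimated_angle(simhash(a, planes), simhash(b, planes))

print(f"true angle: {true_angle:.3f} rad, SimHash estimate: {est_angle:.3f} rad")
print(f"signature size: {n_bits // 8} bytes")
```

The 256-bit signature takes only 32 bytes, but it keeps a single sign bit per hyperplane, which is why the original values cannot be reconstructed exactly from it.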

3. SEMQ Symbolic Angular Domain

SEMQ discards redundant magnitude and keeps semantic direction, encoding embeddings as symbolic angular codes. These codes are compact, rehydratable, and live in a domain that is both geometric and symbolic.

• Magnitude collapsed, angle preserved.
• Discrete symbolic codes (like an alphabet of directions).
• Rehydratable via LUTs, compatible across models.
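Rehydration via a lookup table can be sketched in the same hypothetical setting: each symbol indexes a small table of representative values, and the looked-up vector is renormalized to unit length. The table (`LUT`, evenly spaced values for the symbols -3 to +3) and the max-abs encoder are illustrative assumptions, not SEMQ's actual tables; a learned, non-uniform table could be swapped in without changing the lookup step.

```python
import numpy as np

LEVELS = 3
# Hypothetical lookup table: entry i is the representative value for symbol i - LEVELS.
LUT = np.linspace(-1.0, 1.0, 2 * LEVELS + 1)


def encode(v: np.ndarray, levels: int = LEVELS) -> np.ndarray:
    """Hypothetical encoder: drop magnitude, round each component to a symbol."""
    scaled = v / np.max(np.abs(v))
    return np.rint(scaled * levels).astype(np.int8)


def rehydrate(code: np.ndarray, levels: int = LEVELS) -> np.ndarray:
    """Look each symbol up in the LUT and return a unit-length direction."""
    approx = LUT[code.astype(np.int64) + levels]
    return approx / np.linalg.norm(approx)


rng = np.random.default_rng(7)
v = rng.normal(size=768)
r = rehydrate(encode(v))

# Cosine between the original direction and the rehydrated one (r has unit length).
cos = float(v @ r / np.linalg.norm(v))
print(f"cosine(original, rehydrated) = {cos:.4f}")
```

Because only direction is encoded, the comparison is against the direction of `v`, not its length; the printed cosine measures how much angular detail this 7-symbol alphabet loses.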

From Raw Pixels to JPEG — From FP32 to SEMQ

For images, raw pixels were never the final form. JPEG introduced a transform domain (frequency coefficients) where compression and storage become efficient, without breaking visual meaning. Pixels are no longer the unit of storage — coefficients are.

For embeddings, FP32/FP16 vectors don't need to be the final form either. SEMQ introduces a symbolic angular domain where the unit of storage is a compact code that represents direction, not raw floating-point magnitudes.

In other words: JPEG is to pixels what SEMQ aims to be to embeddings — a new representation space where compression, indexing and analysis make much more sense.

Why this matters for infrastructure

• Embeddings become cheaper to store and query at massive scale.
• You can move part of your retrieval stack to symbolic operations.
• Edge and mobile devices can carry more semantics with fewer bits.
• New index types and hardware instructions become possible.
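One way to read "symbolic operations" in retrieval: once embeddings are small integer codes, similarity scoring can run as integer dot products, with a single floating-point normalization at the end. The sketch reuses a hypothetical max-abs encoder (an assumption, not SEMQ's method) on random data; the corpus size, noise level and item index 42 are arbitrary.

```python
import numpy as np


def encode(x: np.ndarray, levels: int = 3) -> np.ndarray:
    """Hypothetical encoder applied row-wise: drop magnitude, round to int8 symbols."""
    scaled = x / np.max(np.abs(x), axis=-1, keepdims=True)
    return np.rint(scaled * levels).astype(np.int8)


rng = np.random.default_rng(1)
corpus = rng.normal(size=(5_000, 768))
codes = encode(corpus)                           # stored as int8 symbols
c32 = codes.astype(np.int32)                     # widen for arithmetic so sums cannot overflow

query = corpus[42] + 0.3 * rng.normal(size=768)  # noisy copy of item 42
q32 = encode(query).astype(np.int32)

dots = c32 @ q32                                 # integer dot products
norms_sq = np.einsum("ij,ij->i", c32, c32)       # integer squared norms
scores = dots / np.sqrt(norms_sq * int(q32 @ q32))  # cosine: one float step at the end

top5 = np.argsort(scores)[::-1][:5]
print(top5)
```

The expensive part (one dot product per stored code) uses only integer arithmetic; item 42 should rank first because its code shares most symbols with the query's code.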

Why this matters for research & products

• Opens a new space of algorithms in a symbolic angular domain.
• Enables interpretability and pattern mining over codes.
• Decouples semantic representation from floating-point formats.
• Can be layered on top of any existing embedding model.

SEMQ's core idea is simple but powerful: if meaning lives in the angle between vectors, then embeddings deserve their own angular, symbolic representation domain — not just smaller floats.