SEMQ vs Everyone
This table compares SEMQ to traditional embedding formats and compression methods. The key difference is not just the compression factor: SEMQ lives in a symbolic angular domain rather than a purely numeric one.
| Method | Notes | Domain | Symbolic? | Cross-model | Rehydratable | Needs training | Energy / cost (per 1M queries) |
|---|---|---|---|---|---|---|---|
| FP32 | Baseline: full-precision floats, highest memory and cost. | Continuous numeric | No | Yes | N/A | No | 100% |
| FP16 | Half precision; lower memory but still continuous floats. | Continuous numeric | No | Yes | N/A | No | ~70% |
| INT8 | Numeric quantization; usually model-specific and requires calibration. | Quantization | No | Limited | No | Yes (calibration/quantization) | ~40% |
| PQ / OPQ / RQ | Codebooks need training; compression is strong but tied to the dataset/model. | Vector quantization | No | Limited | Partial | Yes | 20–40% |
| SimHash | Good for approximate similarity, but not invertible and lossy by design. | Hashing | Partially | Yes | No | No | ~10% |
| SEMQ | Compact symbolic angular codes with strong compression and semantic preservation. | Symbolic angular domain | Yes (100%) | Yes (universal) | Yes (LUT-based) | No | ~10–15% |
Values are indicative and simplified for comparison. SEMQ's main contribution is introducing a symbolic angular domain for embeddings, rather than being just another numeric quantization method.
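The "Rehydratable (LUT-based)" entry is easiest to picture with a toy round trip. The sketch below is not the SEMQ format itself: the pair-wise encoding, the 256-entry `ANGLE_LUT`, and the retained radii are assumptions made purely to illustrate what mapping an embedding into a shared angular lookup table and rehydrating it from it could look like.

```python
# Illustrative sketch only: an angular, LUT-based code round trip.
# The real SEMQ format is not specified here; N_LEVELS, the LUT layout,
# and all function names are assumptions made for this example.
import numpy as np

N_LEVELS = 256                                    # hypothetical number of angle bins
ANGLE_LUT = np.linspace(-np.pi, np.pi, N_LEVELS)  # shared angle lookup table

def encode(vec: np.ndarray) -> np.ndarray:
    """Map consecutive (x, y) pairs of an embedding to 8-bit angle codes."""
    x, y = vec[0::2], vec[1::2]
    angles = np.arctan2(y, x)                     # angular domain instead of raw floats
    return np.abs(ANGLE_LUT[None, :] - angles[:, None]).argmin(axis=1).astype(np.uint8)

def decode(codes: np.ndarray, radii: np.ndarray) -> np.ndarray:
    """Rehydrate an approximate float vector from codes plus stored pair radii."""
    angles = ANGLE_LUT[codes]
    out = np.empty(codes.size * 2, dtype=np.float32)
    out[0::2] = radii * np.cos(angles)
    out[1::2] = radii * np.sin(angles)
    return out

vec = np.random.randn(768).astype(np.float32)
codes = encode(vec)                               # 384 bytes instead of 3072 (FP32)
radii = np.hypot(vec[0::2], vec[1::2])            # kept here only to show rehydration
approx = decode(codes, radii)
cos_sim = float(vec @ approx / (np.linalg.norm(vec) * np.linalg.norm(approx)))
print(f"codes: {codes.nbytes} B, cosine similarity after round trip: {cos_sim:.4f}")
```

The point of the sketch is the shape of the pipeline, not the numbers: codes index into a shared lookup table, so any consumer holding the same LUT can rehydrate an approximate vector without model-specific calibration or trained codebooks.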