SEMQ vs Everyone

This section compares SEMQ with traditional embedding formats and compression methods. The key difference is not just the compression factor: SEMQ lives in a symbolic angular domain rather than a purely numeric one.
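To make "symbolic angular domain" concrete, here is a minimal Python sketch of the general idea, not SEMQ's actual format: each dimension of a normalised embedding is snapped to one of a fixed set of angle buckets (the symbols), and a small lookup table rehydrates an approximate numeric vector. The bucket count, encoding, and LUT layout below are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only -- NOT SEMQ's actual code layout.
# Each dimension is mapped to one of K discrete angle buckets (a symbol);
# a lookup table (LUT) turns symbols back into approximate numeric values.

K = 256                                  # assumed: one angular symbol per byte
ANGLES = np.linspace(0.0, np.pi, K)      # bucket centres in [0, pi]
LUT = np.cos(ANGLES)                     # symbol -> approximate value in [-1, 1]

def encode(v: np.ndarray) -> np.ndarray:
    """Map a unit-norm embedding to per-dimension angular symbols."""
    v = v / np.linalg.norm(v)
    theta = np.arccos(np.clip(v, -1.0, 1.0))               # value -> angle
    return np.abs(theta[:, None] - ANGLES).argmin(axis=1).astype(np.uint8)

def rehydrate(codes: np.ndarray) -> np.ndarray:
    """LUT-based rehydration: symbols back to an approximate embedding."""
    return LUT[codes]

v = np.random.randn(768).astype(np.float32)
codes = encode(v)                        # 768 bytes vs 3072 bytes for FP32
v_hat = rehydrate(codes)
cos = np.dot(v, v_hat) / (np.linalg.norm(v) * np.linalg.norm(v_hat))
print(codes.nbytes, v.nbytes, round(float(cos), 4))  # compressed size, original size, similarity
```

A packed format could use fewer bits per symbol; the point is only that what gets stored is a string of discrete angular symbols that can be rehydrated through a fixed lookup table, with no trained codebook involved.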

| Method | Domain | Symbolic? | Cross-model | Rehydratable | Needs training | Energy / cost (per 1M queries) | Notes |
|---|---|---|---|---|---|---|---|
| FP32 | Continuous numeric | No | Yes | N/A | No | 100% | Baseline: full-precision floats, highest memory and cost. |
| FP16 | Continuous numeric | No | Yes | N/A | No | ~70% | Half precision, lower memory but still continuous floats. |
| INT8 | Numeric quantization | No | Limited | No | Yes (calibration/quantization) | ~40% | Usually model-specific; requires calibration. |
| PQ / OPQ / RQ | Vector quantization | No | Limited | Partial | Yes | 20–40% | Codebooks need training; strong compression but tied to the dataset/model. |
| SimHash | Hashing | Partially | Yes | No | No | ~10% | Good for approximate similarity, but lossy and not invertible by design. |
| SEMQ | Symbolic angular domain | Yes (100%) | Yes (universal) | Yes (LUT-based) | No | ~10–15% | New representation: compact symbolic angular codes with strong compression and semantic preservation. |

Values are indicative and simplified for comparison. SEMQ's main contribution is introducing a symbolic angular domain for embeddings, rather than being just another numeric quantization method.
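For contrast with the SimHash row above, a random-hyperplane SimHash keeps only sign bits, which is why it supports approximate similarity but is not rehydratable; the dimensions and bit count below are arbitrary choices for illustration.

```python
import numpy as np

# SimHash (random-hyperplane hashing): keeps only the sign of the embedding
# against random directions. Hamming distance between codes tracks the angle
# between vectors, but the sign bits cannot be inverted back to an embedding.

rng = np.random.default_rng(0)
D, BITS = 768, 256                              # arbitrary sizes for illustration
PLANES = rng.standard_normal((BITS, D))         # random hyperplanes, fixed once

def simhash(v: np.ndarray) -> np.ndarray:
    return (PLANES @ v > 0).astype(np.uint8)    # one sign bit per hyperplane

a = rng.standard_normal(D)
b = a + 0.1 * rng.standard_normal(D)            # slightly perturbed copy of a
print(int(np.count_nonzero(simhash(a) != simhash(b))), "of", BITS, "bits differ")
```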