SEMQ vs Everyone
This table compares SEMQ to traditional embedding formats and compression methods. The key difference is not just the compression factor: SEMQ lives in a symbolic angular domain rather than a purely numeric one.
| Method | Notes | Domain | Symbolic? | Cross-model | Rehydratable | Needs training | Energy / cost (per 1M queries) |
|---|---|---|---|---|---|---|---|
| FP32 | Baseline: full-precision floats, highest memory and cost. | Continuous numeric | No | Yes | N/A | No | 100% |
| FP16 | Half precision; lower memory but still continuous floats. | Continuous numeric | No | Yes | N/A | No | ~70% |
| INT8 | Numeric quantization; usually model-specific and requires calibration. | Quantization | No | Limited | No | Yes (calibration/quantization) | ~40% |
| PQ / OPQ / RQ | Codebooks need training; compression is strong but tied to the dataset/model. | Vector quantization | No | Limited | Partial | Yes | 20–40% |
| SimHash | Good for approximate similarity, but not invertible and lossy by design. | Hashing | Partially | Yes | No | No | ~10% |
| SEMQ | Compact symbolic angular codes with strong compression and semantic preservation. | Symbolic angular domain | Yes (100%) | Yes (universal) | Yes (LUT-based) | No | ~10–15% |
Values are indicative and simplified for comparison. SEMQ's main contribution is introducing a symbolic angular domain for embeddings, rather than being just another numeric quantization method.
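The "Rehydratable (LUT-based)" entry is easiest to picture with a toy round trip. The sketch below is not the SEMQ format itself: the pair-wise encoding, the 256-entry `ANGLE_LUT`, and the retained radii are assumptions made purely to illustrate what mapping an embedding into a shared angular lookup table and rehydrating it from it could look like.

```python
# Illustrative sketch only: an angular, LUT-based code round trip.
# The real SEMQ format is not specified here; N_LEVELS, the LUT layout,
# and all function names are assumptions made for this example.
import numpy as np

N_LEVELS = 256                                    # hypothetical number of angle bins
ANGLE_LUT = np.linspace(-np.pi, np.pi, N_LEVELS)  # shared angle lookup table

def encode(vec: np.ndarray) -> np.ndarray:
    """Map consecutive (x, y) pairs of an embedding to 8-bit angle codes."""
    x, y = vec[0::2], vec[1::2]
    angles = np.arctan2(y, x)                     # angular domain instead of raw floats
    return np.abs(ANGLE_LUT[None, :] - angles[:, None]).argmin(axis=1).astype(np.uint8)

def decode(codes: np.ndarray, radii: np.ndarray) -> np.ndarray:
    """Rehydrate an approximate float vector from codes plus stored pair radii."""
    angles = ANGLE_LUT[codes]
    out = np.empty(codes.size * 2, dtype=np.float32)
    out[0::2] = radii * np.cos(angles)
    out[1::2] = radii * np.sin(angles)
    return out

vec = np.random.randn(768).astype(np.float32)
codes = encode(vec)                               # 384 bytes instead of 3072 (FP32)
radii = np.hypot(vec[0::2], vec[1::2])            # kept here only to show rehydration
approx = decode(codes, radii)
cos_sim = float(vec @ approx / (np.linalg.norm(vec) * np.linalg.norm(approx)))
print(f"codes: {codes.nbytes} B, cosine similarity after round trip: {cos_sim:.4f}")
```

The point of the sketch is the shape of the pipeline, not the numbers: codes index into a shared lookup table, so any consumer holding the same LUT can rehydrate an approximate vector without model-specific calibration or trained codebooks.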