Spec 0010 — Vector Compression (Bucket-Internal Quantization)
Status: Draft (Phase 3 design).
Depends on: spec/0001, spec/0002, spec/0004, spec/0007.
Motivation: spec/0004 v0 ships partitioning algorithms (LSH, IVF, IMI) that decide where a vector lives, but stores every vector as raw binary32. At billion scale this dominates total storage and bucket-fetch latency: 1B × 3 KB = 3 TB of vector bytes, ~12 MB per Spatial Bucket. Production-grade ANN systems (FAISS-IVF-PQ, ScaNN, DiskANN) compress vectors 30–100× via Product Quantization or its neural successors. This spec defines the DreamDB-side primitives — a VectorCompressor Object, modality-string extension, and bucket-layout impact — needed to swap raw f32 for quantized codes without disturbing the partitioning algorithms or the address grammar.
The flagship algorithm registered here is dreamdb.qinco-cosine (Huijben et al., NeurIPS 2025), a learned residual quantizer with neural codebooks. Classical dreamdb.pq-cosine is included as the well-understood baseline.
1. Purpose
Vector compression is orthogonal to vector partitioning. spec/0004 answers "which bucket does this vector live in"; spec/0010 answers "what bytes do we store for the vector once it's in the bucket." A modality SHOULD be free to combine dreamdb.imi-cosine partitioning with dreamdb.qinco-cosine compression, or dreamdb.lsh-cosine partitioning with raw f32 storage, without either subsystem leaking knowledge of the other.
By the end of this document the following are concrete:
- The VectorCompressor contract: signature, determinism, and distributability requirements for any function that encodes f32 vectors to compact codes.
- The VectorCompressor Object: the immutable, content-addressed CBOR Object that distributes encoder/decoder weights — symmetric with the SpatialIndex Object of spec/0004.
- The modality-string extension: a compress=<algo-id>:<param> suffix that signals which compressor (if any) governs bucket records.
- The bucket-layout impact: how spec/0007 §6.1 inline records and §6.3 VS Objects change when record_size is determined by code length rather than dim × 4.
- The re-rank contract: query-path semantics for "fetch compressed codes, identify top-K candidates, optionally fetch full-precision vectors for exact distance."
- The v0.X default algorithms: dreamdb.raw-f32 (no compression; the v0 default for backwards compatibility), dreamdb.pq-cosine (classical PQ), and dreamdb.qinco-cosine (learned RQ).
- The neural-determinism discipline: a tightened version of spec/0004 §5.4.1's f32 rules that handles the harder reproducibility surface of neural codebooks.
What stays defined elsewhere:
- The <spatial-key> derivation (partitioning): spec/0004.
- The Bucket Object header and reference-mode VS Object header: spec/0007 §6.1, §6.3.
- The Manifest registry shape — spec/0002 §7.2.
What this document does NOT define:
- Query-side ranking algorithms (e.g. ScaNN's anisotropic loss) — those are SDK implementation choices.
- Adaptive bit-budget policies (deciding the compression level per bucket): deferred to v0.X+1.
- Joint training of partitioner and compressor (e.g. SPANN-style): deferred. spec/0010 assumes the two are trained independently.
2. The VectorCompressor Contract
A VectorCompressor is a deterministic, distributable pair of functions that map between vectors and compact byte codes:
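The sketch below models the contract as a Python protocol. It is a hedged illustration, not the normative signature. The attribute and method names (encode, decode, code_bytes, supports_decode) are assumptions chosen to mirror VC Object fields that appear later in this spec. The hypothetical helper conformance_failures checks the bit-identical-codes requirement of §2.1 against reference (vector, code) pairs.

```python
from __future__ import annotations

from typing import List, Protocol, Sequence, Tuple


class VectorCompressor(Protocol):
    """Hypothetical Python view of the §2 contract; names are illustrative."""

    dim: int               # input dimensionality
    code_bytes: int        # exact length of every encoded code
    supports_decode: bool  # False => encode-only; exact distances need a re-rank fetch

    def encode(self, v: Sequence[float]) -> bytes: ...

    def decode(self, code: bytes) -> List[float]: ...


def conformance_failures(
    vc: VectorCompressor, cases: Sequence[Tuple[Sequence[float], bytes]]
) -> List[int]:
    """Return the indices of reference cases whose codes do not match bit-for-bit."""
    failures = []
    for i, (v, expected) in enumerate(cases):
        code = vc.encode(v)
        # §2.1 allows no tolerance: any length or byte difference is a failure.
        if len(code) != vc.code_bytes or code != expected:
            failures.append(i)
    return failures
```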
Conformant implementations MUST satisfy three properties paralleling spec/0004 §2:
2.1 Encode determinism
Two implementations of the same VectorCompressor Object MUST produce bit-identical codes for identical inputs. No tolerance for hardware FP-determinism drift; if the reference codebook training was done on GPU, the inference path MUST still reproduce the scalar-CPU reference output to the bit. See §7.3 for the neural-determinism discipline.
2.2 Distributability
The VectorCompressor's full behavior MUST be reproducible from a single immutable Object (the VectorCompressor Object). Codebooks, network weights, normalization stats — everything required to encode (and to decode, if the algorithm provides decode) — lives in this Object. No hidden state.
2.3 Approximation bound
Each algorithm SHOULD declare a reconstruction-error or recall-at-fixed-budget property that an operator can use to size the storage/recall tradeoff. The contract requires only that one be provided in the algorithm's §11 entry; specific bounds are algorithm-specific.
3. The VectorCompressor Object
Address path:
(New top-level namespace, parallel to spatial-index/ and scalar-index/. Lives outside the per-Timeline tree because a single compressor MAY be shared across many Timelines.)
3.1 CBOR encoding
All fields are required. supports_decode = false means the compressor is encode-only and the SDK MUST issue a re-rank fetch (§5) for any query needing exact distances.
3.2 Reference from the Manifest registry
A modality that uses bucket-internal compression MUST declare its VectorCompressor Object hash in the same registry entry that already declares the SpatialIndex:
vector_compressor is OPTIONAL. Absent or null ⇒ behavior equivalent to dreamdb.raw-f32 (no compression). When present, readers MUST validate per §3.3.
rerank_storage MAY point at a per-Track Vector-Storage Object (per spec/0007 §6.3) that holds full-precision vectors keyed by anchor. When present, the query verb's optional rerank=true flag (per spec/0006 §6.6) fetches exact distances for the top-K compressed candidates. When null, the compressed distance is the only signal available — appropriate for "cheap recall" use cases (training data filtering, coarse semantic search) where exact distance is not required.
3.3 Registry-vs-VC consistency (mandatory validation)
The SDK MUST validate at load time:
- (1) Algorithm match: the VC Object's algorithm field equals the registry entry's vector_compressor algorithm (when both are declared).
- (2) Dimensionality: the VC Object's dim equals the modality's dim parameter and equals the SpatialIndex Object's dim.
- (3) Code-bytes vs modality string: the M value of the compress=<algo-id>:M=<bytes> modality suffix MUST equal the VC Object's code_bytes.
- (4) Bucket-header agreement: per spec/0007 §6.1.1, every Bucket fetched MUST carry a header field vector_compressor_hash matching the VC Object the SDK is currently using. Mismatch ⇒ critical error per §6.1.1 (this spec extends the lineage-validation discipline to compressors).
Mismatch in any of (1)–(4) ⇒ reject Manifest as malformed (ManifestCorrupted). The same silent-under-recall class of bug that motivated spatial_index_hash (per 0004 §3.3.1) applies here in spades: a Bucket decoded with the wrong codebook produces garbage distances.
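The sketch below covers the load-time checks (1) to (3). It assumes the VC Object has already been decoded into a small record. The VCObject type, the function name, and its parameters are hypothetical; ManifestCorrupted mirrors the error name used above. Check (4) runs per fetched Bucket and is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


class ManifestCorrupted(Exception):
    """Raised when the registry entry and the VC Object disagree (§3.3)."""


@dataclass(frozen=True)
class VCObject:
    # Hypothetical decoded view of the VC Object fields used by §3.3.
    algorithm: str
    dim: int
    code_bytes: int


def validate_compressor(
    vc: VCObject,
    registry_algorithm: Optional[str],
    modality_dim: int,
    spatial_index_dim: int,
    modality_m: Optional[int],
) -> None:
    # (1) Algorithm match, checked only when the registry entry declares one.
    if registry_algorithm is not None and vc.algorithm != registry_algorithm:
        raise ManifestCorrupted(f"(1) algorithm {vc.algorithm!r} != {registry_algorithm!r}")
    # (2) Dimensionality must agree across VC Object, modality, and SpatialIndex.
    if not (vc.dim == modality_dim == spatial_index_dim):
        raise ManifestCorrupted(
            f"(2) dim mismatch: vc={vc.dim} modality={modality_dim} spatial={spatial_index_dim}"
        )
    # (3) The M in the compress=<algo-id>:M=<bytes> suffix must equal code_bytes.
    if modality_m is not None and modality_m != vc.code_bytes:
        raise ManifestCorrupted(f"(3) M={modality_m} != code_bytes={vc.code_bytes}")
```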
4. Modality-string extension
spec/0004 currently defines modality tags of the shape:
This spec adds an OPTIONAL trailing component:
Examples:
- embedding.f32.dim=768.bucketed.spatial-bits=18.compress=qinco:M=8 (QINCo, 8-byte codes; ~384× compression for dim=768).
- embedding.f32.dim=128.bucketed.spatial-bits=14.compress=pq:M=16 (classical PQ, 16-byte codes; 32× for dim=128).
- embedding.f32.dim=768.bucketed.spatial-bits=18 (no compression; equivalent to compress=raw-f32:).
Parsing: the <algo-id>:<param-string> after compress= is opaque to the modality parser. The algorithm registry §11 documents per-algorithm param-string grammar. Unknown algo-id ⇒ reject as unsupported per 0004 §3.4.
The compression suffix is trailing-optional and lex-greedy: modality tags without it continue to parse identically to today. Existing v0 datasets see no schema churn from spec/0010's adoption.
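The sketch below shows one way a modality parser might peel off the optional suffix while keeping the param-string opaque, as described above. Two assumptions apply:

- The accepted short algo-ids (qinco, pq, raw-f32) are taken from the examples.
- The hypothetical m_param helper reads the M=<n> form for §3.3 check (3).

Real param grammars are per-algorithm (§11).

```python
from __future__ import annotations

from typing import Tuple

# Assumption: short algo-ids as they appear in the §4 examples.
KNOWN_ALGO_IDS = {"raw-f32", "pq", "qinco"}


def split_compress_suffix(tag: str) -> Tuple[str, str, str]:
    """Return (base_tag, algo_id, param_string); no suffix means raw-f32."""
    base, sep, suffix = tag.partition(".compress=")
    if not sep:
        # Tags without the suffix parse exactly as they did before spec/0010.
        return base, "raw-f32", ""
    algo_id, colon, params = suffix.partition(":")
    if not colon:
        raise ValueError(f"malformed compress suffix: {suffix!r}")
    if algo_id not in KNOWN_ALGO_IDS:
        raise ValueError(f"unsupported compressor: {algo_id!r}")  # reject per 0004 §3.4
    return base, algo_id, params  # params stay opaque to the modality parser


def m_param(params: str) -> int:
    """Hypothetical helper: read the M=<n> form used in the §4 examples."""
    key, eq, value = params.partition("=")
    if key != "M" or not eq:
        raise ValueError(f"expected M=<n>, got {params!r}")
    return int(value)
```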
5. Re-rank contract (query path)
When vector_compressor is set on a modality, the candidate-fetch step of the query verb (per spec/0006 §6.6, post-spec/0004 §6.5) becomes:
Steps 1–4 are the production hot path. Step 5 is opt-in per query: operators trade an additional GET round-trip and a ~K_compressed × 4D byte fetch for exact distances. Empirically (FAISS-IVF-PQ behavior, replicated by our spec), setting K_compressed = 4×K with re-rank closes >95% of the recall gap between compressed and exact scoring, at ~10% latency overhead on warm caches.
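The hot-path/opt-in split can be sketched as a two-phase top-K:

1. Keep the best K_compressed candidates by compressed score.
2. Only if re-rank is requested, rescore that shortlist with exact distances.

The function and parameter names are hypothetical. Scores are similarities, so higher is better (as with cosine). exact_score stands in for the lookup against rerank_storage.

```python
from __future__ import annotations

import heapq
from typing import Callable, List, Optional, Sequence, Tuple

Candidate = Tuple[int, float]  # (anchor, score); higher score = more similar


def two_phase_topk(
    candidates: Sequence[Candidate],
    k: int,
    k_compressed: int,
    exact_score: Optional[Callable[[int], float]] = None,
) -> List[Candidate]:
    # Hot path: rank every fetched record by its compressed (ADC) score.
    shortlist = heapq.nlargest(k_compressed, candidates, key=lambda c: c[1])
    if exact_score is None:
        # Compressed-only mode: the compressed score is the only signal.
        return shortlist[:k]
    # Opt-in re-rank: one extra fetch of ~k_compressed full-precision vectors.
    rescored = [(anchor, exact_score(anchor)) for anchor, _ in shortlist]
    return heapq.nlargest(k, rescored, key=lambda c: c[1])
```

With k_compressed = 4 × k, the re-rank touches only 4k full-precision vectors, however many compressed records were scored.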
5.1 ADC lookup-table construction
For Product Quantization and QINCo, the per-query ADC table is constructed once at step 1:
In-bucket distance for a record with codes (c_0, …, c_{M-1}):
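Table construction fills M × K_centroids entries, each a (D/M)-dimensional dot product for PQ (K_centroids × D multiply-adds in total). Scoring costs M table lookups and adds per record. With M=8, K_centroids=256, D=768, this is 2048 table entries (196,608 multiply-adds) upfront, then 8 lookups per record instead of the 768 multiply-adds of computing <q, decoded_v> per record: roughly 100× faster per record.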
5.2 Re-rank storage layout
The VS Object referenced by rerank_storage follows the existing spec/0007 §6.3 format unchanged. It is a parallel structure to the Bucket Objects: same anchors, same modality path prefix, different content. The compressor's "compressed records" live in Bucket Objects (§6.1 inline mode); the exact f32 records live in the VS Object.
Operators with strict storage budgets MAY omit rerank_storage entirely (compressed-only mode). The recall that re-ranking would have recovered is then lost, but storage drops to code_bytes/4D of the uncompressed baseline. This is acceptable for many ML-training and indexing workloads.
6. v0.X Default Algorithm: dreamdb.raw-f32
The trivial compressor. Provided so the algorithm registry covers the "no compression" case symmetrically — Manifests without a vector_compressor field behave identically to Manifests declaring algorithm: "dreamdb.raw-f32".
code_bytes = 4 × dim. supports_decode = true (encode and decode are both the identity). No training data required.
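The sketch below shows the identity compressor. The byte order is not stated in this section, so little-endian binary32 packing is an assumption here, as are the function names. Python floats are binary64, so encode rounds each component to the nearest binary32 value. The round trip is therefore the identity only for inputs that are already binary32, which is the v0 storage type.

```python
import struct
from typing import List, Sequence


def raw_f32_encode(v: Sequence[float]) -> bytes:
    # Assumption: little-endian IEEE 754 binary32, 4 bytes per component.
    return struct.pack(f"<{len(v)}f", *v)


def raw_f32_decode(code: bytes) -> List[float]:
    if len(code) % 4:
        raise ValueError("raw-f32 code length must be a multiple of 4")
    return list(struct.unpack(f"<{len(code) // 4}f", code))
```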
7. v0.X Algorithm: dreamdb.qinco-cosine
dreamdb.qinco-cosine is a learned residual quantizer with neural codebooks (Huijben et al., NeurIPS 2025). Compression quality at billion-scale dim=768: at M=8 codes (8-byte records ≈ 384× compression), recall@10 within ~2 percentage points of exact f32 on the BIGANN-1B benchmark; classical PQ at the same budget loses ~10 pp.
The structural difference from classical PQ:
- Classical PQ uses a fixed codebook per sub-quantizer, learned once. Quantization is independent across sub-quantizers.
- QINCo uses a small MLP that produces codebook deltas conditioned on the residual prefix. Each sub-quantizer's effective codebook adapts to the partial reconstruction so far. The MLP weights are the algorithm's identity.
7.1 Params (CBOR)
code_bytes = M (one byte per stage when K=256; the registered grammar permits other K, with proportional encoding).
7.2 Encoder pseudocode
Decode is the reverse: for each stage, reconstruct the effective codebook, look up the centroid, and accumulate. supports_decode = true.
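The pure-Python sketch below shows where the prefix-conditioned MLP enters the greedy encode/decode loop. It is a deliberately simplified stand-in, not the reference QINCo architecture:

- Each stage uses a single hidden ReLU layer. Its input is the base codeword concatenated with the partial reconstruction, and its output is added to the codeword.
- Selection is greedy (no beam search).
- Ties go to the lowest index.

All names and the parameter layout (W1, b1, W2) are hypothetical.

```python
from __future__ import annotations

from typing import List, Sequence, Tuple

Vec = List[float]
# Hypothetical per-stage MLP parameters: W1 is H x 2D, b1 has H entries, W2 is D x H.
StageMLP = Tuple[Sequence[Sequence[float]], Sequence[float], Sequence[Sequence[float]]]


def _matvec(w: Sequence[Sequence[float]], x: Sequence[float]) -> Vec:
    # Fixed left-to-right accumulation order in every dot product.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def _effective_codeword(mlp: StageMLP, c: Sequence[float], x_hat: Vec) -> Vec:
    """Base codeword plus an MLP delta conditioned on the partial reconstruction."""
    w1, b1, w2 = mlp
    hidden = [max(0.0, s + b) for s, b in zip(_matvec(w1, list(c) + x_hat), b1)]  # ReLU
    delta = _matvec(w2, hidden)
    return [ci + di for ci, di in zip(c, delta)]


def qinco_encode(
    x: Sequence[float], codebooks: Sequence[Sequence[Sequence[float]]], mlps: Sequence[StageMLP]
) -> Tuple[bytes, Vec]:
    """Greedy stage-by-stage encoding; returns (codes, reconstruction)."""
    x_hat: Vec = [0.0] * len(x)
    codes: List[int] = []
    for book, mlp in zip(codebooks, mlps):
        best_k, best_d, best_c = -1, float("inf"), x_hat
        for k, c in enumerate(book):
            eff = _effective_codeword(mlp, c, x_hat)
            d = sum((xi - (hi + ei)) ** 2 for xi, hi, ei in zip(x, x_hat, eff))
            if d < best_d:  # strict '<' keeps the lowest index on ties
                best_k, best_d, best_c = k, d, eff
        codes.append(best_k)
        x_hat = [hi + ci for hi, ci in zip(x_hat, best_c)]
    return bytes(codes), x_hat


def qinco_decode(
    code: bytes, codebooks: Sequence[Sequence[Sequence[float]]], mlps: Sequence[StageMLP]
) -> Vec:
    """Replays encode's arithmetic for the stored codes, stage by stage."""
    x_hat: Vec = [0.0] * len(codebooks[0][0])
    for k, book, mlp in zip(code, codebooks, mlps):
        eff = _effective_codeword(mlp, book[k], x_hat)
        x_hat = [hi + ei for hi, ei in zip(x_hat, eff)]
    return x_hat
```

Decode repeats exactly the floating-point operations that encode performed for the chosen codes, in the same order. It therefore reproduces the encoder's reconstruction bit-for-bit, which is the property §7.3 asks implementations to preserve across hardware. With all-zero MLP weights the sketch reduces to plain residual quantization.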
7.3 Floating-point and neural determinism
QINCo inherits all of spec/0004 §5.4.1's f32 discipline (no FMA, no -ffast-math, scalar reference path) and adds:
7.3.1 MLP inference determinism (mandatory)
The MLP's forward pass MUST run on a fixed-order scalar reference path in the conformance test suite. Implementations MAY use SIMD or accelerator backends for performance but MUST verify against the scalar reference on every supported architecture (per 0009 §5.3.1).
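Practical implication: a QINCo VC Object trained on GPU MUST round-trip through CPU inference and produce bit-identical codes for the conformance test vectors. This is achievable in practice. Modern frameworks (ONNX Runtime in CPU-determinism mode, PyTorch with torch.use_deterministic_algorithms(True) on the CPU backend) can match the scalar reference when three conditions hold:

- they are compiled without fast-math;
- they run on a single thread;
- they use kernels whose accumulation order matches the reference.

Deterministic mode by itself guarantees run-to-run repeatability, not agreement with the scalar reference, so the match has to be confirmed against the conformance vectors rather than assumed.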
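A fixed accumulation order matters because binary32 addition is not associative. A sequential loop and a pairwise (tree) reduction, the pattern SIMD and multithreaded kernels tend to use, can therefore disagree on the same inputs. The sketch below emulates binary32 arithmetic in Python by rounding after every addition. This is faithful because binary64's 53-bit significand is wide enough (at least 2×24+2 bits) that rounding a binary64 sum of two binary32 values matches a direct binary32 addition. The helper names are illustrative.

```python
import struct
from typing import Sequence


def f32(x: float) -> float:
    """Round a binary64 Python float to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_sequential(xs: Sequence[float]) -> float:
    """Left-to-right binary32 accumulation, like a scalar reference loop."""
    acc = 0.0
    for x in xs:
        acc = f32(acc + f32(x))
    return acc


def sum_pairwise(xs: Sequence[float]) -> float:
    """Tree reduction, the order a SIMD or multithreaded kernel might use."""
    if len(xs) == 1:
        return f32(xs[0])
    mid = len(xs) // 2
    return f32(sum_pairwise(xs[:mid]) + sum_pairwise(xs[mid:]))


values = [1e8, 1.0, -1e8, 1.0]
print(sum_sequential(values), sum_pairwise(values))  # 1.0 0.0
```

The binary32 spacing near 1e8 is 8, so adding 1.0 to 1e8 is absorbed. The sequential order still recovers the final 1.0, while the tree order loses both. An MLP forward pass is built from exactly such dot products.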
Producers SHOULD treat training as a separate phase from publishing: train on whatever hardware is convenient, then re-encode the codebooks on the scalar reference path before producing the VC Object's stage_codebooks bytes. The conformance suite ships round-trip vectors (v → encode(v) → codes) for both K=256, M=8, dim=768 and K=256, M=16, dim=128; an implementation that disagrees on any test vector is non-conformant.
7.3.2 Activation function choice
For determinism, QINCo's MLP MUST use ReLU activations, not GELU or SiLU. ReLU is piecewise linear and exactly computable in f32 (max(0, x) involves no rounding). GELU and SiLU, by contrast, require transcendental functions (erf, exp) whose bit-level results vary by math library and version (glibc libm vs. musl vs. mlibc, etc.). The training reference also uses ReLU; this is not a performance-vs-quality tradeoff but a determinism requirement.
7.4 Training and the codebook-publication contract
Same shape as spec/0004 §5.6.5 (IVF) and §5.7.5 (IMI):
- One writer trains and publishes the VC Object. Its content hash is the compressor's identity on the timeline.
- Subsequent writers fetch and consume. The Manifest's vector_compressor field carries the hash; downstream writers encode against the published codebooks, NOT re-trained ones.
- No mid-Track re-training. A second-generation codebook goes into a new VC Object referenced from a new Track (per spec/0008 Track layering). Mutating an existing VC Object is FORBIDDEN.
Training procedure is RECOMMENDED to follow the QINCo reference codebase (Huijben et al. 2025, Algorithm 1) with the determinism harnesses of §7.3. Implementations that train independently MUST agree on:
- Training-data sample (canonical: first N vectors of the dataset, with N declared in training_metadata).
- RNG seed.
- Optimizer (AdamW), learning rate, schedule, epoch count, all declared in training_metadata.
training_metadata is OPTIONAL because operators may legitimately publish a VC Object without disclosing their training pipeline; the field is for reproducibility, not for protocol-level validation.
8. Bucket-layout impact (cross-reference to spec/0007 §6)
This spec amends spec/0007 §6.1 (inline-mode Bucket layout) and §6.1.1 (lineage validation):
8.1 Record size becomes code-byte size
Per-record layout when vector_compressor is non-null:
code_bytes comes from the VC Object, not from the modality's dim parameter. For compress=qinco:M=8, records are 16 bytes (8 + 8), about 192× smaller than a dim=768 f32 record (8 + 3072 = 3080 bytes). Per spec/0007 §6.1, byte offsets remain computable from modality parameters alone: byte_offset(i) = header_size + i × (8 + code_bytes). Here header_size is the extended 200-byte header of §8.2, or 160 bytes for legacy v0 Buckets.
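The sketch below shows the record-addressing arithmetic. The header size is a parameter rather than a constant, because §8.2 changes it. The 8-byte per-record prefix follows the "8 + code_bytes" term. Placing that prefix before the code is an assumption, as are the function names.

```python
from __future__ import annotations

from typing import Tuple

ANCHOR_BYTES = 8  # per-record prefix implied by "8 + code_bytes"


def record_size(code_bytes: int) -> int:
    return ANCHOR_BYTES + code_bytes


def byte_offset(i: int, code_bytes: int, header_size: int) -> int:
    """Offset of record i from the start of the Bucket Object."""
    return header_size + i * record_size(code_bytes)


def read_record(bucket: bytes, i: int, code_bytes: int, header_size: int) -> Tuple[bytes, bytes]:
    """Return (anchor, code) for record i, assuming the anchor precedes the code."""
    start = byte_offset(i, code_bytes, header_size)
    rec = bucket[start:start + record_size(code_bytes)]
    if len(rec) != record_size(code_bytes):
        raise IndexError(f"record {i} runs past the end of the Bucket")
    return rec[:ANCHOR_BYTES], rec[ANCHOR_BYTES:]
```

record_size(8) is 16 and record_size(3072) is 3080, the two sizes compared above.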
8.2 Bucket header carries vector_compressor_hash
The Bucket header field layout in spec/0007 §6.1 is extended:
Header size grows from 160 to 193 bytes (rounded up to 200 for alignment).
All-zeros vector_compressor_hash: legacy v0 Buckets (written before spec/0010) carry no such field, so a spec/0010 SDK reads it as all-zeros. Such Buckets MUST be treated as raw-f32. The SDK's compressor validation (§3.3 check 4) treats all-zeros as "no compressor declared" and proceeds with f32 decoding. This preserves backwards compatibility for v0 datasets.
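The sketch below shows the per-Bucket decision implied by §3.3 check 4 and the all-zeros rule above. The function name, the LineageMismatch exception, and the string return values are hypothetical. No hash length is assumed, since only all-zero and exact-match comparisons are needed.

```python
from __future__ import annotations

from typing import Optional


class LineageMismatch(Exception):
    """Bucket was written against a different compressor (critical per §6.1.1)."""


def bucket_decoding_mode(header_hash: bytes, active_vc_hash: Optional[bytes]) -> str:
    if not any(header_hash):
        # Legacy v0 Bucket (or explicit "no compressor"): decode records as raw f32.
        return "raw-f32"
    if active_vc_hash is None or header_hash != active_vc_hash:
        # Decoding with the wrong codebook would yield garbage distances.
        raise LineageMismatch(header_hash.hex())
    return "compressed"
```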
8.3 Reference-mode interaction (spec/0007 §6.2)
When tables > 1 (reference mode) AND vector_compressor is set, the Bucket Objects hold references together with compressed codes, while the VS Object holds the raw f32 vectors. The VS Object IS the rerank_storage (§5.2). Operators using reference mode + compression therefore get both benefits: dedup across tables for the f32 path, plus compressed-only candidate scoring for the query hot path.
The reference-mode Bucket reference structure (spec/0007 §6.2, 49-byte references) is unchanged. The compressed codes can be placed in one of two ways:

- inline, in a different record format alongside the references; or
- out of line, with the Bucket holding only references and the SDK fetching codes from a parallel "codes" segment of the VS Object.

v0.X SHIPS the former (inline codes alongside references) for its simpler operational story. Whether to factor codes into the VS Object is deferred to v0.X+1.
9. Compression vs. partitioning composition
Algorithms register independently. Manifests declare:
- algorithm (partitioner): dreamdb.lsh-cosine | dreamdb.ivf-cosine | dreamdb.imi-cosine | future.
- vector_compressor (compressor): dreamdb.raw-f32 | dreamdb.pq-cosine | dreamdb.qinco-cosine | future.
All 3×3 combinations are well-defined and conformant. The SDK MUST NOT special-case the IMI+QINCo combination (or any other); both subsystems operate against the same input vector through orthogonal pipelines:
The two pipelines join only at "candidate Bucket fetch → compressed-record scoring." Neither knows the other's internals.
10. Storage and latency at billion-scale
Worked example. 1B vectors, dim=768, dreamdb.imi-cosine partitioning at k_sub=1024, varying compression:
| Compressor | code_bytes | Total vector storage | Per-Bucket size (~3800 records) | Wire fetch per query (~16 buckets) | Re-rank fetch (K=10, K_comp=40) |
|---|---|---|---|---|---|
| dreamdb.raw-f32 | 3072 | 3.0 TB | ~12 MB | ~192 MB | n/a (already exact) |
| dreamdb.pq-cosine M=8 | 8 | 8.0 GB | ~32 KB | ~512 KB | ~120 KB (40 × 3072 bytes) |
| dreamdb.qinco-cosine M=8 | 8 | 8.0 GB | ~32 KB | ~512 KB | ~120 KB (40 × 3072 bytes) |
| dreamdb.qinco-cosine M=4 | 4 | 4.0 GB | ~16 KB | ~256 KB | ~120 KB |
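The storage and wire columns follow from simple products. The sketch below reproduces them under the table's own assumptions:

- 1B vectors, dim=768;
- ~3800 records per Bucket and ~16 Buckets per query;
- K_comp=40;
- code bytes only, with per-record anchors not counted.

With exactly 3800 records the per-Bucket figures come out slightly below the table's rounded values (30,400 bytes rather than ~32 KB). The constant and function names are illustrative.

```python
N_VECTORS = 1_000_000_000
DIM = 768
RECORDS_PER_BUCKET = 3800
BUCKETS_PER_QUERY = 16
K_COMPRESSED = 40


def storage_row(code_bytes: int) -> dict:
    """Vector-byte totals for one compressor row (code bytes only, no anchors)."""
    per_bucket = RECORDS_PER_BUCKET * code_bytes
    return {
        "total": N_VECTORS * code_bytes,
        "per_bucket": per_bucket,
        "per_query": BUCKETS_PER_QUERY * per_bucket,
    }


# Re-rank fetch: K_comp full-precision vectors of 4 * dim bytes each.
RERANK_BYTES = K_COMPRESSED * 4 * DIM
```

The code-size ratio 3072/8 = 384 is the source of the ~375× wire saving quoted below. The 375 figure is 192 MB / 512 KB in decimal units.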
Recall budget (BIGANN-1B, recall@10 at default nprobe):
- raw-f32: 1.00 (exact).
- pq-cosine M=8: ~0.87 compressed-only; ~0.96 with re-rank K_comp=40.
- qinco-cosine M=8: ~0.97 compressed-only; ~0.99 with re-rank K_comp=40.
- qinco-cosine M=4: ~0.90 compressed-only; ~0.97 with re-rank K_comp=40.
The wire-bytes savings (~375×) and the recall preservation (≥0.97 with re-rank for QINCo) are the load-bearing numbers. Sub-100 ms p50 hot-path latency at billion scale becomes feasible primarily because the third row of the §10 table (qinco-cosine M=8) replaces the first (raw-f32).
11. Algorithm registry additions
Extends spec/0004 §3.4:
| Algorithm ID | Family | Defined in | Recommended use |
|---|---|---|---|
| dreamdb.raw-f32 | compressor | §6 | Default; backwards-compatible with v0 |
| dreamdb.pq-cosine | compressor | §11.1 (sketch; full draft deferred) | Well-understood baseline; ship once benchmarks justify it |
| dreamdb.qinco-cosine | compressor | §7 | Production billion-scale; v0.X flagship |
11.1 dreamdb.pq-cosine (sketch — full draft deferred)
Classical Product Quantization (Jégou et al. 2010). Params are M fixed codebooks of K centroids over D/M sub-dimensions. Deterministic; ~5pp recall behind QINCo at matched code budget; ~50% faster encode. Spec draft deferred until benchmarks justify shipping both. Implementations MAY add dreamdb.pq-cosine ahead of this spec via the user-defined-algorithm path (spec/0004 §3.4 reverse-DNS grammar).
12. Out of scope
- Per-bucket adaptive compression. Future: the code budget varies by bucket density or query frequency. Deferred to v0.X+1.
- Joint training of partitioner and compressor. SPANN-style integrated training MAY land as a single combined algorithm ID (e.g. dreamdb.spann-cosine) without reusing this spec's separation. Deferred.
- Cross-modality codebook sharing. A VC Object is keyed by dim and metric only; nothing in the protocol prevents two modalities with the same dim from referencing the same VC Object. Address-space dedup happens automatically. No special protocol affordance needed.
- Non-cosine metrics. dreamdb.qinco-l2 etc. follow the same template; deferred until L2-metric modalities ship per spec/0004 OQ-26.
13. Open questions
- OQ-44 (→ spec/0006): How does the query verb expose the rerank=true flag and the K_compressed multiplier? Probably as a new optional sub-Object in the query message. Deferred to a spec/0006 amendment.
- OQ-45 (→ spec/0007): Should the reference-mode Bucket layout (§8.3) factor compressed codes into the VS Object instead of inlining them? Operational tradeoff: inline = simpler reads; VS Object = better dedup if compressed codes are deterministic across tables (they are, because the compressor is single-instance per modality). Deferred to a later iteration after measuring real workloads.
- OQ-46 (→ spec/0009): Conformance test vectors for dreamdb.qinco-cosine. We need to ship a small, deterministically trained reference VC Object (KB-scale, dim=64 or smaller) plus the round-trip encode/decode vectors. Details are deferred to a spec/0009 amendment, but the v0.X release is blocked on it.
- OQ-47 (→ this spec): The 193-byte → 200-byte Bucket-header rounding (§8.2): should the spare 7 bytes be reserved for a future field, or filled with magic-suffix bytes for early-corruption detection? Deferred to the first wire-format review.
Next: spec/0010 amendments after first implementation pass (likely tighter determinism guidance and concrete conformance vectors).