Spec 0013 — Graph-Based ANN (Vamana / DiskANN family)
Status: Draft (Phase 3 design).
Depends on: spec/0001, spec/0002, spec/0004, spec/0007, spec/0010, spec/0012.
Motivation: spec/0004's partition algorithms (LSH/IVF/IMI) all share a per-query cost structure: O(cells_touched × items_per_cell) byte fetch. At billion-scale this is fine; at ten-billion-plus, the cell-touched budget grows faster than the recall floor allows, and the partition-based approach loses to graph-based approaches in the recall/cost ratio. Microsoft's DiskANN (Subramanya et al. 2019) and Vamana (the underlying graph algorithm) measure ~3–5× better recall@cost at 1B–10B scale than IVF-PQ. spec/0013 adds graph-based ANN as a peer of the partition-based algorithms — same SpatialIndex registry, same composable-with-spec/0010-compression contract — and additionally resolves spec/0012 OQ-51 by defining a global routing layer for cross-shard graph queries.
1. Purpose
DreamDB's spec/0004 partition algorithms answer "which cells does this query touch?" Graph algorithms answer a different question: "starting from a known entry point, which sequence of greedy local steps converges to the nearest neighbors?" The two are independently optimal — partition for selectivity-first, graph for recall-floor-first.
By the end of this document the following are concrete:
- The graph-based SpatialIndex contract: an extension of spec/0004 §2 covering the entry-point + adjacency-list data model.
- The GraphIndex Object: immutable, content-addressed CBOR Object that distributes the graph's metadata (R, alpha, entry point, page layout, seed).
- The GraphPage Object kind: new on-disk format that holds a fixed-size batch of (node, vector, adjacency-list) tuples, page-aligned for I/O efficiency.
- The Vamana algorithm (
dreamdb.vamana-cosine): deterministic build + greedy search, suitable for 10B-scale. - The streaming-vs-batch build policy: how the graph is materialized at compaction time and (optionally) maintained incrementally.
- The cross-shard routing layer: a "router GraphIndex" over shards, resolving spec/0012 §6.2's scatter-gather recall floor and OQ-51.
- The compression interop: how
dreamdb.vamana-cosinecomposes withdreamdb.qinco-cosine(spec/0010) to ship a "graph over compressed vectors" production deployment.
What stays defined elsewhere:
- SpatialIndex Object format / registry — spec/0004 §3.
- Bucket-internal vector compression — spec/0010.
- Federation Manifest format — spec/0012 §3.
- HTTP backend contract — spec/0005.
What this document does NOT define:
- Learned graph algorithms (NGT/PANNG, learned-edge-prune networks). Defer to v0.X+1 after Vamana ships.
- Hybrid graph+partition (HNSW-style hierarchical graphs). Vamana is a single-layer R-regular graph; hierarchy is a future addition.
- Per-query graph mutation (online graph update as a side-effect of queries). DreamDB's content layer is strictly immutable; graph updates produce new GraphPage Objects (§5).
2. Graph-SpatialIndex contract (extends spec/0004 §2)
A graph-based SpatialIndex differs from a partition-based one in that it does NOT produce a deterministic spatial key. Instead it produces a search trajectory: a sequence of (node-id, partial-distance) tuples that the SDK consumes as it walks the graph.
The contract:
- Determinism (§2.1 of spec/0004): for the same query, the same
GraphIndexObject, the same fetched GraphPage bytes, two implementations MUST yield the same trajectory in the same order. - Distributability (§2.2 of spec/0004): the SDK MUST be able to reproduce the search from the GraphIndex Object + the GraphPage Objects alone.
- Locality: greedy local steps MUST converge toward the global nearest neighbors. Vamana's α-pruning property gives a provable convergence bound (Subramanya et al. 2019, Thm 1); we restate the practical implication in §4.
Partition-based and graph-based algorithms register under the same algorithm registry (spec/0004 §3.4). A modality declares one or the other in its registry entry; the SDK dispatches accordingly.
3. The GraphIndex Object
Address path:
(New top-level namespace, parallel to spatial-index/, scalar-index/, vector-compressor/, federation-manifests/.)
3.1 CBOR encoding
3.2 Reference from the Manifest registry
A modality using graph-based ANN declares it in the registry, parallel to spec/0004 §3.3:
graph_index and spatial_index (spec/0004) are mutually exclusive in a single registry entry. A modality is partition-based OR graph-based, not both.
3.3 Modality-string parameterization
The .graph token replaces .bucketed.spatial-bits=<N>. R is the graph's max out-degree. Compression is optional and composes per spec/0010.
Example:
embedding.f32.dim=768.graph.R=64.compress=qinco:M=8— DiskANN-style: Vamana graph over QINCo-compressed vectors.
3.4 Registry-vs-GraphIndex consistency (mandatory validation)
The SDK MUST validate at load time, parallel to spec/0004 §3.3.1 and spec/0010 §3.3:
- Algorithm match: GraphIndex Object's
algorithmequals registryalgorithm. - Dimensionality / R consistency: GraphIndex Object's
dimandgraph_layout.Rmatch modality parameters. - Vector layout: if
vector_compressoris set in registry, GraphIndex'svector_layout.compressor_hashMUST equal it. - Entry-point validity:
entry_point < node_count.
Mismatch ⇒ reject Manifest as malformed (ManifestCorrupted).
4. dreamdb.vamana-cosine
The flagship graph algorithm. Vamana (Subramanya et al. 2019) is a directed R-regular graph with α-pruned edge selection: starting from a random graph, two passes refine edges so that greedy search converges quickly.
4.1 Params (CBOR)
4.2 GraphPage Object format
Each GraphPage holds page_node_count nodes. Layout:
Variable-size records (adj_count may vary per node) preclude pure positional indexing. The header's node_count_in_page plus a per-page offset table (appended after the last node) MAY be added in a follow-up if measurement shows it's needed; v0.X reads pages whole.
Address path:
A node-id maps to (page-index, in-page-offset). The SDK consults the GraphIndex Object's graph_layout.page_node_count to compute the page-index for any node-id; the in-page-offset is found by scanning the page header forward (cheap — pages are typically 64 KiB).
4.3 Build algorithm (compaction-time)
Vamana is built during a special compaction phase (per spec/0006 §7.3's compaction verb). The build is deterministic given:
- The set of input vectors (their content-addressed identity).
- The GraphIndex Object's
build_seed,L_build,alpha,build_passes.
Procedure:
- Random init: every node has R out-edges to randomly chosen other nodes. The randomness is ChaCha20-keyed off
build_seed(same primitive as spec/0004 §5.3). - Pass 1 (alpha=1.0): for each node v in deterministic order (sorted by node-id):
a. Greedy-search from
entry_point(or current pass's entry point) toward v with search-list sizeL_build. b. The visited setV_vbecomes the candidate edge set. c. α-prune: starting from the nearest candidate, add edges greedily; for each candidate c, accept iff no already-accepted edge c' satisfiesdist(c, c') × α < dist(v, c). (At α=1.0 this is standard nearest-neighbor pruning.) d. Cap at R out-edges; if fewer than R accepted, fill with the closest unrejected candidates. - Pass 2 (alpha=user-supplied, default 1.2): repeat the pass with the user-supplied α. The diversity-boosting effect of α > 1 gives better recall at high
L_search. - Materialize GraphPages: assign node-ids to pages in node-id order; emit each GraphPage Object.
Determinism notes:
entry_pointMUST be deterministic. The convention: pick the medoid of the first 10,000 vectors (or all vectors if fewer); ties broken by smallest node-id. Approximate medoid (cheap to compute) suffices; full medoid is too expensive at 10B.- All inner-loop floating-point operations follow spec/0004 §5.4.1 discipline (scalar reference path, no FMA, no -ffast-math).
- The build is single-threaded by spec. Parallel build is permitted ONLY if it produces bit-identical results (typically requires careful work-stealing with deterministic seed-propagation). Implementations that can't guarantee bit-identical parallel build MUST fall back to single-thread for conformance test vectors.
4.4 Search algorithm
At query time, given query vector q and target k:
Per spec/0010 interop: when the GraphIndex references a vector_compressor, dist(q, vector_for(adj_node)) is computed via the ADC lookup table built once at query start.
Latency profile (1B-scale, R=64, compressed with QINCo M=8, L_search=100):
- ~150–300 hops per query (Vamana on 1B-scale; varies with α).
- Each hop fetches one GraphPage (~64 KiB) — most are warm-cache after the first ~50 hops cover the relevant graph region.
- Cold-start: ~50 cold GETs × ~50 ms over WAN = unacceptably slow.
- Warm-cache: ~250 hops × ~50 µs of local memory access + ~50 µs ADC scoring = ~25 ms per query.
The cold-start latency is why a hot in-memory cache of the entry-point region is mandatory for production deployments. The SDK is RECOMMENDED to pre-fetch and pin the GraphPages within K hops of entry_point at session start (K=3 typical).
4.5 Storage cost
For 1B nodes, R=64, dim=768 with QINCo M=8 compression:
- Per node: 8 (anchor) + 8 (compressed vector) + 4 (adj_count) + 64×8 (adj_list) = 532 bytes.
- Per page (256 nodes): 256 × 532 = ~133 KB raw, ~64 KiB after typical compression (zstd at REST is operator-layer).
- Total storage: 1B × 532 B = 532 GB.
Compared to spec/0010 §10's QINCo IVF: 8 GB for the same dataset. Graph-based storage is ~66× larger. The trade is paid for in recall: Vamana achieves ~0.99 recall@10 at L_search=100, vs IVF-QINCo at ~0.97 with re-rank.
For 10B-scale where recall floor matters more than storage absolute, graph wins. For <1B-scale or storage-constrained deployments, IVF+QINCo is still the right answer.
5. Incremental updates (deferred; design sketch)
DreamDB's content layer is strictly immutable. A streaming-update FreshDiskANN-style mechanism would produce new GraphPage Objects per insertion, rewriting affected pages. This is expensive (~kB-MB write amplification per insertion) but tractable.
v0.X DEFERS streaming-graph updates. The recommended pattern:
- Continuous ingest goes to a partition-indexed Track (spec/0004 IVF or IMI, possibly with spec/0010 compression).
- Periodic compaction (operator-scheduled, e.g. nightly or per-million-items) builds a graph-indexed Layer Track (spec/0008 Layer pattern) over the same underlying VS Object.
- Queries that need the graph's recall floor target the graph-Layer Track; queries that need fresh data target the partition Track. The application chooses per use case.
Streaming graph update (dreamdb.fresh-vamana-cosine?) is a v0.X+1 follow-up.
6. Cross-shard routing (resolves spec/0012 OQ-51)
The scatter-gather model in spec/0012 §6.2 incurs O(N_shards × per-shard-query) work. For vector queries at extreme scale (10B+ across 100+ shards), this dominates. A graph-based router resolves it.
6.1 The router GraphIndex
A "router" GraphIndex is a normal GraphIndex (per §3) whose shard_routing field is non-null:
The router graph's nodes are per-shard centroids (or representative samples) — typically thousands of nodes covering hundreds of shards. The Federation Manifest references the router GraphIndex via a new optional field:
6.2 Router-driven query path
This reduces fan-out from N to K_router, typically 10–30× cost reduction at 100-shard federations. The router is built from per-shard summaries (centroids or coreset samples) at federation-publish time.
6.3 Router build and refresh
The router is built by the federation operator's coordinator backend. Inputs: each shard's centroid or coreset, fetched once per federation refresh cycle. Build is the standard §4.3 algorithm at much smaller scale (typical: 10³–10⁴ nodes).
Refresh cadence is operator-defined; recommended every ~6 hours or when shard set changes by >1%. The router GraphIndex is content-addressed like any other; a federation publish that changes the router emits a new Federation Manifest with the new router hash.
7. Determinism conformance
The combination of f32 left-fold + scalar-reference (spec/0004 §5.4.1) + neural-MLP determinism (spec/0010 §7.3.1) covers most of what Vamana needs. Additional graph-specific requirements:
- Build node order: ascending node-id. No exceptions; even parallel implementations process completion in this order.
- Tiebreak in greedy search: when two candidates have identical distance, the smaller node-id wins. Ties at scale are rare but must be deterministic.
- Random tiebreak in α-pruning: when α-prune evaluation has identical results for two candidates, the smaller node-id wins. (Not a random tiebreak — deterministic.)
The conformance suite (spec/0009 §5.X) ships a small reference graph (dim=64, 10K nodes, R=8) plus build and search test vectors.
8. Storage and latency at 10B-scale
Worked example. 10 backends, 1B vectors each, dim=768, QINCo M=8 + Vamana R=64.
- Per-backend graph storage: 532 GB (per §4.5).
- Total federation storage: 5.32 TB.
- Cross-shard router: 10⁴ nodes × ~500 B = ~5 MB. Negligible.
Latency:
- Cold-start at scale: ~30 ms for router-graph greedy (10⁴ nodes, all warm if router is pinned) + ~30 ms × K_router shards scatter (parallel) + ~30 ms gather/rerank = ~90 ms p50.
- Warm-cache: ~25 ms total — the router fan-out reduction means we only contact 4–8 shards instead of 10.
Compared to spec/0012 §6.4 scatter-gather without router: ~40–100 ms p50 single-backend × N=10 = bounded by slowest shard, but p99 grows linearly with N. The router reduces effective N to K_router, dropping p99 proportionally.
9. Out of scope
- HNSW (hierarchical small-world graphs). Vamana is single-layer; HNSW's multi-layer hierarchy is a different storage/latency tradeoff. Defer to v0.X+1.
- Learned-edge graphs (NSG/SSG with learned pruning policies). Vamana's α-pruning is closed-form; learned variants require a training pipeline. Defer.
- Graph-aware deletion. Vamana under v0.X is append-and-rebuild. Tombstone-based deletion is theoretically tractable; defer.
- Concurrent build. v0.X is single-thread for conformance. A deterministic-parallel build is an open research question; defer.
10. Open questions
- OQ-53 (→ this spec): GraphPage offset tables. The current variable-size record format requires scanning to find a specific node within a page. Add a per-page positional index? Defer until benchmarks show scan cost dominates.
- OQ-54 (→
0016): Streaming graph updates. Vamana doesn't natively support insertion; FreshDiskANN does. Resolved:0016§4 shipsdreamdb.fresh-vamana-cosineas the streaming variant — append + background consolidation + tombstone deletes. Mid-iteration learned-graph approaches remain a v0.X+1 research item. - OQ-55 (→ spec/0009): Conformance test vectors for
dreamdb.vamana-cosine: a small reference graph + build/search vectors. Block v0.X release on this. - OQ-56 (→ spec/0010): When using graph + compressor, should the re-rank step (spec/0010 §5) re-rank by fetching VS Object records, or by re-running graph search on raw f32 only over the top-K candidates? Probably the former; design defers to first implementation.
Next: spec/0014 — streaming/CMAF extensions for video at scale (or revisit chunking spec).