DreamDB v0.2.0 bec026

Spec 0016 — Streaming Updates and Real-Time Freshness

Status: Draft (Phase 4 design).

Depends on: spec/0001, spec/0004, spec/0006, spec/0008, spec/0010, spec/0013.

Motivation: DreamDB's immutable, paged data plane is well suited to archival and large-batch ingest, but the steady-state "ingest now, query in <1 second" workload exposes two structural gaps:

  (1) Every write must produce a new Manifest, which costs a per-publish HTTP round-trip (typically 30–100 ms on commodity backends), so many small writes per second are wasteful.
  (2) spec/0013 explicitly defers FreshDiskANN (incremental graph updates), so continuous ingest past ~100M items forces a full graph rebuild, which is impractical.

Production retrieval systems typically solve this with a hot delta tier in front of the cold base tier, plus a streaming graph-update algorithm, plus a drift-monitoring signal so operators know when to re-train. spec/0016 adds those three primitives.


1. Purpose

DreamDB's current "write a Manifest per batch, re-resolve from a ref" cycle is semantically correct but too slow when write rates are high and queries demand fresh results. The fix is a per-Track HotShard Object that holds recent appends in a compact form, plus protocol-level signals for index health (when to trigger re-training) and an incremental-update algorithm for graph indexes.

By the end of this document the following are concrete:

  • The HotShard Object: an ObjectKind that buffers recent Items for a Track without a full Manifest publish per Item.
  • The bounded-staleness contract: how readers control the freshness vs. latency tradeoff.
  • The dreamdb.fresh-vamana-cosine algorithm: streaming Vamana with append + background consolidation (FreshDiskANN, Singh et al. 2021).
  • The index-health signal: per-modality registry field declaring estimated recall, drift metric, last-train timestamp. Operators / SDKs use it to schedule re-training.
  • The Ada-IVF style background maintenance: when drift exceeds threshold, the SDK / operator triggers re-training; the new SpatialIndex/VectorCompressor Object lands via a Layer Manifest.

What stays defined elsewhere:

  • Per-modality index byte format — spec/0007.
  • VectorCompressor codebook publication — spec/0010.
  • Graph Index / Page format — spec/0013.

What this document does NOT define:

  • Strongly-consistent live reads. HotShard provides bounded-staleness, NOT linearizable freshness. Linearizable reads require cross-actor consensus, which spec/0008 explicitly does not provide.
  • Manifest publishing rate limits. That's a connector-level concern; spec/0006 already covers it.
  • Hot-tier durability guarantees beyond "fsync before commit." Backend-specific.

2. The HotShard Object

A HotShard is a per-Track buffer of recent Items, content-addressed like everything else.

2.1 Address path

<timeline>/<modality>/hot-shard/<hotshard-hash>

(New per-Timeline slot, parallel to track/.)

2.2 CBOR encoding

{
  "version":         1,
  "parent_track":    <multihash>,                  ;; the immutable Track this overlay extends
  "published_at":    <u64>,                        ;; Unix ns at publish time; required for §3.3 staleness check
  "items": [                                       ;; ordered by time_anchor
    {
      "anchor":      <u64>,
      "payload":     <bytes | sub-Object>,         ;; modality-dependent
    },
    …
  ],
  "spatial_keys": [                                ;; OPTIONAL; present iff modality is bucketed
    { "anchor": <u64>, "key": <bytes> },           ;; mirrors Bucket Object record format
    …
  ],
  "stats": {
    "item_count":         <unsigned int>,
    "earliest_anchor":    <u64>,
    "latest_anchor":      <u64>,
    "flush_threshold":    <unsigned int>,          ;; max buffered items before forced flush (§2.4)
    "ttl_seconds":        <unsigned int>,          ;; max age before forced flush
  },
}
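
As an illustration only, the map above can be assembled as a plain Python dict before serialization. The helper name build_hotshard, its defaults (taken from §2.5 and §3.1), and the rejection of empty item lists are assumptions of this sketch, not part of the wire format; CBOR serialization itself is left out.

```python
from typing import Any, Dict, List, Tuple


def build_hotshard(parent_track: bytes,
                   items: List[Tuple[int, bytes]],
                   published_at_ns: int,
                   flush_threshold: int = 10_000,
                   ttl_seconds: int = 30) -> Dict[str, Any]:
    """Assemble the HotShard map as a dict (hypothetical helper, no CBOR step)."""
    if not items:
        # Assumption: an empty HotShard has no earliest/latest anchor to record.
        raise ValueError("HotShard needs at least one item")
    ordered = sorted(items, key=lambda it: it[0])  # items are ordered by anchor
    return {
        "version": 1,
        "parent_track": parent_track,
        "published_at": published_at_ns,
        "items": [{"anchor": a, "payload": p} for a, p in ordered],
        "stats": {
            "item_count": len(ordered),
            "earliest_anchor": ordered[0][0],
            "latest_anchor": ordered[-1][0],
            "flush_threshold": flush_threshold,
            "ttl_seconds": ttl_seconds,
        },
    }
```

The stats block is derived from the items rather than supplied by the caller, so item_count and the anchor bounds cannot disagree with the payload.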

2.3 Registry reference

The Manifest registry's per-modality entry gains an optional hot_shard field:

"registry": {
  "embedding.f32.dim=768.bucketed.spatial-bits=18": {
    "kind":          "continuous",
    "object_kind":   "spatial-bucket",
    "algorithm":     "dreamdb.imi-cosine",
    "spatial_index": [<hash>],
    "track":         <multihash-Track-Object>,
    "hot_shard":     <multihash-HotShard | null>,   ;; NEW
  }
}

Absent or null hot_shard ⇒ Track behaves identically to v0 (cold-only). Present and non-null ⇒ readers MUST consult both Track and HotShard during query resolution.

2.4 Append path (writer)

A writer appending small/fast batches:

1. Build the new Item(s).
2. Fetch the current HotShard (single GET; cached).
3. Append the items in-memory.
4. If (new_size > flush_threshold) OR (age > ttl_seconds):
     a. Build a full Track Object incorporating the buffered items.
     b. Publish a Manifest with the new Track Object + HotShard = null.
   Else:
     a. Encode and PUT the new HotShard Object.
     b. Publish a Manifest pointing at the new HotShard (Track Object unchanged).
5. Done.

Steps 1–3 plus the Else branch of step 4 are the hot path: no Track rewrite, no spatial index reorganization, just one HTTP PUT of the (small) HotShard and one ref CAS. Latency: ~50–100 ms end-to-end vs ~500 ms for a full re-publish.
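
The branch in step 4 can be sketched as below. This is a minimal in-memory model, not a connector implementation: HotShardState, plan_append, and the opened_at_ns field are hypothetical names, and "age" is assumed here to be measured from the arrival of the first buffered item, which the steps above do not pin down.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class HotShardState:
    """In-memory view of a HotShard (hypothetical; names mirror §2.2 where possible)."""
    items: List[Tuple[int, bytes]] = field(default_factory=list)  # (anchor, payload)
    flush_threshold: int = 10_000  # max buffered items
    ttl_seconds: int = 30          # max age before forced flush
    opened_at_ns: int = 0          # assumption: time the first buffered item arrived


def plan_append(shard: HotShardState,
                new_items: List[Tuple[int, bytes]],
                now_ns: int) -> str:
    """Buffer new_items (step 3) and report which branch of step 4 applies."""
    if not shard.items:
        shard.opened_at_ns = now_ns
    shard.items.extend(new_items)
    shard.items.sort(key=lambda it: it[0])  # keep items ordered by anchor
    age_seconds = (now_ns - shard.opened_at_ns) / 1e9
    if len(shard.items) > shard.flush_threshold or age_seconds > shard.ttl_seconds:
        return "flush"       # rewrite Track, publish Manifest with hot_shard = null
    return "hot-append"      # PUT new HotShard, publish Manifest pointing at it
```

A caller would perform the actual PUT and ref CAS based on the returned branch; those network steps are deliberately outside this sketch.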

2.5 Read path

A reader resolving a query for a Track with a HotShard:

1. Resolve Manifest → Track Object + HotShard Object hashes.
2. Fetch both in parallel.
3. Track Object → identify cold candidate buckets / fragments.
4. HotShard → in-memory filter by query predicate; produce hot candidates.
5. Merge cold + hot candidates; apply standard ranking (per-modality semantics).
6. Return merged top-K.

HotShard items live in CBOR, not in the modality's Spatial Bucket format — they bypass the spatial index. For vector ANN queries, the reader brute-forces the query against the HotShard's vectors (small N, typically <10K, brute-force is fast). For text or scalar queries, the reader matches the predicate directly.

The cost is bounded by the HotShard's flush_threshold parameter. Default: 10K items or 100 MiB of buffered bytes — large enough to absorb a minute of high-rate ingest, small enough to brute-force-scan in <10 ms.
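
Steps 4–6 for a vector query might look like the following sketch. It assumes cold candidates arrive already scored by the spatial index, that anchors identify items uniquely (so a duplicate seen in both tiers is collapsed to its best score), and that higher cosine similarity ranks first; merge_top_k and cosine are illustrative names.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def merge_top_k(query: Vector,
                cold_candidates: List[Tuple[int, float]],
                hot_items: Dict[int, Vector],
                k: int) -> List[Tuple[int, float]]:
    """Brute-force score the HotShard, merge with cold candidates, return top-k."""
    hot = [(anchor, cosine(query, vec)) for anchor, vec in hot_items.items()]
    best: Dict[int, float] = {}
    for anchor, score in list(cold_candidates) + hot:
        # Assumption: same anchor means same item; keep its best score.
        if anchor not in best or score > best[anchor]:
            best[anchor] = score
    return heapq.nlargest(k, best.items(), key=lambda pair: pair[1])
```

The hot scan is linear in the HotShard size, which is why flush_threshold bounds the extra query cost.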

2.6 Forward-compat / backwards-compat

  • A v0 reader (pre-0016) seeing a Manifest registry entry with an unknown hot_shard field ignores it per spec/0002 §3.1.3's map-extensibility rule (unknown-key tolerance). The reader processes the Track Object normally and loses freshness (recent appends not yet flushed into the Track will be invisible) but stays correct on what it does see. Recommended: SDK logs a "freshness extension not supported" warning so operators notice.
  • A v0.X reader (post-0016) seeing a Manifest without hot_shard reads exactly the v0 path — byte-identical hashes.

This is the same forward-compat story as spec/0010 / spec/0014: optional registry fields preserve existing Track hashes via the map-extensibility hatch.

3. The bounded-staleness contract

HotShard freshness has a producer-controlled upper bound and a consumer-controlled tolerance:

3.1 Producer side

The producer commits to a ttl_seconds (a HotShard stats field); after this elapses, the HotShard is force-flushed even if its size is below flush_threshold. Default: 30 seconds.

Smaller TTL ⇒ stronger freshness, more publish overhead. Larger TTL ⇒ weaker freshness, less overhead.

3.2 Consumer side

The consumer specifies a max_staleness_seconds in the HybridQuery (per spec/0015) or in a Query verb option:

  • If (now - hotshard.published_at) ≤ max_staleness_seconds, with published_at (Unix ns, §2.2) converted to seconds: use the cached HotShard.
  • Else: re-resolve the ref, refresh the HotShard, then query.

This puts the freshness/cost tradeoff in the consumer's hands. A "must be current" query pays the round-trip; a "best effort" query takes the cached HotShard.
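
A minimal sketch of the consumer-side check, assuming the reader has the cached HotShard's published_at (Unix ns) at hand. The function name needs_refresh and the clamping of negative ages (a producer clock slightly ahead of the consumer's) are assumptions of this example.

```python
NS_PER_SECOND = 1_000_000_000


def needs_refresh(published_at_ns: int, now_ns: int,
                  max_staleness_seconds: float) -> bool:
    """True if the cached HotShard is older than the consumer's tolerance."""
    # Assumption: treat a published_at in the future as age zero.
    age_ns = max(0, now_ns - published_at_ns)
    return age_ns > max_staleness_seconds * NS_PER_SECOND
```

For example, a query with max_staleness_seconds = 10 reuses a HotShard published 5 seconds ago and re-resolves the ref for one published 11 seconds ago.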

3.3 Latest-publish-at metadata

To support the consumer-side check, each HotShard Object carries a published_at: u64 (Unix ns) — set by the producer at publish time. This timestamp is non-authoritative for correctness (DreamDB content is time-anchored; freshness is a separate axis) but is the load-bearing signal for cache TTL.

The producer's published_at MUST be monotonically non-decreasing across HotShard publishes for the same Track; otherwise consumers' freshness checks can misorder successive HotShards. This follows the monotonic-timestamp discipline of spec/0008.
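
One way a producer could keep published_at non-decreasing when its wall clock steps backwards is to clamp against the previously published value. This is a sketch under that assumption; next_published_at is a hypothetical helper, and spec/0008 may prescribe a different mechanism.

```python
import time
from typing import Optional


def next_published_at(previous_ns: Optional[int],
                      wall_clock_ns: Optional[int] = None) -> int:
    """Return a timestamp that never goes below the previous publish."""
    now = time.time_ns() if wall_clock_ns is None else wall_clock_ns
    if previous_ns is None:
        return now                 # first publish for this Track
    return max(previous_ns, now)   # clamp if the clock moved backwards
```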

4. Streaming Vamana — dreamdb.fresh-vamana-cosine

The spec/0013 §5 placeholder is filled. FreshDiskANN (Singh et al. 2021) defines an incremental Vamana that supports both append and delete without full rebuild.

4.1 Inheritance from spec/0013

Same GraphIndex / GraphPage / search algorithm. Different build/maintenance loop.

4.2 Append semantics

For each new item v:

  1. Run greedy search from the entry point toward v (per spec/0013 §4.4) to locate v's approximate nearest neighbors.
  2. The visited set becomes v's adjacency candidate set.
  3. α-prune to R out-edges (spec/0013 §4.3.3).
  4. For each accepted neighbor u of v: append v to u's adjacency list, then α-prune u's adjacency if it exceeds R.
  5. Emit new GraphPage Objects for v and for any u that was modified.
  6. Publish a new GraphIndex Object with updated node_count and entry_point (the latter may shift slightly). Per spec/0013 §3.1, GraphIndex is immutable; "updating fields" means emitting a fresh content-addressed Object at a new hash, not mutating bytes in place.

The new GraphIndex hash differs from the old; both remain content-addressed and immutable. The OLD Track Object remains valid for queries; the NEW Track Object becomes the latest after Manifest publish.
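
Steps 1–4 can be sketched on an in-memory adjacency map as follows. This is an illustrative rendering of the FreshDiskANN-style insert, not the spec/0013 implementation: the function names, the dict-based graph, the beam parameter, and the default values are assumptions, vectors are assumed non-zero, and GraphPage emission (step 5) is reduced to returning the set of touched node ids.

```python
import math
from typing import Dict, Iterable, List, Sequence, Set

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)  # assumes non-zero vectors


def greedy_search(vectors: Dict[int, Vector], graph: Dict[int, List[int]],
                  entry: int, query: Vector, beam: int) -> Set[int]:
    """Walk from entry toward query; return the set of expanded nodes."""
    visited: Set[int] = set()
    frontier: Set[int] = {entry}
    while True:
        pending = [n for n in frontier if n not in visited]
        if not pending:
            return visited
        nearest = min(pending, key=lambda n: cosine_distance(vectors[n], query))
        visited.add(nearest)
        frontier.update(graph[nearest])
        # Keep only the `beam` closest frontier nodes.
        frontier = set(sorted(
            frontier, key=lambda n: cosine_distance(vectors[n], query))[:beam])


def robust_prune(vectors: Dict[int, Vector], node: int,
                 candidates: Iterable[int], alpha: float,
                 max_degree: int) -> List[int]:
    """Alpha-prune candidates down to at most max_degree out-edges."""
    pool = sorted((c for c in set(candidates) if c != node),
                  key=lambda c: cosine_distance(vectors[node], vectors[c]))
    kept: List[int] = []
    while pool and len(kept) < max_degree:
        best = pool.pop(0)
        kept.append(best)
        # Drop candidates that `best` already covers within factor alpha.
        pool = [c for c in pool
                if alpha * cosine_distance(vectors[best], vectors[c])
                > cosine_distance(vectors[node], vectors[c])]
    return kept


def insert(vectors: Dict[int, Vector], graph: Dict[int, List[int]], entry: int,
           new_id: int, vec: Vector, alpha: float = 1.2,
           max_degree: int = 64, beam: int = 75) -> Set[int]:
    """Insert vec; return ids whose adjacency changed (pages to re-emit)."""
    visited = greedy_search(vectors, graph, entry, vec, beam)
    vectors[new_id] = vec
    graph[new_id] = robust_prune(vectors, new_id, visited, alpha, max_degree)
    touched = {new_id}
    for u in graph[new_id]:
        candidates = graph[u] + [new_id]
        if len(candidates) > max_degree:
            candidates = robust_prune(vectors, u, candidates, alpha, max_degree)
        graph[u] = candidates
        touched.add(u)
    return touched
```

The returned set is what step 5 would turn into fresh GraphPage Objects; everything outside it keeps its existing content hash.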

4.3 Background consolidation

Append-only updates degrade graph quality over time (entry point drifts, alpha-pruning becomes sub-optimal). Periodic consolidation rebuilds the affected subgraph:

  • Triggered by SDK or operator schedule (default: every 1M appends or every 24 hours).
  • Selects regions of the graph touched by recent appends.
  • Re-runs the standard spec/0013 build algorithm over those regions.
  • Emits new GraphPage Objects + a new GraphIndex; publishes via Manifest.

Consolidation is non-blocking for the query path: readers continue to use the old GraphIndex while consolidation runs. Readers switch only after the consolidation Manifest is published (via ref freshness).

4.4 Delete semantics (tombstone-based)

Although DreamDB's data plane is immutable, logical deletes of indexed items are supported via the existing Layer mechanism (spec/0008): a "tombstone Track" carries the deleted doc-ids; query results filter against it. The graph itself is NOT modified — deleted nodes remain in adjacency lists but are filtered out of result sets.

Tombstones accumulate until the next consolidation rewrites the graph without them. Same lifecycle as IVF/IMI deletes (which DreamDB didn't previously formalize; this spec applies to them too).
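
Result filtering against a tombstone set can be sketched as below. The over-fetch loop (doubling the fetch size until k live results are found) is an assumption of this example, added so that filtering does not return fewer than k results while live items remain; the search callable and max_fetch are hypothetical.

```python
from typing import Callable, List, Set, Tuple


def search_with_tombstones(search: Callable[[int], List[Tuple[int, float]]],
                           tombstones: Set[int], k: int,
                           max_fetch: int = 10_000) -> List[Tuple[int, float]]:
    """search(n) returns the n best (doc_id, distance) pairs, best first."""
    fetch = k
    while True:
        raw = search(fetch)
        live = [(doc, dist) for doc, dist in raw if doc not in tombstones]
        exhausted = len(raw) < fetch       # index has no more candidates
        if len(live) >= k or exhausted or fetch >= max_fetch:
            return live[:k]
        fetch = min(max_fetch, fetch * 2)  # over-fetch to replace filtered hits
```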

4.5 Params (CBOR)

Same as spec/0013 §4.1 with two additions:

{
  …,
  "consolidation_threshold_appends":  <unsigned int>,   ;; default 1_000_000
  "consolidation_threshold_seconds":  <unsigned int>,   ;; default 86400
}
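
A scheduler could evaluate the two thresholds as in this sketch. It assumes consolidation is due when either threshold is reached, matching the "every 1M appends or every 24 hours" default in §4.3; the function and argument names are illustrative.

```python
def consolidation_due(appends_since_last: int,
                      seconds_since_last: float,
                      threshold_appends: int = 1_000_000,
                      threshold_seconds: int = 86_400) -> bool:
    """True once either consolidation threshold has been reached."""
    return (appends_since_last >= threshold_appends
            or seconds_since_last >= threshold_seconds)
```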

4.6 Modality string

embedding.f32.dim=768.graph.R=64.fresh

The .fresh suffix is the marker. Without it, the modality is non-streaming (rebuild-only) Vamana per spec/0013.
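
A reader could detect the marker with a simple tokenizer like the one below. The grammar assumed here (dot-separated tokens, key=value pairs for parameters, no dots inside values) is inferred from the example string only and is not a normative definition of modality strings.

```python
from typing import Dict, List, Tuple


def parse_modality(modality: str) -> Tuple[List[str], Dict[str, str]]:
    """Split a modality string into bare flags and key=value parameters."""
    flags: List[str] = []
    params: Dict[str, str] = {}
    for token in modality.split("."):
        if "=" in token:
            key, value = token.split("=", 1)
            params[key] = value
        else:
            flags.append(token)
    return flags, params


def is_streaming_vamana(modality: str) -> bool:
    """True for graph modalities whose final token is the .fresh marker."""
    tokens = modality.split(".")
    return "graph" in tokens and tokens[-1] == "fresh"
```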

5. Index health signal

Drift detection is the missing operational signal. Without it, recall degrades silently as the data distribution moves past the trained centroids/codebooks.

5.1 Manifest registry extension

Per-modality registry entry gains an OPTIONAL index_health sub-Object:

"registry": {
  "embedding.f32.dim=768.bucketed.spatial-bits=18": {
    …,
    "index_health": {
      "last_trained_at":      <u64>,                  ;; Unix ns when SpatialIndex was trained
      "last_trained_doc_count": <u64>,                ;; corpus size at training time
      "current_doc_count":    <u64>,                  ;; latest-known corpus size
      "estimated_recall_at_10": <f32-as-bytes>,       ;; SDK or operator measurement
      "drift_metric":         <f32-as-bytes>,         ;; centroid-shift L2; 0 = no drift
      "recommended_action":   "none" | "schedule-retrain" | "retrain-now" | "rebuild-graph",
    }
  }
}

5.2 Operator policy

  • recommended_action = "schedule-retrain": drift > 5% from training distribution. Schedule re-training next maintenance window.
  • recommended_action = "retrain-now": drift > 15% OR recall < 0.85. Immediate re-training advised.
  • recommended_action = "rebuild-graph": graph-based index whose consolidation backlog exceeds the per-modality consolidation_threshold_appends. Trigger consolidation.

These are non-binding hints (operators choose the response), but an SDK MAY surface them in logs / metrics so operators don't miss drift.
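
The thresholds above can be expressed as a small decision function. The precedence between actions (retrain-now first, then rebuild-graph, then schedule-retrain) is an assumption of this sketch, since the policy does not state which hint wins when several conditions hold; all names are illustrative.

```python
def recommended_action(drift: float, recall_at_10: float,
                       is_graph_index: bool = False,
                       consolidation_backlog: int = 0,
                       consolidation_threshold_appends: int = 1_000_000) -> str:
    """Map index-health measurements to a recommended_action hint."""
    if drift > 0.15 or recall_at_10 < 0.85:
        return "retrain-now"
    if is_graph_index and consolidation_backlog > consolidation_threshold_appends:
        return "rebuild-graph"
    if drift > 0.05:
        return "schedule-retrain"
    return "none"
```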

5.3 How drift_metric is computed

For IVF / IMI: average L2 distance from each item's vector to its assigned centroid, vs the same metric at training time. A 10% increase in this metric ⇒ drift_metric = 0.10.
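
As a worked illustration of the IVF / IMI definition, the sketch below computes the mean L2 distance to the assigned centroid and reports its relative increase over the training-time value. It assumes "assigned centroid" means the nearest centroid and that the training-time mean was recorded when the index was built; both function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def mean_assignment_distance(vectors: Sequence[Vector],
                             centroids: Sequence[Vector]) -> float:
    """Average L2 distance from each vector to its nearest centroid."""
    total = 0.0
    for v in vectors:
        total += min(math.dist(v, c) for c in centroids)
    return total / len(vectors)


def drift_metric(current_mean: float, training_mean: float) -> float:
    """Relative increase over training time; 0.10 means a 10% increase."""
    return current_mean / training_mean - 1.0
```

With centroids at (0, 0) and (10, 0), a training sample at distance 1.0 from its centroids and a current sample at distance 1.1 gives a drift_metric of about 0.10.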

For LSH: ratio of populated cells to expected cells given uniform-on-sphere assumption. Departure from expected ⇒ drift.

For Vamana: average path length (hop count) for greedy search from entry point to query target, relative to the same average at build time. By the same convention as IVF / IMI, a 50% increase ⇒ drift_metric = 0.50.

SDKs SHOULD compute drift on a sampling basis (1% of queries) and publish to the Manifest registry asynchronously via a Layer Track (per spec/0008). Drift estimation is NOT in the query hot path.

6. Re-training as a Manifest Layer

When operator policy triggers re-training:

  1. The training process produces a new SpatialIndex/VectorCompressor/GraphIndex Object.
  2. The operator publishes a new Layer Track (spec/0008) whose modality registry points at the new index.
  3. The old Track + old SpatialIndex remain reachable from the prior Manifest's parent chain; queries against old Manifests continue to work.
  4. New queries (via the latest ref) use the new index.

Re-training is therefore just another DreamDB publish operation — no special "migration" verb required. The cost is the wall-clock training time + the bytes of the new index. Queries during the training window use the old index (correct but slightly stale).

For full-corpus re-encoding (e.g., the corpus's vectors need to be re-quantized against a new codebook), see spec/0017 for the Reencode verb.

7. Conformance categories (per spec/0009 §8.6.2)

Category | Pass criterion | Coverage
---------|----------------|---------
hotshard.append.flush-threshold.* | Buffered items exceeding threshold trigger Track rewrite | Boundary cases
hotshard.append.ttl.* | TTL expiry forces flush even below threshold | Clock-skew injection
hotshard.read.merge.* | Query merges Track + HotShard candidates correctly | Time / vector / scalar predicates
hotshard.staleness.consumer-tolerance.* | max_staleness_seconds honored; refresh triggered on miss | All staleness levels
fresh-vamana.append-search.* | After N appends, recall@10 stays within 5% of full-rebuild baseline | N ∈ {10K, 100K, 1M}
fresh-vamana.consolidation.recall.* | Post-consolidation recall returns to full-rebuild baseline | After 10× threshold of appends
fresh-vamana.tombstone.delete.* | Tombstoned items absent from query results; remain in graph until consolidation | Mixed insert/delete
index-health.drift-metric.* | Drift metric monotonically tracks distribution shift | Synthetic drift injection
index-health.recommended-action.* | Threshold transitions produce correct recommended_action transitions | All policy levels

8. Latency and cost at scale

8.1 HotShard overhead

Per append (small batch, <100 items, embedding modality):

  • 1 GET of current HotShard (~50 KB warm) + 1 PUT of new HotShard (~50 KB) + 1 ref CAS.
  • ~30–80 ms p50 on commodity backends.

vs full Manifest publish:

  • 1 PUT of new Track Object + 1 PUT of any new Bucket + 1 PUT of new Manifest + 1 ref CAS.
  • ~200–500 ms p50.

~5–10× speedup for high-rate ingest.

8.2 Streaming Vamana cost

Append a vector to a 1B-item graph:

  • ~150 GETs of GraphPage (warm cache after the first ~50) → ~10 ms.
  • α-prune in-memory → <1 ms.
  • ~64 PUTs of updated GraphPage Objects (neighbors of v) → ~30 ms.
  • Total: ~40–60 ms per append. Throughput: ~20 appends/sec/connection.

At higher throughput, operators batch appends (each append's neighbor-updates can be merged) — typical workloads sustain 200+ appends/sec via batching.

8.3 Consolidation cost

1B-item graph; consolidation_threshold_appends = 1M (0.1% of corpus):

  • Affected nodes: ~10× the appends, so ~10M nodes.
  • Re-build cost: O(N × R × L_build) operations; ~30 minutes on a single multi-core machine.
  • New GraphPage emissions: ~10M nodes / 256 per page = ~40K pages = ~2.5 GB.

Consolidation runs as a background job; doesn't impact query path until publish.
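
The page-count and byte figures above can be reproduced with the arithmetic below. The per-page size is an assumption of this sketch (256 nodes × R = 64 neighbors × 4-byte neighbor ids, adjacency only); the actual GraphPage layout is defined in spec/0013 and may carry additional bytes.

```python
import math
from typing import Tuple


def consolidation_footprint(appends: int = 1_000_000,
                            fanout: int = 10,
                            nodes_per_page: int = 256,
                            degree: int = 64,
                            bytes_per_edge: int = 4) -> Tuple[int, int, int]:
    """Return (affected_nodes, pages, total_bytes) for one consolidation."""
    affected_nodes = appends * fanout
    pages = math.ceil(affected_nodes / nodes_per_page)
    # Assumption: a page stores only adjacency lists of fixed-width ids.
    page_bytes = nodes_per_page * degree * bytes_per_edge
    return affected_nodes, pages, pages * page_bytes
```

With the defaults this yields 10,000,000 affected nodes, 39,063 pages of 64 KiB each, and 2,560,032,768 bytes, in line with the ~40K pages and ~2.5 GB quoted above.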

9. Out of scope

  • Linearizable cross-writer freshness. HotShard provides per-writer monotonic freshness; cross-writer requires consensus, which DreamDB defers (spec/0008).
  • Cross-modality drift detection. Per-modality only; correlated drift across multi-modal tracks is application-layer.
  • Adaptive index reconfiguration. "Switch from IVF to IMI as data grows" is a future operator tool; spec doesn't automate it.
  • Online learning of codebooks. QINCo codebooks are fixed at training time; online updates would change the content-hash, breaking immutability. Re-training is via Layer publish (§6).

10. Open questions

  • OQ-67 (→ this spec): HotShard durability — must writes be fsync'd before the writer considers commit "complete"? Standard answer: yes, the backend's PUT ack means durably written. But hot-tier semantics could permit "best-effort durability" for ultra-low-latency cases. Defer until pilot.
  • OQ-68 (→ this spec): Consolidation cadence — operator-scheduled vs automatic? Probably operator-controlled by default with a "recommended_action" hint. Auto-consolidation could be opt-in via a Manifest registry field.
  • OQ-69 (→ this spec): Drift metric definition for SPLADE / ColBERT modalities. Centroid-distance doesn't generalize cleanly. Likely a different per-algorithm metric. Defer until SPLADE/ColBERT implementations land.
  • OQ-70 (→ spec/0009): Conformance vectors for streaming-Vamana correctness — bit-identical incremental builds across implementations. Block v0.X+ on this.

Next: spec/0017 — schema evolution. When the embedding model upgrades, we need to re-index 10B items without losing the old corpus. Bridges the gap from spec/0016 incremental updates to "bulk migration."