Changelog
All notable changes to DreamDB are documented here. Entries track reference implementation status; protocol-spec changes are noted with their spec doc reference. Dates use ISO 8601 format (YYYY-MM-DD).
[Unreleased]
Added — AWS S3 / SigV4 production path (2026-05-22)
- dreamdb-connector-http/src/sigv4.rs: S3Signer wraps the aws-sigv4 crate; produces Authorization + x-amz-date + x-amz-content-sha256 + (optionally) x-amz-security-token headers per request.
- HttpConnectorConfig.signer_from_env() auto-detects AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN / AWS_REGION (falls back to us-east-1). No signer attached → connector sends unsigned requests (MinIO-anonymous dev mode preserved).
- Every connector verb (PUT / GET / HEAD / LIST / DELETE / multi-range GET) now signs when a signer is configured; SigV4 covers all wire headers including per-request If-Match, Range, etc.
- dreamdb-cli/src/main.rs and dreamdb-dataset-python/src/lib.rs factories pick up the env-var signer automatically, so the same code talks to MinIO and S3.
- Verified live against s3://dreamdb-test-20260518/ in us-east-1: create + append + count + snapshot + branch + history + delete all round-trip; 69 Objects landed, 19.8 KB total.
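The detection rule described above can be sketched in a few lines of Python. This is only an illustration of the stated behaviour, not the Rust code in sigv4.rs: the function name detect_signer_env and the returned dict layout are hypothetical, and requiring both halves of the key pair before signing is an assumption.

```python
import os
from typing import Mapping, Optional


def detect_signer_env(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Return signing credentials found in `env`, or None for unsigned mode."""
    access_key = env.get("AWS_ACCESS_KEY_ID")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY")
    # Assumption: both halves of the key pair are required before signing.
    if not access_key or not secret_key:
        return None  # no signer attached: requests go out unsigned
    return {
        "access_key_id": access_key,
        "secret_access_key": secret_key,
        # Optional; when present, x-amz-security-token is sent as well.
        "session_token": env.get("AWS_SESSION_TOKEN"),
        # Region falls back to us-east-1 when AWS_REGION is unset or empty.
        "region": env.get("AWS_REGION") or "us-east-1",
    }
```

Passing an explicit mapping instead of relying on os.environ keeps the rule easy to check in isolation, for example detect_signer_env({}) returns None, which corresponds to the MinIO-anonymous dev mode.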
Added — Python SDK ergonomics (2026-05-22)
- Dataset.iter_stream(batch_size, fields, channel_buffer) (B1.5): real Python generator backed by a tokio mpsc channel. Bounded RAM per batch (verified 104 MB peak RSS on 91K records). Closes the last 10B-scale follow-up.
- Dataset.delete(anchors, reason=None): tombstones anchors via spec/0020. Returns the new Manifest hash.
- Dataset.tombstone_set(): resolved set of suppressed anchors at the current Manifest.
- Dataset.merge(other_ref, strategy="fast-forward"|"union-tracks"): full Python parity with the Rust MergeStrategy enum.
- Dataset.merge_many(branches): N-way sequential union-merge for sharded ingest.
- Dataset.history(max_depth=50): walks the Manifest DAG via parents[0]; returns [{manifest, ts_ns, writer, parents_count, tracks_count}, …]. Mirrors the browser UI's ⏳ history button.
- Dataset.list_refs(): lex-sorted list of every ref under the backend's refs/ prefix.
- Dataset.count() + __len__: record count at the current tip, via iter_stream. First-time-user reflex now works.
- Schema is chainable: (vd.Schema().add_image(...).add_embedding(...).add_scalar_categorical(...)). Wrapped _RustSchema; backward compatible.
- Friendlier HTTP errors: 403 / 404 / 412 / connection-refused / SI-conflict / schema-mismatch errors now carry actionable hints (e.g. "for local MinIO dev: mc anonymous set public local/<bucket>").
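The bounded-memory idea behind iter_stream (a generator fed through a fixed-capacity channel) can be illustrated with the standard library alone. The sketch below is an analogy, not the SDK's implementation: it swaps the tokio mpsc channel for queue.Queue and a thread, and iter_batches is a hypothetical name. Only the batch_size and channel_buffer parameter names are taken from the entry above.

```python
import queue
import threading
from typing import Iterable, Iterator, List

_DONE = object()  # sentinel marking the end of the stream


def iter_batches(
    records: Iterable[int], batch_size: int, channel_buffer: int
) -> Iterator[List[int]]:
    """Yield batches produced by a background thread through a bounded queue."""
    if batch_size < 1 or channel_buffer < 1:
        raise ValueError("batch_size and channel_buffer must be >= 1")
    chan: queue.Queue = queue.Queue(maxsize=channel_buffer)

    def producer() -> None:
        batch: List[int] = []
        for rec in records:
            batch.append(rec)
            if len(batch) == batch_size:
                chan.put(batch)  # blocks while the queue is full (backpressure)
                batch = []
        if batch:
            chan.put(batch)  # final partial batch
        chan.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = chan.get()
        if item is _DONE:
            return
        yield item
```

Because the producer blocks once channel_buffer batches are waiting, roughly channel_buffer + 2 batches exist at any moment regardless of dataset size. A consumer that stops iterating early leaves the producer parked on a full queue; a real implementation would need a cancellation path, which this sketch omits.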
Added — CLI ergonomics (2026-05-22)
- dreamdb query --backend ... --ref-name ... --field <name> --query-file <path> runs top-K vector search from the command line. Operator spot-check verb; reads raw LE-f32 query bytes from a file.
- dreamdb delete --ref-name ... <anchor> [<anchor>...] is the tombstone CLI (paired with the SDK method).
- dreamdb merge-many --ref-name <trunk> <branches>... is the orchestrator CLI for sharded-ingest workflows.
- Fixed: dreamdb snapshot --help no longer shows the GC description (doc-comment block at main.rs:207 was attached to the wrong variant); dreamdb gc --help now has its own help text.
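For the --query-file input, a query vector can be serialised to raw little-endian f32 bytes with Python's struct module. This is a minimal sketch assuming the file is nothing but packed floats with no header or length prefix; the changelog only says "raw LE-f32 query bytes", and the helper names encode_query and decode_query are hypothetical.

```python
import struct
from typing import List, Sequence


def encode_query(vector: Sequence[float]) -> bytes:
    """Pack a vector as consecutive little-endian 32-bit floats."""
    return struct.pack(f"<{len(vector)}f", *vector)


def decode_query(data: bytes) -> List[float]:
    """Inverse of encode_query, handy for checking a file before use."""
    if len(data) % 4 != 0:
        raise ValueError("length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(data) // 4}f", data))


# Usage (the file name is illustrative):
#   from pathlib import Path
#   Path("query.f32").write_bytes(encode_query(my_embedding))
```

The resulting file is 4 bytes per dimension, so its size is a quick check that the vector length matches the embedding field being queried.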
Added — operator manifests (2026-05-22)
- dreamdb-cli/examples/sharded-ingest.yaml (228 lines): 3-Job k8s pattern for N parallel workers + merge-many orchestrator.
- dreamdb-cli/examples/ada-ivf-step-sharded.yaml (246 lines): 4-stage pipeline (centroids → publish-SI → redispatch → finalize) for B3's sharded redispatch at 10B-scale.
- dreamdb-cli/examples/ada-ivf-step.yaml: header updated with "when to use which YAML" pointer.
Added — documentation (2026-05-22)
- README.md rewritten (40 → 165 lines): pitch + 4 first principles + Python sketch + status table + 60-second quickstart + repo layout + spec roadmap with honest gaps. Now reflects the actual shipped state.
- docs/tutorial.md (370 lines): "DreamDB in 10 minutes" end-to-end walkthrough covering schema + ingest + snapshot + query + PyTorch DataLoader + time-travel + ada-ivf-status + delete + sharded ingest. Step 14 documents the S3 migration path.
- INDEX.md: refreshed to 21 specs (was 20); added spec/0020 row; "What's Next" updated to reflect shipped state.
Spec changes (2026-05-18 → 2026-05-22)
- spec/0001 §2.1: Object-kinds table expanded from 10 to 17 entries (added TombstoneList, ScalarIndex, VectorCompressor, ItemManifest, GraphIndex, GraphPage, plus existing ones that were missing).
- spec/0008 §5.3 (new): formalizes the distinction between layered-merge (the original §6.1 approach: both parents' Tracks coexist as separate TrackEntrys) and fused-merge (B2's implementation: one merged TrackObject per modality with cell-by-cell bucket reconciliation). Reference implementation ships fused-merge for SpatialBucket tracks; both are spec-valid.
- spec/0020 §3.1: broken cross-reference "spec/0001 §3.2" corrected to "spec/0002 §7.5" (path table actually lives in spec/0002).
- spec/0020 §3.1: TombstoneEntry anchor field clarified to be u64 (the Item-level TimeAnchor), not Multihash. Single anchor suppresses every record across every modality.
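The clarified tombstone semantics (one u64 anchor suppresses the matching record in every modality) can be pictured with a small filter. This is a toy model under stated assumptions: the dict-of-lists layout, the Record alias, and apply_tombstones are all hypothetical and say nothing about how the reference implementation stores Tracks.

```python
from typing import Dict, List, Set, Tuple

# (u64 TimeAnchor, payload); the str payload is purely illustrative.
Record = Tuple[int, str]


def apply_tombstones(
    tracks: Dict[str, List[Record]], tombstones: Set[int]
) -> Dict[str, List[Record]]:
    """Drop every record whose anchor is tombstoned, in every modality."""
    for anchor in tombstones:
        if not 0 <= anchor < 2**64:
            raise ValueError(f"anchor {anchor} does not fit in u64")
    return {
        modality: [rec for rec in records if rec[0] not in tombstones]
        for modality, records in tracks.items()
    }
```

With tracks for "image" and "embedding" that both contain anchor 2, tombstoning {2} removes the record from both lists in one pass, which is the Item-level behaviour the spec text describes.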
Changed — dreamdb-dataset/src/dataset.rs split into 10 modules
The monolithic 5592-line file was split into a slim facade (dataset.rs, 184 lines) plus 9 single-responsibility submodules:
| Module | LOC | Responsibility |
|---|---|---|
| dataset.rs | 184 | facade: structs (Dataset, Batch, MergeStrategy, DatasetVersion, FieldTrack), accessors, mod declarations |
| dataset/append.rs | 1035 | append, append_many (write path) |
| dataset/create.rs | 592 | create, open* (lifecycle) |
| dataset/fetch.rs | 337 | private fetch helpers for Manifest/Track/Bucket Objects |
| dataset/iter.rs | 1155 | iter, iter_stream, iter_with_fields, iter_time_range |
| dataset/layer.rs | 570 | add_scalar_layer, add_embedding_layer |
| dataset/merge.rs | 690 | merge, merge_many, union-merge family |
| dataset/snapshot.rs | 254 | snapshot, branch, history, list_refs |
| dataset/tombstone.rs | 235 | delete, tombstone_set |
| dataset/util.rs | 906 | free helpers, SpatialDispatcher, unit tests |
Test suite unaffected: 721 tests still pass after the split.
Added — 10B-scale blocker push (2026-05-15 → 2026-05-18)
All 7 execution blockers from design/0006-10b-scale-blockers.md have shipped; the B7 audit confirmed that no change was needed:
| # | Blocker | Status | Key change |
|---|---|---|---|
| B1 | Streaming iter | ✅ | Dataset::iter_stream lazy bucket walk, bounded RAM |
| B5 | Parallel blob fetch | ✅ | buffer_unordered(64) per-anchor prefetch |
| B3 | Sharded redispatch | ✅ | 4-stage k8s pipeline (--orchestrate-phase, --redispatch-shard) |
| B4 | Prefix-sharded GC | ✅ | dreamdb gc --shard N --of M — partition by leading-u64 of multihash |
| B8 | Tombstones | ✅ | spec/0020 + Dataset::delete + dreamdb delete |
| B6 | Rayon hash_vector | ✅ | Parallel IvfCosine::compute_dots above 512K-flop threshold |
| B7 | Manifest size at 10B | ✅ audit | ~2.5 KB at 10B; 400× headroom under 1 MiB |
| B2 | Sharded ingest | ✅ | MergeStrategy::UnionTracks + Dataset::merge_many + dreamdb merge-many |
Latent bug discovered en route: post-FastForward field_tracks wasn't being refreshed → reads returned stale pre-merge results. Fixed via Dataset::refresh_field_tracks_from_current().
Test suite: 705 → 721 tests, 0 regressions.
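The B4 row describes partitioning GC work by the leading u64 of each multihash. One way such a partition could work is sketched below. Both the big-endian byte order and the modulo mapping onto shards are assumptions made for illustration (the changelog does not state either), and shard_of and owned_by are hypothetical names.

```python
def shard_of(multihash: bytes, of: int) -> int:
    """Map a multihash to a shard index in [0, of) using its first 8 bytes."""
    if of <= 0:
        raise ValueError("shard count must be positive")
    if len(multihash) < 8:
        raise ValueError("multihash shorter than 8 bytes")
    # Assumption: the leading u64 is read big-endian and reduced modulo `of`.
    leading = int.from_bytes(multihash[:8], "big")
    return leading % of


def owned_by(multihash: bytes, shard: int, of: int) -> bool:
    """True when worker `shard` (as in --shard N --of M) owns this object."""
    return shard_of(multihash, of) == shard
```

Whatever the exact mapping, the useful property is that every object belongs to exactly one of the M workers, so shards can sweep in parallel without coordinating.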
Other quality work
- Spec audit: 6 stale-claim fixes across design/0001-0007 (Phase-0 banners refreshed, "Still no tombstones" → "✅ shipped", phantom spec/0021 references removed, etc.).
- Unused-import warnings cleanup: cargo fix + manual sweep removed ~30 dead imports across dreamdb-dataset, dreamdb-protocol, dreamdb-bench.
Live evidence
- 231K imagenet-100 ingested with CLIP embeddings, RaBitQ corrected, IVF k=70. Validated linear-probe training (val_acc 4.9% → 36.0% in 5 epochs).
- B1.5 streaming verified mid-ingest on a half-baked 91K-record bucket: 5400 records/sec, 104 MB peak RSS, 425ms first-batch latency.
- AWS S3 SigV4 verified on s3://dreamdb-test-20260518/ (us-east-1): full Dataset lifecycle round-trips (create/append/count/snapshot/branch/history/delete) in 32.6s for 50 samples (WAN-bound, not DreamDB-bound).
- 1.33M imagenet-1k-256 ingest aborted at ~162K records due to local disk pressure; the experiment validated SDK + B2/B3 mechanics; the partial bucket was reclaimed during disk cleanup.
Pre-history
This changelog starts with the 10B-scale push. Earlier work (Phase 1-3.4: dataset platform, schema persistence, IVF+RaBitQ foundation, Ada-IVF maintenance, paged tracks, chain-aware lineage, browser SDK) is documented in design/0002-known-flaws-retrospective.md and the per-spec status fields. The protocol spec (spec/0000-0020) has been stable since 2026-05-13.