Changelog

All notable changes to DreamDB are documented here. Entries track the status of the reference implementation; protocol-spec changes cite the relevant spec document. Dates use ISO 8601 format (YYYY-MM-DD).

[Unreleased]

Added — AWS S3 / SigV4 production path (2026-05-22)

dreamdb-connector-http/src/sigv4.rs — S3Signer wraps the aws-sigv4 crate; produces Authorization + x-amz-date + x-amz-content-sha256 + (optionally) x-amz-security-token headers per request.
HttpConnectorConfig.signer_from_env() auto-detects AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN / AWS_REGION (falls back to us-east-1). No signer attached → connector sends unsigned requests (MinIO-anonymous dev mode preserved).
Every connector verb (PUT / GET / HEAD / LIST / DELETE / multi-range GET) now signs when a signer is configured; SigV4 covers all wire headers including per-request If-Match, Range, etc.
dreamdb-cli/src/main.rs and dreamdb-dataset-python/src/lib.rs factories pick up the env-var signer automatically — same code talks to MinIO and S3.
Verified live against s3://dreamdb-test-20260518/ in us-east-1: create + append + count + snapshot + branch + history + delete all round-trip; 69 Objects landed, 19.8 KB total.
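
The auto-detection rule above can be modeled in a few lines. This is an illustrative Python sketch, not the Rust code in sigv4.rs: the function name `signer_settings_from_env` and the returned dict shape are hypothetical, and it assumes that both key variables must be present before a signer is attached, which the entry does not state explicitly. Only the four environment variable names and the us-east-1 fallback come from the entry.

```python
import os
from typing import Mapping, Optional


def signer_settings_from_env(env: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Return SigV4 signer settings, or None for unsigned (anonymous) mode.

    Hypothetical helper: it models the detection rule only and signs nothing.
    """
    env = os.environ if env is None else env
    access_key = env.get("AWS_ACCESS_KEY_ID")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY")
    # Assumption: without both halves of the key pair, no signer is attached.
    if not access_key or not secret_key:
        return None
    return {
        "access_key_id": access_key,
        "secret_access_key": secret_key,
        # Optional; only set when temporary credentials are in use.
        "session_token": env.get("AWS_SESSION_TOKEN") or None,
        # Region falls back to us-east-1 when AWS_REGION is unset or empty.
        "region": env.get("AWS_REGION") or "us-east-1",
    }
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the rule easy to exercise without touching the real process environment.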

Added — Python SDK ergonomics (2026-05-22)

Dataset.iter_stream(batch_size, fields, channel_buffer) (B1.5): real Python generator backed by a tokio mpsc channel. Bounded RAM per batch (verified 104 MB peak RSS on 91K records). Closes the last 10B-scale follow-up.
Dataset.delete(anchors, reason=None) — tombstone anchors via spec/0020. Returns the new Manifest hash.
Dataset.tombstone_set() — resolved set of suppressed anchors at the current Manifest.
Dataset.merge(other_ref, strategy="fast-forward"|"union-tracks") — full Python parity with the Rust MergeStrategy enum.
Dataset.merge_many(branches) — N-way sequential union-merge for sharded ingest.
Dataset.history(max_depth=50) — walks the Manifest DAG via parents[0]; returns [{manifest, ts_ns, writer, parents_count, tracks_count}, …]. Mirrors the browser UI's ⏳ history button.
Dataset.list_refs() — lex-sorted list of every ref under the backend's refs/ prefix.
Dataset.count() and __len__: record count at the current tip, computed via iter_stream, so len(dataset) now works as a first-time user would expect.
Schema is chainable: vd.Schema().add_image(...).add_embedding(...).add_scalar_categorical(...). The Python class wraps _RustSchema and stays backward compatible.
Friendlier HTTP errors: 403 / 404 / 412 / connection-refused / SI-conflict / schema-mismatch errors now carry actionable hints (e.g. "for local MinIO dev: mc anonymous set public local/<bucket>").
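
The bounded-memory behaviour of iter_stream comes from a bounded channel between producer and consumer. The sketch below is a pure-Python analogue of that pattern using `queue.Queue` and a thread. It is not the SDK's implementation, which the entry says is backed by a tokio mpsc channel, and the function name `iter_batches` is hypothetical. The `batch_size` and `channel_buffer` parameters mirror the names in the iter_stream signature above.

```python
import queue
import threading
from typing import Iterable, Iterator, List

_DONE = object()  # sentinel marking the end of the stream


def iter_batches(records: Iterable[int], batch_size: int,
                 channel_buffer: int = 2) -> Iterator[List[int]]:
    """Yield lists of up to batch_size records through a bounded queue."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    channel: queue.Queue = queue.Queue(maxsize=channel_buffer)

    def producer() -> None:
        try:
            batch: List[int] = []
            for record in records:
                batch.append(record)
                if len(batch) == batch_size:
                    # put() blocks while the queue is full, so at most
                    # channel_buffer batches wait in the queue at once.
                    channel.put(batch)
                    batch = []
            if batch:
                channel.put(batch)  # final partial batch
        finally:
            channel.put(_DONE)

    # Daemon thread: an abandoned generator will not keep the process alive.
    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = channel.get()
        if item is _DONE:
            return
        yield item
```

Because the producer blocks once `channel_buffer` batches are queued, memory use is bounded by roughly `channel_buffer + 2` batches (queued batches, the one being built, and the one being consumed) regardless of the total record count.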

Added — CLI ergonomics (2026-05-22)

dreamdb query --backend ... --ref-name ... --field <name> --query-file <path> — top-K vector search from the command line. Operator spot-check verb; reads raw LE-f32 query bytes from a file.
dreamdb delete --ref-name ... <anchor> [<anchor>...] — tombstone CLI (paired with the SDK method).
dreamdb merge-many --ref-name <trunk> <branches>... — orchestrator CLI for sharded-ingest workflows.
Fixed: dreamdb snapshot --help no longer shows the GC description (doc-comment block at main.rs:207 was attached to the wrong variant); dreamdb gc --help now has its own help text.
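
For `dreamdb query --query-file`, the query vector has to be on disk as raw little-endian f32 bytes. The following is a minimal sketch of producing and checking such a file in Python. It assumes the file holds only the packed values with no header or length prefix, and that the vector length matches the dimension of the queried field; neither detail is spelled out in the entry above. The helper names are hypothetical.

```python
import struct
from pathlib import Path
from typing import List, Sequence


def write_query_file(path: str, vector: Sequence[float]) -> int:
    """Write the vector as packed little-endian f32 values; return bytes written."""
    # "<" selects little-endian byte order, "f" is a 4-byte IEEE 754 float.
    payload = struct.pack(f"<{len(vector)}f", *vector)
    Path(path).write_bytes(payload)
    return len(payload)


def read_query_file(path: str) -> List[float]:
    """Read a file of packed little-endian f32 values back into a list."""
    data = Path(path).read_bytes()
    if len(data) % 4 != 0:
        raise ValueError("file length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(data) // 4}f", data))
```

A 512-dimensional query written this way occupies exactly 2048 bytes, which gives a quick sanity check on the file before passing it to the CLI.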

Added — operator manifests (2026-05-22)

dreamdb-cli/examples/sharded-ingest.yaml (228 lines) — 3-Job k8s pattern for N parallel workers + merge-many orchestrator.
dreamdb-cli/examples/ada-ivf-step-sharded.yaml (246 lines) — 4-stage pipeline (centroids → publish-SI → redispatch → finalize) for B3's sharded redispatch at 10B-scale.
dreamdb-cli/examples/ada-ivf-step.yaml — header updated with "when to use which YAML" pointer.

Added — documentation (2026-05-22)

README.md rewritten (40 → 165 lines): pitch + 4 first principles + Python sketch + status table + 60-second quickstart + repo layout + spec roadmap with honest gaps. Now reflects the actual shipped state.
docs/tutorial.md (370 lines): "DreamDB in 10 minutes" end-to-end walkthrough — schema + ingest + snapshot + query + PyTorch DataLoader + time-travel + ada-ivf-status + delete + sharded ingest. Step 14 documents the S3 migration path.
INDEX.md: refreshed to 21 specs (was 20); added spec/0020 row; "What's Next" updated to reflect shipped state.

Spec changes (2026-05-18 → 2026-05-22)

spec/0001 §2.1 — Object-kinds table expanded from 10 to 17 entries (added TombstoneList, ScalarIndex, VectorCompressor, ItemManifest, GraphIndex, GraphPage, plus existing ones that were missing).
spec/0008 §5.3 (new) — formalizes the distinction between layered-merge (the original §6.1 approach: both parents' Tracks coexist as separate TrackEntrys) and fused-merge (B2's implementation: one merged TrackObject per modality with cell-by-cell bucket reconciliation). Reference implementation ships fused-merge for SpatialBucket tracks; both are spec-valid.
spec/0020 §3.1 — broken cross-reference "spec/0001 §3.2" corrected to "spec/0002 §7.5" (path table actually lives in spec/0002).
spec/0020 §3.1 — TombstoneEntry anchor field clarified to be u64 (the Item-level TimeAnchor), not Multihash. Single anchor suppresses every record across every modality.
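
The effect of the u64 anchor clarification can be illustrated with a small model. This sketch shows only the semantics described above (one Item-level anchor hides every record that carries it, whatever the modality). The `Record` type, its field names, and `visible_records` are hypothetical and do not reflect the spec's wire format.

```python
from typing import Iterable, List, NamedTuple, Set

U64_MAX = 2**64 - 1


class Record(NamedTuple):
    anchor: int    # Item-level TimeAnchor, modeled as an unsigned 64-bit integer
    modality: str  # illustrative label such as "image" or "embedding"
    payload: bytes


def visible_records(records: Iterable[Record], tombstones: Set[int]) -> List[Record]:
    """Drop every record whose anchor is tombstoned, across all modalities."""
    for anchor in tombstones:
        if not 0 <= anchor <= U64_MAX:
            raise ValueError(f"anchor {anchor} does not fit in u64")
    return [r for r in records if r.anchor not in tombstones]
```

With records for anchor 1 in two modalities and anchor 2 in one, a tombstone set of `{1}` leaves only the anchor-2 record visible.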

Changed — `dreamdb-dataset/src/dataset.rs` split into 10 modules

The monolithic 5592-line file was split into a slim facade (dataset.rs, 184 lines) plus 9 single-responsibility submodules:

| Module | LOC | Responsibility |
|---|---|---|
| `dataset.rs` | 184 | facade: structs (`Dataset`, `Batch`, `MergeStrategy`, `DatasetVersion`, `FieldTrack`), accessors, `mod` declarations |
| `dataset/append.rs` | 1035 | `append`, `append_many` (write path) |
| `dataset/create.rs` | 592 | `create`, `open*` (lifecycle) |
| `dataset/fetch.rs` | 337 | private fetch helpers for Manifest/Track/Bucket Objects |
| `dataset/iter.rs` | 1155 | `iter`, `iter_stream`, `iter_with_fields`, `iter_time_range` |
| `dataset/layer.rs` | 570 | `add_scalar_layer`, `add_embedding_layer` |
| `dataset/merge.rs` | 690 | `merge`, `merge_many`, union-merge family |
| `dataset/snapshot.rs` | 254 | `snapshot`, `branch`, `history`, `list_refs` |
| `dataset/tombstone.rs` | 235 | `delete`, `tombstone_set` |
| `dataset/util.rs` | 906 | free helpers, `SpatialDispatcher`, unit tests |

Test suite unaffected: 721 tests still pass after the split.

Added — 10B-scale blocker push (2026-05-15 → 2026-05-18)

All 7 execution blockers from design/0006-10b-scale-blockers.md shipped. B7 was an audit only and confirmed that no change was needed:

| # | Blocker | Status | Key change |
|---|---|---|---|
| B1 | Streaming iter | ✅ | `Dataset::iter_stream` lazy bucket walk, bounded RAM |
| B5 | Parallel blob fetch | ✅ | `buffer_unordered(64)` per-anchor prefetch |
| B3 | Sharded redispatch | ✅ | 4-stage k8s pipeline (`--orchestrate-phase`, `--redispatch-shard`) |
| B4 | Prefix-sharded GC | ✅ | `dreamdb gc --shard N --of M`; partitions by leading u64 of the multihash |
| B8 | Tombstones | ✅ | spec/0020 + `Dataset::delete` + `dreamdb delete` |
| B6 | Rayon hash_vector | ✅ | Parallel `IvfCosine::compute_dots` above 512K-flop threshold |
| B7 | Manifest size at 10B | ✅ audit | ~2.5 KB at 10B; 400× headroom under 1 MiB |
| B2 | Sharded ingest | ✅ | `MergeStrategy::UnionTracks` + `Dataset::merge_many` + `dreamdb merge-many` |
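
The B4 row describes partitioning GC work by the leading u64 of each multihash. One way such a partition could work is sketched below. The big-endian byte order, the modulo mapping, and the zero-based shard index are all assumptions made for illustration; the changelog does not specify them, and `gc_shard_for` and `objects_for_shard` are hypothetical names.

```python
from typing import Iterable, List


def gc_shard_for(multihash: bytes, of: int) -> int:
    """Map a multihash to a shard index in [0, of) using its first 8 bytes."""
    if of < 1:
        raise ValueError("shard count must be at least 1")
    if len(multihash) < 8:
        raise ValueError("multihash is shorter than 8 bytes")
    # Assumption: the leading u64 is read big-endian and reduced modulo `of`.
    leading = int.from_bytes(multihash[:8], "big")
    return leading % of


def objects_for_shard(hashes: Iterable[bytes], shard: int, of: int) -> List[bytes]:
    """Select the hashes that a worker running shard `shard` of `of` would own."""
    return [h for h in hashes if gc_shard_for(h, of) == shard]
```

Whatever the exact mapping, the useful property is that every object falls into exactly one shard, so M workers can each run one shard without coordinating.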

Latent bug discovered en route: after a FastForward merge, field_tracks was not refreshed, so reads returned stale pre-merge results. Fixed via Dataset::refresh_field_tracks_from_current().

Test suite: 705 → 721 tests, 0 regressions.

Other quality work

Spec audit: 6 stale-claim fixes across design/0001-0007 (Phase-0 banners refreshed, "Still no tombstones" → "✅ shipped", phantom spec/0021 references removed, etc.).
Unused-import warnings cleanup: cargo fix + manual sweep removed ~30 dead imports across dreamdb-dataset, dreamdb-protocol, dreamdb-bench.

Live evidence

231K imagenet-100 records ingested with CLIP embeddings, RaBitQ corrected, IVF k=70. Linear-probe training on the result was validated (val_acc 4.9% → 36.0% in 5 epochs).
B1.5 streaming verified mid-ingest on a partially written 91K-record bucket: 5400 records/sec, 104 MB peak RSS, 425 ms first-batch latency.
AWS S3 SigV4 verified on s3://dreamdb-test-20260518/ (us-east-1): full Dataset lifecycle round-trips (create/append/count/snapshot/branch/history/delete) in 32.6s for 50 samples (WAN-bound, not DreamDB-bound).
The 1.33M-record imagenet-1k-256 ingest was aborted at ~162K records due to local disk pressure. The run still validated the SDK and B2/B3 mechanics; the partial bucket was reclaimed during disk cleanup.

Pre-history

This changelog starts with the 10B-scale push. Earlier work (Phase 1-3.4: dataset platform, schema persistence, IVF+RaBitQ foundation, Ada-IVF maintenance, paged tracks, chain-aware lineage, browser SDK) is documented in design/0002-known-flaws-retrospective.md and the per-spec status fields. The protocol spec (spec/0000-0020) has been stable since 2026-05-13.