DreamDB Specification — 0001: Data Model
Status: Draft. Builds on 0000-overview.md.
This document defines the entities of the DreamDB data model precisely. Hash schemes, address syntax, and timestamp formats are deferred to 0002, 0003, and 0004.
1. Purpose
0000 introduced the vocabulary informally. This document fixes the meaning of each entity — what it is, what fields it has, what invariants it must satisfy, and how the entities compose into a Space.
The model is described abstractly. Concrete byte-level encodings of each entity are referenced (e.g. "the manifest is content-addressed per 0002") but specified in their own documents.
2. The Five Entities
The five entities are: Space, Timeline, Track, Item, Manifest. Layer is not a separate entity — it is a role a Track plays relative to other Tracks (see §6).
2.1 DreamDB Object (the storage-layer concept)
The data model entities above are conceptual. When stored on a backend, each is realized as one or more DreamDB Objects — content-addressed byte sequences whose addresses are the BLAKE3-256 hashes of their bytes (per 0002 §2). All DreamDB Objects share three properties:
- Immutable: once written, bytes never change. The address IS the hash, so any modification produces a different address.
- Content-addressed: the address is a deterministic function of the bytes alone.
- Backend-stored: the bytes live at a backend path equal to the DreamDB address (per 0002 §4: "the address IS the path").
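The three properties can be illustrated with a minimal in-memory sketch. It is not DreamDB's storage code: `MemoryBackend` and `address_of` are hypothetical names, and because BLAKE3 is not in the Python standard library, `hashlib.blake2b` with a 32-byte digest stands in for the BLAKE3-256 hash that 0002 §2 specifies.

```python
import hashlib


def address_of(data: bytes) -> str:
    """Return a 256-bit content address rendered as hex.

    ASSUMPTION: blake2b (32-byte digest) is only a stdlib stand-in for the
    BLAKE3-256 hash named in the spec; the hex rendering is illustrative too.
    """
    return hashlib.blake2b(data, digest_size=32).hexdigest()


class MemoryBackend:
    """Hypothetical backend: a dict whose keys (paths) are the addresses."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, data: bytes) -> str:
        addr = address_of(data)
        # Writing the same bytes twice is idempotent. Different bytes always
        # land under a different key, so nothing is ever overwritten.
        self._objects.setdefault(addr, data)
        return addr

    def get(self, addr: str) -> bytes:
        data = self._objects[addr]
        # A reader can verify the binding by re-hashing what it fetched.
        if address_of(data) != addr:
            raise ValueError("object bytes do not match their address")
        return data
```

Under these assumptions, `put` never needs a caller-supplied key, which is what "the address IS the path" amounts to in practice.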
Object kinds defined in v0:
| Object kind | Realizes | Defined in |
|---|---|---|
| Genesis Object | Timeline identity | §5 |
| Manifest | Space snapshot | §7 |
| Track Object | Track metadata + index | §4 |
| Index Page | B-tree page (paged Track index, paged Manifest tracks) | 0002 §7.3.2 |
| SpatialIndex Object | Algorithm parameters for spatial bucketing | 0004 §3 |
| Constant Object | A single Constant Item's payload | §4.3 |
| Fragment | Media chunk (Continuous Signal Track, media) | §4.1 |
| Spatial Bucket | Vector batch (Continuous Signal Track, feature) | §4.1 |
| Time-bucketed batch | Event batch (Discrete Event Track, high-volume) | §4.2 |
| Vector-Storage Object | Per-Track vector pool (reference-mode bucketing) | 0007 §6.3 |
| ScalarIndex Object | Scalar-field index (categorical / bitmap) | 0011 §3 |
| VectorCompressor Object | Per-Space compressor parameters (PQ / RaBitQ) | 0010 §4 |
| ItemManifest | Chunk list for one large multi-chunk Item | 0002 §7.5 |
| GraphIndex Object | Vamana / DiskANN parameters + page layout | 0013 §3 |
| GraphPage | Packed Vamana graph nodes (vector + adjacency) | 0013 §4.2 |
| TombstoneList Object | Suppressed-anchor list (Item-level deletion) | 0020 §3 |
The protocol defines no other Object kinds in v0. All references to "an Object" anywhere in the spec mean "an instance of one of these kinds."
3. Item
An Item is the atomic unit of a Track. It is the smallest object the protocol addresses individually.
Every Item has, at minimum:
- A time anchor. A point, a half-open interval [t_start, t_end), or the special "all of time" anchor (for Constants). The time anchor's representation is fixed in 0003.
- A payload. An opaque byte string, interpreted according to the modality of the Track that contains the item. The protocol does not parse payloads.
- A modality tag (inherited from the containing Track — not stored per-item). This determines how the payload bytes are to be decoded by clients.
Items are content-addressed per 0002: an item's address is a deterministic function of its (timeline, modality, time-anchor, payload-hash) and is independent of which writer produced it.
Three concrete sub-types exist, corresponding to the three Track kinds:
| Track kind | Item sub-type | Time-anchor shape | Cardinality per track |
|---|---|---|---|
| Continuous Signal | Frame | point, regular cadence | many, dense |
| Discrete Event | Event | point or short interval, sparse | many, sparse |
| Global Constant | Constant | the entire timeline span | exactly one |
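The three time-anchor shapes can be modelled as one small value type. This is a sketch only: the `TimeAnchor` class and its in-memory encoding are assumptions made for illustration, and the actual representation is fixed in 0003.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimeAnchor:
    """Hypothetical in-memory time anchor; the real encoding is fixed in 0003.

    ASSUMPTION: a point is stored as start == end, an interval as
    start < end (half-open), and "all of time" as start = end = None.
    """

    start: Optional[int] = None
    end: Optional[int] = None

    def __post_init__(self) -> None:
        if (self.start is None) != (self.end is None):
            raise ValueError("start and end must both be set or both be None")
        if self.start is not None and self.start > self.end:
            raise ValueError("start must not exceed end")

    @classmethod
    def point(cls, t: int) -> "TimeAnchor":
        return cls(t, t)

    @classmethod
    def interval(cls, t_start: int, t_end: int) -> "TimeAnchor":
        if t_start >= t_end:
            raise ValueError("an interval needs t_start < t_end")
        return cls(t_start, t_end)

    @classmethod
    def all_of_time(cls) -> "TimeAnchor":
        return cls(None, None)

    def contains(self, t: int) -> bool:
        """True if timestamp t falls on this anchor."""
        if self.start is None:              # all of time (Constants)
            return True
        if self.start == self.end:          # point
            return t == self.start
        return self.start <= t < self.end   # half-open interval
```

The half-open convention means `TimeAnchor.interval(0, 10)` contains 0 but not 10, so adjacent intervals never both claim the same timestamp.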
3.1 Items vs Objects (the bucketing layer)
An Item is the protocol's logical addressable unit. An Object is the backend's physical storage unit. They are distinct concepts:
- For modalities where item count and item size are well-matched to one-Item-per-Object (small embedding tracks, sparse low-volume events, all Constants), the Item is the Object.
- For modalities where one-Item-per-Object is uneconomical (per-frame addressing of VBR media; per-vector storage at 1B+ scale; per-event storage at high event rates), Items are grouped into Objects via a per-modality bucketing scheme. The Object kinds are:
| Item type | Object kind | Intra-object locator | Defined in |
|---|---|---|---|
| Frame (media) | Fragment (GOP) | byte offset within fragment | 0007 |
| Frame (vector at scale) | Spatial Bucket | index into packed array | 0004, 0007 |
| Event (high-volume) | Time-bucketed batch | time-ordered position | 0007 |
The full Item address decomposes accordingly:
Both the lookup-by-prefix mode (returns a set of Items, after fetching the matching Objects) and the lookup-by-exact-address mode (returns one Item, locating it within one Object) are preserved by this decomposition. Detailed grammar in 0002.
The bucketing scheme for a given Track is part of the Track's modality definition, not a per-track choice. (E.g. video.h264 always uses GOP-aligned Fragments; embedding.f32.dim=768.bucketed always uses Spatial Buckets at 1B+ scale.)
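The exact-address lookup described above (find one Object, then find one Item inside it) might look as follows for a Spatial Bucket. This is a sketch under stated assumptions: `ItemLocator`, `read_vector`, and `fetch_item` are hypothetical names, and the headerless little-endian f32 layout is invented for illustration; the real bucket layout is defined in 0004 and 0007.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemLocator:
    """Hypothetical two-part locator: which Object, then where inside it."""

    object_address: str   # address of the Spatial Bucket Object
    index: int            # position of the vector in the packed array


def read_vector(bucket_bytes: bytes, index: int, dim: int) -> tuple:
    """Extract one vector from a bucket.

    ASSUMPTION: the bucket is a headerless array of little-endian f32
    values, `dim` per vector. The real layout is specified elsewhere.
    """
    stride = 4 * dim
    offset = index * stride
    if index < 0 or offset + stride > len(bucket_bytes):
        raise IndexError("vector index outside bucket")
    return struct.unpack_from(f"<{dim}f", bucket_bytes, offset)


def fetch_item(objects: dict, locator: ItemLocator, dim: int) -> tuple:
    """Step 1: fetch the Object by address. Step 2: locate the Item in it."""
    return read_vector(objects[locator.object_address], locator.index, dim)
```

A prefix lookup would fetch the same Objects but return every vector in them rather than a single index.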
4. Track
A Track is a typed collection of Items, anchored to one Timeline.
A Track has:
- A reference to its Timeline.
- A modality tag, e.g. video.h264, audio.opus, embedding.f32.dim=512, title.text. Modality tags follow a structured grammar fixed in 0002 §5 (<class>.<encoding>[.<param>...], with param allowing key=value form). The modality determines:
  - What the payload bytes mean.
  - Which Track kind the track belongs to (some modalities are inherently continuous, some inherently event-like, some inherently constant).
- A kind, one of continuous, event, constant. Determined by the modality tag; not freely chosen.
- A time interval of coverage, [t_min, t_max), which constrains the time-anchors of all Items in the track. (For Constant tracks, the coverage is the time anchor of the single Constant.)
- Zero or more Items.
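The modality-tag shape quoted above, <class>.<encoding>[.<param>...], can be split mechanically. The parser below is a rough sketch, not the normative grammar from 0002 §5: `ModalityTag` and `parse_modality` are hypothetical names, and the sketch assumes no segment contains a dot, so namespaced user-defined classes and dotted parameter values are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModalityTag:
    cls: str             # e.g. "embedding"
    encoding: str        # e.g. "f32"
    params: tuple = ()   # ordered (key, value-or-None) pairs


def parse_modality(tag: str) -> ModalityTag:
    """Split a tag into class, encoding, and parameters.

    ASSUMPTION: segments are separated by "." and never contain one.
    """
    parts = tag.split(".")
    if len(parts) < 2 or any(p == "" for p in parts):
        raise ValueError(f"malformed modality tag: {tag!r}")
    params = []
    for segment in parts[2:]:
        key, sep, value = segment.partition("=")
        if key == "" or (sep and value == ""):
            raise ValueError(f"malformed parameter: {segment!r}")
        # A bare parameter such as "bucketed" is kept with value None.
        params.append((key, value if sep else None))
    return ModalityTag(parts[0], parts[1], tuple(params))
```

For example, `parse_modality("embedding.f32.dim=768.bucketed")` yields class `embedding`, encoding `f32`, and the parameters `dim=768` and `bucketed`.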
Tracks are immutable once published. "Adding an item to a track" is not a protocol operation. To add information at a new timestamp, a writer publishes a new track (or, more commonly, a new version of an existing logical track via the manifest layering mechanism in §6).
4.1 Track Kind: Continuous Signal
A Continuous Signal Track holds Frames densely over its coverage interval.
- Frame time-anchors lie within the track's coverage interval.
- The cadence (frames per second, samples per second, vectors per chunk, etc.) is part of the modality's parameterization.
- Per §3.1, Frames group into Objects via a modality-specific bucketing scheme:
  - Media modalities (video.*, audio.*): Frames group into Fragments (self-contained, decoder-ready chunks, typically a GOP, ~1–10 s each). VBR is supported because the fragment-index in the Track entry records actual byte sizes per Fragment. Defined in 0007.
  - Feature modalities at scale (embedding.* with large dim and large item count): Frames (vectors) group into Spatial Buckets keyed by a derived spatial-bucket-key (per 0004). Targets ~1–10 MB per Bucket Object.
  - Small / low-volume feature modalities: no bucketing; each Frame is its own Object.
- The Track's manifest entry includes the Object index appropriate to its bucketing scheme: a fragment-index for media, a bucket-list for spatially-bucketed vectors. The index is small (KB to low MB) and content-addressed.
- Examples of modality tags: video.h264, video.av1, audio.opus, embedding.f32.dim=512.per_frame, embedding.f32.dim=768.bucketed.
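To make the fragment-index idea concrete, the sketch below finds the Fragment that covers a given timestamp. The `FragmentEntry` fields and the `find_fragment` helper are assumptions for illustration; the real index encoding is defined in 0007. The only property taken from the text is that each entry records its actual byte size, which is what lets variable-bitrate Fragments coexist in one index.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentEntry:
    t_start: int     # inclusive, in Timeline time units
    t_end: int       # exclusive
    byte_size: int   # actual size, so VBR fragments need no fixed stride
    address: str     # content address of the Fragment Object


def find_fragment(index: list, t: int):
    """Return the entry whose [t_start, t_end) covers t, or None.

    ASSUMPTION: `index` is sorted by t_start and entries do not overlap.
    """
    starts = [entry.t_start for entry in index]
    i = bisect_right(starts, t) - 1
    if i >= 0 and index[i].t_start <= t < index[i].t_end:
        return index[i]
    return None
```

Because the index is small and sorted, the lookup is a binary search followed by one backend read for the Fragment itself.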
4.2 Track Kind: Discrete Event
A Discrete Event Track holds Events sparsely over its coverage interval.
- Event time-anchors lie within the track's coverage interval but are unconstrained in spacing.
- An Event may have a duration (interval anchor) or be instantaneous (point anchor).
- The track has no expected cadence; the count of events ranges from zero (see §4.5) to many millions, depending on the application.
- Per §3.1, Events group into Objects according to track volume:
  - Low-volume Event Tracks (up to a few thousand Events total): no bucketing; each Event is its own Object.
  - High-volume Event Tracks: Events group into Time-bucketed batch Objects (each batch covers a fixed time slab; intra-batch ordering is by time anchor). Defined in 0007.
- Examples of modality tags: annotation.json, transcript.turn, scene.boundary, sensor.gps.
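The time-bucketed batching described for high-volume Event Tracks can be sketched as a grouping step. The function name, the slab width parameter, and the rule that an Event belongs to the slab containing its start time are all assumptions for illustration; the batch format itself is defined in 0007.

```python
from collections import defaultdict


def batch_events(events, slab_ns: int) -> dict:
    """Group (t_start_ns, payload) events into fixed-width time slabs.

    Returns {slab_index: [events ordered by time anchor]}.
    ASSUMPTION: an event is assigned to the slab that holds its start time.
    """
    if slab_ns <= 0:
        raise ValueError("slab width must be positive")
    batches = defaultdict(list)
    for t, payload in events:
        batches[t // slab_ns].append((t, payload))
    # Order events inside each batch by time anchor, and batches by slab.
    return {
        slab: sorted(items, key=lambda e: e[0])
        for slab, items in sorted(batches.items())
    }
```

Slabs with no Events simply do not appear, so a sparse track produces few batch Objects.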
4.3 Track Kind: Global Constant
A Global Constant Track holds exactly one Constant, anchored to its full coverage interval.
- The Constant's time anchor equals the track's coverage interval (the "all of time for this track" anchor).
- The payload represents a single value: a string, a JSON document, a small binary blob, etc.
- There is no "name within the track" key. The track's modality tag identifies what the constant represents (e.g. title.text, license.spdx, author.json). Modeling several attributes therefore requires several Tracks, not one Track with several keys. This preserves the "time is the sole primary key" principle (0000 §5.1).
- Storage layout is a single small immutable object.
- If the constant's value needs to change later, a new Constant Track of the same modality is published as a higher Layer (§6); the original remains addressable.
4.4 Constraints Common to All Track Kinds
- A Track is immutable. Its bytes — including the enumeration of its Items — are fixed at publication.
- A Track's address (per 0002) is a function of (timeline, modality, coverage-interval, content-hash-of-items).
- A Track belongs to exactly one Timeline. Cross-timeline queries are performed by joining at the manifest level (§7), not by sharing tracks.
4.5 Empty / degenerate Track rules
Empty-state rules per Track kind (conformant readers MUST validate; conformant writers MUST honor):
- Continuous Signal Track: MAY have zero Frames. Valid as a placeholder track committing to a modality without yet having data. Its object_index is the empty inline list (or a paged form with total_items = 0).
- Discrete Event Track: MAY have zero Events. Valid as a placeholder for future events. Same encoding as above.
- Global Constant Track: MUST have exactly one Constant. Tracks with zero or two-or-more Constants are malformed; readers MUST reject them.
Manifest-level validation: a Constant Track entry that does not satisfy the "exactly one" rule causes the entire Manifest to be invalid. Other Track kinds with empty object_index are valid.
Append operations (per 0006 §5.1) with zero Items SHOULD be a no-op — the writer SHOULD NOT publish a new Manifest with no track changes. (Writers MAY publish empty-Append Manifests for forensic reasons; readers MUST accept them.)
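The empty-state rules reduce to a small check per Track kind. The sketch below is one way a reader might express them; `validate_track` and `validate_manifest` are hypothetical names, and a Track entry is reduced to a (kind, item_count) pair for illustration.

```python
def validate_track(kind: str, item_count: int) -> None:
    """Raise ValueError if the item count violates the rule for this kind."""
    if item_count < 0:
        raise ValueError("item count cannot be negative")
    if kind in ("continuous", "event"):
        return  # zero Items is a valid placeholder track
    if kind == "constant":
        if item_count != 1:
            raise ValueError("a Constant Track must hold exactly one Constant")
        return
    raise ValueError(f"unknown Track kind: {kind!r}")


def validate_manifest(track_entries) -> None:
    """One malformed Constant Track entry invalidates the whole Manifest."""
    for kind, item_count in track_entries:
        validate_track(kind, item_count)
```

Raising on the first bad entry mirrors the rule that a single malformed Constant Track makes the entire Manifest invalid.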
5. Timeline
A Timeline is a monotonic axis of timestamps over which Tracks are anchored. Every Timeline has a globally unique identity that requires no central registry.
5.1 Timeline Genesis (the identity-bearing Object)
A Timeline is defined by a small immutable Genesis Object:
The Timeline's identity is the content hash of the Genesis Object (using the hash function fixed in 0002).
5.2 Why identity is globally unique
- Cryptographic uniqueness. The 128-bit nonce makes the probability of two independently-created Genesis Objects colliding on identity ~2⁻¹²⁸ — cryptographically negligible. No central registry, no name authority, no DNS-style coordination is required.
- Self-certifying. Given a Timeline identity, anyone can fetch the Genesis Object from any backend that holds it and re-hash to verify the binding.
- Deterministic per Genesis. Two writers who share the Genesis Object (by exchanging it out-of-band) compute the same identity, so they both write to the same Timeline. Genuine collaboration on one Timeline is fully supported; the only prerequisite is exchanging the Genesis Object first.
- Independent of Space. A Timeline's identity does not depend on which Space, manifest, or backend references it. The same Timeline may appear in many Spaces, or in none. (This mirrors how Git treats blob hashes as independent of repos.)
- canonical_name carries no protocol meaning. It is included inside the Genesis Object, so changing it produces a different Timeline, but no part of the protocol consults it for routing, joining, or addressing. Renaming a Timeline is therefore impossible by construction; what users call "rename" is "publish a new Timeline."
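The identity properties listed above can be demonstrated with a toy Genesis Object. Everything about the byte layout here is invented for illustration (a 16-byte nonce followed by the UTF-8 name); the real Genesis encoding is not reproduced in this text. As before, `hashlib.blake2b` is a stdlib stand-in for the hash function fixed in 0002, and `make_genesis` and `timeline_identity` are hypothetical names.

```python
import hashlib
import os
from typing import Optional


def make_genesis(canonical_name: str, nonce: Optional[bytes] = None) -> bytes:
    """Build toy Genesis bytes.

    ASSUMPTION: the layout (16-byte nonce + UTF-8 name) is invented here
    and is not the spec's encoding.
    """
    if nonce is None:
        nonce = os.urandom(16)   # 128-bit random nonce
    if len(nonce) != 16:
        raise ValueError("nonce must be exactly 16 bytes")
    return nonce + canonical_name.encode("utf-8")


def timeline_identity(genesis: bytes) -> str:
    """Identity is the content hash of the Genesis bytes (stand-in hash)."""
    return hashlib.blake2b(genesis, digest_size=32).hexdigest()
```

Two writers holding the same Genesis bytes compute the same identity, while a fresh nonce or a changed name gives a different Timeline.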
5.3 Timeline metadata
Beyond identity-bearing fields, a Timeline carries:
- An origin, the timestamp t = 0. (Whether origins are absolute Unix nanoseconds or per-Genesis relative is OQ-1 in 0000, resolved in 0003. The choice affects time encoding, not identity.)
- (Resolution is implicit at 1 ns for v0 per 0003 §3; not stored in the Genesis Object.)
- An optional horizon, a declared [t_min, t_max) interval beyond which Items are not expected. Useful for bounded recordings; absent for open-ended live streams.
5.4 Role in the data model
Position on a Timeline is the only key for joining data within that Timeline. There is no foreign-key concept. Two pieces of data are "about the same thing" if and only if their time anchors overlap on the same Timeline.
A Space may contain many Timelines. Tracks belong to exactly one Timeline. Cross-Timeline relationships are not part of the v0 data model. Two recordings of the same physical event from different cameras produce two Genesis Objects with different nonces, hence two distinct Timelines — which is correct: they are distinct recordings, however correlated their content. To express "these are the same event from different angles," a higher-level alignment artifact would be needed (a small object mapping time_on_T1 ↔ time_on_T2); alignment is deliberately out of scope for v0 and is mentioned only as a non-goal in §11.
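Joining by overlapping time anchors can be sketched with half-open intervals. The function names and the tuple representation are assumptions for illustration, and a point anchor at t is modelled here as the interval [t, t + 1), which relies on the 1 ns resolution mentioned in §5.3.

```python
def overlaps(a, b) -> bool:
    """Half-open intervals overlap iff each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def join_by_time(left, right):
    """Pair payloads whose anchors overlap on the same Timeline.

    `left` and `right` are lists of (t_start, t_end, payload) tuples.
    ASSUMPTION: both lists are already known to share one Timeline.
    """
    return [
        (l_payload, r_payload)
        for l_start, l_end, l_payload in left
        for r_start, r_end, r_payload in right
        if overlaps((l_start, l_end), (r_start, r_end))
    ]
```

This nested loop is quadratic and only shows the join condition; a real reader would use the Track indexes to avoid scanning.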
6. Layer (a Role, Not an Entity)
A Layer is a Track that derives information from, or supersedes, another Track on the same Timeline.
Layering is not a structural property of the Track itself. It is a relationship recorded in the Manifest (§7). The same physical Track object can be:
- A "base" track in one Space (no parent).
- A "layer" in another Space (parent declared by a manifest entry).
This is intentional. It mirrors how Git treats blobs: a blob is a blob; whether it represents a file added in this commit or one carried over from a parent is a property of the commit's tree, not of the blob.
The Manifest declares, for each Track in a Space:
- Its address (where the bytes live; 0002).
- Its role: base (no parent) or layer-of: <track-address> (derived from another track).
- Optionally, supersession semantics: whether this layer extends, overrides, or annotates its parent. Vocabulary fixed in 0008.
Three layering patterns are common; all use the same structural mechanism:
- Augmentation. A vector embedding track layered over a video track. Same time interval, different modality. Both remain readable.
- Correction. A Constant track of modality title.text published as a higher layer over an earlier Constant track of the same modality. Manifest readers see the higher layer first.
- Annotation. An Event track of modality annotation.json layered over a video. Each event references the time interval it pertains to via its own time anchor, not via a pointer to the parent track.
In all three cases, the parent track's bytes are untouched. Layering is additive at the storage level and resolved at query time by walking the manifest.
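For the Correction pattern, "walking the manifest" might look like the sketch below, which follows layer-of links upward from a base Track. `TrackEntry` and `top_layer` are hypothetical names. When several layers share a parent, the sketch follows the lexicographically greatest address, borrowing the Constant tie-break listed under OQ-7; the full resolution rules are in 0008.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrackEntry:
    address: str
    layer_of: Optional[str] = None   # None means role "base"


def top_layer(entries: list, base_address: str) -> str:
    """Return the address of the topmost layer above a base Track.

    ASSUMPTION: among sibling layers, the lexicographically greatest
    address wins (the OQ-7 rule for Constants).
    """
    children = {}
    for entry in entries:
        if entry.layer_of is not None:
            children.setdefault(entry.layer_of, []).append(entry.address)
    current = base_address
    seen = {current}
    while current in children:
        current = max(children[current])
        if current in seen:   # defensive guard against malformed input
            raise ValueError("layer-of links form a cycle")
        seen.add(current)
    return current
```

The parent Tracks are only read, never modified, which matches the point that layering is additive at the storage level.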
7. Manifest
A Manifest is an immutable, content-addressed object that enumerates the state of a Space at a moment in its history.
A Manifest has:
- A parent reference, which is the address of the previous Manifest, or null for the first Manifest in a Space.
- A list of Timeline entries (additions; the protocol does not support timeline removal in v0).
- A list of Track entries. Each entry has:
  - The address of the Track.
  - The Timeline it belongs to.
  - Its modality tag and kind (redundant with the Track's own bytes, but cached here so manifest readers can plan queries without first fetching every track).
  - Its role (base or layer-of: <track-address>).
  - Its coverage interval.
- A timestamp of publication, by the manifest's writer. This is metadata, not part of any Item's time anchor.
- A writer identity tag (optional, opaque to the protocol).
A Manifest is the unit of versioning. Publishing new data — be it a base track, a layer, or a correction Constant — produces a new Manifest. The chain of Manifests is the history of the Space, analogous to a Git commit history.
Manifest semantics — branching, merging, conflict-free concurrent publication — are detailed in 0008. For the purposes of this document, only the structure is fixed.
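The parent-reference structure alone is enough to recover a Space's linear history. The sketch below walks the chain from a chosen Manifest back to the first one; the `Manifest` dataclass is a pared-down stand-in with hypothetical field names, and branching and merging (0008) are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Manifest:
    parent: Optional[str]          # address of the previous Manifest, or None
    track_addresses: tuple = ()    # simplified stand-in for Track entries


def history(store: dict, head: str) -> list:
    """Return Manifest addresses from `head` back to the first Manifest.

    `store` maps address -> Manifest (a stand-in for backend reads).
    """
    chain = []
    addr = head
    while addr is not None:
        chain.append(addr)
        addr = store[addr].parent
    return chain
```

Since each Manifest is immutable and content-addressed, the chain returned for a given head never changes.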
8. Space
A Space is the top-level container of a DreamDB installation. It is identified by:
- Its root manifest address — the address of the most recent Manifest the reader has chosen to read at. (A Space is therefore implicitly parameterized by which manifest you are reading.)
- Its backend binding — which backend the Manifests, Tracks, and Items live on. The protocol does not require the binding to be unique: the same set of immutable objects may be hosted on multiple backends, and a Space at "manifest M on backend A" is observationally equivalent to "manifest M on backend B" if both backends contain the closure of objects reachable from M.
There is no global registry of Spaces. A Space exists wherever its manifest chain is reachable.
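The equivalence condition above (a backend holds the closure of objects reachable from M) can be checked with a graph traversal. In this sketch the reference graph is given as a plain dict, and `reachable` and `can_serve` are hypothetical names; how references are extracted from real Objects is outside this document.

```python
def reachable(refs: dict, root: str) -> set:
    """Return every address reachable from `root`, including `root`.

    `refs` maps an address to the addresses that Object references.
    """
    seen = set()
    stack = [root]
    while stack:
        addr = stack.pop()
        if addr in seen:
            continue
        seen.add(addr)
        stack.extend(refs.get(addr, ()))
    return seen


def can_serve(backend_objects: set, refs: dict, root: str) -> bool:
    """True if the backend holds the full closure reachable from `root`."""
    return reachable(refs, root) <= backend_objects
```

Two backends for which `can_serve` holds at the same Manifest address are interchangeable from a reader's point of view.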
9. Type System Summary
| Entity | Mutable? | Content-addressed? | Belongs to | Contains |
|---|---|---|---|---|
| Space | by reference (which manifest you read) | n/a | — | Timelines, Tracks, Manifests (transitively) |
| Manifest | no | yes | Space (transitively) | references to Timelines and Tracks |
| Timeline | no | yes (by content hash of Genesis Object — globally unique via 128-bit nonce) | nothing (Timeline identity is independent of Space) | nothing (it is a coordinate axis) |
| Track | no | yes | Timeline | Items |
| Item | no | yes | Track | a payload + a time anchor |
"Mutable by reference" for Space means: the bytes never change, but the Space-as-a-user-sees-it advances when the user chooses to read a newer manifest.
10. Worked Examples (Three Track Kinds in One Space)
Suppose a Space contains a single Timeline T representing one 10-minute video recording, with origin at the camera's start-of-recording wall-clock time and nanosecond resolution.
A reasonable populated Space might contain:
A query like "what was the title of the recording where the goalkeeper saved a penalty?" dispatches to two different lookup paths:
- "Goalkeeper saved a penalty" is a feature query against embedding.f32.dim=512. The SDK encodes the text, computes the spatial address region (per 0004), and finds the matching frame's time anchor on T.
- "What was the title?" is a Constant lookup. The SDK reads the address of the title.text Constant Track on T (per 0002, this is a direct function of (T, "title.text", full-coverage)) and fetches its single Constant. One backend read.
Both lookups consult the same Manifest to know which Tracks are live at the chosen point in history, but neither involves a scan.
11. Out of Scope for this Document (and for v0)
- The byte-level encoding of any of these entities. (0002, 0007)
- The timestamp format and the resolution of OQ-1 from 0000. (0003)
- The address syntax, including the modality-tag grammar referenced informally above. (0002)
- The spatial-indexing scheme for feature-bearing tracks. (0004)
- The manifest chaining and merge semantics, including how concurrent writers reconcile. (0008)
- Cross-Timeline alignment. Mapping time_on_T1 ↔ time_on_T2 between two distinct Timelines (e.g. two cameras at the same physical event) is a higher-level construct and is deliberately deferred. v0 deals with relationships within a Timeline only.
- Human-readable Timeline naming as identity. The canonical_name field on a Genesis Object is a diagnostic hint, never a routing or join key. Identity is always the content hash.
12. Open Questions Surfaced by This Document
- OQ-6 (→ 0002 §5): Modality-tag grammar. Resolved: class.encoding(.param)* with reverse-DNS namespacing for user-defined modalities; built-in classes are reserved.
- OQ-7 (→ 0008 §6.3, 0007 §8.2): Concurrent constant correction conflicts. Resolved: lexicographically-greatest layer-Track address wins for Constants; non-Constant Tracks union-merge.
- OQ-8 (→ 0007 §8): Default Time-batch duration. Resolved: per-modality via bucket=<duration> parameter (no fixed default; recommended sizing table in §8.1.2).
- OQ-9 (→ 0004 §5): Default spatial-bucket-key derivation. Resolved: dreamdb.lsh-cosine ships as v0 default; the algorithm registry supports v0.1+ alternatives (dreamdb.pq-ivf, etc.).
- OQ-10 (→ 0007 §5.4): Default Fragment duration. Resolved: 2 s default; per-modality tunable via frag-duration= parameter; permitted range 1–30 s.
Next: 0002-content-addressing.md — the hash function, what gets hashed, the modality-tag grammar (OQ-6), and the address syntax.