DreamDBv0.2.0bec026

DreamDB Specification — 0003: Time Encoding

Status: Draft. Builds on 0000-overview.md, 0001-data-model.md, and 0002-content-addressing.md. This document fixes the byte and string formats for <time-anchor> and <time-bucket>, the contents of the Timeline Genesis Object's origin field, and resolves 0000 OQ-1 (absolute vs. Genesis-relative).


1. Purpose

0001 introduced time as the sole primary key. 0002 defined the slots in the address grammar where time appears (<time-anchor>, <time-bucket>). This document pins down the actual bits and characters:

  • The reference frame for timestamps (Genesis-relative, with an optional absolute back-reference).
  • The resolution and range of time values.
  • The CBOR encoding inside hashable Objects (Genesis, Track, Manifest, Index Pages).
  • The string encoding inside dreamdb:// URIs and backend object keys.
  • The placement-rule arithmetic for <time-bucket>.

What stays as defined elsewhere:

  • The abstract data model of time anchors (point / interval / all-of-time) — 0001 §3.
  • The role of time as primary key — 0000 §5.1.
  • The fragment-spans-bucket-boundary placement rule — 0002 §6.3.1.
  • The two-part address structure — 0002 §4.

What this document does not fix:

  • Multi-Timeline alignment (mapping anchors between two distinct Timelines) — out of scope for v0 per 0001 §11.
  • The byte format inside Fragments / Buckets / Time-batches — 0007.

2. Resolves OQ-1: Genesis-Relative is Canonical

The canonical form of every DreamDB time anchor is an integer count of resolution units measured from the Timeline's Genesis origin. t = 0 sits at the Genesis origin; positive values are after.

The Genesis Object's origin field carries an optional absolute back-reference — a Unix-nanosecond timestamp at which t = 0 sits in wall-clock time. Three modes are supported:

Genesis origin field      Timeline mode   Meaning
Unix nanosecond integer   Anchored        t = 0 corresponds to the given Unix-ns instant. Wall-clock conversion is well-defined.
null                      Abstract        The Timeline has no wall-clock anchor. Only relative comparisons within itself are meaningful.
(Reserved future tags)    (Future)        Other epoch references: TAI, GPS, monotonic-since-boot. Not in v0.

Why Genesis-relative as the canonical form:

  • Compactness. A 10-year recording at ns resolution is about 3.2 × 10¹⁷ ns, which fits in 59 bits, inside a 64-bit integer with room to spare. Genesis-relative integers stay small; absolute Unix-ns integers are already at ~1.8 × 10¹⁸ in 2026 and need 61 bits no matter how short the recording is.
  • Abstract timelines. Test data, simulations, fictional timelines, and offline replays don't have a meaningful wall-clock origin. Genesis-relative makes them first-class.
  • Wall-clock conversion stays cheap when needed. A reader who wants absolute time fetches the (small, immutable) Genesis Object once and adds the offset.
  • No system-clock dependence in the protocol. DreamDB never assumes the writer's clock matches anything in particular. The optional origin field declares an alleged absolute reference; no protocol verb requires it.

3. Resolution: Nanoseconds (Implicit)

DreamDB v0 fixes time resolution at nanoseconds for every Timeline. The resolution is implicit — it is not stored in the Genesis Object. Every time anchor in v0 is an integer count of nanoseconds since the Genesis origin.

Why ns-only in v0:

  • Covers the highest-rate modalities that matter (96 kHz audio, sub-µs sensor pings, video frame edges).
  • Slow events on a ns timeline simply use larger numbers. The cost of the larger numbers is bounded: a u64 covers ~584 years of ns (see §4).
  • One resolution means there's no Timeline-vs-Timeline unit conversion to get wrong.
  • Storing nothing about resolution saves a field per Genesis (a small saving even at billion-Genesis scale, and a cleaner schema).

A v0.1+ spec extension MAY introduce coarser resolutions (ms, s, etc.) by adding a resolution field to Genesis with a forward-compat default of "ns"; v0 Genesis Objects (no resolution field) are interpreted as ns by all v0.1+ readers.

4. Range

v0 time anchors are unsigned 64-bit integers: 0 ≤ t < 2⁶⁴.

  • At ns resolution, this gives ~584 years of range from the Genesis origin. Adequate for every practical recording, surveillance feed, simulation, or archive.
  • Negative anchors (before the Genesis origin) are forbidden in v0. Writers SHOULD set the Genesis origin at or before the earliest anchor they expect to record. If items are discovered earlier than the origin, the correct response is to publish them on a new Timeline with an earlier origin (Genesis is immutable).
  • A future spec version MAY introduce signed anchors via a bias scheme (e.g. anchors stored as bias + value so lex-order remains numeric-order). Compatibility is forward — v0 readers reading a v0 Genesis never encounter negative anchors.

4.1 All time arithmetic is integer arithmetic

Time anchors, time-bucket indices, bucket-durations, and wall-clock back-references are all 64-bit integers. No protocol step may use floating-point arithmetic to compute a time value. This is a hard requirement, not a recommendation.

Why this matters:

  • IEEE 754 double (f64) has a 53-bit significand (52 stored bits plus an implicit leading bit). Integers above 2⁵³ ≈ 9.0 × 10¹⁵ are no longer all representable, so ns-level precision is lost on an f64 round-trip.
  • Unix-nanosecond timestamps in 2026 already exceed 1.7 × 10¹⁸. At this magnitude adjacent f64 values are 256 ns apart, so any f64 conversion silently quantizes to roughly 250 ns granularity, destroying every precision guarantee this document makes.
  • A writer that computes seconds × 1e9 in f64 to produce a ns-resolution anchor introduces drift on the very first conversion. The drift is invisible to the writer but corrupts cross-implementation comparisons forever.

The discipline:

  • Anchor arithmetic. All +, -, floor-div, mod operations between time anchors and bucket-durations MUST use 64-bit (or wider) integer paths. No f64, f32, or "approximate" math anywhere on the time axis.
  • Wall-clock conversion. Computing t_relative = t_wall_clock_ns - origin_unix_ns MUST be integer subtraction. Any wall-clock representation (RFC-3339 strings, (seconds, nanoseconds) pairs, time.Time structs) MUST be converted to a single i64 ns count via integer multiplication and addition (e.g. seconds × 1_000_000_000 + nanoseconds) before any arithmetic — never via an f64 intermediary.
  • Duration parsing. Parsing the bucket-duration suffix grammar (§8) — 500ms, 60s, 5m — MUST use integer multiplication against the suffix's integer constant (1_000_000, 1_000_000_000, 60_000_000_000, …). No f64 path.
  • Overflow guarding. Writers SHOULD use checked-multiplication primitives when normalizing durations to ns and reject any duration whose ns count would exceed 2⁶³ − 1. (24h fits with vast headroom; 1_000_000_000 h does not. Guard accordingly.)
  • Display vs. computation. Output formatting (e.g. printing t as a decimal-fraction-of-second for human eyes) MAY use floats. Anything that consumes a time value back into protocol space MUST round-trip through an i64 representation first.

This rule applies to writers, readers, SDKs, and any tool that emits or interprets DreamDB time values. Storage formats already enforce integer representation (CBOR int, hex digits in addresses) — this discipline closes the loophole between storage and arithmetic. A conforming implementation has zero f64 ops on the time path.
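
As a rough illustration of why the discipline above matters, the Python sketch below contrasts an f64 round-trip with the integer path. The function names (pair_to_ns, to_relative) and the sample timestamp are illustrative assumptions, not part of the spec. Python's int is arbitrary-precision, so the i64 bound is checked explicitly.

```python
NS_PER_SECOND = 1_000_000_000
I64_MAX = 2**63 - 1


def pair_to_ns(seconds: int, nanoseconds: int) -> int:
    """Fold a (seconds, nanoseconds) pair into one ns count, integers only."""
    if not (isinstance(seconds, int) and isinstance(nanoseconds, int)):
        raise TypeError("time values must be integers, never floats")
    if not 0 <= nanoseconds < NS_PER_SECOND:
        raise ValueError("nanoseconds part out of range")
    total = seconds * NS_PER_SECOND + nanoseconds
    if not 0 <= total <= I64_MAX:
        raise OverflowError("ns count outside the range 0 .. 2**63 - 1")
    return total


def to_relative(wall_clock_ns: int, origin_unix_ns: int) -> int:
    """t_relative = t_wall_clock_ns - origin_unix_ns, by integer subtraction."""
    t = wall_clock_ns - origin_unix_ns
    if t < 0:
        raise ValueError("anchor precedes the Genesis origin (forbidden in v0)")
    return t


# A hypothetical Unix-ns instant whose last digit is non-zero.
sample_ns = pair_to_ns(1_700_000_000, 1)

# The f64 round-trip lands on a neighbouring representable value ...
via_float = int(float(sample_ns))
# ... while the integer path returns the value unchanged.
via_int = to_relative(sample_ns, 0)

print(sample_ns - via_float)  # non-zero: the float path moved the timestamp
print(sample_ns - via_int)    # 0
```

The float path is shown only to demonstrate the failure; a conforming implementation would contain nothing like the via_float line on its time path.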

4.1.1 Non-Rust SDKs and arbitrary-precision integers

Rust, Go, Java, C/C++, and most systems languages have native u64 / i64 types — the discipline is straightforward. Some target languages do not:

  • JavaScript / TypeScript: native Number is f64. Time arithmetic in plain JS silently loses ns precision the moment Unix-ns values are involved. Conformant JS SDKs MUST use BigInt for all time values — no exceptions.
  • Python: int is arbitrary-precision (good), but time.time() returns f64 (bad). datetime.timestamp() is also f64. Conformant Python SDKs MUST convert wall-clock to ns via time.time_ns() (Python 3.7+) or equivalent ns-resolution APIs, never via time.time() * 1e9.
  • PHP / older Lua / older shell: similar gotchas. Conformant SDKs MUST use the language's BigInt or arbitrary-precision-integer library throughout the time path.
  • WebAssembly: usually backed by Rust/C++/AssemblyScript with native i64 support — fine.

Implementations targeting browser or scripting environments SHOULD include explicit conformance tests demonstrating BigInt round-trips for the standard test vectors. A JS SDK that "works for small Tracks but loses precision at production scale" is non-conformant; it produces silently-wrong addresses.
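
For the Python case specifically, a minimal sketch of the integer-only path might look like the following. The helper names are hypothetical; time.time_ns() and timedelta floor-division are standard-library features. Note that datetime itself only carries microsecond precision, so this route cannot produce sub-µs anchors.

```python
import time
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def now_unix_ns() -> int:
    """Current wall-clock time as integer Unix-ns, with no float intermediary."""
    return time.time_ns()


def datetime_to_unix_ns(dt: datetime) -> int:
    """Convert an aware datetime to Unix-ns using only integer arithmetic.

    The result is always a multiple of 1_000 ns because datetime stores
    whole microseconds.
    """
    if dt.tzinfo is None:
        raise ValueError("naive datetimes are ambiguous; attach a timezone")
    # timedelta // timedelta yields an exact int (whole microseconds here).
    micros = (dt - UNIX_EPOCH) // timedelta(microseconds=1)
    return micros * 1_000
```

The tempting alternative, int(dt.timestamp() * 1e9), goes through f64 and is exactly the pattern the rule above forbids.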

5. CBOR Encoding (inside hashable Objects)

Time anchors appearing inside CBOR-encoded DreamDB Objects use plain CBOR integers in the canonical encoding — not the dreamdb.tag (per 0002 §3.2). The schema field name (t_start, t_end, coverage, t_min, t_max, origin, etc.) is already authoritative about the field's meaning, so the tag would be redundant.

The dreamdb.tag from 0002 §3.2 remains reserved for use in foreign or ambiguous CBOR contexts — for example, a DreamDB time anchor embedded inside a generic CBOR document outside the spec'd DreamDB schemas, where readers cannot otherwise distinguish it from a plain integer.

This decision is sized for billion-scale: the redundant tag would cost ~6 GB of additional storage on a 1B-fragment Track Object's index across all leaves. See §10.1 for the full sizing argument.

5.1 Point anchor

<unsigned 64-bit integer>                     ;; ticks since Genesis origin

5.2 Interval anchor

[ <t_start>, <t_end_exclusive> ]              ;; both unsigned 64-bit (u64)
                                              ;; t_end > t_start
                                              ;; half-open [t_start, t_end)

5.3 "All of time" anchor (Constants)

Not encoded. A Constant Track's object_index is a single constant_address; the per-item time anchor is implicit — coverage is the entire Timeline span.

The Track-level coverage field (0001 §4) still carries [t_min, t_max) so readers know the Timeline span being asserted.

5.4 Genesis Object's origin field

{
   "origin":         <int Unix-ns> | null,    ;; absolute back-reference
   ...
}

The origin field, like all other time-typed fields in DreamDB schemas, is a plain CBOR integer. The Genesis Object does NOT carry a resolution field in v0 — all timestamps are nanoseconds (per §3).
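
The encodings in this section can be sketched with a small hand-rolled encoder. This is a minimal sketch that assumes the "canonical encoding" of 0002 means the shortest-form integer heads of RFC 8949; the function names are hypothetical, and the tag value 65521 is taken from OQ-17 in §13.

```python
def cbor_head(major: int, value: int) -> bytes:
    """Shortest-form CBOR head for a major type and an unsigned argument."""
    if not 0 <= value < 2**64:
        raise ValueError("argument must fit in an unsigned 64-bit integer")
    first = major << 5
    if value < 24:
        return bytes([first | value])
    for additional, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * width):
            return bytes([first | additional]) + value.to_bytes(width, "big")
    raise AssertionError("unreachable: value was range-checked above")


def encode_point(t: int) -> bytes:
    """Point anchor (5.1): a plain CBOR unsigned integer, major type 0."""
    return cbor_head(0, t)


def encode_interval(t_start: int, t_end: int) -> bytes:
    """Interval anchor (5.2): a 2-element array, half-open [t_start, t_end)."""
    if t_end <= t_start:
        raise ValueError("t_end must be greater than t_start")
    return cbor_head(4, 2) + encode_point(t_start) + encode_point(t_end)


def encode_tagged_point(t: int, tag: int = 65521) -> bytes:
    """Tag-wrapped form, for foreign CBOR contexts only (major type 6)."""
    return cbor_head(6, tag) + encode_point(t)


print(encode_point(152_481_000_000).hex())         # 1b0000002380936a40
print(len(encode_point(152_481_000_000)))          # 9
print(len(encode_tagged_point(152_481_000_000)))   # 12
```

The 9-byte and 12-byte lengths apply to anchors at or above 2³²; smaller values encode in fewer bytes under shortest-form rules.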

6. String Encoding (inside addresses)

Time anchors and time buckets appearing as path segments in dreamdb:// URIs and backend object keys use a 16-character lowercase hex, big-endian, fixed-width format:

<time-anchor> := <16 hex digits>             ;; lowercase [0-9a-f]
                                              ;; big-endian unsigned 64-bit
                                              ;; e.g. t = 152_481_000_000 ns
                                              ;;     → "0000002380936a40"  (16 chars)

(One concrete example below in §9.1 with arithmetic spelled out.)

Why 16-char fixed-width hex:

  • Prefix-orderable. Lexicographic order on the string equals numeric order on the underlying integer. List-prefix queries on time-bucket segments (per 0002 §6.3.1) work correctly without requiring backends to interpret the encoding.
  • Compact. 16 chars vs. 20 chars for fixed-width decimal (the u64 maximum, 18_446_744_073_709_551_615, has 20 decimal digits). In an address that already runs to ~150 chars, the savings are marginal but free.
  • Universal. Hex is supported everywhere; no encoding-table edge cases.
  • Aligns with content-hash style. Although content hashes use base32 (per 0002 §8.1) for case-insensitivity in object keys, time anchors don't carry the same case-collision risk — they're already lowercase by construction. Mixing hex (for time) and base32 (for hashes) is intentional and follows the same role-based principle as base2 for spatial keys (0002 §6.3.2).

For interval anchors appearing in addresses: only the t_start is used (per the placement rule in 0002 §6.3.1 — the time-bucket of an interval is determined by its start). Address segments therefore only encode point values; intervals appear only inside CBOR-encoded Objects.
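
A minimal sketch of the segment codec, assuming nothing beyond the grammar above (the function names are hypothetical):

```python
import re

_SEGMENT = re.compile(r"[0-9a-f]{16}")


def encode_segment(value: int) -> str:
    """Render a u64 as 16 lowercase hex digits, most significant digit first."""
    if not 0 <= value < 2**64:
        raise ValueError("value must satisfy 0 <= value < 2**64")
    return format(value, "016x")


def decode_segment(segment: str) -> int:
    """Parse a segment, rejecting uppercase digits and any other width."""
    if _SEGMENT.fullmatch(segment) is None:
        raise ValueError("segment must be exactly 16 lowercase hex digits")
    return int(segment, 16)


anchors = [152_481_000_000, 0, 2**64 - 1, 60_000_000_000]
segments = [encode_segment(t) for t in anchors]

# Sorting the strings gives the same order as sorting the integers.
print(sorted(segments) == [encode_segment(t) for t in sorted(anchors)])  # True
print(encode_segment(152_481_000_000))  # 0000002380936a40 (as in §9.1)
```

Rejecting uppercase input on decode is a design choice of this sketch: it keeps one canonical string per value, so string equality and integer equality coincide.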

7. Time-Bucket Encoding

The <time-bucket> address segment (per 0002 §6.3) is the integer index of a time bucket, derived from the placement rule:

<time-bucket> := floor( t_start / bucket-duration-in-ticks )

Both t_start and bucket-duration-in-ticks are unsigned 64-bit integers; the division is integer floor-division (per §4.1). No f64 conversion is involved at any step. The bucket-duration is declared in the modality's parameters (e.g. transcript.turn.bucket=10s → bucket-duration = 10 × 1_000_000_000 = 10_000_000_000 ns, computed by integer multiplication).

The result is a non-negative integer encoded as 16-char lowercase hex — same format as time anchors. Reusing one format keeps the address grammar uniform.

Example: for a Fragment with t_start = 60.0 s and bucket-duration = 60 s:

t_start in ns                   = 60_000_000_000
bucket-duration in ns           = 60_000_000_000
<time-bucket>  (decimal integer) = 1
<time-bucket>  (16-char hex)     = 0000000000000001
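
The placement arithmetic can be sketched in a few lines. The function names are hypothetical; the only operations involved are integer floor-division and fixed-width hex formatting.

```python
def time_bucket_index(t_start_ns: int, bucket_duration_ns: int) -> int:
    """floor(t_start / bucket-duration), computed with integer division."""
    if not (isinstance(t_start_ns, int) and isinstance(bucket_duration_ns, int)):
        raise TypeError("time values must be integers, never floats")
    if not 0 <= t_start_ns < 2**64:
        raise ValueError("t_start must satisfy 0 <= t < 2**64")
    if bucket_duration_ns <= 0:
        raise ValueError("bucket duration must be positive")
    return t_start_ns // bucket_duration_ns


def time_bucket_segment(t_start_ns: int, bucket_duration_ns: int) -> str:
    """The <time-bucket> address segment: 16 lowercase hex digits."""
    return format(time_bucket_index(t_start_ns, bucket_duration_ns), "016x")


MINUTE_NS = 60_000_000_000
print(time_bucket_segment(60_000_000_000, MINUTE_NS))  # 0000000000000001
print(time_bucket_segment(59_900_000_000, MINUTE_NS))  # 0000000000000000
```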

8. Bucket Duration Encoding (Modality Parameters)

The bucket-duration carried in a modality tag (e.g. transcript.turn.bucket=10s) uses a small human-readable suffix grammar:

bucket-duration  := <positive-integer> <unit-suffix>
unit-suffix      := "ns" | "us" | "ms" | "s" | "m" | "h" | "d"
                                              ;; resolves to ns at parse time

Examples: bucket=500ms, bucket=10s, bucket=5m, bucket=1h, bucket=24h.

The unit suffixes are a lexical convenience — at protocol level the duration always becomes an integer ns count via integer multiplication against the suffix's integer constant. The constants:

Suffix   Multiplier (ns)
ns       1
us       1_000
ms       1_000_000
s        1_000_000_000
m        60_000_000_000
h        3_600_000_000_000
d        86_400_000_000_000

Writers and readers MUST normalize via integer multiplication (per §4.1); no f64 path is permitted. Implementations SHOULD use checked multiplication and reject any duration whose ns count would exceed 2⁶³ − 1.

The set of bucket-duration unit suffixes is fixed in v0; user-defined units are forbidden (the multiplicative semantics of m, h, d are constant SI seconds; calendar-relative units like "month" are deliberately excluded).
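
A parser for this grammar might be sketched as below. Two points are assumptions of the sketch rather than statements of the spec: <positive-integer> is read as ASCII decimal digits with no leading zero, and no whitespace is allowed between the number and the suffix. Python's int cannot overflow, so the checked multiplication is emulated with an explicit bound check after the multiply.

```python
import re

NS_PER_UNIT = {
    "ns": 1,
    "us": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60_000_000_000,
    "h": 3_600_000_000_000,
    "d": 86_400_000_000_000,
}
I64_MAX = 2**63 - 1

# Assumption: no leading zeros, no sign, no fraction, no whitespace.
_DURATION = re.compile(r"([1-9][0-9]*)(ns|us|ms|s|m|h|d)")


def parse_bucket_duration(text: str) -> int:
    """Normalize a bucket-duration such as '500ms' to an integer ns count."""
    match = _DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid bucket-duration: {text!r}")
    count, suffix = match.groups()
    total = int(count) * NS_PER_UNIT[suffix]  # integer multiplication only
    if total > I64_MAX:
        raise OverflowError(f"duration exceeds 2**63 - 1 ns: {text!r}")
    return total


print(parse_bucket_duration("500ms"))  # 500000000
print(parse_bucket_duration("24h"))    # 86400000000000
```

In a language with fixed-width integers the bound check would move before the multiply, for example by using a checked-multiplication primitive.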

9. Worked Examples

9.1 Point anchor, full round-trip

A Discrete Event Track records a button-press at t = 152.481 s after Genesis origin.

  • Tick value (ns): 152.481 × 10⁹ = 152_481_000_000
  • In hex: 152_481_000_000 = 0x2380936A40 (10 hex digits)
  • Zero-padded to 16 chars (the address-segment form): 0000002380936a40

The CBOR encoding inside an Index Page is just the integer:

152_481_000_000                               ;; plain CBOR uint, no tag

9.2 Interval anchor in CBOR

A scene-boundary Event spans t = [60.0 s, 60.1 s):

[ 60_000_000_000, 60_100_000_000 ]            ;; plain CBOR 2-array of uints

In the address (when this Event is placed): only t_start = 60_000_000_000 becomes the time-bucket key; the t_end lives in the Track Object's object_index.

9.3 Time-bucket placement across a boundary

Per 0002 §6.3.1, an Item that spans a bucket boundary is placed by t_start. With bucket-duration = 60 s and an interval anchor [59.9 s, 60.1 s):

t_start (ns)               = 59_900_000_000
bucket-duration (ns)       = 60_000_000_000
floor(t_start / D)         = 0
<time-bucket> (16-hex)     = "0000000000000000"

Even though the Item's t_end extends into bucket 1, its placement segment is 0000000000000000. Time-range queries consult the Track Object's object_index (which records t_end = 60_100_000_000) and find this Item correctly via interval-overlap.

9.4 Genesis with absolute back-reference

A camera Genesis at start of recording, 2026-05-06T09:00:00.000000000Z:

{
   "origin":         1_778_058_000_000_000_000,    ;; Unix-ns (illustrative: exact
                                                   ;;  Unix-ns of that instant)
   "horizon":        [ 0, 600_000_000_000 ],       ;; 600 s
   "nonce":          h'a3b9f1...',
   "canonical_name": "match-2026-05-06",
}

A query for "what's at t = 152.481 s" computes the address segment 0000002380936a40 and prefix-lists. A query for "what's at wall-clock 2026-05-06T09:02:32.481Z" first reads the Genesis (1 small GET, cached), then performs the wall-clock conversion entirely in integer arithmetic (per §4.1):

wall_clock_seconds  =  1_778_058_152                           ;; from RFC-3339 parse
wall_clock_ns_part  =  481_000_000
wall_clock_ns       =  wall_clock_seconds × 1_000_000_000
                       + wall_clock_ns_part
                    =  1_778_058_152_481_000_000

origin              =  1_778_058_000_000_000_000

t_relative          =  wall_clock_ns − origin
                    =  152_481_000_000           ;; ns since Genesis

All operands are i64; no f64 step appears anywhere. The result 152_481_000_000 is then encoded to 0000002380936a40 and the query proceeds identically to the Genesis-relative case.
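
The conversion above can be sketched end to end as follows. This is a minimal sketch under stated assumptions: it accepts only UTC timestamps written with a trailing "Z" and at most nine fractional digits, and the function names are hypothetical. The origin is derived from the recording's start instant rather than hard-coded.

```python
import calendar
import re
import time

_RFC3339_UTC = re.compile(
    r"([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2})"
    r"(?:\.([0-9]{1,9}))?Z"
)


def rfc3339_utc_to_unix_ns(text: str) -> int:
    """Parse a UTC RFC 3339 timestamp to integer Unix-ns without floats."""
    match = _RFC3339_UTC.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported timestamp: {text!r}")
    whole, fraction = match.groups()
    # calendar.timegm returns whole seconds since the Unix epoch as an int.
    seconds = calendar.timegm(time.strptime(whole, "%Y-%m-%dT%H:%M:%S"))
    # Right-pad the fraction to 9 digits so "481" becomes 481_000_000 ns.
    nanos = int((fraction or "").ljust(9, "0"))
    return seconds * 1_000_000_000 + nanos


def relative_anchor(wall_clock: str, origin_unix_ns: int) -> int:
    """Genesis-relative anchor for a wall-clock instant on an Anchored Timeline."""
    t = rfc3339_utc_to_unix_ns(wall_clock) - origin_unix_ns
    if t < 0:
        raise ValueError("instant precedes the Genesis origin")
    return t


origin = rfc3339_utc_to_unix_ns("2026-05-06T09:00:00.000000000Z")
t = relative_anchor("2026-05-06T09:02:32.481Z", origin)
segment = format(t, "016x")

print(t)        # 152481000000
print(segment)  # 0000002380936a40
```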

10. Billion-Scale Considerations

DreamDB's primary proof point is a billion-scale retrieval benchmark. Time-encoding choices at scale are not merely aesthetic — at 1B+ items, small per-anchor costs compound into GB-scale storage and bandwidth. This section gathers the time-encoding implications writers must understand to operate at that scale.

10.1 Per-anchor encoding cost

A Fragment-track Index Page leaf entry, sized concretely:

Field              Plain CBOR uint   If dreamdb.tag-wrapped
t_start            ~9 B              ~12 B
t_end              ~9 B              ~12 B
byte_size          ~5 B              ~5 B
fragment_address   ~37 B             ~37 B
Map overhead       ~5 B              ~5 B
Per entry          ~65 B             ~71 B

At 1B fragments → ~65 GB vs. ~71 GB total index storage. The 6 GB saved by encoding time anchors as plain CBOR ints inside schema-typed DreamDB fields (per §5) is real I/O and storage cost across millions of Index Pages. This is why §5 reserves the dreamdb.tag for ambiguous foreign-CBOR contexts and uses plain ints in the canonical DreamDB schema encoding.

A future deferred optimization (OQ-20): delta encoding of time anchors within Index Page leaves. Most fragments are sequentially ordered, so t_start_delta_from_page_t_min collapses to ~3-byte CBOR varints, saving an additional ~12 GB on a 1B-fragment track. Deferred to 0007 so the Index Page byte layout can pin the scheme alongside the rest of the Object format.

10.2 Bucket-duration sizing

At 1B items, the choice of bucket-duration determines bucket-Object count on the backend. Concrete table for video at 60 fps over 10 years:

bucket=   Total buckets per modality   Backend behavior
1ms       315B                         Fatal: list-prefix unusable, request fees lethal
100ms     3.2B                         Still unworkable
1s        315M                         Borderline; list-prefix slow
10s       32M                          Workable
60s       5.3M                         Comfortable
1h        88K                          Great for archival/cold queries
1d        3.7K                         Too coarse for typical latency targets

Writers SHOULD target bucket counts in the 10⁴ – 10⁷ range per modality. This keeps list-prefix latencies bounded, request fees tolerable, and Index Page traversals shallow.
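
The bucket counts in the table follow from one ceiling division per row. The sketch below reproduces them under the assumption that "10 years" means ten Julian years of 365.25 days; the names are illustrative.

```python
NS_PER_SECOND = 1_000_000_000
# Assumption: ten Julian years, each 365.25 days = 31_557_600 s.
TEN_YEARS_NS = 10 * 31_557_600 * NS_PER_SECOND

BUCKET_DURATIONS_NS = {
    "1ms": 1_000_000,
    "100ms": 100_000_000,
    "1s": 1_000_000_000,
    "10s": 10_000_000_000,
    "60s": 60_000_000_000,
    "1h": 3_600_000_000_000,
    "1d": 86_400_000_000_000,
}


def bucket_count(span_ns: int, bucket_duration_ns: int) -> int:
    """Number of buckets needed to cover span_ns (integer ceiling division)."""
    if span_ns < 0 or bucket_duration_ns <= 0:
        raise ValueError("span must be >= 0 and bucket duration must be > 0")
    return -(-span_ns // bucket_duration_ns)


counts = {
    label: bucket_count(TEN_YEARS_NS, duration)
    for label, duration in BUCKET_DURATIONS_NS.items()
}

for label, count in counts.items():
    print(f"{label:>6}  {count:,}")
```

The count depends only on the covered time span and the bucket duration, not on the frame rate: 60 fps changes how many Items land in each bucket, not how many buckets exist.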

10.3 Bucket-duration ≠ Item duration

A common error at scale: setting bucket=2s because Fragments are 2 s long. Result: every Fragment falls in its own bucket. The bucket structure adds zero aggregation, but the address segment overhead is paid in full.

The SHOULD from 0002 §6.3.1 (max(item-duration) ≤ bucket-duration) is a floor, not a target. At billion-scale, target 10–100× that ratio so each bucket holds tens to hundreds of Fragments. Example: 2 s Fragments with bucket=120s → ~60 Fragments per bucket → meaningful aggregation, sane bucket count.

10.4 Genesis is one cached fetch — even at billion-scale

The Genesis Object is small (~100 bytes) and fetched once per Timeline per session. Caching it in the SDK eliminates Genesis as a hot-path concern, regardless of how many Items exist on the Timeline. No scaling concern; mentioned here only to head off unnecessary worry.

10.5 Integer arithmetic discipline matters more at scale

§4.1's no-f64 rule has a sharper edge at billion-scale: a 250 ns f64 quantization error compounds across 1B comparisons, producing visible drift in interval-overlap tests and bucket-placement decisions. Implementations targeting the billion-scale benchmark MUST audit their time arithmetic paths for f64 contamination — there is zero margin for "this case is small enough not to matter."

11. Comparison and Ordering Rules

  • Two point anchors compare by their integer value.
  • Two interval anchors compare lexicographically on (t_start, t_end).
  • A point at t and an interval [a, b) overlap iff a ≤ t < b. Two intervals [a, b) and [c, d) overlap iff a < d ∧ c < b.
  • An "all of time" anchor (Constants) overlaps every other anchor on the same Timeline.

These rules are protocol-level; they govern how time-range queries (0006) and bucket-placement decisions (0002 §6.3.1) interpret anchors.
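
The rules above translate directly into integer comparisons. A minimal sketch, with intervals modelled as (t_start, t_end) tuples and hypothetical function names:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [t_start, t_end), both in ns


def point_overlaps_interval(t: int, interval: Interval) -> bool:
    """A point t and an interval [a, b) overlap iff a <= t < b."""
    a, b = interval
    return a <= t < b


def intervals_overlap(x: Interval, y: Interval) -> bool:
    """[a, b) and [c, d) overlap iff a < d and c < b."""
    a, b = x
    c, d = y
    return a < d and c < b


def sort_intervals(intervals: List[Interval]) -> List[Interval]:
    """Order intervals lexicographically on (t_start, t_end).

    Python compares tuples element by element, which is the same ordering.
    """
    return sorted(intervals)


item = (59_900_000_000, 60_100_000_000)      # the Item from §9.3
bucket_1 = (60_000_000_000, 120_000_000_000)  # time range of bucket 1 at 60 s

print(intervals_overlap(item, bucket_1))      # True
print(intervals_overlap((0, 10), (10, 20)))   # False: half-open ends do not touch
print(point_overlaps_interval(10, (0, 10)))   # False: t_end is exclusive
```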

12. Out of Scope for this Document

  • Wall-clock-skew handling. If a writer's system clock is incorrect, items may be tagged with anchors that don't match wall-clock reality. DreamDB does not detect or correct this. The Genesis origin field documents the writer's claim; readers may compare it against trusted sources externally.
  • Leap seconds. Unix time nominally does not count leap seconds; whether a leap second appears as a step or is smeared depends on the writer's NTP setup. DreamDB inherits whatever the writer's clock provided. If a stricter time domain is needed, a future spec MAY add a "tai" or "utc-no-leap" origin mode (see the reserved future tags in §2).
  • Calendar units. Bucket-duration suffixes m, h, d are constant SI multiples of seconds — not "calendar month" or "civil day with daylight saving." Calendar arithmetic is an application concern.
  • Multi-Timeline alignment. Per 0001 §11, expressing "two Timelines describe the same physical event" requires a higher-level alignment artifact. Out of scope for v0.

13. Open Questions Surfaced by This Document

  • OQ-17 (→ 0009 §3.2): Concrete CBOR tag value for time anchors. Resolved alongside OQ-11: single tag dreamdb.tag = 65521 for foreign-CBOR contexts; NOT used inside DreamDB schemas (where time anchors are plain CBOR uints).
  • OQ-18 (→ future spec): Should v0.1 add coarser resolutions ("ms", "s")? Currently the answer is "no — ns is enough"; revisit if measurements show storage waste from large ns integers in slow-event modalities.
  • OQ-19 (→ future spec): Should "before-Genesis" (negative) anchors be supported via a bias scheme? Currently out of scope for v0; revisit if real workloads need it.
  • OQ-20 (→ 0007 §7.3): Index Page time-anchor delta encoding. Resolved: per-leaf-entry deltas (t_start - t_min, duration); full u64 anchors at page header (t_min, t_max).
  • OQ-21 (→ 0007 §7.4): Byte-size delta encoding. Resolved as "no in v0": byte_size stored as-is. Encoding complexity not justified relative to time-anchor delta savings; v0.1 may revisit.

Next: 0004-spatial-indexing.md — the hardest doc. Defines the spatial-key derivation algorithm (LSH? PQ? Learned hashing?) that turns a vector into an N-bit string for the spatial-key segment, with quantified locality guarantees that make the 1B-scale benchmark feasible.