DreamDBv0.2.0bec026

Spec 0020 — Tombstones (Deletion Primitive)

Status: Draft (Phase 5). Depends on: spec/0001, spec/0002, spec/0006, spec/0008. Motivation: DreamDB is structurally append-only. The protocol is content-addressed and immutable, which is exactly why it scales — but it also means that "delete this record" cannot mean "rewrite history." For GDPR Right-to-Erasure, individual record retraction, and any other workflow that must hide prior writes without rewriting them, the protocol provides a tombstone primitive: an immutable Object that names anchors which subsequent reads MUST skip. Compaction of the underlying bytes is a separate operator-driven step (see §6).


1. Purpose

DreamDB's append-only data plane means every record ever written is, by construction, recoverable from any Manifest that referenced it. This is correct from a content-addressing perspective (the bytes are still in S3) and load-bearing for time-travel queries. It is also, on its own, a compliance hazard.

This spec defines:

  • The TombstoneListObject — an immutable Object that lists anchors to suppress on subsequent reads.
  • The Manifest registry entry dreamdb.tombstones — opt-in, pointing to the latest TombstoneListObject for a Space.
  • Read-side semantics: how iter and query verbs MUST consult the tombstone set.
  • Chain-aware lineage: how a deletion extends prior tombstones without rewriting them, via parents: Vec<Multihash>.
  • Compaction: how the underlying bucket bytes are eventually reclaimed.

What this spec does NOT define:

  • Authentication / authorization. Who is allowed to publish a tombstone is operator-policy.
  • Per-record (sub-anchor) deletion. Tombstones target Item-level anchors; a single anchor's records across all modalities are suppressed together. Field-level redaction is out of scope.
  • Retroactive deletion of manifests. Old Manifests that referenced the now-tombstoned records still exist and remain time-travel-queryable; the tombstone filter applies on the current path. Operators wanting full retroactive erasure must additionally GC old Manifests beyond keep_manifests and re-publish.

2. Threat model

Tombstones are suppress on the read path, not "secure erase." A determined attacker with backend credentials can still find the underlying bytes by walking the bucket directly. For environments requiring cryptographic deletion, use spec/0019 encryption with per-record keys and shred the keys at delete time. Tombstones complement, not replace, that mechanism.

3. TombstoneListObject

3.1 Structure

TombstoneListObject := {
    "kind":       "dreamdb.tombstone-list.v1",   // string, exact
    "anchors":    [ TombstoneEntry, ... ],      // array; sorted ascending by anchor value
    "parents":    [ Multihash, ... ],           // array of prior TombstoneListObject hashes
    "issued_at":  u64,                          // ms since epoch; MAY be 0 (unsigned)
}

TombstoneEntry := {
    "anchor":     u64,                          // Item-level TimeAnchor (same key as SpatialBucket records)
    "deleted_at": u64,                          // ms since epoch
    "reason":     ?Text,                        // optional short tag (operator-defined; e.g. "gdpr", "test", "abuse")
}

The anchor field is the 64-bit TimeAnchor of the Item to suppress. This is the same time_anchor used in SpatialBucket records and across modality joins. A single anchor suppresses every record bound to it across every Track / modality — anchor-level (not field-level) is the right granularity for GDPR-style deletion.

The CBOR encoding is canonical per spec/0002 §4.1 (sorted map keys, definite-length, smallest encoding). The Object's address is the BLAKE3-256 multihash of its canonical CBOR bytes.

Path: tombstones/<base32-multihash> (new top-level prefix; see spec/0002 §7.5 path table).

3.2 Anchor sort invariant

anchors MUST be sorted ascending by raw u64 anchor value. This makes set-membership checks O(log N) via binary search and gives content-addressed canonicalization (the same set of anchors → the same bytes → the same hash).

Decoders MUST reject TombstoneListObjects with unsorted or duplicate anchors as malformed.

3.3 Parents

parents references prior TombstoneListObjects whose anchors are also suppressed. The effective tombstone set is the transitive union of anchors over all ancestors.

The list MAY have multiple parents to support multi-writer merges (cf. SpatialIndex multi-parent merges from spec/0008). Cycles are forbidden: parents MUST form a DAG.

A reader resolving the effective set walks parents breadth-first to a configurable depth limit (RECOMMENDED 100). If the limit is reached before the chain terminates, the reader MUST abort with a clear error — partial tombstone application is unsafe.

The list MAY be empty (initial deletion in a Space).

3.4 Issued-at

The issued_at field is operator metadata only. It is NEVER consulted for correctness. Readers MAY use it to surface "tombstone N records since T" observability. Two TombstoneListObjects with identical anchors and parents but different issued_at are distinct Objects (different content → different hash).

Setting issued_at: 0 is permitted and discouraged outside of test fixtures.

4. Manifest registry entry

A Space opts in to tombstones by adding a dreamdb.tombstones entry to its Manifest's registry map:

manifest.registry["dreamdb.tombstones"] = { "head": <multihash-of-latest-TombstoneListObject> }

The entry value is a CBOR map with one required key:

  • head: 33-byte Multihash of the latest TombstoneListObject for this Space.

Manifests without dreamdb.tombstones have no tombstones (the set is empty). This is the default; adding tombstones is purely additive.

The entry is per-Space, not per-modality. Tombstones target Item anchors; a single tombstone suppresses every record bound to that anchor across every modality. This matches the GDPR semantics: erasing a person erases their data, not a specific column.

5. Read-side semantics

Every read verb — iter, iter_stream, query, time-range, nearest — MUST consult the effective tombstone set before yielding records.

5.1 Resolution

On first read against a Manifest:

  1. Look up registry["dreamdb.tombstones"].head. If absent: effective set is empty; proceed.
  2. Fetch the named TombstoneListObject.
  3. Walk parents breadth-first to a depth limit (RECOMMENDED 100).
  4. Accumulate the union of anchors arrays into an in-memory BTreeSet<Multihash>.
  5. Cache the set keyed by the head multihash (it is content-addressed, hence safe to memoize indefinitely for that head).

5.2 Filtering

For each candidate record (anchor, record_bytes):

if tombstone_set.contains(record.anchor) {
    skip;
} else {
    yield (anchor, record_bytes);
}

Implementations MAY apply the filter at the bucket-decode boundary (records-per-bucket already requires per-record work; the additional cost is one BTreeSet hit per record).

5.3 Counting

Verbs that report counts (e.g., Dataset::count_records) MUST exclude tombstoned records from the count. The pre-tombstone count is recoverable by reading the Track Object's object_index total and subtracting tombstone_set.len() (bounded above; exact only if every anchor exists in the dataset).

6. Compaction

A tombstone is a hint to the read path; the underlying record bytes still occupy storage. Operator-driven compaction reclaims the bytes.

6.1 Compaction trigger

When the ratio of tombstoned records to total records in a bucket exceeds a threshold (e.g., 10%), the bucket is a compaction candidate.

6.2 Compaction process

A future dreamdb-cli compact-tombstones --bucket <addr> verb (deferred to spec/0020.1):

  1. Fetches the bucket.
  2. Decodes records.
  3. Filters out tombstoned anchors.
  4. If <50% of records remain: marks the bucket for deletion and re-distributes survivors to adjacent buckets via the spec/0004 dispatch path.
  5. Otherwise: writes a new compact bucket without the tombstoned records.
  6. Publishes a new SpatialIndex with parents: [old_si_hash].
  7. Optionally: emits a new TombstoneListObject with parents: [old_head] and empty anchors, signaling "compaction caught up to head N." This is operator-policy.

Once a bucket's tombstoned records have been compacted out AND no Manifest retained by keep_manifests references the old bucket, dreamdb-cli gc reclaims the old bucket's bytes.

6.3 Tombstones survive compaction

Compaction does NOT remove entries from the TombstoneListObject chain. Even after the underlying records are reclaimed, the tombstone entries remain — necessary for time-travel correctness against older Manifests that still reference the original buckets.

A bounded-history variant (truncating very old tombstone entries after, say, keep_manifests × bucket_compaction_interval) is deferred to a future revision.

7. SDK API (informative)

Reference implementations SHOULD expose:

rust
impl Dataset {
    /// Tombstone the named anchors. Single round-trip.
    /// Returns the new Manifest hash.
    pub async fn delete(&mut self, anchors: &[u64], reason: Option<&str>)
        -> Result<Multihash, Error>;

    /// Resolve the effective tombstone set at this Dataset's current Manifest.
    pub async fn tombstone_set(&self) -> Result<BTreeSet<u64>, Error>;
}

Read verbs MUST consult tombstone_set automatically; callers do NOT pass it through.

8. Conformance

A conforming implementation:

  1. Decodes/encodes TombstoneListObject per §3 with full canonical-CBOR sort invariants.
  2. Walks parents to a depth limit, accumulates the set, caches by head.
  3. Filters all read verbs by the effective set.
  4. Returns deterministic results across writers running on different machines.
  5. (Optional) Supports dreamdb-cli compact-tombstones; absence is permitted in v0 implementations.

A conforming reader that encounters a dreamdb.tombstones registry entry but does NOT know how to walk it MUST refuse to serve reads (fail-closed): silently ignoring the entry would surface deleted records to callers, violating the contract.

9. Versioning

This spec is v0. Future revisions may:

  • Define paged TombstoneListObjects (B-tree pages for >100K anchors).
  • Add sub-anchor field-level tombstones.
  • Define merge semantics for multi-writer tombstones (analogous to spec/0008 SpatialIndex merge rules).
  • Add compatible_with_manifest_versions to constrain when a tombstone applies (for time-travel-aware deletion).

All extensions MUST preserve §3's canonical CBOR and content-addressing.