DreamDB v0.2.0 bec026

DreamDB Specification — 0005: Backend Interface (HTTP Contract)

Status: Draft. Builds on 0000-overview.md, 0001-data-model.md, 0002-content-addressing.md, 0003-time-encoding.md, and 0004-spatial-indexing.md. This document fixes the HTTP semantic contract between the Storage Connector (in-SDK) and the Cloud Object Store (the backend). It resolves OQ-3 (consistency), OQ-22 (local-FS Connector), and OQ-29 (HTTP version requirement). It does not define a language-level interface — the contract is HTTP itself.


1. Purpose

Per 0000 §3, DreamDB's architecture has a clean HTTP boundary between the Storage Connector and the Cloud Object Store. This document specifies what that boundary requires:

  • The HTTP verbs the Connector uses and the backend MUST handle correctly.
  • The conditional headers that distinguish idempotent puts (content layer) from CAS updates (refs layer).
  • The status codes the backend MUST return and the Connector MUST interpret.
  • The consistency model the protocol assumes — strong read-after-write for content-addressed objects, linearizable CAS for refs, eventual consistency tolerated for prefix listings.
  • The HTTP version required to make the billion-scale hot path achievable.

What this document does not define:

  • A language-level trait or interface. The contract is HTTP semantics, not Rust/Go/Python types. (Per project_architecture.md.)
  • Backend-specific authentication. Connectors translate to whatever the backend uses (AWS SigV4, GCS HMAC, OAuth, presigned URLs).
  • Specific TLS configurations. The backend's HTTPS endpoint is whatever the operator runs; the Connector trusts the system's certificate authorities.

DreamDB defines neither the Application above the SDK nor the Cloud Object Store below the HTTP boundary. This document defines the HTTP boundary itself.

1.1 Auth is a v0 deployment gap (advisory)

v0 leans entirely on the backend's native authentication (AWS SigV4, GCS HMAC, OAuth, presigned URLs). This works for single-tenant deployments where one organization owns the backend. It does not work for multi-tenant production scenarios where you want:

  • Per-Space write isolation — writers for Space X cannot write to Space Y's paths even if they share a bucket.
  • Per-modality access control — sensitive modalities (PII, regulated data) gated separately from public modalities.
  • Per-writer audit logging — backend access logs name the auth identity, but DreamDB doesn't tie that to writer-identity in Manifests.
  • Capability tokens — short-lived, scoped tokens for browser-based clients or ephemeral workers.

Cloud object stores authenticate at the bucket level, not at path-prefix granularity (S3 IAM policies can simulate prefix-scoped writes via condition keys, but the rules are fragile and hard to audit). A writer with bucket-level write access can write to any DreamDB path on that bucket.

v0 production deployments MUST address this out-of-band, typically by:

  • One bucket per Space (the simplest isolation; works at small Space counts).
  • A backend gateway that mediates writes (per-tenant auth at the gateway, then forwards to backend with privileged credentials).
  • IAM policy with prefix-condition rules (works on AWS S3 with care; brittle).

v0.1+ MAY define a "DreamDB Auth Layer" spec that addresses these concerns at the protocol level — likely involving capability tokens embedded in Manifests, signed writer identities, and per-Space access policies. Until then, treat auth as the operator's responsibility and document the failure mode prominently in deployment guides.

2. Two Conformance Tiers

A backend may implement one or both of:

| Tier | Required for… | Mandatory? |
|---|---|---|
| ContentStore-Conformant | Hash-addressed Spaces (the data plane) | Yes |
| RefStore-Conformant | Named, advanceable manifest pointers | No |

Every DreamDB backend MUST be at least ContentStore-Conformant. Every ref-supporting backend MUST also be RefStore-Conformant. A backend that supports only the ContentStore tier hosts hash-addressed Spaces only — clients address Manifests by their hash, with manifest distribution handled out-of-band (e.g. shared via another channel).

This split is a direct consequence of 0000 §5.2 (Content layer immutable, Refs layer optionally CAS-mutable). A backend without conditional-write primitives can host the data plane but not the refs plane.

3. ContentStore Tier (Required)

The ContentStore tier provides idempotent put-by-hash, range reads, and prefix listings over a flat path namespace.

3.1 URL path namespace

The Connector issues HTTP requests against URLs of the form:

<base-url> / <dreamdb-path>

where <base-url> is backend-specific (e.g. https://my-bucket.s3.amazonaws.com/) and <dreamdb-path> is the DreamDB address-grammar path defined in 0002:

<timeline-id> / <modality-tag> / <spatiotemporal-key> / <content-hash>
<timeline-id> / <modality-tag> / track / <content-hash>
<timeline-id> / <modality-tag> / index / <content-hash>
manifests / <content-hash>
manifests / index / <content-hash>
genesis / <content-hash>
spatial-index / <content-hash>
refs / <ref-name>

Connectors MUST pass the <dreamdb-path> to the backend exactly as constructed by the DreamDB Protocol. No path rewriting, prefix injection, or escaping is permitted (paths are already URL-safe per 0002 §8 — base32 hashes, base2 spatial keys, hex time anchors).

3.2 PUT — idempotent put-by-hash

To write a content-addressed Object:

http
PUT /<dreamdb-path> HTTP/2
Host: <backend>
Content-Type: application/octet-stream
Content-Length: <N>

<N bytes of CBOR or raw payload>

Required backend semantics:

  • If the path does not exist, the bytes are stored. Backend returns 200 OK or 201 Created.
  • If the path already exists with the same bytes (which the content-addressing guarantees, since the path is the hash), the backend MAY: (a) return 200 OK treating it as a no-op; or (b) return 412 Precondition Failed if the Connector sent If-None-Match: *. The Connector treats both as success.
  • If the path already exists with different bytes, the situation is impossible by construction (BLAKE3 collision, ~2⁻¹²⁸). Backends that nonetheless detect this MUST treat it as a server error (5xx).

The Connector MAY use If-None-Match: * to make the put strictly create-only and avoid the implicit-no-op case:

http
PUT /<dreamdb-path> HTTP/2
If-None-Match: *

412 Precondition Failed returned by If-None-Match: * indicates the object already exists — equivalent to a successful no-op for DreamDB's purposes (the content-addressed bytes are already there). Connectors MUST treat 412 in this case as success, not as failure.

3.3 GET — full-object or range read

To read a content-addressed Object in full:

http
GET /<dreamdb-path> HTTP/2
Host: <backend>

Returns 200 OK with the bytes in the body, or 404 Not Found if the object does not exist (e.g. referenced by a manifest but not yet propagated, or never written).

For a partial read (the hot path for Fragments and Buckets per 0004 §7.3):

http
GET /<dreamdb-path> HTTP/2
Host: <backend>
Range: bytes=<start>-<end>

Returns 206 Partial Content with the requested byte range, or 416 Range Not Satisfiable if the range is invalid.

The <start>-<end> form follows HTTP Range semantics — <end> is inclusive (HTTP convention), differing from DreamDB's internal [start, end) half-open intervals (per 0002 §6.5). Connectors MUST translate: DreamDB bytes:<a>-<b> (half-open) → HTTP Range: bytes=<a>-<b−1> (inclusive).
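The translation above is a one-line off-by-one, which makes it easy to get wrong. A minimal sketch follows; the function name `to_http_range` is hypothetical and not part of the spec.

```python
def to_http_range(start: int, end: int) -> str:
    """Translate a DreamDB half-open interval [start, end) into an
    HTTP Range header value, whose end offset is inclusive."""
    if start < 0 or end <= start:
        # An empty or negative interval has no valid HTTP Range form.
        raise ValueError(f"invalid half-open interval [{start}, {end})")
    return f"bytes={start}-{end - 1}"


# DreamDB bytes:3831296-3834368 covers 3072 bytes (one 768-dim f32 vector).
print(to_http_range(3831296, 3834368))
```

Rejecting empty intervals up front keeps the Connector from emitting a range that the backend would answer with 416.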

Backends MUST support arbitrary byte ranges; DreamDB makes no assumption about chunk-aligned reads. Every modern object store (S3, GCS, Azure Blob, OSS, MinIO) supports this.

3.4 HEAD — existence check & metadata

http
HEAD /<dreamdb-path> HTTP/2
Host: <backend>

Returns 200 OK with Content-Length (and optionally ETag, Last-Modified) if the object exists, or 404 Not Found. Connectors use HEAD for: existence checks before triggering a re-write, size discovery before deciding on range vs. full-GET, and cheap cache validation.

3.5 LIST — prefix-scoped listing

The list operation is the cold-start discovery primitive (per 0004 §7.3 and 0002 §6.3.1) and the admin-scan primitive (operator-level audits, GC walks). It is never on the steady-state query hot path — see §5.3.1 for the Manifest Supremacy doctrine that governs this.

Backends MUST support listing objects whose paths start with a given prefix. The exact protocol verb is backend-specific:

  • S3 / S3-compatible (MinIO, R2, B2): GET /?list-type=2&prefix=<prefix>&continuation-token=<token>
  • GCS: GET /storage/v1/b/<bucket>/o?prefix=<prefix>&pageToken=<token>
  • Azure Blob: GET /<container>?restype=container&comp=list&prefix=<prefix>&marker=<token>
  • Aliyun OSS: GET /?prefix=<prefix>&marker=<token> (S3-style)

Connectors MUST translate the protocol's logical "list by prefix" into the backend's specific list call, including pagination via continuation tokens. The backend MUST return:

  • A list of object keys (or full paths, depending on backend convention) matching the prefix.
  • For each entry: at minimum the key/path. SHOULD also return Content-Length and ETag to enable cache-friendly behavior.
  • A continuation token if the result is paginated.

3.5.1 Sort order

Backends MUST sort results in lexicographic byte order over UTF-8 (equivalent to memcmp over the raw byte sequences). All DreamDB addresses are pure ASCII (base32 hashes, base2 spatial keys, hex time anchors, lowercase modality-tag literals — see 0002 §5, §6, §8), so this is unambiguous. Connectors MUST NOT apply any locale-specific collation, Unicode normalization, or case folding to listed keys. Any backend that delivers non-byte-ordered listings is non-conformant; in practice every backend in §10's matrix conforms.
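A conformance check for this ordering can compare the UTF-8 encodings of adjacent keys directly. The sketch below is illustrative only; `is_byte_ordered` is a hypothetical helper name, and the sample keys are made up.

```python
def is_byte_ordered(keys):
    """Return True if keys are in non-decreasing lexicographic byte order,
    i.e. the order memcmp would give over their UTF-8 encodings."""
    encoded = [k.encode("utf-8") for k in keys]
    return all(a <= b for a, b in zip(encoded, encoded[1:]))


# Byte order puts "10" before "2" and uppercase before lowercase;
# a locale-aware collation would typically disagree on both.
print(is_byte_ordered(["a/10", "a/2", "a/B", "a/a"]))
print(is_byte_ordered(["a/a", "a/B"]))
```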

3.5.2 Pagination correctness

The 1000-object default page boundary on S3 (and most S3-compatible backends) is the most common silent-failure surface for cold-start LIST. Connectors MUST follow a strict discipline:

  • Iterate to exhaustion. A Connector that returns "the first page" of a list-prefix call to the Protocol layer is buggy. The Connector MUST iterate continuation tokens until the backend signals end-of-list (IsTruncated=false for S3, absent nextPageToken for GCS, empty NextMarker for Azure, etc.) before returning to the Protocol component.

  • Pagination is per-request, not per-session. Continuation tokens are bound to the request that generated them. Connectors MUST NOT cache continuation tokens across separate list-prefix invocations.

  • Backend-specific exhaustion signals. Each backend uses a different "no more pages" signal. Connectors MUST translate them all to a uniform "exhausted" return to the Protocol layer:

    | Backend | Truncation signal |
    |---|---|
    | S3 / MinIO | <IsTruncated>true</IsTruncated> + <NextContinuationToken> |
    | GCS | nextPageToken field present in JSON response |
    | Azure Blob | <NextMarker>...</NextMarker> non-empty |
    | Aliyun OSS | <IsTruncated>true</IsTruncated> + <NextMarker> |

  • Cross-page boundary cases. The 1000th object boundary is a known correctness hazard. Connectors MUST handle the case where a single logical list-prefix result spans K * 1000 + r entries for any K, r ≥ 0, including r = 0 (exact multiple of 1000) and r = 1 (one entry on a fresh page). Conformance test vectors covering these cases are required (OQ-30).

  • Verify against Manifest where possible. When the SDK has a Manifest available, the DreamDB Protocol layer SHOULD verify list-prefix results against the Manifest's known bucket-list as a cheap sanity check — silently-truncated listings surface as missing entries, easy to detect and re-list.

Page size: backends typically return up to 1000 entries per page (S3 default; GCS configurable up to 1000; Azure up to 5000; OSS up to 1000). Connectors MAY pass the backend's max page size to reduce round trips but MUST NOT rely on a specific page size for correctness — backends sometimes return fewer entries per page than requested under load.
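The iterate-to-exhaustion rule can be sketched independently of any one backend. In the sketch below, `fetch_page` is an assumed callable that wraps the backend-specific list call and returns `(keys, next_token)`, with `next_token` set to `None` once the backend signals end-of-list. `make_fake_backend` is a hypothetical in-memory stand-in used only to exercise the page boundaries.

```python
from typing import Callable, List, Optional, Tuple

# Assumed shape of one list call: (prefix, token) -> (keys, next_token).
FetchPage = Callable[[str, Optional[str]], Tuple[List[str], Optional[str]]]


def list_prefix(fetch_page: FetchPage, prefix: str) -> List[str]:
    """Follow continuation tokens until the backend reports exhaustion."""
    keys: List[str] = []
    token: Optional[str] = None  # tokens never outlive this invocation
    while True:
        page, token = fetch_page(prefix, token)
        keys.extend(page)  # a page may be short or even empty
        if token is None:
            return keys


def make_fake_backend(all_keys: List[str], page_size: int = 1000) -> FetchPage:
    """In-memory backend that pages in byte order, page_size keys at a time."""
    ordered = sorted(all_keys, key=lambda k: k.encode("utf-8"))

    def fetch_page(prefix: str, token: Optional[str]):
        matching = [k for k in ordered if k.startswith(prefix)]
        start = int(token) if token is not None else 0
        end = start + page_size
        next_token = str(end) if end < len(matching) else None
        return matching[start:end], next_token

    return fetch_page


keys = [f"p/{i:05d}" for i in range(2001)]
print(len(list_prefix(make_fake_backend(keys), "p/")))
```

The loop terminates on the token alone, never on the page length, so it does not depend on a particular page size.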

3.6 DELETE — optional, garbage collection only

http
DELETE /<dreamdb-path> HTTP/2
Host: <backend>

DELETE is optional at the DreamDB protocol level. The protocol has no delete verb (per 0000 §5.2 — corrections are layered, never destructive). DELETE exists only for garbage collection: an operator may choose to drop old, orphaned content-addressed objects that no live manifest references.

Returns 200 OK or 204 No Content on success, 404 Not Found if absent (treat as success), 403 Forbidden if the operator's policy prohibits.

A backend that disables DELETE entirely is still ContentStore-Conformant — DreamDB's content layer is append-only by design. GC is an operator concern, not a protocol concern.

4. RefStore Tier (Optional)

The RefStore tier adds conditional-write semantics for a small mutable namespace at refs/<ref-name>. Per 0000 §5.2, this is the only mutable thing in DreamDB.

4.1 Initial create — If-None-Match: *

To create a ref that does not yet exist:

http
PUT /refs/<ref-name> HTTP/2
Host: <backend>
If-None-Match: *
Content-Type: application/octet-stream
Content-Length: 33

<33-byte multihash of the manifest>

Returns:

  • 200 OK or 201 Created — ref now points to the manifest.
  • 412 Precondition Failed — ref already exists; Connector treats this as a CAS conflict and retries with §4.2 semantics.

4.2 Update — If-Match: <etag>

To advance an existing ref to a new manifest:

http
GET /refs/<ref-name> HTTP/2
Host: <backend>

Returns the current value plus the backend's strong validator (commonly ETag, sometimes x-goog-generation for GCS). Then:

http
PUT /refs/<ref-name> HTTP/2
Host: <backend>
If-Match: <etag-from-prior-GET>
Content-Type: application/octet-stream
Content-Length: 33

<33-byte multihash of the new manifest>

Returns:

  • 200 OK or 204 No Content — ref advanced atomically.
  • 412 Precondition Failed — another writer advanced the ref between the GET and PUT. Connector retries with the new state.

This is optimistic concurrency control, not cooperative locking. No writer ever blocks; conflicts are detected and retried.
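The read, conditional write, and retry cycle of §4.1 and §4.2 can be sketched against an in-memory stand-in. Everything named here (`FakeRefStore`, `advance_ref`, the retry budget of 5) is hypothetical; a real Connector would issue the HTTP requests shown above, and the retry loop would live in the Protocol layer.

```python
import itertools


class FakeRefStore:
    """In-memory stand-in for a single ref on a RefStore-Conformant backend."""

    def __init__(self):
        self._value = None
        self._etag = None
        self._gen = itertools.count(1)

    def get(self):
        """Return (status, value, etag), mirroring GET /refs/<ref-name>."""
        if self._value is None:
            return 404, None, None
        return 200, self._value, self._etag

    def put(self, value, if_match=None, if_none_match=None):
        """Conditional PUT; returns an HTTP-style status code."""
        if if_none_match == "*" and self._value is not None:
            return 412
        if if_match is not None and if_match != self._etag:
            return 412
        created = self._value is None
        self._value = value
        self._etag = f'"{next(self._gen)}"'  # fresh validator on every write
        return 201 if created else 200


def advance_ref(store, compute_next, max_attempts=5):
    """Optimistic CAS loop: re-read and retry whenever a 412 comes back."""
    for _ in range(max_attempts):
        status, current, etag = store.get()
        new_value = compute_next(current)
        if status == 404:
            result = store.put(new_value, if_none_match="*")
        else:
            result = store.put(new_value, if_match=etag)
        if result in (200, 201, 204):
            return new_value
        if result != 412:
            raise RuntimeError(f"unexpected status {result}")
        # 412: another writer advanced the ref first; loop with fresh state.
    raise RuntimeError("CAS retry budget exhausted")


store = FakeRefStore()
print(advance_ref(store, lambda current: b"manifest-1"))
```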

4.3 Resolve — GET /refs/<ref-name>

http
GET /refs/<ref-name> HTTP/2
Host: <backend>

Returns 200 OK with the 33-byte multihash body, or 404 Not Found. SHOULD include ETag so the Connector can use it for a subsequent CAS update without an extra HEAD.

4.4 List refs — optional

http
GET /refs/?prefix=<ref-name-prefix>

Same semantics as §3.5. Useful for "list all branches" or "list all release tags." Optional even within RefStore-Conformant backends.

4.5 Backend-specific conditional-write headers

The HTTP semantics described above use the IETF-standard If-Match / If-None-Match headers. Some backends use proprietary equivalents:

| Backend | "Create only if absent" | "Update only if matching" |
|---|---|---|
| S3 (post-2024) | If-None-Match: * | If-Match: <etag> |
| MinIO | If-None-Match: * | If-Match: <etag> |
| GCS | x-goog-if-generation-match: 0 | x-goog-if-generation-match: <gen> |
| Azure Blob | If-None-Match: * | If-Match: <etag> |
| Aliyun OSS | x-oss-forbid-overwrite: true (create-only) | If-Match: <etag> |

Connectors MUST translate the standard headers above into whichever flavor the configured backend understands. The DreamDB Protocol component above the Connector emits the standard form.
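The translation can be sketched as a small lookup that follows the table above row by row. The function name, the backend identifiers (`"s3"`, `"minio"`, `"gcs"`, `"azure"`, `"oss"`), and the single `validator` argument are assumptions of this sketch; for GCS the validator is the generation number rather than an ETag.

```python
KNOWN_BACKENDS = {"s3", "minio", "gcs", "azure", "oss"}


def conditional_headers(backend: str, create_only: bool, validator: str = "") -> dict:
    """Map the standard conditional-write intent onto backend-specific headers."""
    if backend not in KNOWN_BACKENDS:
        raise ValueError(f"unknown backend: {backend}")
    if create_only:
        if backend == "gcs":
            return {"x-goog-if-generation-match": "0"}
        if backend == "oss":
            return {"x-oss-forbid-overwrite": "true"}
        return {"If-None-Match": "*"}
    if not validator:
        raise ValueError("an update needs the validator from the prior GET")
    if backend == "gcs":
        return {"x-goog-if-generation-match": validator}
    return {"If-Match": validator}


print(conditional_headers("gcs", create_only=True))
print(conditional_headers("s3", create_only=False, validator='"abc123"'))
```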

5. Consistency Requirements (resolves OQ-3)

5.1 Content-addressed objects: strong read-after-write

After a successful PUT of a content-addressed object (Genesis, Manifest, Track, Index Page, Bucket Object, Fragment, etc.), any subsequent GET of the same path MUST return the bytes that were PUT. The backend MUST guarantee this strong read-after-write consistency for content-addressed Objects.

This is the modern S3 default (Amazon S3 went strongly consistent globally in December 2020), the GCS default, the Azure Blob default, the Aliyun OSS default, and the MinIO default. A backend that fails this guarantee is non-conformant.

5.2 Refs: linearizable CAS

A successful conditional PUT with If-Match: <etag> MUST be linearizable — no two writers can both succeed against the same ref with the same If-Match value. The backend's conditional-write primitive MUST satisfy this; all backends listed in §4.5 do.

5.3 Prefix listings: eventual consistency tolerated

LIST /<prefix> MAY lag recent writes. A newly-PUT object MAY not appear in a subsequent list-prefix call for some short window (typically milliseconds, occasionally up to seconds on geo-replicated backends; up to ~1 minute observed on multi-region S3 buckets under transient replication lag).

The classic failure mode: a Pipeline PUTs a Bucket Object → 200 OK → notifies the SDK → SDK does list-prefix → empty result set. The Bucket exists and is fetchable by direct GET against its full path, but the prefix listing has not yet propagated.

DreamDB tolerates this only because the protocol design renders list-prefix non-essential for steady-state operation. See §5.3.1.

5.3.1 Manifest Supremacy (doctrine)

When a Manifest is available, the SDK MUST resolve Object addresses through the Manifest — which is content-addressed and strongly consistent (§5.1) — never through list-prefix, which is eventually consistent.

This is the Manifest Supremacy doctrine. It is not a recommendation; it is a correctness requirement for any conformant SDK.

The doctrine has three concrete consequences:

  1. The query hot path uses the cached object_index (from the Manifest), not list-prefix. Per 0004 §7.3 and 0002 §7.3.2, every Track Object's object_index enumerates the addresses of its constituent Bucket / Fragment / Index-Page Objects. Once the SDK has the Manifest, every subsequent fetch is a direct GET against a known, content-addressed path — strongly consistent.
  2. list-prefix is reserved for two cases, and only those:
    • Cold-start bootstrap — an SDK with no Manifest at all needs list-prefix to discover what's on the backend.
    • Administrative scans — operator audits (orphan-object detection, GC walks, replication checks) genuinely need to enumerate the backend's contents. These are out-of-band of the query path.
  3. Any code path that uses list-prefix during steady-state query operation is a bug. Reviewers must reject SDK implementations that re-list to discover newly-written Bucket Objects. New writes are discovered by reading the new Manifest (which references them by hash) and then direct-GETting via the resolved addresses.

Why this works: the writer's PUT-then-publish-Manifest sequence guarantees that if a reader has the new Manifest, then every Object the Manifest references is available to direct GET (§5.1). list-prefix lag is irrelevant — the reader never depends on it for visibility of recent writes.

Standard write sequence that producers must follow to make this work:

  1. PUT all leaf Objects (Buckets, Fragments, Index Pages, etc.).
  2. PUT all parent Objects (Track Objects, Manifest Index Pages).
  3. PUT the Manifest last.
  4. (RefStore-Conformant backends) advance the ref to the new Manifest hash via If-Match (§4.2).

A reader that resolves the new Manifest is guaranteed to find its dependencies live (§5.1). A reader on the previous Manifest is unaffected. Manifest Supremacy makes the eventual-consistency window of list-prefix invisible to steady-state queries.
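The write sequence above can be sketched as a function whose only job is to enforce the ordering. The `put` and `advance_ref` callables, and the path strings in the usage lines, are placeholders assumed for illustration; they stand in for the Connector's PUT (§3.2) and the ref CAS (§4.2).

```python
def publish(put, leaves, parents, manifest, advance_ref=None, ref_name=None):
    """PUT leaves, then parents, then the Manifest, then advance the ref.

    Each of leaves/parents is a list of (path, body) pairs; manifest is one
    (path, body) pair. If any put raises, later steps never run, so a
    Manifest is never published ahead of its dependencies.
    """
    for path, body in leaves:
        put(path, body)
    for path, body in parents:
        put(path, body)
    manifest_path, manifest_body = manifest
    put(manifest_path, manifest_body)  # the Manifest always goes last
    if advance_ref is not None and ref_name is not None:
        advance_ref(ref_name, manifest_path)


log = []
publish(
    put=lambda path, body: log.append(("PUT", path)),
    leaves=[("leaf-1", b"..."), ("leaf-2", b"...")],
    parents=[("track-1", b"...")],
    manifest=("manifest-1", b"..."),
    advance_ref=lambda name, target: log.append(("REF", name, target)),
    ref_name="main",
)
print(log)
```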

A Connector MUST NOT depend on list-prefix returning the most-recent-write. The DreamDB Protocol layer MUST NOT issue list-prefix calls during steady-state operation. Together, these constraints render the eventual-consistency window irrelevant — by construction, not by lucky timing.

5.4 What the protocol does NOT require

  • Cross-backend consistency. DreamDB makes no claims about replication between backends. The same Manifest hash on backends A and B is the same Manifest if both backends contain the closure of objects reachable from it.
  • Read-your-writes for list operations. See §5.3.
  • Atomicity across multiple PUTs. Each PUT is independently atomic; multi-object atomicity (e.g. "publish a Manifest and three new Track Objects together") is the writer's concern. The standard pattern is: PUT all leaf Objects → PUT index pages → PUT Track Objects → PUT Manifest last. A reader resolving the new Manifest finds its dependencies live; a reader on the old Manifest is unaffected.

6. HTTP Version Requirement (resolves OQ-29)

The Storage Connector MUST support HTTP/2 for the connection to the backend. HTTP/3 is permitted as an upgrade.

HTTP/1.1 is permitted only as a fallback when the backend does not advertise HTTP/2 support, AND the Connector logs a warning that performance will be degraded.

6.1 Why HTTP/2 is required

The hot path of a billion-scale query (per 0004 §7.3) issues ~16 parallel ranged GETs to fetch matching Bucket Objects. Under HTTP/1.1:

  • Each parallel GET requires its own TCP connection and TLS handshake (~30–80 ms each on cold connections).
  • Practical client connection-pool limits (typically 6–20 per host) serialize requests.
  • The hot path balloons from ~100 ms to ~500–1000 ms.

Under HTTP/2:

  • All 16 GETs share one TCP+TLS connection.
  • Streams are multiplexed at the protocol level, so there is no head-of-line blocking at the HTTP layer. TCP-level head-of-line blocking remains; HTTP/3 removes it by moving multiplexing onto QUIC over UDP.
  • Hot path stays at ~50–100 ms.

6.2 Backend support

Per the table below, the major cloud object stores have supported HTTP/2 since 2019 or earlier, and Cloudflare R2 since its 2022 launch:

| Backend | HTTP/2 since |
|---|---|
| AWS S3 | 2017 |
| GCS | 2018 |
| Azure Blob | 2019 |
| Aliyun OSS | 2019 |
| MinIO | 2018 (GA) |
| Cloudflare R2 | 2022 (launch) |

HTTP/3 (QUIC) is increasingly available; Connectors MAY prefer it when both peers support it. CDN-fronted deployments routinely terminate HTTP/3 at the edge.

6.3 Connector behavior

The Connector:

  • MUST initiate HTTP/2 via ALPN during the TLS handshake.
  • MUST share a single connection per backend host across concurrent DreamDB queries (i.e., don't open one connection per request — defeats multiplexing).
  • SHOULD use HTTP/3 if available.
  • MUST use HTTP/2's flow control to avoid head-of-line blocking (TCP head-of-line is not eliminated in HTTP/2 — HTTP/3 does that — but HTTP/2 still removes the application-layer blocking).
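The ALPN offer and the HTTP/1.1 fallback warning (§6) can be sketched with Python's standard `ssl` module. This shows only the negotiation step: the standard library does not itself speak HTTP/2, so a real Connector would hand the negotiated connection to an HTTP/2 implementation. The logger name and both function names are hypothetical.

```python
import logging
import ssl

log = logging.getLogger("dreamdb.connector")  # hypothetical logger name


def make_tls_context() -> ssl.SSLContext:
    """TLS context that trusts the system CAs and offers HTTP/2 first via ALPN."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    return ctx


def pick_http_version(negotiated):
    """Interpret the value returned by SSLSocket.selected_alpn_protocol()."""
    if negotiated == "h2":
        return "HTTP/2"
    # The backend did not select h2: fall back, but say so loudly.
    log.warning("backend did not negotiate HTTP/2; performance will be degraded")
    return "HTTP/1.1"


print(pick_http_version("h2"))
```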

7. Status Code Reference

A summary of status codes the Connector MUST handle:

| Code | Meaning at the DreamDB layer |
|---|---|
| 200 | Success (full GET, HEAD, ref GET, idempotent re-PUT) |
| 201 | Created (PUT of new content-addressed Object or new ref) |
| 204 | No Content (DELETE success, ref update without echo) |
| 206 | Partial Content (range GET success) |
| 304 | Not Modified (conditional GET; rare in DreamDB but Connector MUST handle) |
| 400 | Bad Request: malformed Connector code or path; treat as bug |
| 403 | Forbidden: auth failure or policy denial; Connector surfaces to caller |
| 404 | Not Found: Object absent (legitimate for cold reads, list-prefix gaps, garbage-collected) |
| 412 | Precondition Failed: If-Match / If-None-Match check rejected. Treat as success for ContentStore PUT (idempotent re-write); treat as CAS conflict for RefStore PUT (retry). |
| 416 | Range Not Satisfiable: Connector emitted invalid range; bug |
| 429 | Too Many Requests: backend rate-limit; Connector backs off and retries |
| 500 | Server error: Connector retries with backoff |
| 503 | Service Unavailable: Connector retries |
| 5xx | Other server errors: retry with exponential backoff up to a budget |

8. Connector Responsibilities

The Storage Connector is intentionally minimal — its only job is HTTP transport. But the v0 spec assigns it a small set of well-defined responsibilities:

8.1 Connection pooling and multiplexing

Maintain a single HTTP/2 (or HTTP/3) connection per (backend host, auth identity) pair. All concurrent DreamDB requests against the same host go through that connection's multiplexed streams. A connection-per-request implementation defeats §6.

8.2 Retry & timeout policy

  • 5xx and 429: exponential backoff, max ~3 retries by default. Configurable.
  • 412 on ContentStore PUT: treat as success (the bytes are there).
  • 412 on RefStore PUT: treat as CAS conflict, return to caller for retry with new state.
  • 404: surface to caller; do not retry.
  • Network errors (timeout, reset): retry with backoff.
  • Default request timeout: 30 s for GETs, 5 min for large PUTs (Manifest, Bucket, Fragment uploads).
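The policy above reduces to a small classification step plus a backoff schedule. The sketch below is one possible shape, not a prescribed API: the `Action` enum, the function names, and the base and cap delays are assumptions, and only the default of 3 retries comes from the list above.

```python
from enum import Enum


class Action(Enum):
    SUCCESS = "success"
    CAS_CONFLICT = "cas_conflict"  # hand back to the caller with new state
    RETRY = "retry"                # back off, then try again
    SURFACE = "surface"            # report to the caller, no retry


def classify(status: int, method: str, is_ref: bool) -> Action:
    """Map an HTTP status to the Connector action described in §8.2."""
    if 200 <= status < 300:
        return Action.SUCCESS
    if status == 412 and method == "PUT":
        # Same code, opposite meaning depending on the tier.
        return Action.CAS_CONFLICT if is_ref else Action.SUCCESS
    if status == 429 or 500 <= status < 600:
        return Action.RETRY
    return Action.SURFACE  # includes 404: surfaced, never retried


def backoff_delays(max_retries: int = 3, base: float = 0.5, cap: float = 8.0):
    """Exponential backoff schedule in seconds (base and cap are assumed)."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


print(classify(412, "PUT", is_ref=False), classify(412, "PUT", is_ref=True))
print(backoff_delays())
```

Production code would normally add jitter to the delays so that many clients do not retry in lockstep.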

8.3 Authentication translation

The Connector receives DreamDB-relative paths from the Protocol component above; it adds the backend's auth (signing, presigned URLs, OAuth tokens, etc.) before sending the request. The DreamDB Protocol component is auth-unaware.

8.4 No business logic

The Connector MUST NOT:

  • Parse DreamDB Object bytes (Manifests, Track Objects, Index Pages). These are opaque to the Connector.
  • Make decisions about which paths to fetch or in what order. That's the Protocol component's job.
  • Cache DreamDB Objects beyond what's needed to fulfill in-flight HTTP responses. Application-level caching belongs in the Protocol component.

The Connector is purely transport. Anything beyond transport is a bug at the Connector layer.

8.5 ETag handling and normalization

ETags are the most variably-formatted field in the HTTP contract — different backends quote them differently, occasionally use weak validators, and use semantically-different inner values (some are hashes, some are opaque generation tokens, some are length-of-bytes-of-hash-of-parts). Connectors are responsible for normalizing ETags so the DreamDB Protocol layer above sees a uniform representation, while preserving the exact bytes needed for round-trip CAS (If-Match) requests.

Two rules govern Connector ETag handling:

8.5.1 Opaque round-trip for If-Match

For ref-update CAS (§4.2), the Connector receives an ETag header on a GET /refs/<name> response and uses it on a subsequent PUT /refs/<name> with If-Match. The Connector MUST round-trip the ETag bytes exactly as received — same quoting, same weak/strong prefix, same content. The Connector MUST NOT interpret, hash-compare, or modify the value. This guarantees the backend's CAS check works correctly regardless of backend-specific ETag semantics.

GET /refs/main                       → ETag: "abc123def456"
PUT /refs/main  If-Match: "abc123def456"   ← exact bytes from GET response

8.5.2 Canonical form for Protocol-layer comparison

When the DreamDB Protocol layer needs to compare or cache ETags across requests (e.g., to detect that a ref has changed since the last GET), the Connector MUST present ETags in a canonical form with these normalizations applied:

  • Strip surrounding double-quotes. "abc123" → abc123.
  • Strip leading W/ (weak validator prefix). W/"abc123" → abc123.
  • No whitespace inside or outside the value. Trim if present (defensive; should not occur on conformant backends).
  • No Unicode normalization. ETags are ASCII per HTTP RFC 7232; any non-ASCII byte is a backend bug.

When the Protocol layer hands a canonical ETag back to the Connector for a CAS request, the Connector reapplies the appropriate quoting for the target backend (typically "<value>" for strong ETags). This is symmetric: opaque on the wire, canonical at the Protocol-layer boundary.
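One way to honor both rules at once is to keep the header bytes as received next to the canonical form, so that If-Match can reuse the former while comparisons use the latter. Pairing the two values is a design choice of this sketch, not something the text above mandates, and the `ETag` and `parse_etag` names are hypothetical.

```python
from typing import NamedTuple


class ETag(NamedTuple):
    raw: str        # header value exactly as received; reused for If-Match (§8.5.1)
    canonical: str  # normalized form for Protocol-layer comparison (§8.5.2)


def parse_etag(header_value: str) -> ETag:
    """Apply the §8.5.2 normalizations while preserving the original bytes."""
    value = header_value.strip()
    if value.startswith("W/"):
        value = value[2:]  # drop the weak-validator prefix
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]  # drop the surrounding double-quotes
    return ETag(raw=header_value, canonical=value.strip())


tag = parse_etag('W/"abc123"')
print(tag.canonical)
print(tag.raw)
```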

8.5.3 What ETags are NOT

  • Not content hashes. S3 returns hex-MD5 for single-part objects, hex-of-md5-of-parts plus -N suffix for multipart. GCS returns opaque generation tokens. Azure returns opaque tokens. DreamDB Connectors MUST NOT assume an ETag equals BLAKE3-of-bytes — even though the path is the BLAKE3 hash, the ETag is a separate backend-specific identifier.
  • Not stable across regions. Some replication setups change ETags during cross-region copy. Don't compare ETags across backends.
  • Not a substitute for * in If-None-Match. If-None-Match: "<some-etag>" checks "object's current ETag is not this value" — different from If-None-Match: * which checks "object does not exist." DreamDB uses * for create-only PUT (§3.2, §4.1); never use a specific ETag in If-None-Match.

Conformance test vectors (0009) MUST include round-trip cases across backends with different ETag conventions: quoted vs. unquoted, weak vs. strong, multipart-style with -N suffix.

9. Local-Filesystem Connector (resolves OQ-22)

v0 is HTTP-only. A file://-style direct-FS Connector is not in v0.

Reasoning:

  • One Connector contract, testable end-to-end with curl. Adds zero variance to the conformance story.
  • Local development uses MinIO (or any embedded-mode HTTP object store: localstack, fake-gcs-server, Azurite) running on localhost. MinIO starts in <1 s, runs in ~30 MB of memory, supports the full S3 API. No file:// Connector saves any meaningful developer ergonomics.
  • A file:// Connector that "implements the same semantics" inevitably drifts from HTTP semantics in subtle ways — handling of conditional headers, list pagination, range edge cases. Catching those drifts requires duplicating the conformance test suite. Not worth the maintenance burden for a dev convenience.
  • A future v0.1 spec MAY add a file:// Connector; nothing in v0 forecloses it. v0 just ships HTTP-only.

For local CI, the recommended setup is:

bash
docker run -p 9000:9000 minio/minio server /data

The Connector is configured with http://localhost:9000 as the backend; everything else is identical to production.

10. Backend Compatibility Matrix

OperationS3 / R2 / B2GCSAzure BlobAliyun OSSMinIONotes
PUT (idempotent put-by-hash)Universal
PUT with If-None-Match: *✓ (2024+)✓ (proprietary)S3 added in 2024; Connector handles flavor
PUT with If-Match: <etag>✓ (2024+)Same
GET (full-object)Universal
GET with Range: bytes=...Universal
HEADUniversal
LIST with prefixAPI surface differs; Connector translates
DELETEUniversal; optional at protocol level
HTTP/2All since 2017–2019
HTTP/3✓ (CloudFront)△ (preview)✓ (2024)Optional but preferred
Strong read-after-write (content)✓ (2020+)All major backends post-2020

△ = partial / region-dependent / preview availability.

The Connector is responsible for normalizing backend differences (proprietary headers, list-API shapes). The DreamDB Protocol layer above sees a uniform contract.

11. Worked HTTP Examples

Concrete request/response pairs for the four most common operations.

11.1 PUT a Manifest

http
PUT /manifests/blake3-1b3w4q9j5k6f7p8x9z2c4v6n8m1q3w5e7r9t1y3u5i7o9p1a3s5d7f9g1h3j5k HTTP/2
Host: dreamdb-bucket.s3.amazonaws.com
Authorization: AWS4-HMAC-SHA256 Credential=...
If-None-Match: *
Content-Type: application/cbor
Content-Length: 4231

<4231 bytes of deterministic-CBOR Manifest>

http
HTTP/2 201 Created
ETag: "1b3w4q9j5k6f7p8x9z2c4v6n8m1q3w5e7r9t1y3u5i7o9p1a"
Content-Length: 0

(412 Precondition Failed would mean "already exists" — equivalent to success for ContentStore.)

11.2 GET a Bucket Object byte range (hot path)

http
GET /xy7g.../embedding.f32.dim=768.bucketed.spatial-bits=18/101100110011010101/blake3-9k8j7h6g5f4d3s2a1q3w5e7r9t1y3u5i7o9p1a3s5d7f9g1h3j5k HTTP/2
Host: dreamdb-bucket.s3.amazonaws.com
Authorization: AWS4-HMAC-SHA256 Credential=...
Range: bytes=3831296-3834367

http
HTTP/2 206 Partial Content
Content-Length: 3072
Content-Range: bytes 3831296-3834367/12582912
ETag: "9k8j7h6g5f4d3s2a..."

<3072 bytes of f32 vector>

Note: DreamDB bytes:3831296-3834368 (half-open) becomes HTTP Range: bytes=3831296-3834367 (inclusive). Connector handled the off-by-one.

11.3 List bucket addresses for cold-start discovery

http
GET /?list-type=2&prefix=xy7g.../embedding.f32.dim=768.bucketed.spatial-bits=18/1011 HTTP/2
Host: dreamdb-bucket.s3.amazonaws.com
Authorization: AWS4-HMAC-SHA256 Credential=...

http
HTTP/2 200 OK
Content-Type: application/xml

<ListBucketResult>
   <Contents>
      <Key>xy7g.../...spatial-bits=18/101100110011010101/blake3-9k8j...</Key>
      <Size>12582912</Size>
   </Contents>
   <Contents>
      <Key>xy7g.../...spatial-bits=18/101101110011010101/blake3-3a4f...</Key>
      <Size>11456001</Size>
   </Contents>
   ...
</ListBucketResult>

The Connector parses the XML (or JSON for non-S3 backends) and returns a list of (path, size) tuples to the Protocol layer.
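For the S3-style response, the parsing step might look like the sketch below, using only the standard library. Real S3 responses carry an XML namespace on every element, so the sketch matches on local tag names. The function name and the sample keys are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_list_result(xml_text: str):
    """Return ([(path, size), ...], next_token) from a ListBucketResult body.

    next_token is None unless the response is marked as truncated.
    """
    def local(tag: str) -> str:
        return tag.rsplit("}", 1)[-1]  # strip a "{namespace}" prefix if present

    entries, token, truncated = [], None, False
    for child in ET.fromstring(xml_text):
        name = local(child.tag)
        if name == "Contents":
            fields = {local(f.tag): (f.text or "") for f in child}
            entries.append((fields["Key"], int(fields["Size"])))
        elif name == "NextContinuationToken":
            token = child.text
        elif name == "IsTruncated":
            truncated = (child.text or "").strip().lower() == "true"
    return entries, (token if truncated else None)


sample = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <IsTruncated>true</IsTruncated>
  <Contents><Key>t/m/0101/blake3-aaa</Key><Size>12582912</Size></Contents>
  <Contents><Key>t/m/0110/blake3-bbb</Key><Size>11456001</Size></Contents>
  <NextContinuationToken>tok-1</NextContinuationToken>
</ListBucketResult>"""
print(parse_list_result(sample))
```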

11.4 Conditional ref update (CAS)

http
GET /refs/main HTTP/2
Host: dreamdb-bucket.s3.amazonaws.com

http
HTTP/2 200 OK
ETag: "abc123def456..."
Content-Length: 33

<33-byte multihash of M_old>

http
PUT /refs/main HTTP/2
Host: dreamdb-bucket.s3.amazonaws.com
If-Match: "abc123def456..."
Content-Type: application/octet-stream
Content-Length: 33

<33-byte multihash of M_new>

Success:

http
HTTP/2 200 OK
ETag: "ghi789jkl012..."

CAS conflict (another writer advanced the ref):

http
HTTP/2 412 Precondition Failed

The Connector returns the conflict to the DreamDB Protocol layer; the Protocol retries with the latest state.

12. Out of Scope for this Document

  • Authentication mechanisms. Backends use whatever auth they use; Connectors translate.
  • TLS configuration. System certificate authorities, OS trust stores. Standard HTTPS practice.
  • Quotas, rate limits, billing. Operator concerns; Connector retries on 429 per §8.2.
  • Server-side encryption keys (SSE). Backend-specific; configured at the bucket level, transparent to DreamDB.
  • Replication, backup, cross-region. Operator concerns; DreamDB is content-addressed, so replication is just "copy the bytes."
  • The semantic-cache layer in the DreamDB Protocol component that holds Manifests, Track Objects, and SpatialIndex Objects across queries. That's a 0006 (protocol operations) concern; this document only covers the Connector.

13. Open Questions Surfaced by This Document

  • OQ-30 (→ 0009 §6 / §7): Concrete conformance test vectors — HTTP request/response pairs. Resolved: full battery in 0009 §6 (HTTP semantics) and §7 (verb behavior). Test vector battery covers:
    • 412-on-existing-PUT round-trip (treat-as-success vs. treat-as-CAS-conflict per tier).
    • Range-read half-open-vs-inclusive translation (bytes:128-3200 DreamDB-side ↔ Range: bytes=128-3199 HTTP-side).
    • CAS conflict and retry on RefStore PUT against a moving target.
    • List-prefix pagination boundary at exactly K × 1000 and K × 1000 + 1 entries, with continuation-token round-trips for K = 1, 2, and 3 (the 1000-, 2000-, and 3000-object boundary cases). Tests that the Connector returns ALL entries, not just the first page.
    • List-prefix sort order verification: backend MUST return results in lexicographic byte order; test vectors include adversarial keys (paths with adjacent base32 characters, numeric-looking suffixes) to catch locale-collation bugs.
    • ETag flavor compatibility: quoted ("abc123"), weak (W/"abc123"), multipart-style ("abc123-5"). Connector must round-trip opaquely for If-Match (§8.5.1) and canonicalize for cross-request comparison (§8.5.2).
    • Manifest Supremacy assertion: a steady-state query test that asserts NO LIST HTTP request is issued during the hot path. Bootstrap-only LIST is allowed and verified separately.
  • OQ-31 (→ v0.1 spec): Multipart upload for large Bucket Objects (> 5 GB on S3, > 256 MB on Azure). v0 PUT assumes single-shot; very large Buckets need backend-specific multipart APIs. Out of scope for v0 because Bucket sizes are bounded to ~100 MB by 0007's splitting rule (OQ-24).
  • OQ-32 (→ v0.1 spec): Pre-signed URLs for browser-based clients. The current spec assumes the SDK has direct backend access; web SDKs typically need a gateway that issues short-lived pre-signed URLs. Future spec MAY define a "presigning Connector" pattern.

Next: 0006-protocol-operations.md — defines the verbs (append, layer, query, stream) at the DreamDB Protocol level. Builds on this document's HTTP contract by composing those verbs into HTTP request sequences.