DreamDB Specification — 0005: Backend Interface (HTTP Contract)
Status: Draft. Builds on 0000-overview.md, 0001-data-model.md, 0002-content-addressing.md, 0003-time-encoding.md, and 0004-spatial-indexing.md. This document fixes the HTTP semantic contract between the Storage Connector (in-SDK) and the Cloud Object Store (the backend). It resolves OQ-3 (consistency), OQ-22 (local-FS Connector), and OQ-29 (HTTP version requirement). It does not define a language-level interface; the contract is HTTP itself.
1. Purpose
Per 0000 §3, DreamDB's architecture has a clean HTTP boundary between the Storage Connector and the Cloud Object Store. This document specifies what that boundary requires:
- The HTTP verbs the Connector uses and the backend MUST handle correctly.
- The conditional headers that distinguish idempotent puts (content layer) from CAS updates (refs layer).
- The status codes the backend MUST return and the Connector MUST interpret.
- The consistency model the protocol assumes — strong read-after-write for content-addressed objects, linearizable CAS for refs, eventual consistency tolerated for prefix listings.
- The HTTP version required to make the billion-scale hot path achievable.
What this document does not define:
- A language-level trait or interface. The contract is HTTP semantics, not Rust/Go/Python types. (Per project_architecture.md.)
- Backend-specific authentication. Connectors translate to whatever the backend uses (AWS SigV4, GCS HMAC, OAuth, presigned URLs).
- Specific TLS configurations. The backend's HTTPS endpoint is whatever the operator runs; the Connector trusts the system's certificate authorities.
DreamDB defines neither the Application above the SDK nor the Cloud Object Store below the HTTP boundary. This document defines the HTTP boundary itself.
1.1 Auth is a v0 deployment gap (advisory)
v0 leans entirely on the backend's native authentication (AWS SigV4, GCS HMAC, OAuth, presigned URLs). This works for single-tenant deployments where one organization owns the backend. It does not work for multi-tenant production scenarios where you want:
- Per-Space write isolation — writers for Space X cannot write to Space Y's paths even if they share a bucket.
- Per-modality access control — sensitive modalities (PII, regulated data) gated separately from public modalities.
- Per-writer audit logging — backend access logs name the auth identity, but DreamDB doesn't tie that to writer-identity in Manifests.
- Capability tokens — short-lived, scoped tokens for browser-based clients or ephemeral workers.
Cloud object stores authenticate at the bucket level, not at path-prefix granularity (S3 IAM policies can simulate prefix-scoped writes via condition keys, but the rules are fragile and hard to audit). A writer with bucket-level write access can write to any DreamDB path on that bucket.
v0 production deployments MUST address this out-of-band, typically by:
- One bucket per Space (the simplest isolation; works at small Space counts).
- A backend gateway that mediates writes (per-tenant auth at the gateway, then forwards to backend with privileged credentials).
- IAM policy with prefix-condition rules (works on AWS S3 with care; brittle).
v0.1+ MAY define a "DreamDB Auth Layer" spec that addresses these concerns at the protocol level — likely involving capability tokens embedded in Manifests, signed writer identities, and per-Space access policies. Until then, treat auth as the operator's responsibility and document the failure mode prominently in deployment guides.
2. Two Conformance Tiers
A backend may implement one or both of:
| Tier | Required for… | Mandatory? |
|---|---|---|
| ContentStore-Conformant | Hash-addressed Spaces (the data plane) | Yes |
| RefStore-Conformant | Named, advanceable manifest pointers | No |
Every DreamDB backend MUST be at least ContentStore-Conformant. Every ref-supporting backend MUST also be RefStore-Conformant. A backend that supports only the ContentStore tier hosts hash-addressed Spaces only — clients address Manifests by their hash, with manifest distribution handled out-of-band (e.g. shared via another channel).
This split is a direct consequence of 0000 §5.2 (Content layer immutable, Refs layer optionally CAS-mutable). A backend without conditional-write primitives can host the data plane but not the refs plane.
3. ContentStore Tier (Required)
The ContentStore tier provides idempotent put-by-hash, range reads, and prefix listings over a flat path namespace.
3.1 URL path namespace
The Connector issues HTTP requests against URLs of the form:
where <base-url> is backend-specific (e.g. https://my-bucket.s3.amazonaws.com/) and <dreamdb-path> is the DreamDB address-grammar path defined in 0002:
Connectors MUST pass the <dreamdb-path> to the backend exactly as constructed by the DreamDB Protocol. No path rewriting, prefix injection, or escaping is permitted (paths are already URL-safe per 0002 §8 — base32 hashes, base2 spatial keys, hex time anchors).
3.2 PUT — idempotent put-by-hash
To write a content-addressed Object:
Required backend semantics:
- If the path does not exist, the bytes are stored. Backend returns 200 OK or 201 Created.
- If the path already exists with the same bytes (which the content-addressing guarantees, since the path is the hash), the backend MAY: (a) return 200 OK treating it as a no-op; or (b) return 412 Precondition Failed if the Connector sent If-None-Match: *. The Connector treats both as success.
- If the path already exists with different bytes, the situation is impossible by construction (BLAKE3 collision, ~2⁻¹²⁸). Backends that nonetheless detect this MUST treat it as a server error (5xx).
The Connector MAY use If-None-Match: * to make the put strictly create-only and avoid the implicit-no-op case:
412 Precondition Failed returned by If-None-Match: * indicates the object already exists — equivalent to a successful no-op for DreamDB's purposes (the content-addressed bytes are already there). Connectors MUST treat 412 in this case as success, not as failure.
3.3 GET — full-object or range read
To read a content-addressed Object in full:
Returns 200 OK with the bytes in the body, or 404 Not Found if the object does not exist (e.g. referenced by a manifest but not yet propagated, or never written).
For a partial read (the hot path for Fragments and Buckets per 0004 §7.3):
Returns 206 Partial Content with the requested byte range, or 416 Range Not Satisfiable if the range is invalid.
The <start>-<end> form follows HTTP Range semantics — <end> is inclusive (HTTP convention), differing from DreamDB's internal [start, end) half-open intervals (per 0002 §6.5). Connectors MUST translate: DreamDB bytes:<a>-<b> (half-open) → HTTP Range: bytes=<a>-<b−1> (inclusive).
Backend MUST support arbitrary byte ranges — DreamDB makes no assumption about chunk-aligned reads. This is true on every modern object store (S3, GCS, Azure Blob, OSS, MinIO).
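The half-open to inclusive translation is an easy place for an off-by-one, so a minimal Python sketch may help. The function name to_http_range is hypothetical and not part of the spec; only the arithmetic (subtract one from the exclusive end) comes from the rule above.

```python
def to_http_range(start: int, end: int) -> str:
    """Translate a DreamDB half-open byte interval [start, end) into an
    HTTP Range header value, whose end offset is inclusive.

    Hypothetical helper for illustration only.
    """
    if start < 0 or end <= start:
        # An empty or inverted interval has no valid inclusive form.
        raise ValueError(f"invalid half-open interval [{start}, {end})")
    return f"bytes={start}-{end - 1}"


# DreamDB bytes:128-3200 (half-open) becomes HTTP bytes=128-3199 (inclusive).
print(to_http_range(128, 3200))
```

Rejecting empty intervals up front is a design choice of this sketch; it keeps the Connector from ever emitting a range that the backend would answer with 416.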
3.4 HEAD — existence check & metadata
Returns 200 OK with Content-Length (and optionally ETag, Last-Modified) if the object exists, or 404 Not Found. Connectors use HEAD for: existence checks before triggering a re-write, size discovery before deciding on range vs. full-GET, and cheap cache validation.
3.5 LIST — prefix-scoped listing
The list operation is the cold-start discovery primitive (per 0004 §7.3 and 0002 §6.3.1) and the admin-scan primitive (operator-level audits, GC walks). It is never on the steady-state query hot path — see §5.3.1 for the Manifest Supremacy doctrine that governs this.
Backends MUST support listing objects whose paths start with a given prefix. The exact protocol verb is backend-specific:
- S3 / S3-compatible (MinIO, R2, B2): GET /?list-type=2&prefix=<prefix>&continuation-token=<token>
- GCS: GET /storage/v1/b/<bucket>/o?prefix=<prefix>&pageToken=<token>
- Azure Blob: GET /<container>?restype=container&comp=list&prefix=<prefix>&marker=<token>
- Aliyun OSS: GET /?prefix=<prefix>&marker=<token> (S3-style)
Connectors MUST translate the protocol's logical "list by prefix" into the backend's specific list call, including pagination via continuation tokens. The backend MUST return:
- A list of object keys (or full paths, depending on backend convention) matching the prefix.
- For each entry: at minimum the key/path. SHOULD also return Content-Length and ETag to enable cache-friendly behavior.
- A continuation token if the result is paginated.
3.5.1 Sort order
Backends MUST sort results in lexicographic byte order over UTF-8 (equivalent to memcmp over the raw byte sequences). All DreamDB addresses are pure ASCII (base32 hashes, base2 spatial keys, hex time anchors, lowercase modality-tag literals — see 0002 §5, §6, §8), so this is unambiguous. Connectors MUST NOT apply any locale-specific collation, Unicode normalization, or case folding to listed keys. Any backend that delivers non-byte-ordered listings is non-conformant; in practice every backend in §10's matrix conforms.
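A conformance check for this ordering can be sketched in a few lines of Python. The helper name is_byte_ordered is hypothetical; the sketch relies on the fact that Python compares bytes objects lexicographically by unsigned byte value, which matches memcmp ordering with shorter-prefix-first tie-breaking.

```python
def is_byte_ordered(keys: list[str]) -> bool:
    """Return True if keys are in non-decreasing lexicographic byte order.

    Hypothetical helper: encodes each key as UTF-8 and compares raw bytes,
    with no locale collation, normalization, or case folding.
    """
    encoded = [k.encode("utf-8") for k in keys]
    return all(a <= b for a, b in zip(encoded, encoded[1:]))


# Numeric-looking suffixes sort by byte value, not by number:
# "a/10" precedes "a/2" because "1" (0x31) < "2" (0x32).
print(is_byte_ordered(["a/10", "a/2", "a/b"]))
```

A natural-sort or locale-aware comparison would order "a/2" before "a/10", which is exactly the kind of divergence this check is meant to catch.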
3.5.2 Pagination correctness
The 1000-object default page boundary on S3 (and most S3-compatible backends) is the most common silent-failure surface for cold-start LIST. Connectors MUST follow a strict discipline:
- Iterate to exhaustion. A Connector that returns "the first page" of a list-prefix call to the Protocol layer is buggy. The Connector MUST iterate continuation tokens until the backend signals end-of-list (IsTruncated=false for S3, absent nextPageToken for GCS, empty NextMarker for Azure, etc.) before returning to the Protocol component.
- Pagination is per-request, not per-session. Continuation tokens are bound to the request that generated them. Connectors MUST NOT cache continuation tokens across separate list-prefix invocations.
- Backend-specific exhaustion signals. Each backend uses a different "no more pages" signal. Connectors MUST translate them all to a uniform "exhausted" return to the Protocol layer. The table lists the signal that indicates more pages remain; its absence means the listing is exhausted:

| Backend | Truncation signal |
|---|---|
| S3 / MinIO | <IsTruncated>true</IsTruncated> + <NextContinuationToken> |
| GCS | nextPageToken field present in JSON response |
| Azure Blob | <NextMarker>...</NextMarker> non-empty |
| Aliyun OSS | <IsTruncated>true</IsTruncated> + <NextMarker> |

- Cross-page boundary cases. The 1000th object boundary is a known correctness hazard. Connectors MUST handle the case where a single logical list-prefix result spans K * 1000 + r entries for any K, r ≥ 0, including r = 0 (exact multiple of 1000) and r = 1 (one entry on a fresh page). Conformance test vectors covering these cases are required (OQ-30).
- Verify against Manifest where possible. When the SDK has a Manifest available, the DreamDB Protocol layer SHOULD verify list-prefix results against the Manifest's known bucket-list as a cheap sanity check: silently-truncated listings surface as missing entries, which are easy to detect and re-list.
Page size: backends typically return up to 1000 entries per page (S3 default; GCS configurable up to 1000; Azure up to 5000; OSS up to 1000). Connectors MAY pass the backend's max page size to reduce round trips but MUST NOT rely on a specific page size for correctness — backends sometimes return fewer entries per page than requested under load.
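The iterate-to-exhaustion discipline can be sketched as follows in Python. Everything here is illustrative: list_all, fetch_page, and make_fake_backend are hypothetical names, and the uniform (keys, next_token) page shape is an assumption standing in for whatever a real Connector extracts from the backend's XML or JSON response.

```python
from typing import Callable, List, Optional, Tuple

# A page is (keys, next_token); next_token is None when the listing is exhausted.
Page = Tuple[List[str], Optional[str]]


def list_all(fetch_page: Callable[[str, Optional[str]], Page], prefix: str) -> List[str]:
    """Follow continuation tokens until the backend signals end-of-list."""
    keys: List[str] = []
    token: Optional[str] = None  # tokens live only for this invocation
    while True:
        page_keys, token = fetch_page(prefix, token)
        keys.extend(page_keys)  # a page may legitimately be short or empty
        if token is None:
            return keys


def make_fake_backend(all_keys: List[str], page_size: int = 1000):
    """In-memory stand-in for a paginated backend, for testing only."""
    ordered = sorted(all_keys)

    def fetch_page(prefix: str, token: Optional[str]) -> Page:
        matching = [k for k in ordered if k.startswith(prefix)]
        start = int(token) if token is not None else 0
        nxt = start + page_size
        return matching[start:nxt], (str(nxt) if nxt < len(matching) else None)

    return fetch_page


# 2001 entries: two full pages plus one entry on a fresh page (K = 2, r = 1).
keys = [f"b/{i:05d}" for i in range(2001)]
print(len(list_all(make_fake_backend(keys), "b/")))
```

The loop never inspects page length to decide whether to continue; it relies only on the exhaustion signal, so short pages under load do not end the listing early.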
3.6 DELETE — optional, garbage collection only
DELETE is optional at the DreamDB protocol level. The protocol has no delete verb (per 0000 §5.2 — corrections are layered, never destructive). DELETE exists only for garbage collection: an operator may choose to drop old, orphaned content-addressed objects that no live manifest references.
Returns 200 OK or 204 No Content on success, 404 Not Found if absent (treat as success), 403 Forbidden if the operator's policy prohibits.
A backend that disables DELETE entirely is still ContentStore-Conformant — DreamDB's content layer is append-only by design. GC is an operator concern, not a protocol concern.
4. RefStore Tier (Optional)
The RefStore tier adds conditional-write semantics for a small mutable namespace at refs/<ref-name>. Per 0000 §5.2, this is the only mutable thing in DreamDB.
4.1 Initial create — If-None-Match: *
To create a ref that does not yet exist:
Returns:
- 200 OK or 201 Created — ref now points to the manifest.
- 412 Precondition Failed — ref already exists; Connector treats this as a CAS conflict and retries with §4.2 semantics.
4.2 Update — If-Match: <etag>
To advance an existing ref to a new manifest:
Returns the current value plus the backend's strong validator (commonly ETag, sometimes x-goog-generation for GCS). Then:
Returns:
- 200 OK or 204 No Content — ref advanced atomically.
- 412 Precondition Failed — another writer advanced the ref between the GET and PUT. Connector retries with the new state.
This is optimistic concurrency control, not cooperative locking. No writer ever blocks; conflicts are detected and retried.
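The create-or-advance loop can be sketched against an in-memory stand-in, written here in Python. InMemoryRefStore and advance_ref are hypothetical names, and the store's ETag format is invented for the example; only the If-None-Match: * / If-Match / 412 behavior mirrors §4.1 and §4.2.

```python
class InMemoryRefStore:
    """Hypothetical stand-in for a RefStore-Conformant backend."""

    def __init__(self):
        self._refs = {}    # ref name -> (value, etag)
        self._version = 0  # source of fresh, made-up ETags

    def get(self, name):
        """Return (status, value, etag)."""
        if name not in self._refs:
            return 404, None, None
        value, etag = self._refs[name]
        return 200, value, etag

    def put(self, name, value, if_match=None, if_none_match=None):
        """Conditional PUT; returns an HTTP-like status code."""
        current = self._refs.get(name)
        if if_none_match == "*" and current is not None:
            return 412  # create-only, but the ref already exists
        if if_match is not None and (current is None or current[1] != if_match):
            return 412  # another writer advanced the ref
        self._version += 1
        self._refs[name] = (value, f'"v{self._version}"')
        return 201 if current is None else 200


def advance_ref(store, name, choose_next, max_attempts=5):
    """Optimistic CAS loop: read, compute the next value, conditional PUT, retry on 412."""
    for _ in range(max_attempts):
        status, current, etag = store.get(name)
        if status == 404:
            result = store.put(name, choose_next(None), if_none_match="*")
        else:
            result = store.put(name, choose_next(current), if_match=etag)
        if result in (200, 201, 204):
            return True
        if result != 412:
            raise RuntimeError(f"unexpected status {result}")
    return False  # still conflicting after max_attempts


store = InMemoryRefStore()
print(advance_ref(store, "main", lambda current: b"manifest-1"))
```

No writer holds a lock at any point: a losing writer simply observes 412, re-reads the ref, and tries again with the fresh validator.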
4.3 Resolve — GET /refs/<ref-name>
Returns 200 OK with the 33-byte multihash body, or 404 Not Found. SHOULD include ETag so the Connector can use it for a subsequent CAS update without an extra HEAD.
4.4 List refs — optional
Same semantics as §3.5. Useful for "list all branches" or "list all release tags." Optional even within RefStore-Conformant backends.
4.5 Backend-specific conditional-write headers
The HTTP semantics described above use the IETF-standard If-Match / If-None-Match headers. Some backends use proprietary equivalents:
| Backend | "Create only if absent" | "Update only if matching" |
|---|---|---|
| S3 (post-2024) | If-None-Match: * | If-Match: <etag> |
| MinIO | If-None-Match: * | If-Match: <etag> |
| GCS | x-goog-if-generation-match: 0 | x-goog-if-generation-match: <gen> |
| Azure Blob | If-None-Match: * | If-Match: <etag> |
| Aliyun OSS | x-oss-forbid-overwrite: true (create-only) | If-Match: <etag> |
Connectors MUST translate the standard headers above into whichever flavor the configured backend understands. The DreamDB Protocol component above the Connector emits the standard form.
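One way to sketch this translation in Python is shown below. The function name translate_conditional and the backend identifiers ("s3", "minio", "azure", "gcs", "oss") are hypothetical. For GCS the sketch assumes the validator carried in If-Match is already the object generation number, which a real Connector would have to obtain from the backend's response.

```python
def translate_conditional(backend: str, headers: dict) -> dict:
    """Map standard conditional-write headers to the backend's flavor.

    Illustrative only; follows the table in this section.
    """
    out = dict(headers)  # never mutate the caller's headers
    create_only = out.get("If-None-Match") == "*"
    if backend == "gcs":
        if create_only:
            del out["If-None-Match"]
            out["x-goog-if-generation-match"] = "0"
        elif "If-Match" in out:
            # Assumption: the validator is a GCS generation number.
            out["x-goog-if-generation-match"] = out.pop("If-Match")
    elif backend == "oss" and create_only:
        del out["If-None-Match"]
        out["x-oss-forbid-overwrite"] = "true"
    # S3, MinIO, Azure Blob, and OSS updates use the standard headers unchanged.
    return out


print(translate_conditional("gcs", {"If-None-Match": "*"}))
```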
5. Consistency Requirements (resolves OQ-3)
5.1 Content-addressed objects: strong read-after-write
After a successful PUT of a content-addressed object (Genesis, Manifest, Track, Index Page, Bucket Object, Fragment, etc.), any subsequent GET of the same path MUST return the bytes that were PUT. The backend MUST guarantee this strong read-after-write consistency for content-addressed Objects.
This is the modern S3 default (Amazon S3 went strongly consistent globally in December 2020), the GCS default, the Azure Blob default, the Aliyun OSS default, and the MinIO default. A backend that fails this guarantee is non-conformant.
5.2 Refs: linearizable CAS
A successful conditional PUT with If-Match: <etag> MUST be linearizable — no two writers can both succeed against the same ref with the same If-Match value. The backend's conditional-write primitive MUST satisfy this; all backends listed in §4.5 do.
5.3 Prefix listings: eventual consistency tolerated
LIST /<prefix> MAY lag recent writes. A newly-PUT object might not appear in a subsequent list-prefix call for some short window (typically milliseconds, occasionally up to seconds on geo-replicated backends; up to ~1 minute observed on multi-region S3 buckets under transient replication lag).
The classic failure mode: a Pipeline PUTs a Bucket Object → 200 OK → notifies the SDK → SDK does list-prefix → empty result set. The Bucket exists and is fetchable by direct GET against its full path, but the prefix listing has not yet propagated.
DreamDB tolerates this only because the protocol design renders list-prefix non-essential for steady-state operation. See §5.3.1.
5.3.1 Manifest Supremacy (doctrine)
When a Manifest is available, the SDK MUST resolve Object addresses through the Manifest — which is content-addressed and strongly consistent (§5.1) — never through list-prefix, which is eventually consistent.
This is the Manifest Supremacy doctrine. It is not a recommendation; it is a correctness requirement for any conformant SDK.
The doctrine has three concrete consequences:
- The query hot path uses the cached object_index (from the Manifest), not list-prefix. Per 0004 §7.3 and 0002 §7.3.2, every Track Object's object_index enumerates the addresses of its constituent Bucket / Fragment / Index-Page Objects. Once the SDK has the Manifest, every subsequent fetch is a direct GET against a known, content-addressed path, which is strongly consistent.
- list-prefix is reserved for two cases, and only those:
  - Cold-start bootstrap: an SDK with no Manifest at all needs list-prefix to discover what's on the backend.
  - Administrative scans: operator audits (orphan-object detection, GC walks, replication checks) genuinely need to enumerate the backend's contents. These are out-of-band of the query path.
- Any code path that uses list-prefix during steady-state query operation is a bug. Reviewers must reject SDK implementations that re-list to discover newly-written Bucket Objects. New writes are discovered by reading the new Manifest (which references them by hash) and then direct-GETting via the resolved addresses.
Why this works: the writer's PUT-then-publish-Manifest sequence guarantees that if a reader has the new Manifest, then every Object the Manifest references is available to direct GET (§5.1). list-prefix lag is irrelevant — the reader never depends on it for visibility of recent writes.
Standard write sequence that producers must follow to make this work:
- PUT all leaf Objects (Buckets, Fragments, Index Pages, etc.).
- PUT all parent Objects (Track Objects, Manifest Index Pages).
- PUT the Manifest last.
- (Ref-conformant backends) advance the ref to the new Manifest hash via If-Match (§4.2).
A reader that resolves the new Manifest is guaranteed to find its dependencies live (§5.1). A reader on the previous Manifest is unaffected. Manifest Supremacy makes the eventual-consistency window of list-prefix invisible to steady-state queries.
A Connector MUST NOT depend on list-prefix returning the most-recent-write. The DreamDB Protocol layer MUST NOT issue list-prefix calls during steady-state operation. Together, these constraints render the eventual-consistency window irrelevant — by construction, not by lucky timing.
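The standard write sequence (leaves, then parents, then the Manifest, then the optional ref advance) can be sketched in Python. The publish function, its put and advance_ref callables, and the placeholder paths are all hypothetical; real paths follow the address grammar in 0002, and the sketch passes the Manifest path to advance_ref as a stand-in for the Manifest hash.

```python
def publish(put, leaves, parents, manifest, advance_ref=None):
    """Issue PUTs in dependency order so the Manifest never outruns its closure.

    put(path, body) is assumed to return only after the backend confirmed the
    write; leaves and parents are lists of (path, body) pairs.
    """
    for path, body in leaves:    # 1. Buckets, Fragments, Index Pages, ...
        put(path, body)
    for path, body in parents:   # 2. Track Objects, Manifest Index Pages
        put(path, body)
    manifest_path, manifest_body = manifest
    put(manifest_path, manifest_body)  # 3. Manifest last
    if advance_ref is not None:        # 4. ref-conformant backends only
        advance_ref(manifest_path)


# Record the order of operations with placeholder paths.
log = []
publish(
    put=lambda path, body: log.append(("PUT", path)),
    leaves=[("leaf-1", b"..."), ("leaf-2", b"...")],
    parents=[("track-1", b"...")],
    manifest=("manifest-1", b"..."),
    advance_ref=lambda path: log.append(("REF", path)),
)
print(log)
```

Because each step completes before the next begins, a reader that can see the Manifest can also fetch everything it references by direct GET.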
5.4 What the protocol does NOT require
- Cross-backend consistency. DreamDB makes no claims about replication between backends. The same Manifest hash on backends A and B is the same Manifest if both backends contain the closure of objects reachable from it.
- Read-your-writes for list operations. See §5.3.
- Atomicity across multiple PUTs. Each PUT is independently atomic; multi-object atomicity (e.g. "publish a Manifest and three new Track Objects together") is the writer's concern. The standard pattern is: PUT all leaf Objects → PUT index pages → PUT Track Objects → PUT Manifest last. A reader resolving the new Manifest finds its dependencies live; a reader on the old Manifest is unaffected.
6. HTTP Version Requirement (resolves OQ-29)
The Storage Connector MUST support HTTP/2 for the connection to the backend. HTTP/3 is permitted as an upgrade.
HTTP/1.1 is permitted only as a fallback when the backend does not advertise HTTP/2 support, and only if the Connector logs a warning that performance will be degraded.
6.1 Why HTTP/2 is required
The hot path of a billion-scale query (per 0004 §7.3) issues ~16 parallel ranged GETs to fetch matching Bucket Objects. Under HTTP/1.1:
- Each parallel GET requires its own TCP connection and TLS handshake (~30–80 ms each on cold connections).
- Practical client connection-pool limits (typically 6–20 per host) serialize requests.
- The hot path balloons from ~100 ms to ~500–1000 ms.
Under HTTP/2:
- All 16 GETs share one TCP+TLS connection.
- Streams are multiplexed at the protocol level, so there is no head-of-line blocking at the application (HTTP) layer. TCP-level head-of-line blocking remains; HTTP/3 removes it by moving multiplexing onto QUIC over UDP.
- Hot path stays at ~50–100 ms.
6.2 Backend support
The major cloud object stores gained HTTP/2 support between 2017 and 2019; Cloudflare R2 has supported it since its 2022 launch:
| Backend | HTTP/2 since |
|---|---|
| AWS S3 | 2017 |
| GCS | 2018 |
| Azure Blob | 2019 |
| Aliyun OSS | 2019 |
| MinIO | 2018 (GA) |
| Cloudflare R2 | 2022 (launch) |
HTTP/3 (QUIC) is increasingly available; Connectors MAY prefer it when both peers support it. CDN-fronted deployments routinely terminate HTTP/3 at the edge.
6.3 Connector behavior
The Connector:
- MUST initiate HTTP/2 via ALPN during the TLS handshake.
- MUST share a single connection per backend host across concurrent DreamDB queries (i.e., don't open one connection per request — defeats multiplexing).
- SHOULD use HTTP/3 if available.
- MUST use HTTP/2's flow control to avoid head-of-line blocking (TCP head-of-line is not eliminated in HTTP/2 — HTTP/3 does that — but HTTP/2 still removes the application-layer blocking).
7. Status Code Reference
A summary of status codes the Connector MUST handle:
| Code | Meaning at the DreamDB layer |
|---|---|
| 200 | Success (full GET, HEAD, ref GET, idempotent re-PUT) |
| 201 | Created (PUT of new content-addressed Object or new ref) |
| 204 | No Content (DELETE success, ref update without echo) |
| 206 | Partial Content (range GET success) |
| 304 | Not Modified (conditional GET; rare in DreamDB but Connector MUST handle) |
| 400 | Bad Request — malformed Connector code or path; treat as bug |
| 403 | Forbidden — auth failure or policy denial; Connector surfaces to caller |
| 404 | Not Found — Object absent (legitimate for cold reads, list-prefix gaps, garbage-collected) |
| 412 | Precondition Failed — If-Match / If-None-Match check rejected. Treat as success for ContentStore PUT (idempotent re-write); treat as CAS conflict for RefStore PUT (retry). |
| 416 | Range Not Satisfiable — Connector emitted invalid range; bug |
| 429 | Too Many Requests — backend rate-limit; Connector backs off and retries |
| 500 | Server error — Connector retries with backoff |
| 503 | Service Unavailable — Connector retries |
| 5xx | Other server errors — retry with exponential backoff up to a budget |
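The table can be read as a small decision function. The Python sketch below is one possible encoding; the classify name, the tier labels "content" and "refs", and the returned action strings are all hypothetical, chosen only to mirror the rows above.

```python
def classify(status: int, tier: str, verb: str) -> str:
    """Map an HTTP status to a Connector-level action (illustrative labels)."""
    if status in (200, 201, 204, 206):
        return "success"
    if status == 304:
        return "not-modified"
    if status == 412 and verb == "PUT":
        # Same code, opposite meaning depending on the tier.
        if tier == "content":
            return "success"       # bytes already present: idempotent re-write
        if tier == "refs":
            return "cas-conflict"  # another writer won; caller retries
    if status == 404:
        return "not-found"
    if status in (400, 416):
        return "bug"               # malformed request or invalid range
    if status == 403:
        return "forbidden"         # surfaced to the caller
    if status == 429 or 500 <= status <= 599:
        return "retry"             # back off and retry
    return "unexpected"


print(classify(412, "content", "PUT"), classify(412, "refs", "PUT"))
```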
8. Connector Responsibilities
The Storage Connector is intentionally minimal — its only job is HTTP transport. But the v0 spec assigns it a small set of well-defined responsibilities:
8.1 Connection pooling and multiplexing
Maintain a single HTTP/2 (or HTTP/3) connection per (backend host, auth identity) pair. All concurrent DreamDB requests against the same host go through that connection's multiplexed streams. A connection-per-request implementation defeats §6.
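A minimal sketch of the one-connection-per-(host, identity) rule, in Python. ConnectionPool and the open_connection factory are hypothetical; a real Connector would plug in its HTTP/2 client here and also handle reconnects, which this sketch omits.

```python
import threading


class ConnectionPool:
    """Hand out one shared connection per (backend host, auth identity) pair."""

    def __init__(self, open_connection):
        self._open = open_connection   # hypothetical factory: (host, identity) -> connection
        self._lock = threading.Lock()  # concurrent queries may ask at the same time
        self._connections = {}

    def get(self, host, identity):
        key = (host, identity)
        with self._lock:
            if key not in self._connections:
                self._connections[key] = self._open(host, identity)
            return self._connections[key]


# Stand-in factory that counts how many connections are actually opened.
opened = []
pool = ConnectionPool(lambda host, identity: opened.append((host, identity)) or object())
first = pool.get("backend.example", "writer-a")
second = pool.get("backend.example", "writer-a")
print(first is second, len(opened))
```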
8.2 Retry & timeout policy
- 5xx and 429: exponential backoff, max ~3 retries by default. Configurable.
- 412 on ContentStore PUT: treat as success (the bytes are there).
- 412 on RefStore PUT: treat as CAS conflict, return to caller for retry with new state.
- 404: surface to caller; do not retry.
- Network errors (timeout, reset): retry with backoff.
- Default request timeout: 30 s for GETs, 5 min for large PUTs (Manifest, Bucket, Fragment uploads).
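The retry rules above can be sketched as a small wrapper in Python. The send_with_retry name, the 0.2 s base delay, and the injectable sleep parameter are assumptions of this sketch; the spec fixes only the retryable classes and the default of roughly three retries. Timeouts are left to the underlying send callable.

```python
import time


def send_with_retry(send, max_retries=3, base_delay=0.2, sleep=time.sleep):
    """Call send() and retry 429, 5xx, and network errors with exponential backoff.

    send() is a hypothetical callable that performs one HTTP request and
    returns its status code, raising TimeoutError or ConnectionError on
    network failure.
    """
    status = None
    for attempt in range(max_retries + 1):
        try:
            status = send()
        except (TimeoutError, ConnectionError):
            status = None  # network-level failure: retry like a 5xx
        if status is not None and status != 429 and not 500 <= status <= 599:
            return status  # 2xx, 404, 412, ...: the caller interprets these
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)  # 0.2 s, 0.4 s, 0.8 s, ...
    if status is None:
        raise ConnectionError("network error persisted after retries")
    return status  # still 429 or 5xx after exhausting the retry budget


# Simulated backend: two transient failures, then success. No real sleeping.
responses = iter([503, 429, 200])
delays = []
print(send_with_retry(lambda: next(responses), sleep=delays.append), delays)
```

404 and 412 pass straight through on the first attempt, which matches the rule that they are surfaced to the caller rather than retried.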
8.3 Authentication translation
The Connector receives DreamDB-relative paths from the Protocol component above; it adds the backend's auth (signing, presigned URLs, OAuth tokens, etc.) before sending the request. The DreamDB Protocol component is auth-unaware.
8.4 No business logic
The Connector MUST NOT:
- Parse DreamDB Object bytes (Manifests, Track Objects, Index Pages). These are opaque to the Connector.
- Make decisions about which paths to fetch or in what order. That's the Protocol component's job.
- Cache DreamDB Objects beyond what's needed to fulfill in-flight HTTP responses. Application-level caching belongs in the Protocol component.
The Connector is purely transport. Anything beyond transport is a bug at the Connector layer.
8.5 ETag handling and normalization
ETags are the most variably-formatted field in the HTTP contract — different backends quote them differently, occasionally use weak validators, and use semantically-different inner values (some are hashes, some are opaque generation tokens, some are length-of-bytes-of-hash-of-parts). Connectors are responsible for normalizing ETags so the DreamDB Protocol layer above sees a uniform representation, while preserving the exact bytes needed for round-trip CAS (If-Match) requests.
Two rules govern Connector ETag handling:
8.5.1 Opaque round-trip for If-Match
For ref-update CAS (§4.2), the Connector receives an ETag header on a GET /refs/<name> response and uses it on a subsequent PUT /refs/<name> with If-Match. The Connector MUST round-trip the ETag bytes exactly as received — same quoting, same weak/strong prefix, same content. The Connector MUST NOT interpret, hash-compare, or modify the value. This guarantees the backend's CAS check works correctly regardless of backend-specific ETag semantics.
8.5.2 Canonical form for Protocol-layer comparison
When the DreamDB Protocol layer needs to compare or cache ETags across requests (e.g., to detect that a ref has changed since the last GET), the Connector MUST present ETags in a canonical form with these normalizations applied:
- Strip surrounding double-quotes. "abc123" → abc123.
- Strip leading W/ (weak validator prefix). W/"abc123" → abc123.
- No whitespace inside or outside the value. Trim if present (defensive; should not occur on conformant backends).
- No Unicode normalization. ETags are ASCII per HTTP RFC 7232; any non-ASCII byte is a backend bug.
When the Protocol layer hands a canonical ETag back to the Connector for a CAS request, the Connector reapplies the appropriate quoting for the target backend (typically "<value>" for strong ETags). This is symmetric: opaque on the wire, canonical at the Protocol-layer boundary.
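The canonicalization rules can be sketched in Python as below. The names canonical_etag and wire_etag are hypothetical. The sketch trims only surrounding whitespace, and wire_etag always produces the strong quoted form, which is the typical case named above rather than a rule for every backend.

```python
def canonical_etag(raw: str) -> str:
    """Reduce a wire-format ETag to the canonical Protocol-layer form."""
    value = raw.strip()                 # defensive trim
    if value.startswith("W/"):
        value = value[2:]               # drop the weak-validator prefix
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]             # drop surrounding double-quotes
    return value.strip()


def wire_etag(canonical: str) -> str:
    """Re-quote a canonical ETag as a strong validator for an If-Match header."""
    return f'"{canonical}"'


for raw in ('"abc123"', 'W/"abc123"', '"abc123-5"'):
    print(raw, "->", canonical_etag(raw))
```

Canonicalization discards the weak/strong distinction, so it suits change detection at the Protocol layer; the opaque round-trip in §8.5.1 still applies to the bytes actually sent on the wire.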
8.5.3 What ETags are NOT
- Not content hashes. S3 returns hex-MD5 for single-part objects, hex-of-md5-of-parts plus -N suffix for multipart. GCS returns opaque generation tokens. Azure returns opaque tokens. DreamDB Connectors MUST NOT assume an ETag equals BLAKE3-of-bytes: even though the path is the BLAKE3 hash, the ETag is a separate backend-specific identifier.
- Not stable across regions. Some replication setups change ETags during cross-region copy. Don't compare ETags across backends.
- Not a substitute for * in If-None-Match. If-None-Match: "<some-etag>" checks "object's current ETag is not this value", which differs from If-None-Match: *, which checks "object does not exist." DreamDB uses * for create-only PUT (§3.2, §4.1); never use a specific ETag in If-None-Match.
Conformance test vectors (0009) MUST include round-trip cases across backends with different ETag conventions: quoted vs. unquoted, weak vs. strong, multipart-style with -N suffix.
9. Local-Filesystem Connector (resolves OQ-22)
v0 is HTTP-only. A file://-style direct-FS Connector is not in v0.
Reasoning:
- One Connector contract, testable end-to-end with curl. Adds zero variance to the conformance story.
- Local development uses MinIO (or any embedded-mode HTTP object store: localstack, fake-gcs-server, Azurite) running on localhost. MinIO starts in <1 s, runs in ~30 MB of memory, and supports the full S3 API. A file:// Connector would not meaningfully improve developer ergonomics.
- A file:// Connector that "implements the same semantics" inevitably drifts from HTTP semantics in subtle ways: handling of conditional headers, list pagination, range edge cases. Catching those drifts requires duplicating the conformance test suite. Not worth the maintenance burden for a dev convenience.
- A future v0.1 spec MAY add a file:// Connector; nothing in v0 forecloses it. v0 just ships HTTP-only.
For local CI, the recommended setup is:
The Connector is configured with http://localhost:9000 as the backend; everything else is identical to production.
10. Backend Compatibility Matrix
| Operation | S3 / R2 / B2 | GCS | Azure Blob | Aliyun OSS | MinIO | Notes |
|---|---|---|---|---|---|---|
| PUT (idempotent put-by-hash) | ✓ | ✓ | ✓ | ✓ | ✓ | Universal |
| PUT with If-None-Match: * | ✓ (2024+) | ✓ | ✓ | ✓ (proprietary) | ✓ | S3 added in 2024; Connector handles flavor |
| PUT with If-Match: <etag> | ✓ (2024+) | ✓ | ✓ | ✓ | ✓ | Same |
| GET (full-object) | ✓ | ✓ | ✓ | ✓ | ✓ | Universal |
| GET with Range: bytes=... | ✓ | ✓ | ✓ | ✓ | ✓ | Universal |
| HEAD | ✓ | ✓ | ✓ | ✓ | ✓ | Universal |
| LIST with prefix | ✓ | ✓ | ✓ | ✓ | ✓ | API surface differs; Connector translates |
| DELETE | ✓ | ✓ | ✓ | ✓ | ✓ | Universal; optional at protocol level |
| HTTP/2 | ✓ | ✓ | ✓ | ✓ | ✓ | All since 2017–2019 |
| HTTP/3 | ✓ (CloudFront) | ✓ | △ (preview) | △ | ✓ (2024) | Optional but preferred |
| Strong read-after-write (content) | ✓ (2020+) | ✓ | ✓ | ✓ | ✓ | All major backends post-2020 |
△ = partial / region-dependent / preview availability.
The Connector is responsible for normalizing backend differences (proprietary headers, list-API shapes). The DreamDB Protocol layer above sees a uniform contract.
11. Worked HTTP Examples
Concrete request/response pairs for the four most common operations.
11.1 PUT a Manifest
(412 Precondition Failed would mean "already exists" — equivalent to success for ContentStore.)
11.2 GET a Bucket Object byte range (hot path)
Note: DreamDB bytes:3831296-3834368 (half-open) becomes HTTP Range: bytes=3831296-3834367 (inclusive). Connector handled the off-by-one.
11.3 List bucket addresses for cold-start discovery
The Connector parses the XML (or JSON for non-S3 backends) and returns a list of (path, size) tuples to the Protocol layer.
11.4 Conditional ref update (CAS)
Success:
CAS conflict (another writer advanced the ref):
The Connector returns the conflict to the DreamDB Protocol layer; the Protocol retries with the latest state.
12. Out of Scope for this Document
- Authentication mechanisms. Backends use whatever auth they use; Connectors translate.
- TLS configuration. System certificate authorities, OS trust stores. Standard HTTPS practice.
- Quotas, rate limits, billing. Operator concerns; Connector retries on 429 per §8.2.
- Server-side encryption keys (SSE). Backend-specific; configured at the bucket level, transparent to DreamDB.
- Replication, backup, cross-region. Operator concerns; DreamDB is content-addressed, so replication is just "copy the bytes."
- The semantic-cache layer in the DreamDB Protocol component that holds Manifests, Track Objects, and SpatialIndex Objects across queries. That's a 0006 (protocol operations) concern; this document only covers the Connector.
13. Open Questions Surfaced by This Document
- OQ-30 (→ 0009 §6 / §7): Concrete conformance test vectors (HTTP request/response pairs). Resolved: full battery in 0009 §6 (HTTP semantics) and §7 (verb behavior). Test vector battery covers:
  - 412-on-existing-PUT round-trip (treat-as-success vs. treat-as-CAS-conflict per tier).
  - Range-read half-open-vs-inclusive translation (bytes:128-3200 DreamDB-side ↔ Range: bytes=128-3199 HTTP-side).
  - CAS conflict and retry on RefStore PUT against a moving target.
  - List-prefix pagination boundary at exactly K × 1000 and K × 1000 + 1 entries, with continuation-token round-trips for K = 1, 2, and 3 (the 1000-, 2000-, and 3000-object boundary cases). Tests that the Connector returns ALL entries, not just the first page.
  - List-prefix sort order verification: backend MUST return results in lexicographic byte order; test vectors include adversarial keys (paths with adjacent base32 characters, numeric-looking suffixes) to catch locale-collation bugs.
  - ETag flavor compatibility: quoted ("abc123"), weak (W/"abc123"), multipart-style ("abc123-5"). Connector must round-trip opaquely for If-Match (§8.5.1) and canonicalize for cross-request comparison (§8.5.2).
  - Manifest Supremacy assertion: a steady-state query test that asserts NO LIST HTTP request is issued during the hot path. Bootstrap-only LIST is allowed and verified separately.
- OQ-31 (→ v0.1 spec): Multipart upload for large Bucket Objects (> 5 GB on S3, > 256 MB on Azure). v0 PUT assumes single-shot; very large Buckets need backend-specific multipart APIs. Out of scope for v0 because Bucket sizes are bounded to ~100 MB by 0007's splitting rule (OQ-24).
- OQ-32 (→ v0.1 spec): Pre-signed URLs for browser-based clients. The current spec assumes the SDK has direct backend access; web SDKs typically need a gateway that issues short-lived pre-signed URLs. Future spec MAY define a "presigning Connector" pattern.
Next: 0006-protocol-operations.md — defines the verbs (append, layer, query, stream) at the DreamDB Protocol level. Builds on this document's HTTP contract by composing those verbs into HTTP request sequences.