DreamDB v0.2.0 bec026

TypeScript SDK Reference

The @dreamlake/dreamdb package provides a TypeScript-first interface to the DreamDB protocol. It mirrors the Python SDK's design while using idiomatic TypeScript patterns (async iterators, typed generics, tree-shakeable exports).

Core concepts

| Concept | Description |
| --- | --- |
| Schema | Declares the fields (image, embedding, scalar, video, audio) a dataset contains. Chainable builder API. |
| Dataset | A versioned collection of multimodal records anchored to a shared timeline. |
| Space | A resolved view of a dataset's Manifest and Tracks, ready for reads. |
| Backend | An S3-compatible HTTP endpoint where Objects are stored (MinIO, AWS S3, R2, etc.). |

Creating a Schema

A Schema declares the shape of every record in a dataset. Methods are chainable:

ts
import { Schema } from "@dreamlake/dreamdb";

const schema = new Schema()
  .addImage("image", { mime: "jpeg" })
  .addEmbedding("embedding", {
    dim: 512,
    algorithm: "dreamdb.ivf-cosine",
    rerank: true,
  })
  .addScalarCategorical("label");

Schema field methods

| Method | Description |
| --- | --- |
| addImage(name, opts?) | Image field. opts.mime defaults to "jpeg". |
| addVideo(name, opts?) | Video field. opts.mime defaults to "mp4". |
| addAudio(name, opts?) | Audio field. opts.mime defaults to "wav". |
| addEmbedding(name, opts) | Vector embedding. Requires dim. Optional algorithm, lshBits, compressor, spatialIndex, rerank. |
| addScalarCategorical(name) | Categorical string scalar (e.g. labels, splits). |
| addScalarBool(name) | Boolean scalar. |
| addScalarInt(name) | 64-bit integer scalar. |
| addScalarFloat(name) | 64-bit float scalar. |
| addScalarString(name) | Free-form string scalar. |
| addScalarTimestamp(name) | Nanosecond-precision timestamp scalar. |

Every field method also accepts an optional parameter named required, which defaults to true.

Opening and creating datasets

ts
import { Dataset } from "@dreamlake/dreamdb";

const backend = "http://localhost:9000/demo";

// Create a new dataset with the schema defined above.
const ds = await Dataset.create("my-dataset", schema, backend);

// Open an existing dataset (schema is recovered from the Manifest).
const ds2 = await Dataset.open("my-dataset", { backend });

Appending data

Records are plain objects keyed by field name. Embedding values are number[] or Float32Array.

ts
await ds.appendMany([
  {
    image: jpegBytes,            // Uint8Array
    embedding: clipVector,       // number[] of length 512
    label: "tabby cat",
  },
  {
    image: anotherJpeg,
    embedding: anotherVector,
    label: "golden retriever",
  },
]);
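
How the SDK validates embedding values is not described here, so a caller may want a pre-flight check before appending. The sketch below is a hypothetical helper (toEmbedding is not an SDK export) that accepts either representation, checks the length against the dim declared in the schema, and rejects NaN or infinite components.

```typescript
// Hypothetical pre-flight helper; not part of @dreamlake/dreamdb.
type EmbeddingInput = number[] | Float32Array;

function toEmbedding(value: EmbeddingInput, dim: number): Float32Array {
  if (value.length !== dim) {
    throw new RangeError(`expected ${dim} dimensions, got ${value.length}`);
  }
  // Copy into a Float32Array so both input forms end up in one representation.
  const out = Float32Array.from(value);
  out.forEach((component, i) => {
    if (!Number.isFinite(component)) {
      throw new RangeError(`non-finite value at index ${i}`);
    }
  });
  return out;
}
```

With the schema above, a call such as toEmbedding(clipVector, 512) would run before the record is handed to appendMany.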

Caller-specified time anchors

By default, each record receives a nanosecond timestamp as its anchor. To supply explicit anchors (e.g. when replaying a log file), include the reserved _anchor key:

ts
await ds.appendMany([
  { _anchor: 1779083474_791_115_000n, image: imgA, label: "cat" },
  { _anchor: 1779083474_791_116_000n, image: imgB, label: "dog" },
]);

Within a single appendMany call, either every record must carry _anchor or none may -- mixing is rejected.
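
The all-or-none rule can be checked on the client before a batch is sent. The following is a minimal sketch; anchorMode and AnchorMode are illustrative names, not SDK exports, and the SDK's own error for mixed batches may differ.

```typescript
// Hypothetical helper: classify a batch before passing it to appendMany.
type AnchorMode = "explicit" | "implicit";

function anchorMode(
  records: ReadonlyArray<Record<string, unknown>>,
): AnchorMode {
  let withAnchor = 0;
  for (const record of records) {
    if (record["_anchor"] !== undefined) withAnchor++;
  }
  // No record carries _anchor: anchors are assigned by default.
  if (withAnchor === 0) return "implicit";
  // Every record carries _anchor: caller-specified anchors.
  if (withAnchor === records.length) return "explicit";
  throw new Error(
    `${withAnchor} of ${records.length} records carry _anchor; use all or none`,
  );
}
```

Splitting a mixed batch into two appendMany calls, one per mode, is one way to satisfy the rule.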

Vector queries

iterVector returns the top-K nearest neighbors to a query vector, optionally filtered by scalar predicates.

ts
const batches = await ds.iterVector({
  field: "embedding",
  query: clipTextEmbedding,   // number[]
  topK: 10,
  batchSize: 64,
  whereEq: { label: "cat" }, // optional scalar filter
});

for (const batch of batches) {
  const anchors = batch._time_anchors;  // bigint[]
  const labels = batch.label;           // string[]
  console.log(`Found ${anchors.length} results`);
}

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| field | string | Name of the embedding field in the schema. |
| query | number[] | Query vector (must match the field's dim). |
| topK | number | Total results to return across all batches. |
| batchSize | number | Rows per batch (default 64). |
| whereEq | Record<string, string \| number \| boolean> | Restrict results to records matching these scalar values. |
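
To make the meaning of topK and whereEq concrete, here is a brute-force reference in plain TypeScript. It is not how DreamDB executes the query (the schema example names an IVF index); Row, cosine, and bruteForceTopK are illustrative names, and the sketch assumes the scalar filter is applied before ranking, which this page does not specify.

```typescript
type Scalar = string | number | boolean;

interface Row {
  embedding: number[];
  scalars: Record<string, Scalar>;
}

// Cosine similarity; returns 0 when either vector has zero norm.
function cosine(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new RangeError(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Exhaustive scan: filter by equality, score every survivor, keep the best K.
function bruteForceTopK(
  rows: readonly Row[],
  query: readonly number[],
  topK: number,
  whereEq: Record<string, Scalar> = {},
): Row[] {
  const keys = Object.keys(whereEq);
  return rows
    .filter((row) => keys.every((k) => row.scalars[k] === whereEq[k]))
    .map((row) => ({ row, score: cosine(row.embedding, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((scored) => scored.row);
}
```

A scan like this is linear in the number of records, which is why an approximate index is normally used instead; it is still handy as a correctness baseline on small samples.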

Streaming iteration

For full-dataset scans (training, export, analytics), use iterStream to iterate all records with bounded memory:

ts
for await (const batch of ds.iterStream({ batchSize: 256 })) {
  const anchors = batch._time_anchors;
  // Process each batch; memory is bounded to one batch at a time.
}

Unlike eager methods that materialize the entire result set, iterStream yields one batch at a time. This is essential for datasets with millions of records.
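
The bounded-memory property comes from the general async-generator pattern rather than anything specific to DreamDB. As a rough illustration (inBatches is a hypothetical helper, and it yields row arrays rather than the column-oriented batches shown above), a batching iterator holds at most one batch before handing it to the consumer:

```typescript
// Groups items from any (async) iterable into arrays of at most batchSize.
// Only the batch currently being filled is held in memory.
async function* inBatches<T>(
  source: AsyncIterable<T> | Iterable<T>,
  batchSize: number,
): AsyncGenerator<T[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === batchSize) {
      yield batch;
      batch = []; // start a fresh array; the yielded one belongs to the consumer
    }
  }
  // Flush the final, possibly shorter, batch.
  if (batch.length > 0) yield batch;
}
```

Because the generator is suspended at each yield, the producer does not read ahead while the consumer is still processing the previous batch.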

Snapshots

Snapshots pin a named label to the current dataset state. They are immutable -- subsequent writes to the dataset do not affect the snapshot.

ts
// Create a snapshot.
const snap = await ds.snapshot("baseline-v1");
console.log(snap.label);     // "baseline-v1"
console.log(snap.manifest);  // base32 Manifest hash

// Open the dataset at a prior snapshot.
const old = await Dataset.openRef("baseline-v1", backend);

Branches and merging

Branches allow parallel writes that are later merged back into the main ref.

ts
// Create a branch from the current tip.
const branch = await ds.branch("ingest-worker-0");

// Write to the branch independently.
await branch.appendMany(workerSlice);

// Merge branches back into trunk.
const newManifest = await ds.mergeMany([
  "ingest-worker-0",
  "ingest-worker-1",
]);
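
The workerSlice value above has to come from somewhere. One simple option, sketched here with a hypothetical helper (sliceForWorkers is not an SDK function), is to cut the input into contiguous slices so that each branch receives records in their original order:

```typescript
// Splits items into `workers` contiguous slices of near-equal size.
// Trailing slices may be empty when there are fewer items than workers.
function sliceForWorkers<T>(items: readonly T[], workers: number): T[][] {
  if (!Number.isInteger(workers) || workers < 1) {
    throw new RangeError("workers must be a positive integer");
  }
  const size = Math.ceil(items.length / workers);
  return Array.from({ length: workers }, (_, w) =>
    items.slice(w * size, (w + 1) * size),
  );
}
```

Slice w would then be appended to the branch named ingest-worker-w, following the naming used in the example above.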

Merge strategies

| Strategy | Description |
| --- | --- |
| "fast-forward" | Advance the ref pointer without a new Manifest. Only works when the source is a strict descendant. |
| "union-tracks" | Three-way fused merge with per-cell reconciliation. Writes a multi-parent Manifest. |

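The "strict descendant" condition for a fast-forward can be pictured as a reachability test over the Manifest DAG. The sketch below works on a hypothetical in-memory map from Manifest hash to parent hashes; ParentMap and isStrictDescendant are illustrative names, and the SDK's internal check may be implemented differently.

```typescript
// Hypothetical in-memory view of the Manifest DAG: hash -> parent hashes.
type ParentMap = Map<string, string[]>;

// True when `target` is reachable from `source` by following parent links
// and the two are not the same Manifest.
function isStrictDescendant(
  parents: ParentMap,
  source: string,
  target: string,
): boolean {
  if (source === target) return false;
  const seen = new Set<string>();
  const stack: string[] = [...(parents.get(source) ?? [])];
  while (stack.length > 0) {
    const hash = stack.pop() as string;
    if (hash === target) return true;
    if (seen.has(hash)) continue; // multi-parent DAGs can revisit ancestors
    seen.add(hash);
    stack.push(...(parents.get(hash) ?? []));
  }
  return false;
}
```

When the check fails, the two refs have diverged and a merge that writes a new Manifest, such as "union-tracks", is needed instead.
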
Deletion (tombstones)

DreamDB supports GDPR-style deletion. Tombstoned anchors are suppressed on all subsequent reads. Storage reclamation is a separate operator pass.

ts
const newHash = await ds.delete(
  [1779083474_791_115_000n],
  { reason: "gdpr" }
);

// Inspect the effective tombstone set.
const suppressed = await ds.tombstoneSet();
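
DreamDB applies this suppression itself on reads; the following sketch only illustrates what it means for a column-oriented batch. Batch and dropTombstoned are hypothetical names, and the batch layout (one array per field plus _time_anchors) is assumed from the query examples on this page.

```typescript
// One array per column; all columns have the same length.
type Batch = Record<string, unknown[]>;

// Returns a copy of the batch without the rows whose anchor is tombstoned.
function dropTombstoned(
  batch: Batch,
  tombstones: ReadonlySet<unknown>,
  anchorColumn = "_time_anchors",
): Batch {
  const anchors = batch[anchorColumn] ?? [];
  // keep[i] is true when row i is still visible.
  const keep = anchors.map((anchor) => !tombstones.has(anchor));
  const out: Batch = {};
  for (const [name, column] of Object.entries(batch)) {
    out[name] = column.filter((_value, i) => keep[i]);
  }
  return out;
}
```

Set membership compares bigint anchors by value, so a Set built from the tombstoned anchors works directly with the bigint[] column.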

Inspection

ts
// Current Manifest hash (base32).
const hash = ds.currentManifest();

// Walk the Manifest DAG (newest first).
const history = await ds.history(50);
for (const entry of history) {
  console.log(entry.manifest, entry.ts, entry.writer);
}

// List all refs under the backend.
const refs = await ds.listRefs();

// Count visible records (excludes tombstoned anchors).
const n = await ds.count();

Compaction

After many appends, bucket fragmentation can slow queries. compact merges fragments back to one bucket per cell.

ts
const outcome = await ds.compact();
console.log(outcome);
// { manifest: "d2qx...", cellsExamined: 100,
//   cellsCompacted: 25, fragmentsCollapsed: 112 }

Compaction stays online for readers: queries keep hitting the old Manifest until the final atomic ref update. It is also idempotent and safe to run in production.

Browser usage

In the browser, use DreamDBSpace to resolve a Space URI and read samples directly from an S3-compatible backend -- no application server required:

ts
import { DreamDBSpace } from "@dreamlake/dreamdb";

const space = await DreamDBSpace.fromUri(
  "http://localhost:9000/my-bucket/refs/my-dataset"
);

// List all tracks (image, embedding, scalar, etc.).
const tracks = space.tracks();

// Materialize samples with scalar values and blob URLs.
const samples = await space.samples();

// Walk the Manifest history for time-travel.
const history = await space.history(20);

See the Browser Demo for a live example.