Core Concepts

Derivation & Recompute

When an artifact is reproducible, store the few-hundred-byte recipe instead of the terabytes of output.

Warning

This is a roadmap and design topic, not a shipped feature. Today only L1 ships: exact content addressing, FastCDC deduplication, BLAKE3 hashing, and local history. Derivation-addressing (L3) is described here as planned design.

The cheapest byte is the one you never store

Dedup layers shrink what you store (see Addressing), but many of the heaviest AI artifacts do not need to be stored at all, because they are deterministically reproducible from inputs you already track. For those, the planned design has Dits store the recipe rather than the result: the source references, the transform, and the transform's parameters, seed, and config. That is a few hundred bytes standing in for gigabytes or terabytes.

What a derivable looks like

A quantized model is derivable from its source checkpoint plus the quantization method and settings.
A LoRA-merged model is just base + recipe — the base weights plus the adapter and merge parameters.
A training checkpoint is reproducible from its data references, training config, and random seed.

{
  "derive": "quantize",
  "source": "dits://model/llama-3-8b@blake3:ab12…",
  "params": { "method": "gptq", "bits": 4, "group_size": 128 },
  "seed": 0,
  "expected": "blake3:ef56…"
}
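
To make the size claim concrete, here is a minimal Python sketch that serializes the recipe above and measures it. The canonical form (sorted keys, compact separators) and the `recipe_key` helper are assumptions for illustration, not Dits' planned format. `hashlib.blake2b` stands in for BLAKE3, which is not in the Python standard library, and the elided hashes are kept exactly as they appear above.

```python
import hashlib
import json

# The recipe from the example above; the elided hashes ("…") are kept as-is.
recipe = {
    "derive": "quantize",
    "source": "dits://model/llama-3-8b@blake3:ab12…",
    "params": {"method": "gptq", "bits": 4, "group_size": 128},
    "seed": 0,
    "expected": "blake3:ef56…",
}


def canonical_bytes(recipe: dict) -> bytes:
    """Serialize with sorted keys and no extra whitespace (assumed canonical form)."""
    return json.dumps(recipe, sort_keys=True, separators=(",", ":")).encode("utf-8")


def recipe_key(recipe: dict) -> str:
    """Hypothetical derivation address: a hash of the canonical recipe bytes.

    BLAKE2b is a stand-in for BLAKE3 here.
    """
    digest = hashlib.blake2b(canonical_bytes(recipe), digest_size=32).hexdigest()
    return "blake2b:" + digest


recipe_size = len(canonical_bytes(recipe))
print(f"recipe is {recipe_size} bytes, key starts with {recipe_key(recipe)[:24]}")
```

Because the keys are sorted before hashing, two recipes with the same fields in a different order map to the same key in this sketch.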

The storage-versus-compute tradeoff

Derivation trades storage for compute: you drop the bytes and pay to regenerate them when needed. That only makes sense if recomputation is actually available and reasonably cheap relative to keeping the bytes. Dits' planned cloud compute layer is what would execute recomputation: materializing a derivable on demand from its recipe, verifying the result against the expected hash, and caching it where useful.
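
The materialize, verify, and cache steps can be sketched in a few lines of Python. Everything here is hypothetical: the `TRANSFORMS`, `SOURCES`, and `CACHE` registries, the `materialize` function, and the toy transform are illustrations, not Dits' planned API. `hashlib.blake2b` again stands in for BLAKE3.

```python
import hashlib
from typing import Callable, Dict


def content_hash(data: bytes) -> str:
    """BLAKE2b stand-in for BLAKE3 (not in the standard library)."""
    return "blake2b:" + hashlib.blake2b(data, digest_size=32).hexdigest()


def toy_quantize(source: bytes, params: dict, seed: int) -> bytes:
    """Toy transform: keep only the top `bits` bits of each byte. Not real quantization."""
    shift = 8 - params["bits"]
    return bytes((b >> shift) << shift for b in source)


# Hypothetical registries: transforms by name, tracked sources by reference,
# and cached outputs keyed by their content hash.
TRANSFORMS: Dict[str, Callable[[bytes, dict, int], bytes]] = {"quantize": toy_quantize}
SOURCES: Dict[str, bytes] = {"dits://model/toy": bytes(range(256))}
CACHE: Dict[str, bytes] = {}


def materialize(recipe: dict) -> bytes:
    """Return the derived bytes: from cache if present, else recompute and verify."""
    expected = recipe["expected"]
    if expected in CACHE:
        return CACHE[expected]
    transform = TRANSFORMS[recipe["derive"]]
    output = transform(SOURCES[recipe["source"]], recipe["params"], recipe["seed"])
    actual = content_hash(output)
    if actual != expected:
        raise ValueError(f"recompute mismatch: expected {expected}, got {actual}")
    CACHE[expected] = output
    return output


# Record the expected hash once, while the original output still exists...
original = toy_quantize(SOURCES["dits://model/toy"], {"bits": 4}, 0)
recipe = {
    "derive": "quantize",
    "source": "dits://model/toy",
    "params": {"bits": 4},
    "seed": 0,
    "expected": content_hash(original),
}

# ...after which the output can be regenerated from the recipe alone.
rebuilt = materialize(recipe)
```

In this sketch a result is cached only after its hash matches, so a bad recompute never ends up stored under the expected address.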

Important

A derivable is only safe to drop if its output is guaranteed reproducible. Nondeterminism (library or driver version drift, nondeterministic GPU kernels, unpinned randomness, floating-point reduction order) can make recompute return different bytes. Every recipe carries an expected content hash so a mismatch is detected, and the original is discarded only once the environment is pinned and a recompute has been verified against that hash.
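
A minimal sketch of that gate, assuming a hypothetical `safe_to_drop` check that recomputes several times and compares each result to the expected hash. The two toy producers contrast pinned and unpinned randomness, and the last lines show how floating-point reduction order alone changes the bytes. `hashlib.blake2b` is a stand-in for BLAKE3.

```python
import hashlib
import random
import struct
from typing import Callable


def content_hash(data: bytes) -> str:
    """BLAKE2b stand-in for BLAKE3 (not in the standard library)."""
    return "blake2b:" + hashlib.blake2b(data, digest_size=32).hexdigest()


def safe_to_drop(recompute: Callable[[], bytes], expected: str, runs: int = 3) -> bool:
    """True only if every recompute reproduces the expected hash (hypothetical gate)."""
    return all(content_hash(recompute()) == expected for _ in range(runs))


def seeded_output() -> bytes:
    """Randomness pinned by a fixed seed: the same bytes on every call."""
    rng = random.Random(0)
    return bytes(rng.getrandbits(8) for _ in range(16))


def unseeded_output() -> bytes:
    """Unpinned randomness: a fresh generator seeded from system entropy."""
    rng = random.Random()
    return bytes(rng.getrandbits(8) for _ in range(16))


pinned_ok = safe_to_drop(seeded_output, content_hash(seeded_output()))
unpinned_ok = safe_to_drop(unseeded_output, content_hash(unseeded_output()))

# Reduction order: the same three addends, grouped differently, give different bytes.
left_to_right = struct.pack("<d", (0.1 + 0.2) + 0.3)
right_to_left = struct.pack("<d", 0.1 + (0.2 + 0.3))
order_matters = left_to_right != right_to_left
```

Passing a few local runs is weaker evidence than the guarantee the text asks for: it cannot rule out drift from a later library or driver upgrade, which is why the recipe keeps its expected hash for every future recompute.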

Tip

Sibling concepts: Similarity Dedup (store a small delta) and Tensor-Aware Chunking (diff in the tensor domain). See the How it works overview and the Roadmap for where this lands.