Derivation & Recompute
When an artifact is reproducible, store the few-hundred-byte recipe instead of the terabytes of output.
The cheapest byte is the one you never store
Dedup layers shrink what you store (see Addressing), but many of the heaviest AI artifacts do not need to be stored at all — they are deterministically reproducible from inputs you already track. For those, Dits can store the recipe rather than the result: source references, the transform, and its parameters, seed, and config. That is a few hundred bytes standing in for gigabytes or terabytes.
What a derivable looks like
- A quantized model is derivable from its source checkpoint plus the quantization method and settings.
- A LoRA-merged model is just base + recipe: the base weights plus the adapter and merge parameters.
- A training checkpoint is reproducible from its data references, training config, and random seed.
{
  "derive": "quantize",
  "source": "dits://model/llama-3-8b@blake3:ab12…",
  "params": { "method": "gptq", "bits": 4, "group_size": 128 },
  "seed": 0,
  "expected": "blake3:ef56…"
}

The storage-versus-compute tradeoff
Derivation trades storage for compute: you drop the bytes and pay to regenerate them when needed. That only makes sense if recomputation is actually available and reasonably cheap. Dits' planned cloud compute layer is what executes recomputation — materializing a derivable on demand from its recipe, verifying the result, and caching it where useful.
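The materialize, verify, and cache steps described above can be sketched roughly as follows. This is a minimal illustration, not Dits' actual implementation: `materialize`, `toy_quantize`, the transform registry, and the in-memory cache are all hypothetical names, and BLAKE2b from Python's standard library stands in for BLAKE3, which the standard library does not provide.

```python
import hashlib
from typing import Callable, Dict

# A transform takes (source bytes, params, seed) and returns output bytes.
Transform = Callable[[bytes, dict, int], bytes]


def content_hash(data: bytes) -> str:
    # Stand-in hash: BLAKE3 is not in Python's standard library, so this
    # sketch uses BLAKE2b and labels the digest accordingly.
    return "blake2b:" + hashlib.blake2b(data, digest_size=32).hexdigest()


def toy_quantize(source: bytes, params: dict, seed: int) -> bytes:
    # Toy stand-in for a real quantizer: keep only the top `bits` bits of
    # each byte. The seed is unused because this transform has no randomness.
    shift = 8 - params["bits"]
    return bytes(b >> shift for b in source)


def materialize(recipe: dict, source: bytes,
                transforms: Dict[str, Transform],
                cache: Dict[str, bytes]) -> bytes:
    expected = recipe["expected"]
    if expected in cache:                  # already materialized earlier
        return cache[expected]
    transform = transforms[recipe["derive"]]
    output = transform(source, recipe["params"], recipe["seed"])
    actual = content_hash(output)
    if actual != expected:                 # recompute did not reproduce
        raise ValueError(f"hash mismatch: expected {expected}, got {actual}")
    cache[expected] = output               # cache only verified bytes
    return output


# Hypothetical usage: the expected hash is pinned while the original exists.
source = bytes(range(256))
params = {"bits": 4}
recipe = {
    "derive": "quantize",
    "params": params,
    "seed": 0,
    "expected": content_hash(toy_quantize(source, params, 0)),
}
cache: Dict[str, bytes] = {}
output = materialize(recipe, source, {"quantize": toy_quantize}, cache)
```

Keying the cache by the expected hash means a second request for the same derivable returns the cached bytes without rerunning the transform, and bytes that fail verification are never cached.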
The recipe carries the expected content hash of the output, so a mismatch after recomputation is detected, and the original is discarded only once determinism is pinned and verified.
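One way to read "pinned and verified" is a gate that rebuilds the artifact several times and allows the original to be dropped only if every rebuild hashes to the same value as the original. The sketch below assumes that reading. `safe_to_discard` and both rebuild functions are hypothetical, and BLAKE2b again stands in for BLAKE3.

```python
import hashlib
import random
from typing import Callable


def content_hash(data: bytes) -> str:
    # Stand-in for BLAKE3, which is not in the standard library.
    return "blake2b:" + hashlib.blake2b(data, digest_size=32).hexdigest()


def safe_to_discard(original: bytes, rebuild: Callable[[], bytes],
                    runs: int = 3) -> bool:
    # The original may be dropped only if every rebuild reproduces its hash.
    expected = content_hash(original)
    return all(content_hash(rebuild()) == expected for _ in range(runs))


def seeded_rebuild() -> bytes:
    # Pinned: a fresh generator with a fixed seed on every call.
    rng = random.Random(0)
    return bytes(rng.randrange(256) for _ in range(64))


shared_rng = random.Random(0)


def unpinned_rebuild() -> bytes:
    # Not pinned: draws from shared generator state, so the first call
    # matches the seeded output but later calls continue the stream.
    return bytes(shared_rng.randrange(256) for _ in range(64))


original = seeded_rebuild()
pinned_ok = safe_to_discard(original, seeded_rebuild)
unpinned_ok = safe_to_discard(original, unpinned_rebuild)
```

The unpinned case shows why a single successful rebuild is weak evidence: its first run matches the original, and only the repeated runs expose the hidden state.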