Dits for AI docs

Derivation & Recompute

When an artifact is reproducible, store the few-hundred-byte recipe instead of the terabytes of output.

The cheapest byte is the one you never store

Dedup layers shrink what you store (see Addressing), but many of the heaviest AI artifacts do not need to be stored at all — they are deterministically reproducible from inputs you already track. For those, Dits can store the recipe rather than the result: source references, the transform, and its parameters, seed, and config. That is a few hundred bytes standing in for gigabytes or terabytes.

What a derivable looks like

  • A quantized model is derivable from its source checkpoint plus the quantization method and settings.
  • A LoRA-merged model is just base + recipe — the base weights plus the adapter and merge parameters.
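  • A training checkpoint is reproducible from its data references, training config, and random seed, provided the training run itself is deterministic (pinned software versions and deterministic kernels).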

A recipe for the quantized-model case might look like this:

{
  "derive": "quantize",
  "source": "dits://model/llama-3-8b@blake3:ab12…",
  "params": { "method": "gptq", "bits": 4, "group_size": 128 },
  "seed": 0,
  "expected": "blake3:ef56…"
}
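
To make the "few hundred bytes" claim concrete, here is a minimal Python sketch that serializes the recipe above canonically and measures it. The helper names `canonical_bytes` and `recipe_id` are hypothetical and not part of any Dits API. BLAKE2b from the standard library stands in for BLAKE3, which would need a third-party package. The elided hashes ("…") are kept exactly as they appear in the example.

```python
import hashlib
import json

# Recipe copied from the example above; the "…" hashes are elided in the
# source and deliberately left as-is.
recipe = {
    "derive": "quantize",
    "source": "dits://model/llama-3-8b@blake3:ab12…",
    "params": {"method": "gptq", "bits": 4, "group_size": 128},
    "seed": 0,
    "expected": "blake3:ef56…",
}


def canonical_bytes(recipe: dict) -> bytes:
    """Serialize with sorted keys and no extra whitespace, so that recipes
    with the same content always produce the same bytes."""
    text = json.dumps(recipe, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def recipe_id(recipe: dict) -> str:
    """Hypothetical content address for a recipe.
    ASSUMPTION: BLAKE2b is a stand-in for BLAKE3 (not in the stdlib)."""
    return "blake2b:" + hashlib.blake2b(canonical_bytes(recipe), digest_size=32).hexdigest()


size = len(canonical_bytes(recipe))
print(f"recipe size: {size} bytes")
print(f"recipe id:   {recipe_id(recipe)}")
```

Sorting keys before hashing means two recipes that differ only in field order get the same identifier, which is one plausible way to make recipes themselves deduplicable.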

The storage-versus-compute tradeoff

Derivation trades storage for compute: you drop the bytes and pay to regenerate them when needed. That only makes sense if recomputation is actually available and costs less than keeping the output around. Dits' planned cloud compute layer is intended to execute that recomputation: materializing a derivable on demand from its recipe, verifying the result, and caching it where useful.
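
The materialize, verify, and cache flow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Dits implementation: `materialize`, `TRANSFORMS`, `toy_quantize`, `VerificationError`, and the `dits://example/source` reference are all hypothetical, the cache is a plain dictionary, and BLAKE2b again stands in for BLAKE3.

```python
import hashlib
from typing import Callable, Dict


def digest(data: bytes) -> str:
    # ASSUMPTION: BLAKE2b stands in for BLAKE3, which is not in the stdlib.
    return "blake2b:" + hashlib.blake2b(data, digest_size=32).hexdigest()


def toy_quantize(source: bytes, params: dict, seed: int) -> bytes:
    # Toy deterministic transform: keep only the top `bits` bits of each byte.
    # A real quantizer would go here; `seed` is unused in this toy version.
    shift = 8 - params["bits"]
    return bytes((b >> shift) << shift for b in source)


# Hypothetical registry mapping a recipe's "derive" name to a transform.
TRANSFORMS: Dict[str, Callable[[bytes, dict, int], bytes]] = {"quantize": toy_quantize}


class VerificationError(Exception):
    """Raised when a recomputed artifact does not match the recipe's expected hash."""


def materialize(recipe: dict, fetch_source: Callable[[str], bytes], cache: Dict[str, bytes]) -> bytes:
    expected = recipe["expected"]
    if expected in cache:  # cache hit: no recompute needed
        return cache[expected]
    source = fetch_source(recipe["source"])
    output = TRANSFORMS[recipe["derive"]](source, recipe["params"], recipe["seed"])
    actual = digest(output)
    if actual != expected:  # verify before trusting or caching the result
        raise VerificationError(f"expected {expected}, got {actual}")
    cache[expected] = output
    return output


# Usage with an in-memory stand-in for tracked sources.
store = {"dits://example/source": bytes(range(256))}
params = {"method": "toy", "bits": 4}
recipe = {
    "derive": "quantize",
    "source": "dits://example/source",
    "params": params,
    "seed": 0,
    "expected": digest(toy_quantize(store["dits://example/source"], params, 0)),
}
cache: Dict[str, bytes] = {}
first = materialize(recipe, store.__getitem__, cache)   # recomputed and verified
second = materialize(recipe, store.__getitem__, cache)  # served from cache
```

Keying the cache by the expected output hash means a cached artifact can be dropped at any time and rebuilt later, which is the storage-for-compute trade in miniature.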