Dits for AI
Documentation
Content-addressed version control for the heaviest data in AI — model weights, checkpoints, and datasets. Same open engine as Dits for media, pointed at a different set of enormous files.
Note
Dits for AI runs on the same engine as Dits for media. Engine-level topics (installation, the CLI, repositories, configuration) are documented once in the core engine docs and apply identically to AI artifacts. The pages here cover what's specific to AI: addressing layers, tensor-aware chunking, and model/dataset workflows.
Start here
New to Dits? Read Why Dits for AI, then follow the Quick Start. If you want the mental model first, jump to Three-Layer Addressing.
Core Concepts
- Three-Layer Addressing. Exact, similar, and derived: the model that takes Dits past byte-match dedup.
- Content Addressing. How BLAKE3 hashes give every chunk a stable identity.
- Chunking & Deduplication. FastCDC content-defined chunking and where dedup pays off.
- Tensor-Aware Chunking. Why float weights defeat byte-level dedup, and the roadmap fix.
- Similarity Dedup. Dedupe near-duplicate samples and re-encodes, not just identical bytes.
- Derivation & Recompute. Store the recipe, recompute the artifact: zero-byte derivables.
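To make the content-addressing and tensor-chunking ideas above concrete, here is a minimal sketch of why byte-level dedup works for appended data but not for fine-tuned float weights. It is an illustration under stated assumptions, not how Dits is implemented: it uses fixed-size chunks instead of FastCDC, SHA-256 from the Python standard library as a stand-in for BLAKE3, and the names `chunk_ids` and `shared_fraction` are hypothetical helpers invented for this example.

```python
import hashlib
import random
import struct

CHUNK_SIZE = 64  # bytes; deliberately tiny so the toy data yields many chunks


def chunk_ids(data: bytes, size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return one content hash per chunk.

    Assumption: SHA-256 stands in for BLAKE3, and fixed-size splitting
    stands in for content-defined chunking.
    """
    return [
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    ]


def shared_fraction(a: bytes, b: bytes) -> float:
    """Fraction of b's chunks whose hash already exists among a's chunks."""
    known = set(chunk_ids(a))
    ids_b = chunk_ids(b)
    return sum(cid in known for cid in ids_b) / len(ids_b)


rng = random.Random(0)
weights = [rng.uniform(-1.0, 1.0) for _ in range(4096)]

# A toy "checkpoint": 4096 little-endian float32 values.
base = struct.pack(f"<{len(weights)}f", *weights)

# Case 1: the same bytes with a few new values appended.
extra = [rng.uniform(-1.0, 1.0) for _ in range(16)]
extended = base + struct.pack(f"<{len(extra)}f", *extra)

# Case 2: a toy "fine-tune" that nudges every weight slightly.
tuned = struct.pack(f"<{len(weights)}f", *[w + 1e-3 for w in weights])

print("appended data, chunks reused:", shared_fraction(base, extended))
print("nudged weights, chunks reused:", shared_fraction(base, tuned))
```

In this sketch the appended file reuses almost every chunk, while the nudged weights reuse none, because a small change to every float alters the bytes of every chunk and therefore every chunk hash.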