From bytes to commits, explained simply
Dits versions huge media files by storing the unique pieces of your data exactly once. Here's the whole idea — in plain English first, with the technical detail right behind it.
The whole idea in three words
Chunk, hash, deduplicate. Everything else is an optimization on top of these three.
Content-defined chunking
Most tools cut files into fixed-size blocks. That breaks the moment you insert a single byte near the start: every block after it shifts, so nothing lines up and deduplication collapses. Dits uses FastCDC, which picks cut points based on the bytes themselves. Insert something near the start and only the chunk or two around the insertion change; the rest are byte-for-byte identical.
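To make the shift problem concrete, here is a minimal sketch in Python. It is not FastCDC itself and not Dits' code: it uses a simplified gear-style rolling hash with a single mask, and the names (`GEAR`, `fixed_chunks`, `cdc_chunks`, `shared`) and the tiny chunk sizes are assumptions chosen for illustration only.

```python
import random

# Hypothetical "gear" table: 256 random 64-bit values, seeded for repeatability.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def fixed_chunks(data: bytes, size: int = 64) -> list[bytes]:
    """Cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data: bytes, min_size: int = 32, avg_bits: int = 6,
               max_size: int = 256) -> list[bytes]:
    """Cut where the low `avg_bits` bits of a rolling gear hash are zero.

    Simplified content-defined chunking: the cut decision depends on the
    most recent bytes, bounded by min_size and max_size.
    """
    mask = (1 << avg_bits) - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def shared(a: list[bytes], b: list[bytes]) -> int:
    """Number of distinct chunks that appear in both lists."""
    return len(set(a) & set(b))


# Random test data, then the same data with one byte inserted near the start.
rng = random.Random(1)
original = bytes(rng.getrandbits(8) for _ in range(20_000))
edited = original[:10] + b"!" + original[10:]

print("fixed-size shared chunks:",
      shared(fixed_chunks(original), fixed_chunks(edited)))
print("content-defined shared chunks:",
      shared(cdc_chunks(original), cdc_chunks(edited)),
      "of", len(cdc_chunks(original)))
```

With fixed-size blocks, the one-byte insertion misaligns every block after it. With content-defined cuts, the boundaries re-synchronize shortly after the edit, so most chunks keep their exact bytes.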
Chunk sizes are tuned per file type. Smaller chunks deduplicate more finely but add bookkeeping; larger chunks keep manifests small for huge media. The real defaults:
| Profile | Min | Avg | Max | Best for |
|---|---|---|---|---|
| project | 4 KB | 16 KB | 64 KB | Project files, XML, JSON, configs |
| default | 16 KB | 64 KB | 256 KB | General-purpose balance |
| media | 64 KB | 256 KB | 1 MB | Video, 3D, large binaries |
Dits picks a profile automatically from file size, and a media-aware path aligns chunk boundaries to video keyframes.
Content addressing with BLAKE3
Each chunk is identified by the BLAKE3 hash of its bytes — a 32-byte fingerprint. Because the name is derived from the content, identical chunks collide on the same name by design. That's the whole trick behind deduplication: to store a chunk, you first ask “do I already have something with this name?” If yes, you store nothing.
chunk A ──blake3──▶ 7f3a91c2… (store it)
chunk B ──blake3──▶ a91c02bd… (store it)
chunk C ──blake3──▶ 7f3a91c2… (same as A → store nothing, just point at it)
BLAKE3 is fast and parallelizable, so hashing keeps up with chunking even on large files. The same hash also verifies integrity: re-hash a chunk and compare to detect any corruption.
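The store-or-skip decision and the integrity check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Dits' storage layer: `ChunkStore` and `chunk_id` are hypothetical names, the store is an in-memory dict, and because the Python standard library has no BLAKE3, the sketch substitutes BLAKE2b configured for a 32-byte digest.

```python
import hashlib


def chunk_id(data: bytes) -> str:
    # Stand-in for BLAKE3 (assumption): BLAKE2b with a 32-byte digest, stdlib only.
    return hashlib.blake2b(data, digest_size=32).hexdigest()


class ChunkStore:
    """Hypothetical in-memory store keyed by content hash."""

    def __init__(self) -> None:
        self._chunks: dict[str, bytes] = {}

    def put(self, data: bytes) -> tuple[str, bool]:
        """Return (id, stored). `stored` is False if the chunk already existed."""
        cid = chunk_id(data)
        if cid in self._chunks:
            return cid, False  # same name, same content: store nothing
        self._chunks[cid] = data
        return cid, True

    def get(self, cid: str) -> bytes:
        """Fetch a chunk and verify it by re-hashing."""
        data = self._chunks[cid]
        if chunk_id(data) != cid:
            raise ValueError(f"chunk {cid[:8]} failed its integrity check")
        return data

    def __len__(self) -> int:
        return len(self._chunks)


store = ChunkStore()
# A "file" made of three chunks, where the third repeats the first.
manifest = [store.put(c)[0] for c in (b"chunk A", b"chunk B", b"chunk A")]
restored = b"".join(store.get(cid) for cid in manifest)

print("chunks in file:", len(manifest), "chunks stored:", len(store))
```

The manifest lists three chunk names, but the store holds only two chunks, and rebuilding the file is just concatenating the chunks the manifest points at.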
Format-aware handling & FACR
Generic chunking is byte-blind, and on a full video re-export that's a real weakness (see below). Dits' advantage is that it can understand the file. For MP4/ISOBMFF it parses the container, separates the metadata atoms (moov) from the media data (mdat), and reconstructs byte-exactly. On top of that, FACR (frame-addressable, content-addressed video) makes individual frames the unit of dedup.
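As a rough illustration of the container-splitting step, the sketch below walks the top-level boxes of an ISOBMFF buffer, separates `mdat` from everything else, and checks that concatenating the pieces reproduces the input byte for byte. It is an assumption-laden sketch, not Dits' parser: `split_boxes` and `make_box` are hypothetical helpers, only top-level boxes are handled, and the sample is a synthetic buffer rather than a playable MP4.

```python
import struct


def split_boxes(data: bytes) -> list[tuple[str, bytes]]:
    """Split a buffer into top-level boxes as (type, raw bytes including header)."""
    boxes, pos = [], 0
    while pos < len(data):
        if len(data) - pos < 8:
            raise ValueError("truncated box header")
        # Each box starts with a 32-bit big-endian size and a 4-byte type.
        size, kind = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # size == 1 means a 64-bit "largesize" follows the type field.
            if len(data) - pos < 16:
                raise ValueError("truncated largesize field")
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # size == 0 means the box extends to the end of the file.
            size = len(data) - pos
        if size < header or pos + size > len(data):
            raise ValueError("box size out of range")
        boxes.append((kind.decode("latin-1"), data[pos:pos + size]))
        pos += size
    return boxes


def make_box(kind: bytes, payload: bytes) -> bytes:
    """Build a simple box with a 32-bit size (helper for the synthetic sample)."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload


# Synthetic sample: ftyp, moov (metadata), mdat (media data).
sample = (
    make_box(b"ftyp", b"isom\x00\x00\x02\x00isom")
    + make_box(b"moov", b"placeholder metadata")
    + make_box(b"mdat", b"\x00" * 32)
)

boxes = split_boxes(sample)
metadata = [raw for kind, raw in boxes if kind != "mdat"]
media = [raw for kind, raw in boxes if kind == "mdat"]
rebuilt = b"".join(raw for _, raw in boxes)

print([kind for kind, _ in boxes], "byte-exact:", rebuilt == sample)
```

Keeping each box's raw bytes, header included, is what makes byte-exact reconstruction trivial here; a real implementation would also need to descend into `moov` to do anything frame-aware.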
Where Dits wins — and where it loses
Numbers from a single-machine benchmark spike comparing git-lfs, restic, borg, xdelta3, and dits. See the full methodology on the benchmarks page. Showing the loss case is deliberate — it's what makes the wins credible.
Hybrid Git + Dits storage
Text and code are already well served by Git, so Dits keeps using libgit2 for them — you get real diff, merge, blame, and history. Binary and media files go through the chunk-and-dedup path instead. One repository, the right strategy per file, and full Git-style operations across the whole project.
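The per-file routing can be pictured with a small sketch. The source does not say how Dits decides which path a file takes, so everything here is an assumption: `looks_binary` and `route` are hypothetical names, and the rule (a NUL byte in the first 8000 bytes means binary) is borrowed from the heuristic Git itself uses to detect binary content.

```python
def looks_binary(data: bytes, probe: int = 8000) -> bool:
    """Heuristic (assumption): a NUL byte in the first `probe` bytes means binary."""
    return b"\x00" in data[:probe]


def route(data: bytes) -> str:
    """Return 'git' for text-like content, 'chunks' for binary content."""
    return "chunks" if looks_binary(data) else "git"


# Hypothetical working tree: one text project file, one binary media file.
files = {
    "edit.xml": b"<project><clip src='clip.mp4'/></project>\n",
    "clip.mp4": b"\x00\x00\x00\x18ftypisom" + b"\x00" * 64,
}
plan = {name: route(data) for name, data in files.items()}

for name, backend in plan.items():
    print(f"{name}: {backend}")
```

A content probe like this is only one option; a real tool could just as well route by extension, size, or explicit configuration.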
Working with terabytes: VFS, locking & sync
These pieces complete the picture for teams. We're being explicit about what ships today versus what's still on the roadmap, so you can plan honestly.
Git vs Dits, honestly
Git is excellent at what it was built for: text. Dits is built for the heavy stuff. Where a capability isn't shipped yet, we mark it Roadmap rather than claim it.
| Capability | Dits | Git | Git LFS |
|---|---|---|---|
| Large file handling | Yes: native, no extensions | No: practically unusable | Partial: pointer files, separate store |
| Cross-file deduplication | Yes: automatic, content-addressed | No: none | No: full file copies |
| Format-aware (MP4 atom) handling | Yes: moov / mdat parse + reconstruct | No: byte-blind | No: byte-blind |
| Frame-addressable video (FACR) | Experimental: frame-level diff/dedup | No: not applicable | No: not applicable |
| Convergent encryption | Yes: AES-256-GCM, dedup-friendly | No: none | No: none |
| Hybrid text + binary storage | Yes: libgit2 for text, chunks for binary | Partial: great for text only | Partial: Git for text, LFS for binary |
| On-demand file hydration (VFS) | Partial: FUSE/WinFSP mount works locally; remote hydration is roadmap | No: full checkout | Partial: manual selection |
| Networked push / pull / sync | Roadmap: QUIC delta sync (scaffolding today) | Yes: mature | Yes: mature |
| Open source | Yes: Apache-2.0 / MIT | Yes: GPL v2 | Yes: MIT |
See it on your own files
Read the deep-dive docs, or watch the engine chunk and deduplicate your data in the browser (coming soon).