How it works

From bytes to commits, explained simply

Dits versions huge media files by storing the unique pieces of your data exactly once. Here's the whole idea — in plain English first, with the technical detail right behind it.

The whole idea in three words

Chunk, hash, deduplicate. Everything else is an optimization on top of these three.

Chunk
Files are split into variable-size pieces at boundaries chosen by the content itself — not at fixed offsets. A small edit only disturbs the chunks around it.
Hash
Every chunk is named by the BLAKE3 hash of its bytes. The name is the content, so two identical chunks always get the same name.
Deduplicate
A chunk that's already stored under that name is never stored again. Across versions, across files, across projects — you keep one copy.
Chunking pipeline: Binary file → FastCDC chunker → BLAKE3 hash → content-addressable store

Deduplication results

| Operation | New data stored | Why | Saved |
|---|---|---|---|
| Move file A→B | 0 bytes | Hashes match | 100% |
| Trim video start | ~5% of file | Only start chunks change | 95% |
| Append to file | Size of append only | Existing chunks reused | Varies |

Content-defined chunking

Most tools cut files into fixed-size blocks. That breaks the moment you insert a single byte near the start — every block after it shifts, so nothing lines up and deduplication collapses. Dits uses FastCDC, which picks cut points based on the bytes themselves. Insert something near the start and only the chunk around the insertion changes; the rest are byte-for-byte identical.

Fixed-size blocks
Insert one byte at the start → every following block shifts → 0% reuse. The file looks “entirely new” even though you changed almost nothing.
Content-defined (FastCDC)
Insert one byte → only the nearby chunk changes → 95%+ of chunks are reused. Boundaries follow the content, so they survive edits.
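
To make the comparison above concrete, here is a minimal sketch that chunks the same data both ways, inserts one byte at the start, and measures how many chunks are reused. It is not FastCDC and not Dits' code: the chunker is a simplified gear-style rolling hash, the sizes (64/256/1024 bytes) are tiny values chosen for the demo, and the names `cdc_chunks`, `fixed_chunks`, and `reuse_ratio` are made up for illustration. Because BLAKE3 is not in Python's standard library, a 32-byte BLAKE2b digest stands in for it.

```python
import hashlib
import random

# Deterministic 64-bit "gear" table: one pseudo-random value per byte value.
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:8], "big")
        for b in range(256)]
MASK64 = (1 << 64) - 1


def digest(chunk: bytes) -> bytes:
    # Stand-in for BLAKE3 (an assumption of this sketch): 32-byte BLAKE2b.
    return hashlib.blake2b(chunk, digest_size=32).digest()


def fixed_chunks(data: bytes, size: int = 256):
    """Cut at fixed offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data: bytes, min_size=64, avg_size=256, max_size=1024):
    """Cut where a rolling hash of the most recent bytes hits a pattern."""
    bits = avg_size.bit_length() - 1   # avg_size must be a power of two
    shift = 64 - bits                  # test the top `bits` bits of the hash
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Each shift pushes older bytes out, so h reflects the last 64 bytes.
        h = ((h << 1) + GEAR[byte]) & MASK64
        length = i - start + 1
        if (length >= min_size and (h >> shift) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])    # trailing partial chunk
    return chunks


def reuse_ratio(old_chunks, new_chunks) -> float:
    """Fraction of new chunks whose digest already exists among the old ones."""
    known = {digest(c) for c in old_chunks}
    return sum(digest(c) in known for c in new_chunks) / len(new_chunks)


def demo(n: int = 200_000):
    rng = random.Random(42)
    original = bytes(rng.getrandbits(8) for _ in range(n))
    edited = b"X" + original           # insert a single byte at the start
    fixed = reuse_ratio(fixed_chunks(original), fixed_chunks(edited))
    cdc = reuse_ratio(cdc_chunks(original), cdc_chunks(edited))
    return fixed, cdc


if __name__ == "__main__":
    fixed, cdc = demo()
    print(f"fixed-size reuse: {fixed:.1%}, content-defined reuse: {cdc:.1%}")
```

The key property is that, once a chunk is at least `min_size` long, the cut decision depends only on the last 64 bytes of content, so the two runs fall back into step shortly after the inserted byte.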

Chunk sizes are tuned per file type. Smaller chunks deduplicate more finely but add bookkeeping; larger chunks keep manifests small for huge media. The real defaults:

| Profile | Min | Avg | Max | Best for |
|---|---|---|---|---|
| project | 4 KB | 16 KB | 64 KB | Project files, XML, JSON, configs |
| default | 16 KB | 64 KB | 256 KB | General-purpose balance |
| media | 64 KB | 256 KB | 1 MB | Video, 3D, large binaries |

Dits picks a profile automatically from file size, and a media-aware path aligns chunk boundaries to video keyframes.
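
The bookkeeping trade-off can be estimated with simple arithmetic. The sketch below takes the three profiles listed above and estimates how many chunks a file produces at each average size. It assumes "KB" means 1024 bytes and that a manifest stores one 32-byte hash per chunk; the real manifest layout is not described here, and the function names are hypothetical. Actual chunk counts vary with content, since the average is only a target.

```python
KIB = 1024

# (min, avg, max) chunk sizes in bytes, taken from the profile list.
PROFILES = {
    "project": (4 * KIB, 16 * KIB, 64 * KIB),
    "default": (16 * KIB, 64 * KIB, 256 * KIB),
    "media": (64 * KIB, 256 * KIB, 1024 * KIB),
}

HASH_BYTES = 32  # size of one chunk fingerprint


def expected_chunks(file_size: int, profile: str) -> int:
    """Rough chunk count: file size divided by the average, rounded up."""
    _min, avg, _max = PROFILES[profile]
    return -(-file_size // avg)  # ceiling division


def manifest_hash_bytes(file_size: int, profile: str) -> int:
    """Bytes of hashes needed to list every chunk (assumed manifest shape)."""
    return expected_chunks(file_size, profile) * HASH_BYTES


if __name__ == "__main__":
    size = 10 * 1024**3  # a 10 GiB file
    for name in PROFILES:
        count = expected_chunks(size, name)
        mib = manifest_hash_bytes(size, name) / 1024**2
        print(f"{name:8} ~{count:>7} chunks, ~{mib:.2f} MiB of hashes")
```

For a 10 GiB file this gives roughly 655,360 chunks under the project profile and 40,960 under media, which is why large media files get larger chunks.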

Content addressing with BLAKE3

Each chunk is identified by the BLAKE3 hash of its bytes — a 32-byte fingerprint. Because the name is derived from the content, identical chunks collide on the same name by design. That's the whole trick behind deduplication: to store a chunk, you first ask “do I already have something with this name?” If yes, you store nothing.

chunk A  ──blake3──▶  7f3a91c2…  (store it)
chunk B  ──blake3──▶  a91c02bd…  (store it)
chunk C  ──blake3──▶  7f3a91c2…  (same as A → store nothing, just point at it)

BLAKE3 is fast and parallelizable, so hashing keeps up with chunking even on large files. The same hash also verifies integrity: re-hash a chunk and compare to detect any corruption.
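
The "ask before you store" rule and the integrity check fit in a few lines. This is a minimal in-memory sketch, not Dits' storage layer: the `ChunkStore` class and its methods are hypothetical, and 32-byte BLAKE2b stands in for BLAKE3 because BLAKE3 is not in Python's standard library.

```python
import hashlib


class ChunkStore:
    """Toy content-addressable store: the key is the hash of the value."""

    def __init__(self):
        self._chunks = {}        # name (hex digest) -> chunk bytes
        self.bytes_written = 0   # how much new data was actually stored

    @staticmethod
    def name_of(chunk: bytes) -> str:
        # Stand-in for BLAKE3 (an assumption of this sketch).
        return hashlib.blake2b(chunk, digest_size=32).hexdigest()

    def put(self, chunk: bytes) -> str:
        name = self.name_of(chunk)
        if name not in self._chunks:       # "do I already have this name?"
            self._chunks[name] = bytes(chunk)
            self.bytes_written += len(chunk)
        return name                        # the caller keeps only the name

    def get(self, name: str) -> bytes:
        chunk = self._chunks[name]
        if self.name_of(chunk) != name:    # re-hash to detect corruption
            raise ValueError(f"chunk {name[:8]} failed its integrity check")
        return chunk


if __name__ == "__main__":
    store = ChunkStore()
    a = store.put(b"chunk A bytes")
    b = store.put(b"chunk B bytes")
    c = store.put(b"chunk A bytes")        # same content as A
    print(a == c, store.bytes_written)     # same name, nothing new stored
```

A file version then becomes an ordered list of names, and storing a second version costs only the chunks whose names are new.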

Format-aware handling & FACR

the differentiator

Generic chunking is byte-blind, and on a full video re-export that's a real weakness (see below). Dits' advantage is that it can understand the file. For MP4/ISOBMFF it parses the container, separates the metadata atoms (moov) from the media data (mdat), and reconstructs byte-exactly. On top of that, FACR (frame-addressable, content-addressed video) makes individual frames the unit of dedup.
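
To illustrate what "parsing the container" means, the sketch below walks the top-level boxes of an ISOBMFF file and separates `mdat` from everything else. It relies only on the standard box header layout (a 4-byte big-endian size and a 4-byte type, with size 1 meaning a 64-bit size follows and size 0 meaning the box runs to the end of the file). It is not Dits' parser; the function names are hypothetical and the sample file is synthetic.

```python
import struct


def iter_boxes(data: bytes):
    """Yield (type, start, end) for each top-level ISOBMFF box."""
    pos = 0
    while pos < len(data):
        if pos + 8 > len(data):
            raise ValueError("truncated box header")
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:                      # 64-bit "largesize" follows
            if pos + 16 > len(data):
                raise ValueError("truncated largesize header")
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:                    # box extends to end of file
            size = len(data) - pos
        if size < header or pos + size > len(data):
            raise ValueError("box size out of range")
        yield box_type.decode("latin-1"), pos, pos + size
        pos += size


def split_metadata_and_media(data: bytes):
    """Separate media data (mdat) from metadata boxes (moov, ftyp, ...)."""
    meta, media = [], []
    for box_type, start, end in iter_boxes(data):
        (media if box_type == "mdat" else meta).append((box_type, start, end))
    return meta, media


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a simple box with a 32-bit size (for the synthetic sample)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


if __name__ == "__main__":
    sample = (make_box(b"ftyp", b"isom\x00\x00\x02\x00")
              + make_box(b"moov", b"m" * 20)
              + make_box(b"mdat", b"v" * 100))
    boxes = list(iter_boxes(sample))
    print([t for t, _, _ in boxes])
    # Byte-exact reconstruction: the box ranges tile the whole file.
    print(b"".join(sample[s:e] for _, s, e in boxes) == sample)
```

Because the box ranges cover the file with no gaps, the original bytes can be rebuilt exactly by concatenating them in order, even if metadata and media are stored separately.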

Where Dits wins — and where it loses

Full re-export of a video (honest loss)
When an NLE re-exports the whole clip, almost every byte changes. Generic content-defined chunking can't help: in our spike, Dits' generic layer stored the most of any tool (88.6 MiB delta). We show this on purpose.
Metadata-only change, MP4 moov rewrite (win)
Same media data, rewritten container header. Dits stored 0.19 MiB, beating restic (0.77 MiB) and borg (5.93 MiB), because it understands the file structure.
Frame-addressable re-grade, FACR (win)
Re-grade 5 of 300 frames and Dits stores 5 new frames, reusing 295: 98.3% deduplicated. You store the frames you changed, not the file.
Incremental streaming re-publish (win)
Re-grade a 2-second window of a 12-second clip and Dits re-encodes 1 of 6 HLS segments, reusing 5. It re-delivers 471 KB instead of 3.5 MB: about 7.4× less data shipped, 86.5% saved.

Numbers from a single-machine benchmark spike comparing git-lfs, restic, borg, xdelta3, and dits. See the full methodology on the benchmarks page. Showing the loss case is deliberate — it's what makes the wins credible.
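
The frame-level and streaming figures above follow from simple counting, which this sketch reproduces. It treats each frame as an opaque byte string and names it by hash; how FACR actually represents frames is not described here, so `dedup_frames` and `transfer_savings` are hypothetical helpers. BLAKE2b stands in for BLAKE3, and 3.5 MB is taken as 3500 KB.

```python
import hashlib


def frame_name(frame: bytes) -> bytes:
    # Stand-in for BLAKE3 (an assumption of this sketch).
    return hashlib.blake2b(frame, digest_size=32).digest()


def dedup_frames(old_frames, new_frames):
    """Return (frames stored, frames reused, reuse ratio) for a new version."""
    known = {frame_name(f) for f in old_frames}
    stored = 0
    for frame in new_frames:
        name = frame_name(frame)
        if name not in known:      # only unseen frames cost storage
            known.add(name)
            stored += 1
    reused = len(new_frames) - stored
    return stored, reused, reused / len(new_frames)


def transfer_savings(shipped_kb: float, full_kb: float):
    """Return (reduction factor, percent saved) for a partial re-delivery."""
    return full_kb / shipped_kb, (1 - shipped_kb / full_kb) * 100


if __name__ == "__main__":
    old = [f"frame-{i}".encode() for i in range(300)]
    new = list(old)
    for i in range(100, 105):              # re-grade 5 of 300 frames
        new[i] = b"regraded-" + old[i]
    stored, reused, ratio = dedup_frames(old, new)
    print(stored, reused, f"{ratio:.1%}")

    factor, saved = transfer_savings(471, 3500)
    print(f"{factor:.1f}x less shipped, {saved:.1f}% saved")
```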

Hybrid Git + Dits storage

Text and code are already well served by Git, so Dits keeps using libgit2 for them — you get real diff, merge, blame, and history. Binary and media files go through the chunk-and-dedup path instead. One repository, the right strategy per file, and full Git-style operations across the whole project.

Working with terabytes: VFS, locking & sync

These pieces complete the picture for teams. We're being explicit about what ships today versus what's still on the roadmap, so you can plan honestly.

Virtual filesystem (roadmap)
A FUSE/WinFSP mount so files appear local but hydrate on demand, letting you work with a terabyte repo without downloading all of it. Designed; not yet shipped.
File locking (roadmap)
Distributed locks so two people don't edit the same un-mergeable binary at once. Part of the collaboration phase.
QUIC delta sync (roadmap)
Push and pull only the chunks the other side is missing, over QUIC. The network layer is scaffolding today; local commit, add, branch, and merge already work.
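
The idea behind delta sync is independent of the transport: compare chunk names and send only what is missing. The sketch below shows that negotiation step as plain set logic. It involves no QUIC or networking and does not describe Dits' planned protocol; `chunks_to_send` is a hypothetical helper.

```python
def chunks_to_send(local_manifest, remote_names):
    """Names the remote is missing, in first-seen order, without duplicates.

    local_manifest: ordered chunk names for the versions being pushed.
    remote_names:   set of chunk names the other side already stores.
    """
    queued = set()
    missing = []
    for name in local_manifest:
        if name not in remote_names and name not in queued:
            queued.add(name)
            missing.append(name)
    return missing


if __name__ == "__main__":
    manifest = ["a", "b", "a", "c", "d", "d"]
    remote = {"b", "c"}
    print(chunks_to_send(manifest, remote))
```

Since names are content hashes, a chunk that appears in many files or versions is transferred at most once.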

Git vs Dits, honestly

Git is excellent at what it was built for: text. Dits is built for the heavy stuff. Where a capability isn't shipped yet, we mark it Roadmap rather than claim it.

| Capability | Dits | Git | Git LFS |
|---|---|---|---|
| Large file handling | Yes: native, no extensions | No: practically unusable | Partial: pointer files, separate store |
| Cross-file deduplication | Yes: automatic, content-addressed | No: none | No: full file copies |
| Format-aware (MP4 atom) handling | Yes: moov / mdat parse + reconstruct | No: byte-blind | No: byte-blind |
| Frame-addressable video (FACR) | Experimental: frame-level diff/dedup | No: not applicable | No: not applicable |
| Convergent encryption | Yes: AES-256-GCM, dedup-friendly | No: none | No: none |
| Hybrid text + binary storage | Yes: libgit2 for text, chunks for binary | Partial: great for text only | Partial: Git for text, LFS for binary |
| On-demand file hydration (VFS) | Partial: FUSE/WinFSP mount works locally; remote hydration is roadmap | No: full checkout | Partial: manual selection |
| Networked push / pull / sync | Roadmap: QUIC delta sync, scaffolding today | Yes: mature | Yes: mature |
| Open source | Yes: Apache-2.0 / MIT | Yes: GPL v2 | Yes: MIT |

See it on your own files

Read the deep-dive docs, or watch the engine chunk and deduplicate your data in the browser (coming soon).