Content Addressing
A chunk’s name is the hash of its content — so identity, deduplication, and integrity all come from one idea.
In Dits, a chunk is not addressed by where it lives or what you called it. It is addressed by what it is: the BLAKE3 hash of its bytes. That hash is the chunk's permanent name. Change one byte and you get a different name pointing at different content; change nothing and you get the same name every time, on every machine.
Same bytes, same address, stored once
Because the address is derived purely from content, two pieces of identical data resolve to the same hash and therefore the same stored object. Writing the same chunk twice is a no-op the second time — the store already has it. This is the mechanism behind exact deduplication: it falls out of the addressing scheme rather than needing a separate bookkeeping pass.
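As a rough illustration of why deduplication "falls out" of the addressing scheme, here is a minimal in-memory sketch. It is not Dits code: `ChunkStore`, `put`, and `address` are hypothetical names, and because BLAKE3 is not in Python's standard library the sketch substitutes `hashlib.blake2b` as a stand-in hash. The point is only that the store is keyed by the hash of the bytes, so a repeat write has nothing to add.

```python
import hashlib


def address(data: bytes) -> str:
    """Content address of a chunk (blake2b is a stand-in for BLAKE3 here)."""
    return hashlib.blake2b(data, digest_size=32).hexdigest()


class ChunkStore:
    """Hypothetical in-memory store keyed purely by content address."""

    def __init__(self):
        self._objects = {}  # address -> chunk bytes

    def put(self, data: bytes) -> str:
        addr = address(data)
        # setdefault leaves an existing entry untouched,
        # so writing the same bytes again is a no-op.
        self._objects.setdefault(addr, data)
        return addr

    def __len__(self) -> int:
        return len(self._objects)


store = ChunkStore()
a = store.put(b"model weights, shard 0")
b = store.put(b"model weights, shard 0")  # identical bytes, written again
c = store.put(b"model weights, shard 1")  # last byte differs

print(a == b)      # same content, same address
print(a == c)      # different content, different address
print(len(store))  # number of distinct objects actually stored
```

Note that there is no separate "have I seen this before?" index: the lookup key and the dedup check are the same thing.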
# Same content -> same BLAKE3 address, regardless of filename
$ blake3 weights-v1.safetensors
b3:9f2c... 8a1e
$ cp weights-v1.safetensors copy.safetensors
$ blake3 copy.safetensors
b3:9f2c... 8a1e # identical address -> stored once, not twice
# Flip a single byte -> a completely different address
$ blake3 weights-v1-edited.safetensors
b3:41d7... 0c93
Every read is verified
Because the name is the hash, a read can re-hash what it loaded and compare. If a byte rotted on disk, was truncated, or was tampered with, the recomputed hash will not match the address that was requested, and the read fails loudly. Corruption becomes detectable instead of silent — you never hand a model trainer a quietly damaged shard.
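The check described above can be sketched in a few lines. This is an assumption-laden illustration, not the Dits read path: `read_verified` and `CorruptChunkError` are hypothetical names, the store is a plain dict, and `hashlib.blake2b` again stands in for BLAKE3 so the example runs with the standard library alone.

```python
import hashlib


class CorruptChunkError(Exception):
    """Raised when stored bytes no longer hash to the requested address."""


def address(data: bytes) -> str:
    # blake2b is a stand-in for BLAKE3 in this sketch.
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def read_verified(objects: dict, addr: str) -> bytes:
    """Load a chunk, re-hash it, and refuse to return it on mismatch."""
    data = objects[addr]
    actual = address(data)
    if actual != addr:
        raise CorruptChunkError(f"requested {addr}, but content hashes to {actual}")
    return data


# Store one chunk under its content address and read it back.
objects = {}
chunk = b"training shard 0042"
addr = address(chunk)
objects[addr] = chunk
ok = read_verified(objects, addr)

# Simulate bit rot: flip one bit of the stored bytes behind the store's back.
damaged = bytearray(chunk)
damaged[0] ^= 0x01
objects[addr] = bytes(damaged)

try:
    read_verified(objects, addr)
    detected = False
except CorruptChunkError:
    detected = True
```

The caller never has to supply a separate checksum: the address it asked for is the checksum.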
Why BLAKE3
- Fast — it keeps up with disk and network throughput, so verifying on every read is not a tax you feel.
- Collision-resistant — distinct content reliably gets distinct addresses, which is what makes dedup safe.
- Deterministic — the same bytes hash the same way everywhere, so addresses are portable across machines and time.
Content addressing is layer L1 of Three-Layer Addressing, and it is the part that works today. It pairs with Chunking & Deduplication: chunking decides what gets a content address, and addressing decides how those pieces are named and verified.