Design, not product status

Exact, similar, and derived are different claims

A trustworthy system keeps cryptographic identity separate from approximate similarity and records enough inputs to justify a reproducibility claim.

Generic engine today

Exact identity

Chunk and hash exact artifact bytes. Identical content can share an object; every claimed exact result must verify cryptographically.

Research

Similarity index

Use approximate fingerprints to find candidates or near-duplicates. Similarity may aid discovery or delta choices, but never replaces exact identity.

Research

Derivation graph

Link outputs to data, code, configuration, seeds, tools, and source artifacts so reproducible results can be rebuilt and invalidated.

The difficult workload

Small numeric updates can change bytes across an entire checkpoint, defeating generic exact-chunk reuse. Tensor-aware or delta designs must prove fidelity, bounded decode cost, and real savings on public histories before Dits should claim an answer.