Research track · not a separate product

Reproducible history for models and datasets

Dits is exploring whether its local content-addressed engine and future derivation graph can help version heavy AI and scientific artifacts. There are no AI-specific commands, tensor formats, remote sync, or hosted workflows today.

Read the research notes · Core product roadmap

Research questions

Start with evidence, not a second brand promise

Where does exact reuse help?

Measure chunk reuse across dataset snapshots, shared shards, adapters, variants, and real checkpoint histories instead of assuming savings.
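
One way to make this measurement concrete is sketched below. It is a toy, not Dits's chunker: it assumes fixed-size chunks and SHA-256 digests, and the names `chunk_digests`, `reuse_ratio`, and `CHUNK_SIZE` are hypothetical. It reports what fraction of a new snapshot's bytes fall in chunks already present in an older snapshot.

```python
import hashlib

# Assumption: fixed-size chunks, tiny so the example is easy to follow.
# The chunking strategy Dits actually uses is not described here.
CHUNK_SIZE = 4


def chunk_digests(data: bytes, size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return the SHA-256 digest of each."""
    return [
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    ]


def reuse_ratio(old: bytes, new: bytes, size: int = CHUNK_SIZE) -> float:
    """Fraction of the new snapshot's bytes whose chunks already exist in the old one."""
    if not new:
        return 0.0
    known = set(chunk_digests(old, size))
    reused = 0
    for i in range(0, len(new), size):
        piece = new[i:i + size]
        if hashlib.sha256(piece).hexdigest() in known:
            reused += len(piece)
    return reused / len(new)


old_snapshot = b"AAAABBBBCCCC"
appended = reuse_ratio(old_snapshot, b"AAAABBBBCCCCDDDD")  # data added at the end
shifted = reuse_ratio(old_snapshot, b"X" + old_snapshot)   # one byte added at the front
```

In this toy, appending data leaves the ratio at 0.75, while inserting a single byte at the front shifts every fixed-size boundary and drops it to 0.0. Results like that depend on both the chunking strategy and the edit pattern, which is the reason to measure real histories rather than assume savings.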

What is reproducible?

Record data, code, configuration, seeds, tools, and source artifacts so a derived object can be rebuilt or invalidated honestly.
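
As a rough illustration of what such a record might look like, the sketch below hashes every recorded input into one derivation key. The `Derivation` class, its field names, and `is_stale` are hypothetical and do not describe an existing Dits format. The idea it shows is that a derived object is tied to the exact inputs that produced it, so any change to an input yields a different key and marks the stored result as stale.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass
class Derivation:
    """Hypothetical record of everything needed to rebuild a derived object."""
    data_digests: tuple[str, ...]   # content hashes of input datasets
    code_digest: str                # content hash of the code that ran
    config: dict                    # JSON-serializable hyperparameters
    seed: int                       # random seed used for the run
    tools: tuple[str, ...]          # pinned tool versions, e.g. "python==3.12.1"

    def key(self) -> str:
        """Stable digest over all recorded inputs (canonical JSON, sorted keys)."""
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_stale(recorded_key: str, current: Derivation) -> bool:
    """A stored result is stale when the current inputs no longer match its key."""
    return recorded_key != current.key()


run = Derivation(
    data_digests=("sha256:aaa",),
    code_digest="sha256:bbb",
    config={"lr": 0.001, "epochs": 3},
    seed=7,
    tools=("python==3.12.1",),
)
recorded = run.key()
reseeded = replace(run, seed=8)  # same inputs, different seed
```

Here `is_stale(recorded, run)` is false and `is_stale(recorded, reseeded)` is true. An input that was never recorded cannot trigger invalidation, so the record is only as trustworthy as it is complete.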

Where can similarity assist?

Use perceptual or semantic indexes for candidate search and discovery, never as a substitute for exact object identity.
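
The separation between "looks similar" and "is the same object" can be sketched in a few lines. This example is an assumption-laden stand-in: Jaccard similarity over byte 3-grams replaces a real perceptual or semantic index, and `exact_id`, `find_candidates`, and `is_same_object` are hypothetical names. Similarity only proposes candidates; identity is decided by the content hash alone.

```python
import hashlib


def exact_id(data: bytes) -> str:
    """Exact object identity: the SHA-256 digest of the bytes."""
    return hashlib.sha256(data).hexdigest()


def shingles(data: bytes, n: int = 3) -> set[bytes]:
    """All overlapping n-byte windows of the data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: bytes, b: bytes) -> float:
    """Toy similarity score in [0, 1]; a stand-in for a real similarity index."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def find_candidates(query: bytes, store: dict[str, bytes], threshold: float = 0.5) -> list[str]:
    """Return IDs of stored objects that look similar, most similar first."""
    scored = [(jaccard(query, blob), oid) for oid, blob in store.items()]
    return [oid for score, oid in sorted(scored, reverse=True) if score >= threshold]


def is_same_object(query: bytes, object_id: str) -> bool:
    """Identity check: only an exact digest match counts."""
    return exact_id(query) == object_id


blobs = [b"model weights v1", b"model weights v2", b"unrelated payload"]
store = {exact_id(b): b for b in blobs}
query = b"model weights v2"
candidates = find_candidates(query, store)
```

Both `v1` and `v2` come back as candidates for the query, but only the `v2` ID passes `is_same_object`. A high similarity score is a reason to look closer, not evidence that two objects are interchangeable.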

Reusable today

The generic local engine

Arbitrary files can enter the same local chunk store and Git-shaped history used by Dits. This is generic byte storage, not tensor-aware intelligence or an ML workflow product.

Unbuilt

AI-specific semantics

Tensor-aware formats, dataset schemas, experiment lineage, model registries, similarity layers, recompute orchestration, and distributed artifact transfer need designs, fixtures, and measured validation.

Contribute a workload, not a slogan

Useful contributions include redistributable checkpoint or dataset fixtures, controlled edit histories, reproducibility requirements, and comparisons with Xet, DVC, object storage, and registries.