Data Structures
Dits stores all repository data as content-addressed objects. This page describes the core data structures and how they relate.
Object Hierarchy
Chunk
The smallest unit of storage. Chunks are variable-size pieces of file content, typically 256KB to 4MB.
struct Chunk {
// 32-byte BLAKE3 hash of the raw content
hash: [u8; 32],
// Uncompressed size in bytes
size: u32,
// Compression algorithm used (if any)
compression: Option<Compression>,
// The actual data (when loaded)
data: Vec<u8>,
}
enum Compression {
None,
Zstd { level: u8 },
Lz4,
}
// Storage format on disk:
// .dits/objects/chunks/a1/b2c3d4e5f6...
// ^^
// First 2 hex chars of hash
Chunk Properties
- Immutable: Content never changes after creation
- Deduplicated: Identical chunks share storage
- Verifiable: Hash guarantees integrity
- Independent: Can be stored/transferred separately
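A small sketch can show the on-disk layout and the deduplication property together. It assumes the chunk's 32-byte hash has already been computed. Dits uses BLAKE3, but the sketch takes the hash as an input so it needs only the standard library. `chunk_path` and `ChunkStore` are hypothetical names, and `ChunkStore` is an in-memory stand-in for `.dits/objects`, not the Dits storage API.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::path::PathBuf;

/// Lowercase hex encoding of a 32-byte hash.
fn to_hex(hash: &[u8; 32]) -> String {
    hash.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Loose chunk path: the first two hex characters name a subdirectory,
/// the remaining 62 characters form the file name.
fn chunk_path(dits_dir: &str, hash: &[u8; 32]) -> PathBuf {
    let hex = to_hex(hash);
    PathBuf::from(dits_dir)
        .join("objects")
        .join("chunks")
        .join(&hex[..2])
        .join(&hex[2..])
}

/// Hypothetical in-memory stand-in for the chunk store, keyed by hash.
struct ChunkStore {
    chunks: HashMap<[u8; 32], Vec<u8>>,
}

impl ChunkStore {
    fn new() -> Self {
        ChunkStore { chunks: HashMap::new() }
    }

    /// Store `data` under `hash` (assumed to be the BLAKE3 hash of `data`).
    /// Returns false when a chunk with that hash already exists, so
    /// identical content is stored only once.
    fn put(&mut self, hash: [u8; 32], data: &[u8]) -> bool {
        match self.chunks.entry(hash) {
            Entry::Occupied(_) => false,
            Entry::Vacant(slot) => {
                slot.insert(data.to_vec());
                true
            }
        }
    }
}
```

Sharding by the first byte keeps any single directory from holding every chunk in a large repository.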
Asset
An asset represents a single file. It contains metadata and an ordered list of chunk references that reconstruct the file.
struct Asset {
// Hash of the entire file content (for verification)
content_hash: [u8; 32],
// Hash of this asset manifest
hash: [u8; 32],
// Total file size in bytes
size: u64,
// MIME type
mime_type: String,
// Ordered list of chunks
chunks: Vec<ChunkRef>,
// Optional media metadata
media: Option<MediaMetadata>,
}
struct ChunkRef {
// Hash of the chunk
hash: [u8; 32],
// Offset in the original file
offset: u64,
// Size of this chunk
size: u32,
}
struct MediaMetadata {
// For video files
duration_ms: Option<u64>,
width: Option<u32>,
height: Option<u32>,
frame_rate: Option<f32>,
codec: Option<String>,
keyframe_positions: Vec<u64>,
}
Asset Properties
- File reconstruction: Concatenate chunks in order
- Random access: Seek to any offset using chunk table
- Sparse storage: Only fetch needed chunks
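The sketch below illustrates these three properties. It assumes the chunk list is sorted by `offset` and covers the file without gaps. `locate` and `reconstruct` are hypothetical helpers, and the `fetch` closure stands in for whatever loads a chunk by hash.

```rust
/// Mirrors the `ChunkRef` fields described above.
#[derive(Clone, Debug)]
struct ChunkRef {
    hash: [u8; 32],
    offset: u64,
    size: u32,
}

/// Find the chunk containing byte `offset` and the position inside it.
/// Assumes `chunks` is sorted by offset and contiguous.
fn locate(chunks: &[ChunkRef], offset: u64) -> Option<(usize, u64)> {
    // Index of the first chunk that starts after `offset` (binary search).
    let idx = chunks.partition_point(|c| c.offset <= offset);
    if idx == 0 {
        return None;
    }
    let c = &chunks[idx - 1];
    let within = offset - c.offset;
    if within < c.size as u64 {
        Some((idx - 1, within))
    } else {
        None // past the end of the file
    }
}

/// Rebuild a file by concatenating chunk contents in order.
/// `fetch` is a hypothetical lookup from chunk hash to bytes.
fn reconstruct<F>(chunks: &[ChunkRef], mut fetch: F) -> Option<Vec<u8>>
where
    F: FnMut(&[u8; 32]) -> Option<Vec<u8>>,
{
    let mut out = Vec::new();
    for c in chunks {
        let data = fetch(&c.hash)?;
        if data.len() != c.size as usize {
            return None; // size mismatch: wrong or corrupt chunk
        }
        out.extend_from_slice(&data);
    }
    Some(out)
}
```

`locate` needs only the chunk table, so a reader can fetch only the chunk that covers a requested byte range instead of the whole file.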
Tree (Manifest)
A tree represents a directory structure at a point in time. It maps paths to assets.
use std::collections::BTreeMap;
use std::path::PathBuf;
struct Tree {
// Hash of the tree (computed from sorted entries)
hash: [u8; 32],
// Map of paths to entries
entries: BTreeMap<PathBuf, TreeEntry>,
}
struct TreeEntry {
// Hash of the asset
asset_hash: [u8; 32],
// File mode (permissions)
mode: FileMode,
// File size (for quick listing)
size: u64,
}
enum FileMode {
Regular, // 0o100644
Executable, // 0o100755
Symlink, // 0o120000
}
// Serialization (sorted by path for consistent hashing):
footage/scene1.mov 100644 abc123...
footage/scene2.mov 100644 def456...
project.prproj 100644 ghi789...
Tree Hashing
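The tree hash is computed from entries sorted by path. The following is a minimal sketch of the canonical serialization step, assuming the `path mode hash` line format shown above with one entry per line. The exact separators are an assumption, and in Dits the final hash (BLAKE3) would be taken over the returned text. `serialize_tree` is a hypothetical name.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// File modes from `FileMode`, with their octal spellings.
#[derive(Clone, Copy, Debug)]
enum FileMode {
    Regular,
    Executable,
    Symlink,
}

impl FileMode {
    fn octal(self) -> &'static str {
        match self {
            FileMode::Regular => "100644",
            FileMode::Executable => "100755",
            FileMode::Symlink => "120000",
        }
    }
}

fn to_hex(hash: &[u8; 32]) -> String {
    hash.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Canonical "path mode hash" text for a tree. BTreeMap iterates in sorted
/// key order, so the output (and any hash taken over it) does not depend
/// on the order in which entries were inserted.
fn serialize_tree(entries: &BTreeMap<String, (FileMode, [u8; 32])>) -> String {
    let mut out = String::new();
    for (path, (mode, hash)) in entries {
        // Writing into a String cannot fail, so unwrap is safe here.
        writeln!(out, "{} {} {}", path, mode.octal(), to_hex(hash)).unwrap();
    }
    out
}
```

Keys here are plain `String`s, so the order is byte-wise. `PathBuf`'s `Ord` compares path components instead, which can order `a-b` and `a/b` differently, so an implementation keyed by `PathBuf` needs to fix which order the hash uses.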
Commit
A commit records a snapshot of the repository with metadata about who made the change and when.
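The `author` and `committer` lines in the serialization format shown after the struct definitions render a `Signature` as `Name <email> seconds ±HHMM`. The minimal sketch below assumes the timestamp has already been converted to Unix seconds (with chrono, `DateTime::timestamp()` returns them as an `i64`) and that `timezone_offset` is in minutes, as the struct comment states. `format_tz`, `format_signature`, and `format_headers` are hypothetical helpers, and the sketch leaves out how the message is separated from the headers.

```rust
/// Format a UTC offset in minutes (as in `Signature::timezone_offset`)
/// as `+HHMM` or `-HHMM`.
fn format_tz(offset_minutes: i32) -> String {
    let sign = if offset_minutes < 0 { '-' } else { '+' };
    let abs = offset_minutes.unsigned_abs();
    format!("{}{:02}{:02}", sign, abs / 60, abs % 60)
}

/// Value of an `author` or `committer` line: `Name <email> unix_seconds +HHMM`.
fn format_signature(name: &str, email: &str, unix_secs: i64, offset_minutes: i32) -> String {
    format!("{} <{}> {} {}", name, email, unix_secs, format_tz(offset_minutes))
}

/// Header lines in the order of the serialization example: one `tree` line,
/// one `parent` line per parent (none for a root commit), then author and committer.
fn format_headers(tree: &str, parents: &[&str], author: &str, committer: &str) -> String {
    let mut out = format!("tree {}\n", tree);
    for p in parents {
        out.push_str(&format!("parent {}\n", p));
    }
    out.push_str(&format!("author {}\ncommitter {}\n", author, committer));
    out
}
```

With the example values, `format_signature("Jane Editor", "jane@example.com", 1705340400, -480)` produces the same author string as the serialization example.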
use std::collections::HashMap;
use chrono::{DateTime, Utc}; // from the external `chrono` crate
struct Commit {
// Hash of this commit
hash: [u8; 32],
// Hash of the tree (directory snapshot)
tree: [u8; 32],
// Parent commit hashes (empty for a root commit, usually 1, 2 for merges)
parents: Vec<[u8; 32]>,
// Author information
author: Signature,
// Committer information (may differ from author)
committer: Signature,
// Commit message
message: String,
// Additional headers (for extensions)
headers: HashMap<String, String>,
}
struct Signature {
name: String,
email: String,
timestamp: DateTime<Utc>,
timezone_offset: i32, // minutes from UTC
}
// Serialization format:
tree def45678...
parent 9f8e7d6c...
author Jane Editor <jane@example.com> 1705340400 -0800
committer Jane Editor <jane@example.com> 1705340400 -0800
Add color grading to scene 1
Commit Graph
Commits form a directed acyclic graph (DAG) through parent references:
[Figure: commit graph (DAG) with commits a1b2c3d, 9f8e7d6, 5c4b3a2, 1234567, abcdef0]
Reference
References are named pointers to commits. They enable branch and tag functionality.
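Resolving HEAD means following it either through a branch name or directly to a commit. The sketch below assumes the file contents listed next (`ref: refs/heads/main` for a symbolic HEAD, a bare hash when detached). The `HashMap` of ref names stands in for reading files under `.dits/refs/`. `parse_head`, `resolve_head`, and `update_ref` are hypothetical, and so is the choice to reject moving an existing tag outright.

```rust
use std::collections::HashMap;

/// Contents of HEAD: either symbolic (names a branch) or detached.
#[derive(Debug, PartialEq)]
enum Head {
    Symbolic(String), // e.g. "refs/heads/main"
    Detached(String), // a commit hash in hex
}

/// Parse the contents of `.dits/HEAD`.
fn parse_head(contents: &str) -> Head {
    let s = contents.trim();
    match s.strip_prefix("ref: ") {
        Some(target) => Head::Symbolic(target.to_string()),
        None => Head::Detached(s.to_string()),
    }
}

/// Resolve HEAD to a commit hash. Returns None if HEAD names a branch
/// that has no entry yet (e.g. a branch with no commits).
fn resolve_head(contents: &str, refs: &HashMap<String, String>) -> Option<String> {
    match parse_head(contents) {
        Head::Symbolic(name) => refs.get(&name).cloned(),
        Head::Detached(hash) => Some(hash),
    }
}

/// Point `name` at `hash`. Branches may move; an existing tag may not.
fn update_ref(refs: &mut HashMap<String, String>, name: &str, hash: &str) -> Result<(), String> {
    if name.starts_with("refs/tags/") && refs.contains_key(name) {
        return Err(format!("tag {} already exists", name));
    }
    refs.insert(name.to_string(), hash.to_string());
    Ok(())
}
```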
// Reference types:
// Branch - mutable pointer to a commit
// .dits/refs/heads/main → a1b2c3d4...
// Tag - immutable pointer to a commit
// .dits/refs/tags/v1.0 → 9f8e7d6c...
// Remote tracking branch
// .dits/refs/remotes/origin/main → a1b2c3d4...
// HEAD - current position (symbolic or direct)
// .dits/HEAD → ref: refs/heads/main
// or
// .dits/HEAD → a1b2c3d4... (detached)
Index (Staging Area)
The index tracks staged changes between the working directory and the last commit.
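Two comparisons are involved here. A cached stat can show cheaply that a file is probably unchanged. Comparing asset hashes from the last commit, the index, and the working directory then classifies each path. The sketch below uses a hypothetical `QuickStat` subset of the `FileStat` struct defined below, plus Git-style status names. Dits's actual change-detection rules may differ.

```rust
use std::time::SystemTime;

type Hash = [u8; 32];

/// Hypothetical subset of the `FileStat` fields used for a quick check.
#[derive(Clone, Copy, Debug, PartialEq)]
struct QuickStat {
    mtime: SystemTime,
    size: u64,
    ino: u64,
    mode: u32,
}

/// If every cached field matches, treat the file as unchanged and skip
/// re-hashing. A mismatch only means the file *may* have changed.
fn stat_matches(cached: &QuickStat, current: &QuickStat) -> bool {
    cached == current
}

#[derive(Debug, PartialEq)]
enum Status {
    Unchanged,
    Staged,            // index differs from the last commit
    Modified,          // working file differs from the index
    StagedAndModified, // both of the above
}

/// Classify one path from three asset hashes: last commit (`None` if the
/// path is not in it), index, and working directory.
fn file_status(head: Option<Hash>, index: Hash, worktree: Hash) -> Status {
    let staged = head != Some(index);
    let modified = worktree != index;
    match (staged, modified) {
        (false, false) => Status::Unchanged,
        (true, false) => Status::Staged,
        (false, true) => Status::Modified,
        (true, true) => Status::StagedAndModified,
    }
}
```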
struct Index {
// Version for format compatibility
version: u32,
// Indexed entries
entries: Vec<IndexEntry>,
// Extensions (cache, resolve-undo, etc.)
extensions: Vec<Extension>,
}
struct IndexEntry {
// Path relative to repository root
path: PathBuf,
// Asset hash (staged content)
asset_hash: [u8; 32],
// File statistics (for change detection)
stat: FileStat,
// Flags
flags: IndexFlags,
}
struct FileStat {
ctime: SystemTime,
mtime: SystemTime,
dev: u64,
ino: u64,
mode: u32,
uid: u32,
gid: u32,
size: u64,
}
Pack Files
For efficient storage and transfer, objects can be packed together:
struct PackFile {
// Pack header
magic: [u8; 4], // "PACK"
version: u32,
object_count: u32,
// Packed objects (compressed, potentially deltified)
objects: Vec<PackedObject>,
// Pack checksum
checksum: [u8; 32],
}
struct PackIndex {
// Maps object hash to offset in pack file
// Enables O(log n) lookups
entries: BTreeMap<[u8; 32], PackOffset>,
// Pack file hash this index corresponds to
pack_hash: [u8; 32],
}
// Storage:
// .dits/objects/packs/pack-a1b2c3d4.pack
// .dits/objects/packs/pack-a1b2c3d4.idx
Object Storage Layout
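The sketch below shows one way an object could be located across the two layouts shown above: loose chunk files sharded by the first two hex characters, and pack files with an index. Several parts are assumptions: the lookup order (loose first, then packs), the `Location` type, and the use of `u64` for pack offsets. `loose_exists` stands in for a filesystem check.

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

type Hash = [u8; 32];

/// Where an object lives (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Location {
    Loose(PathBuf),
    Packed { pack: String, offset: u64 },
}

/// Hypothetical in-memory form of a `.idx` file: hash -> byte offset in the pack.
struct PackIndex {
    pack_name: String,
    entries: BTreeMap<Hash, u64>,
}

fn to_hex(hash: &Hash) -> String {
    hash.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Assumed lookup order: a loose chunk file first, then each pack index.
fn locate_object<F>(dits_dir: &str, hash: &Hash, loose_exists: F, packs: &[PackIndex]) -> Option<Location>
where
    F: Fn(&PathBuf) -> bool,
{
    let hex = to_hex(hash);
    let loose = PathBuf::from(dits_dir)
        .join("objects")
        .join("chunks")
        .join(&hex[..2])
        .join(&hex[2..]);
    if loose_exists(&loose) {
        return Some(Location::Loose(loose));
    }
    // BTreeMap::get is an O(log n) lookup, matching the PackIndex comment.
    packs.iter().find_map(|p| {
        p.entries.get(hash).map(|&offset| Location::Packed {
            pack: p.pack_name.clone(),
            offset,
        })
    })
}
```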
Related Topics
- Algorithms - How these structures are created and processed
- Content Addressing - The foundation of Dits storage
- Chunking - How files are split into chunks