Data Structures

Dits stores all repository data as content-addressed objects. This page describes the core data structures and how they relate.

Object Hierarchy

Commit a1b2c3d4
└── Tree def45678 (Manifest)
    ├── footage/scene1.mov → Asset abc123 (Chunks: 001, 002, ...)
    ├── footage/scene2.mov → Asset def456
    └── project.prproj → Asset ghi789

Chunk

The smallest unit of storage. Chunks are variable-size pieces of file content, typically 256 KB to 4 MB.

struct Chunk {
    // 32-byte BLAKE3 hash of the raw content
    hash: [u8; 32],

    // Uncompressed size in bytes
    size: u32,

    // Compression algorithm used (if any)
    compression: Option<Compression>,

    // The actual data (when loaded)
    data: Vec<u8>,
}

enum Compression {
    None,
    Zstd { level: u8 },
    Lz4,
}

// Storage format on disk:
// .dits/objects/chunks/a1/b2c3d4e5f6...
//                      ^^
//                      First 2 hex chars of hash
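
The on-disk layout above can be sketched as a small path helper. This is a minimal illustration, not the actual Dits implementation: `to_hex` and `chunk_path` are hypothetical names, and lowercase hex encoding is assumed from the example path.

```rust
use std::path::PathBuf;

/// Hex-encode a 32-byte hash as 64 lowercase ASCII characters.
fn to_hex(hash: &[u8; 32]) -> String {
    hash.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Build the loose-chunk path for a hash (hypothetical helper).
/// The first two hex characters name a subdirectory; the remaining
/// 62 characters name the file inside it.
fn chunk_path(repo_root: &str, hash: &[u8; 32]) -> PathBuf {
    let hex = to_hex(hash);
    let (prefix, rest) = hex.split_at(2);
    PathBuf::from(repo_root)
        .join(".dits")
        .join("objects")
        .join("chunks")
        .join(prefix)
        .join(rest)
}
```

Splitting on the first byte of the hash spreads loose chunks across up to 256 subdirectories, which keeps any single directory from growing too large.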

Chunk Properties

  • Immutable: Content never changes after creation
  • Deduplicated: Identical chunks share storage
  • Verifiable: Hash guarantees integrity
  • Independent: Can be stored/transferred separately
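
The deduplication and verification properties follow directly from keying chunks by their content hash. The sketch below assumes an in-memory store; `ChunkStore` and `toy_hash` are hypothetical, and `toy_hash` is only a standard-library stand-in for BLAKE3 (it is not cryptographically secure).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for BLAKE3: fills 32 bytes from four 64-bit std hashes.
/// NOT cryptographic; used only to keep the example dependency-free.
fn toy_hash(data: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for lane in 0u64..4 {
        let mut h = DefaultHasher::new();
        lane.hash(&mut h);
        data.hash(&mut h);
        let start = lane as usize * 8;
        out[start..start + 8].copy_from_slice(&h.finish().to_le_bytes());
    }
    out
}

/// Hypothetical in-memory chunk store keyed by content hash.
#[derive(Default)]
struct ChunkStore {
    chunks: HashMap<[u8; 32], Vec<u8>>,
}

impl ChunkStore {
    /// Store a chunk; returns its hash and whether it was newly written.
    /// Identical content maps to the same key, so it is stored once.
    fn put(&mut self, data: &[u8]) -> ([u8; 32], bool) {
        let hash = toy_hash(data);
        let is_new = !self.chunks.contains_key(&hash);
        if is_new {
            self.chunks.insert(hash, data.to_vec());
        }
        (hash, is_new)
    }

    /// Fetch a chunk and re-hash it to confirm it matches its key.
    fn get_verified(&self, hash: &[u8; 32]) -> Option<&[u8]> {
        let data = self.chunks.get(hash)?;
        if toy_hash(data) == *hash {
            Some(data.as_slice())
        } else {
            None
        }
    }

    /// Number of distinct chunks held.
    fn len(&self) -> usize {
        self.chunks.len()
    }
}
```

Because a chunk is never modified after it is written, a second `put` of the same bytes is a no-op apart from the hash computation.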

Asset

An asset represents a single file. It contains metadata and an ordered list of chunk references that reconstruct the file.

struct Asset {
    // Hash of the entire file content (for verification)
    content_hash: [u8; 32],

    // Hash of this asset manifest
    hash: [u8; 32],

    // Total file size in bytes
    size: u64,

    // MIME type
    mime_type: String,

    // Ordered list of chunks
    chunks: Vec<ChunkRef>,

    // Optional media metadata
    media: Option<MediaMetadata>,
}

struct ChunkRef {
    // Hash of the chunk
    hash: [u8; 32],

    // Offset in the original file
    offset: u64,

    // Size of this chunk
    size: u32,
}

struct MediaMetadata {
    // For video files
    duration_ms: Option<u64>,
    width: Option<u32>,
    height: Option<u32>,
    frame_rate: Option<f32>,
    codec: Option<String>,
    keyframe_positions: Vec<u64>,
}

Asset Properties

  • File reconstruction: Concatenate chunks in order
  • Random access: Seek to any offset using chunk table
  • Sparse storage: Only fetch needed chunks

Tree (Manifest)

A tree represents a directory structure at a point in time. It maps paths to assets.

struct Tree {
    // Hash of the tree (computed from sorted entries)
    hash: [u8; 32],

    // Map of paths to entries
    entries: BTreeMap<PathBuf, TreeEntry>,
}

struct TreeEntry {
    // Hash of the asset
    asset_hash: [u8; 32],

    // File mode (permissions)
    mode: FileMode,

    // File size (for quick listing)
    size: u64,
}

enum FileMode {
    Regular,     // 0o100644
    Executable,  // 0o100755
    Symlink,     // 0o120000
}

// Serialization (sorted by path for consistent hashing):
footage/scene1.mov  100644  abc123...
footage/scene2.mov  100644  def456...
project.prproj      100644  ghi789...
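
A sketch of how the sorted serialization might be produced. It simplifies the structures above: paths and hashes are plain strings, `serialize_tree` is a hypothetical name, and the two-space column separator is an assumption based on the example lines. A `BTreeMap` iterates in key order, which is what makes the output (and therefore the tree hash) deterministic.

```rust
use std::collections::BTreeMap;

#[allow(dead_code)]
#[derive(Clone, Copy)]
enum FileMode {
    Regular,
    Executable,
    Symlink,
}

impl FileMode {
    /// Numeric mode, matching the comments in the enum above.
    fn as_octal(self) -> u32 {
        match self {
            FileMode::Regular => 0o100644,
            FileMode::Executable => 0o100755,
            FileMode::Symlink => 0o120000,
        }
    }
}

/// Simplified entry: the hash is kept as a hex string for readability.
struct TreeEntry {
    asset_hash: String,
    mode: FileMode,
}

/// Emit one line per entry in sorted path order (assumed text layout).
fn serialize_tree(entries: &BTreeMap<String, TreeEntry>) -> String {
    let mut out = String::new();
    for (path, entry) in entries {
        out.push_str(&format!(
            "{}  {:o}  {}\n",
            path,
            entry.mode.as_octal(),
            entry.asset_hash
        ));
    }
    out
}
```

Note that string keys sort byte-wise, while `PathBuf` keys compare component by component; the two orders can differ for some paths, so a real implementation has to pick one and use it everywhere.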

Commit

A commit records a snapshot of the repository with metadata about who made the change and when.

struct Commit {
    // Hash of this commit
    hash: [u8; 32],

    // Hash of the tree (directory snapshot)
    tree: [u8; 32],

    // Parent commit hashes (none for a root commit, usually one, two for a merge)
    parents: Vec<[u8; 32]>,

    // Author information
    author: Signature,

    // Committer information (may differ from author)
    committer: Signature,

    // Commit message
    message: String,

    // Additional headers (for extensions)
    headers: HashMap<String, String>,
}

struct Signature {
    name: String,
    email: String,
    timestamp: DateTime<Utc>,
    timezone_offset: i32,  // minutes from UTC
}

// Serialization format:
tree def45678...
parent 9f8e7d6c...
author Jane Editor <jane@example.com> 1705340400 -0800
committer Jane Editor <jane@example.com> 1705340400 -0800

Add color grading to scene 1
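
The text format above can be generated along these lines. This is a sketch under stated assumptions: hashes are passed as hex strings, the timestamp is a plain Unix-seconds `i64` standing in for `DateTime<Utc>`, extension headers are omitted, and all function names are hypothetical.

```rust
/// Simplified signature: `timestamp` is seconds since the Unix epoch.
struct Signature {
    name: String,
    email: String,
    timestamp: i64,
    timezone_offset: i32, // minutes from UTC
}

/// Render an offset in minutes as +HHMM or -HHMM (e.g. -480 -> "-0800").
fn format_offset(minutes: i32) -> String {
    let sign = if minutes < 0 { '-' } else { '+' };
    let abs = minutes.abs();
    format!("{}{:02}{:02}", sign, abs / 60, abs % 60)
}

/// Render one "author ..." or "committer ..." header line.
fn format_signature(role: &str, s: &Signature) -> String {
    format!(
        "{} {} <{}> {} {}",
        role,
        s.name,
        s.email,
        s.timestamp,
        format_offset(s.timezone_offset)
    )
}

/// Headers first, then a blank line, then the commit message.
fn serialize_commit(
    tree: &str,
    parents: &[&str],
    author: &Signature,
    committer: &Signature,
    message: &str,
) -> String {
    let mut out = format!("tree {}\n", tree);
    for parent in parents {
        out.push_str(&format!("parent {}\n", parent));
    }
    out.push_str(&format_signature("author", author));
    out.push('\n');
    out.push_str(&format_signature("committer", committer));
    out.push_str("\n\n");
    out.push_str(message);
    out.push('\n');
    out
}
```

A merge commit would simply emit two `parent` lines, and a root commit none.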

Commit Graph

Commits form a directed acyclic graph (DAG) through parent references:

[Figure: commit graph. Nodes: a1b2c3d (HEAD, main), 9f8e7d6, 5c4b3a2 (merge commit), 1234567, abcdef0.]
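
Walking the graph means following parent references until no unvisited commits remain. A minimal sketch, assuming commits are identified by string IDs and the parent links are already loaded into a map; `ancestors` is a hypothetical helper, and the `seen` set ensures a commit reachable through both sides of a merge is visited only once.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Collect every commit reachable from `start` via parent links,
/// including `start` itself. `parents` maps commit ID -> parent IDs.
fn ancestors(parents: &HashMap<String, Vec<String>>, start: &str) -> HashSet<String> {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::from([start.to_string()]);
    while let Some(id) = queue.pop_front() {
        // Skip commits already reached through another path.
        if !seen.insert(id.clone()) {
            continue;
        }
        if let Some(parent_ids) = parents.get(&id) {
            queue.extend(parent_ids.iter().cloned());
        }
    }
    seen
}
```

The same traversal underlies operations such as listing history or checking whether one commit is an ancestor of another.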

Reference

References are named pointers to commits. They enable branch and tag functionality.

// Reference types:

// Branch - mutable pointer to a commit
// .dits/refs/heads/main → a1b2c3d4...

// Tag - immutable pointer to a commit
// .dits/refs/tags/v1.0 → 9f8e7d6c...

// Remote tracking branch
// .dits/refs/remotes/origin/main → a1b2c3d4...

// HEAD - current position (symbolic or direct)
// .dits/HEAD → ref: refs/heads/main
// or
// .dits/HEAD → a1b2c3d4...  (detached)
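
The two forms of HEAD can be distinguished with a small parser. This sketch assumes the file holds either `ref: <path>` or a bare hex hash, as the comments above suggest; the `Head` enum and `parse_head` are hypothetical names.

```rust
/// The two states HEAD can be in.
#[derive(Debug, PartialEq)]
enum Head {
    /// Points at a named reference, e.g. "refs/heads/main".
    Symbolic(String),
    /// Points directly at a commit hash (detached).
    Detached(String),
}

/// Parse the contents of the HEAD file. Returns `None` if the text is
/// neither a "ref: ..." line nor a hex string.
fn parse_head(contents: &str) -> Option<Head> {
    let line = contents.trim();
    if let Some(target) = line.strip_prefix("ref: ") {
        return Some(Head::Symbolic(target.trim().to_string()));
    }
    let is_hex = !line.is_empty() && line.chars().all(|c| c.is_ascii_hexdigit());
    if is_hex {
        Some(Head::Detached(line.to_string()))
    } else {
        None
    }
}
```

Resolving a symbolic HEAD to a commit then takes one more step: reading the file the reference names under `.dits/`.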

Index (Staging Area)

The index tracks staged changes between the working directory and the last commit.

struct Index {
    // Version for format compatibility
    version: u32,

    // Indexed entries
    entries: Vec<IndexEntry>,

    // Extensions (cache, resolve-undo, etc.)
    extensions: Vec<Extension>,
}

struct IndexEntry {
    // Path relative to repository root
    path: PathBuf,

    // Asset hash (staged content)
    asset_hash: [u8; 32],

    // File statistics (for change detection)
    stat: FileStat,

    // Flags
    flags: IndexFlags,
}

struct FileStat {
    ctime: SystemTime,
    mtime: SystemTime,
    dev: u64,
    ino: u64,
    mode: u32,
    uid: u32,
    gid: u32,
    size: u64,
}
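
The stat fields let a status check skip rehashing files that have not been touched. The sketch below shows one plausible policy, not necessarily the one Dits uses: a file is rehashed only if its size, timestamps, or inode identity differ from the cached entry. `needs_rehash` is a hypothetical name.

```rust
use std::time::SystemTime;

/// Mirrors the `FileStat` fields shown above.
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
struct FileStat {
    ctime: SystemTime,
    mtime: SystemTime,
    dev: u64,
    ino: u64,
    mode: u32,
    uid: u32,
    gid: u32,
    size: u64,
}

/// Cheap pre-check: returns true if the file may have changed since it
/// was staged and its content should be hashed again. A false result
/// means the cached asset hash can be reused without reading the file.
fn needs_rehash(cached: &FileStat, current: &FileStat) -> bool {
    cached.size != current.size
        || cached.mtime != current.mtime
        || cached.ctime != current.ctime
        || cached.dev != current.dev
        || cached.ino != current.ino
}
```

For multi-gigabyte media files this shortcut matters: comparing a few integers is far cheaper than chunking and hashing the whole file.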

Pack Files

For efficient storage and transfer, objects can be packed together:

struct PackFile {
    // Pack header
    magic: [u8; 4],    // "PACK"
    version: u32,
    object_count: u32,

    // Packed objects (compressed, potentially deltified)
    objects: Vec<PackedObject>,

    // Pack checksum
    checksum: [u8; 32],
}

struct PackIndex {
    // Maps object hash to offset in pack file
    // Enables O(log n) lookups
    entries: BTreeMap<[u8; 32], PackOffset>,

    // Pack file hash this index corresponds to
    pack_hash: [u8; 32],
}

// Storage:
// .dits/objects/packs/pack-a1b2c3d4.pack
// .dits/objects/packs/pack-a1b2c3d4.idx
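
Reading a pack starts with validating its header. The sketch below assumes the three header fields are laid out back to back in 12 bytes and that the integers are big-endian; the byte order is not stated above, so treat it as an assumption. `PackHeader` and `parse_pack_header` are hypothetical names.

```rust
/// Magic bytes that open every pack file.
const PACK_MAGIC: [u8; 4] = *b"PACK";

/// The fixed-size fields at the start of a pack file.
#[derive(Debug, PartialEq)]
struct PackHeader {
    version: u32,
    object_count: u32,
}

/// Parse the first 12 bytes of a pack file.
/// Returns `None` if the input is too short or the magic does not match.
/// ASSUMPTION: integers are big-endian.
fn parse_pack_header(bytes: &[u8]) -> Option<PackHeader> {
    if bytes.len() < 12 || bytes[0..4] != PACK_MAGIC {
        return None;
    }
    let version = u32::from_be_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    let object_count = u32::from_be_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]);
    Some(PackHeader {
        version,
        object_count,
    })
}
```

Checking the magic and version up front lets a reader reject a corrupt or unsupported pack before it touches any object data.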

Object Storage Layout

.dits/
├── HEAD           Current branch reference
├── config         Repository configuration
├── index          Staging area
├── objects/
│   ├── chunks/    Loose chunk objects
│   ├── assets/    Asset manifests
│   ├── trees/     Tree manifests
│   ├── commits/   Commit objects
│   └── packs/     Packed objects
├── refs/
│   ├── heads/     Local branches
│   ├── remotes/   Remote tracking
│   └── tags/      Tags
└── hooks/         Repository hooks
