
Lab Notes: Chunk Fetch

date: 2026-06-19
author: Peterino

Test

Seeing that Lore's architecture is content addressable means I could theoretically use it like an object server. That is, I hit the server and ask for specific bytes. I DON'T KNOW IF THIS IS A GOOD IDEA, I'M JUST TRYING SHIT OUT HERE.

Workflow

Use Python to look up a file by name on the remote and build its root address:

from pathlib import Path
import subprocess

lore = "lore"
file_path = "music/armageddon.ogg"

def run_lore(*args):
    return subprocess.check_output([lore, "-P", *args], text=True)

info = run_lore("file", "info", "--remote", file_path)

fields = {}
for line in info.splitlines():
    if ":" in line:
        key, value = line.split(":", 1)
        fields[key.strip().lower()] = value.strip()

root = f"{fields['hash']}-{fields['context']}"
print(root)

Then ask the remote immutable store for the root object and its leaf fragments. This will discover all the chunks associated with the file so we can address them individually by content.

query = run_lore(
    "repository",
    "store",
    "immutable",
    "query",
    "--remote",
    "--recurse",
    root,
)

leaf_fragments = []
current = None

# Each record starts with an "Address" line and ends with its "Content:" line.
for line in query.splitlines():
    if line.startswith("Address "):
        current = {
            "address": line.split()[1],
            "remote": "(remote)" in line,
            "subfragment": "(subfragment)" in line,
        }
    elif current and line.startswith("Content:"):
        current["content_size"] = int(line.split()[1])
        # The root object has no (subfragment) marker, so it is skipped here.
        if current["remote"] and current["subfragment"]:
            leaf_fragments.append(current)
        current = None

print(len(leaf_fragments), "leaf fragments")

Actual address shape

A Lore immutable address in this run was a 64-hex content hash, a dash, then a 32-hex context id. The context stayed the same for the file root and its leaf fragments.

root:
3bfe886eb6c0754407c7e28c3e9829a34fc08f1a7991205c33b194477e04e576-019ee2ee88dd7b9293b737d8c75aa694

leaf fragments:
018b75c41d4942008ceafac6ebef55597943c17a53de46e11a4c21e019f4d573-019ee2ee88dd7b9293b737d8c75aa694
b47ad88c8f77487adb5c403efc3efe23d2bfa5e56b6c9f345ec6df5818ddbc62-019ee2ee88dd7b9293b737d8c75aa694
3b9e81abdb35dc0e24395ab21e42a91461112b7528f8ba67dbefe3062a39c5f9-019ee2ee88dd7b9293b737d8c75aa694
c64070cc2e6b66310792a46c86f8e2641b93926af4be8971131c1fa698cfb8a3-019ee2ee88dd7b9293b737d8c75aa694
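A quick sketch for sanity-checking that shape in Python. The regex and the Address name are mine, based only on what this run printed; I have not checked them against any documented Lore address format.

```python
import re
from typing import NamedTuple

# Shape observed in this run only: 64 hex chars, a dash, 32 hex chars.
ADDRESS_RE = re.compile(r"([0-9a-f]{64})-([0-9a-f]{32})")


class Address(NamedTuple):
    content_hash: str  # 64-hex content hash
    context: str       # 32-hex context id


def parse_address(text):
    """Split an address string into its hash and context parts."""
    match = ADDRESS_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unexpected address shape: {text!r}")
    return Address(match.group(1), match.group(2))


root = parse_address(
    "3bfe886eb6c0754407c7e28c3e9829a34fc08f1a7991205c33b194477e04e576"
    "-019ee2ee88dd7b9293b737d8c75aa694"
)
leaf = parse_address(
    "018b75c41d4942008ceafac6ebef55597943c17a53de46e11a4c21e019f4d573"
    "-019ee2ee88dd7b9293b737d8c75aa694"
)

# The root and the leaf share a context but differ in content hash.
print(root.context == leaf.context, root.content_hash != leaf.content_hash)
```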

The recursive query reports the root object first, then the independently fetchable subfragments:

Address 3bfe886eb6c0754407c7e28c3e9829a34fc08f1a7991205c33b194477e04e576-019ee2ee88dd7b9293b737d8c75aa694
Payload: 920 bytes
Content: 2098320 bytes

Address 018b75c41d4942008ceafac6ebef55597943c17a53de46e11a4c21e019f4d573-019ee2ee88dd7b9293b737d8c75aa694 (subfragment)
Payload: 49473 bytes
Content: 49473 bytes

Address b47ad88c8f77487adb5c403efc3efe23d2bfa5e56b6c9f345ec6df5818ddbc62-019ee2ee88dd7b9293b737d8c75aa694 (subfragment)
Payload: 106333 bytes
Content: 106333 bytes

Fetch individual leaf fragments by address

out_dir = Path("fetched-chunks")
out_dir.mkdir(exist_ok=True)

for index, fragment in enumerate(leaf_fragments, start=1):
    output = out_dir / f"{index:04d}-{fragment['address'][:16]}.bin"
    subprocess.run(
        [
            lore,
            "-P",
            "file",
            "write",
            "--remote",
            "--address",
            fragment["address"],
            "--output",
            str(output),
        ],
        check=True,
    )

Fetch the root address when the goal is the full file, not one leaf chunk per request:

import hashlib

def sha256(path):
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(block)
    return digest.hexdigest()

subprocess.run(
    [lore, "-P", "file", "write", "--remote", "--address", root, "--output", "restored.ogg"],
    check=True,
)

print(sha256(file_path) == sha256("restored.ogg"))

For a metadata-only spread check, read content sizes from the recursive query output:

sizes = sorted(fragment["content_size"] for fragment in leaf_fragments)

def percentile(values, fraction):
    return values[int((len(values) - 1) * fraction)]

print("count", len(sizes))
print("min", sizes[0])
print("p50", percentile(sizes, 0.50))
print("p95", percentile(sizes, 0.95))
print("max", sizes[-1])

buckets = {}
for size in sizes:
    start_kib = (size // (32 * 1024)) * 32
    label = f"{start_kib}-{start_kib + 32} KiB"
    buckets[label] = buckets.get(label, 0) + 1

for label, count in buckets.items():
    print(label, count)

Chunk tuning

There seems to be just one default chunk size: lore commit does not expose a chunk-size flag. The compiled VCS path uses FastCDC with a 32 KiB floor, a 64 KiB expected size, and a 256 KiB ceiling. (Note: these numbers look familiar.)
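To get a feel for what a floor, an expected size, and a ceiling do, here is a simplified gear-hash chunker. This is not Lore's code and not full FastCDC: it skips normalized chunking, and the gear table and mask are made up for illustration. Only the three size constants come from these notes.

```python
import random

MIN_SIZE = 32 * 1024    # floor from the notes
AVG_SIZE = 64 * 1024    # expected size from the notes
MAX_SIZE = 256 * 1024   # ceiling from the notes

# Hypothetical gear table: 256 pseudo-random 64-bit values, seeded so runs repeat.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]

# 16 mask bits must be zero at a cut point, so a cut is expected roughly
# once per 64 KiB of bytes scanned after the floor.
MASK = (AVG_SIZE - 1) << 32
U64 = 0xFFFFFFFFFFFFFFFF


def chunk_lengths(data):
    """Return chunk lengths for data using a simplified gear-hash CDC."""
    lengths = []
    start = 0
    while start < len(data):
        end = min(start + MAX_SIZE, len(data))
        cut = end  # fall back to the ceiling or the end of input
        fp = 0
        # Bytes before the floor are never tested, so only the final
        # chunk can be shorter than MIN_SIZE.
        for i in range(start + MIN_SIZE, end):
            fp = ((fp << 1) + GEAR[data[i]]) & U64
            if fp & MASK == 0:
                cut = i + 1
                break
        lengths.append(cut - start)
        start = cut
    return lengths


sample_rng = random.Random(1)
data = bytes(sample_rng.getrandbits(8) for _ in range(1 << 20))  # 1 MiB of test bytes
lengths = chunk_lengths(data)
print(len(lengths), "chunks, min", min(lengths), "max", max(lengths))
```

Because cut points depend only on the bytes themselves, the same input always produces the same chunk boundaries, which is what lets identical content map to identical addresses.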

Fragment count

| File | Size | Remote leaf fragments | Average fragment size |
| --- | --- | --- | --- |
| music/armageddon.ogg | 2.0 MiB | 23 | 89 KiB |
| Sample MP3 | 36.2 MB / 34.5 MiB | 417 | 85 KiB |
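The averages are just total size divided by fragment count. A small check, using the root Content size from the query output for the OGG and the rounded 34.5 MiB figure for the MP3 (so the MP3 value is approximate):

```python
KIB = 1024
MIB = 1024 * KIB

ogg_bytes = 2_098_320      # root "Content" size reported by the recursive query
ogg_fragments = 23
mp3_bytes = 34.5 * MIB     # rounded size from the table, not an exact byte count
mp3_fragments = 417

ogg_avg_kib = ogg_bytes / ogg_fragments / KIB
mp3_avg_kib = mp3_bytes / mp3_fragments / KIB

print(f"OGG average: {ogg_avg_kib:.1f} KiB")
print(f"MP3 average: {mp3_avg_kib:.1f} KiB")
```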

Fragment spread

| File | Fragments | Min | P50 | P95 | Max | Mean |
| --- | --- | --- | --- | --- | --- | --- |
| music/armageddon.ogg | 23 | 40.6 KiB | 86.2 KiB | 123.0 KiB | 181.9 KiB | 89.1 KiB |
| Sample MP3 | 417 | 32.3 KiB | 77.0 KiB | 149.3 KiB | 230.8 KiB | 84.7 KiB |
| Combined | 440 | 32.3 KiB | 77.1 KiB | 149.3 KiB | 230.8 KiB | 85.0 KiB |

The music data landed mostly above the 64 KiB expected size, but well below the 256 KiB ceiling. Only one sampled fragment reached the 224-256 KiB bucket.

| Bucket | OGG count | MP3 count |
| --- | --- | --- |
| 32-64 KiB | 5 | 87 |
| 64-96 KiB | 9 | 218 |
| 96-128 KiB | 7 | 66 |
| 128-160 KiB | 1 | 29 |
| 160-192 KiB | 1 | 10 |
| 192-224 KiB | 0 | 6 |
| 224-256 KiB | 0 | 1 |
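To put a number on "mostly above the expected size", the bucket counts above can be summed directly. This only uses the table values; note that the 64-96 KiB bucket includes fragments of exactly 64 KiB, so the split is really "at or above" versus "below".

```python
# (OGG count, MP3 count) per 32 KiB bucket, copied from the table above.
buckets = {
    "32-64": (5, 87),
    "64-96": (9, 218),
    "96-128": (7, 66),
    "128-160": (1, 29),
    "160-192": (1, 10),
    "192-224": (0, 6),
    "224-256": (0, 1),
}

total = sum(ogg + mp3 for ogg, mp3 in buckets.values())
below_expected = sum(buckets["32-64"])           # fragments under 64 KiB
share_at_or_above = (total - below_expected) / total

print(total, "fragments,", f"{share_at_or_above:.1%} at or above 64 KiB")
```

That works out to roughly 79% of the sampled fragments sitting at or above the 64 KiB expected size.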

Many chunk requests

This run fetched every leaf fragment for both files: 440 separate lore file write --address requests. Each request launched the Lore CLI, asked the server for one fragment, wrote a file, and hashed the result.

| Payload | Requests | Bytes fetched | Summed fetch time | Rate | Latency |
| --- | --- | --- | --- | --- | --- |
| OGG fragments | 23 | 2.00 MiB | 2.633 s | 8.74 chunks/s, 0.76 MiB/s | p50 113.8 ms, p95 119.2 ms |
| MP3 fragments | 417 | 34.50 MiB | 83.606 s | 4.99 chunks/s, 0.41 MiB/s | p50 200.1 ms, p95 272.7 ms |
| Combined | 440 | 36.51 MiB | 86.239 s | 5.10 chunks/s, 0.42 MiB/s | p50 194.8 ms, p95 272.7 ms |

This is a request-count stress test, not a bandwidth test. The fragments are small, around 85 KiB on average, so per-request overhead dominates. That is closer to what I wanted to test anyway: how Lore would behave if used as object storage.
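The rate columns follow from the request counts and summed times, and the same inputs give a mean cost per request. A small sketch with the table numbers (the summarize helper is just a name I made up for this):

```python
def summarize(requests, mib, seconds):
    """Derive throughput and mean per-request cost from summed fetch time."""
    return {
        "chunks_per_s": requests / seconds,
        "mib_per_s": mib / seconds,
        "mean_ms": seconds / requests * 1000,
    }


ogg = summarize(requests=23, mib=2.00, seconds=2.633)
mp3 = summarize(requests=417, mib=34.50, seconds=83.606)

for name, row in (("OGG", ogg), ("MP3", mp3)):
    print(
        name,
        f"{row['chunks_per_s']:.2f} chunks/s,",
        f"{row['mib_per_s']:.2f} MiB/s,",
        f"{row['mean_ms']:.1f} ms per request",
    )
```

The mean cost per request comes out near 114 ms for the OGG fragments and 200 ms for the MP3 fragments, close to the measured p50 values, which fits the reading that a mostly fixed per-request cost is what limits the rate.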

Root-address fetch

The root address can also be fetched directly. In that mode, Lore reassembles the file from its fragment tree and writes the full payload.

| File | Fragments behind root | Bytes written | Time | Rate | Verification |
| --- | --- | --- | --- | --- | --- |
| music/armageddon.ogg | 23 | 2.00 MiB | 0.315 s | 6.35 MiB/s | SHA-256 matched worktree file |
| Sample MP3 | 417 | 34.50 MiB | 0.541 s | 63.81 MiB/s | SHA-256 matched worktree file |

The root fetch is the better comparison for normal file retrieval. The 417-fragment MP3 was reconstructed by content address in just over half a second. Honestly, the file is still too small for a real test.
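For scale, here is the ratio between the summed per-fragment fetch time and the single root fetch for the same bytes. It is not a like-for-like benchmark, since the per-fragment runs also paid for a CLI launch and a hash per request, but it shows how large the gap is.

```python
# Summed per-fragment fetch time vs. one root-address fetch, in seconds,
# taken from the two timing tables in these notes.
timings = {
    "music/armageddon.ogg": {"per_fragment": 2.633, "root": 0.315},
    "Sample MP3": {"per_fragment": 83.606, "root": 0.541},
}

ratios = {
    name: row["per_fragment"] / row["root"] for name, row in timings.items()
}

for name, ratio in ratios.items():
    print(f"{name}: root fetch about {ratio:.1f}x faster")
```

That is roughly 8x for the 23-fragment OGG and roughly 155x for the 417-fragment MP3, so the gap widens as the fragment count grows.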

Object-server reading

Lore’s VCS behavior sits on top of a smaller storage idea: immutable bytes by address, plus mutable pointers for names such as branches.

| Layer | What it does here |
| --- | --- |
| Immutable store | Stores the root object and leaf fragments. Given an address, it can return bytes. |
| Mutable store | Records changing pointers such as branch latest. It is not where the music payloads live. |
| VCS layer | Uses those pieces for stage, commit, push, sync, clone, and checkout. |

That makes an object-server use case plausible: ingest bytes, get stable content addresses, dedupe identical chunks, verify returned bytes by hash, and fetch data later without caring about the original path.
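A toy model of that idea, to make the moving parts concrete. Everything here is a stand-in of my own: the ToyStore class, the use of SHA-256 as the address, and the in-memory dicts are assumptions for illustration, not how Lore stores or addresses data.

```python
import hashlib


class ToyStore:
    """In-memory stand-in: immutable bytes by address, plus mutable named pointers."""

    def __init__(self):
        self.immutable = {}  # address -> bytes, never overwritten
        self.mutable = {}    # name -> address, may be repointed later

    def put(self, data):
        # Assumption: SHA-256 of the bytes serves as the content address.
        address = hashlib.sha256(data).hexdigest()
        self.immutable.setdefault(address, data)  # identical bytes are stored once
        return address

    def get(self, address):
        data = self.immutable[address]
        # Verify the returned bytes against the address they were requested by.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored bytes do not match their address")
        return data

    def point(self, name, address):
        self.mutable[name] = address

    def resolve(self, name):
        return self.mutable[name]


store = ToyStore()
first = store.put(b"same chunk")
second = store.put(b"same chunk")   # dedupes: same address, no new entry
store.point("latest", first)

print(first == second, len(store.immutable))
print(store.get(store.resolve("latest")))
```

The caller never needs the original path: an address (or a name that resolves to one) is enough to get verified bytes back.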

The current performance shape is mixed. Root-address fetches were fast enough to look useful. One-request-per-leaf-fragment fetches were slow because each tiny request paid CLI and RPC overhead.

Some thoughts

  • The root address acts like a content-addressed handle for the whole file.
  • The root object can point at many leaf fragments.
  • Leaf fragments are independently fetchable payload chunks.
  • Once a client has an address, it can request bytes without knowing the original file path.
  • Content addressing gives verification and dedupe, but it does not automatically make small random fetches cheap.

Useful distinction

repository store immutable query asks what exists and how it is shaped. It reports the root object and subfragments.

file write --address asks Lore to materialize bytes for an address. With a leaf fragment, it writes that chunk. With the root address, it reassembles the full file.
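The two commands can be kept apart as small argv builders. The function names are mine; the subcommands and flags are only the ones already used earlier in these notes, so treat anything beyond that as untested.

```python
LORE = "lore"  # CLI name as used in these notes


def query_command(address):
    """argv for asking what exists behind an address and how it is shaped."""
    return [
        LORE, "-P", "repository", "store", "immutable", "query",
        "--remote", "--recurse", address,
    ]


def write_command(address, output):
    """argv for materializing the bytes behind an address into a file."""
    return [
        LORE, "-P", "file", "write",
        "--remote", "--address", address,
        "--output", str(output),
    ]


root = (
    "3bfe886eb6c0754407c7e28c3e9829a34fc08f1a7991205c33b194477e04e576"
    "-019ee2ee88dd7b9293b737d8c75aa694"
)
print(" ".join(query_command(root)))
print(" ".join(write_command(root, "restored.ogg")))
```

Either list can be handed to subprocess.run(..., check=True) or subprocess.check_output(..., text=True), the same way the earlier snippets call the CLI.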

As an object server?

Promising as an asset/object backend, not yet convincing as a general low-latency object store through the CLI path. The useful path is fetching roots or batched ranges of content, not launching hundreds of single-fragment requests.

Related: music server lab notes, object server potential, and Lore system design.
