Lab Notes: Chunk Fetch
Test
Seeing that Lore's architecture is content-addressable means I could theoretically use it like an object server. That is, I hit the server and ask for specific bytes. I DON’T KNOW IF THIS IS A GOOD IDEA, I’M JUST TRYING SHIT OUT HERE.
Workflow
Use Python to look up a file by name and build its root address:
from pathlib import Path
import subprocess

lore = "lore"
file_path = "music/armageddon.ogg"


def run_lore(*args):
    # Run the Lore CLI and return its stdout as text.
    return subprocess.check_output([lore, "-P", *args], text=True)


info = run_lore("file", "info", "--remote", file_path)

# Parse "Key: value" lines into a dict with lowercase keys.
fields = {}
for line in info.splitlines():
    if ":" in line:
        key, value = line.split(":", 1)
        fields[key.strip().lower()] = value.strip()

# A root address is "<content hash>-<context id>".
root = f"{fields['hash']}-{fields['context']}"
print(root)
Then ask the remote immutable store for the root object and its leaf fragments. This will discover all the chunks associated with the file so we can address them individually by content.
query = run_lore(
    "repository",
    "store",
    "immutable",
    "query",
    "--remote",
    "--recurse",
    root,
)

leaf_fragments = []
current = None
for line in query.splitlines():
    if line.startswith("Address "):
        # Each "Address" line starts a new record.
        current = {
            "address": line.split()[1],
            "remote": "(remote)" in line,
            "subfragment": "(subfragment)" in line,
        }
    elif current and line.startswith("Content:"):
        # The "Content:" line closes the record; keep only remote subfragments.
        current["content_size"] = int(line.split()[1])
        if current["remote"] and current["subfragment"]:
            leaf_fragments.append(current)
        current = None

print(len(leaf_fragments), "leaf fragments")
Actual address shape
A Lore immutable address in this run was a 64-hex content hash, a dash, then a 32-hex context id. The context stayed the same for the file root and its leaf fragments.
root:
3bfe886eb6c0754407c7e28c3e9829a34fc08f1a7991205c33b194477e04e576-019ee2ee88dd7b9293b737d8c75aa694
leaf fragments:
018b75c41d4942008ceafac6ebef55597943c17a53de46e11a4c21e019f4d573-019ee2ee88dd7b9293b737d8c75aa694
b47ad88c8f77487adb5c403efc3efe23d2bfa5e56b6c9f345ec6df5818ddbc62-019ee2ee88dd7b9293b737d8c75aa694
3b9e81abdb35dc0e24395ab21e42a91461112b7528f8ba67dbefe3062a39c5f9-019ee2ee88dd7b9293b737d8c75aa694
c64070cc2e6b66310792a46c86f8e2641b93926af4be8971131c1fa698cfb8a3-019ee2ee88dd7b9293b737d8c75aa694
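A minimal sketch for splitting an address of this shape into its two parts. The regular expression encodes only what was observed in this run (64 hex characters, a dash, 32 hex characters); `parse_address` is a hypothetical helper name, not part of Lore.

```python
import re

# Shape observed in this run only: 64 hex chars, a dash, 32 hex chars.
ADDRESS_RE = re.compile(r"([0-9a-f]{64})-([0-9a-f]{32})")


def parse_address(address):
    """Split an address into (content_hash, context_id) or raise ValueError."""
    match = ADDRESS_RE.fullmatch(address)
    if match is None:
        raise ValueError(f"unexpected address shape: {address!r}")
    return match.group(1), match.group(2)


root_address = (
    "3bfe886eb6c0754407c7e28c3e9829a34fc08f1a7991205c33b194477e04e576"
    "-019ee2ee88dd7b9293b737d8c75aa694"
)
content_hash, context_id = parse_address(root_address)
print(len(content_hash), len(context_id))  # 64 32
```

Grouping leaf addresses by `context_id` would be one way to confirm that a root and its fragments share a context, as they did here.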
The recursive query reports the root object first, then the independently fetchable subfragments:
Address 3bfe886eb6c0754407c7e28c3e9829a34fc08f1a7991205c33b194477e04e576-019ee2ee88dd7b9293b737d8c75aa694
Payload: 920 bytes
Content: 2098320 bytes
Address 018b75c41d4942008ceafac6ebef55597943c17a53de46e11a4c21e019f4d573-019ee2ee88dd7b9293b737d8c75aa694 (subfragment)
Payload: 49473 bytes
Content: 49473 bytes
Address b47ad88c8f77487adb5c403efc3efe23d2bfa5e56b6c9f345ec6df5818ddbc62-019ee2ee88dd7b9293b737d8c75aa694 (subfragment)
Payload: 106333 bytes
Content: 106333 bytes
Fetch individual leaf fragments by address
out_dir = Path("fetched-chunks")
out_dir.mkdir(exist_ok=True)

for index, fragment in enumerate(leaf_fragments, start=1):
    output = out_dir / f"{index:04d}-{fragment['address'][:16]}.bin"
    subprocess.run(
        [
            lore,
            "-P",
            "file",
            "write",
            "--remote",
            "--address",
            fragment["address"],
            "--output",
            str(output),
        ],
        check=True,
    )
Fetch the root address when the goal is the full file, not one leaf chunk per request:
import hashlib


def sha256(path):
    # Hash the file in 1 MiB blocks so large files are not read into memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(block)
    return digest.hexdigest()


subprocess.run(
    [lore, "-P", "file", "write", "--remote", "--address", root, "--output", "restored.ogg"],
    check=True,
)

# Compare the restored file against the worktree copy.
print(sha256(file_path) == sha256("restored.ogg"))
For a metadata-only spread check, read content sizes from the recursive query output:
sizes = sorted(fragment["content_size"] for fragment in leaf_fragments)


def percentile(values, fraction):
    # Index-based percentile over an already sorted list.
    return values[int((len(values) - 1) * fraction)]


print("count", len(sizes))
print("min", sizes[0])
print("p50", percentile(sizes, 0.50))
print("p95", percentile(sizes, 0.95))
print("max", sizes[-1])

# Histogram in 32 KiB buckets.
buckets = {}
for size in sizes:
    start_kib = (size // (32 * 1024)) * 32
    label = f"{start_kib}-{start_kib + 32} KiB"
    buckets[label] = buckets.get(label, 0) + 1

for label, count in buckets.items():
    print(label, count)
Chunk tuning
It seems like there is just a default chunk size: lore commit does not expose a chunk-size flag. The compiled VCS path uses FastCDC with a 32 KiB floor, a 64 KiB expected size, and a 256 KiB ceiling. (Note: these numbers look familiar.)
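To make the floor / expected / ceiling parameters concrete, here is a toy content-defined chunker. It is a simplified gear-hash scheme in the spirit of FastCDC, not FastCDC itself (it omits normalized chunking), and it is not Lore's implementation. The gear table, seeds, and function name are made up for illustration, and the demo uses tiny sizes so it runs quickly.

```python
import random

# Hypothetical gear table: one pseudo-random 32-bit value per byte value.
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]


def chunk_boundaries(data, min_size, expected_size, max_size):
    """Yield (start, end) offsets of content-defined chunks.

    expected_size must be a power of two. It sets how many low hash bits
    must be zero at a cut point. No cut is allowed before min_size, and a
    cut is forced at max_size.
    """
    mask = expected_size - 1
    start = 0
    while start < len(data):
        end = min(start + max_size, len(data))
        fingerprint = 0
        cut = end  # fall back to the ceiling, or to the end of the data
        for pos in range(start, end):
            fingerprint = ((fingerprint << 1) + GEAR[data[pos]]) & 0xFFFFFFFF
            # Only consider cut points once the floor has been reached.
            if pos + 1 - start >= min_size and (fingerprint & mask) == 0:
                cut = pos + 1
                break
        yield start, cut
        start = cut


data_rng = random.Random(1)
data = bytes(data_rng.getrandbits(8) for _ in range(20_000))
chunks = list(chunk_boundaries(data, min_size=64, expected_size=256, max_size=1024))
sizes = [end - start for start, end in chunks]
print(len(sizes), min(sizes), max(sizes))
```

In this toy, cuts cannot happen before the floor, so the average chunk tends to land above `expected_size`. Whether the same effect explains the fragment sizes measured in these notes is not something the sketch can confirm.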
Fragment count
| File | Size | Remote leaf fragments | Average fragment size |
|---|---|---|---|
| music/armageddon.ogg | 2.0 MiB | 23 | 89 KiB |
| Sample MP3 | 36.2 MB / 34.5 MiB | 417 | 85 KiB |
Fragment spread
| File | Fragments | Min | P50 | P95 | Max | Mean |
|---|---|---|---|---|---|---|
| music/armageddon.ogg | 23 | 40.6 KiB | 86.2 KiB | 123.0 KiB | 181.9 KiB | 89.1 KiB |
| Sample MP3 | 417 | 32.3 KiB | 77.0 KiB | 149.3 KiB | 230.8 KiB | 84.7 KiB |
| Combined | 440 | 32.3 KiB | 77.1 KiB | 149.3 KiB | 230.8 KiB | 85.0 KiB |
The music data landed mostly above the 64 KiB expected size, but well below the 256 KiB ceiling. Only one sampled fragment reached the 224-256 KiB bucket.
| Bucket | OGG count | MP3 count |
|---|---|---|
| 32-64 KiB | 5 | 87 |
| 64-96 KiB | 9 | 218 |
| 96-128 KiB | 7 | 66 |
| 128-160 KiB | 1 | 29 |
| 160-192 KiB | 1 | 10 |
| 192-224 KiB | 0 | 6 |
| 224-256 KiB | 0 | 1 |
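The "mostly above 64 KiB" reading can be checked directly from the bucket table. The counts below are copied from that table; nothing else is assumed.

```python
# Bucket counts copied from the table: label -> (OGG count, MP3 count).
buckets = {
    "32-64 KiB": (5, 87),
    "64-96 KiB": (9, 218),
    "96-128 KiB": (7, 66),
    "128-160 KiB": (1, 29),
    "160-192 KiB": (1, 10),
    "192-224 KiB": (0, 6),
    "224-256 KiB": (0, 1),
}

total = sum(ogg + mp3 for ogg, mp3 in buckets.values())
below = sum(buckets["32-64 KiB"])
above = total - below
print(total, above, f"{above / total:.1%}")  # 440 348 79.1%
```

So roughly four out of five fragments were larger than the 64 KiB expected size.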
Many chunk requests
This run fetched every leaf fragment for both files: 440 separate
lore file write --address requests. Each request launched
the Lore CLI, asked the server for one fragment, wrote a file, and
hashed the result.
| Payload | Requests | Bytes fetched | Summed fetch time | Rate | Latency |
|---|---|---|---|---|---|
| OGG fragments | 23 | 2.00 MiB | 2.633 s | 8.74 chunks/s, 0.76 MiB/s | p50 113.8 ms, p95 119.2 ms |
| MP3 fragments | 417 | 34.50 MiB | 83.606 s | 4.99 chunks/s, 0.41 MiB/s | p50 200.1 ms, p95 272.7 ms |
| Combined | 440 | 36.51 MiB | 86.239 s | 5.10 chunks/s, 0.42 MiB/s | p50 194.8 ms, p95 272.7 ms |
This is a request-count stress test, not a bandwidth test. The fragments are small, around 85 KiB on average, so per-request overhead dominates. That is closer to what I wanted to test anyway: how Lore would behave if it were used as object storage.
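A sketch of how per-request latency figures like these could be collected. `fetch_one` is a hypothetical callable standing in for one lore file write --address invocation; here it is replaced by a stub that just sleeps, so the block runs without a Lore server. The percentile helper uses the same index-based method as the spread check earlier in these notes.

```python
import time


def percentile(values, fraction):
    """Index-based percentile over an already sorted list."""
    return values[int((len(values) - 1) * fraction)]


def time_requests(addresses, fetch_one):
    """Call fetch_one(address) once per address; return latencies in seconds."""
    latencies = []
    for address in addresses:
        started = time.perf_counter()
        fetch_one(address)
        latencies.append(time.perf_counter() - started)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies to the columns used in the table."""
    ordered = sorted(latencies)
    total = sum(ordered)
    return {
        "requests": len(ordered),
        "summed_s": total,
        "chunks_per_s": len(ordered) / total,
        "p50_ms": percentile(ordered, 0.50) * 1000,
        "p95_ms": percentile(ordered, 0.95) * 1000,
    }


def fake_fetch(address):
    # Stub for the demo: sleeps briefly instead of launching the CLI.
    time.sleep(0.001)


stats = summarize(time_requests([f"addr-{i}" for i in range(20)], fake_fetch))
print(stats)
```

Summing per-request times, rather than timing the whole loop, keeps the figure comparable to the "Summed fetch time" column.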
Root-address fetch
The root address can also be fetched directly. In that mode, Lore reassembles the file from its fragment tree and writes the full payload.
| File | Fragments behind root | Bytes written | Time | Rate | Verification |
|---|---|---|---|---|---|
| music/armageddon.ogg | 23 | 2.00 MiB | 0.315 s | 6.35 MiB/s | SHA-256 matched worktree file |
| Sample MP3 | 417 | 34.50 MiB | 0.541 s | 63.81 MiB/s | SHA-256 matched worktree file |
The root fetch is the better comparison for normal file retrieval. The 417-fragment MP3 was reconstructed by content address in just over half a second. Honestly, it’s still too small a file for a meaningful test.
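Putting the two timing tables side by side gives a rough sense of the gap. The figures below are copied from the tables in these notes. The comparison is only indicative, since the per-fragment run launched the CLI once per request and may include other per-chunk work.

```python
# (summed per-fragment fetch time, root fetch time) in seconds, from the tables.
timings = {
    "music/armageddon.ogg": (2.633, 0.315),
    "Sample MP3": (83.606, 0.541),
}

ratios = {}
for name, (per_fragment_s, root_s) in timings.items():
    ratios[name] = per_fragment_s / root_s
    print(f"{name}: root fetch about {ratios[name]:.1f}x faster")
# music/armageddon.ogg: root fetch about 8.4x faster
# Sample MP3: root fetch about 154.5x faster
```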
Object-server reading
Lore’s VCS behavior sits on top of a smaller storage idea: immutable bytes by address, plus mutable pointers for names such as branches.
| Layer | What it does here |
|---|---|
| Immutable store | Stores the root object and leaf fragments. Given an address, it can return bytes. |
| Mutable store | Records changing pointers such as branch latest. It is not where the music payloads live. |
| VCS layer | Uses those pieces for stage, commit, push, sync, clone, and checkout. |
That makes an object-server use case plausible: ingest bytes, get stable content addresses, dedupe identical chunks, verify returned bytes by hash, and fetch data later without caring about the original path.
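The object-server idea in that paragraph can be sketched as a toy in-memory store. This is not Lore's API: the class and method names are hypothetical, and SHA-256 is an assumption chosen because it is in the standard library (these notes do not say which hash Lore uses for addresses).

```python
import hashlib


class ToyImmutableStore:
    """In-memory content-addressed store: address = SHA-256 hex of the bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, data):
        address = hashlib.sha256(data).hexdigest()
        # Identical bytes map to the same address, so duplicates are stored once.
        self._objects.setdefault(address, bytes(data))
        return address

    def get(self, address):
        data = self._objects[address]
        # Verify on read: the address doubles as a checksum.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError(f"corrupt object at {address}")
        return data

    def __len__(self):
        return len(self._objects)


store = ToyImmutableStore()
first = store.put(b"chunk A")
second = store.put(b"chunk A")  # same bytes, same address
third = store.put(b"chunk B")
print(first == second, len(store))  # True 2
```

Note that `get` needs only the address, never a path, which is the property that makes the object-server reading plausible.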
The current performance shape is mixed. Root-address fetches were fast enough to look useful. One-request-per-leaf-fragment fetches were slow because each tiny request paid CLI and RPC overhead.
Some thoughts
- The root address acts like a content-addressed handle for the whole file.
- The root object can point at many leaf fragments.
- Leaf fragments are independently fetchable payload chunks.
- Once a client has an address, it can request bytes without knowing the original file path.
- Content addressing gives verification and dedupe, but it does not automatically make small random fetches cheap.
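The root-and-leaves relationship in the list above can be illustrated with a toy manifest. The JSON manifest format, the `objects` dict, the fixed-size chunking, and the function names are all invented for this sketch; the notes only show that a root object with a 920-byte payload stood in front of the leaf fragments, not how Lore encodes it. SHA-256 is again an assumption.

```python
import hashlib
import json

objects = {}  # hypothetical store: address -> bytes


def put(data):
    address = hashlib.sha256(data).hexdigest()
    objects[address] = data
    return address


def put_file(data, chunk_size):
    """Store fixed-size leaf chunks, then a root manifest listing their addresses."""
    leaves = [put(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]
    manifest = json.dumps({"size": len(data), "leaves": leaves}).encode()
    return put(manifest)


def get_file(root_address):
    """Reassemble the full payload by following the root manifest."""
    manifest = json.loads(objects[root_address])
    data = b"".join(objects[leaf] for leaf in manifest["leaves"])
    if len(data) != manifest["size"]:
        raise ValueError("reassembled size does not match manifest")
    return data


payload = bytes((i * 7 + i // 256) % 256 for i in range(10_000))
root_address = put_file(payload, chunk_size=1024)
leaf_addresses = json.loads(objects[root_address])["leaves"]
print(len(leaf_addresses), get_file(root_address) == payload)  # 10 True
```

Fetching `objects[leaf_addresses[0]]` alone corresponds to a single-fragment request, while `get_file(root_address)` corresponds to the root fetch.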
Useful distinction
repository store immutable query asks what exists and
how it is shaped. It reports the root object and subfragments.
file write --address asks Lore to materialize bytes for
an address. With a leaf fragment, it writes that chunk. With the root
address, it reassembles the full file.
As an object server?
Promising as an asset/object backend, not yet convincing as a general low-latency object store through the CLI path. The useful path is fetching roots or batched ranges of content, not launching hundreds of single-fragment requests.
Related: music server lab notes, object server potential, and Lore system design.