## Problem
When running multiple Firecracker VMs from the same base rootfs, three operations carry significant overhead:
- Cloning — creating a new VM requires copying the entire root disk image. For a 400MB rootfs, that's ~200ms per clone, scaling linearly with VM count.
- Resetting — returning a VM's disk to a clean state requires the same full disk copy.
- Snapshotting disk state — Firecracker snapshots capture VM state and memory but not disk changes. Operators must manage disk state externally, typically by copying the full disk image.
## Proposal
Add an `Overlay` variant to the `FileEngine` enum that implements block-level copy-on-write inside the VMM:
- A shared read-only base image serves reads for unmodified blocks
- A per-VM sparse overlay file captures writes
- A dirty bitmap (one bit per 4KB block) tracks which blocks have been written and routes reads to the correct source
- A delta file format with CRC64 integrity captures only dirty blocks for efficient snapshot persistence
This sits at the same layer as the existing `Sync` and `Async` file engines — transparent to the guest, no changes to the virtio protocol, no guest-side components.
## API

```json
{
  "drive_id": "rootfs",
  "path_on_host": "/path/to/overlay.ext4",
  "base_path": "/path/to/base-rootfs.ext4",
  "is_root_device": true,
  "io_engine": "Overlay"
}
```
On snapshot create, an optional `block_delta_dir` captures only dirty blocks:

```json
{
  "snapshot_path": "/path/to/snap.bin",
  "mem_file_path": "/path/to/snap.mem",
  "block_delta_dir": "/path/to/deltas/"
}
```
On restore, the same `block_delta_dir` applies the delta to a fresh overlay — enabling cloning from a snapshot without copying the full disk.
## Benchmark Results
Bare metal (AMD Ryzen 9 7950X3D, NVMe, 128GB RAM). Guest: 2 vCPUs, 256MB RAM, 396MB rootfs.
### Clone

| Metric | Sync (current) | Overlay | Improvement |
|---|---|---|---|
| Clone disk cost | 205ms (full 396MB copy) | 0ms (560KB delta) | no disk copy needed |
| Total clone cost | 430ms | 223ms | ~2x faster |
| Clone data size | 396MB | 560KB | ~700x smaller |
### Reset

| Method | Time |
|---|---|
| Full disk copy (current) | 190ms |
| Overlay truncate + bitmap clear | 1-2ms |
| Speedup | ~100x |
### Snapshot
Snapshot time is identical (~225ms) because it is dominated by the guest memory dump. The overlay advantage is that no separate disk copy is needed for cloning — the delta file (dirty blocks only) is produced as part of the snapshot.
## Implementation
I have a working implementation on my fork.
Happy to open a PR upstream if there is interest.