Single-file JSONL, append-only, no rotation. Corruption blast radius grows with file size. One partial write requires hand-repairing the tail. For the 40K+ message fine-tune corpus target, add size-based rotation into shards (e.g. 100MB each) and a manifest.json tracking shard count, record count, schema version range. kanon training lifecycle already implements this pattern.
Single-file JSONL, append-only, no rotation. Corruption blast radius grows with file size. One partial write requires hand-repairing the tail. For the 40K+ message fine-tune corpus target, add size-based rotation into shards (e.g. 100MB each) and a manifest.json tracking shard count, record count, schema version range. kanon training lifecycle already implements this pattern.