Skip to content

feat: add transaction-level table_metadata_updates#6255

Open
fecet wants to merge 3 commits intolance-format:mainfrom
fecet:feat/transaction-table-metadata-updates
Open

feat: add transaction-level table_metadata_updates#6255
fecet wants to merge 3 commits intolance-format:mainfrom
fecet:feat/transaction-table-metadata-updates

Conversation

@fecet
Copy link
Contributor

@fecet fecet commented Mar 23, 2026

Summary

  • Promote table_metadata_updates from an UpdateConfig-only field to a Transaction-level attribute, allowing any operation (e.g. Append) to atomically carry metadata updates merged into the manifest's table_metadata
  • Add UpdateMap::keys_overlap() and wire it into the conflict resolver to detect concurrent transactions writing the same metadata keys
  • Fix a pre-existing gap where modifies_same_metadata() did not check table_metadata_updates overlap for UpdateConfig vs UpdateConfig conflicts (introduced in feat: add table metadata, support incremental updates of all metadata #4350)

Motivation

transaction_properties are stored in separate .txn files, requiring O(N) reads to scan on recovery. table_metadata lives in the manifest and is loaded at dataset open time (O(1)). By allowing metadata updates alongside any operation in a single commit, consumers like persistent message queues can embed cursors atomically with data appends — no separate UpdateConfig round-trip, no scan on recovery.

Design

Transaction
├── read_version
├── uuid
├── tag
├── transaction_properties  // in .txn file
├── table_metadata_updates  // NEW — merged into manifest
└── oneof operation { ... }

In build_manifest(), transaction-level table_metadata_updates are applied after operation-level handling (UpdateConfig), so if both are present on the same key, the transaction-level value wins.

Conflict detection: two concurrent transactions with overlapping table_metadata_updates keys are rejected as incompatible. Disjoint keys are commutative and always compatible.

Test plan

  • cargo check -p lance -p pylance compiles cleanly
  • test_table_metadata_key_conflict::disjoint_keys — two Appends with different metadata keys pass conflict check
  • test_table_metadata_key_conflict::overlapping_keys — two Appends with same metadata key rejected as IncompatibleTransaction

@github-actions github-actions bot added enhancement New feature or request python labels Mar 23, 2026
@github-actions
Copy link
Contributor

PR Review: feat: add transaction-level table_metadata_updates

Clean design overall. Two conflict-detection gaps that can cause silent metadata loss:

P0: replace: true bypasses conflict detection

keys_overlap() only checks key intersection, but when replace: true the apply_update_map function clears the entire target map first. Two concurrent transactions with replace: true and disjoint keys will both pass the overlap check, yet the second commit will wipe out the first's metadata entirely. The conflict check needs to treat replace: true as conflicting with any other table_metadata_updates (or at minimum with any other replace: true).

P1: No cross-check between transaction-level and operation-level metadata updates

The conflict resolver checks transaction-level table_metadata_updates against each other, and modifies_same_metadata() checks UpdateConfig.table_metadata_updates against each other — but there is no check between the two sources. Consider:

  • Transaction A: Operation::Append + transaction-level table_metadata_updates writing key "foo"
  • Transaction B: Operation::UpdateConfig with table_metadata_updates writing key "foo"

These will pass conflict resolution (Append vs UpdateConfig has no metadata cross-check), but both write to the same key in manifest.table_metadata. The second to commit silently overwrites the first.

Minor

  • Tests only cover replace: false — add a case for replace: true to validate whichever fix is chosen for P0.
  • keys_overlap uses HashSet<&str> — fine for small maps, but per CLAUDE.md, consider whether RoaringBitmap or similar is warranted if metadata maps could grow large. Likely fine for now given expected cardinality.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3d708ec01c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +114 to +123
pub fn keys_overlap(&self, other: &Self) -> bool {
let self_keys: std::collections::HashSet<&str> = self
.update_entries
.iter()
.map(|e| e.key.as_str())
.collect();
other
.update_entries
.iter()
.any(|e| self_keys.contains(e.key.as_str()))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Treat replace=true metadata updates as conflicting

If either UpdateMap uses replace=true, apply_update_map clears the entire table metadata before inserting the listed entries, so a concurrent write to any other key is order-dependent. With the current keys_overlap() implementation, replace={"a": ...} and an incremental update to "b" are treated as compatible because the explicit key sets are disjoint, but whichever transaction rebases last will silently discard the other's metadata. This affects both transaction-level metadata updates and UpdateConfig table-metadata conflicts.

Useful? React with 👍 / 👎.

Comment on lines +227 to +233
// Check transaction-level table_metadata_updates key overlap
if let (Some(self_updates), Some(other_updates)) = (
&self.transaction.table_metadata_updates,
&other_transaction.table_metadata_updates,
) {
if self_updates.keys_overlap(other_updates) {
return Err(self.incompatible_conflict_err(other_transaction, other_version));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Compare transaction-level metadata keys with UpdateConfig

This new conflict check only compares Transaction.table_metadata_updates against the same field on the other transaction. A concurrent UpdateConfig { table_metadata_updates: ... } still stores its table-metadata write inside operation, and check_append_txn / check_update_config_txn currently allow AppendUpdateConfig, so two commits that update the same metadata key can both succeed and the later rebase simply wins. That leaves the new append-with-metadata path unprotected against the existing UpdateConfig path.

Useful? React with 👍 / 👎.

Comment on lines 448 to +450
//TODO: handle batch transaction merges in the future
transaction_properties: None,
table_metadata_updates: None,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve table_metadata_updates in batch append commits

After this change, append transactions can legally carry table_metadata_updates, but execute_batch() always synthesizes the merged append transaction with table_metadata_updates: None. A commit_batch() caller that batches append+cursor/checkpoint transactions will get all of the data fragments committed while every metadata mutation is silently dropped, which breaks the new atomic append+metadata use case.

Useful? React with 👍 / 👎.

fecet added 2 commits March 23, 2026 18:52
Promote table_metadata_updates from UpdateConfig-only to a
Transaction-level field. Any operation can now atomically carry
metadata updates that are merged into the manifest's table_metadata
during build_manifest(), after operation-level handling.

This enables use cases like embedding a cursor alongside an Append
in a single atomic commit, avoiding a separate UpdateConfig round-trip.
Add UpdateMap::conflicts_with() — treats replace:true as conflicting
with any concurrent metadata update, otherwise checks key overlap.

Wire it into:
- check_txn(): transaction-level vs transaction-level
- modifies_same_metadata(): UpdateConfig vs UpdateConfig (pre-existing
  gap since lance-format#4350)
@fecet fecet force-pushed the feat/transaction-table-metadata-updates branch from 3d708ec to 5e3517e Compare March 23, 2026 10:56
@codecov
Copy link

codecov bot commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 72.09302% with 12 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance/src/dataset/transaction.rs 63.63% 9 Missing and 3 partials ⚠️

📢 Thoughts on this report? Let us know!

@fecet
Copy link
Contributor Author

fecet commented Mar 24, 2026

Hi maintainers, friendly ping — PTAL when you get a chance 🙏 @Xuanwo @wjones127

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant