feat: add transaction-level table_metadata_updates#6255
feat: add transaction-level table_metadata_updates#6255fecet wants to merge 3 commits intolance-format:mainfrom
Conversation
PR Review: feat: add transaction-level table_metadata_updatesClean design overall. Two conflict-detection gaps that can cause silent metadata loss: P0:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3d708ec01c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| pub fn keys_overlap(&self, other: &Self) -> bool { | ||
| let self_keys: std::collections::HashSet<&str> = self | ||
| .update_entries | ||
| .iter() | ||
| .map(|e| e.key.as_str()) | ||
| .collect(); | ||
| other | ||
| .update_entries | ||
| .iter() | ||
| .any(|e| self_keys.contains(e.key.as_str())) |
There was a problem hiding this comment.
Treat replace=true metadata updates as conflicting
If either UpdateMap uses replace=true, apply_update_map clears the entire table metadata before inserting the listed entries, so a concurrent write to any other key is order-dependent. With the current keys_overlap() implementation, replace={"a": ...} and an incremental update to "b" are treated as compatible because the explicit key sets are disjoint, but whichever transaction rebases last will silently discard the other's metadata. This affects both transaction-level metadata updates and UpdateConfig table-metadata conflicts.
Useful? React with 👍 / 👎.
| // Check transaction-level table_metadata_updates key overlap | ||
| if let (Some(self_updates), Some(other_updates)) = ( | ||
| &self.transaction.table_metadata_updates, | ||
| &other_transaction.table_metadata_updates, | ||
| ) { | ||
| if self_updates.keys_overlap(other_updates) { | ||
| return Err(self.incompatible_conflict_err(other_transaction, other_version)); |
There was a problem hiding this comment.
Compare transaction-level metadata keys with UpdateConfig
This new conflict check only compares Transaction.table_metadata_updates against the same field on the other transaction. A concurrent UpdateConfig { table_metadata_updates: ... } still stores its table-metadata write inside operation, and check_append_txn / check_update_config_txn currently allow Append↔UpdateConfig, so two commits that update the same metadata key can both succeed and the later rebase simply wins. That leaves the new append-with-metadata path unprotected against the existing UpdateConfig path.
Useful? React with 👍 / 👎.
| //TODO: handle batch transaction merges in the future | ||
| transaction_properties: None, | ||
| table_metadata_updates: None, |
There was a problem hiding this comment.
Preserve table_metadata_updates in batch append commits
After this change, append transactions can legally carry table_metadata_updates, but execute_batch() always synthesizes the merged append transaction with table_metadata_updates: None. A commit_batch() caller that batches append+cursor/checkpoint transactions will get all of the data fragments committed while every metadata mutation is silently dropped, which breaks the new atomic append+metadata use case.
Useful? React with 👍 / 👎.
Promote table_metadata_updates from UpdateConfig-only to a Transaction-level field. Any operation can now atomically carry metadata updates that are merged into the manifest's table_metadata during build_manifest(), after operation-level handling. This enables use cases like embedding a cursor alongside an Append in a single atomic commit, avoiding a separate UpdateConfig round-trip.
Add UpdateMap::conflicts_with() — treats replace:true as conflicting with any concurrent metadata update, otherwise checks key overlap. Wire it into: - check_txn(): transaction-level vs transaction-level - modifies_same_metadata(): UpdateConfig vs UpdateConfig (pre-existing gap since lance-format#4350)
3d708ec to
5e3517e
Compare
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
Hi maintainers, friendly ping — PTAL when you get a chance 🙏 @Xuanwo @wjones127 |
Summary
table_metadata_updatesfrom anUpdateConfig-only field to aTransaction-level attribute, allowing any operation (e.g. Append) to atomically carry metadata updates merged into the manifest'stable_metadataUpdateMap::keys_overlap()and wire it into the conflict resolver to detect concurrent transactions writing the same metadata keysmodifies_same_metadata()did not checktable_metadata_updatesoverlap for UpdateConfig vs UpdateConfig conflicts (introduced in feat: add table metadata, support incremental updates of all metadata #4350)Motivation
transaction_propertiesare stored in separate.txnfiles, requiring O(N) reads to scan on recovery.table_metadatalives in the manifest and is loaded at dataset open time (O(1)). By allowing metadata updates alongside any operation in a single commit, consumers like persistent message queues can embed cursors atomically with data appends — no separate UpdateConfig round-trip, no scan on recovery.Design
In
build_manifest(), transaction-leveltable_metadata_updatesare applied after operation-level handling (UpdateConfig), so if both are present on the same key, the transaction-level value wins.Conflict detection: two concurrent transactions with overlapping
table_metadata_updateskeys are rejected as incompatible. Disjoint keys are commutative and always compatible.Test plan
cargo check -p lance -p pylancecompiles cleanlytest_table_metadata_key_conflict::disjoint_keys— two Appends with different metadata keys pass conflict checktest_table_metadata_key_conflict::overlapping_keys— two Appends with same metadata key rejected as IncompatibleTransaction