feat: add append_atlan_tags to Batch and save_merging_cm #866

Open
mothership-ai[bot] wants to merge 1 commit into main from feat/batch-append-atlan-tags

mothership-ai bot commented Mar 18, 2026

Session owner: @Pawan-atlan

Summary

  • Adds an append_atlan_tags parameter to Batch, AsyncBatch, save_merging_cm, and update_merging_cm, enabling incremental Atlan tag operations (add/update/remove) via the bulk API without replacing all existing tags
  • Previously, the Batch class only supported replace_atlan_tags (overwrite all or ignore all), making it unusable for high-volume tag sync workflows where specific tags need to be added/removed on 100K+ assets
  • The save() method already supported append_atlan_tags, but the Batch class and save_merging_cm/update_merging_cm methods did not pass it through
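To make the difference between the modes concrete, here is a toy model of the resulting tag set on one asset, using plain Python sets. It is a sketch under stated assumptions, not pyatlan code: the function names are hypothetical, and it assumes a removal wins when the same tag appears in both lists, which this PR does not specify.

```python
from typing import Iterable, Set


def replace_tags(existing: Set[str], incoming: Iterable[str]) -> Set[str]:
    """replace_atlan_tags=True: the incoming list becomes the full tag set."""
    return set(incoming)


def ignore_tags(existing: Set[str], incoming: Iterable[str]) -> Set[str]:
    """replace_atlan_tags=False (no append): tags in the payload are ignored."""
    return set(existing)


def append_tags(
    existing: Set[str],
    add_or_update: Iterable[str] = (),
    remove: Iterable[str] = (),
) -> Set[str]:
    """append_atlan_tags=True: only the listed tags change; all others are kept.

    ASSUMPTION: removals are applied after additions, so remove wins on conflict.
    """
    return (set(existing) | set(add_or_update)) - set(remove)


current = {"PII", "Finance", "Deprecated"}
print(sorted(replace_tags(current, ["Confidential"])))  # ['Confidential']
print(sorted(append_tags(current, ["Confidential"], ["Deprecated"])))
# ['Confidential', 'Finance', 'PII']
```

Only the append mode lets a sync job touch a handful of tags while leaving the rest of an asset's tags alone.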

Context (REQ-717)

A customer running a PyAtlan-based tag sync workflow across ~100K Snowflake assets is experiencing:

  1. Performance bottleneck: Each add_atlan_tags()/remove_atlan_tag() call does an IndexSearch + individual save (~0.5-1s per asset)
  2. Stability issues: Occasional hangs when the backend is overloaded by sequential requests
  3. Unnecessary IndexSearch: _modify_tags() always fetches by qualifiedName even when the customer has the GUID
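For a sense of scale, the quoted ~0.5-1s per asset translates into wall-clock time roughly as follows, assuming strictly sequential requests and ignoring retries and hangs. `sequential_hours` is a hypothetical helper for the arithmetic only.

```python
def sequential_hours(assets: int, seconds_per_asset: float) -> float:
    """Wall-clock hours if assets are processed one at a time, back to back."""
    return assets * seconds_per_asset / 3600


for per_asset in (0.5, 1.0):  # per-asset latency range quoted in the context
    print(f"{per_asset} s/asset -> {sequential_hours(100_000, per_asset):.1f} h")
# 0.5 s/asset -> 13.9 h
# 1.0 s/asset -> 27.8 h
```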

Recommended approach for the customer (with this fix)

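from pyatlan.client.asset import Batch
from pyatlan.client.atlan import AtlanClient
from pyatlan.model.assets import Table
from pyatlan.model.core import AtlanTag

# Reads ATLAN_BASE_URL and ATLAN_API_KEY from the environment
client = AtlanClient()

# (guid, tags_to_add, tags_to_remove) tuples produced by the tag sync workflow
changes: list[tuple[str, list[str], list[str]]] = []

batch = Batch(
    client=client,
    max_size=20,
    append_atlan_tags=True,  # NEW parameter: send incremental tag changes
)

for guid, tags_to_add, tags_to_remove in changes:
    # Reference the asset directly by GUID, so no IndexSearch lookup is needed
    asset = Table.ref_by_guid(guid)
    if tags_to_add:
        asset.add_or_update_classifications = [
            AtlanTag(type_name=t) for t in tags_to_add
        ]
    if tags_to_remove:
        asset.remove_classifications = [
            AtlanTag(type_name=t) for t in tags_to_remove
        ]
    batch.add(asset)  # flushes automatically once max_size assets are queued

batch.flush()  # send whatever is left in the final, partial batch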

This reduces 200K+ individual API calls (2 per asset across ~100K assets) to ~5K bulk API calls (batches of 20), eliminating the IndexSearch overhead entirely because assets are referenced by GUID.
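The call counts behind that estimate, assuming exactly 2 calls per asset (one IndexSearch plus one save) on the individual path and one bulk call per batch of max_size=20, with a final partial batch counting as one call. `call_counts` is a hypothetical helper.

```python
import math


def call_counts(assets: int, calls_per_asset: int, max_size: int) -> tuple:
    individual = assets * calls_per_asset  # IndexSearch + save for every asset
    bulk = math.ceil(assets / max_size)    # one bulk save per full or partial batch
    return individual, bulk


print(call_counts(100_000, 2, 20))  # (200000, 5000)
```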

Test plan

  • Verified Batch, AsyncBatch, save_merging_cm, update_merging_cm all accept append_atlan_tags parameter
  • Verified flush() passes append_atlan_tags through to save() and save_merging_cm()
  • Ruff lint passes on all changed files
  • Integration tests should verify tags are appended (not replaced) when append_atlan_tags=True
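A toy model of the pass-through the test plan checks: assets are buffered, flushed automatically once max_size is reached, and both tag flags are forwarded to the save call on every flush. `ToyBatch` and its `save` callback are hypothetical stand-ins written for illustration; the real Batch does more than this.

```python
from typing import Callable, List


class ToyBatch:
    """Hypothetical stand-in for pyatlan's Batch; not the real implementation."""

    def __init__(
        self,
        save: Callable[[List[str], bool, bool], None],
        max_size: int = 20,
        replace_atlan_tags: bool = False,
        append_atlan_tags: bool = False,
    ) -> None:
        self._save = save
        self._max_size = max_size
        self._replace = replace_atlan_tags
        self._append = append_atlan_tags
        self._buffer: List[str] = []

    def add(self, asset: str) -> None:
        self._buffer.append(asset)
        if len(self._buffer) >= self._max_size:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        # Both flags reach the save call on every flush, not just the first one.
        self._save(list(self._buffer), self._replace, self._append)
        self._buffer.clear()


calls = []
batch = ToyBatch(
    lambda assets, replace, append: calls.append((len(assets), replace, append)),
    max_size=20,
    append_atlan_tags=True,
)
for i in range(45):
    batch.add(f"guid-{i}")
batch.flush()
print(calls)  # [(20, False, True), (20, False, True), (5, False, True)]
```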

🤖 Generated with Claude Code

…g_cm, and update_merging_cm

The Batch and AsyncBatch classes only supported replace_atlan_tags (True/False),
meaning Atlan tags were either fully overwritten or completely ignored during
bulk operations. This made it impossible to use the Batch class for incremental
tag updates (add/update/remove specific tags without affecting others).

This change adds an append_atlan_tags parameter to:
- Batch.__init__() and its flush() method
- AsyncBatch.__init__() and its flush() method
- AssetClient.save_merging_cm() and update_merging_cm()
- AsyncAssetClient.save_merging_cm() and update_merging_cm()

When append_atlan_tags=True, the bulk API is called with appendTags=true,
enabling incremental tag operations via add_or_update_classifications and
remove_classifications fields on assets. This is critical for high-volume
tag sync workflows (100K+ assets) where customers need to add/remove
specific tags without fetching existing state first.
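A rough sketch of what an appending bulk request might carry for one asset. Only appendTags=true comes from the commit message above; the camelCase field names (addOrUpdateClassifications, removeClassifications), the "entities" wrapper, and `build_bulk_request` itself are assumptions for illustration, not the documented wire format.

```python
import json
from urllib.parse import urlencode


def build_bulk_request(guid, add_tags, remove_tags, type_name="Table"):
    # ASSUMPTION: only appendTags=true is stated in the commit message; the
    # camelCase field names and the "entities" wrapper are illustrative.
    query = urlencode({"appendTags": "true"})
    entity = {
        "typeName": type_name,
        "guid": guid,
        "addOrUpdateClassifications": [{"typeName": t} for t in add_tags],
        "removeClassifications": [{"typeName": t} for t in remove_tags],
    }
    return query, json.dumps({"entities": [entity]})


query, body = build_bulk_request("hypothetical-guid", ["PII"], ["Deprecated"])
print(query)  # appendTags=true
```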

Resolves: REQ-717

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>