Skip to content

feat: hide sensitive columns in PolarsCompare#502

Open
sh13m wants to merge 21 commits intocapitalone:developfrom
sh13m:polars-hide-col
Open

feat: hide sensitive columns in PolarsCompare#502
sh13m wants to merge 21 commits intocapitalone:developfrom
sh13m:polars-hide-col

Conversation

@sh13m
Copy link
Copy Markdown

@sh13m sh13m commented Mar 13, 2026

Implements sensitive_columns in a similar fashion to #494 but done in polars instead.

  • Slight difference in intersect_rows. In pandas, the merge does not suffix the dataframe name for columns that are only in one of the dataframes, while the way merging is implemented in PolarsCompare always adds the suffix.
  • Refactor: moved _set_and_validate_sensitive_columns, the sensitive_columns property, and reveal_sensitive_columns into BaseCompare since the implementations are identical and should remain so for the other compares.

@sh13m

This comment was marked as outdated.

@sh13m sh13m marked this pull request as ready for review March 27, 2026 13:57
Copy link
Copy Markdown
Member

@fdosani fdosani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, nice work porting this over to Polars!
Left a few minor comments

Comment thread datacompy/polars.py Outdated
"""Get the list of sensitive columns."""
return self._sensitive_columns

def _set_and_validate_sensitive_columns(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_set_and_validate_sensitive_columns, the sensitive_columns property, and reveal_sensitive_columns are identical to the PandasCompare implementations. Worth considering pulling these into BaseCompare in a follow-up, leaving only hide_sensitive_columns in each subclass since that's the only method with backend-specific logic.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think that's a good idea. Planning to add column hiding to the other types of compares soon, so I can move the methods to base sometime then (assuming nothing weird shows up which would require a different implementation, probably won't).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you want to make that shift now so we can review discuss it here?

Copy link
Copy Markdown
Author

@sh13m sh13m Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fdosani changes made in latest commits, let me know what you think.

Comment thread tests/test_polars.py Outdated
Comment thread tests/test_polars.py Outdated
Comment thread datacompy/polars.py
Copy link
Copy Markdown
Member

@fdosani fdosani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a test to check that SparkSQLCompare and SnowflakeCompare don't crash when accessing .sensitive_columns after construction.

Aside from that just the comment on adding _sensitive_columns as a class level attribute especially for the init of it.

Comment thread datacompy/base.py

class BaseCompare(ABC):
"""Base comparison class."""

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might need this here or SparkSQLCompare, SnowflakeCompare will crash if someone tries to access: sensitive_columns

Suggested change
_sensitive_columns: List[str] | None = None

Copy link
Copy Markdown
Author

@sh13m sh13m Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Planning to port over sensitive_columns to both soon, correct me if I'm wrong but I assume at that point we wouldn't need this here? I can add this + the tests in for now and remove them later in their respective PRs if we want.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay! Been swamped! Sure just add them in for now so things don't crash.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha no worries, sure I'll add them in.

Comment thread datacompy/polars.py Outdated
Comment thread datacompy/pandas.py Outdated
@sh13m sh13m closed this Apr 11, 2026
@sh13m sh13m reopened this Apr 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants