feat: hide sensitive columns in PolarsCompare#502
feat: hide sensitive columns in PolarsCompare#502sh13m wants to merge 21 commits intocapitalone:developfrom
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
fdosani
left a comment
There was a problem hiding this comment.
Thanks for the PR, nice work porting this over to Polars!
Left a few minor comments
| """Get the list of sensitive columns.""" | ||
| return self._sensitive_columns | ||
|
|
||
| def _set_and_validate_sensitive_columns( |
There was a problem hiding this comment.
_set_and_validate_sensitive_columns, the sensitive_columns property, and reveal_sensitive_columns are identical to the PandasCompare implementations. Worth considering pulling these into BaseCompare in a follow-up, leaving only hide_sensitive_columns in each subclass since that's the only method with backend-specific logic.
There was a problem hiding this comment.
Yeah I think that's a good idea. Planning to add column hiding to the other types of compares soon, so I can move the methods to base sometime then (assuming nothing weird shows up which would require a different implementation, probably won't).
There was a problem hiding this comment.
Did you want to make that shift now so we can review discuss it here?
There was a problem hiding this comment.
@fdosani changes made in latest commits, let me know what you think.
Co-authored-by: Faisal <faisal.dosani@pm.me>
…e_columns, and reveal_sensitive_columns into BaseCompare
fdosani
left a comment
There was a problem hiding this comment.
We should add a test to check that SparkSQLCompare and SnowflakeCompare don't crash when accessing .sensitive_columns after construction.
Aside from that just the comment on adding _sensitive_columns as a class level attribute especially for the init of it.
|
|
||
| class BaseCompare(ABC): | ||
| """Base comparison class.""" | ||
|
|
There was a problem hiding this comment.
Might need this here or SparkSQLCompare, SnowflakeCompare will crash if someone tries to access: sensitive_columns
| _sensitive_columns: List[str] | None = None |
There was a problem hiding this comment.
Planning to port over sensitive_columns to both soon, correct me if I'm wrong but I assume at that point we wouldn't need this here? I can add this + the tests in for now and remove them later in their respective PRs if we want.
There was a problem hiding this comment.
Sorry for the delay! Been swamped! Sure just add them in for now so things don't crash.
There was a problem hiding this comment.
Haha no worries, sure I'll add them in.
…lake to prevent crashing for now
Implements
sensitive_columnsin a similar fashion to #494 but done in polars instead.intersect_rows. In pandas, the merge does not suffix the dataframe name for columns that are only in one of the dataframes, while the way merging is implemented in PolarsCompare always adds the suffix._set_and_validate_sensitive_columns, thesensitive_columnsproperty, andreveal_sensitive_columnsintoBaseComparesince the implementations are identical and should remain so for the other compares.