[WIP] feat(narwhals): implement the new unifying backend#2223
[WIP] feat(narwhals): implement the new unifying backend#2223deepyaman wants to merge 20 commits intounionai-oss:mainfrom
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2223 +/- ##
==========================================
+ Coverage 82.74% 83.33% +0.58%
==========================================
Files 180 189 +9
Lines 15089 15984 +895
==========================================
+ Hits 12486 13320 +834
- Misses 2603 2664 +61 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
hey @deepyaman let me know if you need any help here |
Hey @cosmicBboy, updated with a rough description of where things are at/what's necessary before moving forward with this. |
There was a problem hiding this comment.
hey @deepyaman how are you thinking about the .planning directory? should those be checked in for posterity (and future agent reference)?
I also made a dev/narwhals branch that we can use to merge this PR... can you do an interactive rebase to squash some of the commits into smaller chunks (so as to not lose granularity of commits)?
I don't intend to check them in, but I thought it could be useful to share for now. From my list of pre-merge TODOs above:
I don't know what you think, but I don't think it makes sense to leave project-level artifacts just for agents, even if want to support them? TBH I'm not super familiar on this.
Sure, will look into doing this. |
deepyaman
left a comment
There was a problem hiding this comment.
There seem to be a number of potential issues, most of which fall into two categories: executing too eagerly and backend-specific logic.
| # Import is guarded so ibis remains an optional dependency. | ||
| try: | ||
| import ibis as _ibis | ||
| if isinstance(failure_cases, _ibis.Table): |
There was a problem hiding this comment.
| if isinstance(failure_cases, _ibis.Table): | |
| import ibis | |
| if isinstance(failure_cases, ibis.Table): |
Why alias, why not just import ibis?
| if isinstance(failure_cases, str): # Avoid returning str length | ||
| return 1 | ||
|
|
||
| # ibis.Table raises ExpressionError for len(); use .count().execute() instead. |
There was a problem hiding this comment.
It seems wrong to add an Ibis-specific branch to the base ErrorHandler; isinstance(failure_cases, ibis.table) made a lot more sense in the Ibis ErrorHandler, and this doesn't seem like the right way to handle it for Narwhals. Maybe the Narwhals backend needs it's own ErrorHandler, and that can dispatch based on type—or, much better, just use the Narwahls way of counting, not sure why this wouldn't work...
| Auto-detects narwhals: if narwhals is installed, registers narwhals backends | ||
| (NarwhalsCheckBackend, narwhals ColumnBackend, narwhals DataFrameSchemaBackend) | ||
| and emits a UserWarning. If narwhals is not installed, registers the native |
There was a problem hiding this comment.
Nit: Don't know why we're not capitalizing Narwhals.
| Auto-detects narwhals: if narwhals is installed, registers narwhals backends | |
| (NarwhalsCheckBackend, narwhals ColumnBackend, narwhals DataFrameSchemaBackend) | |
| and emits a UserWarning. If narwhals is not installed, registers the native | |
| Auto-detects Narwhals: if Narwhals is installed, registers Narwhals backends | |
| (NarwhalsCheckBackend, Narwhals ColumnBackend, Narwhals DataFrameSchemaBackend) | |
| and emits a UserWarning. If Narwhals is not installed, registers the native |
There was a problem hiding this comment.
Was this resolved? I thought we were resolving it, but I'm not sure I see the changes.
| import narwhals.stable.v1 as nw | ||
| import polars as pl | ||
|
|
||
| from pandera.api.base.error_handler import ErrorHandler |
There was a problem hiding this comment.
Again, seems like we should be importing the Narwhals ErrorHandler here, rather than modifying the base one.
| ) | ||
|
|
||
| @staticmethod | ||
| def _materialize(frame) -> nw.DataFrame: |
There was a problem hiding this comment.
This is eagerly executing? I think executing early for the check output is not ideal, but still reasonable. However, this is also getting called elsewhere above. Furthermore, the conditional logic seems overly complicated—not sure why this is needed.
| if issubclass(return_type, pl.DataFrame): | ||
| return native.collect() | ||
| return native |
There was a problem hiding this comment.
What is this for? Type-specific .collect() call seems like a red flag. Once again, Ibis and Polars are following different paths, and that can't be right.
| components = self.collect_schema_components( | ||
| check_lf, schema, column_info | ||
| ) | ||
| check_obj_parsed = _to_frame_kind_nw(check_lf, return_type) |
There was a problem hiding this comment.
Why is the object potentially getting collected here? It seems like, if the user passes a pl.DataFrame, we .collect()—for what reason?
|
|
||
| check_results = [] | ||
| check_passed = [] | ||
| # Convert to native pl.LazyFrame for column component dispatch. |
There was a problem hiding this comment.
Not necessarily pl.LazyFrame, right? What if it's an Ibis table?
| ): | ||
| """Collects all schema components to use for validation.""" | ||
|
|
||
| from pandera.api.polars.components import Column |
There was a problem hiding this comment.
Why is something from the Polars backend being used here? This makes no sense.
| column_info: Any, | ||
| ) -> list[CoreCheckResult]: | ||
| """Check that all columns in the schema are present in the dataframe.""" | ||
| from pandera.api.narwhals.utils import _to_native |
There was a problem hiding this comment.
Why is to_native necessary here? Why isn't this being handled in a backend-agnostic way?
2a02d7e to
44967b2
Compare
deepyaman
left a comment
There was a problem hiding this comment.
I've taken another pass at reviewing this. There are a few major themes in the feedback, alongside the other miscellaneous comments:
- Let's unify the strategy to handle native-type-dependent logic in the final stages (e.g. failure case reporting). Right now, there are a lot of complex
if/elif/elselogic blocks, based onisinstance()andhasattr()checks to determine the native type and what to do in each situation. Maybe these can be fixed by some combination of utility functions, constants (sort of like https://github.com/narwhals-dev/narwhals/blob/228fee8d83d92d06e6cb32646d0e131acf0c1e2e/narwhals/_typing.py#L28), and dispatching. Again, the native-type-dependent logic should not expand—it must stay just at the final steps of returning results back to the user—but we can have a better stragey around how we do it. - There are a number of cases where we don't use top-level imports, where either we clearly should (standard library imports), or it seems like it may be a red flag if it must be an inner import.
- We need a cohesive testing strategy, and, in order to get a mergeable PR, we need a CI pipeline that works for existing backends, without using the Narwhals backend, as well as testing Ibis, Polars LazyFrame, Polars DataFrame (I think we need to test Polars DataFrame, right?) using the Narwhals backend. What this probably means is, we need to see how much of the existing Ibis backend tests work when Narwhals is installed now (and investigate failures, potentially xfail things that aren't super critical), how much of the Polars backend tests work when Narwhals is installed now (and investigate failures, potentially xfail things that aren't super critical), as well as add the "future" state of how we want to test in a unified way (the Narwhals backends tests should parametrize across the types we support, and we should make sure the tests pass, and this "future" state will help us towards having more-or-less a single, more maintainable test suite at some point in the future. Happy to discuss what this looks like.
| return cls( | ||
| cls.get_builtin_check_fn(name), | ||
| statistics=statistics, | ||
| native=False, |
There was a problem hiding this comment.
How will this affect existing backend implementations? We should take care not to break them.
It's seems like it may be fine, because we can assume all existing checks were native (since Narwhals wasn't introduced), but let's confirm things work when Narwhals isn't installed/we're using the existing check backends.
| :param native: If True (default), the check function receives the raw | ||
| native frame and the column key as positional args: | ||
| ``check_fn(native_frame, key)``. If False, the check function | ||
| receives a narwhals expression ``nw.col(key)`` (a ``nw.Expr``) | ||
| as its sole argument: ``check_fn(nw.col(key))``. Builtin checks | ||
| use ``native=False``. |
There was a problem hiding this comment.
"Native" only has a meaning for Narwhals. At the very least, it seems like the docstring should be clear that this only applies when using the Narwhals backend; otherwise, all the checks are native, since the concept of non-native (i.e. Narwhals) checks doesn't exist.
| Auto-detects narwhals: if narwhals is installed, registers narwhals backends | ||
| (NarwhalsCheckBackend, narwhals ColumnBackend, narwhals DataFrameSchemaBackend) | ||
| and emits a UserWarning. If narwhals is not installed, registers the native |
There was a problem hiding this comment.
Was this resolved? I thought we were resolving it, but I'm not sure I see the changes.
| return True | ||
| if isinstance(fc, nw.DataFrame): | ||
| native = nw.to_native(fc) | ||
| return hasattr(native, "execute") # ibis.Table has .execute(); polars DataFrame does not |
There was a problem hiding this comment.
This seems very finicky. Is there a better way to do this, or reference something in Narwhals? I see something like https://github.com/narwhals-dev/narwhals/blob/228fee8d83d92d06e6cb32646d0e131acf0c1e2e/narwhals/_typing.py#L29, but that's not quite what we're looking for...
Or, should we just hardcode for the types we support? it looks like the if isinstance(fc, nw.LazyFrame): return True is clear enough, but if it's a nw.DataFrame, maybe we actually just check if isinstance(ibis.Table): return True, etc. The Ibis import would need to be in a guard though, so as not to break for Polars or other backends.
| # Guard: SQL-lazy backends don't support tail without full ordering | ||
| if tail is not None: | ||
| native = nw.to_native(check_obj) | ||
| if hasattr(native, "execute"): # ibis.Table has .execute(); pl.LazyFrame does not |
There was a problem hiding this comment.
Can this use something like https://github.com/narwhals-dev/narwhals/blob/228fee8d83d92d06e6cb32646d0e131acf0c1e2e/narwhals/_typing.py#L28? Again, it seems very finicky to be going off of looking for the "execute" attribute, since that may just apply to Ibis backend.
| from pandera.api.narwhals.types import NarwhalsData | ||
| from pandera.api.narwhals.utils import _to_native |
There was a problem hiding this comment.
Again, can these not be top-level imports?
| key = data_container.key | ||
| _key = "" if key == "*" else f"'{key}' in" | ||
| # Produce native failure_cases: collect original frame as native | ||
| failure_cases = _to_native(data_container.frame.collect()) |
There was a problem hiding this comment.
Collect original frame as native? What if it's a big dataset, why are we trying to collect the whole thing?
|
|
||
| try: | ||
| lf = self.coerce(data_container) | ||
| lf.collect() |
There was a problem hiding this comment.
Why are we collecting everything? What if it's a big dataset (e.g. 100 billion rows)?
| try: | ||
| failure_cases = failure_cases.select(key) | ||
| except Exception: | ||
| pass |
There was a problem hiding this comment.
What do we except Exception: pass? Why would we get an error, and why would we swallow it if we do?
There was a problem hiding this comment.
We need to rethink the test strategy, as well as how it's going to work in CI. Right now, there are a lot of tests that we've added to catch regressions, certain edge cases, etc., but there's no cohesive test strategy, and we need to change that in order to have an acceptable PR.
The goal isn't to throw away test coverage, but rather to organize and test in a structured, understandable manner. We can talk more about how to achieve this.
21675d0 to
457fc61
Compare
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Adds NarwhalsSchemaBackend, ColumnBackend, and DataFrameSchemaBackend with lazy-first materialization and drop_invalid_rows support via nw.Expr accumulation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
- Remove `_is_lazy_or_sql` from base.py; add shared `_is_lazy` to api/narwhals/utils.py - Add `CheckData` protocol to api/base/types.py; add `.frame` property to PolarsData/IbisData - Add `DataFrameSchema.infer_columns()` so narwhals/container.py no longer reaches into pandera.api.polars.components (TYPE_CHECKING guard replaces the runtime import) - Unify builtin comparison checks: base/builtin_checks.py now implements them via narwhals; polars/builtin_checks.py is stripped of duplicated implementations - `_normalize_native_output` handles pl.Series/pl.DataFrame from native=True checks; bool scalar fallback uses pyarrow instead of polars - Hoist all inner imports (re, importlib, functools) to module level across narwhals backends; guard remaining polars imports with try/except for ibis-only environments - Capitalize "Narwhals" in all docstrings/comments (was: "narwhals"); clarify `native` param scope in Check docstring - Add CLEAN-01/CLEAN-02/infer_columns architecture regression tests to test_phase01_arch.py Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
- Expand `make_narwhals_frame` fixture from 2-way (polars, ibis) to 3-way (polars_eager, polars_lazy, ibis_table); test_checks.py now runs 108 tests across all three native frame types (TEST-02) - Add session-scoped autouse `_ensure_polars_backend_registered` / `_ensure_ibis_backend_registered` fixtures to tests/polars/conftest.py and new tests/ibis/conftest.py; prevents narwhals backend from shadowing library-native backends when both are installed (TEST-01) - Annotate intentionally type-specific tests in test_e2e.py and test_lazy_regressions.py with TEST-02 comments - Add `unit-tests-narwhals` GitHub Actions job: runs tests/backends/narwhals/ with narwhals + polars + ibis co-installed on Python 3.11 and 3.12 (TEST-03) - Wire narwhals nox session: adds polars+ibis to its install set and routes test_dir to backends/narwhals/ Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Mark tests in tests/polars/ and tests/ibis/ that fail when the narwhals backend is installed. Uses pytest.mark.xfail(condition=narwhals_installed, strict=True) for known definite failures, and imperative pytest.xfail() calls for class-based tests and selective parametrize cases. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
- Add unit-tests-narwhals-backend CI job running tests/polars/ and tests/ibis/ with narwhals co-installed across Python 3.10–3.13 × polars/ibis - Add tests_narwhals_backend nox session (TEST-03) - Expand unit-tests-narwhals Python matrix to 3.10–3.13 - Remove backend isolation fixtures from polars/ibis conftests (TEST-01 arch fix) - Trim TEST-01/TEST-03 comment noise in arch test and CI yml Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
41e1657 to
e4d0973
Compare
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
- Add type: ignore[call-arg] for native= kwarg in BaseCheck.from_builtin_check_name - Assert isinstance(pl.DataFrame) after pl.from_arrow() to narrow union type - Declare ColumnBackend: type | None before try/except guard in test - Add type: ignore suppression for try/except re-imports in polars/ibis register Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
1.26.0 introduced `narwhals.exceptions.ComputeError`. While Ibis support is really only function since 1.40.0, this seems to be the minimum bound for most Polars backends tests to pass (i.e. not error out immediately). Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Refs: bbd1440 Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Two tests added in edc0835 assume non-Narwhals ibis backend behavior: - test_failed_cases_index_for_column_check: row index is not preserved in the Narwhals backend's lazy/SQL failure_cases path. - test_failed_cases_index_for_dataframe_check: uses IbisData-style check functions which are incompatible with the Narwhals backend (same root cause as the existing xfail on test_dataframe_level_checks). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Add `coerce_dtype` parser step that handles row-wise dtypes (auto_coerce=True, e.g. PydanticModel) at the dataframe level by unwrapping to native and delegating to the dtype's own coerce/try_coerce method. Guard `collect_schema_components` so per-column components are not created for row-wise dtypes — doing so would incorrectly compare each column's native type against the row-wise dtype class. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
…nnotations Annotating with `Any` caused `singledispatch.register(typing.Any, ...)` to fail on Python 3.10, which rejects `typing.Any` as a dispatch type. Use the concrete narwhals types (`nw.Datetime`, `nw.Duration`, `nw.List`) for the argument annotations and `Self` for the return type. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
…ocation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Tip
I've used Claude to significantly squash the commits and remove GSD artifacts; however; I'm still working off of the branch that has all of those, now pushed to https://github.com/deepyaman/pandera/tree/feat/narwhals/create-backend-all-artifacts
Prologue
Most of the below description, as well as almost all of the PR, is AI-generated. However, I have been very closely guiding the process and reviewing each step of the way. I have still not deeply reviewed the code in it's entirety, which I plan to do.
I've also verified the functionality at a high level manually. With this change, you can run checks using the existing Polars and Ibis APIs, and they leverage the newly-added Narwhals backend-pretty good for a first pass!
I've currently started exploring adding support for the PySpark backend, but that could be a separate PR. What I think are necessary steps in this PR, before merging:
.planning/(I left them there for now, both as a backup and in case wanted to share the context)Turning it over to my agentic intern...
Warning
The below is outdated and needs updating for milestone v1.1.
Narwhals backend for Polars and Ibis (v1.0)
This PR introduces a unified Narwhals-backed validation engine that replaces library-specific backends for Polars and Ibis with a single shared implementation. Users continue to pass native frames — pandera routes them through Narwhals internally.
Scope: 14 new files, ~2,950 lines of new production code and tests across 5 implementation phases.
What was built
New packages and modules
pandera/api/narwhals/types.pyNarwhalsDataNamedTuple (frame,key) — the data container passed to builtin checkspandera/api/narwhals/utils.py_to_native()helper to unwrap narwhals wrapperspandera/engines/narwhals_engine.pypandera/backends/narwhals/checks.pyNarwhalsCheckBackend— routes builtins throughNarwhalsData, user-defined checks through native frame unwrapping or ibis delegationpandera/backends/narwhals/builtin_checks.pynw.Expr:equal_to,not_equal_to,greater_than,greater_than_or_equal_to,less_than,less_than_or_equal_to,in_range,isin,notin,str_matches,str_contains,str_startswith,str_endswith,str_lengthpandera/backends/narwhals/components.pyColumnBackend— per-column validation (dtype check, nullable, unique, run_checks)pandera/backends/narwhals/container.pyDataFrameSchemaBackend— full container validation (parsers, checks, strict/ordered, lazy mode, drop_invalid_rows)pandera/backends/narwhals/base.pyNarwhalsSchemaBackend— shared helpers:run_check,subsample,failure_cases_metadata,drop_invalid_rows,is_float_dtypeModified files
pandera/backends/polars/register.pyUserWarning.pandera/backends/ibis/register.py@lru_cache(was missing). RegistersNarwhalsCheckBackendforibis.Table/ibis.Column/nw.LazyFrame.Test suite
tests/backends/narwhals/— backend-agnostic, parameterized against both Polars and Ibis:conftest.py—make_narwhals_framefixture producingnw.LazyFramefrom eitherpl.LazyFrameoribis.memtable;autousefixture that calls both register functionstest_checks.py— 14 builtin checks × 2 backends (valid + invalid data paths)test_components.py— column-level dtype, nullable, unique, and check validationtest_container.py— full container validation: strict, ordered, lazy mode, failure cases, drop_invalid_rowstest_parity.py— behavioral parity between Polars and Ibis paths: valid, invalid, lazy, strict, filter, decorator, DataFrameModeltest_narwhals_dtypes.py— dtype engine registration and coerce/try_coerceArchitecture decisions
Narwhals is internal plumbing, not a user-facing API. Users pass
pl.DataFrame,pl.LazyFrame, oribis.Table— no changes to call sites.Auto-detection over configuration.
register_polars_backends()andregister_ibis_backends()check for narwhals viatry/importand swap backends transparently. No config flag needed.SQL-lazy
element_wiseraisesNotImplementedError. Row-level Python functions cannot be applied to lazy query plans (Ibis, DuckDB, PySpark). The error is surfaced as aSchemaErrorwithCHECK_ERRORreason code. NOTE(@deepyaman): See narwhals-dev/narwhals#3512Ibis
drop_invalid_rowsdelegates toIbisSchemaBackend. Narwhals has no positional-join /row_numberabstraction for SQL-lazy backends. NOTE(@deepyaman): I'll verify this later.How to verify
Install with narwhals:
Polars — object API
Polars — class-based model
Polars — lazy mode (collect all errors before raising)
Ibis — same schema, different backend
Ibis — user-defined check (delegates to IbisCheckBackend)
Known gaps and next steps
coercefor Ibisxfail(strict=True)— intentionally deferred; will break CI when implemented so the mark gets cleaned upadd_missing_columnsparserset_defaultfor Column fieldsgroup_by().agg()pattern designed; not implementednarwhals stable.v2migrationsample=subsamplingNotImplementedError; onlyhead=/tail=are supported