Skip to content

[WIP] feat(narwhals): implement the new unifying backend#2223

Draft
deepyaman wants to merge 20 commits intounionai-oss:mainfrom
deepyaman:feat/narwhals/create-backend
Draft

[WIP] feat(narwhals): implement the new unifying backend#2223
deepyaman wants to merge 20 commits intounionai-oss:mainfrom
deepyaman:feat/narwhals/create-backend

Conversation

@deepyaman
Copy link
Copy Markdown
Collaborator

@deepyaman deepyaman commented Mar 10, 2026

Tip

I've used Claude to significantly squash the commits and remove GSD artifacts; however; I'm still working off of the branch that has all of those, now pushed to https://github.com/deepyaman/pandera/tree/feat/narwhals/create-backend-all-artifacts

Prologue

Most of the below description, as well as almost all of the PR, is AI-generated. However, I have been very closely guiding the process and reviewing each step of the way. I have still not deeply reviewed the code in it's entirety, which I plan to do.

I've also verified the functionality at a high level manually. With this change, you can run checks using the existing Polars and Ibis APIs, and they leverage the newly-added Narwhals backend-pretty good for a first pass!

I've currently started exploring adding support for the PySpark backend, but that could be a separate PR. What I think are necessary steps in this PR, before merging:

  • Proper review of the generated code (of course)
  • Removal of artifacts from .planning/ (I left them there for now, both as a backup and in case wanted to share the context)
  • Adding the CI workflow, that will test using both the existing backend and the Narwhals backend for Polars and Ibis (probably the biggest remaining piece)

Turning it over to my agentic intern...

Warning

The below is outdated and needs updating for milestone v1.1.

Narwhals backend for Polars and Ibis (v1.0)

This PR introduces a unified Narwhals-backed validation engine that replaces library-specific backends for Polars and Ibis with a single shared implementation. Users continue to pass native frames — pandera routes them through Narwhals internally.

Scope: 14 new files, ~2,950 lines of new production code and tests across 5 implementation phases.


What was built

New packages and modules

File Description
pandera/api/narwhals/types.py NarwhalsData NamedTuple (frame, key) — the data container passed to builtin checks
pandera/api/narwhals/utils.py _to_native() helper to unwrap narwhals wrappers
pandera/engines/narwhals_engine.py Narwhals dtype engine with 18 registered types (Int8–UInt64, Float32/64, String, Bool, Date, DateTime, Duration, Categorical, List, Struct)
pandera/backends/narwhals/checks.py NarwhalsCheckBackend — routes builtins through NarwhalsData, user-defined checks through native frame unwrapping or ibis delegation
pandera/backends/narwhals/builtin_checks.py 14 builtin checks implemented against nw.Expr: equal_to, not_equal_to, greater_than, greater_than_or_equal_to, less_than, less_than_or_equal_to, in_range, isin, notin, str_matches, str_contains, str_startswith, str_endswith, str_length
pandera/backends/narwhals/components.py ColumnBackend — per-column validation (dtype check, nullable, unique, run_checks)
pandera/backends/narwhals/container.py DataFrameSchemaBackend — full container validation (parsers, checks, strict/ordered, lazy mode, drop_invalid_rows)
pandera/backends/narwhals/base.py NarwhalsSchemaBackend — shared helpers: run_check, subsample, failure_cases_metadata, drop_invalid_rows, is_float_dtype

Modified files

File Change
pandera/backends/polars/register.py Auto-detects narwhals at registration time; swaps all Polars backends for narwhals equivalents when narwhals is installed. Falls back to native Polars backends otherwise. Emits UserWarning.
pandera/backends/ibis/register.py Same auto-detection pattern for Ibis. Adds @lru_cache (was missing). Registers NarwhalsCheckBackend for ibis.Table / ibis.Column / nw.LazyFrame.

Test suite

tests/backends/narwhals/ — backend-agnostic, parameterized against both Polars and Ibis:

  • conftest.pymake_narwhals_frame fixture producing nw.LazyFrame from either pl.LazyFrame or ibis.memtable; autouse fixture that calls both register functions
  • test_checks.py — 14 builtin checks × 2 backends (valid + invalid data paths)
  • test_components.py — column-level dtype, nullable, unique, and check validation
  • test_container.py — full container validation: strict, ordered, lazy mode, failure cases, drop_invalid_rows
  • test_parity.py — behavioral parity between Polars and Ibis paths: valid, invalid, lazy, strict, filter, decorator, DataFrameModel
  • test_narwhals_dtypes.py — dtype engine registration and coerce/try_coerce

Architecture decisions

Narwhals is internal plumbing, not a user-facing API. Users pass pl.DataFrame, pl.LazyFrame, or ibis.Table — no changes to call sites.

Auto-detection over configuration. register_polars_backends() and register_ibis_backends() check for narwhals via try/import and swap backends transparently. No config flag needed.

SQL-lazy element_wise raises NotImplementedError. Row-level Python functions cannot be applied to lazy query plans (Ibis, DuckDB, PySpark). The error is surfaced as a SchemaError with CHECK_ERROR reason code. NOTE(@deepyaman): See narwhals-dev/narwhals#3512

Ibis drop_invalid_rows delegates to IbisSchemaBackend. Narwhals has no positional-join / row_number abstraction for SQL-lazy backends. NOTE(@deepyaman): I'll verify this later.


How to verify

Install with narwhals:

pip install pandera[<dataframe API you're using>,narwhals]

Polars — object API

import polars as pl
import pandera.polars as pa

schema = pa.DataFrameSchema({
    "name": pa.Column(pl.String, pa.Check.str_startswith("A")),
    "age":  pa.Column(pl.Int64,  pa.Check.greater_than(0)),
})

df = pl.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})
schema.validate(df)  # raises SchemaError: "Bob" fails str_startswith("A")

Polars — class-based model

import polars as pl
import pandera.polars as pa
from pandera.typing.polars import DataFrame, Series

class UserSchema(pa.DataFrameModel):
    name: Series[str]
    age:  Series[int]

    @pa.check("age")
    def age_positive(cls, series):
        return series > 0

UserSchema.validate(pl.DataFrame({"name": ["Alice"], "age": [30]}))

Polars — lazy mode (collect all errors before raising)

import polars as pl
import pandera.polars as pa
from pandera.errors import SchemaErrors

schema = pa.DataFrameSchema({
    "a": pa.Column(pl.Int64, pa.Check.greater_than(10)),
    "b": pa.Column(pl.Int64, pa.Check.greater_than(10)),
})

try:
    schema.validate(pl.DataFrame({"a": [1, 2], "b": [3, 4]}), lazy=True)
except SchemaErrors as e:
    print(len(e.schema_errors))  # 2 — both columns collected

Ibis — same schema, different backend

import pandas as pd
import ibis
import pandera.ibis as pa
import ibis.expr.datatypes as dt

schema = pa.DataFrameSchema({
    "a": pa.Column(dt.int64, pa.Check.greater_than(0)),
})

t = ibis.memtable(pd.DataFrame({"a": [1, 2, 3]}))
result = schema.validate(t)
result.execute()

Ibis — user-defined check (delegates to IbisCheckBackend)

from pandera.api.ibis.types import IbisData
import ibis.selectors as s
from ibis import _ as D

def check_positive(data: IbisData):
    return data.table.select(s.across(s.all(), D > 0))

schema = pa.DataFrameSchema({"a": pa.Column(dt.int64, pa.Check(check_positive))})
schema.validate(ibis.memtable(pd.DataFrame({"a": [1, 2, 3]})))

Known gaps and next steps

Gap Notes
coerce for Ibis xfail(strict=True) — intentionally deferred; will break CI when implemented so the mark gets cleaned up
pandas via narwhals backend Not yet wired; pandas currently still uses its own backend
PySpark via narwhals backend Feasibility TBD
add_missing_columns parser Deferred to next milestone
set_default for Column fields Deferred to next milestone
Groupby-based checks group_by().agg() pattern designed; not implemented
Schema IO (YAML/JSON) Deferred
Hypothesis strategies Deferred
narwhals stable.v2 migration Monitor narwhals releases; migrate when stable
sample= subsampling Raises NotImplementedError; only head=/tail= are supported

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 10, 2026

Codecov Report

❌ Patch coverage is 92.26580% with 71 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.33%. Comparing base (98fceeb) to head (765261f).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
pandera/backends/narwhals/base.py 86.44% 24 Missing ⚠️
pandera/backends/narwhals/container.py 90.00% 21 Missing ⚠️
pandera/engines/narwhals_engine.py 92.85% 11 Missing ⚠️
pandera/backends/narwhals/components.py 95.04% 6 Missing ⚠️
pandera/backends/narwhals/checks.py 94.94% 5 Missing ⚠️
pandera/api/narwhals/utils.py 88.23% 2 Missing ⚠️
pandera/api/dataframe/container.py 85.71% 1 Missing ⚠️
pandera/backends/narwhals/builtin_checks.py 98.41% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2223      +/-   ##
==========================================
+ Coverage   82.74%   83.33%   +0.58%     
==========================================
  Files         180      189       +9     
  Lines       15089    15984     +895     
==========================================
+ Hits        12486    13320     +834     
- Misses       2603     2664      +61     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@cosmicBboy
Copy link
Copy Markdown
Collaborator

hey @deepyaman let me know if you need any help here

@deepyaman
Copy link
Copy Markdown
Collaborator Author

hey @deepyaman let me know if you need any help here

Hey @cosmicBboy, updated with a rough description of where things are at/what's necessary before moving forward with this.

Copy link
Copy Markdown
Collaborator

@cosmicBboy cosmicBboy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @deepyaman how are you thinking about the .planning directory? should those be checked in for posterity (and future agent reference)?

I also made a dev/narwhals branch that we can use to merge this PR... can you do an interactive rebase to squash some of the commits into smaller chunks (so as to not lose granularity of commits)?

@deepyaman
Copy link
Copy Markdown
Collaborator Author

hey @deepyaman how are you thinking about the .planning directory? should those be checked in for posterity (and future agent reference)?

I don't intend to check them in, but I thought it could be useful to share for now. From my list of pre-merge TODOs above:

Removal of artifacts from .planning/ (I left them there for now, both as a backup and in case wanted to share the context)

I don't know what you think, but I don't think it makes sense to leave project-level artifacts just for agents, even if want to support them? TBH I'm not super familiar on this.

I also made a dev/narwhals branch that we can use to merge this PR... can you do an interactive rebase to squash some of the commits into smaller chunks (so as to not lose granularity of commits)?

Sure, will look into doing this.

Copy link
Copy Markdown
Collaborator Author

@deepyaman deepyaman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seem to be a number of potential issues, most of which fall into two categories: executing too eagerly and backend-specific logic.

Comment thread pandera/api/base/error_handler.py Outdated
# Import is guarded so ibis remains an optional dependency.
try:
import ibis as _ibis
if isinstance(failure_cases, _ibis.Table):
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if isinstance(failure_cases, _ibis.Table):
import ibis
if isinstance(failure_cases, ibis.Table):

Why alias, why not just import ibis?

Comment thread pandera/api/base/error_handler.py Outdated
if isinstance(failure_cases, str): # Avoid returning str length
return 1

# ibis.Table raises ExpressionError for len(); use .count().execute() instead.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems wrong to add an Ibis-specific branch to the base ErrorHandler; isinstance(failure_cases, ibis.table) made a lot more sense in the Ibis ErrorHandler, and this doesn't seem like the right way to handle it for Narwhals. Maybe the Narwhals backend needs it's own ErrorHandler, and that can dispatch based on type—or, much better, just use the Narwahls way of counting, not sure why this wouldn't work...

Comment thread pandera/backends/ibis/register.py Outdated
Comment on lines +15 to +17
Auto-detects narwhals: if narwhals is installed, registers narwhals backends
(NarwhalsCheckBackend, narwhals ColumnBackend, narwhals DataFrameSchemaBackend)
and emits a UserWarning. If narwhals is not installed, registers the native
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Don't know why we're not capitalizing Narwhals.

Suggested change
Auto-detects narwhals: if narwhals is installed, registers narwhals backends
(NarwhalsCheckBackend, narwhals ColumnBackend, narwhals DataFrameSchemaBackend)
and emits a UserWarning. If narwhals is not installed, registers the native
Auto-detects Narwhals: if Narwhals is installed, registers Narwhals backends
(NarwhalsCheckBackend, Narwhals ColumnBackend, Narwhals DataFrameSchemaBackend)
and emits a UserWarning. If Narwhals is not installed, registers the native

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this resolved? I thought we were resolving it, but I'm not sure I see the changes.

Comment thread pandera/backends/narwhals/base.py Outdated
import narwhals.stable.v1 as nw
import polars as pl

from pandera.api.base.error_handler import ErrorHandler
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, seems like we should be importing the Narwhals ErrorHandler here, rather than modifying the base one.

Comment thread pandera/backends/narwhals/checks.py Outdated
)

@staticmethod
def _materialize(frame) -> nw.DataFrame:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is eagerly executing? I think executing early for the check output is not ideal, but still reasonable. However, this is also getting called elsewhere above. Furthermore, the conditional logic seems overly complicated—not sure why this is needed.

Comment thread pandera/backends/narwhals/container.py Outdated
Comment on lines +39 to +41
if issubclass(return_type, pl.DataFrame):
return native.collect()
return native
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for? Type-specific .collect() call seems like a red flag. Once again, Ibis and Polars are following different paths, and that can't be right.

Comment thread pandera/backends/narwhals/container.py Outdated
components = self.collect_schema_components(
check_lf, schema, column_info
)
check_obj_parsed = _to_frame_kind_nw(check_lf, return_type)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the object potentially getting collected here? It seems like, if the user passes a pl.DataFrame, we .collect()—for what reason?

Comment thread pandera/backends/narwhals/container.py Outdated

check_results = []
check_passed = []
# Convert to native pl.LazyFrame for column component dispatch.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily pl.LazyFrame, right? What if it's an Ibis table?

Comment thread pandera/backends/narwhals/container.py Outdated
):
"""Collects all schema components to use for validation."""

from pandera.api.polars.components import Column
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is something from the Polars backend being used here? This makes no sense.

Comment thread pandera/backends/narwhals/container.py Outdated
column_info: Any,
) -> list[CoreCheckResult]:
"""Check that all columns in the schema are present in the dataframe."""
from pandera.api.narwhals.utils import _to_native
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is to_native necessary here? Why isn't this being handled in a backend-agnostic way?

@deepyaman deepyaman force-pushed the feat/narwhals/create-backend branch from 2a02d7e to 44967b2 Compare March 25, 2026 20:29
@deepyaman deepyaman changed the base branch from main to dev/narwhals March 25, 2026 20:30
Copy link
Copy Markdown
Collaborator Author

@deepyaman deepyaman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken another pass at reviewing this. There are a few major themes in the feedback, alongside the other miscellaneous comments:

  1. Let's unify the strategy to handle native-type-dependent logic in the final stages (e.g. failure case reporting). Right now, there are a lot of complex if/elif/else logic blocks, based on isinstance() and hasattr() checks to determine the native type and what to do in each situation. Maybe these can be fixed by some combination of utility functions, constants (sort of like https://github.com/narwhals-dev/narwhals/blob/228fee8d83d92d06e6cb32646d0e131acf0c1e2e/narwhals/_typing.py#L28), and dispatching. Again, the native-type-dependent logic should not expand—it must stay just at the final steps of returning results back to the user—but we can have a better stragey around how we do it.
  2. There are a number of cases where we don't use top-level imports, where either we clearly should (standard library imports), or it seems like it may be a red flag if it must be an inner import.
  3. We need a cohesive testing strategy, and, in order to get a mergeable PR, we need a CI pipeline that works for existing backends, without using the Narwhals backend, as well as testing Ibis, Polars LazyFrame, Polars DataFrame (I think we need to test Polars DataFrame, right?) using the Narwhals backend. What this probably means is, we need to see how much of the existing Ibis backend tests work when Narwhals is installed now (and investigate failures, potentially xfail things that aren't super critical), how much of the Polars backend tests work when Narwhals is installed now (and investigate failures, potentially xfail things that aren't super critical), as well as add the "future" state of how we want to test in a unified way (the Narwhals backends tests should parametrize across the types we support, and we should make sure the tests pass, and this "future" state will help us towards having more-or-less a single, more maintainable test suite at some point in the future. Happy to discuss what this looks like.

Comment thread pandera/api/base/checks.py Outdated
return cls(
cls.get_builtin_check_fn(name),
statistics=statistics,
native=False,
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will this affect existing backend implementations? We should take care not to break them.

It's seems like it may be fine, because we can assume all existing checks were native (since Narwhals wasn't introduced), but let's confirm things work when Narwhals isn't installed/we're using the existing check backends.

Comment thread pandera/api/checks.py Outdated
Comment on lines +86 to +91
:param native: If True (default), the check function receives the raw
native frame and the column key as positional args:
``check_fn(native_frame, key)``. If False, the check function
receives a narwhals expression ``nw.col(key)`` (a ``nw.Expr``)
as its sole argument: ``check_fn(nw.col(key))``. Builtin checks
use ``native=False``.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Native" only has a meaning for Narwhals. At the very least, it seems like the docstring should be clear that this only applies when using the Narwhals backend; otherwise, all the checks are native, since the concept of non-native (i.e. Narwhals) checks doesn't exist.

Comment thread pandera/backends/ibis/register.py Outdated
Comment on lines +15 to +17
Auto-detects narwhals: if narwhals is installed, registers narwhals backends
(NarwhalsCheckBackend, narwhals ColumnBackend, narwhals DataFrameSchemaBackend)
and emits a UserWarning. If narwhals is not installed, registers the native
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this resolved? I thought we were resolving it, but I'm not sure I see the changes.

Comment thread pandera/backends/narwhals/base.py Outdated
return True
if isinstance(fc, nw.DataFrame):
native = nw.to_native(fc)
return hasattr(native, "execute") # ibis.Table has .execute(); polars DataFrame does not
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems very finicky. Is there a better way to do this, or reference something in Narwhals? I see something like https://github.com/narwhals-dev/narwhals/blob/228fee8d83d92d06e6cb32646d0e131acf0c1e2e/narwhals/_typing.py#L29, but that's not quite what we're looking for...

Or, should we just hardcode for the types we support? it looks like the if isinstance(fc, nw.LazyFrame): return True is clear enough, but if it's a nw.DataFrame, maybe we actually just check if isinstance(ibis.Table): return True, etc. The Ibis import would need to be in a guard though, so as not to break for Polars or other backends.

Comment thread pandera/backends/narwhals/base.py Outdated
# Guard: SQL-lazy backends don't support tail without full ordering
if tail is not None:
native = nw.to_native(check_obj)
if hasattr(native, "execute"): # ibis.Table has .execute(); pl.LazyFrame does not
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this use something like https://github.com/narwhals-dev/narwhals/blob/228fee8d83d92d06e6cb32646d0e131acf0c1e2e/narwhals/_typing.py#L28? Again, it seems very finicky to be going off of looking for the "execute" attribute, since that may just apply to Ibis backend.

Comment thread pandera/engines/narwhals_engine.py Outdated
Comment on lines +55 to +56
from pandera.api.narwhals.types import NarwhalsData
from pandera.api.narwhals.utils import _to_native
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, can these not be top-level imports?

Comment thread pandera/engines/narwhals_engine.py Outdated
key = data_container.key
_key = "" if key == "*" else f"'{key}' in"
# Produce native failure_cases: collect original frame as native
failure_cases = _to_native(data_container.frame.collect())
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collect original frame as native? What if it's a big dataset, why are we trying to collect the whole thing?

Comment thread pandera/engines/narwhals_engine.py Outdated

try:
lf = self.coerce(data_container)
lf.collect()
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we collecting everything? What if it's a big dataset (e.g. 100 billion rows)?

Comment on lines +71 to +74
try:
failure_cases = failure_cases.select(key)
except Exception:
pass
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we except Exception: pass? Why would we get an error, and why would we swallow it if we do?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to rethink the test strategy, as well as how it's going to work in CI. Right now, there are a lot of tests that we've added to catch regressions, certain edge cases, etc., but there's no cohesive test strategy, and we need to change that in order to have an acceptable PR.

The goal isn't to throw away test coverage, but rather to organize and test in a structured, understandable manner. We can talk more about how to achieve this.

@deepyaman deepyaman force-pushed the feat/narwhals/create-backend branch from 21675d0 to 457fc61 Compare April 11, 2026 18:33
@deepyaman deepyaman changed the base branch from dev/narwhals to main April 11, 2026 18:35
deepyaman and others added 12 commits April 11, 2026 23:04
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Adds NarwhalsSchemaBackend, ColumnBackend, and DataFrameSchemaBackend
with lazy-first materialization and drop_invalid_rows support via
nw.Expr accumulation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
- Remove `_is_lazy_or_sql` from base.py; add shared `_is_lazy` to api/narwhals/utils.py
- Add `CheckData` protocol to api/base/types.py; add `.frame` property to PolarsData/IbisData
- Add `DataFrameSchema.infer_columns()` so narwhals/container.py no longer reaches into
  pandera.api.polars.components (TYPE_CHECKING guard replaces the runtime import)
- Unify builtin comparison checks: base/builtin_checks.py now implements them via
  narwhals; polars/builtin_checks.py is stripped of duplicated implementations
- `_normalize_native_output` handles pl.Series/pl.DataFrame from native=True checks;
  bool scalar fallback uses pyarrow instead of polars
- Hoist all inner imports (re, importlib, functools) to module level across narwhals
  backends; guard remaining polars imports with try/except for ibis-only environments
- Capitalize "Narwhals" in all docstrings/comments (was: "narwhals"); clarify `native`
  param scope in Check docstring
- Add CLEAN-01/CLEAN-02/infer_columns architecture regression tests to test_phase01_arch.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
- Expand `make_narwhals_frame` fixture from 2-way (polars, ibis) to 3-way
  (polars_eager, polars_lazy, ibis_table); test_checks.py now runs 108 tests
  across all three native frame types (TEST-02)
- Add session-scoped autouse `_ensure_polars_backend_registered` /
  `_ensure_ibis_backend_registered` fixtures to tests/polars/conftest.py and
  new tests/ibis/conftest.py; prevents narwhals backend from shadowing
  library-native backends when both are installed (TEST-01)
- Annotate intentionally type-specific tests in test_e2e.py and
  test_lazy_regressions.py with TEST-02 comments
- Add `unit-tests-narwhals` GitHub Actions job: runs tests/backends/narwhals/
  with narwhals + polars + ibis co-installed on Python 3.11 and 3.12 (TEST-03)
- Wire narwhals nox session: adds polars+ibis to its install set and routes
  test_dir to backends/narwhals/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Mark tests in tests/polars/ and tests/ibis/ that fail when the narwhals
backend is installed. Uses pytest.mark.xfail(condition=narwhals_installed,
strict=True) for known definite failures, and imperative pytest.xfail()
calls for class-based tests and selective parametrize cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
- Add unit-tests-narwhals-backend CI job running tests/polars/ and tests/ibis/
  with narwhals co-installed across Python 3.10–3.13 × polars/ibis
- Add tests_narwhals_backend nox session (TEST-03)
- Expand unit-tests-narwhals Python matrix to 3.10–3.13
- Remove backend isolation fixtures from polars/ibis conftests (TEST-01 arch fix)
- Trim TEST-01/TEST-03 comment noise in arch test and CI yml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
@deepyaman deepyaman force-pushed the feat/narwhals/create-backend branch from 41e1657 to e4d0973 Compare April 12, 2026 05:13
deepyaman and others added 4 commits April 12, 2026 08:38
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
- Add type: ignore[call-arg] for native= kwarg in BaseCheck.from_builtin_check_name
- Assert isinstance(pl.DataFrame) after pl.from_arrow() to narrow union type
- Declare ColumnBackend: type | None before try/except guard in test
- Add type: ignore suppression for try/except re-imports in polars/ibis register

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
1.26.0 introduced `narwhals.exceptions.ComputeError`. While Ibis support
is really only function since 1.40.0, this seems to be the minimum bound
for most Polars backends tests to pass (i.e. not error out immediately).

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Refs: bbd1440

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
deepyaman and others added 4 commits April 13, 2026 00:15
Two tests added in edc0835 assume non-Narwhals ibis backend behavior:

- test_failed_cases_index_for_column_check: row index is not preserved
  in the Narwhals backend's lazy/SQL failure_cases path.
- test_failed_cases_index_for_dataframe_check: uses IbisData-style check
  functions which are incompatible with the Narwhals backend (same root
  cause as the existing xfail on test_dataframe_level_checks).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Add `coerce_dtype` parser step that handles row-wise dtypes (auto_coerce=True,
e.g. PydanticModel) at the dataframe level by unwrapping to native and
delegating to the dtype's own coerce/try_coerce method. Guard
`collect_schema_components` so per-column components are not created for
row-wise dtypes — doing so would incorrectly compare each column's native
type against the row-wise dtype class.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
…nnotations

Annotating with `Any` caused `singledispatch.register(typing.Any, ...)` to
fail on Python 3.10, which rejects `typing.Any` as a dispatch type. Use
the concrete narwhals types (`nw.Datetime`, `nw.Duration`, `nw.List`) for
the argument annotations and `Self` for the return type.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
…ocation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants