Conversation

@Yicong-Huang
Contributor

What changes were proposed in this pull request?

This PR extends the PyArrow array cast tests to cover both safe=True and safe=False behaviors in each test case.

Changes:

  1. Modified _run_cast_tests method: Now tests both safe=True and safe=False modes for each test case
  2. Extended test case format: Supports a 3-tuple format (src_arr, expected_safe_true, expected_safe_false), where expected_safe_false=None means the same behavior as safe=True (see the sketch after this list)
  3. Updated 200+ test cases: All overflow/truncation cases now explicitly specify the safe=False behavior (wrapping, truncation, or saturation)
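
For illustration, here is a minimal sketch of the extended format and the dual-mode runner. The plain-assert style and the convention that expected_safe_true=None encodes "safe=True raises ArrowInvalid" are assumptions for this sketch, not necessarily the PR's exact code:

```python
import pyarrow as pa

# Hedged sketch of the 3-tuple format: (src_arr, expected_safe_true, expected_safe_false).
# Assumptions: expected_safe_true=None means "safe=True raises ArrowInvalid",
# and expected_safe_false=None means "same behavior as safe=True".
def run_cast_cases(cases, target_type):
    for src_arr, expected_safe_true, expected_safe_false in cases:
        if expected_safe_true is None:
            # safe=True rejects lossy casts (overflow/truncation)
            try:
                src_arr.cast(target_type, safe=True)
                raise AssertionError("expected safe=True cast to raise")
            except pa.ArrowInvalid:
                pass
        else:
            assert src_arr.cast(target_type, safe=True).equals(expected_safe_true)

        # safe=False does not raise here; it wraps/truncates/saturates instead
        expected = expected_safe_false if expected_safe_false is not None else expected_safe_true
        assert src_arr.cast(target_type, safe=False).equals(expected)

run_cast_cases(
    [
        # -1 does not fit in uint8: safe=True raises, safe=False wraps to 255
        (pa.array([-1], type=pa.int32()), None, pa.array([255], type=pa.uint8())),
        # 1 fits either way, so safe=False behaves the same as safe=True
        (pa.array([1], type=pa.int32()), pa.array([1], type=pa.uint8()), None),
    ],
    pa.uint8(),
)
```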

Categories of safe=False behavior covered:

  • Integer overflow: Negative-to-unsigned values wrap (e.g., -1 → 255 for uint8)
  • Integer narrowing: Value wraps on overflow (e.g., 128 as int16 → -128 as int8)
  • Float-to-integer: Truncation and saturation for out-of-range values
  • Timestamp/Duration precision loss: Truncation when casting to coarser units
  • Decimal overflow: Wrapping behavior for values exceeding integer limits
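
These categories follow PyArrow's own unsafe-cast semantics. A few illustrative one-liners (expected results shown in comments; exact behavior can vary by PyArrow version):

```python
import pyarrow as pa

# Integer narrowing: 128 does not fit in int8, so the value wraps
print(pa.array([128], type=pa.int16()).cast(pa.int8(), safe=False).to_pylist())     # [-128]

# Float-to-integer: the fractional part is truncated instead of raising
print(pa.array([3.7], type=pa.float64()).cast(pa.int32(), safe=False).to_pylist())  # [3]

# Timestamp precision loss: microseconds are dropped when casting to seconds
ts_us = pa.array([1_000_001], type=pa.timestamp("us"))
print(ts_us.cast(pa.timestamp("s"), safe=False).to_pylist())  # [datetime.datetime(1970, 1, 1, 0, 0, 1)]
```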

Why are the changes needed?

Part of SPARK-54936. The existing tests only covered safe=True behavior. This PR adds comprehensive coverage for safe=False to ensure PySpark correctly handles PyArrow's overflow/truncation semantics, which is important for data processing pipelines that need to handle edge cases gracefully.

Does this PR introduce any user-facing change?

No

How was this patch tested?

All 39 tests in PyArrowNumericalCastTests pass with the updated test framework.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions

JIRA Issue Information

=== Sub-task SPARK-54943 ===
Summary: Add tests for pa.Array.cast with overflow
Assignee: None
Status: Open
Affected: ["4.2.0"]


This comment was automatically generated by GitHub Actions

@zhengruifeng
Contributor

@Yicong-Huang @fangchenli I am wondering whether we should use golden files instead.
If we use golden files, we don't need to touch the test files much; we would just need to regenerate or compare a new file under the new conditions (safe=False in this PR).

@Yicong-Huang
Contributor Author

Yicong-Huang commented Jan 28, 2026

@Yicong-Huang @fangchenli I am wondering whether we should use golden files instead. If we use golden files, we don't need to touch the test files much; we would just need to regenerate or compare a new file under the new conditions (safe=False in this PR).

Hmm, I am really not a fan of golden files, especially for this kind of type-related test. To compare in a text-based format, all Python objects need to be serialized in some way (e.g., str(object) or repr(object)), which could cause edge cases to be missed. For example, a decimal128 and a decimal256 may have different digits in them but could be serialized to the same string (I am making this up as an example). We would need to make sure we don't introduce potential problems like this. This particular test for pa.Array.cast may be sensitive to an extra serialization step.

Maybe we can try the golden file approach in future/other tests?
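
As a hedged illustration of the serialization concern above (the decimal case was explicitly made up, but the general point is that a plain str()-based golden file can hide type information):

```python
from decimal import Decimal
import pyarrow as pa

a128 = pa.array([Decimal("1.23")], type=pa.decimal128(5, 2))
a256 = pa.array([Decimal("1.23")], type=pa.decimal256(5, 2))

print(a128.type, a256.type)    # decimal128(5, 2) decimal256(5, 2) -- different Arrow types
print(str(a128) == str(a256))  # True -- identical text, so a str()-based golden file
                               # would not notice if one type silently became the other
```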

@zhengruifeng
Contributor

@Yicong-Huang we can control the string representation and include the information of interest in it.

e.g., in the UDF type coercion test, we store both the value and the type as f"{value}_{type(_)}"
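
A minimal sketch of that idea (the helper name and exact format here are assumptions, not the actual UDF type coercion test code):

```python
import pyarrow as pa

# Hypothetical helper: serialize each element together with the array's Arrow type,
# so a golden file records both the value and the type.
def golden_repr(arr):
    return [f"{scalar.as_py()}_{arr.type}" for scalar in arr]

wrapped = pa.array([128], type=pa.int16()).cast(pa.int8(), safe=False)
print(golden_repr(wrapped))  # ['-128_int8']
```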

@zhengruifeng
Contributor

thanks, merged to master
