

@e-strauss
Contributor

@e-strauss e-strauss commented Dec 10, 2025

This commit implements direct transfer support for SciPy sparse matrices (CSR and COO formats) from Python to the Java runtime.

  • Added Java utility methods that convert SciPy Compressed Sparse Row (CSR) and Coordinate (COO) byte arrays directly into internal `MatrixBlock` representations.
  • Updated the Python `SystemDSContext` to support a `sparse_data_transfer` flag and extended `from_numpy` to accept `scipy.sparse` matrices.
  • Enhanced the Python converter logic to detect sparse matrices and either use the new optimized sparse transfer or fall back to dense conversion based on configuration.
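To make the CSR and COO byte arrays concrete, here is a minimal sketch of how a SciPy matrix can be split into its component arrays and serialized with `tobytes()`. The helper names (`csr_to_buffers`, `coo_to_buffers`, `buffers_to_csr`), the int32 index and float64 value dtypes, and the dictionary layout are assumptions for illustration; the exact format that the Java utilities (`convertSciPyCSRToMB`, `convertSciPyCOOToMB`) expect is not described in this thread.

```python
import numpy as np
import scipy.sparse as sp


def csr_to_buffers(m):
    """Hypothetical helper: serialize any matrix as CSR byte buffers.

    Assumes int32 indices and float64 values; this is not the confirmed
    layout used by the SystemDS Java side.
    """
    csr = sp.csr_matrix(m, copy=True)  # converts COO/dense input to CSR
    csr.sort_indices()                 # column indices ascending within each row
    return {
        "shape": csr.shape,
        "nnz": csr.nnz,
        "indptr": csr.indptr.astype(np.int32).tobytes(),    # rows + 1 offsets
        "indices": csr.indices.astype(np.int32).tobytes(),  # column of each value
        "data": csr.data.astype(np.float64).tobytes(),      # nonzero values
    }


def coo_to_buffers(m):
    """Hypothetical helper: serialize any matrix as COO byte buffers."""
    coo = sp.coo_matrix(m)
    return {
        "shape": coo.shape,
        "nnz": coo.nnz,
        "row": coo.row.astype(np.int32).tobytes(),
        "col": coo.col.astype(np.int32).tobytes(),
        "data": coo.data.astype(np.float64).tobytes(),
    }


def buffers_to_csr(buf):
    """Decode CSR buffers again, e.g. to check the round trip."""
    return sp.csr_matrix(
        (
            np.frombuffer(buf["data"], dtype=np.float64),
            np.frombuffer(buf["indices"], dtype=np.int32),
            np.frombuffer(buf["indptr"], dtype=np.int32),
        ),
        shape=buf["shape"],
    )


if __name__ == "__main__":
    dense = np.array([[0.0, 2.0, 0.0], [1.0, 0.0, 3.0]])
    buf = csr_to_buffers(sp.coo_matrix(dense))
    # prints "3 12": 3 nonzeros, (2 + 1) int32 row pointers = 12 bytes
    print(buf["nnz"], len(buf["indptr"]))
    assert np.array_equal(buffers_to_csr(buf).toarray(), dense)
```

Only the nonzero values and their indices are serialized, which is what lets the sparse path skip the dense conversion.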

Benchmark results: (image not available in this text version)

@github-project-automation github-project-automation bot moved this to In Progress in SystemDS PR Queue Dec 10, 2025
@e-strauss e-strauss changed the title [Draft] Transfer scipy compressed matrices transfer to java runtime [Draft] Transfer scipy compressed matrices to java runtime Dec 10, 2025
@e-strauss
Contributor Author

e-strauss commented Dec 10, 2025

@Baunsgaard @mboehm7 https://github.com/apache/systemds/blob/8de93a1f996348c260d48c1c29340d5a88005e6f/src/main/python/systemds/context/systemds_context.py#L772C4-L772C20

The name "from_numpy" is misleading now that we also support SciPy compressed matrices.
Shall we rename it to something like "from_py_matrix" or "from_array_like"?

We could also use something like "from_pydata", which merges the two separate methods from_numpy and from_pandas into a single method; internally, we could route each call to the corresponding handler based on the input's instance type.
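A minimal sketch of the proposed routing, assuming plain `isinstance` checks. `PyDataIngest`, `from_scipy_sparse`, and the tuple return values are hypothetical placeholders for the real handlers; only `scipy.sparse.issparse` and the NumPy/pandas types are real APIs, and pandas is treated as optional.

```python
import numpy as np
import scipy.sparse as sp

try:
    import pandas as pd
except ImportError:  # pandas is optional in this sketch
    pd = None


class PyDataIngest:
    """Hypothetical sketch: one entry point that routes on the input type."""

    # Placeholder handlers; real ones would build SystemDS matrices/frames.
    def from_numpy(self, arr):
        return ("numpy", arr.shape)

    def from_pandas(self, df):
        return ("pandas", df.shape)

    def from_scipy_sparse(self, mat):
        return ("scipy.sparse", mat.shape)

    def from_pydata(self, data):
        # SciPy sparse matrices (CSR, COO, ...) are not ndarrays,
        # so they need their own check.
        if sp.issparse(data):
            return self.from_scipy_sparse(data)
        if pd is not None and isinstance(data, pd.DataFrame):
            return self.from_pandas(data)
        if isinstance(data, np.ndarray):
            return self.from_numpy(data)
        raise TypeError(f"unsupported input type: {type(data).__name__}")
```

Keeping the type-specific methods as separate handlers means the old entry points can stay in place while the unified method only adds the dispatch.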

@codecov

codecov bot commented Dec 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.54%. Comparing base (b394e32) to head (cce93be).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2379      +/-   ##
============================================
+ Coverage     71.51%   71.54%   +0.03%     
- Complexity    47441    47466      +25     
============================================
  Files          1539     1539              
  Lines        182605   182631      +26     
  Branches      35916    35919       +3     
============================================
+ Hits         130585   130665      +80     
+ Misses        42028    41972      -56     
- Partials       9992     9994       +2     


@Baunsgaard
Contributor


This sounds good to me; unifying the API would be fine.
However:

  1. When doing it, do not break backwards compatibility.
  2. When the old methods are called, report a deprecation message to the user via the logging framework.
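A minimal sketch of both requirements, assuming the old methods simply delegate to a unified method. `Context`, `deprecated`, and the logger name `systemds_sketch` are hypothetical; `from_pydata` is the name proposed earlier in this thread, not a confirmed API.

```python
import functools
import logging

logger = logging.getLogger("systemds_sketch")  # hypothetical logger name


def deprecated(replacement):
    """Decorator: log a deprecation message, then run the original method."""
    def wrap(func):
        @functools.wraps(func)  # keep the original name for the message
        def inner(*args, **kwargs):
            logger.warning(
                "%s is deprecated; use %s instead.", func.__name__, replacement
            )
            return func(*args, **kwargs)
        return inner
    return wrap


class Context:
    def from_pydata(self, data):
        # Hypothetical unified entry point; type-based routing omitted here.
        return ("unified", data)

    @deprecated("from_pydata")
    def from_numpy(self, arr):
        # Old name keeps working: same arguments, same result.
        return self.from_pydata(arr)

    @deprecated("from_pydata")
    def from_pandas(self, df):
        return self.from_pydata(df)
```

Python's `warnings.warn(..., DeprecationWarning)` is the more common convention for this, but the sketch follows the request to report through the logging framework.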

@e-strauss
Contributor Author

Sure, I agree.

@e-strauss e-strauss force-pushed the compressed-matrix-transfer branch 2 times, most recently from 7a35b34 to bc56a18 January 29, 2026 18:55
This commit implements optimized data transfer for SciPy sparse matrices from Python to the Java runtime. Key changes include the addition of `convertSciPyCSRToMB` and `convertSciPyCOOToMB` in the Java utility layer, which directly handle the Compressed Sparse Row (CSR) and Coordinate (COO) formats. On the Python side, `SystemDSContext` now supports a `sparse_data_transfer` flag and a new `from_py` method that unifies data ingestion. With these updates, sparse data is transferred without first being converted to a dense array, so zero entries are never materialized. Additionally, several data conversion methods were refactored for maintainability.
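A minimal sketch of the Python-side decision described above, assuming CSR and COO are sent as-is and every other SciPy sparse format is converted to CSR first. `prepare_transfer` and its return convention are hypothetical; only the `sparse_data_transfer` flag name comes from this PR.

```python
import numpy as np
import scipy.sparse as sp


def prepare_transfer(data, sparse_data_transfer=True):
    """Hypothetical sketch: pick the transfer path for one input.

    Returns (kind, payload) where kind is "csr", "coo", or "dense".
    """
    if sp.issparse(data):
        if not sparse_data_transfer:
            # Fallback: materialize every entry, zeros included.
            return "dense", data.toarray()
        if data.format in ("csr", "coo"):
            # Formats with a direct Java-side conversion in this PR.
            return data.format, data
        # Assumption: other formats (CSC, DIA, LIL, ...) go through CSR.
        return "csr", data.tocsr()
    # Non-sparse input keeps the existing dense path.
    return "dense", np.asarray(data)
```

Assuming float64 values and int32 indices, a CSR payload takes about 12 * nnz + 4 * (rows + 1) bytes, compared with 8 * rows * cols bytes for the dense fallback, so the gap grows as the matrix gets sparser.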
@e-strauss e-strauss force-pushed the compressed-matrix-transfer branch from bc56a18 to cce93be January 29, 2026 19:16
@e-strauss e-strauss changed the title [Draft] Transfer scipy compressed matrices to java runtime [SYTEMDS-3902] Transfer scipy compressed matrices to java runtime Jan 29, 2026
@e-strauss e-strauss marked this pull request as ready for review January 29, 2026 19:21

Labels: None yet

Projects: SystemDS PR Queue (Status: In Progress)

2 participants