[SPARK-55031][SQL] Add vector avg/sum aggregation function expressions #54011
+2,497
−2
What changes were proposed in this pull request?
This PR adds support for vector aggregation functions to Spark SQL, enabling element-wise sum and average computations across groups of vectors.
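To make "element-wise sum and average across groups of vectors" concrete, here is a minimal plain-Python sketch of the intended semantics. It is not the Spark implementation: the helper name `group_vector_sum_avg` is hypothetical, and both skipping NULL vectors and raising on a dimension mismatch are assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

Vector = List[float]


def group_vector_sum_avg(
    rows: Iterable[Tuple[Hashable, Optional[Vector]]]
) -> Dict[Hashable, Tuple[Vector, Vector]]:
    """Return {key: (element-wise sum, element-wise average)} per group."""
    sums: Dict[Hashable, Vector] = {}
    counts: Dict[Hashable, int] = {}
    for key, vec in rows:
        if vec is None:
            continue  # assumption: NULL vectors do not contribute
        if key not in sums:
            sums[key] = list(vec)
            counts[key] = 1
            continue
        if len(vec) != len(sums[key]):
            # assumption: mixed dimensions within a group are an error
            raise ValueError("vector dimension mismatch within group")
        sums[key] = [a + b for a, b in zip(sums[key], vec)]
        counts[key] += 1
    return {k: (s, [x / counts[k] for x in s]) for k, s in sums.items()}


rows = [
    ("a", [1.0, 2.0]),
    ("a", [3.0, 4.0]),
    ("b", [10.0, 20.0]),
    ("a", None),
]
result = group_vector_sum_avg(rows)
# result["a"] == ([4.0, 6.0], [2.0, 3.0])
# result["b"] == ([10.0, 20.0], [10.0, 20.0])
```

Each output position depends only on the same position of the input vectors, which is what distinguishes these aggregates from the scalar `sum`/`avg`.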
Key implementation details:
This PR only includes SQL language support; DataFrame API will be added in a separate PR.
Why are the changes needed?
Vector aggregation functions are fundamental operations for:
These functions complement the vector distance/similarity functions and are commonly available in other systems (Snowflake's VECTOR_SUM/VECTOR_AVG, PostgreSQL pgvector's SUM/AVG over vectors).
Does this PR introduce any user-facing change?
Yes, this PR introduces two new SQL aggregate functions: vector_sum and vector_avg.
How was this patch tested?
SQL Golden File Tests: Added vector-agg.sql with test coverage for vector_sum and vector_avg.
Unit tests: Added VectorAggSuite.scala to test aggregate lifecycle phases:
- initialize(): Empty buffer returns null
- update(): Single/multiple vectors, NULL handling, special floats
- merge(): Two buffers, empty buffers, different counts (for weighted average)
- eval(): Result extraction from binary buffer
Was this patch authored or co-authored using generative AI tooling?
Yes, code assistance with Claude Opus 4.5 in combination with manual editing by the author.
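For reviewers, the aggregate lifecycle phases exercised by the unit tests (initialize, update, merge, eval) can be sketched in plain Python as follows. This is an illustrative model only: the PR stores its state in a binary buffer, whereas this sketch assumes a simple `(running_sum, count)` tuple, and all function names here are hypothetical rather than taken from the patch.

```python
from typing import List, Optional, Tuple

# Assumed buffer layout: None when empty, else (element-wise sum, row count).
Buffer = Optional[Tuple[List[float], int]]


def initialize() -> Buffer:
    """An empty buffer; evaluating it yields None (SQL NULL)."""
    return None


def update(buf: Buffer, vec: Optional[List[float]]) -> Buffer:
    """Fold one input vector into the buffer, skipping NULL inputs."""
    if vec is None:
        return buf
    if buf is None:
        return (list(vec), 1)
    total, count = buf
    return ([a + b for a, b in zip(total, vec)], count + 1)


def merge(left: Buffer, right: Buffer) -> Buffer:
    """Combine two partial buffers; an empty side leaves the other unchanged."""
    if left is None:
        return right
    if right is None:
        return left
    total = [a + b for a, b in zip(left[0], right[0])]
    return (total, left[1] + right[1])


def eval_sum(buf: Buffer) -> Optional[List[float]]:
    return None if buf is None else list(buf[0])


def eval_avg(buf: Buffer) -> Optional[List[float]]:
    if buf is None:
        return None
    total, count = buf
    return [x / count for x in total]


# Two partitions with different row counts.
p1 = update(update(initialize(), [1.0, 1.0]), [3.0, 3.0])  # count 2
p2 = update(initialize(), [8.0, 8.0])                      # count 1
merged = merge(p1, p2)
# eval_sum(merged) == [12.0, 12.0]
# eval_avg(merged) == [4.0, 4.0]
```

Carrying the count alongside the sum is what makes the merged average weighted: averaging the two partition averages directly would give [5.0, 5.0] instead of the correct [4.0, 4.0].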