[Feature] Implement cosine_similarity array function #60403

Copilot · 2026-01-31T08:25:48Z

Implements cosine_similarity array function following DuckDB semantics. The function computes the cosine similarity between two float arrays: dot(x, y) / (||x|| * ||y||).

Changes

Backend (C++)

Added CosineSimilarity class in function_array_distance.h
Implemented computation in function_array_distance.cpp using the existing FunctionArrayDistance template
Returns 0.0 for zero vectors (consistent with DuckDB)
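
For reviewers who want to sanity-check expected values, the formula and the zero-vector rule above can be sketched in a few lines of Python. This is an illustrative reference only, not the C++ code in function_array_distance.cpp. The name `cosine_similarity_ref` is hypothetical, and returning 0.0 for empty input is a property of this sketch, not a statement about the BE implementation.

```python
import math
from typing import Sequence


def cosine_similarity_ref(x: Sequence[float], y: Sequence[float]) -> float:
    """Reference sketch: dot(x, y) / (||x|| * ||y||).

    Assumption: returns 0.0 when either vector has zero norm, mirroring the
    zero-vector behavior described in this PR.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        # Avoid dividing by zero; the PR states zero vectors yield 0.0.
        return 0.0
    return dot / (norm_x * norm_y)
```

Comparing against such a reference with a small tolerance (for example `math.isclose`) is usually safer than exact equality, since the result depends on floating-point rounding.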

Frontend (Java)

Created CosineSimilarity.java scalar function class
Added visitor method in ScalarFunctionVisitor.java
Registered in BuiltinScalarFunctions.java

Tests

BE unit tests: identical/orthogonal/opposite vectors, zero vectors, empty arrays, known values
Regression tests: NULL handling, array size validation, table queries, relationship with cosine_distance

Usage

SELECT cosine_similarity([1, 2, 3], [1, 2, 3]);  -- 1.0 (identical)
SELECT cosine_similarity([1, 0], [0, 1]);        -- 0.0 (orthogonal)
SELECT cosine_similarity([1, 2, 3], [-1, -2, -3]); -- -1.0 (opposite)

-- Relationship: cosine_similarity + cosine_distance = 1.0
SELECT cosine_similarity([1, 2, 3], [3, 5, 7]) + cosine_distance([1, 2, 3], [3, 5, 7]); -- 1.0
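
The relationship above can be checked numerically outside the database. The sketch below assumes cosine_distance is defined as 1 - dot(x, y) / (||x|| * ||y||); that definition and both helper names are assumptions made for illustration, not taken from the Doris source.

```python
import math


def similarity_ref(x, y):
    # dot(x, y) / (||x|| * ||y||); assumes both vectors are non-zero
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norms


def distance_ref(x, y):
    # Assumption: cosine distance is 1 minus cosine similarity
    return 1.0 - similarity_ref(x, y)


x, y = [1.0, 2.0, 3.0], [3.0, 5.0, 7.0]
total = similarity_ref(x, y) + distance_ref(x, y)
print(math.isclose(total, 1.0))
```

For these inputs the similarity is roughly 0.9974, so the distance is small but non-zero. The sum should be compared with a tolerance rather than with `==`.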

Behavior

Input: Two ARRAY<FLOAT> of equal length
Output: FLOAT
NULL arrays or NULL elements throw exceptions (consistent with other distance functions)
Different array sizes throw exceptions
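
The input rules listed above can be summarized as a small validation routine. This is a minimal sketch: the function name, the use of `ValueError`, the error messages, and the order of the checks are all hypothetical and are not claimed to match what Doris reports.

```python
from typing import Optional, Sequence


def check_inputs(
    x: Optional[Sequence[Optional[float]]],
    y: Optional[Sequence[Optional[float]]],
) -> None:
    """Raise ValueError for inputs this PR describes as errors."""
    # NULL arrays are rejected.
    if x is None or y is None:
        raise ValueError("array argument must not be NULL")
    # Arrays of different sizes are rejected.
    if len(x) != len(y):
        raise ValueError(f"array sizes differ: {len(x)} vs {len(y)}")
    # NULL elements inside either array are rejected.
    if any(v is None for v in x) or any(v is None for v in y):
        raise ValueError("array elements must not be NULL")
```

Valid input passes silently, for example `check_inputs([1.0, 2.0], [3.0, 4.0])`.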

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

duckdb.org
- Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

hello-stephen · 2026-01-31T08:25:55Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

Co-authored-by: zclllyybb <[email protected]>

zclllyybb · 2026-01-31T08:41:17Z

run buildall

doris-robot · 2026-01-31T09:56:39Z

BE UT Coverage Report

Increment line coverage 100.00% (15/15) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	52.52% (19289/36724)
Line Coverage	36.00% (179249/497974)
Region Coverage	32.41% (139048/429043)
Branch Coverage	33.35% (60154/180353)

doris-robot · 2026-01-31T10:28:16Z

TPC-H: Total hot run time: 31615 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ee407798f640e58bcf140c4eeeed1363113ff6cf, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17620	5381	5052	5052
q2	2067	354	191	191
q3	10154	1288	738	738
q4	10230	852	312	312
q5	7532	2150	1881	1881
q6	188	178	149	149
q7	881	748	621	621
q8	9260	1427	1090	1090
q9	5246	4843	4802	4802
q10	6827	1945	1550	1550
q11	490	293	266	266
q12	344	371	223	223
q13	17797	4076	3236	3236
q14	250	241	224	224
q15	909	808	819	808
q16	706	669	623	623
q17	651	831	436	436
q18	6787	6604	6426	6426
q19	1235	982	604	604
q20	384	342	230	230
q21	2575	1979	1891	1891
q22	355	315	262	262
Total cold run time: 102488 ms
Total hot run time: 31615 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	5336	5318	5306	5306
q2	267	326	253	253
q3	2172	2681	2274	2274
q4	1352	1737	1313	1313
q5	4242	4188	4082	4082
q6	226	186	139	139
q7	2017	2102	1916	1916
q8	2710	2451	2397	2397
q9	7465	7444	7555	7444
q10	2817	3023	2661	2661
q11	559	474	460	460
q12	761	746	600	600
q13	3881	4432	3467	3467
q14	290	352	331	331
q15	891	881	809	809
q16	653	739	659	659
q17	1179	1617	1390	1390
q18	8144	8096	7970	7970
q19	864	840	804	804
q20	2147	2142	2057	2057
q21	4739	4149	4170	4149
q22	545	571	508	508
Total cold run time: 53257 ms
Total hot run time: 50989 ms

doris-robot · 2026-01-31T10:45:02Z

ClickBench: Total hot run time: 28.07 s

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ee407798f640e58bcf140c4eeeed1363113ff6cf, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.05	0.05	0.04
query2	0.09	0.04	0.05
query3	0.26	0.08	0.09
query4	1.60	0.11	0.11
query5	0.27	0.24	0.26
query6	1.16	0.68	0.67
query7	0.03	0.02	0.02
query8	0.06	0.04	0.05
query9	0.56	0.50	0.50
query10	0.55	0.54	0.55
query11	0.14	0.10	0.09
query12	0.14	0.10	0.11
query13	0.64	0.61	0.63
query14	1.07	1.07	1.05
query15	0.88	0.86	0.88
query16	0.38	0.39	0.41
query17	1.15	1.12	1.11
query18	0.23	0.20	0.21
query19	2.10	1.88	2.08
query20	0.02	0.01	0.01
query21	15.44	0.23	0.14
query22	5.07	0.05	0.04
query23	15.73	0.27	0.10
query24	2.98	0.63	0.27
query25	0.08	0.11	0.08
query26	0.14	0.13	0.13
query27	0.06	0.05	0.06
query28	4.60	1.13	0.97
query29	12.55	3.98	3.16
query30	0.27	0.13	0.14
query31	2.81	0.66	0.40
query32	3.23	0.60	0.49
query33	3.26	3.23	3.37
query34	16.24	5.42	4.69
query35	4.74	4.74	4.83
query36	0.65	0.52	0.49
query37	0.11	0.07	0.06
query38	0.07	0.04	0.04
query39	0.05	0.03	0.03
query40	0.18	0.17	0.16
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.05	0.04	0.03
Total cold run time: 99.82 s
Total hot run time: 28.07 s

hello-stephen · 2026-01-31T12:28:57Z

BE Regression && UT Coverage Report

Increment line coverage 100.00% (15/15) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	71.58% (25767/35999)
Line Coverage	54.21% (269290/496797)
Region Coverage	51.85% (224741/433453)
Branch Coverage	53.16% (96262/181085)

hello-stephen · 2026-01-31T13:10:18Z

FE Regression Coverage Report

Increment line coverage 92.31% (12/13) 🎉
Increment coverage report
Complete coverage report

github-actions · 2026-02-03T14:40:56Z

PR approved by anyone and no changes requested.

Initial plan

80d8073

Copilot AI assigned Copilot and zclllyybb Jan 31, 2026

Copilot started work on behalf of zclllyybb January 31, 2026 08:26

Copilot AI and others added 2 commits January 31, 2026 08:31

Implement cosine_similarity array function (BE, FE, tests)

197f56d

Co-authored-by: zclllyybb <[email protected]>

Add BE unit tests for cosine_similarity function

ee40779

Co-authored-by: zclllyybb <[email protected]>

Copilot AI changed the title ~~[WIP] Add Array function cosine_similarity implementation~~ [Feature] Implement cosine_similarity array function Jan 31, 2026

Copilot AI requested a review from zclllyybb January 31, 2026 08:36

Copilot finished work on behalf of zclllyybb January 31, 2026 08:36

linrrzqqq approved these changes Feb 3, 2026

github-actions bot added the reviewed label Feb 3, 2026
