


Copilot AI commented Feb 1, 2026

Implements the ord(string) function, which returns the Unicode code point of the first character of its argument, following DuckDB semantics.

Changes

BE Implementation (be/src/vec/functions/function_string.cpp)

  • Added a StringOrd struct that decodes the leading UTF-8 sequence into a Unicode code point
  • Handles 1- to 4-byte UTF-8 sequences, validating each continuation byte
  • Returns Int64 to accommodate the full Unicode range (0 to 0x10FFFF)
  • Returns 0 for empty strings and invalid UTF-8
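The decoding rules above can be sketched in standalone C++. This is a minimal illustration, not the PR's StringOrd: the function name `ord_first_code_point` is hypothetical, it works on a `std::string_view` rather than Doris column types, and it assumes that only the lead-byte shape, the continuation bytes, the sequence length and the 0x10FFFF ceiling are checked (the description does not say whether overlong forms or surrogates are also rejected).

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical standalone helper, NOT the PR's StringOrd: decodes the first
// UTF-8 sequence of `s` into a Unicode code point. Assumes only lead-byte
// shape, continuation bytes, sequence length and the 0x10FFFF ceiling are
// checked. Returns 0 for an empty string or an invalid/truncated sequence.
std::int64_t ord_first_code_point(std::string_view s) {
    if (s.empty()) return 0;
    const auto lead = static_cast<unsigned char>(s[0]);

    std::size_t len = 0;   // total bytes in the sequence
    std::uint32_t cp = 0;  // accumulated code point bits
    if (lead < 0x80) {
        return lead;                       // 0xxxxxxx: ASCII, 1 byte
    } else if ((lead & 0xE0) == 0xC0) {
        len = 2; cp = lead & 0x1F;         // 110xxxxx
    } else if ((lead & 0xF0) == 0xE0) {
        len = 3; cp = lead & 0x0F;         // 1110xxxx
    } else if ((lead & 0xF8) == 0xF0) {
        len = 4; cp = lead & 0x07;         // 11110xxx
    } else {
        return 0;  // stray continuation byte (10xxxxxx) or 0xF8..0xFF
    }
    if (s.size() < len) return 0;          // truncated sequence

    for (std::size_t i = 1; i < len; ++i) {
        const auto b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) return 0;  // must be 10xxxxxx
        cp = (cp << 6) | (b & 0x3Fu);      // append 6 payload bits
    }
    return cp <= 0x10FFFF ? static_cast<std::int64_t>(cp) : 0;
}
```

With this sketch, `ord_first_code_point("\xE4\xBD\xA0")` yields 20320, matching the 20320 row in the examples, while a lone continuation byte such as `"\x80"` or a truncated sequence such as `"\xE4\xBD"` yields 0.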

FE Implementation

  • Added the Ord.java scalar function class, which returns BigIntType
  • Registered in BuiltinScalarFunctions.java and ScalarFunctionVisitor.java

Tests

  • BE unit tests covering ASCII, 2-, 3-, and 4-byte UTF-8, the empty string, and NULL
  • Regression tests in both query_p0 and nereids_p0 suites

Example

SELECT ord('A');      -- 65
SELECT ord('你');     -- 20320
SELECT ord('😀');     -- 128512
SELECT ord('');       -- 0
SELECT ord(NULL);     -- NULL

Key difference from ascii()

  • ascii() returns the first byte value
  • ord() decodes UTF-8 and returns the actual Unicode code point
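The contrast can be checked by hand on two of the example rows. This minimal C++ sketch works through the arithmetic using the standard UTF-8 byte sequences for U+4F60 ('你', E4 BD A0) and U+1F600 ('😀', F0 9F 98 80): the lead byte is all that a first-byte function such as ascii() sees, per the description above, whereas decoding masks off the marker bits of every byte and concatenates the payloads. The constant names are illustrative only.

```cpp
#include <cstdint>

// Illustrative constants only; byte values are the standard UTF-8 encodings.

// '你' = E4 BD A0, 3-byte form 1110xxxx 10xxxxxx 10xxxxxx
constexpr std::int64_t kNiFirstByte = 0xE4;  // first byte alone: 228
constexpr std::int64_t kNiCodePoint =
    ((0xE4 & 0x0F) << 12) | ((0xBD & 0x3F) << 6) | (0xA0 & 0x3F);

// '😀' = F0 9F 98 80, 4-byte form 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
constexpr std::int64_t kSmileFirstByte = 0xF0;  // first byte alone: 240
constexpr std::int64_t kSmileCodePoint =
    ((0xF0 & 0x07) << 18) | ((0x9F & 0x3F) << 12) |
    ((0x98 & 0x3F) << 6) | (0x80 & 0x3F);

// Compile-time checks against the example rows (U+4F60 and U+1F600).
static_assert(kNiCodePoint == 20320, "U+4F60 decodes to 20320");
static_assert(kSmileCodePoint == 128512, "U+1F600 decodes to 128512");
```

Because the payload bit fields do not overlap, each OR is equivalent to a sum: 16384 + 3904 + 32 = 20320 and 126976 + 1536 = 128512.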

Warning


I tried to connect to the following addresses, but was blocked by firewall rules:

  • duckdb.org
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)




@Thearas
Contributor

Thearas commented Feb 1, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copilot AI changed the title from "[WIP] Add String function ord implementation in DuckDB" to "[Feature] Implement string function ord following DuckDB semantics" on Feb 1, 2026
Copilot AI requested a review from zclllyybb February 1, 2026 10:25
@zclllyybb
Contributor

run buildall

@doris-robot

TPC-H: Total hot run time: 31748 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a9a22399386f66d4733d503b4d7fce0debdd68ca, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17635	5403	5022	5022
q2	2067	352	188	188
q3	10150	1317	726	726
q4	10196	794	312	312
q5	7544	2197	1878	1878
q6	198	177	149	149
q7	901	744	604	604
q8	9257	1353	1081	1081
q9	5213	4758	4868	4758
q10	6843	1957	1552	1552
q11	533	288	291	288
q12	336	375	231	231
q13	17776	4033	3252	3252
q14	243	244	224	224
q15	909	825	811	811
q16	680	683	613	613
q17	652	766	520	520
q18	6644	6364	7480	6364
q19	1287	1055	624	624
q20	415	382	256	256
q21	2895	2137	2007	2007
q22	375	341	288	288
Total cold run time: 102749 ms
Total hot run time: 31748 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	5653	5499	5461	5461
q2	260	342	254	254
q3	2325	2866	2462	2462
q4	1547	1842	1650	1650
q5	4764	4608	4602	4602
q6	227	182	135	135
q7	2021	1885	1842	1842
q8	2521	2407	2339	2339
q9	7618	7909	7548	7548
q10	2946	3009	2472	2472
q11	542	455	437	437
q12	627	693	569	569
q13	3521	4052	3233	3233
q14	269	286	257	257
q15	837	804	795	795
q16	636	700	640	640
q17	1073	1247	1277	1247
q18	7515	7247	7276	7247
q19	799	787	791	787
q20	1956	2040	1885	1885
q21	4439	4194	4103	4103
q22	573	534	521	521
Total cold run time: 52669 ms
Total hot run time: 50486 ms

@doris-robot

ClickBench: Total hot run time: 28.61 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a9a22399386f66d4733d503b4d7fce0debdd68ca, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.06	0.05	0.05
query2	0.09	0.04	0.04
query3	0.25	0.09	0.09
query4	1.61	0.11	0.12
query5	0.27	0.25	0.25
query6	1.17	0.68	0.66
query7	0.04	0.03	0.03
query8	0.05	0.04	0.04
query9	0.54	0.50	0.49
query10	0.55	0.52	0.54
query11	0.15	0.10	0.09
query12	0.15	0.10	0.10
query13	0.63	0.61	0.61
query14	1.06	1.06	1.04
query15	0.88	0.86	0.87
query16	0.39	0.40	0.42
query17	1.17	1.14	1.16
query18	0.23	0.21	0.20
query19	2.11	1.96	2.05
query20	0.02	0.01	0.02
query21	15.41	0.26	0.14
query22	5.29	0.05	0.05
query23	16.05	0.27	0.11
query24	1.48	0.60	0.91
query25	0.08	0.11	0.05
query26	0.14	0.12	0.13
query27	0.06	0.06	0.06
query28	5.05	1.13	0.96
query29	12.54	3.91	3.19
query30	0.29	0.14	0.11
query31	2.82	0.63	0.41
query32	3.24	0.60	0.50
query33	3.22	3.26	3.25
query34	16.30	5.38	4.72
query35	4.84	4.78	4.81
query36	0.65	0.50	0.49
query37	0.12	0.08	0.07
query38	0.06	0.04	0.04
query39	0.05	0.03	0.04
query40	0.19	0.16	0.15
query41	0.08	0.04	0.03
query42	0.04	0.03	0.03
query43	0.05	0.04	0.03
Total cold run time: 99.47 s
Total hot run time: 28.61 s

@doris-robot

BE UT Coverage Report

Increment line coverage 94.74% (36/38) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	52.49% (19276/36725)
Line Coverage	35.96% (179098/497997)
Region Coverage	32.36% (138862/429072)
Branch Coverage	33.32% (60105/180377)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 94.74% (36/38) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	71.50% (25741/36000)
Line Coverage	54.15% (269033/496820)
Region Coverage	51.83% (224664/433482)
Branch Coverage	53.14% (96235/181109)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 92.31% (12/13) 🎉
Increment coverage report
Complete coverage report



Labels

None yet

Projects

None yet


5 participants