[Feature] Implement string function ord following DuckDB semantics #60409

Copilot · 2026-02-01T10:15:18Z

Implements the ord(string) function, which returns the Unicode code point of the first character of a string, following DuckDB semantics.

Changes

BE Implementation (be/src/vec/functions/function_string.cpp)

Added StringOrd struct that decodes the first UTF-8 character into a Unicode code point
Handles 1- to 4-byte UTF-8 sequences and validates continuation bytes
Returns Int64 to cover the full Unicode range (0 to 0x10FFFF)
Returns 0 for empty strings and invalid UTF-8
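The decoding rules listed above can be sketched roughly as follows. This is a minimal illustration, not the StringOrd code from this PR: the name `ord_first_code_point` is hypothetical, and, like the description, the sketch only validates continuation bytes. Whether the real implementation also rejects overlong encodings, surrogates, or values above 0x10FFFF is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper illustrating the decoding; not the PR's StringOrd code.
// Returns the code point of the first UTF-8 character in s, or 0 when s is
// empty or its leading sequence is invalid or truncated.
int64_t ord_first_code_point(std::string_view s) {
    if (s.empty()) return 0;
    const auto* p = reinterpret_cast<const unsigned char*>(s.data());
    const unsigned char lead = p[0];

    std::size_t len = 0;
    int64_t cp = 0;
    if (lead < 0x80) {
        return lead;                      // 0xxxxxxx: 1-byte ASCII
    } else if ((lead & 0xE0) == 0xC0) {
        len = 2; cp = lead & 0x1F;        // 110xxxxx: 2-byte sequence
    } else if ((lead & 0xF0) == 0xE0) {
        len = 3; cp = lead & 0x0F;        // 1110xxxx: 3-byte sequence
    } else if ((lead & 0xF8) == 0xF0) {
        len = 4; cp = lead & 0x07;        // 11110xxx: 4-byte sequence
    } else {
        return 0;                         // stray continuation byte or invalid lead byte
    }
    if (s.size() < len) return 0;         // sequence cut off by end of string

    for (std::size_t i = 1; i < len; ++i) {
        if ((p[i] & 0xC0) != 0x80) return 0;  // continuation bytes must be 10xxxxxx
        cp = (cp << 6) | (p[i] & 0x3F);       // append 6 payload bits
    }
    return cp;
}
```

For example, the UTF-8 bytes of '😀' (F0 9F 98 80) decode to 128512 and a lone 0x80 byte yields 0, consistent with the SQL examples in this PR.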

FE Implementation

Added Ord.java scalar function class returning BigIntType
Registered in BuiltinScalarFunctions.java and ScalarFunctionVisitor.java

Tests

BE unit tests covering ASCII, 2-, 3-, and 4-byte UTF-8, empty strings, and NULL
Regression tests in both query_p0 and nereids_p0 suites

Example

SELECT ord('A');      -- 65
SELECT ord('你');     -- 20320
SELECT ord('😀');     -- 128512
SELECT ord('');       -- 0
SELECT ord(NULL);     -- NULL

Key difference from `ascii()`

ascii() returns the first byte value
ord() decodes UTF-8 and returns the actual Unicode code point
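To make the difference concrete: '你' is U+4F60, encoded in UTF-8 as the three bytes E4 BD A0. A first-byte function sees only 0xE4 (228), while decoding combines the payload bits of all three bytes: (0x4 << 12) | (0x3D << 6) | 0x20 = 20320. The helpers below are hypothetical, handle only this 3-byte case, and assume valid input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// ascii()-style behaviour (sketch): value of the first byte only.
int64_t first_byte_value(const std::string& s) {
    return s.empty() ? 0 : static_cast<unsigned char>(s[0]);
}

// ord()-style decoding of a single 3-byte UTF-8 sequence
// 1110xxxx 10xxxxxx 10xxxxxx. Assumes s holds a valid 3-byte sequence;
// no validation is performed here.
int64_t decode_three_byte(const std::string& s) {
    auto b = [&s](std::size_t i) { return static_cast<unsigned char>(s[i]); };
    return (static_cast<int64_t>(b(0) & 0x0F) << 12) |  // 4 bits from lead byte
           ((b(1) & 0x3F) << 6) |                       // 6 bits from 2nd byte
           (b(2) & 0x3F);                               // 6 bits from 3rd byte
}
```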

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

duckdb.org
- Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)


Thearas · 2026-02-01T10:15:24Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?


zclllyybb · 2026-02-01T11:03:42Z

run buildall

doris-robot · 2026-02-01T11:40:43Z

TPC-H: Total hot run time: 31748 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a9a22399386f66d4733d503b4d7fce0debdd68ca, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17635	5403	5022	5022
q2	2067	352	188	188
q3	10150	1317	726	726
q4	10196	794	312	312
q5	7544	2197	1878	1878
q6	198	177	149	149
q7	901	744	604	604
q8	9257	1353	1081	1081
q9	5213	4758	4868	4758
q10	6843	1957	1552	1552
q11	533	288	291	288
q12	336	375	231	231
q13	17776	4033	3252	3252
q14	243	244	224	224
q15	909	825	811	811
q16	680	683	613	613
q17	652	766	520	520
q18	6644	6364	7480	6364
q19	1287	1055	624	624
q20	415	382	256	256
q21	2895	2137	2007	2007
q22	375	341	288	288
Total cold run time: 102749 ms
Total hot run time: 31748 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5653	5499	5461	5461
q2	260	342	254	254
q3	2325	2866	2462	2462
q4	1547	1842	1650	1650
q5	4764	4608	4602	4602
q6	227	182	135	135
q7	2021	1885	1842	1842
q8	2521	2407	2339	2339
q9	7618	7909	7548	7548
q10	2946	3009	2472	2472
q11	542	455	437	437
q12	627	693	569	569
q13	3521	4052	3233	3233
q14	269	286	257	257
q15	837	804	795	795
q16	636	700	640	640
q17	1073	1247	1277	1247
q18	7515	7247	7276	7247
q19	799	787	791	787
q20	1956	2040	1885	1885
q21	4439	4194	4103	4103
q22	573	534	521	521
Total cold run time: 52669 ms
Total hot run time: 50486 ms

doris-robot · 2026-02-01T11:57:28Z

ClickBench: Total hot run time: 28.61 s

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a9a22399386f66d4733d503b4d7fce0debdd68ca, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.06	0.05	0.05
query2	0.09	0.04	0.04
query3	0.25	0.09	0.09
query4	1.61	0.11	0.12
query5	0.27	0.25	0.25
query6	1.17	0.68	0.66
query7	0.04	0.03	0.03
query8	0.05	0.04	0.04
query9	0.54	0.50	0.49
query10	0.55	0.52	0.54
query11	0.15	0.10	0.09
query12	0.15	0.10	0.10
query13	0.63	0.61	0.61
query14	1.06	1.06	1.04
query15	0.88	0.86	0.87
query16	0.39	0.40	0.42
query17	1.17	1.14	1.16
query18	0.23	0.21	0.20
query19	2.11	1.96	2.05
query20	0.02	0.01	0.02
query21	15.41	0.26	0.14
query22	5.29	0.05	0.05
query23	16.05	0.27	0.11
query24	1.48	0.60	0.91
query25	0.08	0.11	0.05
query26	0.14	0.12	0.13
query27	0.06	0.06	0.06
query28	5.05	1.13	0.96
query29	12.54	3.91	3.19
query30	0.29	0.14	0.11
query31	2.82	0.63	0.41
query32	3.24	0.60	0.50
query33	3.22	3.26	3.25
query34	16.30	5.38	4.72
query35	4.84	4.78	4.81
query36	0.65	0.50	0.49
query37	0.12	0.08	0.07
query38	0.06	0.04	0.04
query39	0.05	0.03	0.04
query40	0.19	0.16	0.15
query41	0.08	0.04	0.03
query42	0.04	0.03	0.03
query43	0.05	0.04	0.03
Total cold run time: 99.47 s
Total hot run time: 28.61 s

doris-robot · 2026-02-01T12:27:17Z

BE UT Coverage Report

Increment line coverage 94.74% (36/38) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	52.49% (19276/36725)
Line Coverage	35.96% (179098/497997)
Region Coverage	32.36% (138862/429072)
Branch Coverage	33.32% (60105/180377)

hello-stephen · 2026-02-01T13:59:54Z

BE Regression && UT Coverage Report

Increment line coverage 94.74% (36/38) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	71.50% (25741/36000)
Line Coverage	54.15% (269033/496820)
Region Coverage	51.83% (224664/433482)
Branch Coverage	53.14% (96235/181109)

hello-stephen · 2026-02-01T14:11:11Z

FE Regression Coverage Report

Increment line coverage 92.31% (12/13) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-02-01T17:43:47Z

BE Regression && UT Coverage Report

Increment line coverage 94.74% (36/38) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	71.50% (25741/36000)
Line Coverage	54.15% (269033/496820)
Region Coverage	51.83% (224664/433482)
Branch Coverage	53.14% (96235/181109)

hello-stephen · 2026-02-01T17:58:00Z

FE Regression Coverage Report

Increment line coverage 92.31% (12/13) 🎉
Increment coverage report
Complete coverage report

Initial plan

5cf373f

Copilot AI assigned Copilot and zclllyybb Feb 1, 2026

Copilot started work on behalf of zclllyybb February 1, 2026 10:15

Copilot AI and others added 2 commits February 1, 2026 10:21

Implement ord function for BE, FE, and tests

4c1b192

Co-authored-by: zclllyybb <[email protected]>

Fix UTF-8 decoding validation in ord function

a9a2239

Co-authored-by: zclllyybb <[email protected]>

Copilot AI changed the title ~~[WIP] Add String function ord implementation in DuckDB~~ [Feature] Implement string function ord following DuckDB semantics Feb 1, 2026

Copilot AI requested a review from zclllyybb February 1, 2026 10:25

Copilot finished work on behalf of zclllyybb February 1, 2026 10:25
