[feature](inverted index) support multiple tokenize index in one column #60415

airborne12 · 2026-02-02T01:59:22Z

Summary

Cherry-pick of #59117 to branch-4.0.

This PR adds support for multiple tokenized inverted indexes on a single column. Users can create several inverted indexes with different analyzers (e.g., chinese, english, standard) on the same text column and choose which analyzer a query uses.

Key changes:

- Add USING ANALYZER syntax for MATCH predicates
- Support multiple inverted indexes with different analyzers on the same column
- Add analyzer key normalization and matching logic in BE
- Add filterIndexesByAnalyzer method in OlapTable for index selection
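The normalization and index-selection steps above can be illustrated with a minimal Java sketch. It is not the actual `OlapTable.filterIndexesByAnalyzer` or the BE matcher: `InvertedIndexInfo`, `AnalyzerIndexSelector` and `normalizeAnalyzerKey` are hypothetical names, and the normalization rule (trim plus case-insensitive comparison, with a missing analyzer mapped to an empty default key) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of an inverted index on a column; not the Doris Index class.
record InvertedIndexInfo(String name, String column, String analyzer) {}

final class AnalyzerIndexSelector {
    private AnalyzerIndexSelector() {}

    // Assumption: analyzer keys compare case-insensitively and ignore surrounding
    // whitespace; a missing analyzer is treated as the empty "default" key.
    static String normalizeAnalyzerKey(String analyzer) {
        if (analyzer == null) {
            return "";
        }
        return analyzer.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the indexes on `column` that a MATCH predicate may use.
    // Without USING ANALYZER (requested == null) every index on the column stays a
    // candidate; otherwise only indexes whose normalized key equals the request remain.
    static List<InvertedIndexInfo> filterIndexesByAnalyzer(
            List<InvertedIndexInfo> indexes, String column, String requested) {
        List<InvertedIndexInfo> result = new ArrayList<>();
        String wanted = requested == null ? null : normalizeAnalyzerKey(requested);
        for (InvertedIndexInfo idx : indexes) {
            if (!idx.column().equals(column)) {
                continue;
            }
            if (wanted == null || normalizeAnalyzerKey(idx.analyzer()).equals(wanted)) {
                result.add(idx);
            }
        }
        return result;
    }
}
```

An empty result in this sketch means no index can serve the predicate, so the query would have to be evaluated without an index.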

Conflicts Resolved

- MatchPredicate.java - Updated constructor signatures for analyzer support
- OlapTable.java - Added imports and new filtering methods
- ExpressionTranslator.java - Updated to use new MatchPredicate constructor

Test plan

- Regression tests included in original PR
- Unit tests for analyzer key matcher and normalizer

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test
- No need to test
Behavior changed:
- No.
- Yes. Adds new USING ANALYZER syntax for MATCH predicates.
Does this need documentation?
- Yes. (covered by original PR documentation)

Check List (For Reviewer who merge this PR)

- Confirm the release note
- Confirm test cases
- Confirm document
- Add branch pick label

…mn (apache#59117)

This PR implements the **Multi-Analyzer Inverted Index** feature, which allows creating multiple inverted indexes with different analyzers on a single column.

1. **Multiple Indexes on Single Column**: Create multiple inverted indexes with different analyzers (standard, keyword, chinese, custom) on the same column
2. **USING ANALYZER Syntax**: Query with a specific analyzer using `MATCH ... USING ANALYZER analyzer_name`
3. **Smart Index Selection**: When the specified analyzer's index has not been built, the query automatically falls back to the non-index path (correct results guaranteed)
4. **Analyzer Identity Detection**: Prevents duplicate indexes with the same analyzer configuration

Use cases:
- Multi-language search on the same text column
- Precision vs. recall trade-off (exact match vs. fuzzy search)
- Autocomplete with edge_ngram while keeping standard search

```sql
-- Create table with multiple indexes
CREATE TABLE articles (
    id INT,
    content TEXT,
    INDEX idx_std (content) USING INVERTED PROPERTIES("analyzer" = "std_analyzer"),
    INDEX idx_kw (content) USING INVERTED PROPERTIES("analyzer" = "kw_analyzer")
) ...;

-- Query with specific analyzer
SELECT * FROM articles WHERE content MATCH 'hello' USING ANALYZER std_analyzer;
SELECT * FROM articles WHERE content MATCH 'hello' USING ANALYZER kw_analyzer;
```
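The "Smart Index Selection" guarantee can be illustrated with a toy Java sketch in which an index lookup and an on-the-fly scan return the same rows. Everything here is hypothetical: `MatchWithFallback`, the two toy analyzers (a lowercase split on non-alphanumerics and a whole-value keyword), and the treatment of `MATCH` as match-any over analyzed terms are illustrative assumptions, not Doris code or its exact MATCH semantics.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

final class MatchWithFallback {
    private MatchWithFallback() {}

    // Toy "standard-like" analyzer: lowercase, split on runs of non-alphanumerics.
    static List<String> standardLike(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Toy "keyword-like" analyzer: the whole value is a single term.
    static List<String> keywordLike(String text) {
        return List.of(text);
    }

    // Builds term -> row-id postings, standing in for a built inverted index.
    static Map<String, Set<Integer>> buildIndex(List<String> rows,
                                                Function<String, List<String>> analyzer) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int i = 0; i < rows.size(); i++) {
            for (String term : analyzer.apply(rows.get(i))) {
                postings.computeIfAbsent(term, k -> new HashSet<>()).add(i);
            }
        }
        return postings;
    }

    // Evaluates "col MATCH query USING ANALYZER a" as match-any over analyzed terms.
    // If `index` is null (that analyzer's index is not built), each row is analyzed
    // on the fly instead: slower, but it yields the same row ids.
    static Set<Integer> match(List<String> rows, String query,
                              Function<String, List<String>> analyzer,
                              Map<String, Set<Integer>> index) {
        Set<String> queryTerms = new HashSet<>(analyzer.apply(query));
        Set<Integer> hits = new TreeSet<>();
        if (index != null) {
            for (String term : queryTerms) {
                hits.addAll(index.getOrDefault(term, Set.of()));
            }
            return hits;
        }
        for (int i = 0; i < rows.size(); i++) {
            for (String term : analyzer.apply(rows.get(i))) {
                if (queryTerms.contains(term)) {
                    hits.add(i);
                    break;
                }
            }
        }
        return hits;
    }
}
```

The key design point the sketch mirrors is that the fallback applies the same analyzer to the raw values, so a missing index only costs performance and never changes the result.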

Thearas · 2026-02-02T01:59:27Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

- What problem was fixed (it's best to include specific error reporting information). How it was fixed.
- Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
- What features were added. Why was this function added?
- Which code was refactored and why was this part of the code refactored?
- Which functions were optimized and what is the difference before and after the optimization?

airborne12 · 2026-02-02T02:01:01Z

run buildall

- Add getAnalyzerIdentity() method to IndexDef (branch-4.0 uses IndexDef, not IndexDefinition)
- Fix MatchPredicate to use setNullableFromNereids() instead of direct field assignment
- Change IndexDefinition.IndexType to IndexDef.IndexType in Index.java
- Remove unrelated ColumnSeqMapping methods from OlapTable (not present in branch-4.0)
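The analyzer-identity check that getAnalyzerIdentity() supports can be sketched in Java as follows. This is a hypothetical illustration, not the IndexDef method: `IndexSpec`, `IndexRegistry` and `analyzerIdentity()` are invented names, and building the identity from all properties (trimmed, lowercased and sorted so key order does not matter) is an assumption; the real method may consider only the analysis-related properties.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an index definition; not Doris's IndexDef.
record IndexSpec(String name, String column, Map<String, String> properties) {

    // Canonical identity of the analysis configuration: keys and values are trimmed
    // and lowercased, then sorted so that property order does not matter.
    // Assumption: every property participates in the identity.
    String analyzerIdentity() {
        TreeMap<String, String> canonical = new TreeMap<>();
        properties.forEach((k, v) -> canonical.put(
                k.trim().toLowerCase(Locale.ROOT),
                v == null ? "" : v.trim().toLowerCase(Locale.ROOT)));
        return canonical.toString();
    }
}

final class IndexRegistry {
    private final List<IndexSpec> indexes = new ArrayList<>();

    // Rejects a second index on the same column with an identical analyzer identity.
    void add(IndexSpec spec) {
        for (IndexSpec existing : indexes) {
            if (existing.column().equals(spec.column())
                    && existing.analyzerIdentity().equals(spec.analyzerIdentity())) {
                throw new IllegalArgumentException("index " + spec.name()
                        + " duplicates the analyzer configuration of " + existing.name());
            }
        }
        indexes.add(spec);
    }

    int size() {
        return indexes.size();
    }
}
```

Comparing canonical identities rather than index names is what lets two differently named indexes with the same analyzer be detected as duplicates, while the same analyzer on a different column remains allowed.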

airborne12 · 2026-02-02T03:42:36Z

run buildall

hello-stephen · 2026-02-02T07:05:57Z

BE UT Coverage Report

Increment line coverage 69.59% (238/342) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	52.99% (19033/35917)
Line Coverage	36.12% (177235/490668)
Region Coverage	32.76% (137441/419497)
Branch Coverage	33.62% (59528/177075)

airborne12 requested a review from yiguolei as a code owner February 2, 2026 01:59

yiguolei approved these changes Feb 2, 2026

yiguolei merged commit 9512b4a into apache:branch-4.0 Feb 2, 2026
24 of 28 checks passed

airborne12 deleted the pick-59117-to-4.0 branch February 2, 2026 08:24

4 participants