perf: batched WAND and new WAND structure, ~50% faster #6241
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Review: perf: batched WAND and new WAND structure

Nice performance improvement: the benchmarks show meaningful gains across the board, especially for phrase queries (~40% latency reduction single-threaded). The head/lead/tail split is a well-known WAND optimization.

Issues to consider

P0: Potential infinite loop in
P1:
P1: Floating-point subtraction in `self.tail_max_score = self.tail_max_score - evicted.upper_bound + upper_bound;`
Repeated addition and subtraction of f32 upper bounds will accumulate rounding errors in
P1:
Minor nits
Overall a solid optimization. The main concern is verifying the
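The rounding concern about the incremental `tail_max_score` update can be reproduced in isolation. The sketch below is not the PR's code: `TailSum`, `replace`, and `drift_demo` are hypothetical names, and the values are chosen only to make the effect visible. It applies the same `sum - evicted + new` update to an f32 and compares the result against a sum recomputed from the entries.

```rust
/// Hypothetical stand-in for the tail bookkeeping; not the PR's actual type.
struct TailSum {
    bounds: Vec<f32>, // upper bound of each tail entry
    running: f32,     // incrementally maintained sum of `bounds`
}

impl TailSum {
    fn new() -> Self {
        Self { bounds: Vec::new(), running: 0.0 }
    }

    fn push(&mut self, upper_bound: f32) {
        self.bounds.push(upper_bound);
        self.running += upper_bound;
    }

    /// Swap one entry, updating the sum the same way as the quoted line.
    fn replace(&mut self, idx: usize, upper_bound: f32) {
        let evicted = std::mem::replace(&mut self.bounds[idx], upper_bound);
        self.running = self.running - evicted + upper_bound;
    }

    /// Sum recomputed from the entries currently in the tail.
    fn recomputed(&self) -> f32 {
        self.bounds.iter().sum()
    }
}

/// Returns (incremental sum, recomputed sum) after two swaps.
fn drift_demo() -> (f32, f32) {
    let mut tail = TailSum::new();
    tail.push(0.1);
    tail.push(0.2);
    tail.replace(0, 1.0e8); // the 0.2 is absorbed: f32 spacing near 1e8 is 8
    tail.replace(0, 0.1);   // 1e8 - 1e8 + 0.1 leaves only 0.1
    (tail.running, tail.recomputed())
}
```

Under these assumptions `drift_demo()` returns an incremental sum of 0.1 while the recomputed sum is about 0.3, so the running value underestimates the true bound. For a pruning bound that is the unsafe direction. Recomputing the sum periodically, or accumulating in f64, are common mitigations. Whether either is needed here depends on the magnitudes of the real upper bounds.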
Codecov Report
❌ Patch coverage is
- advances posting iterators in batches
- splits posting iterators into `head`, `lead` and `tail`, which reduces the cost of updating posting iterators
- reuses `query_weight`
- prunes documents more aggressively with block max scores
- uses a Lucene-style conjunction path for phrase queries, with conjunction intersection and block-max pruning separated from OR WAND

Query Type | Version | Mode | QPS | Avg Latency | P90 | P99
-- | -- | -- | -- | -- | -- | --
match | current | single-thread | — | 2.93 ms | 5.62 ms | 7.50 ms
match | main (v2) | single-thread | — | 4.22 ms | 7.56 ms | 7.85 ms
match | current | 8-concurrency | 612.80 | 13.05 ms | 17.80 ms | 20.78 ms
match | main (v2) | 8-concurrency | 599.17 | 13.35 ms | 19.61 ms | 21.86 ms
phrase | current | single-thread | — | 2.02 ms | 2.59 ms | 2.62 ms
phrase | main (v2) | single-thread | — | 3.60 ms | 4.62 ms | 4.67 ms
phrase | current | 8-concurrency | 1597.37 | 5.01 ms | 6.57 ms | 8.60 ms
phrase | main (v2) | 8-concurrency | 1040.66 | 7.69 ms | 9.86 ms | 11.23 ms

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
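To illustrate why a `head`/`lead`/`tail` split can reduce iterator updates, here is a minimal sketch. It assumes the convention used by Lucene's WAND scorer (`lead` sits on the candidate document, `head` is ahead of it, `tail` is behind it and not yet advanced), which may differ in detail from this PR. `Cursor`, `Split`, `split`, and `can_skip_without_tail` are hypothetical names, and treating a tie with the threshold as non-competitive is also an assumption.

```rust
/// Hypothetical cursor over one term's posting list (not the PR's type).
#[derive(Debug, Clone, PartialEq)]
struct Cursor {
    doc: u32,         // doc id the cursor currently points at
    upper_bound: f32, // max score this term can contribute
}

/// Cursors grouped relative to a candidate doc id.
#[derive(Debug, Default)]
struct Split {
    head: Vec<Cursor>, // ahead of the candidate: cannot match it
    lead: Vec<Cursor>, // positioned exactly on the candidate
    tail: Vec<Cursor>, // behind the candidate: not advanced yet
}

fn split(cursors: &[Cursor], candidate: u32) -> Split {
    let mut out = Split::default();
    for c in cursors {
        if c.doc > candidate {
            out.head.push(c.clone());
        } else if c.doc == candidate {
            out.lead.push(c.clone());
        } else {
            out.tail.push(c.clone());
        }
    }
    out
}

/// True when the candidate can be rejected without advancing any tail cursor:
/// even if every tail term matched, the score could not exceed `threshold`.
fn can_skip_without_tail(split: &Split, threshold: f32) -> bool {
    let bound: f32 = split
        .lead
        .iter()
        .chain(split.tail.iter())
        .map(|c| c.upper_bound)
        .sum();
    bound <= threshold
}
```

With cursors at documents 10, 3 and 42 and candidate 10, each group holds one cursor. If the lead and tail upper bounds sum to 1.5 and the current top-k threshold is 2.0, the candidate is rejected and the tail cursor at document 3 is never advanced. That is the saving the split is meant to capture.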