Reply: Most of the items are already done as of 8b3efc7.
NodeDB Optimization Opportunities
Current Architecture
All engines run in a single process, sharing the same memory, the same WAL, the same SPSC bridge, and the same Data Plane cores.
Memory Profile (observed during the 10M-row benchmark)
The 150MB idle footprint is excellent. The query spike is the optimization target.
Optimization 1: Streaming Aggregation
Impact: Query memory 5GB → ~500MB
The aggregation path currently loads all partitions into memory for a GROUP BY. At 100M rows that could be 10-20GB.
This matters even more for multi-model — if a vector search is running alongside a timeseries GROUP BY, both competing for RAM.
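A minimal sketch of the idea, folding each partition into a running aggregate table so peak memory is bounded by the number of distinct groups rather than the row count. The partition/row layout and column names (`qtype`, `elapsed_ms`) are illustrative assumptions, not NodeDB's actual schema:

```python
from collections import defaultdict

def streaming_group_by(partitions, key_col, agg_col):
    """GROUP BY via streaming fold: only one partition is resident at a
    time, and the accumulator holds one entry per distinct group.
    (Illustrative sketch; partition layout is hypothetical.)"""
    acc = defaultdict(lambda: [0, 0])  # group -> [sum, count]
    for partition in partitions:       # load partitions one at a time
        for row in partition:
            slot = acc[row[key_col]]
            slot[0] += row[agg_col]
            slot[1] += 1
    return {k: {"sum": s, "count": c, "avg": s / c}
            for k, (s, c) in acc.items()}
```

Because partitions are folded one at a time, the 5GB spike from materializing all rows collapses to roughly the size of the group table plus one partition.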
Optimization 2: Time Range Partition Pruning
Impact: 2-10x faster for dashboard queries
Each partition has `min_ts`/`max_ts` in its metadata. A query with a time range filter should only read the 1-2 overlapping partitions, not all 20. Most real-world dashboard queries have a time range filter, yet currently all partitions are scanned regardless.
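The pruning check itself is a one-line interval overlap test against each partition's metadata. A sketch, assuming the metadata fields are exposed as `min_ts`/`max_ts` per the description above:

```python
def prune_partitions(partitions, query_min_ts, query_max_ts):
    """Keep only partitions whose [min_ts, max_ts] interval overlaps the
    query's time filter; everything else is skipped without any I/O.
    (Metadata field names assumed from the discussion.)"""
    return [p for p in partitions
            if p["max_ts"] >= query_min_ts and p["min_ts"] <= query_max_ts]
```

With 20 partitions and a dashboard query spanning two of them, this turns a 20-partition scan into a 2-partition scan before any file is opened.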
Optimization 3: Column Projection Pushdown
Impact: Less disk I/O, less memory
If a query only needs `qtype` and `COUNT(*)`, don't read the `client_ip`, `qname`, and `elapsed_ms` columns from disk. Each column is stored as a separate `.col` file — only open and decompress the columns referenced in the query. PR #10 mentioned projection pushdown, but the 2-5GB RSS during queries suggests it is loading more than necessary.
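The file-selection step can be sketched as a filter over the partition directory: map the columns the query references to their `.col` files and never open the rest. The one-file-per-column layout is taken from the description above; the function name is illustrative:

```python
import os

def columns_to_read(query_columns, partition_dir):
    """Return only the `.col` files for columns the query references.
    (Layout assumption: one `<column>.col` file per column, as described.)"""
    wanted = set(query_columns)
    paths = []
    for fname in sorted(os.listdir(partition_dir)):
        col, ext = os.path.splitext(fname)
        if ext == ".col" and col in wanted:
            paths.append(os.path.join(partition_dir, fname))
    return paths
```

For the `qtype` + `COUNT(*)` example, this opens one file out of four, so both disk I/O and decompression memory shrink proportionally.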
Optimization 4: Parallel Partition Scan
Impact: 2-3x faster queries
Currently, timeseries queries appear to run on a single Data Plane core. Partition scans could be parallelized across the 3 Data Plane cores: each core scans a different subset of partitions, and the per-partition results are merged.
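The scan/merge split can be sketched with a small worker pool standing in for the 3 Data Plane cores. `scan_one` and `merge` are caller-supplied placeholders, not NodeDB APIs:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_scan(partitions, scan_one, merge, workers=3):
    """Fan partitions out across a small pool (mirroring the 3 Data
    Plane cores), then merge the per-partition results. Illustrative
    sketch: scan_one/merge are stand-ins for real scan and combine steps."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(scan_one, partitions))
    merged = results[0]
    for r in results[1:]:
        merged = merge(merged, r)
    return merged
```

This pairs naturally with streaming aggregation: each worker builds a small per-partition aggregate, and only those compact results are merged on one core.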
Optimization 5: Tiered Memory Budgets
Impact: Multi-model stability
Since multiple engines share one process, a memory budget system would prevent one workload from starving others:
Without this, a 100M-row GROUP BY could starve a concurrent vector search or document query.
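One possible shape for such a budget system: per-engine caps where a reservation beyond the cap is refused, forcing the caller to spill or stream rather than grow. Engine names, cap sizes, and the API are all illustrative assumptions:

```python
class MemoryBudget:
    """Per-engine memory caps so one workload cannot starve the others.
    A refused reservation tells the caller to spill to disk or switch to
    a streaming plan. (Sketch; not NodeDB's actual accounting.)"""
    def __init__(self, caps):
        self.caps = dict(caps)               # engine -> byte cap
        self.used = {k: 0 for k in caps}     # engine -> bytes reserved

    def try_reserve(self, engine, nbytes):
        if self.used[engine] + nbytes > self.caps[engine]:
            return False                     # over budget: caller must adapt
        self.used[engine] += nbytes
        return True

    def release(self, engine, nbytes):
        self.used[engine] = max(0, self.used[engine] - nbytes)
```

Under this scheme a 100M-row GROUP BY that exceeds the timeseries cap degrades to a streaming plan instead of evicting the vector engine's working set.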
Optimization 6: OS Page Cache Management
Impact: Better multi-model coexistence
NodeDB reads partition files via sequential I/O. After a query, the OS page cache holds decompressed data. If a document or vector workload then needs memory, the OS evicts those pages — next timeseries query reads from disk again.
Options:
- `madvise(MADV_SEQUENTIAL)` for scan queries
- `madvise(MADV_DONTNEED)` after aggregation to release pages proactively

Optimization 7: Ingest Throughput — Batch SPSC Dispatch
Impact: Potentially 85K/s → 600K+/s
Currently each ILP batch goes through SPSC one-at-a-time. Batching multiple ILP lines into a single SPSC message would reduce dispatch overhead. In earlier tests with larger batches (before WAL fix), we saw 685K/s.
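A sketch of the batching layer: accumulate ILP lines and push them through the bridge as one message per batch, so per-message dispatch cost is amortized. The queue below is a stand-in for the real SPSC ring, and the class/parameter names are hypothetical:

```python
class BatchedDispatcher:
    """Batch ILP lines into a single SPSC message instead of dispatching
    one line per message. (Sketch: a plain append-queue stands in for
    the actual SPSC ring; max_batch is an illustrative knob.)"""
    def __init__(self, spsc_queue, max_batch=512):
        self.q = spsc_queue
        self.max_batch = max_batch
        self.pending = []

    def submit(self, ilp_line):
        self.pending.append(ilp_line)
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self):
        if self.pending:
            self.q.append(self.pending)  # one SPSC message, many lines
            self.pending = []
```

The 685K/s seen with larger batches suggests dispatch overhead, not parsing or storage, is the current bottleneck; batching attacks exactly that term.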
Optimization 8: Materialized Aggregates
Impact: Instant dashboard queries
The `continuous aggregate` feature in the codebase (nodedb/src/engine/timeseries/continuous_aggregate) could pre-compute common GROUP BY and time_bucket results on flush. Dashboard queries would then read pre-computed results instead of scanning raw data.
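A sketch of what the flush-time rollup could compute: per-bucket, per-key aggregates keyed by `(time_bucket, group)`. The bucket function and column names are illustrative assumptions, not the actual continuous_aggregate implementation:

```python
from collections import defaultdict

def time_bucket(ts, width):
    """Align a timestamp down to its bucket boundary."""
    return ts - (ts % width)

def rollup_on_flush(rows, bucket_width, key_col, val_col):
    """Pre-compute per-bucket aggregates at flush time so dashboard
    queries read this small rollup instead of raw rows. (Sketch with a
    hypothetical schema, not NodeDB's actual flush path.)"""
    agg = defaultdict(lambda: [0, 0])  # (bucket, key) -> [sum, count]
    for row in rows:
        k = (time_bucket(row["ts"], bucket_width), row[key_col])
        agg[k][0] += row[val_col]
        agg[k][1] += 1
    return {k: {"sum": s, "count": c} for k, (s, c) in agg.items()}
```

Since flush already touches every row once, the rollup adds little marginal cost there while removing the full scan from the query path.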
Optimization 9: WAL Isolation for Multi-Model
Impact: Workload isolation
The WAL is shared across all engines. A heavy timeseries ingest (100M rows) generates massive WAL segments that slow down WAL replay for KV and document operations. Per-engine or per-collection WAL segments would isolate workloads.
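The isolation property can be shown with a minimal sketch: records route to per-engine segments, so replaying one engine never scans another engine's records. In-memory lists stand in for segment files here; this is not NodeDB's WAL format:

```python
class PerEngineWal:
    """Per-engine WAL segments: a heavy timeseries ingest fills only its
    own segment, so KV/document replay stays proportional to their own
    write volume. (Sketch: lists stand in for on-disk segments.)"""
    def __init__(self, engines):
        self.segments = {e: [] for e in engines}

    def append(self, engine, record):
        self.segments[engine].append(record)

    def replay(self, engine):
        # Only this engine's records are visited; a shared WAL would
        # force a scan over every engine's writes.
        return list(self.segments[engine])
```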
Optimization 10: WASM UDF Edge Computation
Impact: Unique competitive advantage
The Event Plane with WASM UDFs opens a unique optimization: push aggregation logic into UDFs that run at ingest time. Instead of storing 100M raw rows and aggregating at query time, compute the aggregates during ingest.
No other timeseries DB has this capability. This could make certain dashboard queries effectively zero-latency.
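The dataflow can be sketched as an aggregation hook applied to every incoming batch. In NodeDB the UDF would be WASM running on the Event Plane; this Python stand-in (with a hypothetical per-`qtype` counter) only illustrates the ingest-time-vs-query-time trade:

```python
def qtype_count_udf(state, batch):
    """Hypothetical UDF body: maintain per-qtype request counts."""
    for row in batch:
        state[row["qtype"]] = state.get(row["qtype"], 0) + 1

def ingest_with_udf(batches, udf, state):
    """Apply the UDF to each batch at ingest time, so the dashboard
    aggregate is already complete when ingest finishes — no query-time
    scan over raw rows. (Sketch; the real hook would invoke WASM.)"""
    for batch in batches:
        udf(state, batch)
    return state
```

The 100M raw rows can still be stored for ad-hoc queries; the point is that the common dashboard aggregates never need to be recomputed from them.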
Priority Matrix
The first three optimizations would close most of the performance gap with ClickHouse while keeping NodeDB's multi-model advantage.