CLI for measuring execute_cuda encoding perf#6381

Open
a10y wants to merge 1 commit into develop from aduffy/gpu-scan-measure
@a10y a10y commented Feb 9, 2026

Overview of changes

Ergonomics/API-focused changes

  • Introduced a new LaunchStrategy on the execution context. By default it launches kernels without tracking any timing information, but it is pluggable. For example, in benchmarks we replace it with a TimedLaunchedStrategy, which executes the kernels in blocking mode and logs their execution time.
  • Centralized the entrypoint for launching all kernels. They must now be dispatched from the execution context via the ctx.launch_kernel() method, which accepts a closure that populates the kernel arguments.
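
The two bullets above can be sketched roughly as follows. This is a CPU-only illustration, not the PR's code: the `launch` method signature, the `DefaultLaunch`, `TimedLaunch`, and `ExecutionCtx` types, and the "kernel" that merely sums its arguments are all assumptions made for the example. Only the names `LaunchStrategy`, `launch_kernel`, and `with_launch_strategy` and the trait bounds come from the PR.

```rust
use std::fmt::Debug;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical shape of the trait; the real method signature may differ.
pub trait LaunchStrategy: Debug + Send + Sync + 'static {
    /// Runs `launch`, which stands in for the actual kernel launch.
    fn launch(&self, kernel_name: &str, launch: &mut dyn FnMut());
}

/// Default behaviour: launch and record nothing.
#[derive(Debug, Default)]
pub struct DefaultLaunch;

impl LaunchStrategy for DefaultLaunch {
    fn launch(&self, _kernel_name: &str, launch: &mut dyn FnMut()) {
        launch();
    }
}

/// Benchmark behaviour: time each launch and keep the measurements.
#[derive(Debug, Default)]
pub struct TimedLaunch {
    timings: Mutex<Vec<(String, Duration)>>,
}

impl TimedLaunch {
    /// Returns a copy of the (kernel name, elapsed time) pairs recorded so far.
    pub fn timings(&self) -> Vec<(String, Duration)> {
        self.timings.lock().unwrap().clone()
    }
}

impl LaunchStrategy for TimedLaunch {
    fn launch(&self, kernel_name: &str, launch: &mut dyn FnMut()) {
        let start = Instant::now();
        // A GPU implementation would also have to wait for the kernel to finish here.
        launch();
        let elapsed = start.elapsed();
        self.timings.lock().unwrap().push((kernel_name.to_string(), elapsed));
    }
}

/// Hypothetical execution context: the single entrypoint for launches.
#[derive(Debug)]
pub struct ExecutionCtx {
    strategy: Arc<dyn LaunchStrategy>,
}

impl ExecutionCtx {
    pub fn new() -> Self {
        Self { strategy: Arc::new(DefaultLaunch) }
    }

    /// Swaps in a different strategy, e.g. a timed one for benchmarks.
    pub fn with_launch_strategy(mut self, strategy: Arc<dyn LaunchStrategy>) -> Self {
        self.strategy = strategy;
        self
    }

    /// `build_args` populates the kernel arguments; the stand-in kernel sums them.
    pub fn launch_kernel(&self, name: &str, build_args: impl FnOnce(&mut Vec<u64>)) -> u64 {
        let mut args: Vec<u64> = Vec::new();
        build_args(&mut args);
        let mut result: u64 = 0;
        self.strategy.launch(name, &mut || result = args.iter().sum::<u64>());
        result
    }
}
```

With this shape, a benchmark would keep its own `Arc<TimedLaunch>`, pass a clone to `with_launch_strategy`, run its scans through `launch_kernel`, and read `timings()` afterwards, while production code never pays for the bookkeeping.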

A lot of test and benchmark code needed to be updated to use the new launch methods.

Fused FOR + BP

This has been shelved for a follow-up (FLUP), since it made this PR too big.

* I've updated the BP kernel generator to generate BP as FFOR, i.e. bitpacking fused with FOR. In practice, this just adds a const T reference param. By default, execution of BitPackedArray passes zero, but there is a specialization in the ForArray execution tree: if it detects that one of its descendants is BP, it fuses itself with the bit unpacking.
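
To illustrate the idea of fusing FOR into bit unpacking, here is a minimal CPU sketch. It assumes a simple contiguous, LSB-first packing into `u64` words, which is a simplification and most likely not the layout the generated GPU kernels use; `pack` and `unpack_ffor` are hypothetical helper names. The point is only that the reference is added in the same pass as the unpack, and that passing zero gives plain bit unpacking.

```rust
/// Packs `values` into `u64` words using `bit_width` bits per value
/// (contiguous, LSB-first). Bits above `bit_width` are masked off.
pub fn pack(values: &[u32], bit_width: u32) -> Vec<u64> {
    assert!((1..=32).contains(&bit_width));
    let mask = (1u64 << bit_width) - 1;
    let total_bits = values.len() * bit_width as usize;
    let mut out = vec![0u64; (total_bits + 63) / 64];
    for (i, &v) in values.iter().enumerate() {
        let bit = i * bit_width as usize;
        let (word, shift) = (bit / 64, (bit % 64) as u32);
        let v = v as u64 & mask;
        out[word] |= v << shift;
        // The value straddles a word boundary: spill the high bits into the next word.
        if shift + bit_width > 64 {
            out[word + 1] |= v >> (64 - shift);
        }
    }
    out
}

/// Unpacks `len` values and adds `reference` to each one in the same pass.
/// With `reference == 0` this is plain bit unpacking.
pub fn unpack_ffor(packed: &[u64], bit_width: u32, len: usize, reference: u32) -> Vec<u32> {
    assert!((1..=32).contains(&bit_width));
    let mask = (1u64 << bit_width) - 1;
    (0..len)
        .map(|i| {
            let bit = i * bit_width as usize;
            let (word, shift) = (bit / 64, (bit % 64) as u32);
            let mut v = packed[word] >> shift;
            if shift + bit_width > 64 {
                v |= packed[word + 1] << (64 - shift);
            }
            // Fused FOR step: add the frame-of-reference value while unpacking.
            ((v & mask) as u32).wrapping_add(reference)
        })
        .collect()
}
```

For example, the values 1003, 1000, 1007, 1005, 1001 can be stored as the 3-bit deltas 3, 0, 7, 5, 1 with reference 1000, and a single call to `unpack_ffor` recovers the originals without a second pass over the data.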

GPU tracing tool

There's a new binary in vortex-test-e2e-cuda-scan which takes as input a Vortex file.

It recompresses the file using only GPU-supported encodings, scans it back, and collects timings for how long each column scan took. The results are printed either as pretty text or as JSON to stdout, which can be piped into DuckDB or similar for analysis.

Example usage:

FLAT_LAYOUT_INLINE_ARRAY_NODE=true RUST_LOG=vortex_cuda=trace,info cargo run --release --bin vortex-test-e2e-cuda-scan -- ./vortex-bench/data/tpch/1.0/vortex-file-compressed/lineitem_0.vortex

@a10y a10y force-pushed the aduffy/gpu-scan-measure branch from 0249a54 to a4c923c Compare February 9, 2026 22:51
@a10y a10y marked this pull request as ready for review February 9, 2026 22:51
@a10y a10y added the changelog/chore (A trivial change) and changelog/skip (Do not list PR in the changelog) labels and removed the changelog/chore (A trivial change) label Feb 9, 2026
@a10y a10y requested a review from joseph-isaacs February 9, 2026 22:51
@a10y a10y force-pushed the aduffy/gpu-scan-measure branch 5 times, most recently from e52bb67 to 21537a6 Compare February 10, 2026 20:08
{
    let reference = <T as From<u8>>::from(REFERENCE_VALUE);
    let data: Vec<T> = (0..len)
        .map(|i| <T as From<u8>>::from((i % 256) as u8) + reference)

a10y (Contributor, Author) commented:

this was overflowing before?
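
One way an addition of this shape can overflow, sketched under assumptions: if `T` is instantiated as `u8`, then `(i % 256) as u8 + reference` exceeds `u8::MAX` for the largest indices whenever the reference is non-zero. The `REFERENCE_VALUE` of 10 below is hypothetical (the PR's actual constant is not shown in this hunk), and `first_overflow` and `widened` are illustrative helpers, not code from the PR.

```rust
/// Hypothetical value; the PR's actual REFERENCE_VALUE is not shown in this hunk.
const REFERENCE_VALUE: u8 = 10;

/// Returns the first index at which `(i % 256) as u8 + REFERENCE_VALUE`
/// no longer fits in a `u8`, or `None` if every sum fits.
pub fn first_overflow(len: usize) -> Option<usize> {
    (0..len).find(|&i| ((i % 256) as u8).checked_add(REFERENCE_VALUE).is_none())
}

/// The same data widened to `u16`, where the sum always fits.
pub fn widened(len: usize) -> Vec<u16> {
    (0..len)
        .map(|i| u16::from((i % 256) as u8) + u16::from(REFERENCE_VALUE))
        .collect()
}
```

Under these assumptions the first overflowing index is 246, since 246 + 10 = 256. In a debug build the plain `+` would panic at that point, and in a release build it would wrap silently.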

@a10y a10y force-pushed the aduffy/gpu-scan-measure branch 3 times, most recently from 052da59 to 249c24c Compare February 12, 2026 14:33
let mut total_time = Duration::ZERO;
let mut cuda_ctx = CudaSession::create_execution_ctx(&VortexSession::empty())
    .vortex_expect("failed to create execution context")
    .with_launch_strategy(Arc::new(timed));

a10y (Contributor, Author) commented:

see here: instead of replicating the full launch setup in benchmark code, we can just stub in a launcher that collects timing information across runs

}};
/// Implementations can add tracing, async callbacks, or other behavior
/// around kernel launches.
pub trait LaunchStrategy: Debug + Send + Sync + 'static {

a10y (Contributor, Author) commented:

this is where LaunchStrategy is defined and implemented

Signed-off-by: Andrew Duffy <[email protected]>

fixup

Signed-off-by: Andrew Duffy <[email protected]>
@a10y a10y force-pushed the aduffy/gpu-scan-measure branch from 780efdb to 7b61bd6 Compare February 12, 2026 20:05
Labels

changelog/skip Do not list PR in the changelog
