a10y
commented
Feb 11, 2026
    {
        let reference = <T as From<u8>>::from(REFERENCE_VALUE);
        let data: Vec<T> = (0..len)
            .map(|i| <T as From<u8>>::from((i % 256) as u8) + reference)
this was overflowing before?
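For context on the question, here is a minimal sketch of how an expression of this shape can overflow. It assumes T is u8 and uses a hypothetical reference of 10; the actual REFERENCE_VALUE and the element types used in the benchmark are not shown in this excerpt. Plain `+` on u8 panics in debug builds and wraps silently in release builds once the sum exceeds 255, so `checked_add` is used here to make the first failing index visible.

```rust
/// Hypothetical stand-in for the benchmark's REFERENCE_VALUE (the real value is not shown here).
pub const ASSUMED_REFERENCE: u8 = 10;

/// Returns the first index in `0..len` at which `(i % 256) as u8 + reference`
/// would overflow a u8, or `None` if no index overflows.
pub fn first_overflow_index(len: usize, reference: u8) -> Option<usize> {
    (0..len).find(|&i| ((i % 256) as u8).checked_add(reference).is_none())
}

/// Same data generation with explicit wrapping, so the behaviour is identical
/// in debug and release builds.
pub fn wrapping_data(len: usize, reference: u8) -> Vec<u8> {
    (0..len)
        .map(|i| ((i % 256) as u8).wrapping_add(reference))
        .collect()
}
```

Under these assumptions any non-zero reference overflows for u8 once `len` is large enough, while wider types such as u32 have enough headroom for the same expression.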
a10y
commented
Feb 12, 2026
    let mut total_time = Duration::ZERO;
    let mut cuda_ctx = CudaSession::create_execution_ctx(&VortexSession::empty())
        .vortex_expect("failed to create execution context")
        .with_launch_strategy(Arc::new(timed));
see here: instead of replicating the full launch setup in benchmark code, we can just stub in a launcher that collects timing information across runs
a10y
commented
Feb 12, 2026
    }};
    /// Implementations can add tracing, async callbacks, or other behavior
    /// around kernel launches.
    pub trait LaunchStrategy: Debug + Send + Sync + 'static {
this is where LaunchStrategy is defined and implemented
Signed-off-by: Andrew Duffy <[email protected]> fixup Signed-off-by: Andrew Duffy <[email protected]>
Overview of changes
Ergonomics/API-focused changes
* LaunchStrategy on the execution context. By default this launches kernels without tracking any timing information, but it is pluggable. For example, in benchmarks we replace it with a TimedLaunchedStrategy, which executes the kernels in blocking mode and logs their execution time.
* ctx.launch_kernel() method, which accepts a closure that is used to populate kernel arguments.

A lot of test and benchmark code needed to be updated to use the new launch methods.
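
The pluggable launch strategy and the closure-based launch_kernel described above can be sketched roughly as follows. This is an illustration, not the code in this PR. Only the trait name and its bounds (Debug + Send + Sync + 'static) and the with_launch_strategy builder come from the diff. The `launch` method signature, the `PlainLaunch`, `TimedLaunch`, and `ExecutionCtx` types, and the `Vec<u64>` argument list are hypothetical stand-ins, and an empty closure takes the place of the actual CUDA launch.

```rust
use std::fmt::Debug;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Same bounds as the trait in this PR; the `launch` method is a hypothetical signature.
pub trait LaunchStrategy: Debug + Send + Sync + 'static {
    /// Runs one kernel launch. `run` stands in for the real CUDA launch.
    fn launch(&self, kernel_name: &str, run: &mut dyn FnMut());
}

/// Default behaviour (hypothetical type): launch and record nothing.
#[derive(Debug, Default)]
pub struct PlainLaunch;

impl LaunchStrategy for PlainLaunch {
    fn launch(&self, _kernel_name: &str, run: &mut dyn FnMut()) {
        run();
    }
}

/// Benchmark behaviour (hypothetical type): run to completion and record elapsed time.
#[derive(Debug, Default)]
pub struct TimedLaunch {
    timings: Mutex<Vec<(String, Duration)>>,
}

impl TimedLaunch {
    /// Number of launches recorded so far.
    pub fn launches(&self) -> usize {
        self.timings.lock().unwrap().len()
    }

    /// Sum of the recorded launch durations.
    pub fn total_time(&self) -> Duration {
        self.timings.lock().unwrap().iter().map(|(_, d)| *d).sum()
    }
}

impl LaunchStrategy for TimedLaunch {
    fn launch(&self, kernel_name: &str, run: &mut dyn FnMut()) {
        let start = Instant::now();
        run();
        let elapsed = start.elapsed();
        self.timings
            .lock()
            .unwrap()
            .push((kernel_name.to_string(), elapsed));
    }
}

/// Hypothetical stand-in for the execution context.
#[derive(Debug)]
pub struct ExecutionCtx {
    strategy: Arc<dyn LaunchStrategy>,
}

impl ExecutionCtx {
    /// Starts with the untimed default strategy.
    pub fn new() -> Self {
        Self { strategy: Arc::new(PlainLaunch) }
    }

    /// Swaps in a different launch strategy.
    pub fn with_launch_strategy(mut self, strategy: Arc<dyn LaunchStrategy>) -> Self {
        self.strategy = strategy;
        self
    }

    /// The closure populates the kernel arguments. They are returned here only
    /// so that the sketch has an observable result.
    pub fn launch_kernel(
        &self,
        kernel_name: &str,
        build_args: impl FnOnce(&mut Vec<u64>),
    ) -> Vec<u64> {
        let mut args = Vec::new();
        build_args(&mut args);
        self.strategy.launch(kernel_name, &mut || {
            // A real implementation would pass `args` to the CUDA driver here.
        });
        args
    }
}
```

In this shape a benchmark keeps its own `Arc<TimedLaunch>`, passes a clone to `with_launch_strategy`, runs the workload, and then reads `total_time()`, so the launch setup itself is not duplicated in benchmark code.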
Fused FOR + BP
This has been shelved for a follow-up since it was too big.
* I've updated the BP kernel generator to generate BP as FFOR, i.e. fused bitpacking with FOR. In practice, this just adds a const T reference param. By default the execution for BitPackedArray passes zero, but there is a specialization in the ForArray execution tree: if it detects that one of its descendants is BP, it fuses itself with the bit unpacking.
GPU tracing tool
There's a new binary in vortex-test-e2e-cuda-scan which takes a Vortex file as input. It recompresses the file using only GPU-supported encodings, scans it back, and collects timings for how long each column scan took. The results are printed either as pretty text or as JSON to stdout, which can be piped into DuckDB or similar for analysis.
Example usage: