cuda.bindings.nvml: Big wave of handwritten API #1524

mdboom · 2026-01-22T17:40:47Z

This contains changes to ~60 APIs in NVML that needed to have their outside APIs changed to be useful and/or memory safe from Python. This is an important part of making cuda.bindings._nvml public, after which point it will be much harder to change the API.

@leofang mentioned elsewhere that in many cases, adding C shims to convert between what cybind handles and what NVML expects may be a more sustainable solution than hand-writing Cython as here. I think it makes sense to do that as a follow-up, since there is time pressure to name cuda.bindings.nvml public, and this gets us to the point where these APIs are updated to where the APIs wouldn't have to change in future, only the implementation details.

These changes fit into the following general categories:

Functions where the input or output argument is a struct containing only an API version and a single element, or sometimes even just a struct with a single element. For speed-of-light and usability reasons, it is much more efficient to make this look like a single input or output argument, rather than forcing the Python side to pack and unpack a single element into a struct/class with a lot of overhead for no gain. For these also, we can remove the AUTO_LOWPP_CLASS generated class, since it isn't needed on the Python side, reducing binary size. This includes get_device_performance_modes, get_dram_encryption_mode, set_dram_encrypytion_mode, system_get_conf_compute_key_rotation_threshold_info, system_set_conf_compute_key_rotation_threshold_info, device_set_power_management_limit_v2, device_get_pdi, device_set_nvlink_device_low_power_threshold, device_get_vgpu_heterogeneous_mode, device_set_vgpu_heterogeneous_mode, vgpu_instance_get_placement_id, vgpu_instance_get_runtime_state_size, vgpu_type_get_max_instances_per_gpu_instance, gpu_instance_get_vgpu_heterogeneous_mode, gpu_instance_set_vgpu_hetereogeneous_mode, device_get_unrepairable_memory_flag, device_set_rusd_settings_v1.
Structs containing a statically-sized array that also has a size member indicating the number of valid elements in that array. For example:

struct {
    int size
    Struct_t elements[MAX_NUMBER_OF_ELEMENTS]

By default, cybind would allow access to the uninitialized memory in elements[idx >= size]. By declaring the size element as special, recent cybind's can do the right thing. This affects the structs nvmlBridgeChipHierarchy_t, nvmlClkMonStatus_t, nvmlConfComputeGpuAttestationReport_t, nvmlConfComputeGpuCertificate_t, nvmlGridLicenseableFeatures_t, nvmlNvlinkSupportedBwModes_t.

Functions where input and output are in the same struct, but the inputs are few. Again, for speed-of-light, it is much better to let the binding handle allocating the struct and filling it with a small number of passed-in arguments. This includes device_get_vgpu_type_supported_placements.
Functions with structs where the input or output argument is a fixed-length string. This includes device_get_conf_compute_gpu_attestation_report, vgpu_type_get_license, device_read_write_prm_v1
Functions which take a reference to the value of an input argument. For example, the input is an int, but the API takes an int *. This is very rare (and unnecessary) for input arguments and is not supported by cybind. Includes device_set_temperature_threshold.
Functions that return a dynamically-sized array inside a struct (as a size and pointer pair). Includes device_get_vgpu_utilization.
Functions that take a versioned struct (a struct with a version number member), but the "base class" version is also a specific version. This needs to be handwritten just because the cast is invalid (though otherwise cybind handles it and generates something that would have worked, had the compiler not complained). Includes device_get_gpu_instance_profile_info_v and device_get_gpu_instance_profile_info_by_id_v
get_handle_by_uuid_v: Input to the function is a struct (a variant type to be exact) containing a union to two differently-sized fixed-sized char arrays (string). This isn't supported by AUTO_LOWPP_CLASS.
A function that takes a struct which is solely a size/pointer pair. Here, we should take as input a AUTO_LOWPP_ARRAY representing the array, not wrapped up in an additional AUTO_LOWPP_CLASS where the member array would be hard/impossible to initialize and manage the memory properly. This includes device_read_prm_counters_v1.
A few functions will be quite hard to support cleanly from Python. They are not on the top of the list to support right now, so I think it best to just SKIP them and come back to them later. These include nvmlGpm* and nvmlDeviceWorkloadPowerProfile*.
Functions were already deprecated in 12.9 and should never have been wrapped. (As a policy on this work, I have not wrapped anything that was deprecated in 12.9 or earlier). Includes device_set_power_management_limit

copy-pr-bot · 2026-01-22T17:40:51Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

mdboom · 2026-01-22T17:41:08Z

/ok to test

github-actions · 2026-01-22T17:52:22Z

Doc Preview CI
🚀 View preview at https://nvidia.github.io/cuda-python/pr-preview/pr-1524/
https://nvidia.github.io/cuda-python/pr-preview/pr-1524/cuda-core/
https://nvidia.github.io/cuda-python/pr-preview/pr-1524/cuda-bindings/
https://nvidia.github.io/cuda-python/pr-preview/pr-1524/cuda-pathfinder/
Preview will be ready when the GitHub Pages deployment is complete.

Copilot

Pull request overview

This PR hand-rewrites approximately 57 NVML APIs for improved memory safety, particularly for dynamically-sized arrays within structs, simplified calling conventions, and support for input/output parameters in the same struct. The changes involve removing auto-generated bindings and replacing them with handwritten implementations. Copyright years are updated to reflect the 2025-2026 timeframe.

Changes:

Removed multiple auto-generated NVML API function bindings (e.g., GPM-related functions, power management, temperature threshold, PRM, vGPU functions)
Added new test files and test cases for PCI, GPU, and device-related NVML functionality
Updated copyright years from 2025 to 2025-2026 across multiple files

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 17 comments.

Show a summary per file

File	Description
cuda_core/tests/system/test_system_device.py	Copyright year update only
cuda_bindings/tests/nvml/test_pci.py	New test file for PCI-related NVML APIs
cuda_bindings/tests/nvml/test_gpu.py	Added tests for GPU attestation reports and certificates
cuda_bindings/tests/nvml/test_device.py	New comprehensive test file for device APIs
cuda_bindings/cuda/bindings/cy_nvml.pyx	Removed function declarations for APIs being rewritten
cuda_bindings/cuda/bindings/cy_nvml.pxd	Removed function signatures for APIs being rewritten
cuda_bindings/cuda/bindings/_nvml.pxd	Removed API declarations and added new typedefs
cuda_bindings/cuda/bindings/_internal/_nvml_windows.pyx	Removed function pointers and implementations for Windows
cuda_bindings/cuda/bindings/_internal/_nvml_linux.pyx	Removed function pointers and implementations for Linux
cuda_bindings/cuda/bindings/_internal/_nvml.pxd	Removed internal function declarations

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

cuda_bindings/tests/nvml/test_device.py

cuda_bindings/tests/nvml/test_gpu.py

cuda_bindings/tests/nvml/test_device.py

mdboom · 2026-01-22T18:04:03Z

@Copilot: You provided some interesting (if mostly incorrect) feedback about the tests. Can you provide feedback about the actual implementation, especially in _nvml.pyx, or do you struggle with Cython?

mdboom · 2026-01-22T18:15:38Z

/ok to test

mdboom · 2026-01-22T18:54:31Z

/ok to test

mdboom · 2026-01-22T20:36:03Z

/ok to test

mdboom · 2026-01-23T14:44:45Z

/ok to test

copy-pr-bot · 2026-01-26T16:52:29Z

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

cuda.bindings.nvml: Big wave of handwritten API

338d5b1

mdboom requested a review from Copilot January 22, 2026 17:46

Copilot started reviewing on behalf of mdboom January 22, 2026 17:46 View session

Copilot AI reviewed Jan 22, 2026

View reviewed changes

Minor changes based on Copilot's comments

b11d828

Fix tests, having run on more hardware

1299944

Further fix tests

6fc9e77

Fix tests on even more hardware

3566c1d

More robust testing for Windows

a75e4c3

Merge remote-tracking branch 'upstream/main' into nvml-more-handwritten

ac156d6

mdboom marked this pull request as ready for review January 26, 2026 16:52

mdboom marked this pull request as draft January 26, 2026 17:01

cuda.bindings.nvml: Big wave of handwritten API #1524

Are you sure you want to change the base?

cuda.bindings.nvml: Big wave of handwritten API #1524

Uh oh!

Conversation

mdboom commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

copy-pr-bot bot commented Jan 22, 2026

Uh oh!

mdboom commented Jan 22, 2026

Uh oh!

github-actions bot commented Jan 22, 2026

Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mdboom commented Jan 22, 2026

Uh oh!

mdboom commented Jan 22, 2026

Uh oh!

mdboom commented Jan 22, 2026

Uh oh!

mdboom commented Jan 22, 2026

Uh oh!

mdboom commented Jan 23, 2026

Uh oh!

copy-pr-bot bot commented Jan 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mdboom commented Jan 22, 2026 •

edited

Loading