Skip to content

Conversation

@mdboom
Copy link
Contributor

@mdboom mdboom commented Jan 22, 2026

This contains changes to ~60 APIs in NVML that needed to have their outside APIs changed to be useful and/or memory safe from Python. This is an important part of making cuda.bindings._nvml public, after which point it will be much harder to change the API.

@leofang mentioned elsewhere that in many cases, adding C shims to convert between what cybind handles and what NVML expects may be a more sustainable solution than hand-writing Cython as here. I think it makes sense to do that as a follow-up, since there is time pressure to name cuda.bindings.nvml public, and this gets us to the point where these APIs are updated to where the APIs wouldn't have to change in future, only the implementation details.

These changes fit into the following general categories:

  • Functions where the input or output argument is a struct containing only an API version and a single element, or sometimes even just a struct with a single element. For speed-of-light and usability reasons, it is much more efficient to make this look like a single input or output argument, rather than forcing the Python side to pack and unpack a single element into a struct/class with a lot of overhead for no gain. For these also, we can remove the AUTO_LOWPP_CLASS generated class, since it isn't needed on the Python side, reducing binary size. This includes get_device_performance_modes, get_dram_encryption_mode, set_dram_encrypytion_mode, system_get_conf_compute_key_rotation_threshold_info, system_set_conf_compute_key_rotation_threshold_info, device_set_power_management_limit_v2, device_get_pdi, device_set_nvlink_device_low_power_threshold, device_get_vgpu_heterogeneous_mode, device_set_vgpu_heterogeneous_mode, vgpu_instance_get_placement_id, vgpu_instance_get_runtime_state_size, vgpu_type_get_max_instances_per_gpu_instance, gpu_instance_get_vgpu_heterogeneous_mode, gpu_instance_set_vgpu_hetereogeneous_mode, device_get_unrepairable_memory_flag, device_set_rusd_settings_v1.

  • Structs containing a statically-sized array that also has a size member indicating the number of valid elements in that array. For example:

struct {
    int size
    Struct_t elements[MAX_NUMBER_OF_ELEMENTS]

By default, cybind would allow access to the uninitialized memory in elements[idx >= size]. By declaring the size element as special, recent cybind's can do the right thing. This affects the structs nvmlBridgeChipHierarchy_t, nvmlClkMonStatus_t, nvmlConfComputeGpuAttestationReport_t, nvmlConfComputeGpuCertificate_t, nvmlGridLicenseableFeatures_t, nvmlNvlinkSupportedBwModes_t.

  • Functions where input and output are in the same struct, but the inputs are few. Again, for speed-of-light, it is much better to let the binding handle allocating the struct and filling it with a small number of passed-in arguments. This includes device_get_vgpu_type_supported_placements.

  • Functions with structs where the input or output argument is a fixed-length string. This includes device_get_conf_compute_gpu_attestation_report, vgpu_type_get_license, device_read_write_prm_v1

  • Functions which take a reference to the value of an input argument. For example, the input is an int, but the API takes an int *. This is very rare (and unnecessary) for input arguments and is not supported by cybind. Includes device_set_temperature_threshold.

  • Functions that return a dynamically-sized array inside a struct (as a size and pointer pair). Includes device_get_vgpu_utilization.

  • Functions that take a versioned struct (a struct with a version number member), but the "base class" version is also a specific version. This needs to be handwritten just because the cast is invalid (though otherwise cybind handles it and generates something that would have worked, had the compiler not complained). Includes device_get_gpu_instance_profile_info_v and device_get_gpu_instance_profile_info_by_id_v

  • get_handle_by_uuid_v: Input to the function is a struct (a variant type to be exact) containing a union to two differently-sized fixed-sized char arrays (string). This isn't supported by AUTO_LOWPP_CLASS.

  • A function that takes a struct which is solely a size/pointer pair. Here, we should take as input a AUTO_LOWPP_ARRAY representing the array, not wrapped up in an additional AUTO_LOWPP_CLASS where the member array would be hard/impossible to initialize and manage the memory properly. This includes device_read_prm_counters_v1.

  • A few functions will be quite hard to support cleanly from Python. They are not on the top of the list to support right now, so I think it best to just SKIP them and come back to them later. These include nvmlGpm* and nvmlDeviceWorkloadPowerProfile*.

  • Functions were already deprecated in 12.9 and should never have been wrapped. (As a policy on this work, I have not wrapped anything that was deprecated in 12.9 or earlier). Includes device_set_power_management_limit

@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Jan 22, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@mdboom
Copy link
Contributor Author

mdboom commented Jan 22, 2026

/ok to test

@github-actions
Copy link

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hand-rewrites approximately 57 NVML APIs for improved memory safety, particularly for dynamically-sized arrays within structs, simplified calling conventions, and support for input/output parameters in the same struct. The changes involve removing auto-generated bindings and replacing them with handwritten implementations. Copyright years are updated to reflect the 2025-2026 timeframe.

Changes:

  • Removed multiple auto-generated NVML API function bindings (e.g., GPM-related functions, power management, temperature threshold, PRM, vGPU functions)
  • Added new test files and test cases for PCI, GPU, and device-related NVML functionality
  • Updated copyright years from 2025 to 2025-2026 across multiple files

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 17 comments.

Show a summary per file
File Description
cuda_core/tests/system/test_system_device.py Copyright year update only
cuda_bindings/tests/nvml/test_pci.py New test file for PCI-related NVML APIs
cuda_bindings/tests/nvml/test_gpu.py Added tests for GPU attestation reports and certificates
cuda_bindings/tests/nvml/test_device.py New comprehensive test file for device APIs
cuda_bindings/cuda/bindings/cy_nvml.pyx Removed function declarations for APIs being rewritten
cuda_bindings/cuda/bindings/cy_nvml.pxd Removed function signatures for APIs being rewritten
cuda_bindings/cuda/bindings/_nvml.pxd Removed API declarations and added new typedefs
cuda_bindings/cuda/bindings/_internal/_nvml_windows.pyx Removed function pointers and implementations for Windows
cuda_bindings/cuda/bindings/_internal/_nvml_linux.pyx Removed function pointers and implementations for Linux
cuda_bindings/cuda/bindings/_internal/_nvml.pxd Removed internal function declarations

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mdboom
Copy link
Contributor Author

mdboom commented Jan 22, 2026

@Copilot: You provided some interesting (if mostly incorrect) feedback about the tests. Can you provide feedback about the actual implementation, especially in _nvml.pyx, or do you struggle with Cython?

@mdboom
Copy link
Contributor Author

mdboom commented Jan 22, 2026

/ok to test

@mdboom
Copy link
Contributor Author

mdboom commented Jan 22, 2026

/ok to test

@mdboom
Copy link
Contributor Author

mdboom commented Jan 22, 2026

/ok to test

@mdboom
Copy link
Contributor Author

mdboom commented Jan 23, 2026

/ok to test

@mdboom mdboom marked this pull request as ready for review January 26, 2026 16:52
@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Jan 26, 2026

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@mdboom mdboom marked this pull request as draft January 26, 2026 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant