-
Notifications
You must be signed in to change notification settings - Fork 242
cuda.bindings.nvml: Big wave of handwritten API #1524
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR hand-rewrites approximately 57 NVML APIs for improved memory safety, particularly for dynamically-sized arrays within structs, simplified calling conventions, and support for input/output parameters in the same struct. The changes involve removing auto-generated bindings and replacing them with handwritten implementations. Copyright years are updated to reflect the 2025-2026 timeframe.
Changes:
- Removed multiple auto-generated NVML API function bindings (e.g., GPM-related functions, power management, temperature threshold, PRM, vGPU functions)
- Added new test files and test cases for PCI, GPU, and device-related NVML functionality
- Updated copyright years from 2025 to 2025-2026 across multiple files
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 17 comments.
Show a summary per file
| File | Description |
|---|---|
| cuda_core/tests/system/test_system_device.py | Copyright year update only |
| cuda_bindings/tests/nvml/test_pci.py | New test file for PCI-related NVML APIs |
| cuda_bindings/tests/nvml/test_gpu.py | Added tests for GPU attestation reports and certificates |
| cuda_bindings/tests/nvml/test_device.py | New comprehensive test file for device APIs |
| cuda_bindings/cuda/bindings/cy_nvml.pyx | Removed function declarations for APIs being rewritten |
| cuda_bindings/cuda/bindings/cy_nvml.pxd | Removed function signatures for APIs being rewritten |
| cuda_bindings/cuda/bindings/_nvml.pxd | Removed API declarations and added new typedefs |
| cuda_bindings/cuda/bindings/_internal/_nvml_windows.pyx | Removed function pointers and implementations for Windows |
| cuda_bindings/cuda/bindings/_internal/_nvml_linux.pyx | Removed function pointers and implementations for Linux |
| cuda_bindings/cuda/bindings/_internal/_nvml.pxd | Removed internal function declarations |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@Copilot: You provided some interesting (if mostly incorrect) feedback about the tests. Can you provide feedback about the actual implementation, especially in |
|
/ok to test |
|
/ok to test |
|
/ok to test |
|
/ok to test |
|
Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
This contains changes to ~60 APIs in NVML that needed to have their outside APIs changed to be useful and/or memory safe from Python. This is an important part of making
cuda.bindings._nvmlpublic, after which point it will be much harder to change the API.@leofang mentioned elsewhere that in many cases, adding C shims to convert between what cybind handles and what NVML expects may be a more sustainable solution than hand-writing Cython as here. I think it makes sense to do that as a follow-up, since there is time pressure to name
cuda.bindings.nvmlpublic, and this gets us to the point where these APIs are updated to where the APIs wouldn't have to change in future, only the implementation details.These changes fit into the following general categories:
Functions where the input or output argument is a struct containing only an API version and a single element, or sometimes even just a struct with a single element. For speed-of-light and usability reasons, it is much more efficient to make this look like a single input or output argument, rather than forcing the Python side to pack and unpack a single element into a struct/class with a lot of overhead for no gain. For these also, we can remove the
AUTO_LOWPP_CLASSgenerated class, since it isn't needed on the Python side, reducing binary size. This includesget_device_performance_modes,get_dram_encryption_mode,set_dram_encrypytion_mode,system_get_conf_compute_key_rotation_threshold_info,system_set_conf_compute_key_rotation_threshold_info,device_set_power_management_limit_v2,device_get_pdi,device_set_nvlink_device_low_power_threshold,device_get_vgpu_heterogeneous_mode,device_set_vgpu_heterogeneous_mode,vgpu_instance_get_placement_id,vgpu_instance_get_runtime_state_size,vgpu_type_get_max_instances_per_gpu_instance,gpu_instance_get_vgpu_heterogeneous_mode,gpu_instance_set_vgpu_hetereogeneous_mode,device_get_unrepairable_memory_flag,device_set_rusd_settings_v1.Structs containing a statically-sized array that also has a size member indicating the number of valid elements in that array. For example:
By default, cybind would allow access to the uninitialized memory in
elements[idx >= size]. By declaring the size element as special, recent cybind's can do the right thing. This affects the structsnvmlBridgeChipHierarchy_t,nvmlClkMonStatus_t,nvmlConfComputeGpuAttestationReport_t,nvmlConfComputeGpuCertificate_t,nvmlGridLicenseableFeatures_t,nvmlNvlinkSupportedBwModes_t.Functions where input and output are in the same struct, but the inputs are few. Again, for speed-of-light, it is much better to let the binding handle allocating the struct and filling it with a small number of passed-in arguments. This includes
device_get_vgpu_type_supported_placements.Functions with structs where the input or output argument is a fixed-length string. This includes
device_get_conf_compute_gpu_attestation_report,vgpu_type_get_license,device_read_write_prm_v1Functions which take a reference to the value of an input argument. For example, the input is an
int, but the API takes anint *. This is very rare (and unnecessary) for input arguments and is not supported by cybind. Includesdevice_set_temperature_threshold.Functions that return a dynamically-sized array inside a struct (as a size and pointer pair). Includes
device_get_vgpu_utilization.Functions that take a versioned struct (a struct with a version number member), but the "base class" version is also a specific version. This needs to be handwritten just because the cast is invalid (though otherwise cybind handles it and generates something that would have worked, had the compiler not complained). Includes
device_get_gpu_instance_profile_info_vanddevice_get_gpu_instance_profile_info_by_id_vget_handle_by_uuid_v: Input to the function is a struct (a variant type to be exact) containing a union to two differently-sized fixed-sized char arrays (string). This isn't supported byAUTO_LOWPP_CLASS.A function that takes a struct which is solely a size/pointer pair. Here, we should take as input a
AUTO_LOWPP_ARRAYrepresenting the array, not wrapped up in an additionalAUTO_LOWPP_CLASSwhere the member array would be hard/impossible to initialize and manage the memory properly. This includesdevice_read_prm_counters_v1.A few functions will be quite hard to support cleanly from Python. They are not on the top of the list to support right now, so I think it best to just
SKIPthem and come back to them later. These includenvmlGpm*andnvmlDeviceWorkloadPowerProfile*.Functions were already deprecated in 12.9 and should never have been wrapped. (As a policy on this work, I have not wrapped anything that was deprecated in 12.9 or earlier). Includes
device_set_power_management_limit