virtUOS/nvidia_driver_cuda


Ansible Role: nvidia_driver_cuda

Installs NVIDIA GPU drivers from DNF module streams on RHEL 9 systems, building the kernel module with DKMS. The install is deliberately minimal: it pulls in only the nvidia-driver-cuda package, not the extras that come with the full nvidia-driver module. To upgrade or downgrade the driver, change nvidia_module_stream; this is sufficient when you only need a particular stream rather than an exact driver version.

What it does

  1. Upgrades all system packages and reboots if needed (optional)
  2. Installs kernel headers, DKMS, and other prerequisites
  3. Blacklists the nouveau driver and rebuilds initramfs
  4. Adds the NVIDIA CUDA repository, installs the driver, then disables the repo
  5. Reboots to load the NVIDIA kernel module
  6. Verifies the driver is loaded via nvidia-smi
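Steps 2 to 4 could be sketched roughly as the play below. This is an illustration under stated assumptions, not the role's actual task file: the prerequisite package list, the blacklist file path, the repository id cuda-rhel9, and the use of dnf module enable are all guesses at one plausible implementation. Depending on the host, dkms may need an additional repository to be available.

```yaml
# Illustrative sketch only; the role's real tasks may differ.
- name: Sketch of the driver install flow (steps 2 to 4)
  hosts: gpu_nodes
  become: true
  vars:
    nvidia_module_stream: "590-dkms"
    nvidia_cuda_repo_url: "http://developer.download.nvidia.com/compute/cuda/repos/rhel9/{{ ansible_facts['architecture'] }}"
    nvidia_cuda_repo_gpg_key: "https://developer.download.nvidia.com/compute/cuda/repos/rhel9/{{ ansible_facts['architecture'] }}/D42D0685.pub"
  tasks:
    - name: Install prerequisites (package list is an assumption)
      ansible.builtin.dnf:
        name:
          - kernel-devel
          - kernel-headers
          - dkms
        state: present

    - name: Blacklist nouveau (file path is an assumption)
      ansible.builtin.copy:
        dest: /etc/modprobe.d/blacklist-nouveau.conf
        content: |
          blacklist nouveau
          options nouveau modeset=0
        mode: "0644"
      register: nouveau_blacklist

    - name: Rebuild initramfs only when the blacklist changed
      ansible.builtin.command:
        cmd: dracut --force
      when: nouveau_blacklist is changed
      changed_when: true

    - name: Add the NVIDIA CUDA repository (repo id is hypothetical)
      ansible.builtin.yum_repository:
        name: cuda-rhel9
        description: NVIDIA CUDA repository
        baseurl: "{{ nvidia_cuda_repo_url }}"
        gpgkey: "{{ nvidia_cuda_repo_gpg_key }}"
        gpgcheck: true
        enabled: true

    - name: Enable the requested module stream (assumed mechanism)
      ansible.builtin.command:
        cmd: "dnf -y module enable nvidia-driver:{{ nvidia_module_stream }}"
      changed_when: true

    - name: Install the minimal driver package
      ansible.builtin.dnf:
        name: nvidia-driver-cuda
        state: present

    - name: Disable the repository again
      ansible.builtin.yum_repository:
        name: cuda-rhel9
        description: NVIDIA CUDA repository
        baseurl: "{{ nvidia_cuda_repo_url }}"
        gpgkey: "{{ nvidia_cuda_repo_gpg_key }}"
        gpgcheck: true
        enabled: false
```

Leaving the repository disabled after the install means a later system-wide dnf upgrade should not pull in a newer driver unintentionally.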

Requirements

  • RHEL 9 (or compatible: Rocky, Alma, CentOS)
  • A host with an NVIDIA GPU
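If you want to check these requirements before applying the role, a pre-flight play along the following lines may help. It is not part of the role; it assumes that lspci (from pciutils) is installed on the target and that the GPU appears with the string NVIDIA in its output.

```yaml
# Hypothetical pre-flight check; not shipped with the role.
- name: Pre-flight check sketch
  hosts: gpu_nodes
  become: true
  tasks:
    - name: Check for an EL9-family distribution
      ansible.builtin.assert:
        that:
          - ansible_facts['os_family'] == 'RedHat'
          - ansible_facts['distribution_major_version'] == '9'
        fail_msg: "This role targets RHEL 9 or a compatible distribution."

    - name: List PCI devices (assumes pciutils is installed)
      ansible.builtin.command:
        cmd: lspci
      register: pci_devices
      changed_when: false

    - name: Check that an NVIDIA device is present
      ansible.builtin.assert:
        that:
          - "'NVIDIA' in pci_devices.stdout"
        fail_msg: "No NVIDIA device found in lspci output."
```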

Role Variables

| Variable | Default | Description |
| --- | --- | --- |
| nvidia_module_stream | "590-dkms" | DNF module stream for nvidia-driver |
| nvidia_driver_version | "" | Pin a specific driver version. Empty = latest from stream |
| nvidia_cuda_repo_url | "http://developer.download.nvidia.com/compute/cuda/repos/rhel9/{{ ansible_facts['architecture'] }}" | NVIDIA CUDA repository URL |
| nvidia_cuda_repo_gpg_key | https://developer.download.nvidia.com/compute/cuda/repos/rhel9/{{ ansible_facts['architecture'] }}/D42D0685.pub | GPG key for the CUDA repository |
| nvidia_system_upgrade | true | Whether to run a full dnf upgrade before installing (will cause a reboot) |
| nvidia_reboot_timeout | 300 | Seconds to wait for host to come back after reboot |
| nvidia_post_reboot_delay | 30 | Seconds to wait after host is reachable before continuing |
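Instead of setting variables inline in a play, the defaults can be overridden per inventory group. The file name below is hypothetical (it matches the gpu_nodes group used in the Usage examples), and the timeout values are purely illustrative, for example for hosts that take a long time to come back after a reboot.

```yaml
# group_vars/gpu_nodes.yml (hypothetical file name)
nvidia_module_stream: "590-dkms"
nvidia_driver_version: ""        # empty: take the latest version from the stream
nvidia_system_upgrade: false     # skip the full dnf upgrade
nvidia_reboot_timeout: 600       # illustrative value, default is 300
nvidia_post_reboot_delay: 60     # illustrative value, default is 30
```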

Usage

Basic

- hosts: gpu_nodes
  become: true
  roles:
    - nvidia_driver_cuda

Pinned driver version

- hosts: gpu_nodes
  become: true
  roles:
    - role: nvidia_driver_cuda
      nvidia_module_stream: "550-dkms"
      nvidia_driver_version: "550.127.05"

Skip system upgrade

- hosts: gpu_nodes
  become: true
  roles:
    - role: nvidia_driver_cuda
      nvidia_system_upgrade: false

Handlers

  • Reboot to apply nvidia driver changes: triggered when nouveau is blacklisted or the driver is installed or updated

Handlers are flushed before the verification step, so nvidia-smi always runs against the freshly loaded driver.
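The flush-then-verify pattern could look roughly like this. Only the handler name is taken from this README; the task layout and the mapping of nvidia_reboot_timeout and nvidia_post_reboot_delay onto the reboot module's parameters are assumptions.

```yaml
# Illustrative sketch only; the role's real tasks and handlers may differ.
- name: Sketch of handler flushing before verification
  hosts: gpu_nodes
  become: true
  vars:
    nvidia_reboot_timeout: 300
    nvidia_post_reboot_delay: 30
  tasks:
    - name: Install the driver package
      ansible.builtin.dnf:
        name: nvidia-driver-cuda
        state: present
      notify: Reboot to apply nvidia driver changes

    # Without this, the handler would only run at the end of the play,
    # after the verification task below.
    - name: Run pending handlers now
      ansible.builtin.meta: flush_handlers

    - name: Verify the driver is loaded
      ansible.builtin.command:
        cmd: nvidia-smi
      changed_when: false

  handlers:
    - name: Reboot to apply nvidia driver changes
      ansible.builtin.reboot:
        reboot_timeout: "{{ nvidia_reboot_timeout }}"
        post_reboot_delay: "{{ nvidia_post_reboot_delay }}"
```

If nothing changed on a given run, the handler is not notified, no reboot happens, and nvidia-smi simply runs against the driver that is already loaded.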
