Summary
When using:

- ghcr.io/slinkyproject/slurmd-pyxis:25.11-ubuntu24.04
- Slurm 25.11.4
- gres.conf with AutoDetect=nvml

the worker nodes report 0 GPUs to Slurm and are drained as invalid, even though the container can see all GPUs and NVML is present.
This looks like the Slurm build inside the image was compiled without NVML support (HAVE_NVML), so AutoDetect=nvml cannot work.
Environment
- Image: ghcr.io/slinkyproject/slurmd-pyxis:25.11-ubuntu24.04
- Slurm version: slurmd -V → slurm 25.11.4; scontrol -V → slurm 25.11.4
- Arch: aarch64
- OS in worker: Ubuntu 24.04-based image
- GPUs: 4 NVIDIA GPUs visible in the worker container
Config
Relevant config:
configFiles:
  gres.conf: |
    AutoDetect=nvml
nodesets:
  slinky:
    slurmd:
      resources:
        limits:
          nvidia.com/gpu: 4
    extraConfMap:
      Gres: "gpu:4"
Observed behavior
Slurm drains the nodes as invalid.
sinfo -R:
REASON USER TIMESTAMP NODELIST
gres/gpu count repor slurm 2026-04-14T06:39:53 slinky-0
gres/gpu count repor slurm 2026-04-14T06:39:54 slinky-1
scontrol show node slinky-0:
Gres=gpu:4
State=IDLE+DRAIN+DYNAMIC_NORM+INVALID_REG
CfgTRES=cpu=80,mem=204420M,billing=80
Reason=gres/gpu count reported lower than configured (0 < 4)
Worker log:
[2026-04-14T06:39:53] We were configured to autodetect nvml functionality, but we weren't able to find that lib when Slurm was configured.
[2026-04-14T06:39:53] warning: Ignoring file-less GPU gpu:(null) from final GRES list
What works inside the worker container
The worker container can see the GPUs:
nvidia-smi -L
shows 4 GPUs.
NVML libraries are present:
find /usr /lib -name 'libnvidia-ml.so*' | sort
output:
/usr/lib/aarch64-linux-gnu/libnvidia-ml.so
/usr/lib/aarch64-linux-gnu/libnvidia-ml.so.1
/usr/lib/aarch64-linux-gnu/libnvidia-ml.so.590.48.01
ldconfig -p | grep -i nvidia-ml:
libnvidia-ml.so.1 (libc6,AArch64) => /usr/lib/aarch64-linux-gnu/libnvidia-ml.so.1
libnvidia-ml.so (libc6,AArch64) => /usr/lib/aarch64-linux-gnu/libnvidia-ml.so
Symlinks also exist correctly under both /usr/lib/aarch64-linux-gnu and /lib/aarch64-linux-gnu.
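To rule out a dynamic-loader problem on top of the ldconfig check, the library can also be dlopen'ed directly, the same way a plugin would load it. A minimal probe sketch, assuming python3 is available in the image:

```shell
# Try to dlopen the NVML soname exactly as a plugin would; a loadable
# library prints "loadable", otherwise the loader error is shown.
python3 - <<'EOF'
import ctypes
try:
    ctypes.CDLL("libnvidia-ml.so.1")
    print("libnvidia-ml.so.1: loadable")
except OSError as exc:
    print("libnvidia-ml.so.1: not loadable:", exc)
EOF
```

On the worker above this prints "loadable", which supports the conclusion that the failure is not a runtime linking issue.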
Installed packages include:
ii nvslurm-plugin-pyxis 0.23.0-1
ii slurm-smd 25.11.4-1
ii slurm-smd-slurmd 25.11.4-1
Why I think this is a build/package issue
This does not look like a runtime NVML visibility problem, because:

- nvidia-smi works in the worker container
- libnvidia-ml.so is present
- libnvidia-ml.so is in the linker cache
The exact log line:
We were configured to autodetect nvml functionality, but we weren't able to find that lib when Slurm was configured.
suggests Slurm itself was built without NVML support, so AutoDetect=nvml cannot work regardless of runtime library presence.
Also, strings /usr/sbin/slurmd | grep -i nvml returns nothing.
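A more direct check than strings is to look for the NVML GPU plugin that an NVML-enabled build ships. A sketch; the plugin directory paths below are assumptions and vary by package:

```shell
# If Slurm was configured with NVML, the build produces a gpu_nvml plugin
# shared object in the Slurm plugin directory. The paths here are guesses;
# `scontrol show config | grep -i plugindir` gives the authoritative one.
found=no
for dir in /usr/lib/slurm /usr/lib/*/slurm /usr/lib64/slurm; do
  if [ -e "$dir/gpu_nvml.so" ]; then
    found=yes
    echo "gpu_nvml.so present in $dir"
  fi
done
echo "gpu_nvml plugin: $found"
```

If this reports "no" in the image, it corroborates that the packages were built without NVML support.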
Expected behavior
With:

  gres.conf: |
    AutoDetect=nvml

and Gres=gpu:4 configured on the nodeset, I would expect slurmd to detect the 4 GPUs via NVML and register the node with:

  Gres=gpu:4
  CfgTRES=...gres/gpu=4

without draining the node.
Actual behavior
slurmd reports 0 GPUs to Slurm, and the node is drained with:
Reason=gres/gpu count reported lower than configured (0 < 4)
Question
Can you confirm whether ghcr.io/slinkyproject/slurmd-pyxis:25.11-ubuntu24.04 / the underlying slurm-smd packages were built without NVML support?
If yes, would it be possible to publish an image/package variant with NVML-enabled Slurm so AutoDetect=nvml works?
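As a stopgap while an NVML-enabled build is unavailable, GRES could be enumerated manually instead of autodetected. A gres.conf sketch; the /dev/nvidia* device paths are assumptions and should be verified with ls /dev/nvidia* in the worker:

```
# Sketch: disable autodetection and list GPU device files explicitly
# (device paths assumed; verify in the worker container first).
AutoDetect=off
Name=gpu File=/dev/nvidia[0-3]
```

This loses the topology information (Cores/Links) that NVML autodetection would provide, but should let the nodes register with gpu:4 instead of draining.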