For example, `--threads $task.cpus`.

If the tool does not support multi-threading, consider `process_single` unless large amounts of RAM are required.

## GPU acceleration

Modules that support GPU acceleration SHOULD use `task.accelerator{:groovy}` to detect whether a GPU has been requested.
Pipelines control GPU allocation by setting `accelerator = 1{:groovy}` in their process config (e.g., via a `process_gpu` label or a `withName` block).

The module SHOULD NOT set the `accelerator` directive itself.

:::info{title="Rationale" collapse}
Placing GPU allocation in the pipeline config lets users control it through their pipeline config or profiles.
A label-only alternative (e.g., requiring a `process_gpu` label) would not work for modules that support both CPU and GPU modes (e.g., [`ribodetector`](https://github.com/nf-core/modules/tree/master/modules/nf-core/ribodetector)), so the specification leaves this to the pipeline author.
:::

See [Software requirements: GPU-capable modules](/docs/specifications/components/modules/software-requirements#gpu-capable-modules) for container patterns based on `task.accelerator`.

:::tip{title="Pipeline-side GPU configuration"}
Pipelines set `accelerator = 1{:groovy}` and GPU container flags via `containerOptions` in their process config.
Use `containerOptions` (not global `docker.runOptions`) to scope GPU flags to GPU processes only.
:::
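
A sketch of what this can look like in a pipeline config (the `process_gpu` label and the exact engine flags are assumptions; adjust for your setup):

```groovy
// Pipeline-side config: request one GPU and pass GPU flags to the container engine.
// The process_gpu label is illustrative; a withName selector works equally well.
process {
    withLabel: 'process_gpu' {
        accelerator      = 1
        // '--nv' enables NVIDIA support under Singularity/Apptainer,
        // '--gpus all' does the same under Docker.
        containerOptions = { workflow.containerEngine == 'singularity' ? '--nv' : '--gpus all' }
    }
}
```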

:::caution{title="GPU concurrency under Singularity"}
Multiple concurrent GPU processes sharing a single GPU can deadlock under Singularity.
Docker's NVIDIA runtime handles GPU memory arbitration, but Singularity does not.
When GPU tasks may land on the same machine (CI, local executor, shared HPC nodes), set `maxForks = 1` on GPU processes to serialise access.
:::
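
Where that risk applies, the same pipeline config can serialise GPU tasks (again assuming a `process_gpu` label):

```groovy
process {
    withLabel: 'process_gpu' {
        maxForks = 1 // run GPU tasks one at a time on shared-GPU setups
    }
}
```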

## Specifying multiple threads for piped commands

If a module contains _multiple_ tools that support multi-threading (e.g., [piping output into a samtools command](https://github.com/nf-core/modules/blob/c4cc1db284faba9fc4896f64bddf7703cedc7430/modules/nf-core/bowtie2/align/main.nf#L47-L54)), assign CPUs per tool:
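
One hedged sketch of such a split (the division shown is an assumption for illustration, not the linked module's exact allocation):

```groovy
script:
// Reserve one CPU for the sort step and give the aligner the rest.
def aligner_cpus = task.cpus > 1 ? task.cpus - 1 : 1
"""
bowtie2 --threads ${aligner_cpus} <aligner args> \\
    | samtools sort --threads 1 -o output.bam -
"""
```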
It is also possible for a new multi-tool container to be built and added to BioContainers.

- If the multi-tool container already exists, use [this](https://midnighter.github.io/mulled) helper tool to obtain the `mulled-*` path.

## GPU-capable modules

GPU-enabled software has two properties that make it awkward to package the same way as CPU-only tools: GPU builds (e.g., CUDA PyTorch) can be several GB larger than their CPU counterparts, and some vendor containers (e.g., NVIDIA Parabricks) are proprietary with no conda equivalent.
The specification therefore allows three container approaches, chosen according to the tool:

- **Dual CPU/GPU variants of a tool**, where the GPU build has significant overhead (e.g., CUDA PyTorch adds ~3 GB): use the [dual-container pattern](#dual-container-pattern) below so CPU-only users are not penalised.
For example, [`ribodetector`](https://github.com/nf-core/modules/tree/master/modules/nf-core/ribodetector).
- **Minimal GPU overhead or CPU fallback within one container**: a single container is simpler and preferred.
- **Vendor-provided GPU containers** with no conda equivalent.
For example, [`parabricks/rnafq2bam`](https://github.com/nf-core/modules/tree/master/modules/nf-core/parabricks/rnafq2bam) uses NVIDIA's container.
Vendor-provided container modules SHOULD guard against conda/mamba profiles:

```groovy
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
    error("This module does not support Conda. Please use Docker / Singularity / Podman instead.")
}
```

### Dual-container pattern

When the GPU container is substantially larger, modules SHOULD switch between containers based on `task.accelerator`:

```groovy
conda "${ task.accelerator ? "${moduleDir}/environment.gpu.yml" : "${moduleDir}/environment.yml" }"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
(task.accelerator ? '<singularity-gpu-url>' : '<singularity-cpu-url>') :
(task.accelerator ? '<docker-gpu-url>' : '<docker-cpu-url>') }"
```

A separate `environment.gpu.yml` SHOULD be provided for GPU-specific dependencies.
The CPU `environment.yml` MUST remain unchanged so that non-GPU users are unaffected.

GPU containers SHOULD be built using [Wave](https://wave.seqera.io) from the `environment.gpu.yml` file.
Both Docker and Singularity URLs MUST be provided.

:::note{title="Wave build template"}
Pass `--build-template conda/micromamba:v2` to Wave when building GPU environments. This is required for now, until it becomes the default.
:::
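
Putting this together with the requirement for both container flavours, a build might look like the following (a sketch; only the flags named in this section are taken from the specification):

```bash
# Build the Docker image from the GPU environment file ...
wave --conda-file environment.gpu.yml --freeze --await \
    --build-template conda/micromamba:v2

# ... and the Singularity image from the same file.
wave --conda-file environment.gpu.yml --freeze --await --singularity \
    --build-template conda/micromamba:v2
```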

### CUDA version pinning

Pin `cuda-version` exactly; nf-core does not allow version ranges. The pin sets the minimum NVIDIA driver version required on execution hosts, so pick the lowest CUDA version the GPU package actually supports on conda-forge, as that gives the widest host compatibility.

```yaml
dependencies:
- "bioconda::ribodetector=0.3.3"
- "conda-forge::pytorch-gpu=2.1.0"
- "conda-forge::cuda-version=11.2"
```

Use `micromamba search -c conda-forge '<package>=<version>'` to see which CUDA minor versions the GPU package was built against on conda-forge; this is usually the real floor. For instance, `pytorch-gpu=2.1.0` has `cuda112` builds, while `pytorch-gpu=2.10.0` only has `cuda128`/`cuda129`/`cuda130` builds, raising the driver floor accordingly.
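
For example (a sketch; the build strings in the output carry the CUDA variant):

```bash
# List conda-forge builds of pytorch-gpu 2.1.0; build strings like
# '...cuda112...' reveal the lowest CUDA version the package supports.
micromamba search -c conda-forge 'pytorch-gpu=2.1.0'
```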

NVIDIA drivers are backward compatible with older CUDA versions, so pinning lower widens host reach. The reverse is not true: a CUDA 12.x container cannot run on a host with a CUDA 11.x-only driver. Modules that specifically want broader reach on legacy hosts MAY provide an alternative `environment.gpu.yml` pinned to CUDA 11.

### Capturing the CUDA runtime version

GPU modules SHOULD emit the CUDA runtime version on the `versions` topic channel so that it appears in provenance reports alongside the tool version. One simple pattern uses the PyTorch dependency that most CUDA-aware conda environments already pull in:

```groovy
tuple val("${task.process}"), val('cuda'), eval('python -c "import torch; print(torch.version.cuda or \'no CUDA available\')"'), emit: versions_cuda, topic: versions
```

This reports the actual CUDA minor version the container was built with on the GPU path, and `no CUDA available` on the non-GPU path of dual-container modules. Prefer a descriptive string over something like `cpu`, which reviewers reasonably flag as not being a version.

### Pip-based GPU packages

Some GPU-compiled Python tools ship only as pre-built wheels on custom pip indexes rather than on conda (e.g., [`llama-cpp-python`](https://abetlen.github.io/llama-cpp-python/whl/)). Pin the full wheel URL in the `pip:` block and pull the CUDA runtime from conda-forge alongside it:

```yaml
dependencies:
- python=3.11
- pip
- "conda-forge::cuda-version=12.4"
- "conda-forge::cuda-runtime"
- pip:
- "https://github.com/<owner>/<project>/releases/download/v<version>-cu124/<wheel>.whl"
```

Build the container with Wave's `--config-env` so the wheel's binary can resolve the conda-provided CUDA libs at `dlopen` time:

```bash
wave --conda-file environment.gpu.yml --freeze --await --singularity \
--config-env 'LD_LIBRARY_PATH=/opt/conda/lib'
```

Conda's `activate.d` hooks don't fire under `docker run`, so the library path has to be set at image-build time.
Conda-forge-native GPU packages (e.g. `pytorch-gpu`) ship RPATHs in their binaries and don't need this.

### Binary and GPU count selection in scripts

Tools that provide separate GPU and CPU binaries SHOULD select between them based on `task.accelerator`.
For example, [`ribodetector`](https://github.com/nf-core/modules/blob/20423f58f6ff54c4bc851dfd143d75f9b9f86f41/modules/nf-core/ribodetector/main.nf):

```groovy
def binary = task.accelerator ? "ribodetector" : "ribodetector_cpu"
```

Tools that accept a GPU count SHOULD specify this in the command using `task.accelerator.request`, allowing users to override via their pipeline config (e.g., `accelerator = 2`).
This parameter SHOULD NOT be hardcoded.

For example, [`parabricks/rnafq2bam`](https://github.com/nf-core/modules/blob/0c44f69eefe8a8c373c7cdd9528b3e1d60cb895f/modules/nf-core/parabricks/rnafq2bam/main.nf):

```groovy
def num_gpus = task.accelerator ? "--num-gpus ${task.accelerator.request}" : ''
```

## Software not on Bioconda

If the software is not available on Bioconda a `Dockerfile` MUST be provided within the module directory. nf-core will use GitHub Actions to auto-build the containers on the [GitHub Packages registry](https://github.com/features/packages).
```groovy
assert snapshot(process.out).match()
```

When the snapshot is unstable, use another way to test the output files.
See [nf-test assertions](/docs/contributing/nf-test/assertions) for examples on how to do this.

## GPU tests

Modules that support both CPU and GPU modes SHOULD include a separate GPU test file (`main.gpu.nf.test`).
GPU-only modules MAY use a single test file; see [`parabricks`](https://github.com/nf-core/modules/tree/master/modules/nf-core/parabricks) for an example.

GPU tests MUST be tagged with `"gpu"` or `"gpu_highmem"` so the GPU CI workflow discovers and runs them on GPU-enabled runners.

:::note
The `"gpu"` tag runs on smaller AWS GPU instances (e.g., [`g4dn.xlarge`](https://aws.amazon.com/ec2/instance-types/g4/)), while `"gpu_highmem"` runs on larger instances (e.g., [`g4dn.2xlarge`](https://aws.amazon.com/ec2/instance-types/g4/)) for tools with higher memory requirements such as [Parabricks](https://github.com/nf-core/modules/tree/master/modules/nf-core/parabricks).
:::
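
For example, a GPU test file might open as follows (a sketch; the tool name and the tags other than `"gpu"` are illustrative):

```groovy
nextflow_process {

    name "Test Process RIBODETECTOR (GPU)"
    script "../main.nf"
    process "RIBODETECTOR"
    config "./nextflow.gpu.config"

    tag "modules"
    tag "modules_nfcore"
    tag "ribodetector"
    tag "gpu"

    // tests follow, mirroring the assertions of the CPU test file
}
```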

GPU tests SHOULD include a `nextflow.gpu.config` that sets `accelerator = 1` on the process.
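
A minimal sketch of that file (the `maxForks` line is optional; see the caution below):

```groovy
// tests/nextflow.gpu.config
process {
    accelerator = 1
    maxForks    = 1 // optional: serialise GPU access on shared-GPU runners
}
```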

GPU tests SHOULD use the same assertions as the CPU tests to verify that GPU and CPU modes produce equivalent results.

GPU tests SHOULD include both a real test and a stub test.
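
A stub test can reuse the real test's structure with nf-test's `-stub` option (a sketch; the input is illustrative):

```groovy
test("fastq - gpu - stub") {

    options "-stub"

    when {
        process {
            """
            input[0] = [ [ id:'test' ], <input files> ]
            """
        }
    }

    then {
        assertAll(
            { assert process.success },
            { assert snapshot(process.out).match() }
        )
    }
}
```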

:::caution{title="GPU concurrency under Singularity"}
When multiple GPU processes share a single GPU under Singularity, concurrent CUDA processes can deadlock.
Docker handles GPU memory arbitration automatically, but Singularity does not.
This can happen in CI (where all tasks share one GPU), on HPC nodes with a local executor, or in any setup where multiple GPU tasks land on the same machine.
Set `maxForks = 1` for GPU processes to serialise access when this is a risk.
:::

```
modules/nf-core/<tool>/tests/
main.nf.test # CPU tests
main.gpu.nf.test # GPU tests (tag "gpu")
nextflow.gpu.config # Sets accelerator = 1
```

For an example, see the [`ribodetector` GPU tests](https://github.com/nf-core/modules/tree/master/modules/nf-core/ribodetector/tests).

## Stub tests

A stub test MUST be included for the module.