For example, `--threads $task.cpus`.

If the tool does not support multi-threading, consider `process_single` unless large amounts of RAM are required.

## GPU acceleration

Modules that support GPU acceleration SHOULD use `task.accelerator{:groovy}` to detect whether a GPU has been requested.
Pipelines control GPU allocation by setting `accelerator = 1{:groovy}` in their process config (e.g., via a `process_gpu` label or a `withName` block).

The module SHOULD NOT set the `accelerator` directive itself.

:::info{title="Rationale" collapse}
Placing GPU allocation in the pipeline config lets users control it through their pipeline config or profiles.
A label-only alternative (e.g., requiring a `process_gpu` label) would not work for modules that support both CPU and GPU modes (e.g., [`ribodetector`](https://github.com/nf-core/modules/tree/master/modules/nf-core/ribodetector)), so the specification leaves this to the pipeline author.
:::

See [Software requirements: GPU-capable modules](/docs/specifications/components/modules/software-requirements#gpu-capable-modules) for container patterns based on `task.accelerator`.

:::tip{title="Pipeline-side GPU configuration"}
Pipelines set `accelerator = 1{:groovy}` and GPU container flags via `containerOptions` in their process config.
Use `containerOptions` (not global `docker.runOptions`) to scope GPU flags to GPU processes only.
:::
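
A sketch of what this can look like in a pipeline config (the `process_gpu` label and the exact engine flags are assumptions; adjust for your setup):

```groovy
// Pipeline-side config: request one GPU and pass GPU flags to the container engine.
// The process_gpu label is illustrative; a withName selector works equally well.
process {
    withLabel: 'process_gpu' {
        accelerator      = 1
        // '--nv' enables NVIDIA support under Singularity/Apptainer,
        // '--gpus all' does the same under Docker.
        containerOptions = { workflow.containerEngine == 'singularity' ? '--nv' : '--gpus all' }
    }
}
```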

:::caution{title="GPU concurrency under Singularity"}
Multiple concurrent GPU processes sharing a single GPU can deadlock under Singularity.
Docker's NVIDIA runtime handles GPU memory arbitration, but Singularity does not.
When GPU tasks may land on the same machine (CI, local executor, shared HPC nodes), set `maxForks = 1` on GPU processes to serialise access.
:::
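
Where that risk applies, the same pipeline config can serialise GPU tasks (again assuming a `process_gpu` label):

```groovy
process {
    withLabel: 'process_gpu' {
        maxForks = 1 // run GPU tasks one at a time on shared-GPU setups
    }
}
```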

## Specifying multiple threads for piped commands

If a module contains _multiple_ tools that support multi-threading (e.g., [piping output into a samtools command](https://github.com/nf-core/modules/blob/c4cc1db284faba9fc4896f64bddf7703cedc7430/modules/nf-core/bowtie2/align/main.nf#L47-L54)), assign CPUs per tool:
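
One hedged sketch of such a split (the division shown is an assumption for illustration, not the linked module's exact allocation):

```groovy
script:
// Reserve one CPU for the sort step and give the aligner the rest.
def aligner_cpus = task.cpus > 1 ? task.cpus - 1 : 1
"""
bowtie2 --threads ${aligner_cpus} <aligner args> \\
    | samtools sort --threads 1 -o output.bam -
"""
```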
It is also possible for a new multi-tool container to be built and added to BioContainers.

- If the multi-tool container already exists, use [this](https://midnighter.github.io/mulled) helper tool to obtain the `mulled-*` path.

## GPU-capable modules

GPU-enabled software has two properties that make it awkward to package the same way as CPU-only tools: GPU builds (e.g., CUDA PyTorch) can be several GB larger than their CPU counterparts, and some vendor containers (e.g., NVIDIA Parabricks) are proprietary with no conda equivalent.
The specification therefore allows three container approaches, chosen according to the tool:

- **Dual CPU/GPU variants of a tool**, where the GPU build has significant overhead (e.g., CUDA PyTorch adds ~3 GB): use the [dual-container pattern](#dual-container-pattern) below so CPU-only users are not penalised.
For example, [`ribodetector`](https://github.com/nf-core/modules/tree/master/modules/nf-core/ribodetector).
- **Minimal GPU overhead or CPU fallback within one container**: a single container is simpler and preferred.
- **Vendor-provided GPU containers** with no conda equivalent.
For example, [`parabricks/rnafq2bam`](https://github.com/nf-core/modules/tree/master/modules/nf-core/parabricks/rnafq2bam) uses NVIDIA's container.
Vendor-provided container modules SHOULD guard against conda/mamba profiles:

```groovy
if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
    error("This module does not support Conda. Please use Docker / Singularity / Podman instead.")
}
```

### Dual-container pattern

When the GPU container is substantially larger, modules SHOULD switch between containers based on `task.accelerator`:

```groovy
conda "${ task.accelerator ? "${moduleDir}/environment.gpu.yml" : "${moduleDir}/environment.yml" }"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
(task.accelerator ? '<singularity-gpu-url>' : '<singularity-cpu-url>') :
(task.accelerator ? '<docker-gpu-url>' : '<docker-cpu-url>') }"
```

A separate `environment.gpu.yml` SHOULD be provided for GPU-specific dependencies.
The CPU `environment.yml` MUST remain unchanged so that non-GPU users are unaffected.

GPU containers SHOULD be built using [Wave](https://wave.seqera.io) from the `environment.gpu.yml` file.
Both Docker and Singularity URLs MUST be provided.

:::note{title="Wave build template"}
Pass `--build-template conda/micromamba:v2` to Wave when building GPU environments. This is required for now, until it becomes the default.
:::
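
Putting this together with the requirement for both container flavours, a build might look like the following (a sketch; only the flags named in this section are taken from the specification):

```bash
# Build the Docker image from the GPU environment file ...
wave --conda-file environment.gpu.yml --freeze --await \
    --build-template conda/micromamba:v2

# ... and the Singularity image from the same file.
wave --conda-file environment.gpu.yml --freeze --await --singularity \
    --build-template conda/micromamba:v2
```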

### CUDA version pinning

Pin `cuda-version` exactly; nf-core does not allow version ranges. The pin sets the minimum NVIDIA driver version required on execution hosts, so pick the lowest CUDA version the GPU package actually supports on conda-forge, as that gives the widest host compatibility.

```yaml
dependencies:
- "bioconda::ribodetector=0.3.3"
- "conda-forge::pytorch-gpu=2.1.0"
- "conda-forge::cuda-version=11.2"
```

Use `micromamba search -c conda-forge '<package>=<version>'` to see which CUDA minor versions the GPU package was built against on conda-forge; this is usually the real floor. For instance, `pytorch-gpu=2.1.0` has `cuda112` builds, while `pytorch-gpu=2.10.0` only has `cuda128`/`cuda129`/`cuda130` builds, raising the driver floor accordingly.
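
For example (a sketch; the build strings in the output carry the CUDA variant):

```bash
# List conda-forge builds of pytorch-gpu 2.1.0; build strings like
# '...cuda112...' reveal the lowest CUDA version the package supports.
micromamba search -c conda-forge 'pytorch-gpu=2.1.0'
```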

NVIDIA drivers are backward compatible with older CUDA versions, so pinning lower widens host reach. The reverse is not true: a CUDA 12.x container cannot run on a host with a CUDA 11.x-only driver. Modules that specifically want broader reach on legacy hosts MAY provide an alternative `environment.gpu.yml` pinned to CUDA 11.

### Capturing the CUDA runtime version

GPU modules SHOULD emit the CUDA runtime version on the `versions` topic channel so that it appears in provenance reports alongside the tool version. One simple pattern uses the PyTorch dependency that most CUDA-aware conda environments already pull in:

```groovy
tuple val("${task.process}"), val('cuda'), eval('python -c "import torch; print(torch.version.cuda or \'no CUDA available\')"'), emit: versions_cuda, topic: versions
```

This reports the actual CUDA minor version the container was built with on the GPU path, and `no CUDA available` on the non-GPU path of dual-container modules. Prefer a descriptive string over something like `cpu`, which reviewers reasonably flag as not being a version.

### Pip-based GPU packages

Some GPU-compiled Python tools ship only as pre-built wheels on custom pip indexes rather than on conda (e.g., [`llama-cpp-python`](https://abetlen.github.io/llama-cpp-python/whl/)). Pin the full wheel URL in the `pip:` block and pull the CUDA runtime from conda-forge alongside it:

```yaml
dependencies:
- python=3.11
- pip
- "conda-forge::cuda-version=12.4"
- "conda-forge::cuda-runtime"
- pip:
- "https://github.com/<owner>/<project>/releases/download/v<version>-cu124/<wheel>.whl"
```

Build the container with Wave's `--config-env` so the wheel's binary can resolve the conda-provided CUDA libs at `dlopen` time:

```bash
wave --conda-file environment.gpu.yml --freeze --await --singularity \
--config-env 'LD_LIBRARY_PATH=/opt/conda/lib'
```

Conda's `activate.d` hooks don't fire under `docker run`, so the library path has to be set at image-build time.
Conda-forge-native GPU packages (e.g. `pytorch-gpu`) ship RPATHs in their binaries and don't need this.

### Binary and GPU count selection in scripts

Tools that provide separate GPU and CPU binaries SHOULD select between them based on `task.accelerator`.
For example, [`ribodetector`](https://github.com/nf-core/modules/blob/20423f58f6ff54c4bc851dfd143d75f9b9f86f41/modules/nf-core/ribodetector/main.nf):

```groovy
def binary = task.accelerator ? "ribodetector" : "ribodetector_cpu"
```

Tools that accept a GPU count SHOULD specify this in the command using `task.accelerator.request`, allowing users to override via their pipeline config (e.g., `accelerator = 2`).
This parameter SHOULD NOT be hardcoded.

For example, [`parabricks/rnafq2bam`](https://github.com/nf-core/modules/blob/0c44f69eefe8a8c373c7cdd9528b3e1d60cb895f/modules/nf-core/parabricks/rnafq2bam/main.nf):

```groovy
def num_gpus = task.accelerator ? "--num-gpus ${task.accelerator.request}" : ''
```

## Software not on Bioconda

If the software is not available on Bioconda a `Dockerfile` MUST be provided within the module directory. nf-core will use GitHub Actions to auto-build the containers on the [GitHub Packages registry](https://github.com/features/packages).
```groovy
assert snapshot(process.out).match()
```

When the snapshot is unstable, use another way to test the output files.
See [nf-test assertions](/docs/contributing/nf-test/assertions) for examples on how to do this.

## GPU tests

Modules that support both CPU and GPU modes SHOULD include a separate GPU test file (`main.gpu.nf.test`).
GPU-only modules MAY use a single test file; see [`parabricks`](https://github.com/nf-core/modules/tree/master/modules/nf-core/parabricks) for an example.

GPU tests MUST be tagged with `"gpu"` or `"gpu_highmem"` so the GPU CI workflow discovers and runs them on GPU-enabled runners.

:::note
The `"gpu"` tag runs on smaller AWS GPU instances (e.g., [`g4dn.xlarge`](https://aws.amazon.com/ec2/instance-types/g4/)), while `"gpu_highmem"` runs on larger instances (e.g., [`g4dn.2xlarge`](https://aws.amazon.com/ec2/instance-types/g4/)) for tools with higher memory requirements such as [Parabricks](https://github.com/nf-core/modules/tree/master/modules/nf-core/parabricks).
:::
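
For example, a GPU test file might open as follows (a sketch; the tool name and the tags other than `"gpu"` are illustrative):

```groovy
nextflow_process {

    name "Test Process RIBODETECTOR (GPU)"
    script "../main.nf"
    process "RIBODETECTOR"
    config "./nextflow.gpu.config"

    tag "modules"
    tag "modules_nfcore"
    tag "ribodetector"
    tag "gpu"

    // tests follow, mirroring the assertions of the CPU test file
}
```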

GPU tests SHOULD include a `nextflow.gpu.config` that sets `accelerator = 1` on the process.
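
A minimal sketch of that file (the `maxForks` line is optional; see the caution below):

```groovy
// tests/nextflow.gpu.config
process {
    accelerator = 1
    maxForks    = 1 // optional: serialise GPU access on shared-GPU runners
}
```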

GPU tests SHOULD use the same assertions as the CPU tests to verify that GPU and CPU modes produce equivalent results.

GPU tests SHOULD include both a real test and a stub test.
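
A stub test can reuse the real test's structure with nf-test's `-stub` option (a sketch; the input is illustrative):

```groovy
test("fastq - gpu - stub") {

    options "-stub"

    when {
        process {
            """
            input[0] = [ [ id:'test' ], <input files> ]
            """
        }
    }

    then {
        assertAll(
            { assert process.success },
            { assert snapshot(process.out).match() }
        )
    }
}
```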

:::caution{title="GPU concurrency under Singularity"}
When multiple GPU processes share a single GPU under Singularity, concurrent CUDA processes can deadlock.
Docker handles GPU memory arbitration automatically, but Singularity does not.
This can happen in CI (where all tasks share one GPU), on HPC nodes with a local executor, or in any setup where multiple GPU tasks land on the same machine.
Set `maxForks = 1` for GPU processes to serialise access when this is a risk.
:::

```
modules/nf-core/<tool>/tests/
main.nf.test # CPU tests
main.gpu.nf.test # GPU tests (tag "gpu")
nextflow.gpu.config # Sets accelerator = 1
```

For an example, see the [`ribodetector` GPU tests](https://github.com/nf-core/modules/tree/master/modules/nf-core/ribodetector/tests).

## Stub tests

A stub test MUST be included for the module.