diff --git a/sites/docs/src/content/docs/specifications/components/modules/resource-requirements.md b/sites/docs/src/content/docs/specifications/components/modules/resource-requirements.md
index f79cc3dc89..e1fe94d996 100644
--- a/sites/docs/src/content/docs/specifications/components/modules/resource-requirements.md
+++ b/sites/docs/src/content/docs/specifications/components/modules/resource-requirements.md
@@ -20,6 +20,31 @@ For example, `--threads $task.cpus`.
 If the tool does not support multi-threading, consider `process_single` unless large amounts of RAM are required.
 
+## GPU acceleration
+
+Modules that support GPU acceleration SHOULD use `task.accelerator{:groovy}` to detect whether a GPU has been requested.
+Pipelines control GPU allocation by setting `accelerator = 1{:groovy}` in their process config (e.g., via a `process_gpu` label or a `withName` block).
+
+The module SHOULD NOT set the `accelerator` directive itself.
+
+:::info{title="Rationale" collapse}
+Placing GPU allocation in the pipeline config lets users control it through their own configuration or profiles.
+A label-only alternative (e.g., requiring a `process_gpu` label) would not work for modules that support both CPU and GPU modes (e.g., [`ribodetector`](https://github.com/nf-core/modules/tree/master/modules/nf-core/ribodetector)), so the specification leaves this to the pipeline author.
+:::
+
+See [Software requirements: GPU-capable modules](/docs/specifications/components/modules/software-requirements#gpu-capable-modules) for container patterns based on `task.accelerator`.
+
+:::tip{title="Pipeline-side GPU configuration"}
+Pipelines set `accelerator = 1{:groovy}` and GPU container flags via `containerOptions` in their process config.
+Use `containerOptions` (not the global `docker.runOptions`) to scope GPU flags to GPU processes only.
+:::
+
+:::caution{title="GPU concurrency under Singularity"}
+Multiple concurrent GPU processes sharing a single GPU can deadlock under Singularity.
+Docker's NVIDIA runtime handles GPU memory arbitration, but Singularity does not.
+When GPU tasks may land on the same machine (CI, local executor, shared HPC nodes), set `maxForks = 1` on GPU processes to serialise access.
+:::
+
 ## Specifying multiple threads for piped commands
 
 If a module contains _multiple_ tools that support multi-threading (e.g., [piping output into a samtools command](https://github.com/nf-core/modules/blob/c4cc1db284faba9fc4896f64bddf7703cedc7430/modules/nf-core/bowtie2/align/main.nf#L47-L54)), assign CPUs per tool:
diff --git a/sites/docs/src/content/docs/specifications/components/modules/software-requirements.md b/sites/docs/src/content/docs/specifications/components/modules/software-requirements.md
index 27256d200a..2dc16bb7ba 100644
--- a/sites/docs/src/content/docs/specifications/components/modules/software-requirements.md
+++ b/sites/docs/src/content/docs/specifications/components/modules/software-requirements.md
@@ -79,6 +79,112 @@ It is also possible for a new multi-tool container to be built and added to BioC
 - If the multi-tool container already exists, use [this](https://midnighter.github.io/mulled) helper tool to obtain the `mulled-*` path.
 
+## GPU-capable modules
+
+GPU-enabled software has two properties that make it awkward to package the same way as CPU-only tools: GPU builds (e.g., CUDA PyTorch) can be several GB larger than their CPU counterparts, and some vendor containers (e.g., NVIDIA Parabricks) are proprietary with no conda equivalent.
+The specification therefore allows three container approaches, chosen according to the tool:
+
+- **Dual CPU/GPU variants of a tool**, where the GPU build has significant overhead (e.g., CUDA PyTorch adds ~3 GB): use the [dual-container pattern](#dual-container-pattern) below so CPU-only users are not penalised.
+  For example, [`ribodetector`](https://github.com/nf-core/modules/tree/master/modules/nf-core/ribodetector).
+- **Minimal GPU overhead or CPU fallback within one container**: a single container is simpler and preferred.
+- **Vendor-provided GPU containers** with no conda equivalent.
+  For example, [`parabricks/rnafq2bam`](https://github.com/nf-core/modules/tree/master/modules/nf-core/parabricks/rnafq2bam) uses NVIDIA's container.
+  Vendor-provided container modules SHOULD guard against conda/mamba profiles:
+
+  ```groovy
+  if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
+      error("This module does not support Conda. Please use Docker / Singularity / Podman instead.")
+  }
+  ```
+
+### Dual-container pattern
+
+When the GPU container is substantially larger, modules SHOULD switch between containers based on `task.accelerator`:
+
+```groovy
+conda "${ task.accelerator ? "${moduleDir}/environment.gpu.yml" : "${moduleDir}/environment.yml" }"
+container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+    (task.accelerator ? '<singularity-gpu-image>' : '<singularity-cpu-image>') :
+    (task.accelerator ? '<docker-gpu-image>' : '<docker-cpu-image>') }"
+```
+
+A separate `environment.gpu.yml` SHOULD be provided for GPU-specific dependencies.
+The CPU `environment.yml` MUST remain unchanged so that non-GPU users are unaffected.
+
+GPU containers SHOULD be built using [Wave](https://wave.seqera.io) from the `environment.gpu.yml` file.
+Both Docker and Singularity URLs MUST be provided.
+
+:::note{title="Wave build template"}
+Pass `--build-template conda/micromamba:v2` to Wave when building GPU environments.
+This is required for now, until it becomes the default.
+:::
+
+### CUDA version pinning
+
+Pin `cuda-version` exactly (nf-core does not allow version ranges).
+The pin sets the minimum NVIDIA driver version required on hosts, so pick the lowest value the GPU package actually supports on conda-forge; that gives the widest host compatibility.
+
+```yaml
+dependencies:
+  - "bioconda::ribodetector=0.3.3"
+  - "conda-forge::pytorch-gpu=2.1.0"
+  - "conda-forge::cuda-version=11.2"
+```
+
+Use `micromamba search -c conda-forge '<package>=<version>'` to see which CUDA minor versions the GPU package was built against on conda-forge; this is usually the real floor.
+For instance, `pytorch-gpu=2.1.0` has `cuda112` builds, while `pytorch-gpu=2.10.0` only has `cuda128`/`cuda129`/`cuda130` builds, raising the driver floor accordingly.
+
+NVIDIA drivers are backward compatible with older CUDA versions, so pinning lower widens host reach.
+The reverse is not true: a CUDA 12.x container cannot run on a host with a CUDA 11.x-only driver.
+Modules that specifically want broader reach on legacy hosts MAY provide an alternative `environment.gpu.yml` pinned to CUDA 11.
+
+### Capturing the CUDA runtime version
+
+GPU modules SHOULD emit the CUDA runtime version on the `versions` topic channel so it appears in provenance reports alongside the tool version.
+One simple pattern uses the pytorch dependency that most CUDA-aware conda environments already pull in:
+
+```groovy
+tuple val("${task.process}"), val('cuda'), eval('python -c "import torch; print(torch.version.cuda or \'no CUDA available\')"'), emit: versions_cuda, topic: versions
+```
+
+This reports the actual CUDA minor version the container was built with on the GPU path, and `no CUDA available` on the non-GPU path of dual-container modules.
+Prefer a descriptive string over something like `cpu`, which reviewers reasonably flag as not a version.
+
+### Pip-based GPU packages
+
+Some GPU-compiled Python tools ship only as pre-built wheels from custom pip indexes rather than conda (e.g., [`llama-cpp-python`](https://abetlen.github.io/llama-cpp-python/whl/)).
+Pin the full wheel URL in the `pip:` block and pull the CUDA runtime from conda-forge alongside it:
+
+```yaml
+dependencies:
+  - python=3.11
+  - pip
+  - "conda-forge::cuda-version=12.4"
+  - "conda-forge::cuda-runtime"
+  - pip:
+      - "https://github.com/<org>/<repo>/releases/download/v<version>-cu124/<wheel>.whl"
+```
+
+Build the container with Wave's `--config-env` so the wheel's binary can resolve the conda-provided CUDA libraries at `dlopen` time:
+
+```bash
+wave --conda-file environment.gpu.yml --freeze --await --singularity \
+    --config-env 'LD_LIBRARY_PATH=/opt/conda/lib'
+```
+
+Conda's `activate.d` hooks don't fire under `docker run`, so the library path has to be set at image-build time.
+Conda-forge-native GPU packages (e.g., `pytorch-gpu`) ship RPATHs in their binaries and don't need this.
+
+### Binary and GPU count selection in scripts
+
+Tools that provide separate GPU and CPU binaries SHOULD select between them based on `task.accelerator`.
+For example, [`ribodetector`](https://github.com/nf-core/modules/blob/20423f58f6ff54c4bc851dfd143d75f9b9f86f41/modules/nf-core/ribodetector/main.nf):
+
+```groovy
+def binary = task.accelerator ? "ribodetector" : "ribodetector_cpu"
+```
+
+Tools that accept a GPU count SHOULD specify this in the command using `task.accelerator.request`, allowing users to override it via their pipeline config (e.g., `accelerator = 2`).
+This parameter SHOULD NOT be hardcoded.
+
+For example, [`parabricks/rnafq2bam`](https://github.com/nf-core/modules/blob/0c44f69eefe8a8c373c7cdd9528b3e1d60cb895f/modules/nf-core/parabricks/rnafq2bam/main.nf):
+
+```groovy
+def num_gpus = task.accelerator ? "--num-gpus ${task.accelerator.request}" : ''
+```
+
 ## Software not on Bioconda
 
 If the software is not available on Bioconda a `Dockerfile` MUST be provided within the module directory.
 nf-core will use GitHub Actions to auto-build the containers on the [GitHub Packages registry](https://github.com/features/packages).
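To illustrate the `Dockerfile` requirement above, here is a minimal sketch; the base image, tool name, version, and download URL are all hypothetical placeholders, not part of the specification:

```dockerfile
# Hypothetical example only: tool name, version, and URL are placeholders.
FROM debian:bookworm-slim

# Pin the tool version for reproducible builds.
ARG TOOL_VERSION=1.0.0

# Install fetch dependencies, download a pinned release binary, then clean up.
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates curl \
    && curl -fsSL "https://example.org/mytool/v${TOOL_VERSION}/mytool-linux-amd64" \
        -o /usr/local/bin/mytool \
    && chmod +x /usr/local/bin/mytool \
    && apt-get purge -y curl \
    && rm -rf /var/lib/apt/lists/*
```

Pinning the version via a build argument keeps the image reproducible and makes the auto-build easy to bump on new releases.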
diff --git a/sites/docs/src/content/docs/specifications/components/modules/testing.md b/sites/docs/src/content/docs/specifications/components/modules/testing.md
index 396b5e3f86..414f419cac 100644
--- a/sites/docs/src/content/docs/specifications/components/modules/testing.md
+++ b/sites/docs/src/content/docs/specifications/components/modules/testing.md
@@ -51,6 +51,38 @@ assert snapshot(process.out).match()
 When the snapshot is unstable, use another way to test the output files.
 See [nf-test assertions](/docs/contributing/nf-test/assertions) for examples on how to do this.
 
+## GPU tests
+
+Modules that support both CPU and GPU modes SHOULD include a separate GPU test file (`main.gpu.nf.test`).
+GPU-only modules MAY use a single test file; see [`parabricks`](https://github.com/nf-core/modules/tree/master/modules/nf-core/parabricks) for an example.
+
+GPU tests MUST be tagged with `"gpu"` or `"gpu_highmem"` so the GPU CI workflow discovers and runs them on GPU-enabled runners.
+
+:::note
+The `"gpu"` tag runs on smaller AWS GPU instances (e.g., [`g4dn.xlarge`](https://aws.amazon.com/ec2/instance-types/g4/)), while `"gpu_highmem"` runs on larger instances (e.g., [`g4dn.2xlarge`](https://aws.amazon.com/ec2/instance-types/g4/)) for tools with higher memory requirements such as [Parabricks](https://github.com/nf-core/modules/tree/master/modules/nf-core/parabricks).
+:::
+
+GPU tests SHOULD include a `nextflow.gpu.config` that sets `accelerator = 1` on the process.
+
+GPU tests SHOULD use the same assertions as the CPU tests to verify that GPU and CPU modes produce equivalent results.
+
+GPU tests SHOULD include both a real test and a stub test.
+
+:::caution{title="GPU concurrency under Singularity"}
+When multiple GPU processes share a single GPU under Singularity, concurrent CUDA processes can deadlock.
+Docker handles GPU memory arbitration automatically, but Singularity does not.
+This can happen in CI (where all tasks share one GPU), on HPC nodes with a local executor, or in any setup where multiple GPU tasks land on the same machine.
+Set `maxForks = 1` for GPU processes to serialise access when this is a risk.
+:::
+
+```
+modules/nf-core/<tool>/tests/
+    main.nf.test         # CPU tests
+    main.gpu.nf.test     # GPU tests (tag "gpu")
+    nextflow.gpu.config  # Sets accelerator = 1
+```
+
+For an example, see the [`ribodetector` GPU tests](https://github.com/nf-core/modules/tree/master/modules/nf-core/ribodetector/tests).
+
 ## Stub tests
 
 A stub test MUST be included for the module.
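As a sketch, the `nextflow.gpu.config` referenced in the GPU tests section might contain the following; the `withName` selector is an assumption here and should match the process name of the module under test:

```groovy
// nextflow.gpu.config (sketch): request one GPU for the process under test.
// 'RIBODETECTOR' is an illustrative selector; use the module's process name.
process {
    withName: 'RIBODETECTOR' {
        accelerator = 1
        maxForks    = 1 // serialise GPU access under Singularity
    }
}
```

Setting `maxForks = 1` alongside `accelerator = 1` follows the concurrency caution above for CI runners where all tasks share a single GPU.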