
docs: add GPU module guidelines#4142

Open
pinin4fjords wants to merge 13 commits into main from docs/gpu-module-guidelines

Conversation


pinin4fjords (Member) commented Apr 14, 2026

Summary

Add GPU module guidelines to the component specifications, based on patterns established in nf-core/modules#11178 (ribodetector) and existing parabricks conventions.

Software requirements

  • Three container approaches: dual-container (significant GPU overhead), single container (minimal overhead/CPU fallback), vendor-provided (no conda equivalent)
  • Dual-container ternary pattern for both conda and container directives based on task.accelerator
  • environment.gpu.yml convention for GPU-specific dependencies
  • Conda/mamba profile guard for vendor containers (copyable code block)
  • Binary selection via task.accelerator (ribodetector example)
  • Multi-GPU via task.accelerator.request (parabricks example)
  • ARM __cuda limitation note
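
The dual-container ternary described above might be sketched like this; the process name, image tags, and versions are illustrative placeholders, not the actual ribodetector module:

```groovy
// Sketch only: names and image tags are placeholders.
process TOOL_RUN {
    // Conda: pick the GPU environment when a GPU was requested.
    // environment.gpu.yml would carry the GPU-specific deps, e.g. an
    // exact cuda-version pin.
    conda "${task.accelerator ? "${moduleDir}/environment.gpu.yml" : "${moduleDir}/environment.yml"}"

    // Container: the same ternary on task.accelerator selects the GPU image.
    container "${task.accelerator
        ? 'community.wave.seqera.io/library/tool-gpu:1.0.0'
        : 'community.wave.seqera.io/library/tool:1.0.0'}"

    // (inputs, outputs, and script as in any other module)
}
```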

Resource requirements

  • task.accelerator for GPU detection: module reads it, pipeline sets it
  • Module SHOULD NOT set accelerator itself
  • Tip pointing to pipeline-side containerOptions pattern
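
The pipeline-side pattern the tip points at might look like this; the `withName` selector is a placeholder, and the `containerOptions` closure follows the common Nextflow GPU passthrough idiom:

```groovy
// Pipeline config sketch: the pipeline requests the GPU;
// the module only reads task.accelerator.
process {
    withName: 'TOOL_RUN' {
        accelerator = 1
        // Pass the GPU through to the container runtime.
        containerOptions = { workflow.containerEngine == 'singularity'
            ? '--nv'
            : (workflow.containerEngine == 'docker' ? '--gpus all' : null) }
    }
}
```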

Testing

  • Separate GPU test file (main.gpu.nf.test) for dual-mode modules; GPU-only modules MAY use a single file
  • gpu and gpu_highmem tags for CI runner selection
  • nextflow.gpu.config setting accelerator = 1
  • Same assertions for GPU and CPU to catch divergence
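
As a sketch of these testing conventions (tag, config, and file names follow the bullets above; the process name, input, and assertions are placeholders):

```groovy
// main.gpu.nf.test (sketch)
nextflow_process {
    name "Test TOOL_RUN on GPU"
    script "../main.nf"
    process "TOOL_RUN"
    tag "gpu"                        // routes the test to a GPU CI runner
    config "./nextflow.gpu.config"   // sets process.accelerator = 1

    test("GPU run") {
        when {
            process {
                """
                input[0] = [ [ id:'test' ], file(params.test_data) ]  // placeholder input
                """
            }
        }
        then {
            // Use the same assertions as the CPU test to catch divergence
            assert process.success
        }
    }
}
```

The companion `nextflow.gpu.config` would then contain little more than `process { accelerator = 1 }`.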

Context

🤖 Generated with Claude Code

@netlify /docs/specifications/components/modules/resource-requirements


netlify Bot commented Apr 14, 2026

Deploy Preview for nf-core-docs ready!

🔨 Latest commit c46eaeb
🔍 Latest deploy log https://app.netlify.com/projects/nf-core-docs/deploys/69e8d1361504c40008e18851
😎 Deploy Preview https://deploy-preview-4142--nf-core-docs.netlify.app/docs/specifications/components/modules/resource-requirements


netlify Bot commented Apr 14, 2026

Deploy Preview for nf-core-main-site ready!

Name Link
🔨 Latest commit c46eaeb
🔍 Latest deploy log https://app.netlify.com/projects/nf-core-main-site/deploys/69e8d1366f84b200082fc3c6
😎 Deploy Preview https://deploy-preview-4142--nf-core-main-site.netlify.app/docs/specifications/components/modules/resource-requirements

jfy133 self-requested a review on April 14, 2026 at 14:39
Add guidance for nf-core modules that support GPU acceleration,
based on patterns from ribodetector (nf-core/modules#11178) and
parabricks.

Software requirements:
- Three container approaches (dual-container, single, vendor-provided)
- Dual-container ternary pattern for conda + container directives
- environment.gpu.yml convention
- Conda/mamba guard for vendor containers
- Binary selection and multi-GPU via task.accelerator
- ARM __cuda limitation note

Resource requirements:
- task.accelerator for GPU detection (module reads, pipeline sets)
- Tip pointing to pipeline-side containerOptions pattern

Testing:
- GPU test file conventions (main.gpu.nf.test, nextflow.gpu.config)
- gpu and gpu_highmem tags for CI runner selection
- Same assertions for GPU and CPU to catch divergence

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pinin4fjords force-pushed the docs/gpu-module-guidelines branch from c277e79 to 7d2a701 on April 14, 2026 at 14:46
pinin4fjords marked this pull request as ready for review on April 14, 2026 at 14:47
pinin4fjords and others added 3 commits April 16, 2026 13:44
- Replace the __cuda/ARM note with CUDA version targeting guidance:
  pin cuda-version to a major version, target CUDA 12.x as default,
  optionally provide CUDA 11.8 for older systems
- Add Singularity GPU concurrency warning: multiple concurrent CUDA
  processes deadlock under Singularity, use maxForks = 1 in CI

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The deadlock from concurrent CUDA processes under Singularity applies
to any shared-GPU scenario (CI, local executor, HPC nodes), not just
CI. Add the warning to resource requirements as well as testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
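The maxForks guard described in these commits might be expressed as follows in a CI or test profile (a sketch; scope it to GPU processes as needed rather than globally):

```groovy
// Sketch: serialise GPU tasks under Singularity so concurrent CUDA
// processes never share one device (the deadlock described above).
process {
    accelerator = 1
    maxForks    = 1   // one GPU task at a time on a shared device
}
```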
NVIDIA drivers are backward compatible - a CUDA 12 host runs CUDA 11
containers fine. CUDA 12 is the best default because newer tools drop
CUDA 11 builds, not because the host requires it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

## GPU acceleration

Modules that support GPU acceleration SHOULD use `task.accelerator` to detect whether a GPU has been requested. Pipelines control GPU allocation by setting `accelerator = 1` in their process config (e.g., via a `process_gpu` label or a `withName` block).
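
For a dual-mode tool, the module-side detection can be sketched as follows; the `ribodetector`/`ribodetector_cpu` binary names come from the ribodetector example mentioned earlier, while the flags and variables are placeholders:

```groovy
script:
// Select the binary from the accelerator request; the pipeline decides,
// the module only reads task.accelerator.
def binary = task.accelerator ? 'ribodetector' : 'ribodetector_cpu'
"""
${binary} \\
    -t ${task.cpus} \\
    -i ${reads} \\
    -o ${prefix}.fastq.gz
"""
```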
Contributor


I think requiring a process_gpu label is the most straightforward way. Or is there any specific case why we are giving an alternative? This makes it easier to lint for.

Member Author


Not all things are always GPU. Ribodetector for example (which kicked me off on this whole thing) can be CPU or GPU. So a label wouldn't work.

…resource-requirements.md

Co-authored-by: Matthias Hörtenhuber <mashehu@users.noreply.github.com>

jfy133 commented Apr 17, 2026

@nf-core-bot fix linting


jfy133 (Member) left a comment


Will finish later

Comment thread sites/docs/src/content/docs/specifications/components/modules/testing.md Outdated
Comment thread sites/docs/src/content/docs/specifications/components/modules/testing.md Outdated
- Split multi-sentence lines per reviewer formatting preferences
- Note why label-only GPU allocation does not work for dual-mode modules
- Introduce GPU-capable modules section with rationale for multiple container approaches
- Trim CUDA version targeting to pinning guidance suitable for a spec
- Link Parabricks example module and AWS GPU instance types in GPU testing section

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jfy133 (Member) left a comment


Last pass from me for now!

I've pinged maintainers team on slack so we can have a thorough going through.

Once we have agreement on the specs, we can ask docs team to do a readability/style-guide pass, and finally merge it in

Comment thread sites/docs/src/content/docs/specifications/components/modules/testing.md Outdated

pontus commented Apr 20, 2026

This seems like a good improvement/standardisation, and I hate to let the perfect be the enemy of the good here.

With that said, PyTorch, for example, is mentioned several times, and that seems like something that could reasonably be made to support vendors other than Nvidia (mostly AMD for the moment, although it seems there are other things happening).

Could we think of some way of specifying required details (vendor, possibly minimum architecture family and so on) in a way that would allow people running on e.g. HPC resources with AMD GPUs to utilise those for the processes where it works?

(Of course it always comes down to what the binaries in question support, and for some tools where everything supports AMD GPUs, the author of the module might not want to deal with that - and that's fine, but it would be great if it would be possible to support it.)

pinin4fjords (Member Author) replied:

Thanks! This feels like a future iteration, though; I don't know enough about the things you're asking for to provide good coverage, and I don't have the bandwidth to test those pieces.

- Move accelerator-directive rationale into a collapsed Rationale admonition
- Drop redundant "use the vendor container directly" clause from GPU container bullets
- Reword "These modules" to "Vendor-provided container modules" for clarity
- Rewrite CUDA driver compatibility note to not read as self-contradictory
- Rename "Script patterns" section to "Binary and GPU count selection in scripts"
- Pin ribodetector and parabricks example URLs to commit hashes
- Split task.accelerator.request guidance into its own sentence and forbid hardcoding
- Put AWS GPU instance descriptions into a note admonition
- Split testing.md Singularity concurrency warning onto separate lines

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pontus commented Apr 21, 2026

Yes, others have also pointed out to me that multi-architecture clusters will probably become more common (with login/CPU compute on x86-64 and GPU nodes using ARM when going with Nvidia).

But yes, probably a further iteration, although we should probably try to get to that soonish.

pinin4fjords and others added 5 commits April 21, 2026 16:03
Covers the pattern for GPU-compiled Python tools that ship only as
pre-built wheels from custom pip indexes (e.g. llama-cpp-python),
based on experience integrating nf-core/modules#11053:

- Pin the full wheel URL in pip:, not --extra-index-url (which leaks
  into Wave's image tag) or --index-url (which breaks transitive deps).
- Use wave --config-env 'LD_LIBRARY_PATH=/opt/conda/lib' so the pip
  binary can resolve conda-provided CUDA libs at dlopen time; conda's
  activate.d hooks don't fire under docker run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
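A sketch of the pinned-wheel recipe this commit describes; the wheel URL and versions below are placeholders, not real artifacts:

```yaml
# environment.yml sketch: pin the full wheel URL under pip:.
# Avoid --extra-index-url (leaks into Wave's image tag) and
# --index-url (breaks transitive dependencies).
channels:
  - conda-forge
dependencies:
  - python=3.11          # placeholder version
  - pip
  - pip:
      # PLACEHOLDER URL - substitute the real pre-built GPU wheel
      - https://example.com/whl/cu124/tool-1.0.0-cp311-cp311-linux_x86_64.whl
```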
Nobody needs to read about the --extra-index-url / --index-url dead
ends. Keep the recipe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Drop the `cuda-version>=12,<13` range example; nf-core policy is exact
  pins only, and the pin's real job is setting the host driver floor, not
  dodging solver quirks. Rewrite surrounding prose to reflect that.
- Recommend picking the lowest cuda-version the GPU package actually has
  a conda-forge build for, with a `micromamba search` pointer.
- Note the transitional requirement to build with
  `--build-template conda/micromamba:v2` for `__cuda` packages until that
  becomes Wave's default, plus the `--await 60m` caveat for solver-heavy
  GPU envs.
- Add a short section on emitting the CUDA runtime as a versions topic
  entry for provenance reports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ble'

Reviewer-flagged on ribodetector PR that 'cpu' isn't really a version
string. A descriptive fallback is clearer about what the task's build
actually supports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>