Conversation
✅ Deploy Preview for nf-core-docs ready!
✅ Deploy Preview for nf-core-main-site ready!
Add guidance for nf-core modules that support GPU acceleration, based on patterns from ribodetector (nf-core/modules#11178) and parabricks.

Software requirements:
- Three container approaches (dual-container, single, vendor-provided)
- Dual-container ternary pattern for conda + container directives
- environment.gpu.yml convention
- Conda/mamba guard for vendor containers
- Binary selection and multi-GPU via task.accelerator
- ARM __cuda limitation note

Resource requirements:
- task.accelerator for GPU detection (module reads, pipeline sets)
- Tip pointing to pipeline-side containerOptions pattern

Testing:
- GPU test file conventions (main.gpu.nf.test, nextflow.gpu.config)
- gpu and gpu_highmem tags for CI runner selection
- Same assertions for GPU and CPU to catch divergence

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed c277e79 to 7d2a701
- Replace the __cuda/ARM note with CUDA version targeting guidance: pin cuda-version to a major version, target CUDA 12.x as default, optionally provide CUDA 11.8 for older systems
- Add Singularity GPU concurrency warning: multiple concurrent CUDA processes deadlock under Singularity, use maxForks = 1 in CI

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The deadlock from concurrent CUDA processes under Singularity applies to any shared-GPU scenario (CI, local executor, HPC nodes), not just CI. Add the warning to resource requirements as well as testing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
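A minimal sketch of the pipeline-side guard described here, assuming a Nextflow process config; the process name and values are illustrative:

```groovy
// Hypothetical pipeline config: request one GPU for the process and
// serialise its tasks, since concurrent CUDA processes sharing a GPU
// can deadlock under Singularity (CI, local executor, HPC nodes alike).
process {
    withName: 'EXAMPLE_GPU_TOOL' {   // illustrative process name
        accelerator = 1
        maxForks    = 1              // one CUDA task at a time on the shared GPU
    }
}
```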
NVIDIA drivers are backward compatible - a CUDA 12 host runs CUDA 11 containers fine. CUDA 12 is the best default because newer tools drop CUDA 11 builds, not because the host requires it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
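For illustration, an environment.gpu.yml along these lines would set CUDA 12 as the build target; the package name and version pins are placeholders, not recommendations:

```yaml
# Hypothetical environment.gpu.yml: the cuda-version pin selects which CUDA
# build of the tool is solved; a CUDA 12 host driver can still run CUDA 11
# containers, so this targets newer tooling rather than newer hosts.
channels:
  - conda-forge
  - bioconda
dependencies:
  - bioconda::example-gpu-tool=1.0.0   # placeholder package
  - conda-forge::cuda-version=12.4     # placeholder exact pin
```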
> ## GPU acceleration
>
> Modules that support GPU acceleration SHOULD use `task.accelerator` to detect whether a GPU has been requested. Pipelines control GPU allocation by setting `accelerator = 1` in their process config (e.g., via a `process_gpu` label or a `withName` block).
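A sketch of what that detection can look like in a dual-mode module, assuming the dual-container ternary pattern; the process name and container URIs are placeholders:

```groovy
// Hypothetical module header: environment and container are chosen by
// whether the pipeline set the accelerator directive for this process.
process EXAMPLE_TOOL {
    conda "${moduleDir}/environment${task.accelerator ? '.gpu' : ''}.yml"
    container "${ task.accelerator
        ? 'docker.io/example/tool:1.0.0-cuda12'    // placeholder GPU image
        : 'docker.io/example/tool:1.0.0' }"        // placeholder CPU image
    // ... inputs, outputs, script as usual
}
```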
I think requiring a `process_gpu` label is the most straightforward way - or is there a specific case why we are giving an alternative? A label also makes this easier to lint for.
Not all tools are always GPU-based. Ribodetector, for example (which kicked me off on this whole thing), can run on CPU or GPU, so a label alone wouldn't work.
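A hedged sketch of what a dual-mode script block can do instead of relying on a label; the binary names and flags are invented for illustration:

```groovy
// Hypothetical script block: pick the binary and resource flags from
// task.accelerator rather than from a process label.
script:
def use_gpu   = task.accelerator ? true : false
def binary    = use_gpu ? 'exampletool-gpu' : 'exampletool-cpu'  // placeholder binaries
def resources = use_gpu
    ? "--gpus ${task.accelerator.request}"   // multi-GPU via the requested count
    : "--threads ${task.cpus}"
"""
${binary} ${resources} --input ${input}
"""
```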
…resource-requirements.md Co-authored-by: Matthias Hörtenhuber <mashehu@users.noreply.github.com>
@nf-core-bot fix linting
- Split multi-sentence lines per reviewer formatting preferences
- Note why label-only GPU allocation does not work for dual-mode modules
- Introduce GPU-capable modules section with rationale for multiple container approaches
- Trim CUDA version targeting to pinning guidance suitable for a spec
- Link Parabricks example module and AWS GPU instance types in GPU testing section

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jfy133 left a comment:
Last pass from me for now!
I've pinged maintainers team on slack so we can have a thorough going through.
Once we have agreement on the specs, we can ask the docs team to do a readability/style-guide pass, and finally merge it in.
This seems like a good improvement/standardisation, and I hate to let the perfect be the enemy of the good. With that said, PyTorch is mentioned several times, and that seems like something that could reasonably be made to support vendors other than Nvidia (mostly AMD for the moment, although other things seem to be happening too). Could we think of some way of specifying the required details (vendor, possibly minimum architecture family, and so on) that would allow people running on e.g. HPC resources with AMD GPUs to utilise those for the processes where it works? (Of course it always comes down to what the binaries in question support, and even where everything supports AMD GPUs, the module author might not want to deal with that - and that's fine, but it would be great if supporting it were possible.)
Thanks! This feels like a future iteration, though - I don't know enough about what you're asking for to provide good coverage, and I don't have the bandwidth to test those pieces.
- Move accelerator-directive rationale into a collapsed Rationale admonition
- Drop redundant "use the vendor container directly" clause from GPU container bullets
- Reword "These modules" to "Vendor-provided container modules" for clarity
- Rewrite CUDA driver compatibility note to not read as self-contradictory
- Rename "Script patterns" section to "Binary and GPU count selection in scripts"
- Pin ribodetector and parabricks example URLs to commit hashes
- Split task.accelerator.request guidance into its own sentence and forbid hardcoding
- Put AWS GPU instance descriptions into a note admonition
- Split testing.md Singularity concurrency warning onto separate lines

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Yes, others have also pointed out to me that multi-architecture clusters will probably become more common (login/CPU compute on x86-64 and GPU nodes on ARM when going with Nvidia). But yes, probably a further iteration, although we should try to get there soonish.
Covers the pattern for GPU-compiled Python tools that ship only as pre-built wheels from custom pip indexes (e.g. llama-cpp-python), based on experience integrating nf-core/modules#11053:

- Pin the full wheel URL in pip:, not --extra-index-url (which leaks into Wave's image tag) or --index-url (which breaks transitive deps).
- Use wave --config-env 'LD_LIBRARY_PATH=/opt/conda/lib' so the pip binary can resolve conda-provided CUDA libs at dlopen time; conda's activate.d hooks don't fire under docker run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Nobody needs to read about the --extra-index-url / --index-url dead ends. Keep the recipe. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
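The surviving recipe could be sketched like this; the wheel URL, interpreter, and version pins are placeholders (the real index layout for llama-cpp-python may differ):

```yaml
# Hypothetical environment.gpu.yml for a GPU-compiled wheel: pin the full
# wheel URL under pip: so nothing index-related leaks into the image tag
# and transitive dependencies still resolve from PyPI as normal.
channels:
  - conda-forge
dependencies:
  - conda-forge::python=3.11          # placeholder interpreter pin
  - conda-forge::cuda-version=12.4    # placeholder CUDA pin
  - pip
  - pip:
      - https://example.com/whl/cu124/llama_cpp_python-0.3.0-cp311-cp311-manylinux_2_28_x86_64.whl  # placeholder URL
```

Building such an environment with `wave --config-env 'LD_LIBRARY_PATH=/opt/conda/lib'` then lets the pip-installed binary resolve conda-provided CUDA libraries at dlopen time.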
- Drop the `cuda-version>=12,<13` range example; nf-core policy is exact pins only, and the pin's real job is setting the host driver floor, not dodging solver quirks. Rewrite surrounding prose to reflect that.
- Recommend picking the lowest cuda-version the GPU package actually has a conda-forge build for, with a `micromamba search` pointer.
- Note the transitional requirement to build with `--build-template conda/micromamba:v2` for `__cuda` packages until that becomes Wave's default, plus the `--await 60m` caveat for solver-heavy GPU envs.
- Add a short section on emitting the CUDA runtime as a versions topic entry for provenance reports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
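A command sketch of that workflow, with the package name as a placeholder:

```
# Which cuda-version variants does conda-forge/bioconda actually build?
micromamba search -c conda-forge -c bioconda example-gpu-tool   # placeholder name

# Transitional: force the conda/micromamba v2 build template for __cuda
# packages, and allow a longer wait for solver-heavy GPU environments.
wave --conda-file environment.gpu.yml \
     --build-template conda/micromamba:v2 \
     --await 60m
```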
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ble' Reviewer-flagged on ribodetector PR that 'cpu' isn't really a version string. A descriptive fallback is clearer about what the task's build actually supports. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
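A sketch of that descriptive fallback in a module's versions emission; the tool name, the runtime query command, and the exact fallback wording are illustrative, not the wording settled in the PR:

```groovy
// Hypothetical end of a script block: report the CUDA runtime when a GPU
// build ran, and a descriptive string (not a bare 'cpu') otherwise.
"""
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    exampletool: \$(exampletool --version)
    cuda_runtime: \$(nvcc --version 2>/dev/null | grep -o 'release [0-9.]*' || echo "CPU-only build")
END_VERSIONS
"""
```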
Summary
Add GPU module guidelines to the component specifications, based on patterns established in nf-core/modules#11178 (ribodetector) and existing parabricks conventions.
Software requirements
- `conda` and `container` directives based on `task.accelerator`
- `environment.gpu.yml` convention for GPU-specific dependencies
- Binary selection via `task.accelerator` (ribodetector example)
- Multi-GPU via `task.accelerator.request` (parabricks example)
- `__cuda` limitation note

Resource requirements
- `task.accelerator` for GPU detection: module reads it, pipeline sets it; modules never set `accelerator` itself
- Tip pointing to the pipeline-side `containerOptions` pattern

Testing
- GPU test files (`main.gpu.nf.test`) for dual-mode modules; GPU-only modules MAY use a single file
- `gpu` and `gpu_highmem` tags for CI runner selection
- `nextflow.gpu.config` setting `accelerator = 1`

Context
🤖 Generated with Claude Code
@netlify /docs/specifications/components/modules/resource-requirements