New modules: Llamacpp-python/run and huggingface/download to allow running simple text workloads with local LLMs #11053
toniher wants to merge 40 commits into nf-core:master from
Conversation
famosab
left a comment
Thank you for your contribution to nf-core! We really appreciate it. I added a few comments to your PR.
We usually recommend having one module per PR. That makes the review process easier, and it is more likely that someone will review your PR. You can keep that in mind for your next PRs.
Hi @famosab. Thanks for the feedback; I will go through your comments! I was told about the one-module-per-PR recommendation, but since the output of one of the processes is needed by the other, I thought keeping them together would help potential users once the modules are eventually accepted. But it is certainly more work for everyone. Sorry about this, and I will avoid it in future PRs.
Not so many assertions for stub test
Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>

output to ${prefix}
Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>
I think from my side it looks good, but now I have spent so much time going back and forth with you, and this module is a bit unusual :) so I would like to have another set of eyes on this PR! I will ask for another review :)
pinin4fjords
left a comment
I really think we should avoid using module binaries until we're satisfied they don't limit module portability. Currently Wave is required for Cloud scenarios without a shared file system.
Templates are the more portable, if less pretty, approach.
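For context, the template pattern being recommended looks roughly like this (a hypothetical sketch; the process and template names are illustrative, not taken from this PR):

```nextflow
// Hypothetical module using a template instead of a bundled binary.
// Nextflow resolves 'run_llm.py' from the module's templates/ directory,
// so no extra binary needs to be staged into the work directory.
process LLAMACPP_PYTHON_RUN {
    input:
    path model

    output:
    path 'response.txt'

    script:
    template 'run_llm.py'
}
```

Because the template file travels with the module source, this works on any executor, including cloud setups without a shared file system.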
pinin4fjords
left a comment
I think there are a few things to resolve here, since we're colouring outside the lines a bit.
```nextflow
label 'process_gpu'

conda "${moduleDir}/environment.yml"
container "${task.accelerator ? 'quay.io/nf-core/llama-cpp-python:0.1.9' : 'community.wave.seqera.io/library/llama-cpp-python:0.3.16--b351398cd0ea7fc5'}"
```
Are you sure you need this? A little bit of AI chat suggests that:
llama-cpp-python compiled with CUDA support does runtime GPU detection and falls back to CPU when no GPU is present
... which, if correct, would mean you could just supply the same container in either case and avoid the complexity.
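As an aside, the runtime fallback described there boils down to probing whether the CUDA driver library can be loaded; a minimal, hedged illustration of that idea (this is not llama-cpp-python's actual code):

```python
import ctypes

def has_cuda_driver() -> bool:
    """Return True if the CUDA driver library (libcuda.so.1) is loadable."""
    try:
        ctypes.CDLL("libcuda.so.1")
        return True
    except OSError:
        # No driver present: a CPU fallback would take this branch
        # instead of crashing at import time.
        return False

print(has_cuda_driver())
```

On a machine without an NVIDIA driver this simply reports `False`, which is exactly the behaviour a graceful CPU fallback needs.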
You might be able to do a multi-stage build like this to bring the container size down:

```dockerfile
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04 AS builder
RUN apt-get update && apt-get install -y python3 python3-pip python3-dev
RUN pip3 install --prefix=/install llama-cpp-python==0.3.16 \
    --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124

FROM nvidia/cuda:12.4.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /install /usr/local
```

... so it's less overhead for the non-GPU case (untested).
I just tried again running Docker on my laptop (without a GPU) using quay.io/nf-core/llama-cpp-python:0.1.9 and I cannot make it work:

```
RuntimeError: Failed to load shared library '/usr/local/lib/python3.10/dist-packages/llama_cpp/lib/libllama.so': libcuda.so.1: cannot open shared object file: No such file or directory
```

So it looks like, after building the Docker image with GPU support, you cannot run it without a GPU device. It's particularly annoying because there are different GPU types and CUDA versions. :(
```nextflow
label 'process_gpu'

conda "${moduleDir}/environment.yml"
container "${task.accelerator ? 'quay.io/nf-core/llama-cpp-python:0.1.9' : 'community.wave.seqera.io/library/llama-cpp-python:0.3.16--b351398cd0ea7fc5'}"
```
@pinin4fjords Added the Singularity equivalent in 9069b44. What else would you say is missing so far, considering the present restrictions?
Hi, any feedback? So far, the only weak point I see is how to handle GPU-enabled container scenarios. Please let me know!
Since this tool is on conda-forge, you don't need to add a Dockerfile; just use Seqera Containers: https://nf-co.re/docs/developing/containers/seqera-containers
I am actually using Seqera Containers for non-GPU situations:
https://seqera.io/containers/?packages=conda-forge::llama-cpp-python=0.3.16
I understand I could approach it similarly to what is done here:
nf-core/modules: modules/nf-core/multiqc/meta.yml, line 110 in dd6396b
Until there is a better solution, I could enable quay.io/nf-core/llama-cpp-python:0.1.9 when containers, an accelerator (and amd64?) are all in play at the same time, and in all other situations do as the multiqc module does.
Otherwise, I could simply remove the GPU container until there is a better idea. It must be said that the speed gain with a GPU is huge...
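A hedged sketch of what that combined selection could look like (the two images come from this PR; the exact condition, including how amd64 would be checked, is an assumption, not a settled design):

```nextflow
// Sketch: use the GPU image only when an accelerator is requested and the
// engine is not Singularity; otherwise fall back to the CPU Wave container.
container "${ task.accelerator && workflow.containerEngine != 'singularity'
    ? 'quay.io/nf-core/llama-cpp-python:0.1.9'
    : 'community.wave.seqera.io/library/llama-cpp-python:0.3.16--b351398cd0ea7fc5' }"
```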
Any feedback is appreciated!
Any module example you could suggest as a model suitable for this case?
Just look for the
Note that these are not quite script files, and you have to escape
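To illustrate the escaping point with a generic, hypothetical template file (not from this PR): inside a Nextflow template, `${...}` is interpolated by Nextflow before the script runs, so anything meant for the shell itself must be escaped.

```bash
#!/usr/bin/env bash
# templates/run_llm.sh (hypothetical): ${prefix} is filled in by Nextflow,
# while \$HOSTNAME is escaped so it reaches the shell untouched.
echo "Run on \$HOSTNAME" > ${prefix}.log
```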
@pinin4fjords I tried to make it work, taking your modules as examples. It was relatively easy using: and a few more bits of code, but... I got another issue when running the test: This is caused by the recent convention in the way versions are generated... So, I guess I would need to generate
Yep, the eval won't work; just do exactly what the gtf2bed example I gave you does. You still output the versions.yml, but also send it to a topic. The code used by pipelines to assemble versions handles both files and strings.
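A hedged sketch of what that combination might look like (process name and version values are illustrative, not the module's actual code):

```nextflow
process EXAMPLE_RUN {
    output:
    path 'versions.yml', emit: versions
    // Also send a (process, tool, version) tuple to the 'versions' topic;
    // a static val() is used because eval() cannot run inside a template.
    tuple val("${task.process}"), val('llama-cpp-python'), val('0.3.16'), topic: versions

    script:
    template 'run_llm.py'
}
```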
Did a bit of AI-assisted iteration to make biocorecrg#1 - see what you think. We're still trying to figure out the standard patterns for this; see nf-core/website#4142.
Covers the pattern for GPU-compiled Python tools that ship only as pre-built wheels from custom pip indexes (e.g. llama-cpp-python), based on experience integrating nf-core/modules#11053:
- Pin the full wheel URL in `pip:`, not `--extra-index-url` (which leaks into Wave's image tag) or `--index-url` (which breaks transitive deps).
- Use `wave --config-env 'LD_LIBRARY_PATH=/opt/conda/lib'` so the pip binary can resolve conda-provided CUDA libs at dlopen time; conda's activate.d hooks don't fire under `docker run`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
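A hedged sketch of the environment file that first bullet implies (the exact wheel filename below is a guess for illustration and would need checking against the actual index):

```yaml
# environment.yml sketch: pin the full wheel URL under pip: so the index URL
# never leaks into Wave's image tag and conda-resolved deps stay intact.
# The wheel filename is illustrative, not verified.
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pip
  - pip:
      - https://abetlen.github.io/llama-cpp-python/whl/cu124/llama_cpp_python-0.3.16-cp311-cp311-linux_x86_64.whl
```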
This pull request, contributed jointly with @lucacozzuto, provides a simple workload for running text inference tasks using llamacpp-python against local LLMs.
This effort was worked on during the nf-core Hackathon in March 2026.
PR checklist
- Closes #XXX
- `topic: versions` - see `version_topics` label
- `nf-core modules test <MODULE> --profile docker`
- `nf-core modules test <MODULE> --profile singularity`
- `nf-core modules test <MODULE> --profile conda`
- `nf-core subworkflows test <SUBWORKFLOW> --profile docker`
- `nf-core subworkflows test <SUBWORKFLOW> --profile singularity`
- `nf-core subworkflows test <SUBWORKFLOW> --profile conda`