
[amdgpu] LLVM 20 updates for AMD MI3xx GPUs#8793

Open
tmm77 wants to merge 52 commits into taichi-dev:master from ROCm:amd-integration

Conversation


tmm77 commented Apr 15, 2026

Issue: #

Brief Summary

These code changes update LLVM to version 20 for AMD GPU code generation to enable Taichi on MI300X, MI325X, and MI355X.


Note

High Risk: this PR updates LLVM/JIT codegen paths across the AMDGPU/CUDA/CPU/DX12 backends (new pass manager, opaque pointers, intrinsics changes), which can affect correctness and performance on multiple platforms and toolchains.

Overview
LLVM 20 enablement (ROCm-focused): Updates build and CI tooling to recognize Clang/LLVM up to v20 and adjusts the workflow scripts to prefer OS-provided LLVM/CUDA paths, with special handling for AMDGPU/ROCm builds.

Codegen/JIT modernization: Migrates multiple backends (AMDGPU, CUDA, CPU, DX12) away from legacy LLVM pass managers and typed pointers toward newer APIs (NPM PassBuilder, opaque pointer usage, updated Host/Triple headers, and CUDA ldg replacement). Includes LLVM-version conditionals to keep compatibility across LLVM 16–20.
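For context, the new-pass-manager setup that such a migration targets generally follows standard LLVM 17+ boilerplate. The sketch below is generic, not the PR's exact code; `module` stands for any `llvm::Module` and the optimization level is chosen arbitrarily:

```cpp
#include "llvm/IR/PassManager.h"
#include "llvm/Passes/PassBuilder.h"

// Generic NPM boilerplate: all four analysis managers must be created and
// cross-registered with the PassBuilder before a pipeline can be built.
void run_default_pipeline(llvm::Module &module) {
  llvm::PassBuilder PB;
  llvm::LoopAnalysisManager LAM;
  llvm::FunctionAnalysisManager FAM;
  llvm::CGSCCAnalysisManager CGAM;
  llvm::ModuleAnalysisManager MAM;
  PB.registerModuleAnalyses(MAM);
  PB.registerCGSCCAnalyses(CGAM);
  PB.registerFunctionAnalyses(FAM);
  PB.registerLoopAnalyses(LAM);
  PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

  // Replaces the legacy PassManagerBuilder-driven pipeline.
  llvm::ModulePassManager MPM =
      PB.buildPerModuleDefaultPipeline(llvm::OptimizationLevel::O2);
  MPM.run(module, MAM);
}
```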

API + usability additions: Adds erf/erfc as first-class unary ops from Python through IR and LLVM codegen, tweaks the microbenchmark runner to support --arch amdgpu and selecting a single plan, and adds a multi-stage Dockerfile.rocm plus a new Sphinx/ReadTheDocs documentation set covering ROCm installation and examples.

Reviewed by Cursor Bugbot for commit f47d1b8. Bugbot is set up for automated code reviews on this repo. Configure here.

tmm77 and others added 30 commits April 29, 2025 17:14
Parameterize microbenchmarks and vulkan sdk update
fix: Patch to avoid the need to fetch source to build Taichi wheel
Taichi Dockerfile
Co-authored-by: Bhavesh Lad <Bhavesh.Lad@amd.com>
Co-authored-by: Tiffany Mintz <tiffany.mintz@amd.com>
…TX handling, and implement new pass manager setup
From the johnnynunez/taichi master branch; some of these changes were captured in the previous commit to rocm/taichi

cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 4 potential issues.


// but to insert passes in the middle, we construct it manually. A simpler way is to
// use `parsePassPipeline`. For now, we build the default pipeline first.
if (config.opt_level > 0) {
  MPM = PB.buildPerModuleDefaultPipeline(opt_level);

DX12 intrinsic lowering pass lost on reassignment

High Severity

When config.opt_level > 0, MPM is reassigned via MPM = PB.buildPerModuleDefaultPipeline(opt_level), which completely discards the previously added createTaichiIntrinsicLowerPass. The original code added this pass first, then populated optimization passes on the same manager. Now the intrinsic lowering pass never runs for DX12 when optimizations are enabled.
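One possible fix sketch, using the surrounding names from the snippet above (`MPM`, `PB`, `config.opt_level`, `opt_level`) and a hypothetical NPM-style `TaichiIntrinsicLowerPass` standing in for the pass that `createTaichiIntrinsicLowerPass` builds: since a `ModulePassManager` is itself a module pass, the default pipeline can be nested after the lowering pass rather than overwriting the manager.

```cpp
#include "llvm/IR/PassManager.h"
#include "llvm/Passes/PassBuilder.h"

llvm::ModulePassManager MPM;
// Intrinsic lowering must run first, as in the original legacy-PM code.
MPM.addPass(TaichiIntrinsicLowerPass());  // hypothetical NPM pass name
if (config.opt_level > 0) {
  // Append the default pipeline as a nested pass manager instead of
  // reassigning MPM, so the lowering pass above is not discarded.
  MPM.addPass(PB.buildPerModuleDefaultPipeline(opt_level));
}
```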



machine_gen_gcn->registerPassBuilderCallbacks(module_gen_gcn_pass_manager);

builder.run(*module_clone, MAM);

AMDGPU GCN output empty for LLVM 17+

Medium Severity

In the print_kernel_amdgcn path for LLVM_VERSION_MAJOR >= 17, the code sets up a new pass manager and runs optimization passes on the cloned module, but never calls addPassesToEmitFile to write assembly to llvm_stream_gcn. The gcnstr buffer remains empty, so the written GCN file will contain no content. The legacy path correctly emits assembly via addPassesToEmitFile.
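A hedged sketch of the missing emission step, assuming the names from the snippet above (`machine_gen_gcn`, `llvm_stream_gcn`, `module_clone`): even under the new pass manager, assembly emission still goes through `TargetMachine::addPassesToEmitFile` on a legacy pass manager, which has no NPM equivalent.

```cpp
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Target/TargetMachine.h"

// After the NPM optimization pipeline has run on module_clone, emit GCN
// assembly into llvm_stream_gcn via a legacy pass manager, mirroring the
// legacy path. (CodeGenFileType::AssemblyFile is the LLVM 18+ spelling;
// older releases use CGFT_AssemblyFile.)
llvm::legacy::PassManager emit_pm;
if (machine_gen_gcn->addPassesToEmitFile(
        emit_pm, llvm_stream_gcn, /*DwoOut=*/nullptr,
        llvm::CodeGenFileType::AssemblyFile)) {
  // Target does not support assembly emission; report and bail out.
}
emit_pm.run(*module_clone);
llvm_stream_gcn.flush();
```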


if ((u.system, u.machine) not in (("Linux", "arm64"), ("Linux", "aarch64"))) and not (cmake_args.get_effective("TI_WITH_AMDGPU")):
os.environ["LLVM_DIR"] = "/usr/lib/llvm-20/cmake"
os.environ["CUDA_HOME"] = "/usr/local/cuda"
os.environ["CPATH"] = "/usr/local/cuda/include"

LLVM_DIR hardcoded to Linux path for all platforms

Medium Severity

The final LLVM_DIR assignment unconditionally sets it to /usr/lib/llvm-20/cmake for all non-ARM-Linux, non-AMDGPU platforms, including macOS and Windows. The original code used str(out) which pointed to the platform-specific downloaded LLVM path. This overwrites the correct out-based paths for Darwin and Windows, breaking LLVM discovery on those platforms. Similarly, CUDA_HOME and CPATH are set to Linux-specific paths.


Comment thread: docs/conf.py
f.read())
if not match:
raise ValueError("VERSION not found!")
version_number = match[1]

Docs conf.py searches for nonexistent CMake function

Medium Severity

The docs/conf.py searches for rocm_setup_version(VERSION ...) in CMakeLists.txt, but the project's CMakeLists.txt does not contain this function call. This causes a ValueError("VERSION not found!") to be raised every time the documentation is built, completely breaking the docs build pipeline.


tmm77 changed the title from "LLVM 20 updates for AMD MI3xx GPUs" to "[amdgpu] LLVM 20 updates for AMD MI3xx GPUs" on Apr 16, 2026
6 participants