[amdgpu] LLVM 20 updates for AMD MI3xx GPUs #8793
tmm77 wants to merge 52 commits into taichi-dev:master
Parameterize microbenchmarks and vulkan sdk update
fix: Patch to avoid the need to fetch source to build Taichi wheel
Taichi Dockerfile
Co-authored-by: Bhavesh Lad <Bhavesh.Lad@amd.com> Co-authored-by: Tiffany Mintz <tiffany.mintz@amd.com>
Merge latest upstream
Merge master updates
Merge latest Updates
…TX handling, and implement new pass manager setup
Mintz/llvm20 update
Syncing latest release branch with amd-integration branch
Cursor Bugbot has reviewed your changes and found 4 potential issues.
Reviewed by Cursor Bugbot for commit f47d1b8. Configure here.
    // but to insert passes in the middle, we construct it manually. A simpler way is to
    // use `parsePassPipeline`. For now, we build the default pipeline first.
    if (config.opt_level > 0) {
      MPM = PB.buildPerModuleDefaultPipeline(opt_level);
DX12 intrinsic lowering pass lost on reassignment
High Severity
When config.opt_level > 0, MPM is reassigned via MPM = PB.buildPerModuleDefaultPipeline(opt_level), which completely discards the previously added createTaichiIntrinsicLowerPass. The original code added this pass first, then populated optimization passes on the same manager. Now the intrinsic lowering pass never runs for DX12 when optimizations are enabled.
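The failure mode can be modelled outside LLVM. The following is a toy sketch in Python, not the LLVM API: `PassManager`, `build_default_pipeline`, and the pass names are hypothetical stand-ins. It shows why reassigning the manager discards the earlier pass, while appending the default pipeline to the existing manager keeps it. In LLVM's new pass manager the analogous approach would be to add the built pipeline to the existing manager with `addPass` rather than assigning over it.

```python
class PassManager:
    """Toy stand-in for a module pass manager (hypothetical, not LLVM)."""

    def __init__(self, passes=None):
        self.passes = list(passes or [])

    def add_pass(self, p):
        # A nested PassManager is flattened so the final order stays visible.
        if isinstance(p, PassManager):
            self.passes.extend(p.passes)
        else:
            self.passes.append(p)


def build_default_pipeline():
    # Placeholder pass names; the real default pipeline is much longer.
    return PassManager(["inline", "instcombine", "gvn"])


def buggy(opt_level):
    mpm = PassManager()
    mpm.add_pass("taichi-intrinsic-lower")
    if opt_level > 0:
        # Reassignment replaces the manager, dropping the pass added above.
        mpm = build_default_pipeline()
    return mpm.passes


def fixed(opt_level):
    mpm = PassManager()
    mpm.add_pass("taichi-intrinsic-lower")
    if opt_level > 0:
        # Appending keeps the lowering pass ahead of the optimizations.
        mpm.add_pass(build_default_pipeline())
    return mpm.passes


if __name__ == "__main__":
    print("buggy:", buggy(2))
    print("fixed:", fixed(2))
```

With `opt_level == 0` both variants behave the same, which matches the report that the pass is only lost when optimizations are enabled.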
    machine_gen_gcn->registerPassBuilderCallbacks(module_gen_gcn_pass_manager);
    builder.run(*module_clone, MAM);
AMDGPU GCN output empty for LLVM 17+
Medium Severity
In the print_kernel_amdgcn path for LLVM_VERSION_MAJOR >= 17, the code sets up a new pass manager and runs optimization passes on the cloned module, but never calls addPassesToEmitFile to write assembly to llvm_stream_gcn. The gcnstr buffer remains empty, so the written GCN file will contain no content. The legacy path correctly emits assembly via addPassesToEmitFile.
    if ((u.system, u.machine) not in (("Linux", "arm64"), ("Linux", "aarch64"))) and not (cmake_args.get_effective("TI_WITH_AMDGPU")):
        os.environ["LLVM_DIR"] = "/usr/lib/llvm-20/cmake"
        os.environ["CUDA_HOME"] = "/usr/local/cuda"
        os.environ["CPATH"] = "/usr/local/cuda/include"
LLVM_DIR hardcoded to Linux path for all platforms
Medium Severity
The final LLVM_DIR assignment unconditionally sets it to /usr/lib/llvm-20/cmake for all non-ARM-Linux, non-AMDGPU platforms, including macOS and Windows. The original code used str(out) which pointed to the platform-specific downloaded LLVM path. This overwrites the correct out-based paths for Darwin and Windows, breaking LLVM discovery on those platforms. Similarly, CUDA_HOME and CPATH are set to Linux-specific paths.
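One possible shape for a fix is to restrict the hardcoded paths to Linux and fall back to the downloaded LLVM location elsewhere. The sketch below is only an illustration under assumptions: `llvm_env` is a hypothetical helper, `out` stands for the platform-specific LLVM path mentioned in the comment, and returning an empty mapping for ARM Linux and AMDGPU builds assumes those cases are configured elsewhere in the script.

```python
from pathlib import Path

# The two ARM Linux configurations excluded by the original condition.
LINUX_ARM = {("Linux", "arm64"), ("Linux", "aarch64")}


def llvm_env(system, machine, with_amdgpu, out):
    """Return the environment variables to set (hypothetical helper).

    `out` is the platform-specific downloaded LLVM path.
    """
    if with_amdgpu or (system, machine) in LINUX_ARM:
        # Assumption: these configurations are set up elsewhere.
        return {}
    if system == "Linux":
        # OS-provided LLVM and CUDA locations only make sense on Linux.
        return {
            "LLVM_DIR": "/usr/lib/llvm-20/cmake",
            "CUDA_HOME": "/usr/local/cuda",
            "CPATH": "/usr/local/cuda/include",
        }
    # macOS and Windows keep the downloaded LLVM path, as before the change.
    return {"LLVM_DIR": str(out)}


if __name__ == "__main__":
    print(llvm_env("Darwin", "arm64", False, Path("/tmp/llvm")))
```

In the build script this could be applied with `os.environ.update(llvm_env(u.system, u.machine, with_amdgpu, out))`, where `with_amdgpu` is the result of `cmake_args.get_effective("TI_WITH_AMDGPU")`.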
        f.read())
    if not match:
        raise ValueError("VERSION not found!")
    version_number = match[1]
Docs conf.py searches for nonexistent CMake function
Medium Severity
The docs/conf.py searches for rocm_setup_version(VERSION ...) in CMakeLists.txt, but the project's CMakeLists.txt does not contain this function call. This causes a ValueError("VERSION not found!") to be raised every time the documentation is built, completely breaking the docs build pipeline.
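A docs build can tolerate a missing version declaration by falling back to a default instead of raising. The sketch below is an assumption-laden illustration: `find_version` is a hypothetical helper, and the regular expression is a guess at what the `rocm_setup_version(VERSION ...)` search might look like, since the actual pattern in docs/conf.py is not shown here.

```python
import re

# Assumed pattern; the real one in docs/conf.py may differ.
VERSION_RE = re.compile(r"rocm_setup_version\(\s*VERSION\s+([0-9.]+)")


def find_version(cmake_text, default=None):
    """Extract the version from CMake text (hypothetical helper).

    Returns `default` when no match is found; raises only if no
    default was supplied, mirroring the original error.
    """
    match = VERSION_RE.search(cmake_text)
    if match:
        return match[1]
    if default is None:
        raise ValueError("VERSION not found!")
    return default


if __name__ == "__main__":
    print(find_version("rocm_setup_version(VERSION 1.2.3)"))
    print(find_version("project(taichi)", default="unknown"))
```

Passing a default keeps the docs build running when CMakeLists.txt has no such call; omitting it preserves the strict behaviour.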


Issue: #
Brief Summary
These code changes update LLVM to version 20 for AMD GPU code generation to enable Taichi on MI300X, MI325X, and MI355X.
Note
High Risk
High risk because it updates LLVM/JIT codegen paths across AMDGPU/CUDA/CPU/DX12 backends (new pass manager, opaque pointers, intrinsics changes), which can affect correctness and performance on multiple platforms and toolchains.
Overview
LLVM 20 enablement (ROCm-focused): Updates build and CI tooling to recognize Clang/LLVM up to v20 and adjusts the workflow scripts to prefer OS-provided LLVM/CUDA paths, with special handling for AMDGPU/ROCm builds.
Codegen/JIT modernization: Migrates multiple backends (AMDGPU, CUDA, CPU, DX12) away from legacy LLVM pass managers and typed pointers toward newer APIs (NPM PassBuilder, opaque pointer usage, updated Host/Triple headers, and CUDA ldg replacement). Includes LLVM-version conditionals to keep compatibility across LLVM 16–20.
API + usability additions: Adds erf/erfc as first-class unary ops from Python through IR and LLVM codegen, tweaks the microbenchmark runner to support --arch amdgpu and selecting a single plan, and adds a ROCm multi-stage Dockerfile.rocm plus a new Sphinx/ReadTheDocs documentation set for ROCm installation/examples.