Summary
Got MuseTalk V1.5 running end-to-end on an RTX 5060 Ti (Blackwell, sm_120, 16GB VRAM) with Python 3.12 on Ubuntu 24.04. Sharing findings for the community since Blackwell GPU + Python 3.12 is a common setup that currently doesn't work out of the box.
Two Issues & Solutions
1. PyTorch + Blackwell (sm_120)
MuseTalk recommends PyTorch 2.0.1+cu118, but Blackwell GPUs need cu128+ for native sm_120 kernel support.
| PyTorch | CUDA | sm_120? |
| --- | --- | --- |
| 2.6.0+cu124 | 12.4 | ❌ no kernel image errors |
| 2.10.0+cu126 | 12.6 | ❌ Same errors |
| 2.10.0+cu128 | 12.8 | ✅ Works! |
pip install torch==2.10.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
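To confirm that an installed build actually ships sm_120 kernels before running MuseTalk, a quick check along these lines may help. This is a minimal sketch: `arch_tag`, `build_supports`, and `check_installed_torch` are hypothetical helper names, and the check only looks for a native kernel tag in `torch.cuda.get_arch_list()` (it ignores PTX forward compatibility).

```python
def arch_tag(capability):
    """Turn a (major, minor) compute capability into an 'sm_XY' tag."""
    major, minor = capability
    return f"sm_{major}{minor}"


def build_supports(arch_list, capability):
    """True if the build's arch list contains a native kernel for the device."""
    return arch_tag(capability) in arch_list


def check_installed_torch():
    """Check the local PyTorch build against GPU 0; None if CUDA is unavailable."""
    import torch  # imported lazily so the helpers above work without torch

    if not torch.cuda.is_available():
        return None
    return build_supports(
        torch.cuda.get_arch_list(),           # e.g. ['sm_80', ..., 'sm_120']
        torch.cuda.get_device_capability(0),  # e.g. (12, 0) on Blackwell
    )


if __name__ == "__main__":
    try:
        print("native kernels for this GPU:", check_installed_torch())
    except ImportError:
        print("torch is not installed in this environment")
```

If this reports `False` on a Blackwell card, the installed wheel was most likely built without sm_120, which matches the "no kernel image" failures in the table.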
2. mmpose/mmcv on Python 3.12
mmcv has no pre-built wheels for Python 3.12 on any CUDA index. Building from source fails because pkg_resources is no longer available by default in Python 3.12 (virtual environments no longer come with setuptools preinstalled).
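A quick way to see whether an environment is affected is to check if `pkg_resources` can be imported at all. This is only a diagnostic sketch; `has_pkg_resources` is a hypothetical helper name, and a `True` result does not guarantee that the mmcv source build will succeed.

```python
import sys
from importlib.util import find_spec


def has_pkg_resources():
    """True if pkg_resources (shipped with setuptools) is importable here."""
    return find_spec("pkg_resources") is not None


if __name__ == "__main__":
    version = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Python {version}, pkg_resources available: {has_pkg_resources()}")
```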
Workaround: Replace mmpose with mediapipe (Tasks API) + face_alignment in musetalk/utils/preprocessing.py:
pip install mediapipe face-alignment
- Use mediapipe's 478-point face mesh instead of mmpose's wholebody model
- Map mediapipe landmarks → MuseTalk's nose bridge indices (28-30):
face_lm[28] = pts_478[6] (nose bridge top)
face_lm[29] = pts_478[197] (nose bridge mid)
face_lm[30] = pts_478[195] (nose bridge lower)
- Use fa.get_landmarks_from_image() instead of deprecated get_detections_for_batch()
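The landmark mapping above can be sketched in plain Python as follows. This is an illustration, not the actual patch to preprocessing.py: `NOSE_BRIDGE_MAP`, `to_pixels`, and `nose_bridge_landmarks` are hypothetical names, and the input is assumed to be a sequence of 478 `(x, y)` pairs already extracted from the mediapipe result. mediapipe reports normalized coordinates, so they are scaled to pixels first.

```python
# Mapping from the write-up: MuseTalk landmark index -> mediapipe mesh index.
NOSE_BRIDGE_MAP = {28: 6, 29: 197, 30: 195}


def to_pixels(normalized_pts, width, height):
    """Scale normalized (x, y) pairs in [0, 1] to pixel coordinates."""
    return [(x * width, y * height) for x, y in normalized_pts]


def nose_bridge_landmarks(pts_478):
    """Return {musetalk_index: (x, y)} for the three nose bridge points."""
    if len(pts_478) != 478:
        raise ValueError(f"expected 478 mesh points, got {len(pts_478)}")
    return {dst: pts_478[src] for dst, src in NOSE_BRIDGE_MAP.items()}
```

Keeping the index pairs in one dictionary makes it easy to adjust the mapping if a different mesh point turns out to align better with MuseTalk's crop.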
Performance
- 7 s of audio → 30 s of inference → MP4 output
- ~3.6GB VRAM for MuseTalk models
- Reference image face detection cached after first call
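The caching mentioned in the last point could look roughly like this. It is a sketch under assumptions, not the code used here: `detect_face` is a hypothetical stand-in for the real detector call, the returned box is a dummy placeholder, and the cache is keyed on the image path, so it would not notice a file that changes on disk.

```python
from functools import lru_cache


def detect_face(image_path):
    """Hypothetical stand-in for the expensive face detection call."""
    detect_face.calls += 1
    return (0, 0, 100, 100)  # placeholder bounding box, not real output


detect_face.calls = 0  # counter to show how often the detector really runs


@lru_cache(maxsize=8)
def cached_face_box(image_path):
    """Run detection once per image path and reuse the result afterwards."""
    return detect_face(image_path)
```

Calling `cached_face_box` twice with the same path runs the detector only once; a different path triggers a new detection.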
Suggestion
Consider adding mediapipe as an alternative preprocessing backend for users on Python 3.12+ where mmpose isn't installable. Happy to contribute a PR if there's interest.
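One possible shape for such a backend switch, sketched with hypothetical names (`module_available`, `choose_backend`), is to prefer mmpose when it is installed and fall back to mediapipe otherwise. How MuseTalk would wire this into preprocessing.py is left open.

```python
from importlib.util import find_spec


def module_available(name):
    """True if a top-level module with this name can be imported."""
    return find_spec(name) is not None


def choose_backend(preferred=("mmpose", "mediapipe"), is_available=module_available):
    """Return the first installed backend name from the preference order."""
    for name in preferred:
        if is_available(name):
            return name
    raise RuntimeError(f"none of {preferred} is installed")


if __name__ == "__main__":
    try:
        print("preprocessing backend:", choose_backend())
    except RuntimeError as err:
        print(err)
```

Passing `is_available` as a parameter keeps the selection logic testable without installing either package.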
Setup: RTX 5060 Ti 16GB | Ubuntu 24.04 | Python 3.12.3 | PyTorch 2.10.0+cu128 | MuseTalk V1.5