Field Report: MuseTalk V1.5 working on RTX 5060 Ti (Blackwell sm_120) with Python 3.12 + mediapipe patch

## Summary

I got MuseTalk V1.5 running end-to-end on an **RTX 5060 Ti** (Blackwell, sm_120, 16 GB VRAM) with **Python 3.12** on Ubuntu 24.04. I'm sharing the findings because Blackwell GPU + Python 3.12 is a common setup that currently doesn't work out of the box.

## Two Issues & Solutions

### 1. PyTorch + Blackwell (sm_120)

MuseTalk recommends PyTorch 2.0.1+cu118, but Blackwell GPUs need a **cu128+** build for native sm_120 kernel support. These are the builds I tried:

| PyTorch | CUDA | sm_120? |
|---------|------|---------|
| 2.6.0+cu124 | 12.4 | ❌ `no kernel image` errors |
| 2.10.0+cu126 | 12.6 | ❌ Same errors |
| **2.10.0+cu128** | **12.8** | ✅ **Works!** |

```bash
pip install torch==2.10.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
```
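To confirm that an installed build actually ships sm_120 kernels before running MuseTalk, a quick check along these lines may help. This is a minimal sketch: `supports_capability` and `check_local_gpu` are hypothetical helper names, and the check only looks for an exact `sm_XY` entry in the list returned by `torch.cuda.get_arch_list()`.

```python
def supports_capability(arch_list, capability):
    """Return True if arch_list contains native kernels for (major, minor)."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def check_local_gpu():
    """Check GPU 0 against the installed PyTorch build.

    Returns None when PyTorch or a CUDA device is unavailable.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    capability = torch.cuda.get_device_capability(0)  # e.g. (12, 0) on sm_120
    return supports_capability(torch.cuda.get_arch_list(), capability)


if __name__ == "__main__":
    print("native kernels for this GPU:", check_local_gpu())
```

If this prints `False`, the build was compiled without kernels for the card, which would be consistent with the `no kernel image` errors in the table above.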

### 2. mmpose/mmcv on Python 3.12

I found **no pre-built mmcv wheels for Python 3.12** on any CUDA index, and building from source fails on a missing `pkg_resources`. (`pkg_resources` ships with setuptools, which Python 3.12 no longer pre-installs in virtual environments.)

**Workaround:** Replace mmpose with **mediapipe** (Tasks API) + **face_alignment** in `musetalk/utils/preprocessing.py`:

- `pip install mediapipe face-alignment`
- Use mediapipe's 478-point face mesh instead of mmpose's wholebody model
- Map mediapipe landmarks → MuseTalk's nose bridge indices (28-30):
  - `face_lm[28] = pts_478[6]` (nose bridge top)
  - `face_lm[29] = pts_478[197]` (nose bridge mid)
  - `face_lm[30] = pts_478[195]` (nose bridge lower)
- Use `fa.get_landmarks_from_image()` instead of deprecated `get_detections_for_batch()`
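The landmark mapping above can be sketched as follows. This is an illustration under assumptions, not the actual patch: `normalized_to_pixels` and `apply_nose_bridge` are hypothetical helper names, and the sketch assumes the mediapipe points arrive as normalized `(x, y)` pairs in `[0, 1]` that need scaling to pixel coordinates before they are mixed with the `face_alignment` landmarks.

```python
# MuseTalk landmark index -> mediapipe 478-point mesh index (from the list above)
NOSE_BRIDGE_MAP = {28: 6, 29: 197, 30: 195}


def normalized_to_pixels(points, width, height):
    """Scale normalized (x, y) pairs to pixel coordinates."""
    return [(x * width, y * height) for x, y in points]


def apply_nose_bridge(face_lm, pts_478):
    """Return a copy of face_lm with indices 28-30 taken from the mediapipe mesh.

    Works with lists or NumPy arrays; the input face_lm is left unmodified.
    """
    if len(pts_478) != 478:
        raise ValueError(f"expected 478 mesh points, got {len(pts_478)}")
    out = face_lm.copy()
    for dst, src in NOSE_BRIDGE_MAP.items():
        out[dst] = pts_478[src]
    return out
```

Keeping the index pairs in one dictionary makes it easy to adjust the mapping if a different mesh point turns out to track the nose bridge better.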

### Performance

- 7 s of audio → ~30 s of inference → MP4 output (about 4.3 s of compute per second of audio)
- ~3.6 GB VRAM for the MuseTalk models
- Face detection on the reference image is cached after the first call
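The caching mentioned in the last bullet could look roughly like this. It is only a sketch of the idea: `detect_face_bbox` is a hypothetical stand-in for the real detector call, and its return value is a placeholder.

```python
from functools import lru_cache


def detect_face_bbox(image_path):
    """Hypothetical stand-in for the expensive face detection call."""
    detect_face_bbox.calls += 1
    return (0, 0, 256, 256)  # placeholder bounding box, not a real detection


detect_face_bbox.calls = 0  # counter to show how often the detector really runs


@lru_cache(maxsize=8)
def cached_face_bbox(image_path):
    """Run detection once per path and reuse the result on later calls."""
    return detect_face_bbox(image_path)
```

Note that a cache keyed on the path will return a stale result if the file at that path is replaced, so a content hash may be a safer key in a long-running service.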

## Suggestion

Consider adding mediapipe as an alternative preprocessing backend for users on Python 3.12+ where mmpose isn't installable. Happy to contribute a PR if there's interest.

**Setup:** RTX 5060 Ti 16GB | Ubuntu 24.04 | Python 3.12.3 | PyTorch 2.10.0+cu128 | MuseTalk V1.5
