
[RFC] Fine-tuning on Intel hardware in the PyTorch ecosystem — where to invest? #773

@DamianSzwichtenberg

Description


Motivation

We (Intel) want to enable fine-tuning on Intel GPU hardware within the PyTorch ecosystem. We started with torchforge and have been contributing device-agnostic improvements. Now we need to understand where to focus next.

Work completed

| What | Status | Description |
| --- | --- | --- |
| #749 — Make SFT hardware-agnostic | Merged | Replaced `torch.cuda.*` with `torch.accelerator.*`; introduced `DeviceProxy` for device counting and env var mapping across backends. Updated tests accordingly. |
| #760 — XPU: add install script and docs | Open (awaiting review) | Adds `scripts/install_xpu.sh`, following the `install_rocm.sh` pattern. |
| #759 — SFT checkpoint bug | Open | Bug found during testing; reported upstream. |
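To make the `DeviceProxy` idea from #749 concrete, here is a minimal, hypothetical sketch of what mapping backend-specific environment variables and device counting behind one interface can look like. The class name matches the PR description, but the internals, the `_BACKEND_ENV_VARS` table, and the method names are illustrative assumptions, not the actual torchforge implementation.

```python
import os

# Backend -> device-visibility environment variable.
# CUDA_VISIBLE_DEVICES is the standard CUDA variable; ZE_AFFINITY_MASK
# is the Level Zero variable used for Intel GPUs.
_BACKEND_ENV_VARS = {
    "cuda": "CUDA_VISIBLE_DEVICES",
    "xpu": "ZE_AFFINITY_MASK",
}

class DeviceProxy:
    """Hypothetical device-agnostic facade (names illustrative only)."""

    def __init__(self, backend: str):
        if backend not in _BACKEND_ENV_VARS:
            raise ValueError(f"unsupported backend: {backend}")
        self.backend = backend

    @property
    def visible_devices_env_var(self) -> str:
        # Resolve the backend-specific visibility variable so callers
        # never hard-code CUDA_VISIBLE_DEVICES.
        return _BACKEND_ENV_VARS[self.backend]

    def device_count(self) -> int:
        # Count devices from the env var when set; a real implementation
        # would fall back to torch.accelerator.device_count().
        value = os.environ.get(self.visible_devices_env_var)
        if value:
            return len([d for d in value.split(",") if d])
        return 0
```

Trainer code would then ask the proxy for the variable name or count instead of touching `torch.cuda.*` directly, which is the device-agnostic pattern #749 describes.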

Next goal: GRPO — but where?

Our next target is enabling the GRPO workflow on Intel hardware. GRPO has a deeper stack than SFT — it relies on Monarch actors, TorchStore (RDMA-based weight sync), and vLLM. Some of these have CUDA-specific paths.

The question is where this work should land. We've noticed that torchforge activity has slowed down, while torchtitan has added its own RL support via experiments/rl — an alternative GRPO implementation using the same core dependencies (Monarch, TorchStore, vLLM) but directly within torchtitan.

We'd like to understand: is torchforge still the right place to invest, or should we shift our fine-tuning enablement efforts toward torchtitan?

Any guidance from maintainers would be greatly appreciated.

/cc @felipemello1
