Motivation
We (Intel) want to enable fine-tuning on Intel GPU hardware within the PyTorch ecosystem. We started with torchforge and have been contributing device-agnostic improvements. Now we need to understand where to focus next.
Work completed
- Replaced torch.cuda.* with torch.accelerator.* and introduced DeviceProxy for device counting and environment-variable mapping across backends; updated tests accordingly.
- Added scripts/install_xpu.sh following the install_rocm.sh pattern.
Next goal: GRPO — but where?
Our next target is enabling the GRPO workflow on Intel hardware. GRPO has a deeper stack than SFT — it relies on Monarch actors, TorchStore (RDMA-based weight sync), and vLLM. Some of these have CUDA-specific paths.
The question is where this work should land. We've noticed that torchforge activity has slowed down, while torchtitan has added its own RL support via experiments/rl — an alternative GRPO implementation using the same core dependencies (Monarch, TorchStore, vLLM) but directly within torchtitan.
We'd like to understand: is torchforge still the right place to invest, or should we shift our fine-tuning enablement efforts toward torchtitan?
Any guidance from maintainers would be greatly appreciated.
/cc @felipemello1