Hi @jonhue, nice work and thanks for open-sourcing the code!
I also ran into package conflicts (on H100) when following the instructions. Looking at #30, it seems that `verl:vllm017.latest` works for people. However, I still get the assertion error below, possibly due to a mismatch between the SDPO infra and vllm017:
AssertionError: local_world_size (2) must be less than or equal to the number of visible devices (1).
In my case, simply using `docker pull verlai/verl:vllm012.latest` instead makes training work on H100. Hope this helps folks using H100s :)
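In case it helps with debugging, here is a rough sketch for checking how many GPUs a process is allowed to see before launching. It only mirrors the condition in the error message above; it is not the actual check inside vLLM or verl, and the helper names (`visible_device_count`, `fits_world_size`) are made up for illustration. It also assumes a plain comma-separated `CUDA_VISIBLE_DEVICES`, and it cannot tell how many GPUs exist when the variable is unset.

```python
import os


def visible_device_count(env=None):
    """Count entries in CUDA_VISIBLE_DEVICES; return None if it is unset."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        # Unset means no restriction, so the count is unknown from here.
        return None
    # Simplified parsing: one device per non-empty comma-separated entry.
    return len([entry for entry in raw.split(",") if entry.strip()])


def fits_world_size(local_world_size, env=None):
    """Return True if the requested local world size fits the visible devices."""
    count = visible_device_count(env)
    if count is None:
        return True  # nothing to compare against
    return local_world_size <= count


if __name__ == "__main__":
    print("visible devices:", visible_device_count())
    print("world size 2 fits:", fits_world_size(2))
```

If this reports a single visible device inside a worker while two are requested, the worker's device visibility may have been narrowed somewhere in the launch stack, which would be consistent with the error, though I have not confirmed that this is the cause here.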