Merge mbridge distillation for any_model #1036
danielkorzekwa wants to merge 1 commit into dkorzekwa/anymodel_tutorial from
Conversation
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Pretty much everything in this PR seems like we should instead merge to M-Bridge. Are we confident enough to upstream these changes?
```python
# Prepare output directory
output_dir = tmp_path / "distill_output"
output_dir.mkdir(parents=True, exist_ok=True)

# Prepare HF export directory
hf_export_dir = tmp_path / "hf_export"
hf_export_dir.mkdir(parents=True, exist_ok=True)
```
I don't think we need to prepare the output directories, as the script already creates them.
```python
nproc_per_node = 1
tp_size = 1
```
Why not use all available GPUs?
```python
"--master-addr",
"127.0.0.1",  # Explicitly set master address
"--master-port",
str(get_free_port()),  # Pass port directly to torchrun to avoid conflicts
```
Is this necessary? I've never had the need to set these manually.
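For context, the `get_free_port()` call in the diff presumably asks the OS for an unused ephemeral port so that parallel test runs don't collide on torchrun's rendezvous port. A minimal sketch of such a helper (the name comes from the diff; this implementation is an assumption, not the PR's actual code):

```python
import socket


def get_free_port() -> int:
    """Hypothetical sketch: bind to port 0 so the kernel picks a free
    ephemeral port, record it, and release it for torchrun to reuse."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 -> OS assigns any free port
        return s.getsockname()[1]
```

Note there is a small race window between releasing the port and torchrun binding it, which is usually acceptable in tests.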
```python
"--student_hf_path",
student_hf_path,
"--teacher_hf_path",
teacher_hf_path,
"--output_dir",
str(output_dir),
"--hf-export-path",
str(hf_export_dir),  # Export to HuggingFace format
"--hf-model",
"meta-llama/Llama-3.1-8B-Instruct",  # Note: uses hyphen, not underscore
"--tp_size",
str(tp_size),
"--pp_size",
```
We can use the `extend_cmd_parts` util to simplify the command-line arguments:

```python
from _test_utils.examples.run_command import run_example_command, extend_cmd_parts

cmd_parts = extend_cmd_parts(
    ["torchrun", f"--nproc_per_node={nproc_per_node}", "distill_hf.py", "--use_mock_data"],
    student_hf_path=student_hf_path,
    teacher_hf_path=teacher_hf_path,
    output_dir=output_dir,
    seq_length=128,
    ...
)
run_example_command(cmd_parts, "puzzletron/mbridge_distillation")
```

It will also handle string conversion of arguments where needed.
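For readers outside this repo, the behavior described above roughly amounts to appending each keyword argument as a `--flag value` pair with values coerced to strings. A hypothetical sketch (the real `_test_utils` helper may differ in flag naming and ordering):

```python
def extend_cmd_parts(cmd_parts: list[str], **kwargs) -> list[str]:
    """Hypothetical sketch of an extend_cmd_parts-style util: append each
    kwarg as a --key/value pair, converting values (Path, int, ...) to str."""
    parts = list(cmd_parts)
    for key, value in kwargs.items():
        parts.append(f"--{key}")
        parts.append(str(value))  # handles the string conversion mentioned above
    return parts
```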
What does this PR do?
Merge anymodel mbridge distillation