Merge mbridge distillation for any_model#1036

Open
danielkorzekwa wants to merge 1 commit into dkorzekwa/anymodel_tutorial from dkorzekwa/anymodel_mbridgedist

Conversation

@danielkorzekwa

What does this PR do?

Merge anymodel mbridge distillation

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
@danielkorzekwa danielkorzekwa requested a review from a team as a code owner March 13, 2026 17:51
@coderabbitai
Contributor

coderabbitai bot commented Mar 13, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (3)
  • main
  • release/.*
  • feature/.*

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c0f420b2-da3a-4690-a92a-03844ee7d19c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

Collaborator

Pretty much everything in this PR seems like it should instead be merged into M-Bridge. Are we confident enough to upstream these changes?

Comment on lines +39 to +45
# Prepare output directory
output_dir = tmp_path / "distill_output"
output_dir.mkdir(parents=True, exist_ok=True)

# Prepare HF export directory
hf_export_dir = tmp_path / "hf_export"
hf_export_dir.mkdir(parents=True, exist_ok=True)
Collaborator

I don't think we need to prepare the output directories here, as the script already creates them.

Comment on lines +49 to +50
nproc_per_node = 1
tp_size = 1
Collaborator

Why not use all available GPUs?
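If the test were to use all available GPUs, `nproc_per_node` could be picked dynamically instead of hard-coded to 1. A minimal sketch (a hypothetical helper, not code from this PR; it shells out to `nvidia-smi` so it degrades gracefully on CPU-only CI hosts):

```python
import subprocess


def pick_nproc_per_node() -> int:
    """Return the number of visible GPUs, falling back to 1 on CPU-only hosts."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--list-gpus"],
            capture_output=True, text=True, check=True,
        )
        n_gpus = len([line for line in out.stdout.splitlines() if line.strip()])
    except (FileNotFoundError, subprocess.CalledProcessError):
        n_gpus = 0  # nvidia-smi missing or failing: assume no GPUs
    return max(n_gpus, 1)
```

The test could then pass `f"--nproc_per_node={pick_nproc_per_node()}"` to torchrun and set `tp_size` accordingly.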

Comment on lines +56 to +59
"--master-addr",
"127.0.0.1", # Explicitly set master address
"--master-port",
str(get_free_port()), # Pass port directly to torchrun to avoid conflicts
Collaborator

Is this necessary? I've never needed to set these manually.
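For context, the `get_free_port()` helper called in the snippet is not shown in this diff. A typical implementation (an assumption on my part; the actual `_test_utils` version may differ) asks the OS for an unused ephemeral port by binding port 0, which is what lets concurrent torchrun-based tests avoid rendezvous-port conflicts:

```python
import socket


def get_free_port() -> int:
    """Bind to port 0 so the OS assigns an unused port, then release it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

Note the small race: the port is released before torchrun binds it, so another process could grab it in between. In practice this is rare enough for test use.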

Comment on lines +61 to +73
"--student_hf_path",
student_hf_path,
"--teacher_hf_path",
teacher_hf_path,
"--output_dir",
str(output_dir),
"--hf-export-path",
str(hf_export_dir), # Export to HuggingFace format
"--hf-model",
"meta-llama/Llama-3.1-8B-Instruct", # Note: uses hyphen, not underscore
"--tp_size",
str(tp_size),
"--pp_size",
Collaborator
@kevalmorabia97 kevalmorabia97 Mar 13, 2026


We can use the extend_cmd_parts util to simplify the command-line arguments:

from _test_utils.examples.run_command import run_example_command, extend_cmd_parts

cmd_parts = extend_cmd_parts(
    ["torchrun", f"--nproc_per_node={nproc_per_node}", "distill_hf.py", "--use_mock_data"],
    student_hf_path=student_hf_path,
    teacher_hf_path=teacher_hf_path,
    output_dir=output_dir,
    seq_length=128,
    ...
)

run_example_command(cmd_parts, "puzzletron/mbridge_distillation")

It will also handle string conversion of arguments where needed
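For readers unfamiliar with the utility, here is a minimal sketch of what an `extend_cmd_parts`-style helper could look like (an illustration of the pattern, not the actual `_test_utils` implementation):

```python
from typing import Any


def extend_cmd_parts(cmd_parts: list[str], **kwargs: Any) -> list[str]:
    """Append each keyword argument as a --name value pair, stringified."""
    parts = list(cmd_parts)
    for name, value in kwargs.items():
        parts.append(f"--{name}")
        parts.append(str(value))  # handles ints, Paths, etc.
    return parts
```

Usage mirrors the suggestion above: keyword arguments become `--name value` pairs in insertion order, so callers never hand-build `str(...)` conversions.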

