
SpecDec Bench Tweaks #939

Open
benchislett wants to merge 8 commits into main from bchislett/specbench-tweaks

Conversation

@benchislett
Contributor

@benchislett benchislett commented Feb 26, 2026

What does this PR do?

  • Add HumanEval to SpecDec Bench
    • Skip the first token, since prefill doesn't draft tokens. This matters for HumanEval because completions are short, so even one step can affect the average AR. Also skip the last token, since EOS interrupts speculation
  • Add parallel drafting support, which vLLM uses e.g. for P-EAGLE
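To illustrate why trimming matters for short completions, here is a toy average-acceptance calculation (the numbers are invented for illustration; they are not from the benchmark):

```python
# Toy per-turn acceptance counts for a short HumanEval-style completion.
# Each entry is the number of tokens produced in one decoding step:
# the first entry (prefill) drafts nothing, the last is cut short by EOS.
turn = [1, 4, 4, 4, 2]

raw_ar = sum(turn) / len(turn)  # includes prefill and the EOS-truncated step

trimmed = turn[1:] if turn[0] <= 1 else turn             # drop prefill step
trimmed = trimmed[:-1] if len(trimmed) > 1 else trimmed  # drop EOS-truncated step
trimmed_ar = sum(trimmed) / len(trimmed)

print(raw_ar, trimmed_ar)  # 3.0 vs 4.0 on this toy sequence
```

On a 256-token completion the two extra entries barely move the average; on a short HumanEval completion they distort it noticeably, which is the motivation stated above.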

Example usage:

mkdir results
python3 run.py \
  --model_dir openai/gpt-oss-20b \
  --tokenizer openai/gpt-oss-20b \
  --draft_model_dir amazon/GPT-OSS-20B-P-EAGLE \
  --dataset humaneval \
  --dataset_path openai/openai_humaneval \
  --tp_size 1 --ep_size 1 \
  --draft_length 15 --output_length 256 \
  --engine VLLM --concurrency 1 --num_requests 164 \
  --show_progress --speculative_algorithm EAGLE3 \
  --postprocess gptoss --save_dir results/ \
  --parallel_drafting > results/log.txt 2>&1

Summary by CodeRabbit

  • New Features

    • HumanEval dataset is now available for use in benchmarks
    • Added a --parallel_drafting command-line option to enable parallel drafting during model inference
  • Bug Fixes

    • Acceptance-rate metric preprocessing refined to trim extraneous tokens for more accurate per-turn measurements

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
@benchislett benchislett requested a review from a team as a code owner February 26, 2026 15:03
@coderabbitai
Contributor

coderabbitai bot commented Feb 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 79292c5e-f888-4a89-bbb1-8d5875a2cb14

📥 Commits

Reviewing files that changed from the base of the PR and between 6c5c507 and 8815a70.

📒 Files selected for processing (2)
  • examples/specdec_bench/specdec_bench/datasets/humaneval.py
  • examples/specdec_bench/specdec_bench/models/vllm.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • examples/specdec_bench/specdec_bench/models/vllm.py
  • examples/specdec_bench/specdec_bench/datasets/humaneval.py

📝 Walkthrough

Walkthrough

Adds HumanEval dataset support, a --parallel_drafting CLI flag wired into model/speculative-decoding, and trims turn data in acceptance rate computation before per-turn calculations.

Changes

Cohort / File(s) / Summary

  • HumanEval dataset — examples/specdec_bench/specdec_bench/datasets/humaneval.py, examples/specdec_bench/specdec_bench/datasets/__init__.py: New HumanEval Dataset adapter, a format_prompt helper, and export of HumanEval in the package __all__. Loads the dataset, builds Request objects, and limits samples by num_samples.
  • CLI & wiring — examples/specdec_bench/run.py: Added the --parallel_drafting CLI flag, registered "humaneval": datasets.HumanEval in the available datasets, and passed parallel_drafting=args.parallel_drafting to the model constructor.
  • Model / speculative decoding — examples/specdec_bench/specdec_bench/models/vllm.py: Propagates parallel_drafting into the speculative decoding config when present (sets specdec["parallel_drafting"] = True if kwargs.get("parallel_drafting") is truthy and specdec exists).
  • Metrics adjustment — examples/specdec_bench/specdec_bench/metrics/acceptance_rate.py: process_final now trims per-turn sequences: it drops the prefill entry when the first element is <= 1 and removes the final element (EOS) before computing the per-turn acceptance rate and histogram lengths.
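The per-turn trimming described above can be sketched as follows (trim_turn is a hypothetical helper name for illustration; the real logic sits inline in process_final):

```python
def trim_turn(turn: list[int]) -> list[int]:
    """Drop the prefill and EOS-truncated steps before computing acceptance rate.

    `turn` holds the number of tokens emitted per decoding step.
    """
    if len(turn) > 1 and turn[0] <= 1:
        turn = turn[1:]   # prefill step drafts no tokens
    if len(turn) > 1:
        turn = turn[:-1]  # final step is truncated by EOS/stop
    return turn

print(trim_turn([1, 4, 4, 2]))  # [4, 4]
```

Both guards keep at least one element, so a single-step turn is never trimmed to an empty list.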

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI (run.py)
    participant Dataset as HumanEval dataset loader
    participant Model as Model constructor (vllm)
    participant SpecDec as SpeculativeDecoding config
    CLI->>Dataset: select dataset "humaneval" and load samples
    Dataset-->>CLI: list of Request objects
    CLI->>Model: instantiate with parallel_drafting flag
    Model->>SpecDec: if specdec present and parallel_drafting true
    SpecDec-->>Model: specdec config includes parallel_drafting
    Model-->>CLI: model ready (with speculative decoding config)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check — ❓ Inconclusive: The title 'SpecDec Bench Tweaks' is vague and generic. While it references the SpecDec benchmark, it does not clearly indicate the main changes: HumanEval dataset addition, parallel drafting support, and acceptance rate calculation adjustments. Resolution: use a more specific title that captures the primary changes, such as 'Add HumanEval dataset and parallel drafting support to SpecDec Bench'.

✅ Passed checks (2)

  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Security Anti-Patterns — ✅ Passed: The PR does not introduce any security anti-patterns. The HumanEval dataset loads safely without unsafe deserialization, parallel_drafting is properly configurable, no eval()/exec() on untrusted input is present, and all dependencies use permissive licenses.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
examples/specdec_bench/specdec_bench/datasets/humaneval.py (1)

24-28: Minor: Inaccurate inline comment.

The comment # list of list of questions. is misleading—self.data is actually a list[Request], not a nested list. Consider updating or removing the comment.

📝 Suggested fix
 class HumanEval(Dataset):
     def __init__(self, path, num_samples=164, **kwargs):
-        self.data: list[Request] = []  # list of list of questions.
+        self.data: list[Request] = []
         self.num_samples = num_samples
         self._preprocess(path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/specdec_bench/specdec_bench/datasets/humaneval.py` around lines 24 -
28, Update the inaccurate inline comment on the attribute in class HumanEval:
the field self.data is declared as a list[Request] (not a nested list), so
change or remove the comment "// list of list of questions." to accurately
reflect that self.data is a flat list of Request objects; locate the declaration
in the __init__ of HumanEval and adjust the comment to something like "list of
Request" or remove it entirely.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/specdec_bench/specdec_bench/models/vllm.py`:
- Around line 68-70: The code currently sets specdec["parallel_drafting"] = True
when kwargs.get("parallel_drafting") is truthy but doesn't handle the case where
specdec is None; update the block around kwargs.get("parallel_drafting") to
first ensure specdec is a dict (e.g., if specdec is None, assign an empty dict)
or only perform the item assignment when specdec is not None, so that the
assignment to specdec["parallel_drafting"] cannot raise a TypeError; reference
the specdec variable and the kwargs.get("parallel_drafting") check to locate
where to add the guard/initialization.

---

Nitpick comments:
In `@examples/specdec_bench/specdec_bench/datasets/humaneval.py`:
- Around line 24-28: Update the inaccurate inline comment on the attribute in
class HumanEval: the field self.data is declared as a list[Request] (not a
nested list), so change or remove the comment "// list of list of questions." to
accurately reflect that self.data is a flat list of Request objects; locate the
declaration in the __init__ of HumanEval and adjust the comment to something
like "list of Request" or remove it entirely.
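The None-guard suggested for vllm.py above could look like the following sketch (apply_parallel_drafting is a hypothetical helper for illustration; the real assignment sits inline in the model constructor):

```python
def apply_parallel_drafting(specdec, **kwargs):
    """Guarded version of the specdec["parallel_drafting"] assignment."""
    if kwargs.get("parallel_drafting"):
        if specdec is None:
            specdec = {}  # initialize so the item assignment cannot raise TypeError
        specdec["parallel_drafting"] = True
    return specdec

print(apply_parallel_drafting(None, parallel_drafting=True))
# {'parallel_drafting': True}
```

Alternatively, the assignment can simply be skipped when specdec is None, if parallel drafting is only meaningful alongside an existing speculative config.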

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef5a2df and 6c5c507.

📒 Files selected for processing (5)
  • examples/specdec_bench/run.py
  • examples/specdec_bench/specdec_bench/datasets/__init__.py
  • examples/specdec_bench/specdec_bench/datasets/humaneval.py
  • examples/specdec_bench/specdec_bench/metrics/acceptance_rate.py
  • examples/specdec_bench/specdec_bench/models/vllm.py

@codecov

codecov bot commented Feb 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.09%. Comparing base (1070d89) to head (8815a70).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #939      +/-   ##
==========================================
- Coverage   70.10%   70.09%   -0.02%     
==========================================
  Files         221      221              
  Lines       25541    25541              
==========================================
- Hits        17905    17902       -3     
- Misses       7636     7639       +3     

☔ View full report in Codecov by Sentry.

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Feb 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

if len(turn) > 1 and turn[0] <= 1:
    turn = turn[1:]  # Skip prefill if it is 1 or less, indicating no specdec
if len(turn) > 1:
    turn = turn[:-1]  # Skip final acceptance due to EOS truncating speculation
Contributor

Only skip if EOS is present?

Contributor Author

I think it's reasonable to skip anyway, since truncation for any reason (EOS, length limit, stop token) might misrepresent the AR, because the truncated step is not aligned with the draft size.

