[MRV2][Spec Decode] DFlash by benchislett · Pull Request #44586 · vllm-project/vllm

benchislett · 2026-06-04T22:13:27Z

Purpose

Implement DFlash for ModelRunnerV2, with full cudagraph support.

Performance result (profiling traces for MRV1 and MRV2; images not included):

Setup: Qwen3 8B FP8 + DFlash on 1xGB200. MRV2 gives a ~1.2x speedup over MRV1, from 5.2 ms to 4.3 ms per step (measured on an independent benchmark, separate from the profiling run).
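
As a quick sanity check on the quoted figures, the speedup follows directly from the two per-step latencies. The sketch below assumes, as the MRV1/MRV2 ordering above suggests, that 5.2 ms is the MRV1 step time and 4.3 ms is the MRV2 step time; the variable names are illustrative and do not come from the PR.

```python
# Per-step latencies quoted in the PR description, in milliseconds.
# Assumption: 5.2 ms is the MRV1 figure and 4.3 ms is the MRV2 figure.
mrv1_ms_per_step = 5.2
mrv2_ms_per_step = 4.3

# Speedup is the ratio of old step time to new step time.
speedup = mrv1_ms_per_step / mrv2_ms_per_step

# Equivalent view: the fraction of each step's time that was removed.
time_saved_pct = 100.0 * (1.0 - mrv2_ms_per_step / mrv1_ms_per_step)

print(f"speedup: {speedup:.2f}x")
print(f"step time reduced by {time_saved_pct:.1f}%")
```

Under that assumption the ratio rounds to about 1.2x, which matches the figure stated in the description.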

Testing

Ran the existing DFlash correctness tests with MRV1 and MRV2; both pass with identical acceptance rates.

Enabled E2E DFlash correctness and AR regression tests for both MRV1 and MRV2; both pass locally.
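
The "identical acceptance rates" check can be pictured as comparing the fraction of drafted tokens accepted by the target model under each runner. The following is a minimal sketch under stated assumptions, not the test code from this PR: `SpecDecodeCounts` and `same_acceptance` are hypothetical names, and the counts are placeholder values chosen only to make the example run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecDecodeCounts:
    """Hypothetical container for draft-token counters from one run."""

    num_drafted: int   # draft tokens proposed by the speculator
    num_accepted: int  # draft tokens accepted by the target model

    @property
    def acceptance_rate(self) -> float:
        # Guard against division by zero when nothing was drafted.
        if self.num_drafted == 0:
            return 0.0
        return self.num_accepted / self.num_drafted


def same_acceptance(a: SpecDecodeCounts, b: SpecDecodeCounts, tol: float = 0.0) -> bool:
    """Return True if the two runs' acceptance rates differ by at most tol."""
    return abs(a.acceptance_rate - b.acceptance_rate) <= tol


# Placeholder counts, NOT measurements from the PR.
mrv1 = SpecDecodeCounts(num_drafted=1000, num_accepted=640)
mrv2 = SpecDecodeCounts(num_drafted=1000, num_accepted=640)

print(same_acceptance(mrv1, mrv2))
```

A tolerance of zero mirrors the "identical" wording; a regression test that has to absorb sampling noise would more plausibly use a small nonzero tolerance.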

Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

mergify · 2026-06-04T22:14:42Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @benchislett.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2026-06-04T22:24:24Z

Hi @benchislett, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

TheEpicDolphin

This code looks good overall. I left a few suggestions to reduce code duplication, and flagged a potential perf improvement.

TheEpicDolphin

Looks good to me, thank you! cc: @WoosukKwon

Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai> Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai>

Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai> Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai> Signed-off-by: divineearthly <divineearthly@gmail.com>

TheEpicDolphin and others added 7 commits June 2, 2026 17:18

[Model Runner V2][Spec Decode] Add Gemma4 MTP support

68b97a8

Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>

extract ModelBackedSpeculator base from AutoRegressiveSpeculator

ee1d657

Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>

address Ben's comments

2a66fa2

Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>

fix failed rebase

1ac38a1

Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>

rebase + add missing __init__.py file

189ea62

Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>

functional implementation of DFlash in MRV2

2c70445

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>

update dflash for mrv2

02affce

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>

benchislett requested review from ProExpertProg, WoosukKwon, hmellor, houseroad, mgoin, njhill, robertgshaw2-redhat, tlrmchlsmth, yewentao256 and youkaichao as code owners June 4, 2026 22:13

claude Bot reviewed Jun 4, 2026

mergify Bot added the nvidia and v1 labels Jun 4, 2026

github-project-automation Bot added this to NVIDIA Jun 4, 2026

mergify Bot added the needs-rebase label Jun 4, 2026

Merge branch 'main' into mrv2-dflash-atop-gemma4mtp

7a18e9f

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>

mergify Bot removed the needs-rebase label Jun 4, 2026

benchislett commented Jun 4, 2026

Comment thread vllm/v1/worker/gpu/spec_decode/dflash/cudagraph.py Outdated

benchislett commented Jun 4, 2026

Comment thread vllm/v1/worker/gpu/spec_decode/dflash/speculator.py

benchislett commented Jun 4, 2026

Comment thread vllm/v1/worker/gpu/spec_decode/dflash/speculator.py Outdated

fix merge conflict issues

64011d1

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>

cleanup cudagraph init

6a2382b

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>

WoosukKwon requested a review from TheEpicDolphin June 5, 2026 04:51

WoosukKwon reviewed Jun 5, 2026

Comment thread vllm/config/vllm.py Outdated

TheEpicDolphin reviewed Jun 5, 2026

Comment thread vllm/v1/worker/gpu/spec_decode/dflash/speculator.py Outdated

Comment thread vllm/v1/worker/gpu/spec_decode/dflash/speculator.py Outdated

Comment thread vllm/v1/worker/gpu/spec_decode/dflash/speculator.py Outdated

benchislett added 5 commits June 9, 2026 14:02

Merge remote-tracking branch 'upstream/main' into mrv2-dflash-atop-gemma4mtp

72ca967

polish dflash refactor

f317f35

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>

allow causal dflash

0f4e543

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>

enable mrv1+mrv2 dflash testing

79936d5

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>

fix nit

6a1072a

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>

benchislett commented Jun 9, 2026

Comment thread tests/v1/e2e/spec_decode/test_spec_decode.py Outdated

benchislett and others added 3 commits June 9, 2026 16:20

Apply suggestion from @benchislett

66b9584

Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>

nitpicks, polish

7ceb4f6

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>

Merge branch 'mrv2-dflash-atop-gemma4mtp' of github.com:CentML/vllm into mrv2-dflash-atop-gemma4mtp

5d5198b

TheEpicDolphin approved these changes Jun 9, 2026

WoosukKwon added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jun 9, 2026

WoosukKwon approved these changes Jun 9, 2026

github-project-automation Bot moved this to Ready in NVIDIA Jun 9, 2026

WoosukKwon enabled auto-merge (squash) June 9, 2026 22:34

vllm-bot merged commit 0bae1d3 into vllm-project:main Jun 10, 2026
76 of 78 checks passed

github-project-automation Bot moved this from Ready to Done in NVIDIA Jun 10, 2026

benchislett deleted the mrv2-dflash-atop-gemma4mtp branch June 11, 2026 15:25

mgoin mentioned this pull request Jun 30, 2026

[Roadmap] vLLM Roadmap Q2 2026 #39749 (Open, 75 tasks)

MrZ20 mentioned this pull request Jul 1, 2026

[Misc] main2main 0612 vllm-project/vllm-ascend#10459 (Merged)

[MRV2][Spec Decode] DFlash #44586: vllm-bot merged 18 commits into vllm-project:main from CentML:mrv2-dflash-atop-gemma4mtp

4 participants
