[MRV2][Spec Decode] DFlash #44586

Merged: vllm-bot merged 18 commits into vllm-project:main from CentML:mrv2-dflash-atop-gemma4mtp on Jun 10, 2026.

@benchislett benchislett commented Jun 4, 2026

Member

Purpose

Implement DFlash for ModelRunnerV2, with full cudagraph support.

Performance result:
MRV1: [image not reproduced]
MRV2: [image not reproduced]

Setup: Qwen3 8B FP8 + DFlash on 1xGB200. MRV2 gives a ~1.2x speedup over MRV1, from 5.2 ms to 4.3 ms per step (measured in an independent benchmark, separate from the profiling run).
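
For reference, the ~1.2x figure follows directly from the two per-step latencies quoted above. A minimal sketch of that arithmetic (the function and variable names are illustrative only, not part of vLLM):

```python
def speedup(baseline_ms: float, new_ms: float) -> float:
    """Ratio of baseline latency to new latency (higher is better)."""
    return baseline_ms / new_ms


# Per-step latencies quoted in this PR description.
mrv1_ms = 5.2
mrv2_ms = 4.3

ratio = speedup(mrv1_ms, mrv2_ms)
# Fraction of per-step time saved relative to the MRV1 baseline.
reduction = 1 - mrv2_ms / mrv1_ms

print(f"{ratio:.2f}x speedup, {reduction:.1%} less time per step")
# prints "1.21x speedup, 17.3% less time per step"
```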

Testing

Ran the existing DFlash correctness tests with MRV1 and MRV2. Both pass with identical acceptance rates.

Enabled E2E DFlash correctness and AR regression tests for both MRV1 and MRV2. Both pass locally.
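
As a rough illustration of what "identical acceptance rates" means here, the sketch below compares accepted-to-drafted token ratios from two runs. The `SpecDecodeCounts` class, its fields, and the sample counts are hypothetical and are not taken from the vLLM test suite.

```python
from dataclasses import dataclass


@dataclass
class SpecDecodeCounts:
    """Hypothetical per-run counters; not a vLLM data structure."""
    num_draft_tokens: int
    num_accepted_tokens: int

    def acceptance_rate(self) -> float:
        # Guard against a run that drafted nothing.
        if self.num_draft_tokens == 0:
            return 0.0
        return self.num_accepted_tokens / self.num_draft_tokens


def same_acceptance(a: SpecDecodeCounts, b: SpecDecodeCounts,
                    tol: float = 0.0) -> bool:
    """True if the two runs' acceptance rates differ by at most tol."""
    return abs(a.acceptance_rate() - b.acceptance_rate()) <= tol


# Made-up counts, for illustration only.
mrv1_run = SpecDecodeCounts(num_draft_tokens=1000, num_accepted_tokens=640)
mrv2_run = SpecDecodeCounts(num_draft_tokens=1000, num_accepted_tokens=640)
print(same_acceptance(mrv1_run, mrv2_run))
```

A tolerance of zero matches the "identical" wording above; a nonzero `tol` would suit runs with sampling noise.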

TheEpicDolphin and others added 7 commits June 2, 2026 17:18
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

mergify Bot commented Jun 4, 2026

Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @benchislett.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Jun 4, 2026
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
@mergify mergify Bot removed the needs-rebase label Jun 4, 2026
Comment thread vllm/v1/worker/gpu/spec_decode/dflash/cudagraph.py Outdated
Comment thread vllm/v1/worker/gpu/spec_decode/dflash/speculator.py
mergify Bot commented Jun 4, 2026

Contributor

Hi @benchislett, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
Comment thread vllm/v1/worker/gpu/spec_decode/dflash/speculator.py Outdated
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
@WoosukKwon WoosukKwon requested a review from TheEpicDolphin June 5, 2026 04:51
Comment thread vllm/config/vllm.py Outdated

@TheEpicDolphin TheEpicDolphin left a comment

Collaborator

This code looks good overall. I left a few suggestions to reduce code duplication, and flagged a potential perf improvement.

Comment thread vllm/v1/worker/gpu/spec_decode/dflash/speculator.py Outdated
Comment thread vllm/v1/worker/gpu/spec_decode/dflash/speculator.py Outdated
Comment thread vllm/v1/worker/gpu/spec_decode/dflash/speculator.py Outdated
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Comment thread tests/v1/e2e/spec_decode/test_spec_decode.py Outdated
benchislett and others added 3 commits June 9, 2026 16:20
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>

@TheEpicDolphin TheEpicDolphin left a comment

Collaborator

Looks good to me, thank you! cc: @WoosukKwon

@WoosukKwon WoosukKwon added the ready label Jun 9, 2026
@github-project-automation github-project-automation Bot moved this to Ready in NVIDIA Jun 9, 2026
@WoosukKwon WoosukKwon enabled auto-merge (squash) June 9, 2026 22:34
@vllm-bot vllm-bot merged commit 0bae1d3 into vllm-project:main Jun 10, 2026
76 of 78 checks passed
@github-project-automation github-project-automation Bot moved this from Ready to Done in NVIDIA Jun 10, 2026
wcynb1023 pushed a commit to wcynb1023/vllm that referenced this pull request Jun 11, 2026
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai>
@benchislett benchislett deleted the mrv2-dflash-atop-gemma4mtp branch June 11, 2026 15:25
Saddss pushed a commit to Saddss/vllm that referenced this pull request Jun 14, 2026
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai>
vivek8123 pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Jun 18, 2026
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai>
divineearthly pushed a commit to divineearthly/vllm that referenced this pull request Jun 19, 2026
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: divineearthly <divineearthly@gmail.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai>
nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai>
@mgoin mgoin mentioned this pull request Jun 30, 2026
75 tasks

Labels

nvidia, ready, v1

4 participants