[WideEP] Integrate DeepEP v2 by tlrmchlsmth · Pull Request #41183 · vllm-project/vllm

tlrmchlsmth · 2026-04-29T02:02:53Z

Unit tests pass and the integration works end to end. GSM8K results look good; I will follow up with more thorough e2e tests.

Notes:

I couldn't get this working on an 8xB200 system because DeepEP v2's ElasticBuffer unconditionally asserts NCCL GIN availability, even for intra-node NVLink-only setups. This is a TODO.
This requires NCCL 2.30.4 or newer, but PyTorch pins NCCL 2.28.9, so for now users must manually upgrade NCCL after installing torch: uv pip install "nvidia-nccl-cu13>=2.30.4"

I'm using this Containerfile for now: https://github.com/tlrmchlsmth/j-llm-d/blob/deepep-v2-dev-image/dev/Containerfile.deepep-v2

gemini-code-assist

Code Review

This pull request adds support for the DeepEP v2 (ElasticBuffer) all2all backend, including a new DeepEPV2PrepareAndFinalize implementation for MoE kernels, a dedicated All2AllManager, and associated configuration and environment variables. A comprehensive test suite for DeepEP v2 MoE is also introduced. Feedback identifies a critical issue where a strict bfloat16 assertion in the finalization logic would cause crashes for float16 models, recommending a cast to bfloat16 instead to maintain compatibility.

Add a new `deepep_v2` all2all backend that uses the DeepEP v2 ElasticBuffer API (NCCL GIN backend). This provides a unified dispatch/combine interface that works for both intra-node and inter-node expert parallelism with analytical SM calculation.

Key changes:
- New DeepEPV2PrepareAndFinalize class using do_expand=True for per-expert-contiguous layout with weighted reduction in combine
- DeepEPV2All2AllManager with ElasticBuffer handle caching and theoretical SM calculation via get_theoretical_num_sms()
- NCCL >= 2.30.4 version gating in has_deep_ep_v2() since the GIN backend requires a newer NCCL than PyTorch typically bundles
- FP8 block-quantized dispatch support
- DBO (micro-batching) support with async prepare/finalize
- Environment variables: VLLM_DEEPEP_V2_ALLOW_HYBRID_MODE, VLLM_DEEPEP_V2_PREFER_OVERLAP, VLLM_DEEPEP_V2_ALLOW_MULTIPLE_REDUCTION
- Update DeepEP install script to pin v2.0 release (b306af06af)
- Comprehensive multi-process test suite

Usage: --all2all-backend=deepep_v2 --enable-expert-parallel

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

use_fp8_dispatch requires the ElasticBuffer to receive FP8 input. In production, this is ensured by pre-quantizing via moe_kernel_quantize_input when is_block_quantized=True. The test was parametrizing use_fp8_dispatch independently of dtype, allowing bf16 input with use_fp8_dispatch=True, which triggers a buffer size assertion in DeepEP v2.

Fix:
- Derive use_fp8_dispatch from dtype (True only for FP8 weights)
- Add block_shape=[128, 128] to quant config for FP8 to enable the block quantization path that pre-quantizes input

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
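The test fix described in this commit amounts to deriving the dispatch flag from the dtype instead of parametrizing the two independently. The following is a minimal sketch of that idea, not the PR's test code: `MoECase`, `make_case`, and the string dtype names are hypothetical stand-ins, and only the `[128, 128]` block shape comes from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MoECase:
    """Hypothetical test-case record; not part of the PR."""
    dtype: str
    use_fp8_dispatch: bool
    block_shape: Optional[list]


def make_case(dtype: str) -> MoECase:
    # FP8 dispatch is only valid when the input is pre-quantized to FP8,
    # so the flag follows the dtype instead of being chosen independently.
    is_fp8 = dtype == "fp8"
    # The block shape is attached only to the FP8 case, since block
    # quantization is what enables the pre-quantization path.
    return MoECase(dtype, is_fp8, [128, 128] if is_fp8 else None)


# The invalid combination (bf16 input with FP8 dispatch) cannot be generated.
cases = [make_case(d) for d in ("bf16", "fp8")]
```

Because the flag is computed rather than parametrized, the bf16 plus FP8-dispatch combination that tripped the buffer size assertion never appears in the case list.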

Test DeepEPV2All2AllManager init, ElasticBuffer handle creation and caching, SM calculation, and destroy/re-create cycle. Skipped when DeepEP v2 or NCCL >= 2.30.4 is not available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>

torch.cuda.nccl.version() returns the compile-time NCCL version baked into the PyTorch wheel, not the runtime library. Use ctypes to load the actual libnccl.so and call ncclGetVersion() directly, which respects VLLM_NCCL_SO_PATH and LD_LIBRARY_PATH.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
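A runtime check along the lines this commit describes might look like the sketch below. It is an illustration under assumptions, not the code merged in the PR: the function names are hypothetical, the fallback library name `libnccl.so.2` is assumed, and the decoding follows NCCL's version-code convention (`major*10000 + minor*100 + patch`, with an older `major*1000` form for releases up to 2.8.x).

```python
import ctypes
import os
from typing import Optional, Tuple


def decode_nccl_version(code: int) -> Tuple[int, int, int]:
    """Split NCCL's integer version code into (major, minor, patch)."""
    if code < 10000:
        # Legacy encoding (up to 2.8.x): major*1000 + minor*100 + patch.
        return code // 1000, (code % 1000) // 100, code % 100
    # Current encoding: major*10000 + minor*100 + patch.
    return code // 10000, (code % 10000) // 100, code % 100


def runtime_nccl_version(
    so_path: Optional[str] = None,
) -> Optional[Tuple[int, int, int]]:
    """Return the version of the NCCL library found at runtime, or None."""
    # Assumption: fall back to the default soname so the dynamic loader
    # (and therefore LD_LIBRARY_PATH) decides which library is used.
    path = so_path or os.environ.get("VLLM_NCCL_SO_PATH") or "libnccl.so.2"
    try:
        lib = ctypes.CDLL(path)
        get_version = lib.ncclGetVersion
    except (OSError, AttributeError):
        return None  # library missing, or it does not export the symbol
    get_version.restype = ctypes.c_int
    get_version.argtypes = [ctypes.POINTER(ctypes.c_int)]
    code = ctypes.c_int()
    if get_version(ctypes.byref(code)) != 0:  # 0 is ncclSuccess
        return None
    return decode_nccl_version(code.value)


def meets_minimum(version, minimum=(2, 30, 4)) -> bool:
    """True when a detected version is at least the required one."""
    return version is not None and version >= minimum
```

Calling `meets_minimum(runtime_nccl_version())` then reflects the library that will actually be loaded rather than the version PyTorch was compiled against.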

- Remove explicit two-stream DBO switching (dbo_yield_and_switch_*), use synchronous dispatch/combine (async_with_compute_stream=False). The ElasticBuffer handles comm internally on its comm_stream.
- Switch from do_expand=True to do_expand=False for cudagraph compat. do_expand=True requires do_cpu_sync=True (CPU polling loop) which can't be captured in a cudagraph. do_expand=False with do_cpu_sync=False is fully capturable.
- Handle worst-case padding from do_cpu_sync=False: use handle.psum_num_recv_tokens_per_scaleup_rank to get real token count, zero out padding rows in recv_x, recv_topk_weights, and expert_x_scale.
- Add explicitly_destroy=True to ElasticBuffer creation in all2all.py.
- Add cudagraph capture/replay unit test (test_deep_ep_v2_moe_cudagraph).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
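The padding step in the third bullet can be pictured with a small framework-free sketch. Plain Python lists stand in for the GPU tensors, the function name is hypothetical, and the real token count is assumed to be already known as an integer; how the PR derives it from handle.psum_num_recv_tokens_per_scaleup_rank, and how it keeps that step on the device, is not shown here.

```python
from typing import List


def zero_padding_rows(rows: List[List[float]], num_real: int) -> List[List[float]]:
    """Keep the first num_real rows; replace every later row with zeros."""
    return [
        list(row) if i < num_real else [0.0] * len(row)
        for i, row in enumerate(rows)
    ]


# A worst-case-sized receive buffer: only the first two rows hold real tokens,
# the remaining rows contain whatever was left in memory.
recv_x = [[1.0, 2.0], [3.0, 4.0], [9.9, 9.9], [7.7, 7.7]]
cleaned = zero_padding_rows(recv_x, num_real=2)
```

Zeroing the tail keeps stale values in the over-allocated rows from leaking into later computation on the padded buffer.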

Document the four key design decisions (do_expand=False, do_cpu_sync=False, async_with_compute_stream=False, expert_tokens_meta=None) and why each is necessary for cudagraph + DBO compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>

mergify · 2026-05-01T02:00:25Z

Hi @tlrmchlsmth, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Prefill (use_cudagraph=False): do_expand=True + do_cpu_sync=True, giving exact memory allocation and a per-expert-contiguous layout. Saves GPU memory for large batches.
Decode (use_cudagraph=True): do_expand=False + do_cpu_sync=False, giving worst-case allocation and a scattered layout. Fully cudagraph-capturable.
Mode selected based on enforce_eager config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
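The two modes in this commit reduce to a single switch on the eager setting. Below is a minimal sketch of that selection, assuming that use_cudagraph is simply the negation of enforce_eager; `DispatchMode` and `select_dispatch_mode` are hypothetical names, and only the two flag names come from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DispatchMode:
    """Hypothetical container for the two flags named in the commit."""
    do_expand: bool
    do_cpu_sync: bool


def select_dispatch_mode(enforce_eager: bool) -> DispatchMode:
    if enforce_eager:
        # No cudagraph capture: a CPU sync is acceptable, so exact allocation
        # and the per-expert-contiguous layout can be used.
        return DispatchMode(do_expand=True, do_cpu_sync=True)
    # cudagraph path: no CPU polling loop, at the cost of worst-case
    # allocation and a scattered layout.
    return DispatchMode(do_expand=False, do_cpu_sync=False)
```

The flags always move together in this sketch because, per the earlier commit, do_expand=True requires do_cpu_sync=True.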

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>

bnellnm · 2026-06-04T18:51:03Z

        input_ids: torch.Tensor | None = None,
    ) -> tuple[torch.Tensor, torch.Tensor]:
        """Compute routing using fused top-k with bias."""
+        # The topk kernel dispatches dtype based on topk_ids (set by


Is this logic only applicable to this router?

This is a targeted fix for an issue I ran into with the hash routing layers (the first few layers of DSv4).

I'm inclined to leave this here as a special case, but do you think we should generalize it to other routers?

I'm fine with a spot fix but I think it would be simple enough to move the code to base_router.py.

Would type mismatches lead to a crash for other routers if the fix stays here?

bnellnm

LGTM. Just had a few questions.

- Rev DeepEP
- Envs changes
- Disable DBO for deepepv2 (it doesn't work yet)

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>

mergify · 2026-06-08T14:45:38Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @tlrmchlsmth.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
# Conflicts:
#   vllm/model_executor/layers/fused_moe/layer.py

The topk kernel dispatches on topk_ids dtype and assumes input_tokens/hash_indices_table match. Move the cast from the FusedTopKBiasRouter method into fused_topk_bias() so any caller gets it automatically.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>

- Fix invalid kwarg `intermediate_size_per_partition` → `intermediate_size` in test_deepep_v2_moe.py
- Replace `FusedMoE.make_expert_params_mapping()` with the standalone `fused_moe_make_expert_params_mapping()` in XPU model/mtp (FusedMoE is now a function, not a class)
- Collapse multi-line if to single line in fused_topk_bias_router.py

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
# Conflicts:
#   vllm/models/deepseek_v4/xpu/mtp.py

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: divineearthly <divineearthly@gmail.com>

mergify Bot added the nvidia label Apr 29, 2026

github-project-automation Bot added this to NVIDIA Apr 29, 2026

gemini-code-assist Bot reviewed Apr 29, 2026

Comment thread vllm/model_executor/layers/fused_moe/prepare_finalize/deepep_v2.py Outdated

tlrmchlsmth force-pushed the deepep-v2-integration branch from 48622cb to a2a4b00 Compare April 30, 2026 01:03

tlrmchlsmth marked this pull request as ready for review April 30, 2026 01:11

tlrmchlsmth requested review from ProExpertProg, WoosukKwon, hmellor, houseroad, mgoin, pavanimajety, robertgshaw2-redhat, yewentao256 and youkaichao as code owners April 30, 2026 01:11

claude Bot reviewed Apr 30, 2026

tlrmchlsmth and others added 4 commits April 29, 2026 22:56

Fix NCCL version check: 2.30.4, not 4.30.4

67d26f7

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>

tlrmchlsmth force-pushed the deepep-v2-integration branch from 5f78797 to 75149ae Compare April 30, 2026 02:59

bnellnm reviewed Apr 30, 2026

Comment thread vllm/model_executor/layers/fused_moe/prepare_finalize/deepep_v2.py Outdated

bnellnm reviewed Apr 30, 2026

Comment thread tests/kernels/moe/test_deepep_v2_moe.py Outdated

bnellnm reviewed Apr 30, 2026

Comment thread tests/kernels/moe/test_deepep_v2_moe.py Outdated

bnellnm reviewed Apr 30, 2026

Comment thread vllm/model_executor/layers/fused_moe/prepare_finalize/deepep_v2.py

tlrmchlsmth and others added 2 commits April 30, 2026 21:51

update

3e40020

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>

bnellnm reviewed Jun 4, 2026

bnellnm approved these changes Jun 4, 2026

Updates

5aca03a

- Rev DeepEP
- Envs changes
- Disable DBO for deepepv2 (it doesn't work yet)

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>

tlrmchlsmth added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 5, 2026

mergify Bot added the needs-rebase label Jun 8, 2026

Merge remote-tracking branch 'origin/main' into deepep-v2-integration

9ec7e47

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
# Conflicts:
#   vllm/model_executor/layers/fused_moe/layer.py

mergify Bot removed the needs-rebase label Jun 8, 2026

tlrmchlsmth and others added 2 commits June 8, 2026 16:48

Merge remote-tracking branch 'origin/main' into deepep-v2-integration

d4a84fb

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
# Conflicts:
#   vllm/models/deepseek_v4/xpu/mtp.py

WoosukKwon merged commit e2f993d into vllm-project:main Jun 9, 2026
96 of 97 checks passed

github-project-automation Bot moved this to Done in NVIDIA Jun 9, 2026

vllm-agent mentioned this pull request Jun 9, 2026

Revert "[WideEP] Integrate DeepEP v2" (#41183) #45008

Closed

tlrmchlsmth mentioned this pull request Jun 11, 2026

[WideEP] Update DeepEP version in Dockerfile #45321

Open

Saddss pushed a commit to Saddss/vllm that referenced this pull request Jun 14, 2026

[WideEP] Integrate DeepEP v2 (vllm-project#41183)

716f5bd

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vivek8123 pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Jun 18, 2026

[WideEP] Integrate DeepEP v2 (vllm-project#41183)

cc43859

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026

[WideEP] Integrate DeepEP v2 (vllm-project#41183)

80cb136

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026

[WideEP] Integrate DeepEP v2 (vllm-project#41183)

f104a83

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

hmellor mentioned this pull request Jun 30, 2026

[GPT-OSS] Save extra_weight_attrs and use at load_weights time for Marlin kernel #25694

Closed

5 tasks

ohsono pushed a commit to ohsono/vllm that referenced this pull request Jul 3, 2026

[WideEP] Integrate DeepEP v2 (vllm-project#41183)

1d65e04

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

WoosukKwon merged 34 commits into vllm-project:main from tlrmchlsmth:deepep-v2-integration

6 participants
