[DeepEP V2] Fill invalid recv_topk_idx with -1 by WoosukKwon · Pull Request #46432 · vllm-project/vllm

WoosukKwon · 2026-06-23T01:07:21Z

DeepEP V2 can currently leave portions of the recv_topk_idx buffer uninitialized. Some MoE backends (e.g., triton_unfused) may then interpret those uninitialized entries as valid slots. This PR fixes the issue by using a fused Triton kernel to fill invalid slots with -1.
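To make the failure mode concrete, here is a minimal plain-Python sketch of a consumer that groups rows by expert id across the whole buffer. The helper `build_expert_token_lists` is hypothetical, written only for illustration; it is not the Triton backend's actual routing code. It assumes `-1` marks an empty slot.

```python
def build_expert_token_lists(topk_idx, num_experts):
    """Group row indices by expert id, skipping slots that are not valid ids."""
    lists = [[] for _ in range(num_experts)]
    for row, slots in enumerate(topk_idx):
        for expert in slots:
            if 0 <= expert < num_experts:
                lists[expert].append(row)
    return lists


# Two real rows followed by two padding rows in a worst-case-sized buffer.
stale = [[0, 2], [1, 3], [2, 0], [1, 1]]      # padding rows hold leftover ids
clean = [[0, 2], [1, 3], [-1, -1], [-1, -1]]  # padding rows filled with -1

polluted = build_expert_token_lists(stale, num_experts=4)
expected = build_expert_token_lists(clean, num_experts=4)

print(polluted)  # rows 2 and 3 appear in the expert lists
print(expected)  # only rows 0 and 1 appear
```

With the stale buffer, padding rows 2 and 3 end up in the per-expert lists even though no token was ever routed there.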

In do_expand=False (decode/cudagraph) mode, DeepEP V2 dispatch writes only rows [0, num_recv_tokens) of the worst-case-allocated recv buffer; the rest is left uninitialized. The previous globalization only added the rank expert offset to local ids >= 0, leaving the uninitialized padding rows with stale contents that can alias valid expert ids. MoE backends that build routing over *all* rows (e.g. the Triton MoE backend's make_routing_data) then treat them as real routed tokens, polluting the per-expert token lists and corrupting real tokens.

Replace the torch where-chain with a fused Triton kernel that, in one pass, converts valid local ids to global and forces everything else to -1: non-local / out-of-range expert slots, and every row from num_recv_tokens onward (read on-device from the recv prefix sum, so it stays cudagraph-safe).

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
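The intended semantics of the fill can be sketched as a plain-Python reference. This is not the Triton kernel from the PR, only a row-by-row model of what the description says it produces. The function name `globalize_recv_topk_idx`, the global-id formula `local_id + rank * num_local_experts`, and reading `num_recv_tokens` from the last entry of the prefix sum are all assumptions made for illustration.

```python
def globalize_recv_topk_idx(recv_topk_idx, recv_prefix_sum, rank, num_local_experts):
    """Reference semantics: valid local ids become global ids, all else becomes -1."""
    # Assumption: the last prefix-sum entry is the number of received rows.
    num_recv_tokens = recv_prefix_sum[-1]
    # Assumption: each rank owns a contiguous block of expert ids.
    offset = rank * num_local_experts

    out = []
    for row, slots in enumerate(recv_topk_idx):
        if row >= num_recv_tokens:
            # Padding row: never written by dispatch, so invalidate every slot.
            out.append([-1] * len(slots))
            continue
        out.append([
            e + offset if 0 <= e < num_local_experts else -1
            for e in slots
        ])
    return out


# Rank 1 with 4 local experts; 3 real rows, 1 stale padding row.
buf = [[0, 3], [-1, 2], [5, 1], [7, 0]]
result = globalize_recv_topk_idx(buf, recv_prefix_sum=[1, 3], rank=1, num_local_experts=4)
print(result)  # [[4, 7], [-1, 6], [-1, 5], [-1, -1]]
```

A fused kernel would apply the same per-element rule in a single pass, with the row bound loaded on the device rather than copied to the host, which is what keeps the operation compatible with CUDA graph capture.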

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Signed-off-by: Qiang Li <qiang.li2@amd.com>

WoosukKwon requested review from mgoin, pavanimajety and zyongye as code owners June 23, 2026 01:07

WoosukKwon force-pushed the woosuk/deepep-v2-globalize-recv-topk branch from 6171f86 to 0bd0f36 Compare June 23, 2026 01:12

WoosukKwon merged commit 04c2a8d into main Jun 23, 2026
10 checks passed

WoosukKwon deleted the woosuk/deepep-v2-globalize-recv-topk branch June 23, 2026 04:45

nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026

[DeepEP V2] Fill invalid recv_topk_idx with -1 (vllm-project#46432)

fa676b6

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

qli88 pushed a commit to qli88/vllm that referenced this pull request Jun 26, 2026

[DeepEP V2] Fill invalid recv_topk_idx with -1 (vllm-project#46432)

581a738

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Signed-off-by: Qiang Li <qiang.li2@amd.com>

WoosukKwon merged 1 commit into main from woosuk/deepep-v2-globalize-recv-topk
