
[DeepEP V2] Fill invalid recv_topk_idx with -1#46432

Merged
WoosukKwon merged 1 commit into main from woosuk/deepep-v2-globalize-recv-topk on Jun 23, 2026


@WoosukKwon (Collaborator)

DeepEP V2 can currently leave portions of the recv_topk_idx buffer uninitialized. Some MoE backends (e.g., triton_unfused) may then interpret those uninitialized entries as valid slots. This PR fixes the issue by using a fused Triton kernel to fill invalid slots with -1.

In do_expand=False (decode/cudagraph) mode, DeepEP V2 dispatch writes only
rows [0, num_recv_tokens) of the worst-case-allocated recv buffer; the rest
is left UNINITIALIZED. The previous globalization only added the rank expert
offset to local ids >= 0, so the uninitialized padding rows kept stale
contents that can alias valid expert ids. Expert backends that build routing
over *all* rows (e.g. the Triton MoE backend's make_routing_data) then treat
those rows as real routed tokens, polluting the per-expert token lists and
corrupting real tokens.
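
The failure mode described above can be illustrated with a small plain-Python sketch. It is not vLLM or DeepEP code: build_expert_token_lists is a hypothetical stand-in for a routing builder that scans every row, and the buffer contents are invented for illustration.

```python
# Toy illustration only: names and values are hypothetical, not vLLM/DeepEP APIs.
NUM_EXPERTS = 4


def build_expert_token_lists(topk_idx, num_experts):
    """Group row indices by expert id, scanning every row of the buffer.

    Slots outside [0, num_experts), such as -1, are skipped.
    """
    lists = [[] for _ in range(num_experts)]
    for row, slots in enumerate(topk_idx):
        for expert in slots:
            if 0 <= expert < num_experts:
                lists[expert].append(row)
    return lists


# Rows 0-1 were written by dispatch; rows 2-3 are padding that still holds
# stale values which happen to look like valid expert ids.
recv_topk_idx = [[0, 3], [1, -1], [3, 0], [2, 2]]
num_recv_tokens = 2

# Scanning all rows routes the padding rows 2 and 3 as if they were tokens.
polluted = build_expert_token_lists(recv_topk_idx, NUM_EXPERTS)

# Forcing every row past num_recv_tokens to -1 removes the phantom tokens.
masked = [
    slots if row < num_recv_tokens else [-1] * len(slots)
    for row, slots in enumerate(recv_topk_idx)
]
clean = build_expert_token_lists(masked, NUM_EXPERTS)
```

In this toy case, polluted lists rows 2 and 3 under experts 0, 2, and 3, while clean contains only rows 0 and 1.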

Replace the torch where-chain with a fused Triton kernel that, in one pass,
converts valid local ids to global and forces everything else to -1:
non-local / out-of-range expert slots, and every row past num_recv_tokens
(read on-device from the recv prefix sum, so it stays cudagraph-safe).
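
As a rough reference for the semantics described above, the per-slot rule can be written in plain Python. This is a sketch, not the Triton kernel from this PR: the function name and parameters are hypothetical, the offset is assumed to be rank * num_local_experts, and num_recv_tokens is passed as a host integer here, whereas the kernel reads it on-device from the recv prefix sum.

```python
# Reference semantics only; the function name and signature are hypothetical.
def globalize_recv_topk_idx(recv_topk_idx, num_recv_tokens, num_local_experts, rank):
    """Return a copy where valid local expert ids become global ids and
    every other slot is -1.

    A slot is valid when its row is below num_recv_tokens and its value is
    a local expert id in [0, num_local_experts).
    """
    # Assumption: each rank owns a contiguous block of experts.
    offset = rank * num_local_experts
    out = []
    for row, slots in enumerate(recv_topk_idx):
        row_valid = row < num_recv_tokens
        out.append([
            expert + offset
            if row_valid and 0 <= expert < num_local_experts
            else -1
            for expert in slots
        ])
    return out


# Rank 1 owning 2 local experts; row 3 is stale padding, 7 is out of range.
result = globalize_recv_topk_idx(
    [[0, 1], [1, -1], [0, 7], [1, 0]],
    num_recv_tokens=3,
    num_local_experts=2,
    rank=1,
)
```

A fused kernel would apply the same rule to each slot in a single pass over the buffer instead of chaining several elementwise torch operations.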

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
@WoosukKwon WoosukKwon force-pushed the woosuk/deepep-v2-globalize-recv-topk branch from 6171f86 to 0bd0f36 Compare June 23, 2026 01:12
@WoosukKwon WoosukKwon merged commit 04c2a8d into main Jun 23, 2026
10 checks passed
@WoosukKwon WoosukKwon deleted the woosuk/deepep-v2-globalize-recv-topk branch June 23, 2026 04:45
nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
qli88 pushed a commit to qli88/vllm that referenced this pull request Jun 26, 2026
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Qiang Li <qiang.li2@amd.com>