[EPLB] Reject NCCL-based EPLB communicators with async EPLB#44978

Merged
tlrmchlsmth merged 6 commits into vllm-project:main from neuralmagic:imarkov/eplb-ux-update
Jun 10, 2026

@ilmarkov ilmarkov commented Jun 9, 2026

NCCL is fundamentally incompatible with async EPLB due to multi-stream conflicts (see pytorch/pytorch#174288). Previously, the auto-selection logic avoided torch_nccl for async EPLB, but users could still explicitly set communicator="torch_nccl" or communicator="pynccl" and hit hangs. This PR adds explicit validation that rejects these invalid combinations at config validation, before they can hang.
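
For readers who want a concrete picture of the check described above, here is a minimal sketch. It is not the code from this PR: the class name `EPLBConfigSketch`, the `use_async` field, and the error message are assumptions made for illustration. Only the communicator names `torch_nccl` and `pynccl` come from the description.

```python
from dataclasses import dataclass
from typing import Optional

# Communicator names taken from the PR description. Treating exactly these
# two as "NCCL-based" is an assumption of this sketch.
NCCL_BASED_COMMUNICATORS = frozenset({"torch_nccl", "pynccl"})


@dataclass
class EPLBConfigSketch:
    """Hypothetical stand-in for an EPLB config; not vLLM's actual class."""

    use_async: bool = False             # assumed flag name for async EPLB
    communicator: Optional[str] = None  # None stands for "auto-select"

    def __post_init__(self) -> None:
        # Fail at construction time instead of hanging later at runtime.
        if self.use_async and self.communicator in NCCL_BASED_COMMUNICATORS:
            raise ValueError(
                f"communicator={self.communicator!r} is NCCL-based and cannot "
                "be combined with async EPLB; pick a non-NCCL communicator "
                "or leave it unset."
            )
```

With this shape, `EPLBConfigSketch(use_async=True, communicator="pynccl")` raises `ValueError`, while the same communicator with `use_async=False` is accepted.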

The DeepEP low-latency + async EPLB case that previously needed the NCCL_MAX_CTAS workaround in eplb_utils.py can no longer occur because NCCL-based communicators are now rejected at config validation when async EPLB is enabled. The workaround is simplified to only cover the DeepGEMM Mega MoE case.
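
The narrowed workaround can be pictured roughly as follows. This is a sketch under stated assumptions, not the contents of eplb_utils.py: the function name, its parameters, and the idea that the workaround amounts to setting the `NCCL_MAX_CTAS` environment variable are all assumptions, and the value is passed in by the caller because the description does not state one.

```python
import os
from typing import MutableMapping, Optional


def apply_nccl_max_ctas_workaround(
    uses_deepgemm_mega_moe: bool,
    max_ctas: int,
    env: Optional[MutableMapping[str, str]] = None,
) -> bool:
    """Hypothetical helper: cap NCCL CTAs only for the DeepGEMM Mega MoE case.

    Returns True if the workaround applies, False otherwise.
    """
    env = os.environ if env is None else env
    # The DeepEP low-latency + async EPLB branch is gone: that combination
    # is assumed to be rejected earlier, at config validation.
    if not uses_deepgemm_mega_moe:
        return False
    # Respect a value the user has already exported.
    env.setdefault("NCCL_MAX_CTAS", str(max_ctas))
    return True
```

Passing a plain dict as `env` keeps the helper easy to test without touching the real process environment.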

ilmarkov added 5 commits June 9, 2026 08:46
Signed-off-by: Markov Ilya <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya197@gmail.com>
@tlrmchlsmth tlrmchlsmth added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jun 9, 2026
@tlrmchlsmth tlrmchlsmth enabled auto-merge (squash) June 9, 2026 12:58
@tlrmchlsmth tlrmchlsmth merged commit 6471ec7 into vllm-project:main Jun 10, 2026
82 checks passed
@mayuyuace mayuyuace mentioned this pull request Jun 11, 2026
wcynb1023 pushed a commit to wcynb1023/vllm that referenced this pull request Jun 11, 2026
Saddss pushed a commit to Saddss/vllm that referenced this pull request Jun 14, 2026
vivek8123 pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Jun 18, 2026
divineearthly pushed a commit to divineearthly/vllm that referenced this pull request Jun 19, 2026
…ject#44978)

Signed-off-by: Markov Ilya <markovilya197@gmail.com>
Signed-off-by: divineearthly <divineearthly@gmail.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026
nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026
2 participants