[Bugfix] Fix EPLB initialization for VLM wrapper models #39805

Merged
ywang96 merged 5 commits into main from fix-eplb-vlm on May 14, 2026

Conversation

@esmeetu esmeetu commented Apr 14, 2026

Member

Purpose

EPLB (Expert Parallel Load Balancing) fails for VLMs that wrap an MoE language model (e.g. KimiK25ForConditionalGeneration wrapping DeepseekV2ForCausalLM). The wrapper itself doesn't implement the MixtureOfExperts protocol, so is_mixture_of_experts(self.model) returns False, add_model() is never called, and the first forward pass crashes with:
ValueError: enable_eplb=True requires expert_load_view != None

Three code paths are affected: load_model() init, eplb_step() runtime assert, and setup_eplb_from_mapping().
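
The failure mode and the fix can be sketched with stand-in classes. This is a minimal illustration, not vLLM's code: MixtureOfExperts, SupportsMultiModal, is_mixture_of_experts and get_language_model() are named in this PR, but the definitions below are simplified assumptions, and resolve_moe_model, FakeMoELanguageModel, FakeVLMWrapper and FakeDenseModel are hypothetical names invented for the example.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class MixtureOfExperts(Protocol):
    """Simplified stand-in (assumption): anything exposing num_experts."""

    num_experts: int


@runtime_checkable
class SupportsMultiModal(Protocol):
    """Simplified stand-in (assumption): a wrapper that can hand out its LM."""

    def get_language_model(self) -> object: ...


def is_mixture_of_experts(model: object) -> bool:
    return isinstance(model, MixtureOfExperts)


def resolve_moe_model(model: object) -> Optional[object]:
    """Hypothetical helper: unwrap one multimodal wrapper level if needed."""
    candidate = model
    if not is_mixture_of_experts(candidate) and isinstance(
        candidate, SupportsMultiModal
    ):
        candidate = candidate.get_language_model()
    return candidate if is_mixture_of_experts(candidate) else None


class FakeMoELanguageModel:
    num_experts = 8  # satisfies the stand-in MixtureOfExperts protocol


class FakeVLMWrapper:
    """Wraps an MoE language model but is not itself MoE."""

    def __init__(self) -> None:
        self.language_model = FakeMoELanguageModel()

    def get_language_model(self) -> object:
        return self.language_model


class FakeDenseModel:
    """Neither MoE nor a multimodal wrapper."""


wrapper = FakeVLMWrapper()
print(is_mixture_of_experts(wrapper))                        # False: the pre-fix check
print(resolve_moe_model(wrapper) is wrapper.language_model)  # True: wrapper is unwrapped
print(resolve_moe_model(FakeDenseModel()))                   # None: nothing to register
```

Checking only the top-level model is what skipped registration for the wrapper; resolving the inner language model first lets a native MoE model and a wrapped one take the same path.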

Test Plan

  • enable-eplb: true with Kimi-K2.5 (VLM wrapper over DeepseekV2). Previously crashed; now initializes and runs EPLB steps.
  • enable-eplb: true with DeepSeek-R1 (native MoE, no wrapper). No regression.
  • enable-eplb: false. No behavior change.

Test Result

nvidia/DeepSeek-R1-0528-NVFP4-v2, gsm8k: 0.9636


Signed-off-by: esmeetu <jasonailu87@gmail.com>
@esmeetu esmeetu requested a review from njhill as a code owner April 14, 2026 15:31
@mergify (Bot) added the v1 and bug labels Apr 14, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request introduces a _moe_model attribute to the GPUModelRunner class to cache the resolved Mixture of Experts (MoE) model, specifically handling cases where the MoE model is nested within a multi-modal wrapper. This change optimizes Expert Parallel Load Balancing (EPLB) by replacing redundant model retrieval and type-checking logic. A review comment suggests refactoring the MoE resolution logic into a dedicated helper method or property to enhance code maintainability and ensure consistency across the class.

Comment thread vllm/v1/worker/gpu_model_runner.py Outdated
Comment on lines +4861 to +4864
if not is_mixture_of_experts(moe_candidate) and isinstance(
    moe_candidate, SupportsMultiModal
):
    moe_candidate = moe_candidate.get_language_model()

Contributor


high

The logic for resolving the MoE model is duplicated in load_model and potentially elsewhere. Consider moving this resolution logic into a helper method or property to improve maintainability and ensure consistency.
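
One way to act on this suggestion is a cached property that every EPLB code path reads. The sketch below is hypothetical and heavily reduced: RunnerSketch, looks_like_moe, Inner and Wrapper are invented for illustration, the duck-typed hasattr checks stand in for the protocol-based checks in the diff, and the method bodies only mimic the shape of load_model() and eplb_step(), not their real behavior.

```python
from functools import cached_property
from typing import Any, Optional


def looks_like_moe(model: Any) -> bool:
    # Hypothetical stand-in for is_mixture_of_experts().
    return hasattr(model, "num_experts")


class RunnerSketch:
    """Hypothetical, reduced stand-in for a model runner."""

    def __init__(self, model: Any) -> None:
        self.model = model
        self.registered: list = []  # stands in for EPLB state registration

    @cached_property
    def moe_model(self) -> Optional[Any]:
        # Resolve once: use the model itself, or its inner language model.
        candidate = self.model
        if not looks_like_moe(candidate) and hasattr(
            candidate, "get_language_model"
        ):
            candidate = candidate.get_language_model()
        return candidate if looks_like_moe(candidate) else None

    def load_model(self) -> None:
        # Register with EPLB only when an MoE module was actually found.
        if self.moe_model is not None:
            self.registered.append(self.moe_model)

    def eplb_step(self) -> bool:
        # Reads the same resolved object, so the two paths cannot disagree.
        return self.moe_model is not None and self.moe_model in self.registered


class Inner:
    num_experts = 4


class Wrapper:
    def __init__(self) -> None:
        self._lm = Inner()

    def get_language_model(self) -> Inner:
        return self._lm


runner = RunnerSketch(Wrapper())
runner.load_model()
print(runner.eplb_step())  # True: the wrapped MoE model was registered
```

A cached property keeps the resolution in one place, but the cached value goes stale if self.model is ever replaced; storing the result in a plain attribute at load time, as the _moe_model attribute described in the review summary does, avoids that question.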

esmeetu commented Apr 15, 2026

Member Author

@claude review

@ywang96 added the ready label Apr 18, 2026
@ywang96 ywang96 enabled auto-merge (squash) April 18, 2026 07:50
esmeetu and others added 2 commits May 13, 2026 23:02
Co-authored-by: Claude <noreply@anthropic.com>

Signed-off-by: esmeetu <jasonailu87@gmail.com>
@ywang96 ywang96 merged commit 77e1421 into main May 14, 2026
64 checks passed
@ywang96 ywang96 deleted the fix-eplb-vlm branch May 14, 2026 02:26
@LopezCastroRoberto

Contributor

@esmeetu it looks like the ci-bot flagged this PR as a potential cause of the two failing tests on CI: Basic Models Tests (Extra Initialization) and Multi-Modal Models (Extended Generation 1). Would you mind taking a look when you get a chance? Thanks!

esmeetu commented May 14, 2026

Member Author

@LopezCastroRoberto Thanks for your reminder! It should be resolved in #42641

@LopezCastroRoberto

Contributor

Oh, I missed that PR. I think it was opened at almost the same time as my comment :) Thanks @esmeetu!

mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026
jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026
h1t35h pushed a commit to h1t35h/vllm that referenced this pull request May 21, 2026
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
…#39805)

Signed-off-by: esmeetu <jasonailu87@gmail.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
knight0528 pushed a commit to knight0528/vllm that referenced this pull request Jun 8, 2026
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026
MingqiWang-coder added a commit to vLLM-HUST/vllm-hust that referenced this pull request Jun 30, 2026
Cherry-pick 62 bugfix/security PRs from upstream vllm-project/vllm main
(2026-05-03 to 2026-06-17), covering scheduler, engine core, model runner,
worker, attention, KV cache, compilation, and structured output fixes.

Security (4): vllm-project#43286 vllm-project#44744 vllm-project#45118 vllm-project#45252
Bugfix (56): vllm-project#35536 vllm-project#36616 vllm-project#38895 vllm-project#39155 vllm-project#39324 vllm-project#39562 vllm-project#39805 vllm-project#40398 vllm-project#40726
vllm-project#40727 vllm-project#40737 vllm-project#40749 vllm-project#40961 vllm-project#41119 vllm-project#41133 vllm-project#41233 vllm-project#41237 vllm-project#41411 vllm-project#41496 vllm-project#41549
vllm-project#41674 vllm-project#41873 vllm-project#41895 vllm-project#42040 vllm-project#42112 vllm-project#42289 vllm-project#42479 vllm-project#42585 vllm-project#42692 vllm-project#42706 vllm-project#42709
vllm-project#42739 vllm-project#42967 vllm-project#43001 vllm-project#43079 vllm-project#43125 vllm-project#43160 vllm-project#43616 vllm-project#43669 vllm-project#43719 vllm-project#43768 vllm-project#43808
vllm-project#43961 vllm-project#43982 vllm-project#43988 vllm-project#43998 vllm-project#44057 vllm-project#44560 vllm-project#44574 vllm-project#44568 vllm-project#44603 vllm-project#44744 vllm-project#45195
vllm-project#45345 vllm-project#45383 vllm-project#45487 vllm-project#45564 vllm-project#45673
Runner fix (2): vllm-project#44568 vllm-project#44603

Skipped: vllm-project#43781 (ROCm-specific, not applicable to Ascend NPU)

Conflict resolutions:
- Manual merge: vllm-project#43286 vllm-project#45118 vllm-project#42112 vllm-project#43160 vllm-project#43719 vllm-project#44560
- Upstream-preferred (-X theirs): vllm-project#43808 vllm-project#43988 vllm-project#42967 vllm-project#35536 vllm-project#45195
- Test files (--theirs): vllm-project#44744 vllm-project#41895 vllm-project#42040 vllm-project#41233 vllm-project#45345 vllm-project#43982

Co-authored-by: GitHub Copilot
Signed-off-by: MingqiWang-coder <mingqiwang@hust.edu.cn>
MingqiWang-coder added a commit to vLLM-HUST/vllm-hust that referenced this pull request Jul 2, 2026
Cherry-pick 62 bugfix/security PRs from upstream vllm-project/vllm main
(2026-05-03 to 2026-06-17), covering scheduler, engine core, model runner,
worker, attention, KV cache, compilation, and structured output fixes.

Security (4): vllm-project#43286 vllm-project#44744 vllm-project#45118 vllm-project#45252
Bugfix (56): vllm-project#35536 vllm-project#36616 vllm-project#38895 vllm-project#39155 vllm-project#39324 vllm-project#39562 vllm-project#39805 vllm-project#40398 vllm-project#40726
Runner fix (2): vllm-project#44568 vllm-project#44603

Skipped: vllm-project#43781 (ROCm-specific, not applicable to Ascend NPU)

Conflict resolutions:
- Manual merge: vllm-project#43286 vllm-project#45118 vllm-project#42112 vllm-project#43160 vllm-project#43719 vllm-project#44560
- Upstream-preferred (-X theirs): vllm-project#43808 vllm-project#43988 vllm-project#42967 vllm-project#35536 vllm-project#45195
- Test files (--theirs): vllm-project#44744 vllm-project#41895 vllm-project#42040 vllm-project#41233 vllm-project#45345 vllm-project#43982

Co-authored-by: GitHub Copilot
Signed-off-by: MingqiWang-coder <mingqiwang@hust.edu.cn>
Labels

bug (Something isn't working), ready (ONLY add when PR is ready to merge/full CI is needed), v1

3 participants