[Bugfix] Fix EPLB initialization for VLM wrapper models by esmeetu · Pull Request #39805 · vllm-project/vllm

esmeetu · 2026-04-14T15:31:38Z

Purpose

EPLB fails for VLM models that wrap a MoE language model (e.g. KimiK25ForConditionalGeneration wrapping DeepseekV2ForCausalLM). The wrapper doesn't implement the MixtureOfExperts protocol, so is_mixture_of_experts(self.model) returns False, add_model() is never called, and the first forward pass crashes with:
ValueError: enable_eplb=True requires expert_load_view != None

Three code paths are affected: load_model() init, eplb_step() runtime assert, and setup_eplb_from_mapping().
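The failure mode and the unwrap step can be illustrated with a minimal, self-contained sketch. The classes below are toy stand-ins, not vLLM's real definitions: the protocol member `toy_num_experts`, the classes `ToyMoELanguageModel` and `ToyVLMWrapper`, and the function `resolve_moe_model` are hypothetical names chosen for illustration. Only `is_mixture_of_experts`, `MixtureOfExperts`, `SupportsMultiModal`, and `get_language_model()` are names taken from this PR, and their bodies here are simplified assumptions.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class MixtureOfExperts(Protocol):
    """Toy stand-in for the MoE protocol (the real one has other members)."""

    def toy_num_experts(self) -> int: ...


@runtime_checkable
class SupportsMultiModal(Protocol):
    """Toy stand-in for the multimodal protocol."""

    def get_language_model(self) -> object: ...


def is_mixture_of_experts(model: object) -> bool:
    # Assumption: modeled here as a plain runtime protocol check.
    return isinstance(model, MixtureOfExperts)


class ToyMoELanguageModel:
    """Plays the role of the inner MoE language model."""

    def toy_num_experts(self) -> int:
        return 8


class ToyVLMWrapper:
    """Plays the role of the VLM wrapper: it holds an MoE model but does
    not expose the MoE protocol itself."""

    def __init__(self) -> None:
        self.language_model = ToyMoELanguageModel()

    def get_language_model(self) -> object:
        return self.language_model


def resolve_moe_model(model: object) -> Optional[object]:
    """Return the model to register with EPLB, or None if there is none."""
    moe_candidate = model
    # If the top-level model is not MoE but is a multimodal wrapper,
    # look at the language model it wraps instead.
    if not is_mixture_of_experts(moe_candidate) and isinstance(
        moe_candidate, SupportsMultiModal
    ):
        moe_candidate = moe_candidate.get_language_model()
    return moe_candidate if is_mixture_of_experts(moe_candidate) else None


wrapper = ToyVLMWrapper()
print(is_mixture_of_experts(wrapper))                    # wrapper alone fails the check
print(resolve_moe_model(wrapper) is wrapper.language_model)
```

In this toy setup, checking the wrapper directly reports that it is not an MoE model, which mirrors why `add_model()` was skipped, while the resolved candidate is the inner language model.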

Test Plan

enable-eplb: true with Kimi-K2.5 (VLM wrapper over DeepseekV2): previously crashed, now initializes and runs EPLB steps
enable-eplb: true with DeepSeek-R1 (native MoE, no wrapper): no regression
enable-eplb: false: no behavior change

Test Result

nvidia/DeepSeek-R1-0528-NVFP4-v2 gsm8k 0.9636

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: esmeetu <jasonailu87@gmail.com>

gemini-code-assist

Code Review

This pull request introduces a _moe_model attribute to the GPUModelRunner class to cache the resolved Mixture of Experts (MoE) model, specifically handling cases where the MoE model is nested within a multi-modal wrapper. This change optimizes Expert Parallel Load Balancing (EPLB) by replacing redundant model retrieval and type-checking logic. A review comment suggests refactoring the MoE resolution logic into a dedicated helper method or property to enhance code maintainability and ensure consistency across the class.

gemini-code-assist · 2026-04-14T15:34:28Z

+        if not is_mixture_of_experts(moe_candidate) and isinstance(
+            moe_candidate, SupportsMultiModal
+        ):
+            moe_candidate = moe_candidate.get_language_model()


The logic for resolving the MoE model is duplicated in load_model and potentially elsewhere. Consider moving this resolution logic into a helper method or property to improve maintainability and ensure consistency.
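One way to read this suggestion is a single cached accessor that every EPLB code path goes through. The sketch below is an assumption about what such a helper could look like, not the code that was merged: `ToyRunner`, the `moe_model` property, and the `is_moe` marker attribute are hypothetical, and the `getattr` guard for wrappers that lack `get_language_model()` is an illustrative choice.

```python
from functools import cached_property
from typing import Any, Optional


def is_mixture_of_experts(model: Any) -> bool:
    # Stand-in check (assumption): the real helper tests a protocol,
    # this toy version looks for a hypothetical `is_moe` marker.
    return bool(getattr(model, "is_moe", False))


class ToyRunner:
    """Hypothetical runner; only the MoE-resolution part is sketched."""

    def __init__(self, model: Any) -> None:
        self.model = model

    @cached_property
    def moe_model(self) -> Optional[Any]:
        """Resolve the MoE model once and reuse it in every EPLB path."""
        candidate = self.model
        get_lm = getattr(candidate, "get_language_model", None)
        # Unwrap only when the top-level model is not MoE itself and
        # actually offers a way to reach its language model.
        if not is_mixture_of_experts(candidate) and callable(get_lm):
            candidate = get_lm()
        return candidate if is_mixture_of_experts(candidate) else None
```

With a helper like this, initialization, the runtime assert, and the mapping setup would all read `runner.moe_model` instead of repeating the check. Note that `cached_property` stores the first result, so this shape only fits if the model is already set when the property is first read.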

esmeetu · 2026-04-15T01:21:16Z

@claude review

Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: esmeetu <jasonailu87@gmail.com>

LopezCastroRoberto · 2026-05-14T13:14:24Z

@esmeetu it looks like the ci-bot flagged this PR as a potential cause of the two failing tests on CI: Basic Models Tests (Extra Initialization) and Multi-Modal Models (Extended Generation 1). Would you mind taking a look when you get a chance? Thanks!

esmeetu · 2026-05-14T13:53:41Z

@LopezCastroRoberto Thanks for your reminder! It should be resolved in #42641

LopezCastroRoberto · 2026-05-14T14:51:28Z

Oh, I missed that PR - I think it was opened almost on-par with my comment :) Thanks @esmeetu!

…#39805) Signed-off-by: esmeetu <jasonailu87@gmail.com>

…#39805) Signed-off-by: esmeetu <jasonailu87@gmail.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>


Cherry-pick 62 bugfix/security PRs from upstream vllm-project/vllm main (2026-05-03 to 2026-06-17), covering scheduler, engine core, model runner, worker, attention, KV cache, compilation, and structured output fixes.

Security (4): vllm-project#43286 vllm-project#44744 vllm-project#45118 vllm-project#45252

Bugfix (56): vllm-project#35536 vllm-project#36616 vllm-project#38895 vllm-project#39155 vllm-project#39324 vllm-project#39562 vllm-project#39805 vllm-project#40398 vllm-project#40726 vllm-project#40727 vllm-project#40737 vllm-project#40749 vllm-project#40961 vllm-project#41119 vllm-project#41133 vllm-project#41233 vllm-project#41237 vllm-project#41411 vllm-project#41496 vllm-project#41549 vllm-project#41674 vllm-project#41873 vllm-project#41895 vllm-project#42040 vllm-project#42112 vllm-project#42289 vllm-project#42479 vllm-project#42585 vllm-project#42692 vllm-project#42706 vllm-project#42709 vllm-project#42739 vllm-project#42967 vllm-project#43001 vllm-project#43079 vllm-project#43125 vllm-project#43160 vllm-project#43616 vllm-project#43669 vllm-project#43719 vllm-project#43768 vllm-project#43808 vllm-project#43961 vllm-project#43982 vllm-project#43988 vllm-project#43998 vllm-project#44057 vllm-project#44560 vllm-project#44574 vllm-project#44568 vllm-project#44603 vllm-project#44744 vllm-project#45195 vllm-project#45345 vllm-project#45383 vllm-project#45487 vllm-project#45564 vllm-project#45673

Runner fix (2): vllm-project#44568 vllm-project#44603

Skipped: vllm-project#43781 (ROCm-specific, not applicable to Ascend NPU)

Conflict resolutions:
- Manual merge: vllm-project#43286 vllm-project#45118 vllm-project#42112 vllm-project#43160 vllm-project#43719 vllm-project#44560
- Upstream-preferred (-X theirs): vllm-project#43808 vllm-project#43988 vllm-project#42967 vllm-project#35536 vllm-project#45195
- Test files (--theirs): vllm-project#44744 vllm-project#41895 vllm-project#42040 vllm-project#41233 vllm-project#45345 vllm-project#43982

Co-authored-by: GitHub Copilot
Signed-off-by: MingqiWang-coder <mingqiwang@hust.edu.cn>

Cherry-pick 62 bugfix/security PRs from upstream vllm-project/vllm main (2026-05-03 to 2026-06-17), covering scheduler, engine core, model runner, worker, attention, KV cache, compilation, and structured output fixes.

Security (4): vllm-project#43286 vllm-project#44744 vllm-project#45118 vllm-project#45252

Bugfix (56): vllm-project#35536 vllm-project#36616 vllm-project#38895 vllm-project#39155 vllm-project#39324 vllm-project#39562 vllm-project#39805 vllm-project#40398 vllm-project#40726

Runner fix (2): vllm-project#44568 vllm-project#44603

Skipped: vllm-project#43781 (ROCm-specific, not applicable to Ascend NPU)

Conflict resolutions:
- Manual merge: vllm-project#43286 vllm-project#45118 vllm-project#42112 vllm-project#43160 vllm-project#43719 vllm-project#44560
- Upstream-preferred (-X theirs): vllm-project#43808 vllm-project#43988 vllm-project#42967 vllm-project#35536 vllm-project#45195
- Test files (--theirs): vllm-project#44744 vllm-project#41895 vllm-project#42040 vllm-project#41233 vllm-project#45345 vllm-project#43982

Co-authored-by: GitHub Copilot
Signed-off-by: MingqiWang-coder <mingqiwang@hust.edu.cn>

fix eplb on multimodal models

342b1a7

Signed-off-by: esmeetu <jasonailu87@gmail.com>

esmeetu requested a review from njhill as a code owner April 14, 2026 15:31

mergify Bot added the v1 and bug labels Apr 14, 2026

gemini-code-assist Bot reviewed Apr 14, 2026


Merge branch 'main' into fix-eplb-vlm

faa3b8e

ywang96 added the ready label Apr 18, 2026

Merge branch 'main' into fix-eplb-vlm

3d9068d

ywang96 enabled auto-merge (squash) April 18, 2026 07:50

esmeetu and others added 2 commits May 13, 2026 23:02

Merge branch 'main' into fix-eplb-vlm

68ebf51

Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: esmeetu <jasonailu87@gmail.com>

Merge branch 'main' into fix-eplb-vlm

87f450e

ywang96 approved these changes May 14, 2026


ywang96 merged commit 77e1421 into main May 14, 2026
64 checks passed

ywang96 deleted the fix-eplb-vlm branch May 14, 2026 02:26

vllm-agent mentioned this pull request May 14, 2026

Revert "[Bugfix] Fix EPLB initialization for VLM wrapper models" (#39805) #42636

Closed

esmeetu mentioned this pull request May 14, 2026

[Bugfix] Guard EPLB VLM unwrap for models without language_model #42643

Closed

3 tasks

JasonKeyiL mentioned this pull request May 15, 2026

[Bugfix] Unwrap VLM wrappers for EPLB on Model Runner V2 #42706

Merged

mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026

[Bugfix] Fix EPLB initialization for VLM wrapper models (vllm-project…

aa124db

…#39805) Signed-off-by: esmeetu <jasonailu87@gmail.com>

jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026

[Bugfix] Fix EPLB initialization for VLM wrapper models (vllm-project…

3fae3ad

…#39805) Signed-off-by: esmeetu <jasonailu87@gmail.com>

h1t35h pushed a commit to h1t35h/vllm that referenced this pull request May 21, 2026

[Bugfix] Fix EPLB initialization for VLM wrapper models (vllm-project…

568418d

…#39805) Signed-off-by: esmeetu <jasonailu87@gmail.com>

SunskyXH mentioned this pull request Jun 1, 2026

[Bugfix] Fix FunASR-Nano crash during initialization #44215

Merged

4 tasks

knight0528 pushed a commit to knight0528/vllm that referenced this pull request Jun 8, 2026

[Bugfix] Fix EPLB initialization for VLM wrapper models (vllm-project…

aa93d2c

…#39805) Signed-off-by: esmeetu <jasonailu87@gmail.com>

tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026

[Bugfix] Fix EPLB initialization for VLM wrapper models (vllm-project…

64871f7

…#39805) Signed-off-by: esmeetu <jasonailu87@gmail.com>

MingqiWang-coder mentioned this pull request Jul 1, 2026

[Sync] Upstream V1 engine core — 89 PRs (bugfix, scheduler, runner, worker, hardware) vLLM-HUST/vllm-hust#82

Open

[Bugfix] Fix EPLB initialization for VLM wrapper models #39805
ywang96 merged 5 commits into main from fix-eplb-vlm
