[Bugfix] Fix EPLB initialization for VLM wrapper models #39805
Signed-off-by: esmeetu <jasonailu87@gmail.com>
Code Review
This pull request introduces a _moe_model attribute to the GPUModelRunner class to cache the resolved Mixture of Experts (MoE) model, specifically handling cases where the MoE model is nested within a multi-modal wrapper. This change optimizes Expert Parallel Load Balancing (EPLB) by replacing redundant model retrieval and type-checking logic. A review comment suggests refactoring the MoE resolution logic into a dedicated helper method or property to enhance code maintainability and ensure consistency across the class.
```python
if not is_mixture_of_experts(moe_candidate) and isinstance(
    moe_candidate, SupportsMultiModal
):
    moe_candidate = moe_candidate.get_language_model()
```
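The reviewer's suggestion to move this resolution into a dedicated helper or property could look roughly like the sketch below. It is a standalone approximation, not vLLM code: `SupportsMultiModal` and `is_mixture_of_experts` here are stub stand-ins that only mimic the names in the diff (the real MoE check is not a `hasattr` test), and `ModelRunnerSketch` / `moe_model` are hypothetical names. `functools.cached_property` resolves the model once and caches it, playing the role of the `_moe_model` attribute.

```python
from functools import cached_property
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class SupportsMultiModal(Protocol):
    """Stub: a multimodal wrapper that can return its inner language model."""

    def get_language_model(self) -> Any: ...


def is_mixture_of_experts(model: Any) -> bool:
    """Stub: duck-typed MoE check (hypothetical; not vLLM's real protocol test)."""
    return hasattr(model, "expert_weights")


class ModelRunnerSketch:
    """Hypothetical runner that resolves the MoE model once and caches it."""

    def __init__(self, model: Any) -> None:
        self.model = model

    @cached_property
    def moe_model(self) -> Optional[Any]:
        """Return the model EPLB should register, or None for non-MoE models."""
        candidate = self.model
        # A VLM wrapper does not implement the MoE protocol itself, so look
        # through it to the language model it wraps.
        if not is_mixture_of_experts(candidate) and isinstance(
            candidate, SupportsMultiModal
        ):
            candidate = candidate.get_language_model()
        return candidate if is_mixture_of_experts(candidate) else None
```

In this sketch, every EPLB code path would read `self.moe_model` instead of repeating the `isinstance` check, and a `None` result signals that no MoE model was found; whether the real `_moe_model` attribute uses `None` the same way is an assumption.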
@claude review
Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: esmeetu <jasonailu87@gmail.com>
@esmeetu it looks like the ci-bot flagged this PR as a potential cause of the two failing tests on CI:
@LopezCastroRoberto Thanks for the reminder! It should be resolved in #42641.
Oh, I missed that PR; I think it was opened at almost the same time as my comment :) Thanks @esmeetu!
…#39805) Signed-off-by: esmeetu <jasonailu87@gmail.com>
…#39805) Signed-off-by: esmeetu <jasonailu87@gmail.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Cherry-pick 62 bugfix/security PRs from upstream vllm-project/vllm main (2026-05-03 to 2026-06-17), covering scheduler, engine core, model runner, worker, attention, KV cache, compilation, and structured output fixes.
Security (4): vllm-project#43286 vllm-project#44744 vllm-project#45118 vllm-project#45252
Bugfix (56): vllm-project#35536 vllm-project#36616 vllm-project#38895 vllm-project#39155 vllm-project#39324 vllm-project#39562 vllm-project#39805 vllm-project#40398 vllm-project#40726 vllm-project#40727 vllm-project#40737 vllm-project#40749 vllm-project#40961 vllm-project#41119 vllm-project#41133 vllm-project#41233 vllm-project#41237 vllm-project#41411 vllm-project#41496 vllm-project#41549 vllm-project#41674 vllm-project#41873 vllm-project#41895 vllm-project#42040 vllm-project#42112 vllm-project#42289 vllm-project#42479 vllm-project#42585 vllm-project#42692 vllm-project#42706 vllm-project#42709 vllm-project#42739 vllm-project#42967 vllm-project#43001 vllm-project#43079 vllm-project#43125 vllm-project#43160 vllm-project#43616 vllm-project#43669 vllm-project#43719 vllm-project#43768 vllm-project#43808 vllm-project#43961 vllm-project#43982 vllm-project#43988 vllm-project#43998 vllm-project#44057 vllm-project#44560 vllm-project#44574 vllm-project#44568 vllm-project#44603 vllm-project#44744 vllm-project#45195 vllm-project#45345 vllm-project#45383 vllm-project#45487 vllm-project#45564 vllm-project#45673
Runner fix (2): vllm-project#44568 vllm-project#44603
Skipped: vllm-project#43781 (ROCm-specific, not applicable to Ascend NPU)
Conflict resolutions:
- Manual merge: vllm-project#43286 vllm-project#45118 vllm-project#42112 vllm-project#43160 vllm-project#43719 vllm-project#44560
- Upstream-preferred (-X theirs): vllm-project#43808 vllm-project#43988 vllm-project#42967 vllm-project#35536 vllm-project#45195
- Test files (--theirs): vllm-project#44744 vllm-project#41895 vllm-project#42040 vllm-project#41233 vllm-project#45345 vllm-project#43982
Co-authored-by: GitHub Copilot
Signed-off-by: MingqiWang-coder <mingqiwang@hust.edu.cn>
Cherry-pick 62 bugfix/security PRs from upstream vllm-project/vllm main (2026-05-03 to 2026-06-17), covering scheduler, engine core, model runner, worker, attention, KV cache, compilation, and structured output fixes.
Security (4): vllm-project#43286 vllm-project#44744 vllm-project#45118 vllm-project#45252
Bugfix (56): vllm-project#35536 vllm-project#36616 vllm-project#38895 vllm-project#39155 vllm-project#39324 vllm-project#39562 vllm-project#39805 vllm-project#40398 vllm-project#40726
Runner fix (2): vllm-project#44568 vllm-project#44603
Skipped: vllm-project#43781 (ROCm-specific, not applicable to Ascend NPU)
Conflict resolutions:
- Manual merge: vllm-project#43286 vllm-project#45118 vllm-project#42112 vllm-project#43160 vllm-project#43719 vllm-project#44560
- Upstream-preferred (-X theirs): vllm-project#43808 vllm-project#43988 vllm-project#42967 vllm-project#35536 vllm-project#45195
- Test files (--theirs): vllm-project#44744 vllm-project#41895 vllm-project#42040 vllm-project#41233 vllm-project#45345 vllm-project#43982
Co-authored-by: GitHub Copilot
Signed-off-by: MingqiWang-coder <mingqiwang@hust.edu.cn>
Purpose
EPLB fails for VLM models that wrap a MoE language model (e.g. `KimiK25ForConditionalGeneration` wrapping `DeepseekV2ForCausalLM`). The wrapper doesn't implement the `MixtureOfExperts` protocol, so `is_mixture_of_experts(self.model)` returns `False`, `add_model()` is never called, and the first forward pass crashes with:

`ValueError: enable_eplb=True requires expert_load_view != None`

Three code paths are affected: `load_model()` init, `eplb_step()` runtime assert, and `setup_eplb_from_mapping()`.

Test Plan
- `enable-eplb: true` with Kimi-K2.5 (VLM wrapper over DeepseekV2): previously crashes, now initializes and runs EPLB steps
- `enable-eplb: true` with DeepSeek-R1 (native MoE, no wrapper): no regression
- `enable-eplb: false`: no behavior change

Test Result
nvidia/DeepSeek-R1-0528-NVFP4-v2 gsm8k 0.9636
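The test-plan scenarios can be mirrored in a small CPU-only sketch, useful as a mental model of the bug or as the shape of a unit test. It is hypothetical throughout: `EplbStateSketch`, `is_moe`, `resolve_moe`, `init_then_forward`, and the duck-typed `num_experts` check are stand-ins for vLLM's real EPLB state and `is_mixture_of_experts`; only the `ValueError` message is taken from the bug report. The `unwrap` flag toggles between the pre-fix behaviour (checking only the top-level model) and the fixed behaviour (looking through the VLM wrapper).

```python
from typing import Any, Optional


class EplbStateSketch:
    """Hypothetical EPLB state: registering a model allocates its load view."""

    def __init__(self) -> None:
        self.expert_load_view: Optional[list] = None

    def add_model(self, moe_model: Any) -> None:
        self.expert_load_view = [0] * moe_model.num_experts


def is_moe(model: Any) -> bool:
    # Hypothetical duck-typed stand-in for is_mixture_of_experts().
    return hasattr(model, "num_experts")


def resolve_moe(model: Any, unwrap: bool) -> Optional[Any]:
    # unwrap=False mimics the pre-fix check on the top-level model only.
    if unwrap and not is_moe(model) and hasattr(model, "get_language_model"):
        model = model.get_language_model()
    return model if is_moe(model) else None


def init_then_forward(model: Any, enable_eplb: bool, unwrap: bool = True) -> str:
    state = EplbStateSketch()
    moe = resolve_moe(model, unwrap)
    if enable_eplb and moe is not None:
        state.add_model(moe)
    # Guard reached on the first forward pass in the bug report.
    if enable_eplb and state.expert_load_view is None:
        raise ValueError("enable_eplb=True requires expert_load_view != None")
    return "ok"


class NativeMoE:
    """Stands in for a native MoE model such as DeepSeek-R1."""

    num_experts = 8


class VlmWrapper:
    """Stands in for a VLM wrapper such as Kimi-K2.5 over DeepseekV2."""

    def __init__(self) -> None:
        self.language_model = NativeMoE()

    def get_language_model(self) -> NativeMoE:
        return self.language_model
```

Calling `init_then_forward(VlmWrapper(), enable_eplb=True, unwrap=False)` raises the reported `ValueError`, while the same call with the default `unwrap=True` returns `"ok"`, matching the first test-plan case; `NativeMoE()` with EPLB enabled and any model with `enable_eplb=False` also return `"ok"`.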