[Bugfix] Guard EPLB VLM unwrap for models without language_model by esmeetu · Pull Request #42643 · vllm-project/vllm

esmeetu · 2026-05-14T13:49:40Z

Summary

Forward-fix for the regression reported in #42636 (open as a draft revert of #39805).

In #39805 the EPLB initialization path was extended to unwrap VLM wrappers via moe_candidate.get_language_model() so wrapper models like KimiK25ForConditionalGeneration would register their inner MoE language model with EPLB. That call was made unconditionally for any SupportsMultiModal model, including non-EPLB loads.

For NemotronParseForConditionalGeneration, although it does call _mark_language_model (vllm/model_executor/models/nemotron_parse.py:593), its marked module MBartDecoderNoPos does not expose embed_input_ids. SupportsMultiModal.get_language_model() therefore falls through both the marked-name lookup and the children-fallback in vllm/model_executor/models/interfaces.py:176-211 and raises NotImplementedError, breaking engine-core init for any user loading NemotronParse, even without EPLB.
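
The fall-through described above can be illustrated with a toy model. This is a simplified sketch, not the code in vllm/model_executor/models/interfaces.py: the classes `ToyMultiModal`, `DecoderWithEmbed`, and `DecoderWithoutEmbed`, the `marked_names` argument, and the exact lookup order are assumptions made for illustration only.

```python
class DecoderWithEmbed:
    """Hypothetical language module that exposes embed_input_ids."""

    def embed_input_ids(self, input_ids):
        return input_ids


class DecoderWithoutEmbed:
    """Hypothetical stand-in for a marked module lacking embed_input_ids."""


class ToyMultiModal:
    """Toy wrapper that resolves its language model in two passes."""

    def __init__(self, children, marked_names=()):
        self._children = dict(children)
        self._marked_names = tuple(marked_names)

    def get_language_model(self):
        # Pass 1: look at modules that were explicitly marked.
        for name in self._marked_names:
            module = self._children.get(name)
            if module is not None and hasattr(module, "embed_input_ids"):
                return module
        # Pass 2: fall back to scanning all children.
        for module in self._children.values():
            if hasattr(module, "embed_input_ids"):
                return module
        # Neither pass found a usable module.
        raise NotImplementedError("no language model could be resolved")
```

In this sketch, a wrapper whose only marked child is a `DecoderWithoutEmbed` fails both passes and raises, which mirrors the NemotronParse situation described in the summary.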

This PR:

- Gates the MoE-resolution block on self.parallel_config.enable_eplb, so non-EPLB loads never call get_language_model(). This restores pre-#39805 behavior for all non-EPLB users.
- Defensively wraps the get_language_model() call in try/except NotImplementedError so EPLB-on configs with a VLM that cannot resolve its language model fall through cleanly instead of crashing during load.
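
The two changes can be sketched as a standalone function. This is a minimal illustration, not the actual `load_model()` code: `resolve_moe_model`, the `supports_multimodal` attribute, and the toy model classes are hypothetical names, and the real method may apply further checks that are not shown here.

```python
class InnerMoE:
    """Hypothetical inner MoE language model."""


class PlainMoE:
    """Hypothetical non-VLM MoE model."""

    supports_multimodal = False


class ResolvableVLM:
    """Hypothetical VLM wrapper whose language model can be resolved."""

    supports_multimodal = True

    def __init__(self):
        self.inner = InnerMoE()
        self.calls = 0

    def get_language_model(self):
        self.calls += 1
        return self.inner


class UnresolvableVLM:
    """Hypothetical VLM wrapper whose language model cannot be resolved."""

    supports_multimodal = True

    def __init__(self):
        self.calls = 0

    def get_language_model(self):
        self.calls += 1
        raise NotImplementedError("language model cannot be resolved")


def resolve_moe_model(model, enable_eplb):
    """Return the module to register with EPLB, or None to skip."""
    # Gate: without EPLB, never touch get_language_model().
    if not enable_eplb:
        return None
    candidate = model
    if getattr(candidate, "supports_multimodal", False):
        try:
            # Unwrap the VLM wrapper to reach the inner language model.
            candidate = candidate.get_language_model()
        except NotImplementedError:
            # Guard: the wrapper cannot resolve its language model.
            candidate = None
    return candidate
```

With EPLB off the function returns before `get_language_model()` is reached, so an `UnresolvableVLM` loads without raising; with EPLB on, the same model yields `None` rather than an exception.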

Behavior matrix:

| Scenario | Before | After |
| --- | --- | --- |
| EPLB off, NemotronParse (regression in #42636) | `NotImplementedError` | block skipped |
| EPLB on, KimiK25 (VLM+MoE), original #39805 target | unwrap → register | unwrap → register |
| EPLB on, DeepSeek-style (plain MoE) | `_moe_model = self.model` | unchanged |
| EPLB on, VLM without resolvable LM | `NotImplementedError` | caught, `_moe_model=None`, skipped |

Test plan

- Verified that the two failing tests cited in #42636 (the revert of #39805) do not use EPLB, so they exercise the first scenario in the behavior matrix (EPLB off → block skipped, no get_language_model() call):
  - tests/models/multimodal/generation/test_nemotron_parse.py::test_models[nvidia/NVIDIA-Nemotron-Parse-v1.2]
  - tests/models/test_initialization.py::test_can_initialize_large_subset[NemotronParseForConditionalGeneration]
- ast.parse syntax check passes; pre-commit hooks pass locally.
- CI re-run on these tests (please add ready label).
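
The PR does not say how the ast.parse check was run. One way to perform such a check is sketched below; `has_valid_syntax` is a hypothetical helper name. A check like this only confirms that the source parses and does not exercise any runtime behavior.

```python
import ast


def has_valid_syntax(source: str) -> bool:
    """Return True if the source parses as Python, False on SyntaxError."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True
```

To check a file, read it as text and pass the contents to the helper.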

AI assistance disclosure

Code change drafted with assistance from Claude (Anthropic). The submitter reviewed every changed line and verified correctness against the four scenarios above.

Fix regression introduced in #39805 where `load_model()` unconditionally called `moe_candidate.get_language_model()` on any `SupportsMultiModal`, raising `NotImplementedError` for VLMs whose marked language module does not expose `embed_input_ids` (e.g. `NemotronParseForConditionalGeneration` with `MBartDecoderNoPos`).

- Gate the MoE-unwrap on `enable_eplb` so non-EPLB loads never call `get_language_model()`.
- Defensively catch `NotImplementedError` for EPLB-on cases where the VLM cannot resolve its language model.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: esmeetu <jasonailu87@gmail.com>

gemini-code-assist

Code Review

This pull request updates the load_model method in GPUModelRunner to conditionally resolve MoE models only when EPLB is enabled. It also introduces error handling for NotImplementedError when calling get_language_model() on multi-modal models, which prevents potential crashes for models like NemotronParse that do not expose a language module. I have no feedback to provide as there were no review comments.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c323462725

chatgpt-codex-connector · 2026-05-14T13:54:02Z

+                        try:
+                            moe_candidate = moe_candidate.get_language_model()
+                        except NotImplementedError:
+                            moe_candidate = None


Disable EPLB after MoE unwrap failure

Catching NotImplementedError here sets moe_candidate to None and continues loading, but EPLB remains enabled. In the V1 runner, execute_model always calls eplb_step(), and eplb_step asserts self._moe_model is not None whenever enable_eplb is true, so this path deterministically crashes on the first step for multimodal models whose get_language_model() cannot resolve. Before this change, that configuration failed fast during load; now it fails later with an assertion, so the intended “skip cleanly” behavior is not achieved.
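
The concern raised in this comment, and the mitigation its title suggests, can be sketched with a toy runner. This is an illustration under assumptions, not the V1 runner code: `ToyRunner` and `load_with_fallback` are hypothetical, and the per-step check is modelled as an explicit raise so the sketch behaves the same under `python -O`.

```python
class ToyRunner:
    """Toy runner that mimics the per-step EPLB check described above."""

    def __init__(self, enable_eplb, moe_model):
        self.enable_eplb = enable_eplb
        self._moe_model = moe_model

    def eplb_step(self):
        if not self.enable_eplb:
            return
        # Mirrors the assertion described in the review comment.
        if self._moe_model is None:
            raise AssertionError("EPLB is enabled but no MoE model is registered")

    def execute_model(self):
        # eplb_step runs on every step, so a missing MoE model fails here.
        self.eplb_step()
        return "ok"


def load_with_fallback(enable_eplb, moe_model):
    """Hypothetical mitigation: turn EPLB off when no MoE model was resolved."""
    if enable_eplb and moe_model is None:
        enable_eplb = False
    return ToyRunner(enable_eplb, moe_model)
```

In this sketch, `ToyRunner(True, None).execute_model()` raises on the first step, while `load_with_fallback(True, None).execute_model()` completes because EPLB was disabled at load time. Whether silently disabling EPLB is preferable to failing fast during load is a design choice the review comment leaves open.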


esmeetu requested a review from njhill as a code owner May 14, 2026 13:49

claude Bot reviewed May 14, 2026


mergify Bot added the v1 and bug labels May 14, 2026

gemini-code-assist Bot reviewed May 14, 2026


esmeetu mentioned this pull request May 14, 2026

[Bugfix] Fix EPLB initialization for VLM wrapper models #39805

Merged

8 tasks

chatgpt-codex-connector Bot reviewed May 14, 2026


esmeetu added the ready label May 14, 2026

esmeetu mentioned this pull request May 14, 2026

[Bugfix] Fix LM detection for Nemotron Parse #42641

Merged

4 tasks

esmeetu closed this May 14, 2026

esmeetu deleted the fix-eplb-vlm-nemotron branch May 25, 2026 05:47

esmeetu wants to merge 1 commit into main from fix-eplb-vlm-nemotron
