Skip to content

[Bugfix] Guard EPLB VLM unwrap for models without language_model#42643

Closed
esmeetu wants to merge 1 commit into
mainfrom
fix-eplb-vlm-nemotron
Closed

[Bugfix] Guard EPLB VLM unwrap for models without language_model#42643
esmeetu wants to merge 1 commit into
mainfrom
fix-eplb-vlm-nemotron

Conversation

@esmeetu

@esmeetu esmeetu commented May 14, 2026

Copy link
Copy Markdown
Member

Summary

Forward-fix for the regression reported in #42636 (open as a draft revert of #39805).

In #39805 the EPLB initialization path was extended to unwrap VLM wrappers via moe_candidate.get_language_model() so wrapper models like KimiK25ForConditionalGeneration would register their inner MoE language model with EPLB. That call was made unconditionally for any SupportsMultiModal model, including non-EPLB loads.

For NemotronParseForConditionalGeneration, although it does call _mark_language_model (vllm/model_executor/models/nemotron_parse.py:593), its marked module MBartDecoderNoPos does not expose embed_input_ids. SupportsMultiModal.get_language_model() therefore falls through both the marked-name lookup and the children-fallback in vllm/model_executor/models/interfaces.py:176-211 and raises NotImplementedError, breaking engine-core init for any user loading NemotronParse, even without EPLB.

This PR:

  • Gates the MoE-resolution block on self.parallel_config.enable_eplb, so non-EPLB loads never call get_language_model() — restoring pre-[Bugfix] Fix EPLB initialization for VLM wrapper models #39805 behavior for all non-EPLB users.
  • Defensively wraps the get_language_model() call in try/except NotImplementedError so EPLB-on configs with a VLM that cannot resolve its language model fall through cleanly instead of crashing during load.

Behavior matrix:

Scenario Before After
EPLB off, NemotronParse (regression in #42636) NotImplementedError block skipped
EPLB on, KimiK25 (VLM+MoE) — original #39805 target unwrap → register unwrap → register
EPLB on, DeepSeek-style (plain MoE) _moe_model = self.model unchanged
EPLB on, VLM without resolvable LM NotImplementedError caught, _moe_model=None, skipped

Test plan

  • Verified the two failing tests cited in Revert "[Bugfix] Fix EPLB initialization for VLM wrapper models" (#39805) #42636 do not use EPLB, so they exercise scenario A (EPLB off → block skipped, no get_language_model() call):
    • tests/models/multimodal/generation/test_nemotron_parse.py::test_models[nvidia/NVIDIA-Nemotron-Parse-v1.2]
    • tests/models/test_initialization.py::test_can_initialize_large_subset[NemotronParseForConditionalGeneration]
  • ast.parse syntax check passes; pre-commit hooks pass locally.
  • CI re-run on these tests (please add ready label).

AI assistance disclosure

Code change drafted with assistance from Claude (Anthropic). The submitter reviewed every changed line and verified correctness against the four scenarios above.

Fix regression introduced in #39805 where `load_model()` unconditionally
called `moe_candidate.get_language_model()` on any `SupportsMultiModal`,
raising `NotImplementedError` for VLMs whose marked language module does
not expose `embed_input_ids` (e.g. `NemotronParseForConditionalGeneration`
with `MBartDecoderNoPos`).

- Gate the MoE-unwrap on `enable_eplb` so non-EPLB loads never call
  `get_language_model()`.
- Defensively catch `NotImplementedError` for EPLB-on cases where the
  VLM cannot resolve its language model.

Co-authored-by: Claude <noreply@anthropic.com>

Signed-off-by: esmeetu <jasonailu87@gmail.com>
@esmeetu esmeetu requested a review from njhill as a code owner May 14, 2026 13:49

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@mergify mergify Bot added v1 bug Something isn't working labels May 14, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the load_model method in GPUModelRunner to conditionally resolve MoE models only when EPLB is enabled. It also introduces error handling for NotImplementedError when calling get_language_model() on multi-modal models, which prevents potential crashes for models like NemotronParse that do not expose a language module. I have no feedback to provide as there were no review comments.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c323462725

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +4995 to +4998
try:
moe_candidate = moe_candidate.get_language_model()
except NotImplementedError:
moe_candidate = None

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Disable EPLB after MoE unwrap failure

Catching NotImplementedError here sets moe_candidate to None and continues loading, but EPLB remains enabled. In the V1 runner, execute_model always calls eplb_step(), and eplb_step asserts self._moe_model is not None whenever enable_eplb is true, so this path deterministically crashes on the first step for multimodal models whose get_language_model() cannot resolve. Before this change, that configuration failed fast during load; now it fails later with an assertion, so the intended “skip cleanly” behavior is not achieved.

Useful? React with 👍 / 👎.

@esmeetu esmeetu added the ready ONLY add when PR is ready to merge/full CI is needed label May 14, 2026
@esmeetu esmeetu closed this May 14, 2026
@esmeetu esmeetu deleted the fix-eplb-vlm-nemotron branch May 25, 2026 05:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working ready ONLY add when PR is ready to merge/full CI is needed v1

1 participant