[Bugfix] Guard EPLB VLM unwrap for models without language_model#42643
[Bugfix] Guard EPLB VLM unwrap for models without language_model#42643esmeetu wants to merge 1 commit into
Conversation
Fix regression introduced in #39805 where `load_model()` unconditionally called `moe_candidate.get_language_model()` on any `SupportsMultiModal`, raising `NotImplementedError` for VLMs whose marked language module does not expose `embed_input_ids` (e.g. `NemotronParseForConditionalGeneration` with `MBartDecoderNoPos`). - Gate the MoE-unwrap on `enable_eplb` so non-EPLB loads never call `get_language_model()`. - Defensively catch `NotImplementedError` for EPLB-on cases where the VLM cannot resolve its language model. Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: esmeetu <jasonailu87@gmail.com>
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
There was a problem hiding this comment.
Code Review
This pull request updates the load_model method in GPUModelRunner to conditionally resolve MoE models only when EPLB is enabled. It also introduces error handling for NotImplementedError when calling get_language_model() on multi-modal models, which prevents potential crashes for models like NemotronParse that do not expose a language module. I have no feedback to provide as there were no review comments.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c323462725
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| try: | ||
| moe_candidate = moe_candidate.get_language_model() | ||
| except NotImplementedError: | ||
| moe_candidate = None |
There was a problem hiding this comment.
Disable EPLB after MoE unwrap failure
Catching NotImplementedError here sets moe_candidate to None and continues loading, but EPLB remains enabled. In the V1 runner, execute_model always calls eplb_step(), and eplb_step asserts self._moe_model is not None whenever enable_eplb is true, so this path deterministically crashes on the first step for multimodal models whose get_language_model() cannot resolve. Before this change, that configuration failed fast during load; now it fails later with an assertion, so the intended “skip cleanly” behavior is not achieved.
Useful? React with 👍 / 👎.
Summary
Forward-fix for the regression reported in #42636 (open as a draft revert of #39805).
In #39805 the EPLB initialization path was extended to unwrap VLM wrappers via
moe_candidate.get_language_model()so wrapper models likeKimiK25ForConditionalGenerationwould register their inner MoE language model with EPLB. That call was made unconditionally for anySupportsMultiModalmodel, including non-EPLB loads.For
NemotronParseForConditionalGeneration, although it does call_mark_language_model(vllm/model_executor/models/nemotron_parse.py:593), its marked moduleMBartDecoderNoPosdoes not exposeembed_input_ids.SupportsMultiModal.get_language_model()therefore falls through both the marked-name lookup and the children-fallback invllm/model_executor/models/interfaces.py:176-211and raisesNotImplementedError, breaking engine-core init for any user loading NemotronParse, even without EPLB.This PR:
self.parallel_config.enable_eplb, so non-EPLB loads never callget_language_model()— restoring pre-[Bugfix] Fix EPLB initialization for VLM wrapper models #39805 behavior for all non-EPLB users.get_language_model()call intry/except NotImplementedErrorso EPLB-on configs with a VLM that cannot resolve its language model fall through cleanly instead of crashing during load.Behavior matrix:
NotImplementedError_moe_model = self.modelNotImplementedError_moe_model=None, skippedTest plan
get_language_model()call):tests/models/multimodal/generation/test_nemotron_parse.py::test_models[nvidia/NVIDIA-Nemotron-Parse-v1.2]tests/models/test_initialization.py::test_can_initialize_large_subset[NemotronParseForConditionalGeneration]ast.parsesyntax check passes; pre-commit hooks pass locally.readylabel).AI assistance disclosure
Code change drafted with assistance from Claude (Anthropic). The submitter reviewed every changed line and verified correctness against the four scenarios above.