[Model Runner V2] Fix v2 `AttributeError: 'CohereASRDecoder' object has no attribute 'embed_input_ids'` by yewentao256 · Pull Request #44568 · vllm-project/vllm

yewentao256 · 2026-06-04T19:04:18Z

Purpose

Fixes #44443.

VLLM_USE_V2_MODEL_RUNNER=1 pytest tests/models/test_initialization.py::test_can_initialize_large_subset[CohereAsrForConditionalGeneration]

Originally

(EngineCore pid=1833568)   File "/home/yewentao256/vllm-source/vllm/v1/executor/uniproc_executor.py", line 92, in collective_rpc
(EngineCore pid=1833568)     result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore pid=1833568)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1833568)   File "/home/yewentao256/vllm-source/vllm/v1/serial_utils.py", line 510, in run_method
(EngineCore pid=1833568)     return func(*args, **kwargs)
(EngineCore pid=1833568)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1833568)   File "/home/yewentao256/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore pid=1833568)     return func(*args, **kwargs)
(EngineCore pid=1833568)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1833568)   File "/home/yewentao256/vllm-source/vllm/v1/worker/gpu_worker.py", line 400, in determine_available_memory
(EngineCore pid=1833568)     self.model_runner.profile_run()
(EngineCore pid=1833568)   File "/home/yewentao256/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore pid=1833568)     return func(*args, **kwargs)
(EngineCore pid=1833568)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1833568)   File "/home/yewentao256/vllm-source/vllm/v1/worker/gpu/model_runner.py", line 629, in profile_run
(EngineCore pid=1833568)     hidden_states, sample_hidden_states = self._dummy_run(
(EngineCore pid=1833568)                                           ^^^^^^^^^^^^^^^^
(EngineCore pid=1833568)   File "/home/yewentao256/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore pid=1833568)     return func(*args, **kwargs)
(EngineCore pid=1833568)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1833568)   File "/home/yewentao256/vllm-source/vllm/v1/worker/gpu/eplb_utils.py", line 37, in wrapper
(EngineCore pid=1833568)     result = fn(self, *args, **kwargs)
(EngineCore pid=1833568)              ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1833568)   File "/home/yewentao256/vllm-source/vllm/v1/worker/gpu/model_runner.py", line 539, in _dummy_run
(EngineCore pid=1833568)     self.execute_model(
(EngineCore pid=1833568)   File "/home/yewentao256/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore pid=1833568)     return func(*args, **kwargs)
(EngineCore pid=1833568)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1833568)   File "/home/yewentao256/vllm-source/vllm/v1/worker/gpu/model_runner.py", line 1210, in execute_model
(EngineCore pid=1833568)     inputs_embeds = self.model_state.get_mm_embeddings(
(EngineCore pid=1833568)                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1833568)   File "/home/yewentao256/vllm-source/vllm/v1/worker/gpu/model_states/default.py", line 126, in get_mm_embeddings
(EngineCore pid=1833568)     inputs_embeds = self.encoder_runner.get_inputs_embeds(
(EngineCore pid=1833568)                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1833568)   File "/home/yewentao256/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore pid=1833568)     return func(*args, **kwargs)
(EngineCore pid=1833568)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1833568)   File "/home/yewentao256/vllm-source/vllm/v1/worker/gpu/mm/encoder_runner.py", line 143, in get_inputs_embeds
(EngineCore pid=1833568)     x = self.model.embed_input_ids(
(EngineCore pid=1833568)         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1833568)   File "/home/yewentao256/vllm-source/vllm/model_executor/models/interfaces.py", line 398, in embed_input_ids
(EngineCore pid=1833568)     self.get_language_model().embed_input_ids,
(EngineCore pid=1833568)     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=1833568)   File "/home/yewentao256/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1968, in __getattr__
(EngineCore pid=1833568)     raise AttributeError(
(EngineCore pid=1833568) AttributeError: 'CohereASRDecoder' object has no attribute 'embed_input_ids'

Now

====================== 1 passed, 17 warnings in 72.84s (0:01:12) ======================
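The original traceback ends in a delegation step: the wrapper's `embed_input_ids` forwards to `get_language_model().embed_input_ids`, and the inner decoder has no such method. The sketch below reproduces only that failure shape in plain Python. `DecoderWithoutEmbed`, `WrapperModel`, and `try_embed` are hypothetical stand-ins invented for illustration, not vLLM classes, and the real lookup goes through `torch.nn.Module.__getattr__` rather than plain attribute access.

```python
class DecoderWithoutEmbed:
    """Hypothetical stand-in for a decoder that defines no embed_input_ids."""


class WrapperModel:
    """Hypothetical stand-in for a wrapper that delegates embedding."""

    def __init__(self, language_model):
        self._language_model = language_model

    def get_language_model(self):
        return self._language_model

    def embed_input_ids(self, input_ids):
        # Delegation: this attribute lookup fails when the inner model
        # does not provide embed_input_ids.
        return self.get_language_model().embed_input_ids(input_ids)


def try_embed(model, input_ids):
    """Return the embedding result, or the AttributeError message on failure."""
    try:
        return model.embed_input_ids(input_ids)
    except AttributeError as exc:
        return str(exc)


message = try_embed(WrapperModel(DecoderWithoutEmbed()), [1, 2, 3])
print(message)
```

Any caller that reaches the delegating method without first checking the inner model will hit the same error, which is why the profiling run failed before any real request was served.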

Signed-off-by: yewentao256 <zhyanwentao@126.com>

claude

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

njhill

Thanks @yewentao256

njhill · 2026-06-04T19:37:32Z

+        "WhisperForConditionalGeneration" in vllm_config.model_config.architectures
+        or "CohereAsrForConditionalGeneration" in vllm_config.model_config.architectures


Are there any other models in this category that we should add?

Do we have any other way to determine this from the model and/or config?

It seems not. I took a look at `vllm/model_executor/models/funasr.py`, `vllm/model_executor/models/fireredasr2.py`, and `vllm/model_executor/models/fireredlid.py`; they all have `embed_input_ids()`, and I didn't find any other models missing it.

If we check using `supports_transcription_only` instead, it may affect other models, so I think the current fix is minimal and precise.
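As a rough illustration of the architecture-name check discussed above, the sketch below evaluates the same membership condition against a stand-in config. `ModelConfig`, `VllmConfig`, `LISTED_ARCHITECTURES`, and `matches_listed_architecture` are hypothetical names invented here; only the two architecture strings and the `vllm_config.model_config.architectures` access come from the quoted diff. What the surrounding code does when the condition holds is not shown in this thread, so the sketch stops at the boolean.

```python
from dataclasses import dataclass, field

# Architecture names copied from the diff lines quoted in the review.
LISTED_ARCHITECTURES = (
    "WhisperForConditionalGeneration",
    "CohereAsrForConditionalGeneration",
)


@dataclass
class ModelConfig:
    """Hypothetical stand-in exposing only the `architectures` field."""
    architectures: list[str] = field(default_factory=list)


@dataclass
class VllmConfig:
    """Hypothetical stand-in exposing only `model_config`."""
    model_config: ModelConfig


def matches_listed_architecture(vllm_config: VllmConfig) -> bool:
    """True if any listed architecture name appears in the model config."""
    archs = vllm_config.model_config.architectures
    return any(name in archs for name in LISTED_ARCHITECTURES)


cohere = VllmConfig(ModelConfig(["CohereAsrForConditionalGeneration"]))
other = VllmConfig(ModelConfig(["SomeOtherForConditionalGeneration"]))
print(matches_listed_architecture(cohere), matches_listed_architecture(other))
```

An explicit name list like this touches only the models it names, which matches the "minimal and precise" reasoning in the reply; the trade-off is that any future model with the same gap has to be added by hand.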

…as no attribute 'embed_input_ids'` (vllm-project#44568) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

…as no attribute 'embed_input_ids'` (vllm-project#44568) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Signed-off-by: divineearthly <divineearthly@gmail.com>


Cherry-pick 62 bugfix/security PRs from upstream vllm-project/vllm main (2026-05-03 to 2026-06-17), covering scheduler, engine core, model runner, worker, attention, KV cache, compilation, and structured output fixes.

Security (4): vllm-project#43286 vllm-project#44744 vllm-project#45118 vllm-project#45252

Bugfix (56): vllm-project#35536 vllm-project#36616 vllm-project#38895 vllm-project#39155 vllm-project#39324 vllm-project#39562 vllm-project#39805 vllm-project#40398 vllm-project#40726 vllm-project#40727 vllm-project#40737 vllm-project#40749 vllm-project#40961 vllm-project#41119 vllm-project#41133 vllm-project#41233 vllm-project#41237 vllm-project#41411 vllm-project#41496 vllm-project#41549 vllm-project#41674 vllm-project#41873 vllm-project#41895 vllm-project#42040 vllm-project#42112 vllm-project#42289 vllm-project#42479 vllm-project#42585 vllm-project#42692 vllm-project#42706 vllm-project#42709 vllm-project#42739 vllm-project#42967 vllm-project#43001 vllm-project#43079 vllm-project#43125 vllm-project#43160 vllm-project#43616 vllm-project#43669 vllm-project#43719 vllm-project#43768 vllm-project#43808 vllm-project#43961 vllm-project#43982 vllm-project#43988 vllm-project#43998 vllm-project#44057 vllm-project#44560 vllm-project#44574 vllm-project#44568 vllm-project#44603 vllm-project#44744 vllm-project#45195 vllm-project#45345 vllm-project#45383 vllm-project#45487 vllm-project#45564 vllm-project#45673

Runner fix (2): vllm-project#44568 vllm-project#44603

Skipped: vllm-project#43781 (ROCm-specific, not applicable to Ascend NPU)

Conflict resolutions:
- Manual merge: vllm-project#43286 vllm-project#45118 vllm-project#42112 vllm-project#43160 vllm-project#43719 vllm-project#44560
- Upstream-preferred (-X theirs): vllm-project#43808 vllm-project#43988 vllm-project#42967 vllm-project#35536 vllm-project#45195
- Test files (--theirs): vllm-project#44744 vllm-project#41895 vllm-project#42040 vllm-project#41233 vllm-project#45345 vllm-project#43982

Co-authored-by: GitHub Copilot
Signed-off-by: MingqiWang-coder <mingqiwang@hust.edu.cn>


fix v2 CohereASRDecoder

9008873

Signed-off-by: yewentao256 <zhyanwentao@126.com>

yewentao256 requested review from WoosukKwon and njhill as code owners June 4, 2026 19:04

claude Bot reviewed Jun 4, 2026


yewentao256 added the `ready` label (label description: "ONLY add when PR is ready to merge/full CI is needed") Jun 4, 2026

mergify Bot added the v1 label Jun 4, 2026

yewentao256 mentioned this pull request Jun 4, 2026

[Feature]: Migration from Model Runner v1 to Model Runner v2 #41286

Open

40 tasks

njhill reviewed Jun 4, 2026


Merge branch 'main' into wentao-fix-v2-CohereASRDecoder

ef61b97

njhill approved these changes Jun 9, 2026


njhill enabled auto-merge (squash) June 9, 2026 17:03

Merge branch 'main' into wentao-fix-v2-CohereASRDecoder

e96518d

WoosukKwon disabled auto-merge June 10, 2026 18:05

WoosukKwon merged commit ffce72c into main Jun 10, 2026
75 of 78 checks passed

WoosukKwon deleted the wentao-fix-v2-CohereASRDecoder branch June 10, 2026 18:06

MingqiWang-coder mentioned this pull request Jul 1, 2026

[Sync] Upstream V1 engine core — 89 PRs (bugfix, scheduler, runner, worker, hardware) vLLM-HUST/vllm-hust#82

Open

WoosukKwon merged 3 commits into main from wentao-fix-v2-CohereASRDecoder
