[Bugfix][Qwen3-VL] Fix multi-video crash with list-valued fps/num_frames #46305
Merged
A Qwen3-VL request with more than one video that passes per-video mm_processor_kwargs as a list (one value per video, e.g. fps=[2.0, 4.0] or num_frames=[8, 16]) crashes during preprocessing.
Qwen3VLMultiModalProcessor._call_hf_processor processes videos in a per-item loop but copies the full mm_kwargs to every video without slicing the list-valued per-video kwargs, so _get_video_second_idx receives the whole list where a scalar is expected:
- num_frames=[8, 16] -> TypeError: '>' not supported between 'int' and 'list'
- fps=[2.0, 4.0] -> TypeError: can't multiply sequence by non-int of type 'float'
List-valued per-video fps is an intended representation; _get_prompt_updates already slices it with is_list_of(sampled_fps, float). The per-video processing path simply missed the same slicing.
There is a second leak: after the video loop, the text/image processor call also receives the unsliced mm_kwargs. fps/num_frames are video-only kwargs already consumed by the loop, so forwarding the list there fails with ValueError: Failed to apply Qwen3VLProcessor.
Slice list-valued fps/num_frames by item index in the per-video loop (mirroring _get_prompt_updates) and drop these video-only kwargs from the final text/image processor call. The scalar path is unchanged.
Signed-off-by: Ting Sun <suntcrick@gmail.com>
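The two failure modes and the slicing fix described above can be illustrated outside vLLM. The following is a minimal pure-Python sketch under stated assumptions, not the code from this PR: the helpers slice_video_kwargs and drop_video_only_kwargs, the VIDEO_ONLY_KWARGS constant, and the other_option key are hypothetical names introduced for illustration. Only the fps/num_frames keys and the non_video_mm_kwargs variable name (suggested in review) come from the discussion.

```python
from typing import Any

# Video-only kwargs named in the PR description.
VIDEO_ONLY_KWARGS = ("fps", "num_frames")


def slice_video_kwargs(mm_kwargs: dict[str, Any], item_idx: int) -> dict[str, Any]:
    """Hypothetical helper: pick the per-video value for list-valued
    fps/num_frames; scalar values pass through unchanged."""
    sliced = dict(mm_kwargs)
    for key in VIDEO_ONLY_KWARGS:
        value = sliced.get(key)
        if isinstance(value, (list, tuple)):
            sliced[key] = value[item_idx]
    return sliced


def drop_video_only_kwargs(mm_kwargs: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: remove fps/num_frames before a call that
    only handles text and images."""
    return {k: v for k, v in mm_kwargs.items() if k not in VIDEO_ONLY_KWARGS}


# The two TypeErrors from the description, reproduced with plain Python
# operands (the exact expressions inside vLLM are not shown here).
for bad_op in (lambda: 32 > [8, 16], lambda: [2.0, 4.0] * 0.5):
    try:
        bad_op()
    except TypeError as exc:
        print(f"TypeError: {exc}")

# "other_option" is a placeholder for any kwarg that is not video-only.
mm_kwargs = {"fps": [2.0, 4.0], "num_frames": [8, 16], "other_option": True}

# One kwargs dict per video, each holding scalars only.
per_video = [slice_video_kwargs(mm_kwargs, i) for i in range(2)]

# Kwargs that are safe to forward to the text/image call.
non_video_mm_kwargs = drop_video_only_kwargs(mm_kwargs)

print(per_video)
print(non_video_mm_kwargs)
```

Because scalars pass through slice_video_kwargs untouched, a request that sends a single fps or num_frames value for all videos behaves the same with or without the slicing step, which matches the claim that the scalar path is unchanged.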
ywang96 reviewed Jun 21, 2026
ywang96 (Member) left a comment
Thanks for the fix! I left a nit
Comment on lines +1372 to +1376
# fps/num_frames are video-only kwargs already consumed by the loop;
# exclude them so the text/image processor call below never gets a list.
text_mm_kwargs = {
    k: v for k, v in mm_kwargs.items() if k not in ("fps", "num_frames")
}
Member
Nit: rename this to non_video_mm_kwargs for clarity.
Signed-off-by: Ting Sun <suntcrick@gmail.com>
ywang96 approved these changes Jun 21, 2026
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026
…mes (vllm-project#46305) Signed-off-by: Ting Sun <suntcrick@gmail.com>
nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026
…mes (vllm-project#46305) Signed-off-by: Ting Sun <suntcrick@gmail.com>
qli88 pushed a commit to qli88/vllm that referenced this pull request Jun 26, 2026
…mes (vllm-project#46305) Signed-off-by: Ting Sun <suntcrick@gmail.com> Signed-off-by: Qiang Li <qiang.li2@amd.com>
Purpose
A Qwen3-VL request that carries more than one video and passes per-video mm_processor_kwargs as a list (one value per video, e.g. fps=[2.0, 4.0] or num_frames=[8, 16]) crashes during preprocessing, before the request reaches inference.
Qwen3VLMultiModalProcessor._call_hf_processor() processes videos in a per-item loop, but it copies the full mm_kwargs to every video without slicing the list-valued per-video kwargs. _get_video_second_idx() then receives the whole list where it expects a scalar:
- num_frames=[8, 16] raises TypeError: '>' not supported between instances of 'int' and 'list'
- fps=[2.0, 4.0] raises TypeError: can't multiply sequence by non-int of type 'float'
List-valued per-video fps is an intended representation: _get_prompt_updates.get_video_replacement_qwen3vl already slices it with is_list_of(sampled_fps, float) / sampled_fps[item_idx]. The later per-video processing path simply missed the same slicing.
There is a second leak in the same method: after the video loop, the text/image HF processor call also receives the unsliced mm_kwargs. fps/num_frames are video-only kwargs already consumed by the loop, so forwarding the list there fails again with ValueError: Failed to apply Qwen3VLProcessor.
This PR slices list-valued fps/num_frames by item index inside the per-video loop (mirroring _get_prompt_updates), and drops these video-only kwargs from the final text/image processor call. The scalar path is unchanged.
This is not covered by #36136 (single-video scalar num_frames timestamp) or #37439 (merge_size timestamp fix).
Test Plan
Processor-level regression added to tests/models/multimodal/processing/test_qwen3_vl.py: a two-video request with list-valued num_frames=[8, 16] and fps=[2.0, 4.0], asserting two video placeholders are produced.
End-to-end repro on a real LLM.generate with Qwen3-VL-4B-Instruct, two videos, four cases: scalar num_frames/fps as negative controls and list num_frames/fps as the failing cases.
Test Result
Processor tests (current main plus the fix) all pass; without the fix, the two new list-kwargs cases fail:
End-to-end repro (hardware, environment, script, before/after output)
Hardware: 1x RTX 4090. Model: Qwen3-VL-4B-Instruct, enforce_eager=True, max_model_len=2048.
Before (current main):
After (with fix):
AI assistance was used to investigate, reproduce, and draft this change; the author reviewed the diff and validation output.
cc @DarkLight1337