[Bugfix][Qwen3-VL] Fix multi-video crash with list-valued fps/num_frames by Sunt-ing · Pull Request #46305 · vllm-project/vllm

Sunt-ing · 2026-06-21T21:14:17Z

Purpose

A Qwen3-VL request that carries more than one video and passes per-video mm_processor_kwargs as a list (one value per video, e.g. fps=[2.0, 4.0] or num_frames=[8, 16]) crashes during preprocessing, before the request reaches inference.

Qwen3VLMultiModalProcessor._call_hf_processor() processes videos in a per-item loop, but it copies the full mm_kwargs to every video without slicing the list-valued per-video kwargs. _get_video_second_idx() then receives the whole list where it expects a scalar:

num_frames=[8, 16] raises TypeError: '>' not supported between instances of 'int' and 'list'
fps=[2.0, 4.0] raises TypeError: can't multiply sequence by non-int of type 'float'
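Both tracebacks are ordinary Python type errors that occur when a list reaches scalar arithmetic. The minimal sketch below uses plain Python with no vLLM. The operands (`total_num_frames = 32`, the `* 0.5` factor) are illustrative assumptions, not the actual expressions inside _get_video_second_idx. It reproduces both messages and shows that indexing a single item makes them go away:

```python
def type_error_message(fn):
    # Return the TypeError message raised by fn(), or None if it succeeds.
    try:
        fn()
    except TypeError as exc:
        return str(exc)
    return None


total_num_frames = 32      # illustrative int operand (assumption)
num_frames = [8, 16]       # per-video list forwarded without slicing
fps = [2.0, 4.0]           # per-video list forwarded without slicing

frames_error = type_error_message(lambda: total_num_frames > num_frames)
fps_error = type_error_message(lambda: fps * 0.5)
print(frames_error)
print(fps_error)

# Using the per-item value for video 0, the same expressions succeed.
sliced_frames_error = type_error_message(lambda: total_num_frames > num_frames[0])
sliced_fps_error = type_error_message(lambda: fps[0] * 0.5)
print(sliced_frames_error, sliced_fps_error)
```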

List-valued per-video fps is an intended representation: _get_prompt_updates.get_video_replacement_qwen3vl already slices it with is_list_of(sampled_fps, float) / sampled_fps[item_idx]. The later per-video processing path simply missed the same slicing.
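The per-item slicing pattern can be sketched as follows. This is a hedged approximation, not vLLM's code: `is_list_of` below is a simplified local stand-in for vLLM's helper of the same name (it checks every element), and `slice_per_video` is a hypothetical name. A per-video list yields the entry for the current video, and a scalar is shared unchanged by every video:

```python
from typing import Any


def is_list_of(value: Any, typ: type) -> bool:
    # Simplified local stand-in for vLLM's is_list_of: True only for a list
    # whose elements are all instances of `typ`.
    return isinstance(value, list) and all(isinstance(v, typ) for v in value)


def slice_per_video(value: Any, item_idx: int, typ: type) -> Any:
    # Hypothetical helper: a per-video list yields this video's entry,
    # while a scalar is passed through unchanged for every video.
    if is_list_of(value, typ):
        return value[item_idx]
    return value


per_video_fps = [slice_per_video([2.0, 4.0], i, float) for i in range(2)]
shared_fps = [slice_per_video(2.0, i, float) for i in range(2)]
per_video_frames = [slice_per_video([8, 16], i, int) for i in range(2)]
print(per_video_fps, shared_fps, per_video_frames)
```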

There is a second leak in the same method: after the video loop, the text/image HF processor call also receives the unsliced mm_kwargs. fps/num_frames are video-only kwargs already consumed by the loop, so forwarding the list there fails again with ValueError: Failed to apply Qwen3VLProcessor.

This PR slices list-valued fps/num_frames by item index inside the per-video loop (mirroring _get_prompt_updates), and drops these video-only kwargs from the final text/image processor call. The scalar path is unchanged.
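The control flow after the fix can be outlined as below. This is a hypothetical sketch, not the body of _call_hf_processor: `processor(kind, item, **kwargs)` is an assumed callable standing in for the HF processor calls, and `do_resize` is a made-up extra kwarg included to show that non-video kwargs still reach both calls. Each video receives scalar fps/num_frames, and the final text/image call receives neither:

```python
VIDEO_ONLY_KEYS = ("fps", "num_frames")  # the two kwargs named in this PR


def call_processor_sketch(videos, mm_kwargs, processor):
    # Hypothetical outline of the fixed flow; not vLLM's implementation.
    outputs = []
    for item_idx, video in enumerate(videos):
        # Per-video loop: slice list-valued video-only kwargs by item index.
        video_kwargs = {
            k: (v[item_idx] if k in VIDEO_ONLY_KEYS and isinstance(v, list) else v)
            for k, v in mm_kwargs.items()
        }
        outputs.append(processor("video", video, **video_kwargs))
    # Final text/image call: video-only kwargs were consumed above, so drop them.
    non_video_mm_kwargs = {
        k: v for k, v in mm_kwargs.items() if k not in VIDEO_ONLY_KEYS
    }
    outputs.append(processor("text", None, **non_video_mm_kwargs))
    return outputs


def recording_processor(kind, item, **kwargs):
    # Fake processor that rejects list-valued video-only kwargs,
    # as scalar-only code would, and otherwise records its arguments.
    for key in VIDEO_ONLY_KEYS:
        if isinstance(kwargs.get(key), list):
            raise TypeError(f"{kind} call received list-valued {key}")
    return (kind, item, kwargs)


calls = call_processor_sketch(
    ["video_a", "video_b"],
    {"num_frames": [8, 16], "fps": [2.0, 4.0], "do_resize": True},
    recording_processor,
)
for call in calls:
    print(call)
```

The final dict comprehension corresponds to the `non_video_mm_kwargs` filter discussed in the review below. Scalar fps/num_frames values pass through the per-video branch unchanged, which matches the statement that the scalar path is unaffected.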

This is not covered by #36136 (single-video scalar num_frames timestamp) or #37439 (merge_size timestamp fix).

Test Plan

Processor-level regression added to tests/models/multimodal/processing/test_qwen3_vl.py: a two-video request, run once with list-valued num_frames=[8, 16] and once with list-valued fps=[2.0, 4.0], asserting that two video placeholders are produced in each case.

pytest tests/models/multimodal/processing/test_qwen3_vl.py -q

End-to-end repro via LLM.generate on the real Qwen3-VL-4B-Instruct model with two videos and four cases: scalar num_frames and scalar fps as negative controls, list-valued num_frames and list-valued fps as the failing cases.

Test Result

Processor tests on current main with the fix applied all pass; with the fix reverted, the two new list-kwargs cases fail:

PASSED test_processor_num_frames_timestamp[8-...]
PASSED test_processor_num_frames_timestamp[16-...]
PASSED test_processor_multi_video[2-...]
PASSED test_processor_multi_video[4-...]
PASSED test_processor_multi_video_list_kwargs[hf_mm_kwargs0-...]   # num_frames=[8,16]
PASSED test_processor_multi_video_list_kwargs[hf_mm_kwargs1-...]   # fps=[2.0,4.0]
6 passed

# baseline (fix reverted), same two new cases:
FAILED test_processor_multi_video_list_kwargs[hf_mm_kwargs0-...]
FAILED test_processor_multi_video_list_kwargs[hf_mm_kwargs1-...]
2 failed

End-to-end repro (hardware, environment, script, before/after output)

Hardware: 1x RTX 4090. Model: Qwen3-VL-4B-Instruct, enforce_eager=True, max_model_len=2048.

import numpy as np
from vllm import LLM, SamplingParams


def video(num_frames, fps=30.0):
    arr = np.zeros((num_frames, 128, 128, 3), dtype=np.uint8)
    metadata = {
        "fps": fps,
        "duration": num_frames / fps,
        "total_num_frames": num_frames,
        "frames_indices": list(range(num_frames)),
        "video_backend": "opencv",
        "do_sample_frames": True,
    }
    return arr, metadata


prompt = {
    "prompt": (
        "<|vision_start|><|video_pad|><|vision_end|>"
        "<|vision_start|><|video_pad|><|vision_end|>"
        "Describe the two videos briefly."
    ),
    "multi_modal_data": {"video": [video(16), video(32)]},
}

llm = LLM(
    model="Qwen/Qwen3-VL-4B-Instruct",
    max_model_len=2048,
    max_num_seqs=1,
    enforce_eager=True,
    limit_mm_per_prompt={"image": 0, "video": 2},
)
sp = SamplingParams(temperature=0.0, max_tokens=2)

for name, kwargs in [
    ("scalar_num_frames", {"num_frames": 8}),
    ("list_num_frames", {"num_frames": [8, 16]}),
    ("scalar_fps", {"fps": 2.0}),
    ("list_fps", {"fps": [2.0, 4.0]}),
]:
    try:
        llm.generate(prompt, sampling_params=sp, mm_processor_kwargs=kwargs)
        print(f"CASE {name} OK")
    except Exception as exc:
        print(f"CASE {name} FAIL {type(exc).__name__}: {exc}")

Before (current main):

CASE scalar_num_frames OK
CASE list_num_frames FAIL TypeError: '>' not supported between instances of 'int' and 'list'
CASE scalar_fps OK
CASE list_fps FAIL TypeError: can't multiply sequence by non-int of type 'float'

After (with fix):

CASE scalar_num_frames OK
CASE list_num_frames OK
CASE scalar_fps OK
CASE list_fps OK

AI assistance was used to investigate, reproduce, and draft this change; the author reviewed the diff and validation output.

cc @DarkLight1337


ywang96

Thanks for the fix! I left a nit

ywang96 · 2026-06-21T21:20:35Z

+        # fps/num_frames are video-only kwargs already consumed by the loop;
+        # exclude them so the text/image processor call below never gets a list.
+        text_mm_kwargs = {
+            k: v for k, v in mm_kwargs.items() if k not in ("fps", "num_frames")
+        }


Nit: rename this to non_video_mm_kwargs for clarity.

Done, thanks~

Signed-off-by: Ting Sun <suntcrick@gmail.com>


Sunt-ing requested review from AndreasKaratzas, DarkLight1337, sighingnow, vadiklyutiy and ywang96 as code owners June 21, 2026 21:14

mergify Bot added the multi-modality (Related to multi-modality, #4194), qwen (Related to Qwen models), and bug (Something isn't working) labels Jun 21, 2026

ywang96 assigned Isotr0py Jun 21, 2026

ywang96 reviewed Jun 21, 2026


Rename text_mm_kwargs to non_video_mm_kwargs

4f0d04a

Signed-off-by: Ting Sun <suntcrick@gmail.com>

ywang96 added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jun 21, 2026

ywang96 approved these changes Jun 21, 2026


ywang96 merged commit 12fe2a9 into vllm-project:main Jun 21, 2026
6 of 7 checks passed

tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026

[Bugfix][Qwen3-VL] Fix multi-video crash with list-valued fps/num_fra…

f7baf4b

…mes (vllm-project#46305) Signed-off-by: Ting Sun <suntcrick@gmail.com>

nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026

[Bugfix][Qwen3-VL] Fix multi-video crash with list-valued fps/num_fra…

a4b1c24

…mes (vllm-project#46305) Signed-off-by: Ting Sun <suntcrick@gmail.com>

qli88 pushed a commit to qli88/vllm that referenced this pull request Jun 26, 2026

[Bugfix][Qwen3-VL] Fix multi-video crash with list-valued fps/num_fra…

8a1e00c

…mes (vllm-project#46305) Signed-off-by: Ting Sun <suntcrick@gmail.com> Signed-off-by: Qiang Li <qiang.li2@amd.com>

ywang96 merged 2 commits into vllm-project:main from Sunt-ing:mm-2

3 participants
