Skip to content

minicpmv4_6: fix ImageSize (W,H) order for placeholder token calculation#45244

Merged
DarkLight1337 merged 2 commits into
vllm-project:mainfrom
tc-mb:fix/minicpmv46-image-size-order
Jun 11, 2026
Merged

minicpmv4_6: fix ImageSize (W,H) order for placeholder token calculation#45244
DarkLight1337 merged 2 commits into
vllm-project:mainfrom
tc-mb:fix/minicpmv46-image-size-order

Conversation

@tc-mb

@tc-mb tc-mb commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Summary

Fix a bug where _compute_visual_tokens passes ImageSize(width, height)
directly to transformers image processor methods (get_sliced_grid,
find_best_resize, get_refine_size) that expect (height, width) order.
For certain aspect ratios (e.g. 365x211) this causes the text-side
placeholder count to mismatch the actual vision embedding count, triggering
a ValueError crash.

  • minicpmv4_6.py: convert ImageSize to (H, W) tuple before calling
    HF image processor methods
  • minicpmv.py: remove dead version == (4, 6) branches that are fully
    overridden by MiniCPMV4_6ProcessingInfo

Why this was not caught earlier

The actual pixel processing goes through image_processor._preprocess,
which reads image size directly from tensor shape (H, W) — always correct.
The placeholder text side goes through _compute_visual_tokens, which
passes vLLM's ImageSize(W, H) to the same HF methods. Both sides call the
same find_best_resize but with different input order.

For most common sizes (e.g. 1280x720, 448x448), ensure_divide rounding
happens to produce the same result regardless of W/H ordering, making the
bug invisible. Only extreme aspect ratios like 365x211 (1.73:1) or
153x174 (0.88:1) cause the rounding to diverge, surfacing the mismatch.

Test plan

Manually verified token count for edge-case sizes before/after fix:

Size (WxH) Before (bug) After (fix)
365x211 66 60
153x174 72 63
1280x720 66 66 (unchanged)
Assisted-by: Claude
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
@tc-mb

tc-mb commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

@DarkLight1337 This is a minor fix, and it might be easy to merge in. Could you please take a look?

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) June 11, 2026 11:54
@github-actions github-actions Bot added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 11, 2026
@DarkLight1337 DarkLight1337 merged commit ab3a1fd into vllm-project:main Jun 11, 2026
68 checks passed
@tc-mb tc-mb deleted the fix/minicpmv46-image-size-order branch June 11, 2026 14:31
Saddss pushed a commit to Saddss/vllm that referenced this pull request Jun 14, 2026
divineearthly pushed a commit to divineearthly/vllm that referenced this pull request Jun 19, 2026
…ion (vllm-project#45244)

Signed-off-by: tc-mb <tianchi_cai@icloud.com>
Signed-off-by: divineearthly <divineearthly@gmail.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026
nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed

2 participants