[Bugfix][Model Runner V2] Fix min_tokens off-by-one in the V2 GPU sampler by Sunt-ing · Pull Request #46243 · vllm-project/vllm

Sunt-ing · 2026-06-20T17:57:07Z

Purpose

min_tokens=N should let EOS through at output index N (the N+1-th token), as the V1 MinTokensLogitsProcessor does. The V2 GPU sampler releases it one step late, so min_tokens=N silently forces N+1 non-EOS tokens. This is the default path for mainstream archs (Llama, Qwen3, Mistral, ...).

The kernel in vllm/v1/worker/gpu/sample/logit_bias.py suppresses stop tokens while pos < min_len, but pos is the position of the last existing token (current length minus one), so suppression ends one step late. The fix compares the current length (pos + 1) against min_len instead:

if num_stop_token_ids > 0 and pos + 1 < min_len:

min_tokens=0 is untouched (already guarded by num_stop_token_ids > 0).
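
To make the off-by-one concrete, here is a minimal pure-Python sketch of the masking condition. It is not the actual kernel. The function name generated_length, the prompt_len parameter, the definition min_len = prompt_len + min_tokens, and the modeling of the guard as "no stop tokens registered when min_tokens is 0" are all assumptions made for illustration. EOS is treated as forced, so it is chosen at the first step where it is not masked.

```python
def generated_length(min_tokens: int, fixed: bool,
                     prompt_len: int = 3, max_tokens: int = 32) -> int:
    """Count output tokens when EOS is picked the moment it is unmasked."""
    # Assumption: min_len is measured in total sequence positions.
    min_len = prompt_len + min_tokens
    # Assumption: with min_tokens == 0 no stop tokens are registered, which
    # is how the `num_stop_token_ids > 0` guard skips the mask entirely.
    num_stop_token_ids = 1 if min_tokens > 0 else 0

    length = prompt_len  # tokens currently in the sequence
    for _ in range(max_tokens):
        pos = length - 1  # position of the last existing token
        # Old condition compares pos; the fix compares pos + 1.
        compared = pos + 1 if fixed else pos
        eos_masked = num_stop_token_ids > 0 and compared < min_len
        length += 1  # one token is sampled at every step
        if not eos_masked:
            break  # the forced EOS is selected as soon as it is allowed
    return length - prompt_len


for n in (0, 1, 2, 4):
    print(n, generated_length(n, fixed=True), generated_length(n, fixed=False))
```

Under these assumptions the result does not depend on prompt_len: the fixed condition yields min_tokens + 1 tokens, while the old condition yields min_tokens + 2 tokens for every min_tokens >= 1.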

Test Plan

Force EOS via logit_bias so it is selected the instant it is unblocked, then compare the generated length against V1.

from vllm import LLM, SamplingParams

llm = LLM("Qwen/Qwen3-0.6B", enforce_eager=True)
eos = llm.get_tokenizer().eos_token_id
out = llm.generate(
    "Hello",
    SamplingParams(temperature=0, min_tokens=4, max_tokens=32, logit_bias={eos: 100.0}),
)
print(len(out[0].outputs[0].token_ids))  # main: 6 (min_tokens + 2); fixed: 5 (min_tokens + 1)

Test Result

RTX 4090, Qwen/Qwen3-0.6B, forced EOS, generated length per min_tokens:

min_tokens	V1 reference	V2 + fix	V2 (main)
0	1	1	1
1	2	2	3
2	3	3	4
4	5	5	6

V2 with the fix matches V1 exactly; main is one token too long for every min_tokens >= 1.

AI assistance was used to investigate, reproduce, and draft this change; the author reviewed the diff and validation output.

…pler

The V2 GPU sampler suppressed stop tokens while pos < min_len, where pos is the position of the last existing token (current length minus one), so EOS was released at output index min_tokens + 1 instead of min_tokens. Compare the current length (pos + 1) against min_len so EOS becomes selectable at exactly min_tokens, matching the V1 MinTokensLogitsProcessor.

Signed-off-by: Ting Sun <suntcrick@gmail.com>

Sunt-ing · 2026-06-20T17:58:08Z

Hi @yewentao256, PTAL. No UT added :-)

njhill

Thanks @Sunt-ing, good catch

Sunt-ing requested review from WoosukKwon, njhill and yewentao256 as code owners June 20, 2026 17:57

mergify Bot added the v1 and bug labels Jun 20, 2026

njhill approved these changes Jun 20, 2026

njhill added the ready label Jun 20, 2026

njhill enabled auto-merge (squash) June 20, 2026 19:12

Merge branch 'main' into samp-2

b0186bf

njhill merged commit 183a430 into vllm-project:main Jun 21, 2026
79 checks passed

njhill merged 2 commits into vllm-project:main from Sunt-ing:samp-2
