fix: pad dummy run query_start_loc by UranusSeven · Pull Request #44603 · vllm-project/vllm

UranusSeven · 2026-06-05T02:41:45Z

Hi from novita.ai team 👋

Purpose

The decoder instance crashes when running GLM-5.1-FP8 with disaggregated serving. The CUDA core dump shows:

/pytorch/aten/src/ATen/native/cuda/Repeat.cu:33: compute_cuda_kernel: block: [0,0,0], thread: [49,0,0] Assertion `repeat >= 0` failed.
/pytorch/aten/src/ATen/native/cuda/Repeat.cu:33: compute_cuda_kernel: block: [0,0,0], thread: [50,0,0] Assertion `repeat >= 0` failed.

After adding logging before every torch.repeat_interleave call, I got:

WARNING 06-04 08:36:48 [logger.py:276] repeat_interleave[mla.indexer.expanded_offsets]: repeats={'shape': (2,), 'dtype': 'torch.int32', 'device': 'cuda:2', 'numel': 2, 'min': 0, 'max': 3, 'head': [3, 0]} negatives=[] input={'shape': (2,), 'dtype': 'torch.int32', 'device': 'cuda:2', 'numel': 2, 'min': -3, 'max': 14622, 'head': [14622, -3]} output_size=3 dim=None extra={'seq_lens': [14625, 0], 'decode_lens': [3, 0], 'decode_lens_cpu': [3, 0], 'query_start_loc': [0, 3], 'num_decodes': 2, 'num_decode_tokens': 6, 'max_decode_len': 3, 'min_decode_len': 0}
/pytorch/aten/src/ATen/native/cuda/Repeat.cu:33: compute_cuda_kernel: block: [0,0,0], thread: [32,0,0] Assertion `repeat >= 0` failed.
/pytorch/aten/src/ATen/native/cuda/Repeat.cu:33: compute_cuda_kernel: block: [0,0,0], thread: [33,0,0] Assertion `repeat >= 0` failed.
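The pre-call logging described above can be sketched as a checking wrapper. This is a minimal pure-Python sketch, assuming plain lists stand in for CUDA tensors; `checked_repeat_interleave` is a hypothetical helper, not a vLLM or PyTorch API, and it is not the author's logging code. It logs the repeat counts and fails on the host with context, instead of letting the device kernel trip the `repeat >= 0` assertion.

```python
import logging

logger = logging.getLogger("repeat_interleave_check")


def checked_repeat_interleave(values, repeats, tag=""):
    """Hypothetical helper: repeat values[i] exactly repeats[i] times,
    after validating the inputs the way the CUDA kernel's assert would."""
    if len(values) != len(repeats):
        raise ValueError(f"[{tag}] length mismatch: {len(values)} vs {len(repeats)}")
    # Collect (index, count) pairs for every negative repeat count.
    negatives = [(i, r) for i, r in enumerate(repeats) if r < 0]
    logger.warning("repeat_interleave[%s]: repeats=%s negatives=%s", tag, repeats, negatives)
    if negatives:
        # Fail on the host with the offending entries instead of a device-side assert.
        raise ValueError(f"[{tag}] negative repeats at {negatives}")
    out = []
    for v, r in zip(values, repeats):
        out.extend([v] * r)
    return out
```

For example, `checked_repeat_interleave([14622, -3], [3, 0])` returns `[14622, 14622, 14622]`, while any negative entry in `repeats` raises `ValueError` with its index.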

The root cause is that during the dummy run, query_start_loc is not a monotonic sequence. This PR pads query_start_loc in the dummy run so that it stays monotonic.
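Why padding helps can be illustrated with a small sketch. It assumes query_start_loc holds cumulative query offsets of length num_reqs + 1, so each request's query length is the difference of adjacent offsets, and that the dummy batch has `num_padded_reqs` request slots. `pad_query_start_loc`, `query_lens`, and `is_monotonic` are hypothetical helpers, not the code changed in this PR: padding repeats the last real offset, which turns the padded slots into zero-length requests.

```python
def query_lens(query_start_loc):
    """Per-request query lengths from cumulative offsets."""
    return [b - a for a, b in zip(query_start_loc, query_start_loc[1:])]


def is_monotonic(query_start_loc):
    """True if offsets never decrease, i.e. no request has negative length."""
    return all(a <= b for a, b in zip(query_start_loc, query_start_loc[1:]))


def pad_query_start_loc(query_start_loc, num_padded_reqs):
    """Hypothetical helper: extend offsets to num_padded_reqs + 1 entries by
    repeating the last offset, so every padded request has length 0."""
    target = num_padded_reqs + 1
    if target < len(query_start_loc):
        raise ValueError("cannot pad to fewer requests than are present")
    last = query_start_loc[-1]
    return query_start_loc + [last] * (target - len(query_start_loc))
```

For illustration only, assume the tail offset of a two-request dummy batch was left at 0: `query_lens([0, 3, 0])` gives `[3, -3]`, and a negative length like that is what a downstream `repeat_interleave` rejects. Padding `[0, 3]` to two requests instead gives `[0, 3, 3]`, with lengths `[3, 0]`.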

Test Plan

Ran the same requests against the decoder with this fix applied.

Test Result

The decoder ran without errors for several hours.


Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>


WoosukKwon

LGTM. Thanks for the fix!


Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com> Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>


Cherry-pick 62 bugfix/security PRs from upstream vllm-project/vllm main (2026-05-03 to 2026-06-17), covering scheduler, engine core, model runner, worker, attention, KV cache, compilation, and structured output fixes.
Security (4): vllm-project#43286 vllm-project#44744 vllm-project#45118 vllm-project#45252
Bugfix (56): vllm-project#35536 vllm-project#36616 vllm-project#38895 vllm-project#39155 vllm-project#39324 vllm-project#39562 vllm-project#39805 vllm-project#40398 vllm-project#40726 vllm-project#40727 vllm-project#40737 vllm-project#40749 vllm-project#40961 vllm-project#41119 vllm-project#41133 vllm-project#41233 vllm-project#41237 vllm-project#41411 vllm-project#41496 vllm-project#41549 vllm-project#41674 vllm-project#41873 vllm-project#41895 vllm-project#42040 vllm-project#42112 vllm-project#42289 vllm-project#42479 vllm-project#42585 vllm-project#42692 vllm-project#42706 vllm-project#42709 vllm-project#42739 vllm-project#42967 vllm-project#43001 vllm-project#43079 vllm-project#43125 vllm-project#43160 vllm-project#43616 vllm-project#43669 vllm-project#43719 vllm-project#43768 vllm-project#43808 vllm-project#43961 vllm-project#43982 vllm-project#43988 vllm-project#43998 vllm-project#44057 vllm-project#44560 vllm-project#44574 vllm-project#44568 vllm-project#44603 vllm-project#44744 vllm-project#45195 vllm-project#45345 vllm-project#45383 vllm-project#45487 vllm-project#45564 vllm-project#45673
Runner fix (2): vllm-project#44568 vllm-project#44603
Skipped: vllm-project#43781 (ROCm-specific, not applicable to Ascend NPU)
Conflict resolutions:
- Manual merge: vllm-project#43286 vllm-project#45118 vllm-project#42112 vllm-project#43160 vllm-project#43719 vllm-project#44560
- Upstream-preferred (-X theirs): vllm-project#43808 vllm-project#43988 vllm-project#42967 vllm-project#35536 vllm-project#45195
- Test files (--theirs): vllm-project#44744 vllm-project#41895 vllm-project#42040 vllm-project#41233 vllm-project#45345 vllm-project#43982
Co-authored-by: GitHub Copilot
Signed-off-by: MingqiWang-coder <mingqiwang@hust.edu.cn>



fix: pad dummy run query_start_loc

8efc473

Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>

UranusSeven requested a review from njhill as a code owner June 5, 2026 02:41

claude Bot reviewed Jun 5, 2026


mergify Bot added the v1 label Jun 5, 2026

Merge branch 'main' into fix_dsa_dummy_run

f8715b5

WoosukKwon added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 5, 2026

WoosukKwon approved these changes Jun 5, 2026


WoosukKwon enabled auto-merge (squash) June 5, 2026 03:33

ywang96 disabled auto-merge June 5, 2026 07:42

ywang96 merged commit d2f70da into vllm-project:main Jun 5, 2026
56 of 58 checks passed

JisoLya pushed a commit to JisoLya/vllm that referenced this pull request Jun 5, 2026

fix: pad dummy run query_start_loc (vllm-project#44603)

a8ea3f7

Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com> Signed-off-by: JisoLya <523420504@qq.com>

knight0528 pushed a commit to knight0528/vllm that referenced this pull request Jun 8, 2026

fix: pad dummy run query_start_loc (vllm-project#44603)

7156a4b

Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>

waqahmed-amd-fi pushed a commit to waqahmed-amd-fi/vllm that referenced this pull request Jun 10, 2026

fix: pad dummy run query_start_loc (vllm-project#44603)

ee35864

Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com> Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>

Saddss pushed a commit to Saddss/vllm that referenced this pull request Jun 14, 2026

fix: pad dummy run query_start_loc (vllm-project#44603)

9a83a5d

Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>

divineearthly pushed a commit to divineearthly/vllm that referenced this pull request Jun 19, 2026

fix: pad dummy run query_start_loc (vllm-project#44603)

9ad07c5

Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com> Signed-off-by: divineearthly <divineearthly@gmail.com>

tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026

fix: pad dummy run query_start_loc (vllm-project#44603)

903514d

Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>

nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026

fix: pad dummy run query_start_loc (vllm-project#44603)

7dc15e8

Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>

MingqiWang-coder mentioned this pull request Jul 1, 2026

[Sync] Upstream V1 engine core — 89 PRs (bugfix, scheduler, runner, worker, hardware) vLLM-HUST/vllm-hust#82

Open

ohsono pushed a commit to ohsono/vllm that referenced this pull request Jul 3, 2026

fix: pad dummy run query_start_loc (vllm-project#44603)

5ac02ec

Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>

ywang96 merged 2 commits into vllm-project:main from UranusSeven:fix_dsa_dummy_run


3 participants
