fix: pad dummy run query_start_loc#44603

Merged
ywang96 merged 2 commits into vllm-project:main from UranusSeven:fix_dsa_dummy_run on Jun 5, 2026
@UranusSeven UranusSeven commented Jun 5, 2026

Contributor

Hi from novita.ai team 👋

Purpose

The decoder instance crashes when running GLM-5.1-FP8 in a disaggregated deployment. The CUDA core dump shows:

/pytorch/aten/src/ATen/native/cuda/Repeat.cu:33: compute_cuda_kernel: block: [0,0,0], thread: [49,0,0] Assertion `repeat >= 0` failed.
/pytorch/aten/src/ATen/native/cuda/Repeat.cu:33: compute_cuda_kernel: block: [0,0,0], thread: [50,0,0] Assertion `repeat >= 0` failed.

After adding a log statement before every torch.repeat_interleave call, I got:

WARNING 06-04 08:36:48 [logger.py:276] repeat_interleave[mla.indexer.expanded_offsets]: repeats={'shape': (2,), 'dtype': 'torch.int32', 'device': 'cuda:2', 'numel': 2, 'min': 0, 'max': 3, 'head': [3, 0]} negatives=[] input={'shape': (2,), 'dtype': 'torch.int32', 'device': 'cuda:2', 'numel': 2, 'min': -3, 'max': 14622, 'head': [14622, -3]} output_size=3 dim=None extra={'seq_lens': [14625, 0], 'decode_lens': [3, 0], 'decode_lens_cpu': [3, 0], 'query_start_loc': [0, 3], 'num_decodes': 2, 'num_decode_tokens': 6, 'max_decode_len': 3, 'min_decode_len': 0}
/pytorch/aten/src/ATen/native/cuda/Repeat.cu:33: compute_cuda_kernel: block: [0,0,0], thread: [32,0,0] Assertion `repeat >= 0` failed.
/pytorch/aten/src/ATen/native/cuda/Repeat.cu:33: compute_cuda_kernel: block: [0,0,0], thread: [33,0,0] Assertion `repeat >= 0` failed.

The root cause is that during a dummy run, query_start_loc is not a monotonically non-decreasing sequence.
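To illustrate why a non-monotonic query_start_loc matters, here is a minimal sketch in plain Python. It is not the code from this PR (the diff is not shown in this conversation); the helper names `query_lens` and `pad_query_start_loc`, the buffer size, and the zero-filled tail are all assumptions made for the example. The idea is that per-request lengths are consecutive differences of the offsets, so an unpadded tail yields a negative length, which is the kind of value the `repeat >= 0` assertion rejects.

```python
def query_lens(query_start_loc):
    """Per-request query lengths: consecutive differences of the offsets."""
    return [b - a for a, b in zip(query_start_loc, query_start_loc[1:])]


def pad_query_start_loc(query_start_loc, num_reqs):
    """Hypothetical helper: fill entries past num_reqs with the last valid offset.

    This keeps the sequence non-decreasing, so every derived length is >= 0.
    """
    last = query_start_loc[num_reqs]
    tail_len = len(query_start_loc) - num_reqs - 1
    return query_start_loc[: num_reqs + 1] + [last] * tail_len


# Assumed scenario: a buffer sized for 4 requests that holds 1 real request
# of 3 tokens, with the unused tail left at zero.
unpadded = [0, 3, 0, 0, 0]
print(query_lens(unpadded))  # [3, -3, 0, 0]

padded = pad_query_start_loc(unpadded, num_reqs=1)
print(padded)                # [0, 3, 3, 3, 3]
print(query_lens(padded))    # [3, 0, 0, 0]
```

With the padded offsets, the unused slots behave as zero-length requests rather than negative-length ones. The real implementation operates on tensors and may pad differently.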

Test Plan

Run the same requests against the fixed decoder.

Test Result

The decoder ran without crashing for several hours.

Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
@UranusSeven UranusSeven requested a review from njhill as a code owner June 5, 2026 02:41

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added the v1 label Jun 5, 2026
@WoosukKwon WoosukKwon added the ready (ONLY add when PR is ready to merge/full CI is needed) label Jun 5, 2026

@WoosukKwon WoosukKwon left a comment

Collaborator


LGTM. Thanks for the fix!

@WoosukKwon WoosukKwon enabled auto-merge (squash) June 5, 2026 03:33
@ywang96 ywang96 disabled auto-merge June 5, 2026 07:42
@ywang96 ywang96 merged commit d2f70da into vllm-project:main Jun 5, 2026
56 of 58 checks passed
JisoLya pushed a commit to JisoLya/vllm that referenced this pull request Jun 5, 2026
Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
Signed-off-by: JisoLya <523420504@qq.com>
knight0528 pushed a commit to knight0528/vllm that referenced this pull request Jun 8, 2026
Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
ekagra-ranjan pushed a commit to ekagra-ranjan/vllm that referenced this pull request Jun 9, 2026
Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
waqahmed-amd-fi pushed a commit to waqahmed-amd-fi/vllm that referenced this pull request Jun 10, 2026
Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>
Saddss pushed a commit to Saddss/vllm that referenced this pull request Jun 14, 2026
Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
divineearthly pushed a commit to divineearthly/vllm that referenced this pull request Jun 19, 2026
Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
Signed-off-by: divineearthly <divineearthly@gmail.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026
Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026
Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
MingqiWang-coder added a commit to vLLM-HUST/vllm-hust that referenced this pull request Jun 30, 2026
Cherry-pick 62 bugfix/security PRs from upstream vllm-project/vllm main
(2026-05-03 to 2026-06-17), covering scheduler, engine core, model runner,
worker, attention, KV cache, compilation, and structured output fixes.

Security (4): vllm-project#43286 vllm-project#44744 vllm-project#45118 vllm-project#45252
Bugfix (56): vllm-project#35536 vllm-project#36616 vllm-project#38895 vllm-project#39155 vllm-project#39324 vllm-project#39562 vllm-project#39805 vllm-project#40398 vllm-project#40726
vllm-project#40727 vllm-project#40737 vllm-project#40749 vllm-project#40961 vllm-project#41119 vllm-project#41133 vllm-project#41233 vllm-project#41237 vllm-project#41411 vllm-project#41496 vllm-project#41549
vllm-project#41674 vllm-project#41873 vllm-project#41895 vllm-project#42040 vllm-project#42112 vllm-project#42289 vllm-project#42479 vllm-project#42585 vllm-project#42692 vllm-project#42706 vllm-project#42709
vllm-project#42739 vllm-project#42967 vllm-project#43001 vllm-project#43079 vllm-project#43125 vllm-project#43160 vllm-project#43616 vllm-project#43669 vllm-project#43719 vllm-project#43768 vllm-project#43808
vllm-project#43961 vllm-project#43982 vllm-project#43988 vllm-project#43998 vllm-project#44057 vllm-project#44560 vllm-project#44574 vllm-project#44568 vllm-project#44603 vllm-project#44744 vllm-project#45195
vllm-project#45345 vllm-project#45383 vllm-project#45487 vllm-project#45564 vllm-project#45673
Runner fix (2): vllm-project#44568 vllm-project#44603

Skipped: vllm-project#43781 (ROCm-specific, not applicable to Ascend NPU)

Conflict resolutions:
- Manual merge: vllm-project#43286 vllm-project#45118 vllm-project#42112 vllm-project#43160 vllm-project#43719 vllm-project#44560
- Upstream-preferred (-X theirs): vllm-project#43808 vllm-project#43988 vllm-project#42967 vllm-project#35536 vllm-project#45195
- Test files (--theirs): vllm-project#44744 vllm-project#41895 vllm-project#42040 vllm-project#41233 vllm-project#45345 vllm-project#43982

Co-authored-by: GitHub Copilot
Signed-off-by: MingqiWang-coder <mingqiwang@hust.edu.cn>
MingqiWang-coder added a commit to vLLM-HUST/vllm-hust that referenced this pull request Jul 2, 2026
Cherry-pick 62 bugfix/security PRs from upstream vllm-project/vllm main
(2026-05-03 to 2026-06-17), covering scheduler, engine core, model runner,
worker, attention, KV cache, compilation, and structured output fixes.

Security (4): vllm-project#43286 vllm-project#44744 vllm-project#45118 vllm-project#45252
Bugfix (56): vllm-project#35536 vllm-project#36616 vllm-project#38895 vllm-project#39155 vllm-project#39324 vllm-project#39562 vllm-project#39805 vllm-project#40398 vllm-project#40726
Runner fix (2): vllm-project#44568 vllm-project#44603

Skipped: vllm-project#43781 (ROCm-specific, not applicable to Ascend NPU)

Conflict resolutions:
- Manual merge: vllm-project#43286 vllm-project#45118 vllm-project#42112 vllm-project#43160 vllm-project#43719 vllm-project#44560
- Upstream-preferred (-X theirs): vllm-project#43808 vllm-project#43988 vllm-project#42967 vllm-project#35536 vllm-project#45195
- Test files (--theirs): vllm-project#44744 vllm-project#41895 vllm-project#42040 vllm-project#41233 vllm-project#45345 vllm-project#43982

Co-authored-by: GitHub Copilot
Signed-off-by: MingqiWang-coder <mingqiwang@hust.edu.cn>
ohsono pushed a commit to ohsono/vllm that referenced this pull request Jul 3, 2026
Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>