[Bugfix][KVConnector] Support DCP/PCP in OffloadingConnector#41549

Merged: orozery merged 3 commits into vllm-project:main from Etelis:kv-offload-dcp-pcp on May 5, 2026

@Etelis Etelis commented May 3, 2026

Contributor

Closes #40992. Refs #40259.

OffloadingConnector does not work with --decode-context-parallel-size > 1: the engine-core dies on the first request with AssertionError at offloading/scheduler.py:269.

Reproducer (4× H100, single node):

vllm serve Qwen/Qwen2.5-1.5B-Instruct \
  --tensor-parallel-size=4 --decode-context-parallel-size=2 \
  --kv-offloading-size=8 --disable-hybrid-kv-cache-manager \
  --enable-prefix-caching --block-size=16 --enforce-eager
# then: POST /v1/completions with a prompt spanning multiple blocks (~600 tokens)

| | Before fix | After fix |
|---|---|---|
| First request | `AssertionError` at `scheduler.py:269`, engine-core dies, HTTP 500 | HTTP 200, valid completion |
| `num_blocks` vs `len(offload_keys)` | 37 vs 18 (off by ×DCP) | 18 vs 18 |
| Multi-shot (3 requests, 601/741/601 tokens, partial prefix overlap) | n/a | All 200 OK; zero `AssertionError`/`ERROR` lines |

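The 37 vs 18 figures in the comparison above are consistent with plain floor division over the prompt length. The sketch below is only an illustration of that arithmetic, not code from vLLM: the 600-token prompt length is an assumption taken from the "~600 tokens" note in the reproducer, PCP is assumed to be 1, and `full_blocks` is a hypothetical helper.

```python
def full_blocks(num_tokens: int, block_size: int) -> int:
    """Number of complete blocks covered by num_tokens (hypothetical helper)."""
    return num_tokens // block_size


# Values from the reproducer; num_tokens and pcp are assumptions.
num_tokens = 600   # "~600 tokens" in the reproducer
block_size = 16    # --block-size=16
dcp = 2            # --decode-context-parallel-size=2
pcp = 1            # prefill context parallelism not enabled in the reproducer

# Counting with the per-rank block size (no context-parallel scaling).
unscaled = full_blocks(num_tokens, block_size)

# Counting in logical units of block_size * dcp * pcp tokens.
scaled = full_blocks(num_tokens, block_size * dcp * pcp)

print(unscaled, scaled)
```

Under these assumptions the two counts differ by roughly a factor of DCP, which matches the "off by ×DCP" note.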
Multiply OffloadingSpec hash_block_size and gpu_block_size by
decode_context_parallel_size * prefill_context_parallel_size to match
the logical-unit convention used by Request.block_hashes (and already
applied in kv_cache_coordinator, single_type_kv_cache_manager, and
kv_cache_utils).
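
The scaling described in this commit message can be sketched as follows. This is a minimal standalone illustration, not the actual change: `ParallelSizes`, `ScaledBlockSizes`, and `scale_block_sizes` are hypothetical names, and only the field names `hash_block_size`, `gpu_block_size`, `decode_context_parallel_size`, and `prefill_context_parallel_size` come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelSizes:
    """Hypothetical stand-in for the relevant parallel-config fields."""
    decode_context_parallel_size: int = 1
    prefill_context_parallel_size: int = 1


@dataclass(frozen=True)
class ScaledBlockSizes:
    """Hypothetical container for the two scaled block sizes."""
    hash_block_size: int
    gpu_block_size: int


def scale_block_sizes(
    hash_block_size: int,
    gpu_block_size: int,
    parallel: ParallelSizes,
) -> ScaledBlockSizes:
    """Scale both block sizes by DCP * PCP so they are in logical units."""
    cp_factor = (
        parallel.decode_context_parallel_size
        * parallel.prefill_context_parallel_size
    )
    return ScaledBlockSizes(
        hash_block_size=hash_block_size * cp_factor,
        gpu_block_size=gpu_block_size * cp_factor,
    )


# Reproducer settings: block size 16, DCP = 2, PCP left at its default of 1.
sizes = scale_block_sizes(16, 16, ParallelSizes(decode_context_parallel_size=2))
print(sizes)
```

With both context-parallel sizes at 1 the factor is 1, so in this sketch the sizes are unchanged for deployments that do not use DCP or PCP.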

Closes vllm-project#40992

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
@Etelis Etelis requested review from ApostaC and orozery as code owners May 3, 2026 11:39

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added v1 bug Something isn't working labels May 3, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request updates the KV offload base class to scale the hash and GPU block sizes by a context parallel factor derived from the parallel configuration. I have no feedback to provide.

@orozery orozery left a comment

Collaborator

Thanks @Etelis!

@orozery orozery added the ready ONLY add when PR is ready to merge/full CI is needed label May 4, 2026
@orozery orozery merged commit 98661fe into vllm-project:main May 5, 2026
49 checks passed
chaojun-zhang pushed a commit to chaojun-zhang/vllm that referenced this pull request May 6, 2026
…oject#41549)

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
Copilot AI pushed a commit to hongbolv/vllm that referenced this pull request May 7, 2026
…oject#41549)

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>
ikaadil pushed a commit to ikaadil/vllm that referenced this pull request May 7, 2026
…oject#41549)

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
libinta pushed a commit to libinta/vllm that referenced this pull request May 8, 2026
…oject#41549)

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
Signed-off-by: Libin Tang <libin.tang@intel.com>
weifang231 pushed a commit to weifang231/eb-vllm that referenced this pull request May 13, 2026
…oject#41549)

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026
…oject#41549)

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026
…oject#41549)

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
…oject#41549)

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026
…oject#41549)

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
MingqiWang-coder added a commit to vLLM-HUST/vllm-hust that referenced this pull request Jun 30, 2026
Cherry-pick 62 bugfix/security PRs from upstream vllm-project/vllm main
(2026-05-03 to 2026-06-17), covering scheduler, engine core, model runner,
worker, attention, KV cache, compilation, and structured output fixes.

Security (4): vllm-project#43286 vllm-project#44744 vllm-project#45118 vllm-project#45252
Bugfix (56): vllm-project#35536 vllm-project#36616 vllm-project#38895 vllm-project#39155 vllm-project#39324 vllm-project#39562 vllm-project#39805 vllm-project#40398 vllm-project#40726
vllm-project#40727 vllm-project#40737 vllm-project#40749 vllm-project#40961 vllm-project#41119 vllm-project#41133 vllm-project#41233 vllm-project#41237 vllm-project#41411 vllm-project#41496 vllm-project#41549
vllm-project#41674 vllm-project#41873 vllm-project#41895 vllm-project#42040 vllm-project#42112 vllm-project#42289 vllm-project#42479 vllm-project#42585 vllm-project#42692 vllm-project#42706 vllm-project#42709
vllm-project#42739 vllm-project#42967 vllm-project#43001 vllm-project#43079 vllm-project#43125 vllm-project#43160 vllm-project#43616 vllm-project#43669 vllm-project#43719 vllm-project#43768 vllm-project#43808
vllm-project#43961 vllm-project#43982 vllm-project#43988 vllm-project#43998 vllm-project#44057 vllm-project#44560 vllm-project#44574 vllm-project#44568 vllm-project#44603 vllm-project#44744 vllm-project#45195
vllm-project#45345 vllm-project#45383 vllm-project#45487 vllm-project#45564 vllm-project#45673
Runner fix (2): vllm-project#44568 vllm-project#44603

Skipped: vllm-project#43781 (ROCm-specific, not applicable to Ascend NPU)

Conflict resolutions:
- Manual merge: vllm-project#43286 vllm-project#45118 vllm-project#42112 vllm-project#43160 vllm-project#43719 vllm-project#44560
- Upstream-preferred (-X theirs): vllm-project#43808 vllm-project#43988 vllm-project#42967 vllm-project#35536 vllm-project#45195
- Test files (--theirs): vllm-project#44744 vllm-project#41895 vllm-project#42040 vllm-project#41233 vllm-project#45345 vllm-project#43982

Co-authored-by: GitHub Copilot
Signed-off-by: MingqiWang-coder <mingqiwang@hust.edu.cn>