[Mooncake] Skip KV lookup for non-reachable SWA blocks#45444

Merged
ywang96 merged 2 commits into
vllm-project:mainfrom
wzhao18:wzhao/cache-block-mask
Jun 18, 2026

Conversation

@wzhao18 wzhao18 commented Jun 12, 2026

Contributor

Purpose

This PR adds two optimizations that reduce overhead in Mooncake KV offloading.

  • During lookup, skip sliding-window attention (SWA) blocks that are not eligible to count as a cache hit, using the KV cache group's reachable_block_mask.
  • Use None to represent an all-True store_mask. This saves the list construction overhead for full caches, i.e. masks.append([True] * num_chunks if mask is None else mask).
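
The two ideas above can be illustrated with a minimal sketch. This is not the code from the PR: the helper names `lookup_indices` and `store_indices` are hypothetical, and the sketch assumes that a mask is a per-chunk sequence of booleans where `None` means "every chunk is selected". How `reachable_block_mask` is actually computed is not shown here.

```python
from typing import Iterable, Optional, Sequence


def lookup_indices(
    num_chunks: int,
    reachable_block_mask: Optional[Sequence[bool]],
) -> list[int]:
    """Return the chunk indices that are worth looking up (hypothetical helper).

    Chunks whose mask entry is False cannot count as a cache hit,
    so no lookup is issued for them. None means all chunks are reachable.
    """
    if reachable_block_mask is None:
        return list(range(num_chunks))
    return [i for i in range(num_chunks) if reachable_block_mask[i]]


def store_indices(
    num_chunks: int,
    store_mask: Optional[Sequence[bool]],
) -> Iterable[int]:
    """Return the chunk indices to store (hypothetical helper).

    None is treated as an all-True mask, so the full-cache case never
    builds a [True] * num_chunks list.
    """
    if store_mask is None:
        return range(num_chunks)
    return [i for i, keep in enumerate(store_mask) if keep]


# Only the last two chunks are reachable, so only they are looked up.
print(lookup_indices(4, [False, False, True, True]))
# Full cache: no mask list is materialized.
print(list(store_indices(3, None)))
```

The point of the `None` sentinel is that the common full-cache case pays nothing for a mask it does not need, while sparse cases still pass an explicit list.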

Performance benchmark:

DeepSeek v4 TP4 on 4 x GB300:

[Image: inferencex-agentx-mvp]

Test Plan

Mooncake store unit tests:

  • tests/v1/kv_connector/unit/test_mooncake_store_coordinator.py
  • tests/v1/kv_connector/unit/test_mooncake_store_worker.py

Checked that the output of DeepSeek v4 with KV offloading looks correct.

Test Result


@wzhao18 wzhao18 force-pushed the wzhao/cache-block-mask branch 3 times, most recently from cd8b96e to dfa9e74 Compare June 12, 2026 22:07
@wzhao18 wzhao18 changed the title [Mooncake] Skip KV offloading lookup for non-reachable SWA blocks Jun 12, 2026
@wzhao18 wzhao18 marked this pull request as ready for review June 13, 2026 03:35
wzhao18 added 2 commits June 17, 2026 09:10
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
@wzhao18 wzhao18 force-pushed the wzhao/cache-block-mask branch from dfa9e74 to 0581dbb Compare June 17, 2026 16:10

@ivanium ivanium left a comment

Collaborator

LGTM. Thanks for looking into the issue and the fix!

@ivanium ivanium added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 17, 2026
@ywang96 ywang96 merged commit 5fd3b27 into vllm-project:main Jun 18, 2026
75 checks passed
djramic pushed a commit to djramic/vllm that referenced this pull request Jun 18, 2026
divineearthly pushed a commit to divineearthly/vllm that referenced this pull request Jun 19, 2026
…45444)

Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: divineearthly <divineearthly@gmail.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Jun 21, 2026
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026
nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026

Labels

kv-connector ready ONLY add when PR is ready to merge/full CI is needed v1

5 participants