[Mooncake] Skip KV lookup for non-reachable SWA blocks by wzhao18 · Pull Request #45444 · vllm-project/vllm

wzhao18 · 2026-06-12T19:20:50Z

Purpose

This PR adds two optimizations that reduce overhead in Mooncake KV offloading.

In lookup, skip SWA (sliding-window attention) blocks that are not eligible to count as a cache hit, using the KV cache group's reachable_block_mask.
Use None to represent an all-True store_mask, saving list construction overhead for full cache: masks.append([True] * num_chunks if mask is None else mask).
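
The two ideas above can be sketched in isolation as follows. This is a minimal illustration, not the code from this PR: the function names `lookup_reachable` and `chunks_to_store`, the `exists` callback, and the string chunk keys are all hypothetical stand-ins. Only `reachable_block_mask`, `store_mask`, and the convention that `None` means "all True" come from the description.

```python
from typing import Callable, Optional, Sequence


def lookup_reachable(
    chunk_keys: Sequence[str],
    reachable_block_mask: Optional[Sequence[bool]],
    exists: Callable[[str], bool],
) -> dict[str, bool]:
    """Query the store only for chunks marked reachable.

    Hypothetical helper: `exists` stands in for whatever call checks
    whether a chunk is present in the external KV store.
    A mask of None is treated as "every chunk is reachable".
    """
    results: dict[str, bool] = {}
    for i, key in enumerate(chunk_keys):
        if reachable_block_mask is not None and not reachable_block_mask[i]:
            # Non-reachable chunk: skip the store round trip entirely.
            continue
        results[key] = exists(key)
    return results


def chunks_to_store(
    chunk_keys: Sequence[str],
    store_mask: Optional[Sequence[bool]],
) -> list[str]:
    """Select chunks to store, treating None as an all-True mask.

    The None fast path avoids building a [True] * num_chunks list
    just to say "keep everything".
    """
    if store_mask is None:
        return list(chunk_keys)
    return [key for key, keep in zip(chunk_keys, store_mask) if keep]
```

With a mask such as `[False, False, True, True]`, only the last two chunk keys are passed to `exists`, so the number of store queries scales with the reachable blocks rather than with the full sequence.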

Performance benchmark:

DeepSeek v4 TP4 on 4 x GB300:

Test Plan

Mooncake store unit tests:

tests/v1/kv_connector/unit/test_mooncake_store_coordinator.py
tests/v1/kv_connector/unit/test_mooncake_store_worker.py

Checked that the output of DeepSeek v4 (dsv4) with KV offloading looks correct.

Test Result

Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>

ivanium

LGTM. Thanks for looking into the issue and the fix!

mergify Bot added v1 kv-connector labels Jun 12, 2026

wzhao18 force-pushed the wzhao/cache-block-mask branch 3 times, most recently from cd8b96e to dfa9e74 Compare June 12, 2026 22:07

wzhao18 changed the title from ~~[Mooncake] Skip KV offloading lookup for non-reachable SWA blocks~~ to [Mooncake] Skip KV lookup for non-reachable SWA blocks Jun 12, 2026

wzhao18 marked this pull request as ready for review June 13, 2026 03:35

wzhao18 requested review from ApostaC, NickLucche, orozery and xuechendi as code owners June 13, 2026 03:35

wzhao18 added 2 commits June 17, 2026 09:10

Skip lookup for non-reachable SWA blocks

629afa3

Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>

fix precommit

0581dbb

Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>

wzhao18 force-pushed the wzhao/cache-block-mask branch from dfa9e74 to 0581dbb Compare June 17, 2026 16:10

ivanium approved these changes Jun 17, 2026

ivanium added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jun 17, 2026

ivanium mentioned this pull request Jun 17, 2026

[Perf][KVConnector][Mooncake] Parallelize KV load with a receive-thread pool #45971

Merged

njhill approved these changes Jun 18, 2026

Dao007forever approved these changes Jun 18, 2026

ywang96 merged commit 5fd3b27 into vllm-project:main Jun 18, 2026
75 checks passed

djramic pushed a commit to djramic/vllm that referenced this pull request Jun 18, 2026

[Mooncake] Skip KV lookup for non-reachable SWA blocks (vllm-project#…

6d26867

…45444) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>

divineearthly pushed a commit to divineearthly/vllm that referenced this pull request Jun 19, 2026

[Mooncake] Skip KV lookup for non-reachable SWA blocks (vllm-project#…

81d2910

…45444) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Signed-off-by: divineearthly <divineearthly@gmail.com>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Jun 21, 2026

[Mooncake] Skip KV lookup for non-reachable SWA blocks (vllm-project#…

50e6942

…45444) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>

tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026

[Mooncake] Skip KV lookup for non-reachable SWA blocks (vllm-project#…

bd51713

…45444) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>

nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026

[Mooncake] Skip KV lookup for non-reachable SWA blocks (vllm-project#…

53ceb6f

…45444) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
