[Perf][KVConnector][Mooncake] Compact chunk-hash keys and zero-copy lookup wire format by ivanium · Pull Request #45969 · vllm-project/vllm

ivanium · 2026-06-17T22:42:17Z

Purpose

Two related optimizations to the MooncakeStoreConnector prefix-lookup path. Both reduce the cost of moving block hashes around, which becomes acute when the model's block_size is much larger than the connector's hash_block_size.

1. Compact chunk-hash keys

When block_size > hash_block_size, a single block_size chunk previously keyed Mooncake by concatenating all of its fine-grained sub-hashes (BlockHashListWithBlockSize joined every sub-hash). The Mooncake key therefore grew linearly with the block_size / hash_block_size ratio.

This is especially costly for DeepSeek-V4-style configs, which run block_size=256 with hash_block_size=4 — a ratio of 64, so every Mooncake key became 64 hash digests concatenated together.

Because the engine chains block hashes (each block hash folds in its predecessor), a chunk's last sub-hash already identifies the whole chunk and its prefix, up to hash collisions. The new chunk_hashes_for_block_size / _CompactChunkHashList key each chunk by that single trailing digest, so the Mooncake key stays one digest regardless of the ratio: 64x smaller keys for the DSv4 config above, and proportionally smaller wherever block_size > hash_block_size.
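The keying argument above can be illustrated with a minimal sketch. It is not the PR's implementation: the helper names (chained_hashes, concatenated_keys, compact_keys) are hypothetical, and SHA-256 over a simple byte encoding is an assumption standing in for the engine's actual block-hash function.

```python
import hashlib


def chained_hashes(token_blocks, seed=b""):
    """Hash each fine-grained block together with its predecessor's digest."""
    digests = []
    prev = seed
    for block in token_blocks:
        payload = prev + b"|" + ",".join(map(str, block)).encode()
        prev = hashlib.sha256(payload).digest()
        digests.append(prev)
    return digests


def concatenated_keys(digests, ratio):
    """Old-style key: join every sub-hash of each full chunk."""
    last_start = len(digests) - ratio
    return [b"".join(digests[i:i + ratio]) for i in range(0, last_start + 1, ratio)]


def compact_keys(digests, ratio):
    """Compact key: only the trailing sub-hash of each full chunk."""
    return digests[ratio - 1::ratio]


def split(tokens, size):
    """Cut a token list into consecutive blocks of `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


hash_block_size, block_size = 4, 256
ratio = block_size // hash_block_size  # 64 sub-hashes per chunk

tokens = list(range(2 * block_size))  # two full chunks of tokens
digests = chained_hashes(split(tokens, hash_block_size))
old_keys = concatenated_keys(digests, ratio)
new_keys = compact_keys(digests, ratio)

# Changing the very first token changes every later digest, including the
# trailing one, which is why the trailing digest can stand in for the chunk.
altered = [tokens[0] + 1] + tokens[1:]
altered_keys = compact_keys(chained_hashes(split(altered, hash_block_size)), ratio)

print(len(old_keys[0]), len(new_keys[0]))
```

With 32-byte digests, the concatenated key is 64 digests long while the compact key is a single digest, and each compact key is exactly the tail of the corresponding concatenated key.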

2. Zero-copy lookup wire format

The lookup RPC previously msgpack-encoded a list[str] of hex digests — hex doubles the byte count and msgpack adds per-element framing. The protocol now sends a hash_len frame (u16) followed by the raw fixed-size hashes concatenated back-to-back. The server reconstructs them through BlobBlockHashes, a lazy Sequence[BlockHash] view over the flat buffer, so it never materializes the full hash list upfront. This removes the hex inflation, the msgpack framing, and the eager allocation on the hot lookup path.
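As a rough sketch of the framing described above, the following encodes a list of fixed-size hashes as a 2-byte big-endian hash_len frame plus one flat blob, and decodes it through a lazy sequence view. LazyHashView, encode_lookup, and decode_lookup are hypothetical names for illustration; the PR's BlobBlockHashes yields BlockHash values, whereas this sketch yields plain bytes, and the message-type and token-length frames are omitted.

```python
from collections.abc import Sequence


class LazyHashView(Sequence):
    """Hypothetical stand-in for a lazy view over concatenated fixed-size hashes."""

    def __init__(self, blob, hash_len):
        self._blob = memoryview(blob)  # no copy of the underlying buffer
        if hash_len <= 0 or len(self._blob) % hash_len != 0:
            raise ValueError("blob length must be a positive multiple of hash_len")
        self._hash_len = hash_len

    def __len__(self):
        return len(self._blob) // self._hash_len

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        n = len(self)
        if index < 0:
            index += n
        if not 0 <= index < n:
            raise IndexError(index)
        start = index * self._hash_len
        # Only the requested hash is materialized, and only when it is asked for.
        return bytes(self._blob[start:start + self._hash_len])


def encode_lookup(hashes):
    """Return (hash_len frame, blob frame) for a non-empty list of equal-length hashes."""
    if not hashes:
        raise ValueError("need at least one hash")
    hash_len = len(hashes[0])
    if any(len(h) != hash_len for h in hashes):
        raise ValueError("all hashes must have the same length")
    return (hash_len.to_bytes(2, byteorder="big"), b"".join(hashes))


def decode_lookup(frames):
    """Rebuild a lazy hash sequence from the two frames."""
    hash_len = int.from_bytes(frames[0], byteorder="big")
    return LazyHashView(frames[1], hash_len)


hashes = [bytes([i]) * 32 for i in range(3)]
frames = encode_lookup(hashes)
view = decode_lookup(frames)

raw_bytes = sum(len(f) for f in frames)       # 2 + 3 * 32
hex_chars = sum(len(h.hex()) for h in hashes)  # hex alone, before any list framing
print(raw_bytes, hex_chars)
```

The size comparison only counts the hex characters of the old format, so it understates the saving: the previous encoding also paid msgpack framing per element.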

Test Plan

.venv/bin/python -m pytest \
  tests/v1/kv_connector/unit/test_mooncake_store_coordinator.py \
  tests/v1/kv_connector/unit/test_mooncake_store_worker.py \
  tests/v1/kv_connector/unit/test_mooncake_store_hma_e2e.py

Tests are updated to assert the compact last-sub-hash keying and the new wire format.

Test Result

85 passed.

Notes

This is not a duplicate of any open PR. No other open PR touches the chunk-hash key construction or the lookup serialization format; the only adjacent work is my own #45659 (async lookup), which changes the lookup call site but not its payload encoding. This change is rebased to stand alone on main.

AI assistance was used in preparing this change; the author has reviewed every changed line.

🤖 Generated with Claude Code

ivanium · 2026-06-17T22:45:57Z

cc @wzhao18 and @Dao007forever for comments

njhill · 2026-06-18T21:02:00Z

+        all_frames = [
+            LOOKUP_MSG,
+            token_len.to_bytes(4, byteorder="big"),
+            hash_len.to_bytes(2, byteorder="big"),
+            b"".join(block_hashes),
+        ]


nit: use tuple

Suggested change

-        all_frames = [
-            LOOKUP_MSG,
-            token_len.to_bytes(4, byteorder="big"),
-            hash_len.to_bytes(2, byteorder="big"),
-            b"".join(block_hashes),
-        ]
+        all_frames = (
+            LOOKUP_MSG,
+            token_len.to_bytes(4, byteorder="big"),
+            hash_len.to_bytes(2, byteorder="big"),
+            b"".join(block_hashes),
+        )

njhill · 2026-06-18T21:53:53Z

-                    hashes_str = self.decoder.decode(hash_frames)
-                    block_hashes = [BlockHash(bytes.fromhex(s)) for s in hashes_str]
+                    hash_len = int.from_bytes(all_frames[2], byteorder="big")
+                    blob = bytes(all_frames[3])


Suggested change

-                    blob = bytes(all_frames[3])
+                    blob = all_frames[3].buffer

This will make a copy otherwise.
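The difference the reviewer points at can be seen with the standard library alone. This is a small sketch, assuming the received frame exposes its payload as a buffer-protocol object (for example a memoryview from a zero-copy receive); a bytearray stands in for that payload here, and shares_memory is a hypothetical helper.

```python
def shares_memory(source: bytearray, derived) -> bool:
    """Flip the first byte of `source` and report whether `derived` sees the change."""
    original = source[0]
    source[0] = original ^ 0xFF
    try:
        return derived[0] == source[0]
    finally:
        source[0] = original  # restore the payload


# Stand-in for a received frame payload holding two 32-byte hashes.
frame_payload = bytearray(b"\xaa" * 32 + b"\xbb" * 32)

copied = bytes(frame_payload)     # allocates a new object and copies all 64 bytes
view = memoryview(frame_payload)  # references the same memory, no copy

second_hash_view = view[32:64]           # slicing a memoryview is also zero-copy
second_hash = bytes(second_hash_view)    # copies just one hash, only when needed

print(shares_memory(frame_payload, copied), shares_memory(frame_payload, view))
```

Holding a memoryview keeps the underlying buffer alive, so a lazy view built on it is only cheap as long as retaining the whole frame is acceptable.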

njhill · 2026-06-18T21:54:32Z

+    of materializing all hashes upfront.
+    """
+
+    def __init__(self, blob: bytes, hash_len: int):


combined with other suggestion

Suggested change

-    def __init__(self, blob: bytes, hash_len: int):
+    def __init__(self, blob: memoryview, hash_len: int):

mergify · 2026-06-18T22:04:05Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ivanium.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

ivanium · 2026-06-18T23:59:56Z

@njhill Thanks for the comments! I updated the PR with the suggestions.

njhill

Thanks @ivanium lgtm

mergify · 2026-06-19T03:25:55Z

Hi @ivanium, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

mergify Bot added v1 kv-connector labels Jun 17, 2026

ivanium marked this pull request as ready for review June 17, 2026 22:45

ivanium requested review from ApostaC, NickLucche, orozery and xuechendi as code owners June 17, 2026 22:45

ivanium mentioned this pull request Jun 17, 2026

[Perf][KVConnector][Mooncake] Parallelize KV load with a receive-thread pool #45971

Merged

njhill reviewed Jun 18, 2026

mergify Bot added the needs-rebase label Jun 18, 2026

ivanium force-pushed the mk-store/protocol-opt branch from 8342af1 to 39451ac Compare June 18, 2026 23:48

mergify Bot removed the needs-rebase label Jun 18, 2026

njhill approved these changes Jun 19, 2026

njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jun 19, 2026

ivanium added 4 commits June 19, 2026 04:35

perf (mk-store): reduce bhash key side; and reduce serialization over…

340fe56

…head of transfering block hashes Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

chore: review comments

c805243

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

test: test blob block hash

1bee483

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

refactor: metadata template in MooncakeStoreWorker for better perf

4381743

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

ivanium force-pushed the mk-store/protocol-opt branch from f201656 to 4381743 Compare June 19, 2026 04:37

Merge branch 'main' into mk-store/protocol-opt

a00b9c4

WoosukKwon merged commit ab7fcbd into vllm-project:main Jun 20, 2026
70 of 74 checks passed

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Jun 21, 2026

[Perf][KVConnector][Mooncake] Compact chunk-hash keys and zero-copy l…

86354a5

…ookup wire format (vllm-project#45969)

tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026

[Perf][KVConnector][Mooncake] Compact chunk-hash keys and zero-copy l…

2476bb0

…ookup wire format (vllm-project#45969)

wzhao18 mentioned this pull request Jun 23, 2026

[Mooncake] Optimize lookup pool key string construction #46188

Merged

4 tasks

nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026

[Perf][KVConnector][Mooncake] Compact chunk-hash keys and zero-copy l…

4da3165

…ookup wire format (vllm-project#45969)

qli88 pushed a commit to qli88/vllm that referenced this pull request Jun 26, 2026

[Perf][KVConnector][Mooncake] Compact chunk-hash keys and zero-copy l…

4044973

…ookup wire format (vllm-project#45969) Signed-off-by: Qiang Li <qiang.li2@amd.com>

zhewenl mentioned this pull request Jul 1, 2026

[KV Connector][Mooncake] Apply SWA lookup mask before hashing/key build #47317

Open

WoosukKwon merged 5 commits into vllm-project:main from ivanium:mk-store/protocol-opt