[Perf][KVConnector][Mooncake] Compact chunk-hash keys and zero-copy lookup wire format#45969

Merged
WoosukKwon merged 5 commits into
vllm-project:mainfrom
ivanium:mk-store/protocol-opt
Jun 20, 2026

Conversation

@ivanium

@ivanium ivanium commented Jun 17, 2026

Collaborator

Purpose

Two related optimizations to the MooncakeStoreConnector prefix-lookup path. Both reduce the cost of moving block hashes around, which becomes acute when the model's block_size is much larger than the connector's hash_block_size.

1. Compact chunk-hash keys

When block_size > hash_block_size, a single block_size chunk previously keyed Mooncake by concatenating all of its fine-grained sub-hashes (BlockHashListWithBlockSize joined every sub-hash). The Mooncake key therefore grew linearly with the block_size / hash_block_size ratio.

This is especially costly for DeepSeek-V4-style configs, which run block_size=256 with hash_block_size=4 — a ratio of 64, so every Mooncake key became 64 hash digests concatenated together.

Because the engine chains block hashes (each block hash folds in its predecessor), a chunk's last sub-hash already uniquely identifies the whole chunk and its prefix. The new chunk_hashes_for_block_size / _CompactChunkHashList key each chunk by that single trailing digest, so the Mooncake key stays one digest regardless of the ratio (64x smaller keys for the DSv4 config above, and proportionally smaller wherever block_size is larger than hash_block_size).
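To illustrate why the trailing digest is sufficient, here is a minimal sketch of the idea, not the PR's implementation. The helpers `split_blocks`, `chain_hashes`, `concatenated_keys`, and `compact_keys` are hypothetical names, and SHA-256 over big-endian token IDs is an assumed stand-in for the engine's real block hash.

```python
import hashlib


def split_blocks(tokens, size):
    """Split tokens into full blocks of `size`; a partial tail is dropped."""
    full = len(tokens) - len(tokens) % size
    return [tokens[i:i + size] for i in range(0, full, size)]


def chain_hashes(blocks):
    """Hypothetical chained hash: each digest folds in its predecessor."""
    hashes, prev = [], b""
    for block in blocks:
        data = prev + b"".join(t.to_bytes(4, "big") for t in block)
        prev = hashlib.sha256(data).digest()
        hashes.append(prev)
    return hashes


def concatenated_keys(sub_hashes, ratio):
    """Old-style key: join every sub-hash that belongs to a full chunk."""
    n_chunks = len(sub_hashes) // ratio
    return [b"".join(sub_hashes[i * ratio:(i + 1) * ratio]) for i in range(n_chunks)]


def compact_keys(sub_hashes, ratio):
    """Compact key: keep only the trailing sub-hash of each full chunk."""
    return sub_hashes[ratio - 1::ratio]


# Assumed config mirroring the description: block_size=256, hash_block_size=4.
hash_block_size, block_size = 4, 256
ratio = block_size // hash_block_size  # 64

tokens = list(range(512))  # two full 256-token chunks
sub_hashes = chain_hashes(split_blocks(tokens, hash_block_size))

old_keys = concatenated_keys(sub_hashes, ratio)
new_keys = compact_keys(sub_hashes, ratio)
```

With 32-byte digests, each concatenated key here is 64 × 32 = 2048 bytes while each compact key is 32 bytes. Because the hashes are chained, changing any token in the first chunk also changes the compact key of the second chunk, so the single digest still encodes the full prefix.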

2. Zero-copy lookup wire format

The lookup RPC previously msgpack-encoded a list[str] of hex digests — hex doubles the byte count and msgpack adds per-element framing. The protocol now sends a hash_len frame (u16) followed by the raw fixed-size hashes concatenated back-to-back. The server reconstructs them through BlobBlockHashes, a lazy Sequence[BlockHash] view over the flat buffer, so it never materializes the full hash list upfront. This removes the hex inflation, the msgpack framing, and the eager allocation on the hot lookup path.
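The framing described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `encode_hashes` and `LazyHashView` are hypothetical names (the PR's own class is BlobBlockHashes), and the sketch assumes every hash has the same length and that the blob is a bytes-like object with one-byte items.

```python
from collections.abc import Sequence


def encode_hashes(hashes):
    """Return (hash_len as big-endian u16, raw hashes concatenated)."""
    hash_len = len(hashes[0]) if hashes else 0
    if any(len(h) != hash_len for h in hashes):
        raise ValueError("all hashes must have the same length")
    return hash_len.to_bytes(2, byteorder="big"), b"".join(hashes)


class LazyHashView(Sequence):
    """Hypothetical lazy view: slices one hash out of the flat blob on demand."""

    def __init__(self, blob, hash_len):
        self._blob = memoryview(blob)  # wraps the buffer without copying it
        self._hash_len = hash_len
        if hash_len == 0:
            if len(self._blob) != 0:
                raise ValueError("non-empty blob with hash_len == 0")
            self._count = 0
        else:
            if len(self._blob) % hash_len != 0:
                raise ValueError("blob length is not a multiple of hash_len")
            self._count = len(self._blob) // hash_len

    def __len__(self):
        return self._count

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(self._count))]
        if index < 0:
            index += self._count
        if not 0 <= index < self._count:
            raise IndexError(index)
        start = index * self._hash_len
        # Only the requested hash is materialized as a bytes object.
        return bytes(self._blob[start:start + self._hash_len])


hashes = [bytes([i]) * 32 for i in range(3)]  # three dummy 32-byte hashes
header, blob = encode_hashes(hashes)
view = LazyHashView(blob, int.from_bytes(header, byteorder="big"))
hex_size = sum(len(h.hex()) for h in hashes)  # size of the old hex payload, before msgpack framing
```

In this example the raw blob is 96 bytes, while the hex strings alone would take 192 bytes before any msgpack framing is added. The view allocates a hash object only when an index is actually read.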

Test Plan

.venv/bin/python -m pytest \
  tests/v1/kv_connector/unit/test_mooncake_store_coordinator.py \
  tests/v1/kv_connector/unit/test_mooncake_store_worker.py \
  tests/v1/kv_connector/unit/test_mooncake_store_hma_e2e.py

Tests are updated to assert the compact last-sub-hash keying and the new wire format.

Test Result

85 passed.

Notes

This is not a duplicate of any open PR. No other open PR touches the chunk-hash key construction or the lookup serialization format; the only adjacent work is my own #45659 (async lookup), which changes the lookup call site but not its payload encoding. This change is rebased to stand alone on main.

AI assistance was used in preparing this change; the author has reviewed every changed line.

🤖 Generated with Claude Code

@ivanium ivanium marked this pull request as ready for review June 17, 2026 22:45
@ivanium

ivanium commented Jun 17, 2026

Collaborator Author

cc @wzhao18 and @Dao007forever for comments

Comment on lines +1555 to +1560
all_frames = [
    LOOKUP_MSG,
    token_len.to_bytes(4, byteorder="big"),
    hash_len.to_bytes(2, byteorder="big"),
    b"".join(block_hashes),
]

Member


nit: use tuple

Suggested change
- all_frames = [
-     LOOKUP_MSG,
-     token_len.to_bytes(4, byteorder="big"),
-     hash_len.to_bytes(2, byteorder="big"),
-     b"".join(block_hashes),
- ]
+ all_frames = (
+     LOOKUP_MSG,
+     token_len.to_bytes(4, byteorder="big"),
+     hash_len.to_bytes(2, byteorder="big"),
+     b"".join(block_hashes),
+ )
hashes_str = self.decoder.decode(hash_frames)
block_hashes = [BlockHash(bytes.fromhex(s)) for s in hashes_str]
hash_len = int.from_bytes(all_frames[2], byteorder="big")
blob = bytes(all_frames[3])

Member


Suggested change
- blob = bytes(all_frames[3])
+ blob = all_frames[3].buffer

This will make a copy otherwise.
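A small standard-library sketch of the difference this comment points at. Here `frame_view` is a hypothetical stand-in for a received frame's buffer; the sketch does not use the actual transport objects from the PR.

```python
# A mutable backing store lets us observe whether memory is shared.
payload = bytearray(b"\xaa" * 64)
frame_view = memoryview(payload)  # stand-in for a received frame's buffer

copied = bytes(frame_view)     # allocates a new 64-byte object (a copy)
window = frame_view[32:64]     # slicing a memoryview shares the same memory

payload[32] = 0x01             # mutate the underlying buffer

copy_saw_change = copied[32] == 0x01   # False: the copy is detached
view_saw_change = window[0] == 0x01    # True: the view aliases payload
```

Passing the memoryview through, and slicing it only when a single hash is needed, avoids duplicating the whole hash blob on every lookup.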

of materializing all hashes upfront.
"""

def __init__(self, blob: bytes, hash_len: int):

Member


combined with other suggestion

Suggested change
- def __init__(self, blob: bytes, hash_len: int):
+ def __init__(self, blob: memoryview, hash_len: int):
@mergify

mergify Bot commented Jun 18, 2026

Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ivanium.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Jun 18, 2026
@ivanium ivanium force-pushed the mk-store/protocol-opt branch from 8342af1 to 39451ac Compare June 18, 2026 23:48
@mergify mergify Bot removed the needs-rebase label Jun 18, 2026
@ivanium

ivanium commented Jun 18, 2026

Collaborator Author

@njhill Thanks for the comments! I updated the PR with the suggestions.

@njhill njhill left a comment

Member


Thanks @ivanium lgtm

@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 19, 2026
@mergify

mergify Bot commented Jun 19, 2026

Contributor

Hi @ivanium, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

ivanium added 4 commits June 19, 2026 04:35
…head of transferring block hashes

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
@ivanium ivanium force-pushed the mk-store/protocol-opt branch from f201656 to 4381743 Compare June 19, 2026 04:37
@WoosukKwon WoosukKwon merged commit ab7fcbd into vllm-project:main Jun 20, 2026
70 of 74 checks passed
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Jun 21, 2026
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026
nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026
qli88 pushed a commit to qli88/vllm that referenced this pull request Jun 26, 2026
…ookup wire format (vllm-project#45969)

Signed-off-by: Qiang Li <qiang.li2@amd.com>

Labels

kv-connector ready ONLY add when PR is ready to merge/full CI is needed v1

3 participants