[KV Events] Switch event structs from array to map encoding #42892
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Documentation preview: https://vllm--42892.org.readthedocs.build/en/42892/
BlockStored wire format from positional to named-key encoding
Code Review
This pull request removes the "array_like=True" parameter from the msgspec.Struct definitions for EventBatch and KVCacheEvent in both the core implementation and the example subscriber. I have no feedback to provide as there were no review comments.
Thanks for pushing this. Moving One question: do we also need to change to: {"ts": ts, "events": events, "data_parallel_rank": data_parallel_rank}
On the Dynamo side, we already tolerate array/map forms for individual events, but would still need a top-level batch parser update if
@vMaroon We could add a flag to keep the old format during a transition. Worth noting the schema has grown 5 times without compat protocol so far, and the consumer update is one line. Happy to go either way, just want to flag the tradeoff of keeping two formats alive.
@PeaBrane You're right, scoped it out — |
NickLucche left a comment
Thanks for following up with this @sagearc.
I would personally list this as a breaking change and just change the KVCacheEvent format. Clients/consumers would have to handle backward compatibility if aiming to support multiple vllm versions. From this change onward, retro-compat is ensured by this format.
Otherwise with current code we're anyway breaking compatibility on each new addition.
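A minimal sketch of what batch-level backward compatibility could look like on the consumer side, for a client that wants to support vLLM versions on both sides of this change. It assumes the batch has already been msgpack-decoded into plain Python objects, and that the old positional order matches the key order quoted earlier in this thread (ts, events, data_parallel_rank). The names Batch and parse_batch are hypothetical and appear in neither vLLM nor any consumer.

```python
from typing import Any, NamedTuple, Optional


class Batch(NamedTuple):
    """Hypothetical consumer-side view of a decoded event batch."""
    ts: float
    events: list
    data_parallel_rank: Optional[int]


def parse_batch(raw: Any) -> Batch:
    """Accept either the map form or the older positional form of a batch.

    Assumption: the positional order is (ts, events, data_parallel_rank),
    with the trailing rank possibly omitted.
    """
    if isinstance(raw, dict):
        # Map form: fields are looked up by name, unknown keys are ignored.
        return Batch(raw["ts"], raw["events"], raw.get("data_parallel_rank"))
    if isinstance(raw, (list, tuple)):
        # Positional form: fields are identified only by index.
        if len(raw) < 2:
            raise ValueError("positional batch needs at least ts and events")
        rank = raw[2] if len(raw) > 2 else None
        return Batch(raw[0], raw[1], rank)
    raise TypeError(f"unsupported batch encoding: {type(raw).__name__}")
```

A consumer written this way would not need to know which vLLM version produced a given message; the cost is that both code paths have to stay tested.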
NickLucche left a comment
LGTM but would appreciate another stamp here
…ject#42892) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
…ject#42892) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>
…ject#42892) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
…ibility) (#661)

* fix(kvevents): support map-encoded vLLM KV events

vLLM dropped msgspec array_like=True from its KV cache event structs (vllm-project/vllm#42892, merged 2026-06-09), so newer vLLM versions publish each event as a field-name map with the tag under the "type" key instead of a positional array. The VLLMAdapter only decoded positional arrays, so every KV event from a new vLLM fails to parse and the whole index goes dark.

Normalize map-encoded events to the existing positional layout in decodeVLLMEvent before dispatch, keeping the converters and their forward/backward-compatibility guards encoding-agnostic. Positional arrays from older vLLM versions keep working unchanged; absent map fields become nil exactly like omitted trailing array fields.

Verified against a captured event stream from a live vLLM serve run (map encoding, multi-turn traffic with CPU offload store and eviction events): all 313 frames parse, and replaying them through the kvevents.Pool -> kvblock.InMemoryIndex path indexes and evicts every block correctly. Unit tests cover map-encoded BlockStored / BlockRemoved / AllBlocksCleared, a mixed-encoding batch, and malformed map events.

Assisted-by: Claude (AI assistance for implementation and tests)
Signed-off-by: Change72 <changg@nvidia.com>

* fix(kvevents): distinct errors for malformed map-encoded events

Address review: report a dedicated error for a missing "type" tag (instead of conflating it with the non-string case), drop the stale "tagged union" wording from the encoding-agnostic unmarshal error, and pin each malformed-map failure mode to its distinct error message in the test.

Assisted-by: Claude (AI assistance for implementation and tests)
Signed-off-by: Change72 <changg@nvidia.com>

* fix(kvevents): tighten unmarshal error, drop unreachable map branch

Address review: the event-level unmarshal failure now wraps as "unmarshal event payload" so ParseMessage's "failed to decode vLLM event" wrap no longer doubles the same prefix.

Drop the map[any]any normalization branch: msgpack v5 decodes untyped maps via DecodeMap(), which only produces map[string]any and rejects non-string keys inside Unmarshal itself, so the branch was unreachable. Condense the encoding doc comments; the PR description carries the full background.

Assisted-by: Claude (AI assistance for implementation and tests)
Signed-off-by: Change72 <changg@nvidia.com>

---------

Signed-off-by: Change72 <changg@nvidia.com>
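The normalization step described in this commit message can be sketched as follows. This is a Python paraphrase of the idea, not the Go code from that commit. It assumes events are already msgpack-decoded, that the positional layout puts the tag at index 0, and that the BlockStored field order matches the order in which the vLLM PR description lists the fields. FIELD_ORDER and normalize_event are hypothetical names, and AllBlocksCleared is assumed to carry no fields.

```python
from typing import Any

# Assumed positional field order per event type (hypothetical, for illustration).
FIELD_ORDER = {
    "BlockStored": [
        "block_hashes", "parent_block_hash", "token_ids", "block_size",
        "lora_id", "medium", "lora_name", "extra_keys",
    ],
    "AllBlocksCleared": [],  # assumed to have no fields
}


def normalize_event(raw: Any) -> list:
    """Return the positional form [tag, field0, field1, ...] of an event.

    Lists are treated as already positional and passed through unchanged.
    Maps are flattened by field name; absent fields become None, which is
    how an omitted trailing positional field would also be seen downstream.
    """
    if isinstance(raw, list):
        return raw
    if not isinstance(raw, dict):
        raise TypeError(f"unsupported event encoding: {type(raw).__name__}")
    if "type" not in raw:
        raise ValueError('map-encoded event is missing the "type" tag')
    tag = raw["type"]
    if not isinstance(tag, str):
        raise ValueError('map-encoded event has a non-string "type" tag')
    if tag not in FIELD_ORDER:
        raise ValueError(f"unknown event type: {tag}")
    return [tag] + [raw.get(name) for name in FIELD_ORDER[tag]]
```

Normalizing once at the edge means the downstream converters only ever see one layout, which is the design choice the commit message describes; the missing-tag and non-string-tag cases raise distinct errors, mirroring the second commit.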
…ject#42892) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com> Signed-off-by: divineearthly <divineearthly@gmail.com>
Context
KV cache events are published over ZMQ as msgpack-encoded batches consumed by external subscribers (routing daemons, cache managers, etc.). The encoding used a positional array format — every field identified purely by its index — which makes every schema addition a potential break for subscribers that haven't been updated at the same time. This was first flagged by @njhill in #27577, and the schema has been extended four times since:
- block_hashes, parent_block_hash, token_ids, block_size, lora_id: #16750
- medium: #19737
- lora_name: #27577
- extra_keys: #33304
- group_idx: #37688
- kv_cache_spec_kind, kv_cache_spec_sliding_window: #40984

This PR removes array_like=True from the event structs, switching to named-key (map) encoding.

Example
Before: the entire message is a positional array. Inserting a field anywhere but the end shifts every subsequent position and breaks deserialization at an unrelated field.

After: fields are matched by name. A subscriber that doesn't know about extra_keys ignores it. An older subscriber still decodes correctly as long as the fields it knows about are present.

    {
      "ts": 1.0,
      "events": [
        {
          "type": "BlockStored",
          "block_hashes": [4291203, 1837291, 9104857],
          "parent_block_hash": null,
          "token_ids": [0, 1, 2, 3],
          "block_size": 64,
          "lora_id": null,
          "medium": "GPU",
          "lora_name": null,
          "extra_keys": [["mm_hash1", 50]]
        }
      ]
    }

This is a one-time breaking change. Existing subscribers must drop array_like=True from their local struct definitions (the updated kv_events_subscriber.py example reflects this). After this, the schema can grow without further breaks.

Performance
Measured with 2048 tokens per request, i.e. a 32-block event (block_size=64) with 3 multimodal features:
The overhead is a fixed ~98 bytes per message, which is negligible at realistic KV event rates.
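To make the Before/After contrast from the Example section concrete, here is a small sketch of how an older subscriber would see the same event under each encoding. It works on already-decoded Python objects rather than real msgpack bytes, assumes the positional form carries the tag at index 0, and uses a made-up field called new_field to stand in for a future schema addition. KNOWN_FIELDS, decode_positional and decode_named are hypothetical names, not vLLM APIs.

```python
# Fields a hypothetical older subscriber knows about, in positional order.
KNOWN_FIELDS = ("block_hashes", "parent_block_hash", "token_ids", "block_size")


def decode_positional(event: list) -> dict:
    """Positional decoding: field i is whatever sits at index i + 1."""
    return dict(zip(KNOWN_FIELDS, event[1:]))


def decode_named(event: dict) -> dict:
    """Named decoding: pick known keys, ignore unknown ones."""
    return {name: event.get(name) for name in KNOWN_FIELDS}


# A newer producer inserts "new_field" before token_ids (illustrative only).
positional_event = ["BlockStored", [4291203], None, "new", [0, 1, 2, 3], 64]
named_event = {
    "type": "BlockStored",
    "block_hashes": [4291203],
    "parent_block_hash": None,
    "new_field": "new",
    "token_ids": [0, 1, 2, 3],
    "block_size": 64,
}

old_view_positional = decode_positional(positional_event)
old_view_named = decode_named(named_event)

# Positional: every field after the insertion point is silently shifted.
print(old_view_positional["token_ids"])  # the inserted value, not token ids
# Named: the unknown key is ignored and known fields keep their meaning.
print(old_view_named["token_ids"])
```

In the positional case the old subscriber reads the inserted value as token_ids and the token list as block_size, which is the "breaks at an unrelated field" failure described above; in the named case the extra key is simply skipped.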
CC
@njhill @NickLucche @vMaroon @orozery