Feature/offloading manager stats#35669

Merged
orozery merged 27 commits into
vllm-project:mainfrom
Srinivasoo7:feature/offloading-manager-stats
Jun 10, 2026

Conversation

@Srinivasoo7

@Srinivasoo7 Srinivasoo7 commented Mar 1, 2026

Contributor

Purpose

This PR adds support for telemetry emissions from the OffloadingManager interface inside the KV Connector.

As part of isolating the changes from the block-reuse frequency tracking PR, the OffloadingManager interface requires a standardized method to emit statistics up to the scheduler's KVConnector interfaces. This change:

  1. Adds get_stats() -> dict[str, Any] to the base OffloadingManager abstract class.
  2. Implements stats aggregation inside OffloadingConnectorStats.reduce to gracefully support flat scalar statistics (ints, floats) alongside its standard list metrics.
  3. Aggregates metrics polled from the currently active self.connector_scheduler.manager within OffloadingConnector.get_kv_connector_stats().
  4. Exposes stores_skipped directly from the FilteredOffloadingManager (formerly StoreReusedOffloadingManager), tracking exactly how many block hashes failed the LRU reuse threshold.
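
A rough way to picture the four changes above is a base manager with a default `get_stats()` and a wrapping manager that adds its own counter. This is a minimal sketch, not the vLLM code: `BaseManager`, `StoreEverythingManager`, `FilteringManager`, `record_lookup`, and the `prepare_store` signature are hypothetical stand-ins chosen for illustration.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterable
from typing import Any


class BaseManager(ABC):
    """Hypothetical stand-in for the OffloadingManager interface."""

    @abstractmethod
    def prepare_store(self, block_hashes: Iterable[int]) -> list[int]:
        """Return the hashes that will actually be stored."""

    def get_stats(self) -> dict[str, Any]:
        # A manager with nothing to report returns a fresh empty dict.
        return {}


class StoreEverythingManager(BaseManager):
    def prepare_store(self, block_hashes: Iterable[int]) -> list[int]:
        return list(block_hashes)


class FilteringManager(BaseManager):
    """Wraps another manager and only stores blocks seen often enough."""

    def __init__(self, backing: BaseManager, store_threshold: int) -> None:
        self._backing = backing
        self.store_threshold = store_threshold
        self.counts: dict[int, int] = {}
        self.stores_skipped = 0

    def record_lookup(self, block_hashes: Iterable[int]) -> None:
        for bh in block_hashes:
            self.counts[bh] = self.counts.get(bh, 0) + 1

    def prepare_store(self, block_hashes: Iterable[int]) -> list[int]:
        hashes = list(block_hashes)
        eligible = [
            bh for bh in hashes if self.counts.get(bh, 0) >= self.store_threshold
        ]
        # Every hash below the threshold counts as one skipped store.
        self.stores_skipped += len(hashes) - len(eligible)
        return self._backing.prepare_store(eligible)

    def get_stats(self) -> dict[str, Any]:
        # Build a new dict rather than mutating the backing manager's result.
        return {**self._backing.get_stats(), "stores_skipped": self.stores_skipped}


manager = FilteringManager(StoreEverythingManager(), store_threshold=2)
manager.record_lookup([1, 2, 2])          # hash 2 has now been seen twice
stored = manager.prepare_store([1, 2, 3])  # only hash 2 meets the threshold
stats = manager.get_stats()
```

In this sketch the wrapper never needs to know what the backing manager reports; it only adds its own key on top.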

Test Plan

  • Ran pytest tests/v1/kv_offload to ensure manager abstraction and logic remain structurally sound.
  • Ran ruff format and ruff check on the modified files to confirm they meet the repository's formatting and lint rules.

Test Result

  • Pytest and Ruff linters pass cleanly across the modified abstract and connector files.
  • Emitted OffloadingConnectorStats successfully aggregate scalar dict values without validation errors.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request effectively adds telemetry support for the OffloadingManager and introduces a FilteredOffloadingManager to gate offloading based on block reuse frequency. The implementation is well-structured, particularly the use of a decorator pattern for the filtering logic and the inclusion of comprehensive unit tests. I have identified a couple of areas for improvement to enhance configurability and robustness.

Comment thread vllm/v1/kv_offload/cpu.py Outdated
Comment on lines +88 to +93
store_threshold = int(self.extra_config.get("store_threshold", 0))
if store_threshold > 1:
    self._manager = FilteredOffloadingManager(
        backing=self._manager,
        store_threshold=store_threshold,
    )

Contributor

high

The FilteredOffloadingManager is initialized without a configurable max_tracker_size. This means the LRU tracker for block reuse frequency will always use the default size. This parameter can have a significant impact on memory usage and filtering effectiveness, and should be configurable for performance tuning. I suggest reading max_tracker_size from self.extra_config, similar to how store_threshold is handled.

            store_threshold = int(self.extra_config.get("store_threshold", 0))
            if store_threshold > 1:
                max_tracker_size = int(
                    self.extra_config.get("max_tracker_size", 64_000)
                )
                self._manager = FilteredOffloadingManager(
                    backing=self._manager,
                    store_threshold=store_threshold,
                    max_tracker_size=max_tracker_size,
                )
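
To see why the tracker size matters for both memory use and filtering quality, here is a minimal sketch of an LRU-bounded reuse counter. `BoundedReuseTracker` and its `record` method are hypothetical names, not the tracker used in the PR; the sketch only assumes that the tracker evicts the least recently seen hash once it is full.

```python
from collections import OrderedDict


class BoundedReuseTracker:
    """Hypothetical LRU-bounded counter of how often each block hash is seen."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self.counts: OrderedDict[int, int] = OrderedDict()

    def record(self, block_hash: int) -> int:
        # Remove and re-insert so the hash becomes the most recently used entry.
        count = self.counts.pop(block_hash, 0) + 1
        self.counts[block_hash] = count
        if len(self.counts) > self.max_size:
            # Evict the least recently used hash; its count is forgotten.
            self.counts.popitem(last=False)
        return count


tracker = BoundedReuseTracker(max_size=2)
tracker.record(1)
tracker.record(2)
tracker.record(1)  # hash 1 becomes most recent, with count 2
tracker.record(3)  # tracker is full, so hash 2 is evicted
```

A small `max_size` forgets hashes before they can reach the store threshold, while a large one keeps more counts in memory, which is the tuning trade-off the review comment points at.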
Comment thread vllm/v1/kv_offload/reuse_manager.py Outdated
Comment on lines +157 to +159
stats = self._backing.get_stats()
stats["stores_skipped"] = self.stores_skipped
return stats

Contributor

high

The get_stats method modifies the dictionary returned by self._backing.get_stats() in-place. While this is currently safe because the existing backing managers return a new empty dictionary, this pattern is fragile. If a backing manager's implementation changes in the future to return a shared or cached dictionary, this could lead to unintended side effects. It's safer to create a new dictionary to avoid mutating the object returned by the backing manager.

Suggested change
- stats = self._backing.get_stats()
- stats["stores_skipped"] = self.stores_skipped
- return stats
+ return {**self._backing.get_stats(), "stores_skipped": self.stores_skipped}
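
The hazard described in this comment can be reproduced with a backing manager that caches its stats dictionary. The sketch below is illustrative only; `CachingBacking`, `stats_in_place`, and `stats_fresh` are hypothetical names.

```python
class CachingBacking:
    """Hypothetical backing manager that returns the same dict on every call."""

    def __init__(self) -> None:
        self._cached = {"blocks_stored": 10}

    def get_stats(self) -> dict:
        return self._cached


def stats_in_place(backing: CachingBacking, skipped: int) -> dict:
    stats = backing.get_stats()
    stats["stores_skipped"] = skipped  # writes into the backing manager's dict
    return stats


def stats_fresh(backing: CachingBacking, skipped: int) -> dict:
    # Dict unpacking builds a new top-level dict, leaving the cached one alone.
    return {**backing.get_stats(), "stores_skipped": skipped}


a = CachingBacking()
stats_fresh(a, 3)
leaked_after_fresh = "stores_skipped" in a.get_stats()

b = CachingBacking()
stats_in_place(b, 3)
leaked_after_in_place = "stores_skipped" in b.get_stats()
```

With the in-place version, the wrapper's key ends up inside the backing manager's own cached state; the unpacking version avoids that for flat dictionaries.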
@mergify

mergify Bot commented Mar 3, 2026

Contributor

Hi @Srinivasoo7, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint
@Srinivasoo7

Contributor Author

Hi @orozery
Since we merged PR #35342, can you look into the supporting stats for this functionality?

Thanks

@orozery

orozery commented Mar 16, 2026

Collaborator

@Srinivasoo7 can you please rebase?

@Srinivasoo7 Srinivasoo7 force-pushed the feature/offloading-manager-stats branch from 944208b to 6c0ce3a Compare March 17, 2026 00:15
@orozery

orozery commented Mar 17, 2026

Collaborator

The stats dictionary returned by the connector is later used to feed OffloadingConnectorStats.
OffloadingConnectorStats currently assumes a hard-coded structure of transfer_type -> ops_list.
The change proposed here will break this assumption.
Specifically, I expect OffloadingConnectorStats.aggregate and OffloadingConnectorStats.reduce to fail.

@Srinivasoo7

Contributor Author

Hi @orozery
You are right: in the original OffloadingConnectorStats, I assumed a hard-coded transfer_type -> ops_list structure. However, the branch already updates both aggregate() and reduce() to handle scalar values alongside list values.
Hence, stores_skipped flows through both methods correctly without breaking the existing transfer-type logic.
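
As an illustration of what handling scalar values alongside list values can look like, here is a minimal sketch. It is not the code from the branch: `merge_stats` and `reduce_stats` are hypothetical helpers, and each per-operation record is simplified to a plain byte count instead of an OffloadingOperationMetrics object.

```python
from typing import Any


def merge_stats(left: dict[str, Any], right: dict[str, Any]) -> dict[str, Any]:
    """Combine two stats dicts: lists concatenate, scalars add up."""
    merged = {k: list(v) if isinstance(v, list) else v for k, v in left.items()}
    for key, value in right.items():
        if isinstance(value, list):
            merged.setdefault(key, []).extend(value)
        else:
            merged[key] = merged.get(key, 0) + value
    return merged


def reduce_stats(stats: dict[str, Any]) -> dict[str, Any]:
    """Collapse each list into a count and a total; pass scalars through."""
    reduced: dict[str, Any] = {}
    for key, value in stats.items():
        if isinstance(value, list):
            reduced[f"{key}_ops"] = len(value)
            reduced[f"{key}_bytes"] = sum(value)
        else:
            reduced[key] = value
    return reduced


step_a = {"CPU_to_GPU": [4096, 8192], "stores_skipped": 3}
step_b = {"CPU_to_GPU": [1024], "GPU_to_CPU": [2048], "stores_skipped": 2}

merged = merge_stats(step_a, step_b)
summary = reduce_stats(merged)
```

The type check on each value is what lets one dictionary carry both shapes; the later review rounds replace this implicit check with an explicit "transfers"/"gauges" split.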

Comment thread vllm/distributed/kv_transfer/kv_connector/v1/offloading_connector.py Outdated
@mergify

mergify Bot commented Mar 18, 2026

Contributor

Hi @Srinivasoo7, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
@Srinivasoo7 Srinivasoo7 force-pushed the feature/offloading-manager-stats branch from e53557c to 6c0ce3a Compare March 18, 2026 00:21
@mergify

mergify Bot commented Mar 18, 2026

Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Srinivasoo7.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Mar 18, 2026
@orozery

orozery commented Mar 19, 2026

Collaborator

Please rebase :)

@Srinivasoo7 Srinivasoo7 force-pushed the feature/offloading-manager-stats branch from 48c4e97 to 5d75841 Compare March 19, 2026 13:27
@Srinivasoo7

Contributor Author

Done @orozery

@mergify mergify Bot removed the needs-rebase label Mar 19, 2026
@Srinivasoo7

Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces telemetry for the offloading manager, which is a valuable addition. The changes to support a new statistics structure with both transfers and gauges are well-implemented in metrics.py and offloading_connector.py. However, I've identified a critical bug in reuse_manager.py that could lead to a runtime error, and another high-risk issue related to in-place data mutation. I have provided suggestions to address both of these concerns.

Comment thread vllm/v1/kv_offload/reuse_manager.py Outdated
Comment on lines +94 to +98
bh for bh in block_hashes if self.counts.get(bh, 0) >= self.store_threshold
]

self.stores_skipped += len(block_hashes) - len(eligible)

Contributor

critical

The block_hashes parameter is an Iterable. Calling len(block_hashes) after it has been potentially consumed by the list comprehension on the preceding lines can lead to incorrect behavior or a TypeError if block_hashes is a generator. To ensure correctness, you should first convert the iterable to a list and use that list for both operations.

        block_hashes_list = list(block_hashes)
        eligible = [
            bh for bh in block_hashes_list
            if self.counts.get(bh, 0) >= self.store_threshold
        ]

        self.stores_skipped += len(block_hashes_list) - len(eligible)
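
The failure mode in this comment is easy to reproduce outside vLLM. In the sketch below, `count_skipped_unsafe` and `count_skipped_safe` are hypothetical helpers that mirror the two shapes of the code, and `is_even` stands in for the reuse-threshold check.

```python
from collections.abc import Callable, Iterable


def count_skipped_unsafe(
    block_hashes: Iterable[int], is_eligible: Callable[[int], bool]
) -> int:
    eligible = [bh for bh in block_hashes if is_eligible(bh)]
    # A generator has no len(), and it has already been consumed above.
    return len(block_hashes) - len(eligible)


def count_skipped_safe(
    block_hashes: Iterable[int], is_eligible: Callable[[int], bool]
) -> int:
    hashes = list(block_hashes)  # materialize once, then reuse
    eligible = [bh for bh in hashes if is_eligible(bh)]
    return len(hashes) - len(eligible)


def is_even(n: int) -> bool:
    return n % 2 == 0


try:
    count_skipped_unsafe((n for n in range(5)), is_even)
    unsafe_error = None
except TypeError as exc:
    unsafe_error = type(exc).__name__

safe_result = count_skipped_safe((n for n in range(5)), is_even)
```

The unsafe version works when callers happen to pass a list, which is why this kind of bug tends to surface only after a caller switches to a generator.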
Comment thread vllm/v1/kv_offload/reuse_manager.py Outdated
Comment on lines +118 to +121
def get_stats(self) -> dict[str, Any]:
    stats = self._backing.get_stats()
    stats.setdefault("gauges", {})["stores_skipped"] = self.stores_skipped
    return stats

Contributor

high

The current implementation modifies the dictionary returned by self._backing.get_stats() in-place. This can cause unexpected side effects if the backing manager's returned dictionary is not meant to be mutated. To prevent this, you should operate on a copy. The safest approach is to use copy.deepcopy.

You will need to add import copy to the file.

    def get_stats(self) -> dict[str, Any]:
        stats = copy.deepcopy(self._backing.get_stats())
        stats.setdefault("gauges", {})["stores_skipped"] = self.stores_skipped
        return stats
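
Because the gauge is written into a nested "gauges" dictionary, a shallow copy is not enough to isolate the backing manager's data, which is presumably why the comment reaches for `copy.deepcopy`. A small sketch with made-up stats values:

```python
import copy

# Shallow copy: the top-level dict is new, but "gauges" is still shared.
backing_stats = {"gauges": {"blocks_cached": 7}}
shallow = dict(backing_stats)
shallow.setdefault("gauges", {})["stores_skipped"] = 1
shallow_leaked = "stores_skipped" in backing_stats["gauges"]

# Deep copy: nested dicts are duplicated too, so the original is untouched.
backing_stats = {"gauges": {"blocks_cached": 7}}
deep = copy.deepcopy(backing_stats)
deep.setdefault("gauges", {})["stores_skipped"] = 1
deep_leaked = "stores_skipped" in backing_stats["gauges"]
```

The deep copy costs an extra traversal on every `get_stats()` call, so it trades a little overhead for isolation.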
@Srinivasoo7

Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces telemetry for the OffloadingManager, allowing statistics to be collected and emitted. The changes include adding a get_stats method to the OffloadingManager interface, implementing it in FilterReusedOffloadingManager to report stores_skipped, and updating the metrics aggregation logic to handle these new scalar statistics (gauges) alongside existing transfer metrics. The OffloadingConnector is also modified to collect and aggregate these stats from the scheduler's manager.

My main concern is with the new test file tests/v1/kv_offload/test_reuse_manager.py. It appears to be written for a different version of the code, referencing classes (FilteredOffloadingManager, BlockReuseTracker) that do not exist in vllm/v1/kv_offload/reuse_manager.py. This is a critical issue as it means the new functionality is not being tested correctly. Please see my detailed comment on this file.

Comment on lines +151 to +154
BlockReuseTracker = _mod.BlockReuseTracker  # type: ignore[assignment,misc]
FilteredOffloadingManager = (  # type: ignore[assignment,misc]
    _mod.FilteredOffloadingManager
)

Contributor

critical

This test file attempts to import BlockReuseTracker and FilteredOffloadingManager from the reuse_manager module. However, the implementation in vllm/v1/kv_offload/reuse_manager.py defines a class named FilterReusedOffloadingManager and does not contain a BlockReuseTracker class. This will cause an AttributeError at runtime, and these tests will fail to run.

Please ensure the test file is updated to use the correct class names from the module under test. FilteredOffloadingManager should likely be FilterReusedOffloadingManager. The logic for BlockReuseTracker seems to be part of FilterReusedOffloadingManager now, so the tests might need significant refactoring to target the public API of FilterReusedOffloadingManager.

Comment on lines +152 to +159
mgr_stats_data = self.connector_scheduler.manager.get_stats()
if mgr_stats_data:
    mgr_stats = self.build_kv_connector_stats(mgr_stats_data)
    if mgr_stats is not None:
        if stats is not None:
            stats = stats.aggregate(mgr_stats)
        else:
            stats = mgr_stats

Collaborator

I don't think we need aggregation, as it is not possible for both self.connector_scheduler and self.connector_worker to be not None.


  def is_empty(self) -> bool:
-     return not self.data
+     return not self.data.get("transfers") and not self.data.get("gauges")

Collaborator

Why do we need this change?


  def reset(self):
-     self.data: dict[str, list[OffloadingOperationMetrics]] = {}
+     self.data: dict[str, Any] = {"transfers": {}, "gauges": {}}

Collaborator

Let's try and make it clearer by:

  1. Define "transfers" and "gauges" as string constants, e.g. TRANSFERS_KEY = "transfers", GAUGES_KEY = "gauges".
  2. Add a docstring to OffloadingConnectorStats describing its expected structure.
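
One way the two suggestions above could look is sketched below. `SketchConnectorStats` is a hypothetical class written for illustration; only the constant names and the two-key layout come from this thread.

```python
from typing import Any

TRANSFERS_KEY = "transfers"
GAUGES_KEY = "gauges"


class SketchConnectorStats:
    """Hypothetical stats container with a documented layout.

    ``data`` always has exactly two top-level keys:

    * ``TRANSFERS_KEY``: transfer type -> list of per-operation records
    * ``GAUGES_KEY``: gauge name -> latest scalar value
    """

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.data: dict[str, Any] = {TRANSFERS_KEY: {}, GAUGES_KEY: {}}

    def is_empty(self) -> bool:
        # With this layout, `not self.data` is always False because both
        # top-level keys exist even when nothing has been recorded.
        return not self.data[TRANSFERS_KEY] and not self.data[GAUGES_KEY]


stats = SketchConnectorStats()
empty_before = stats.is_empty()
stats.data[GAUGES_KEY]["stores_skipped"] = 4
empty_after = stats.is_empty()
```

Naming the keys once means a typo becomes a NameError at import time instead of a silently missing dictionary entry.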
Comment thread vllm/v1/kv_offload/reuse_manager.py Outdated

def get_stats(self) -> dict[str, Any]:
    stats = copy.deepcopy(self._backing.get_stats())
    stats.setdefault("gauges", {})["stores_skipped"] = self.stores_skipped

Collaborator

Can we create and use a KVConnectorStats.set_gauge(gauge_name, gauge_value) method?
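
A minimal sketch of what such a method might look like follows. The class `SketchStats` and its internal layout are assumptions; only the `set_gauge(gauge_name, gauge_value)` signature is taken from the comment.

```python
from typing import Any


class SketchStats:
    """Hypothetical stand-in for a KVConnectorStats-like class."""

    def __init__(self) -> None:
        self.data: dict[str, Any] = {"transfers": {}, "gauges": {}}

    def set_gauge(self, gauge_name: str, gauge_value: float) -> None:
        # A gauge holds the latest value, so a second call overwrites the first.
        self.data["gauges"][gauge_name] = gauge_value


stats = SketchStats()
stats.set_gauge("stores_skipped", 3)
stats.set_gauge("stores_skipped", 5)
```

Routing writes through one method keeps callers from depending on the nested dictionary layout, so the layout can change later without touching every manager.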

Comment thread vllm/v1/kv_offload/abstract.py Outdated
"""
return ()

def get_stats(self) -> dict[str, Any]:

Collaborator

Let's define it as:

Suggested change
- def get_stats(self) -> dict[str, Any]:
+ def get_stats(self) -> dict[str, Any] | None:

to save the dictionary allocation if stats are unused.
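
The effect of the `| None` return type on callers can be sketched as follows. `QuietManager`, `CountingManager`, and `report` are hypothetical names; the sketch only assumes that a manager with nothing to report returns None and that the caller checks for it.

```python
from __future__ import annotations

from typing import Any


class QuietManager:
    def get_stats(self) -> dict[str, Any] | None:
        return None  # nothing to report, so no dict is allocated


class CountingManager:
    def __init__(self) -> None:
        self.stores_skipped = 2

    def get_stats(self) -> dict[str, Any] | None:
        return {"stores_skipped": self.stores_skipped}


def report(manager: QuietManager | CountingManager, sink: list) -> None:
    stats = manager.get_stats()
    if stats is None:
        return  # skip building or sending anything
    sink.append(stats)


sink: list[dict[str, Any]] = []
report(QuietManager(), sink)
report(CountingManager(), sink)
```

The cost of this design is that every caller must handle None explicitly instead of treating an empty dict and a populated one the same way.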

@@ -0,0 +1,360 @@
# SPDX-License-Identifier: Apache-2.0

Collaborator

Looks like this is an old file?
We already have a unit test in test_cpu_manager.py
You should instead add a test that verifies the stores_skipped gauge.

@Srinivasoo7

Contributor Author

Hi @orozery
Addressed the changes in the latest commit.

Thanks

@orozery orozery added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 9, 2026
@mergify

mergify Bot commented Jun 9, 2026

Contributor

Hi @Srinivasoo7, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

@orozery

orozery commented Jun 9, 2026

Collaborator

@Srinivasoo7 Also need to fix 2 more things:

  1. pre-commit
  2. assert self._connector_stats is None which fails some of the existing tests (see CI logs).
@Srinivasoo7

Contributor Author

Yes boss @orozery, we'll fix it asap!

srinivas_oo7 and others added 7 commits June 9, 2026 20:54
Signed-off-by: srinivas_oo7 <sklinkedin0120@gmail.com>
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: Or Ozeri <oro@il.ibm.com>
@orozery orozery added ready ONLY add when PR is ready to merge/full CI is needed and removed ready ONLY add when PR is ready to merge/full CI is needed labels Jun 10, 2026
Signed-off-by: Or Ozeri <oro@il.ibm.com>

@orozery orozery left a comment

Collaborator

Thanks @Srinivasoo7 for the hard work! (and sorry for all of the nitpicking :) )
I added an e2e test and noticed that the metric metadata was not serialized.
I changed the stats structs to nest the metadata under the self.data dictionary, which is serialized.
Another issue was that we were missing empty worker-side stats (until #43877 lands).
I pushed the fixes to your branch.

@Srinivasoo7

Contributor Author

Gotcha @orozery.
This was good work. Although it took significant time, it was a great learning experience altogether.
Yup, I was trying to land both offloading metrics and worker-side stats at once.

Also, the metrics redesign (#44008) branched out from this PR. I would appreciate your views there so that we can start the redesign PR against the RFC.

Looking forward to more such contributions!

@orozery orozery enabled auto-merge (squash) June 10, 2026 10:55
@orozery orozery merged commit 9dfc313 into vllm-project:main Jun 10, 2026
77 checks passed
@github-project-automation github-project-automation Bot moved this from Backlog to Done in Prometheus Metrics Jun 10, 2026
wcynb1023 pushed a commit to wcynb1023/vllm that referenced this pull request Jun 11, 2026
Signed-off-by: Sriusa4414@gmail.com
Signed-off-by: srinivas_oo7 <Sriusa4414@gmail.com>
Signed-off-by: srinivas_oo7 <sklinkedin0120@gmail.com>
Signed-off-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com>
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: srinivas_oo7 <sklinkedin0120@gmail.com>
Co-authored-by: Srinivasoo7 <158864704+Srinivasoo7@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
Saddss pushed a commit to Saddss/vllm that referenced this pull request Jun 14, 2026
vivek8123 pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Jun 18, 2026
divineearthly pushed a commit to divineearthly/vllm that referenced this pull request Jun 19, 2026
Signed-off-by: divineearthly <divineearthly@gmail.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026
iboiko-habana pushed a commit to vllm-project/vllm-gaudi that referenced this pull request Jun 22, 2026
…HPU scheduler, ngram proposer and offloading connector tests to upstream API drift (#1556)

## Bug 1: Forward throttle_prefills in HPUAsyncScheduler.schedule

- **State machine id**: hpu_async_scheduler_schedule_positional_arg
- **Commit**: 957ba4d

### Root cause
vLLM PR #44558 added a throttle_prefills positional arg to
Scheduler.schedule(); EngineCore calls it positionally but the HPU
override only accepted self.

### Upstream PR
vllm-project/vllm#44558

### Fix
Accept throttle_prefills (default False) on the
HPUAsyncScheduler.schedule override and forward it to
super().schedule().
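
The breakage and the fix can be reproduced with plain classes. This is a sketch under assumptions: `Scheduler` below is a stand-in, not the upstream class, and its default value for `throttle_prefills` is assumed; only the parameter name and the forwarding fix come from the text above.

```python
class Scheduler:
    """Stand-in for the upstream base class after its signature change."""

    def schedule(self, throttle_prefills: bool = False) -> str:
        return f"scheduled(throttle_prefills={throttle_prefills})"


class StaleScheduler(Scheduler):
    def schedule(self) -> str:  # old override: accepts only self
        return super().schedule()


class FixedScheduler(Scheduler):
    def schedule(self, throttle_prefills: bool = False) -> str:
        # Accept the new argument and pass it through to the base class.
        return super().schedule(throttle_prefills)


try:
    StaleScheduler().schedule(True)  # caller passes the argument positionally
    stale_error = None
except TypeError:
    stale_error = "TypeError"

fixed_result = FixedScheduler().schedule(True)
```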

## Bug 2: Pass num_speculative_tokens to NgramProposer.propose

- **State machine id**: ngram_proposer_propose_missing_positional_arg
- **Commit**: 82155ea

### Root cause
vLLM PR #32374 (Dynamic SD) added a leading num_speculative_tokens
positional arg to NgramProposer.propose().

### Upstream PR
vllm-project/vllm#32374

### Fix
Prepend self.speculative_config.num_speculative_tokens in
propose_ngram_draft_token_ids to match the new upstream signature.

## Bug 3: Align OffloadingConnector stats tests with upstream flat-metrics API

- **State machine id**: offloading_connector_cpu_to_gpu_metrics_missing
- **Commit**: c1eb9e3

### Root cause
vLLM PR #35669 rewrote OffloadingConnectorStats to a self-describing
{types, data} flat-metric payload, dropping the per-direction
CPU_to_GPU/GPU_to_CPU list shape the tests still asserted.

### Upstream PR
vllm-project/vllm#35669

### Fix
Rewrite test_metrics.py to exercise
increase_counter/observe_histogram/aggregate/reduce/reset against the
new self-describing stats contract.
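
For readers unfamiliar with the shape being described, here is a rough sketch of a self-describing flat-metric payload. Everything in it is an assumption made for illustration: the class `FlatStats`, the method signatures, and the exact contents of the "types" and "data" entries are not taken from vLLM; only the method names and the {types, data} split come from the text above.

```python
from typing import Any


class FlatStats:
    """Hypothetical self-describing payload: {"types": ..., "data": ...}."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # Metadata lives next to the values so both are serialized together.
        self.data: dict[str, dict[str, Any]] = {"types": {}, "data": {}}

    def increase_counter(self, name: str, amount: int = 1) -> None:
        self.data["types"][name] = "counter"
        self.data["data"][name] = self.data["data"].get(name, 0) + amount

    def observe_histogram(self, name: str, value: float) -> None:
        self.data["types"][name] = "histogram"
        self.data["data"].setdefault(name, []).append(value)

    def aggregate(self, other: "FlatStats") -> "FlatStats":
        # The recorded type of each metric decides how it is merged.
        for name, kind in other.data["types"].items():
            value = other.data["data"][name]
            if kind == "counter":
                self.increase_counter(name, value)
            else:
                for sample in value:
                    self.observe_histogram(name, sample)
        return self


a = FlatStats()
a.increase_counter("stores_skipped", 2)
a.observe_histogram("load_seconds", 0.5)

b = FlatStats()
b.increase_counter("stores_skipped", 3)
b.observe_histogram("load_seconds", 0.25)

a.aggregate(b)
```

Because each metric carries its own type, the merge logic no longer needs to know metric names such as CPU_to_GPU in advance.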

## Bug 4: Align OffloadingConnector scheduler flush assertions with upstream defer-on-finish

- **State machine id**: offloading_connector_flush_on_finish_deferred
- **Commit**: 575a178

### Root cause
vLLM commit f428718ffe (PR #45823, "Defer on_request_finished until
in-flight transfers drain") changed OffloadingConnectorScheduler: a
finishing request with in-flight store jobs no longer flushes those
stores immediately; finalization is deferred until transfers drain, and
flush now fires only on preemption or block reuse.
test_concurrent_lookups_of_the_same_prefix and
test_abort_loading_requests still asserted flush-on-finish, so they
failed once the target vLLM SHA picked up #45823.

### Upstream PR
vllm-project/vllm#45823

### Fix
Drop the stale expected_flushed_gpu_block_indexes assertions in the two
affected tests (matching upstream's own equivalents, which assert no
flush in these scenarios). test_request_preemption keeps its
flush-on-preemption assertion, which upstream still honors.

---------

Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>
nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026

Labels

kv-connector ready ONLY add when PR is ready to merge/full CI is needed v1

3 participants