[KV Offloading] Add labeled metrics support by Srinivasoo7 · Pull Request #45957 · vllm-project/vllm

Srinivasoo7 · 2026-06-17T19:19:35Z

Summary

add labelnames to offloading metric metadata
normalize offloading stats payloads to {metric_name: {labelvalues_tuple: value}}
add labeled aggregation/reduction and lazy Prometheus child creation for labeled values
keep existing unlabeled payloads compatible by mapping them to the empty label tuple

orozery

Thanks @Srinivasoo7 for the quick help on this!

orozery · 2026-06-18T03:23:01Z

@@ -255,7 +310,9 @@ def __init__(

        self._observe_deprecated_metrics = issubclass(spec_cls, CPUOffloadingSpec)
        self._offloading_metric_defs: dict[str, PromMetricT] = {}
-        self.offloading_metrics: dict[tuple[int, str], PromMetricT] = {}
+        self.offloading_metrics: dict[
+            tuple[int, str, tuple[object, ...]], PromMetricT


Why not tuple[str, ..]?

From what I understand, label values come from the internal stats payload and may not always be strings (e.g. tier IDs as integers). Prometheus converts them to strings when .labels() is called.
If we use String here, it would require callers to pre-convert values.
Let me know your guidance on this!

I think we want to force labels to be strings.

orozery · 2026-06-18T03:23:26Z

@@ -255,7 +310,9 @@ def __init__(

        self._observe_deprecated_metrics = issubclass(spec_cls, CPUOffloadingSpec)
        self._offloading_metric_defs: dict[str, PromMetricT] = {}
-        self.offloading_metrics: dict[tuple[int, str], PromMetricT] = {}
+        self.offloading_metrics: dict[


Let's add a comment explaining the type of this variable

orozery · 2026-06-18T03:35:16Z

@@ -326,11 +384,35 @@ def _create_metric(
            raise AssertionError(f"Unknown offloading metric metadata: {metadata}")
        return metric_cls(**kwargs)

+    def _get_offloading_metric(


This should actually be _get_prometheus_metric

Sure, tried to keep the naming consistent with existing class terminology. Will change this

orozery · 2026-06-18T03:38:17Z

+        if key not in self.offloading_metrics:
+            engine_labelvalues = self.per_engine_labelvalues[engine_idx]
+            self.offloading_metrics[key] = self._offloading_metric_defs[
+                metric_name
+            ].labels(*(engine_labelvalues + list(labelvalues)))
+        return self.offloading_metrics[key]


avoid double reading from dict:

Suggested change

if key not in self.offloading_metrics:

engine_labelvalues = self.per_engine_labelvalues[engine_idx]

self.offloading_metrics[key] = self._offloading_metric_defs[

metric_name

].labels(*(engine_labelvalues + list(labelvalues)))

return self.offloading_metrics[key]

prom_metric = self.offloading_metrics.get(key)

if prom_metric is None:

engine_labelvalues = self.per_engine_labelvalues[engine_idx]

prom_metric = self._offloading_metric_defs[

metric_name

].labels(*(engine_labelvalues + list(labelvalues)))

self.offloading_metrics[key] = prom_metric

return prom_metric

Okay, .get() is cleaner and avoids the extra lookup I made.
Will change this!

orozery · 2026-06-18T03:40:14Z

+            if not metadata.labelnames:
+                for engine_idx, labelvalues in per_engine_labelvalues.items():
+                    self.offloading_metrics[(engine_idx, metric_name, ())] = (
+                        self._offloading_metric_defs[metric_name].labels(*labelvalues)
+                    )


I think we can remove this (covered lazily by _get_offloading_metric).

Hmm.. yeah wanted your opinion on this.
This was intentional, so unlabeled metrics are registered and visible in Prometheus even before they're observed.

Let's remove it.

orozery · 2026-06-18T03:49:37Z

+    @staticmethod
+    def _is_labeled_value_map(value: Any) -> bool:
+        return isinstance(value, Mapping) and all(
+            isinstance(labelvalues, tuple) for labelvalues in value
+        )
+


I don't think we need this check. Remove? (along with its callsites)

Oh, I added this as it will help us with the rolling-upgrade compatibility.
Surely can be removed.

orozery · 2026-06-18T03:51:25Z

+    def _normalize_values(self) -> None:
+        values = self.data[_StatsKey.DATA]
+        for metric_name, value in list(values.items()):
+            if self._is_labeled_value_map(value):
+                continue
+            values[metric_name] = {(): value}


Don't think we need this.
All stats by construction should be labeled.

Similar intention, it will help us with the rolling-upgrade compatibility.
Happy to remove!

orozery · 2026-06-18T03:56:04Z

+    with patch.object(
+        CPUOffloadingSpec, "build_metric_definitions", return_value=metric_definitions
+    ):


Let's use a mocked spec instead of patching a real (CPU) spec.
Same for the tests below.

orozery · 2026-06-18T03:58:30Z

+        TIER_BYTES: OffloadingCounterMetadata(
+            documentation="tier bytes",
+            labelnames=("tier",),
+        ),


It's a bit weird to use TIER_BYTES as the test case.
Why not just e.g. "my_counter" and "my_label"?

orozery · 2026-06-18T04:11:48Z

-    engine0 = prom_metrics.offloading_metrics[(0, LOAD_BYTES)]
-    engine1 = prom_metrics.offloading_metrics[(1, LOAD_BYTES)]
+    engine0 = prom_metrics.offloading_metrics[(0, LOAD_BYTES, ())]
+    engine1 = prom_metrics.offloading_metrics[(1, LOAD_BYTES, ())]
    assert engine0.increments == []
    assert engine1.increments == [100]



Claude:

Missing test: aggregate into self when self already has some labels but not others

The test test_aggregate_labeled_metrics starts with both stats having TIER_BYTES. There's no test for aggregating when self has a labeled metric and other has a different labeled metric that self doesn't have at all (metric name not present in self). The self._values.setdefault(key, {}) handles it, but a test would be good.

Okay, I'll add a case on the missing-metric path as well!

Srinivasoo7 · 2026-06-20T01:57:22Z

Hi @orozery
PTAL
Thanks

orozery · 2026-06-21T08:31:53Z

+            if not metadata.labelnames:
+                for engine_idx, labelvalues in per_engine_labelvalues.items():
+                    self.offloading_metrics[(engine_idx, metric_name, ())] = (
+                        self._offloading_metric_defs[metric_name].labels(*labelvalues)
+                    )


Let's remove it.

orozery · 2026-06-21T08:32:38Z

    ) -> None:
        for observation in value:
-            self.offloading_metrics[(engine_idx, metric_name)].observe(observation)
-            if not self._observe_deprecated_metrics:
+            self._get_prometheus_metric(metric_name, labelvalues, engine_idx).observe(


Let's move self._get_prometheus_metric(metric_name, labelvalues, engine_idx) outside of the loop.

mergify · 2026-06-21T08:43:06Z

Hi @Srinivasoo7, the pre-commit checks have failed. Please run:

uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Signed-off-by: srinivas_oo7 <sklinkedin0120@gmail.com>

orozery

Thanks @Srinivasoo7 !

Signed-off-by: srinivas_oo7 <sklinkedin0120@gmail.com> Co-authored-by: srinivas_oo7 <sklinkedin0120@gmail.com>

Signed-off-by: srinivas_oo7 <sklinkedin0120@gmail.com> Co-authored-by: srinivas_oo7 <sklinkedin0120@gmail.com> Signed-off-by: Qiang Li <qiang.li2@amd.com>

Wire SimpleCPUOffloadConnector into the offloading stats / labeled-metrics API so its CPU KV transfers report through the shared vllm:kv_offload_* family (store/load bytes/time/size), plus per-connector CPU-pool gauges (vllm:simple_cpu_offload_*: total/free/used blocks, usage, pending loads/stores). Transfer timing brackets only the DMA: host-side batch-argument preparation is split out of copy_blocks() into prepare_copy()/launch_prepared_copy() and runs before the start CUDA event, and the vllm-project#46278 compute-done wait stays ordered before it, so kv_offload_*_time excludes host enqueue and compute-wait latency. Timing events are recycled via a small pool. Supersedes vllm-project#41790 (built on the pre-vllm-project#45957 metrics API); preserves vllm-project#43877 (scheduler post-completion stats aggregation) and vllm-project#46278 (GPU->CPU store barrier + srcAccessOrder). Tests: tests/v1/simple_kv_offload/test_metrics.py covers stats helpers/reduce, prom observe() end-to-end, serialized round-trip, scheduler pool gauges, worker transfer recording + reset-on-read, and event-pool reuse (no GPU required). AI assistance was used; every line was human-reviewed. Co-authored-by: OCWC22 <OCWC22@users.noreply.github.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Change72 <cguo51@asu.edu>

Wire SimpleCPUOffloadConnector into the offloading stats / labeled-metrics API so its CPU KV transfers report through the shared vllm:kv_offload_* family (store/load bytes/time/size), plus per-connector CPU-pool gauges (vllm:simple_cpu_offload_*: total/free/used blocks, usage, pending loads/stores). Transfer timing brackets only the DMA: host-side batch-argument preparation is split out of copy_blocks() into prepare_copy()/launch_prepared_copy() and runs before the start CUDA event; the vllm-project#46278 compute-done wait is enqueued before the prepare so it captures the shared compute event's current state and stays outside the timing bracket. So kv_offload_*_time excludes host enqueue and compute-wait latency. Timing events are recycled via a pool. Supersedes vllm-project#41790 (built on the pre-vllm-project#45957 metrics API); preserves vllm-project#43877 (scheduler post-completion stats aggregation) and vllm-project#46278 (GPU->CPU store barrier + srcAccessOrder). Tests: tests/v1/simple_kv_offload/test_metrics.py covers stats build/reduce, prom observe() end-to-end (transfer + pool gauges), scheduler pool gauges, worker transfer recording + reset-on-read, and event-pool reuse (no GPU required); tests/v1/simple_kv_offload/test_worker.py store-ordering still passes on GPU. AI assistance was used. Co-authored-by: OCWC22 <OCWC22@users.noreply.github.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Change72 <cguo51@asu.edu>

mergify Bot added v1 kv-connector labels Jun 17, 2026

This was referenced Jun 17, 2026

[KV Offloading] Add basic offloading metrics #45958

Open

[KV Offloading] Add tiering metric plumbing #45959

Merged

Srinivasoo7 marked this pull request as ready for review June 17, 2026 19:50

Srinivasoo7 requested review from ApostaC, NickLucche, orozery and xuechendi as code owners June 17, 2026 19:50

Srinivasoo7 mentioned this pull request Jun 17, 2026

[RFC]: Offloading Metrics Redesign #44008

Open

1 task

orozery requested changes Jun 18, 2026

View reviewed changes

Srinivasoo7 force-pushed the offloading-labeled-metrics branch from 4383cf9 to a03fac4 Compare June 19, 2026 19:45

orozery reviewed Jun 21, 2026

View reviewed changes

orozery added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 21, 2026

Add labeled KV offloading metrics support

dac20e2

Signed-off-by: srinivas_oo7 <sklinkedin0120@gmail.com>

Srinivasoo7 force-pushed the offloading-labeled-metrics branch from a03fac4 to dac20e2 Compare June 21, 2026 16:16

orozery approved these changes Jun 21, 2026

View reviewed changes

orozery enabled auto-merge (squash) June 21, 2026 17:40

orozery merged commit c441ad1 into vllm-project:main Jun 21, 2026
78 of 79 checks passed

tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026

[KV Offloading] Add labeled metrics support (vllm-project#45957)

b48e59c

Signed-off-by: srinivas_oo7 <sklinkedin0120@gmail.com> Co-authored-by: srinivas_oo7 <sklinkedin0120@gmail.com>

nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026

[KV Offloading] Add labeled metrics support (vllm-project#45957)

3f149c2

Signed-off-by: srinivas_oo7 <sklinkedin0120@gmail.com> Co-authored-by: srinivas_oo7 <sklinkedin0120@gmail.com>

Change72 mentioned this pull request Jul 1, 2026

[KV Offload] Expose SimpleCPU offload metrics #41790

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[KV Offloading] Add labeled metrics support#45957

[KV Offloading] Add labeled metrics support#45957
orozery merged 1 commit into
vllm-project:mainfrom
Srinivasoo7:offloading-labeled-metrics

Srinivasoo7 commented Jun 17, 2026 •

edited

Loading

orozery left a comment

orozery Jun 18, 2026

Srinivasoo7 Jun 18, 2026

orozery Jun 18, 2026

orozery Jun 18, 2026

orozery Jun 18, 2026

Srinivasoo7 Jun 18, 2026

orozery Jun 18, 2026

Srinivasoo7 Jun 18, 2026

orozery Jun 18, 2026

Srinivasoo7 Jun 18, 2026

orozery Jun 21, 2026

orozery Jun 18, 2026

Srinivasoo7 Jun 18, 2026

orozery Jun 18, 2026

Srinivasoo7 Jun 18, 2026

orozery Jun 18, 2026

orozery Jun 18, 2026

orozery Jun 18, 2026

Srinivasoo7 Jun 18, 2026

Srinivasoo7 commented Jun 20, 2026

orozery Jun 21, 2026

orozery Jun 21, 2026

mergify Bot commented Jun 21, 2026

orozery left a comment

Uh oh!

Labels

2 participants

Uh oh!

Uh oh!

Conversation

Srinivasoo7 commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

orozery left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Srinivasoo7 commented Jun 20, 2026

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mergify Bot commented Jun 21, 2026

orozery left a comment

Choose a reason for hiding this comment

Uh oh!

Labels

2 participants

Srinivasoo7 commented Jun 17, 2026 •

edited

Loading