[Data][LLM] Add vLLM metrics export and Data LLM Grafana dashboard by nrghosh · Pull Request #60385 · ray-project/ray

nrghosh · 2026-01-21T21:45:57Z

Summary

Add Prometheus metrics export for Ray Data LLM batch inference
Add log_engine_metrics config option to vLLMEngineProcessorConfig (default=True)
Integrate vLLM's RayPrometheusStatLogger for metrics export
Rename the Serve LLM dashboard to LLM dashboard. vLLM engine metrics from both Data LLM and Serve LLM workloads now show up in the same panels. The panel contents are the same as the existing vLLM panels in the Serve LLM dashboard.

Serve LLM workloads additionally show serve orchestrator metrics under the Serve Orchestrator Metrics section. The panel contents are the same as those in the existing Serve LLM dashboard.

QPS per vLLM worker remains filterable by deployment

Addresses #58360. Thanks @anindya-saha for the approach.
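
Once exported, the engine metrics can be read like any other Prometheus series. The sketch below filters a scrape in Prometheus text exposition format for vLLM series. It is an illustration only: the sample payload, its values, and the `vllm_samples` helper are made up, the `ray_vllm_` prefix is inferred from the metric names used in this PR's dashboard queries, and the parser assumes sample lines carry no trailing timestamp.

```python
# Hypothetical scrape payload; real help text, labels, and values will differ.
SAMPLE_SCRAPE = """\
# HELP ray_vllm_request_queue_time_seconds Hypothetical help text
# TYPE ray_vllm_request_queue_time_seconds histogram
ray_vllm_request_queue_time_seconds_sum{model_name="m",WorkerId="w1"} 12.5
ray_vllm_request_queue_time_seconds_count{model_name="m",WorkerId="w1"} 50
some_other_metric{WorkerId="w1"} 37.0
"""


def vllm_samples(scrape_text, prefix="ray_vllm_"):
    """Return {series: value} for sample lines whose name starts with prefix."""
    samples = {}
    for line in scrape_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated token (no timestamp assumed).
        series, _, value = line.rpartition(" ")
        if series.startswith(prefix):
            samples[series] = float(value)
    return samples


for series, value in vllm_samples(SAMPLE_SCRAPE).items():
    print(series, value)
```

An empty result from a real scrape would suggest that metrics export is disabled or that the engine has not yet processed any requests.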

gemini-code-assist

Code Review

This pull request introduces vLLM metrics exporting for Ray Data LLM batch inference and adds a corresponding Grafana dashboard for visualization. The changes are well-structured, adding a log_engine_metrics configuration option and integrating with vLLM's RayPrometheusStatLogger. The new dashboard provides valuable insights into vLLM engine performance. My review includes a couple of suggestions to enhance the new Grafana dashboard for better clarity and consistency.

gemini-code-assist · 2026-01-21T21:48:10Z

+    Panel(
+        id=8,
+        title="vLLM: Queue Time",
+        description="Time requests spend waiting in the queue before processing.",
+        unit="s",
+        targets=[
+            Target(
+                expr='sum by(model_name, WorkerId) (rate(ray_vllm_request_queue_time_seconds_sum{{model_name=~"$vllm_model_name", WorkerId=~"$workerid", {global_filters}}}[$interval]))',
+                legend="{{model_name}} - {{WorkerId}}",
+            ),
+        ],
+        fill=1,
+        linewidth=2,
+        stack=False,
+        grid_pos=GridPos(12, 24, 12, 8),
+    ),
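
The doubled braces in `expr` suggest that the string is rendered with Python's `str.format` before it reaches Grafana, with `{global_filters}` as the only substituted field. A minimal sketch under that assumption; the `render_expr` helper and the filter value are hypothetical and are not Ray's actual dashboard factory code:

```python
# Sketch: a str.format-style template turns '{{' and '}}' into literal PromQL
# braces. That the dashboard factory formats `expr` this way is an assumption.
EXPR_TEMPLATE = (
    "sum by(model_name, WorkerId) ("
    "rate(ray_vllm_request_queue_time_seconds_sum"
    '{{model_name=~"$vllm_model_name", WorkerId=~"$workerid", {global_filters}}}'
    "[$interval]))"
)


def render_expr(template: str, global_filters: str) -> str:
    """Substitute {global_filters}; escaped braces collapse to single braces."""
    return template.format(global_filters=global_filters)


# Hypothetical filter value, for illustration only.
rendered = render_expr(EXPR_TEMPLATE, 'SessionName=~"$SessionName"')
print(rendered)
```

The `$vllm_model_name`, `$workerid`, and `$interval` tokens pass through untouched because they contain no braces; Grafana resolves them later as dashboard variables.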


The "vLLM: Queue Time" panel currently displays the rate of the sum of queue times (rate(sum)), which can be an unintuitive metric. The panel's description, "Time requests spend waiting in the queue before processing," suggests that a per-request latency metric like average or percentile queue time would be more appropriate and easier to interpret.

Since ray_vllm_request_queue_time_seconds is a histogram, you can create a more informative panel that is consistent with the other latency panels in this dashboard (e.g., TTFT, E2E Latency) by showing P50, P90, P95, P99, and Mean queue times. This would provide a clearer and more comprehensive view of queueing performance.

    Panel(
        id=8,
        title="vLLM: Queue Time",
        description="P50, P90, P95, P99, and Mean time requests spend waiting in the queue before processing.",
        unit="s",
        targets=[
            Target(
                expr='histogram_quantile(0.99, sum by(le, model_name, WorkerId) (rate(ray_vllm_request_queue_time_seconds_bucket{{model_name=~"$vllm_model_name", WorkerId=~"$workerid", {global_filters}}}[$interval])))',
                legend="P99 - {{model_name}} - {{WorkerId}}",
            ),
            Target(
                expr='histogram_quantile(0.95, sum by(le, model_name, WorkerId) (rate(ray_vllm_request_queue_time_seconds_bucket{{model_name=~"$vllm_model_name", WorkerId=~"$workerid", {global_filters}}}[$interval])))',
                legend="P95 - {{model_name}} - {{WorkerId}}",
            ),
            Target(
                expr='histogram_quantile(0.90, sum by(le, model_name, WorkerId) (rate(ray_vllm_request_queue_time_seconds_bucket{{model_name=~"$vllm_model_name", WorkerId=~"$workerid", {global_filters}}}[$interval])))',
                legend="P90 - {{model_name}} - {{WorkerId}}",
            ),
            Target(
                expr='histogram_quantile(0.50, sum by(le, model_name, WorkerId) (rate(ray_vllm_request_queue_time_seconds_bucket{{model_name=~"$vllm_model_name", WorkerId=~"$workerid", {global_filters}}}[$interval])))',
                legend="P50 - {{model_name}} - {{WorkerId}}",
            ),
            Target(
                expr='(sum by(model_name, WorkerId) (rate(ray_vllm_request_queue_time_seconds_sum{{model_name=~"$vllm_model_name", WorkerId=~"$workerid", {global_filters}}}[$interval]))) / (sum by(model_name, WorkerId) (rate(ray_vllm_request_queue_time_seconds_count{{model_name=~"$vllm_model_name", WorkerId=~"$workerid", {global_filters}}}[$interval])))',
                legend="Mean - {{model_name}} - {{WorkerId}}",
            ),
        ],
        fill=1,
        linewidth=2,
        stack=False,
        grid_pos=GridPos(12, 24, 12, 8),
    ),
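
To make the difference between these targets concrete, here is a rough sketch of how a quantile is estimated from cumulative histogram buckets and how the mean comes from `_sum / _count`. It mirrors the linear interpolation that Prometheus documents for `histogram_quantile`, but it is a simplified illustration rather than Prometheus's implementation. The bucket boundaries, counts, and sum are invented, and raw counts stand in for the per-second rates the real queries use (the ratios are the same).

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from (le, cumulative_count) pairs.

    `buckets` must be sorted by upper bound `le` and end with le = +inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_le, prev_count = 0.0, 0.0  # the first bucket is assumed to start at 0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                return prev_le  # rank falls in the +Inf bucket: report last finite bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return prev_le


# Invented queue-time observations: (upper bound in seconds, cumulative count).
BUCKETS = [(0.1, 50), (0.5, 80), (1.0, 95), (5.0, 100), (math.inf, 100)]
QUEUE_TIME_SUM = 40.0  # invented total seconds spent queued
QUEUE_TIME_COUNT = 100

mean = QUEUE_TIME_SUM / QUEUE_TIME_COUNT
for q in (0.5, 0.9, 0.95, 0.99):
    print(f"P{int(q * 100)} = {histogram_quantile(q, BUCKETS):.3f}s")
print(f"Mean = {mean:.3f}s")
```

With this made-up data the tail percentiles sit well above the mean, which is the kind of detail a bare `rate(..._sum)` series cannot show.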

gemini-code-assist · 2026-01-21T21:48:10Z

+        "current": {
+          "selected": true,
+          "text": "5m",
+          "value": "5m"
+        }


There is an inconsistency in the default value for the interval template variable. The options array has 30s marked as "selected": true, but the current value is set to 5m. While Grafana prioritizes the current value for initialization, this discrepancy can be confusing.

To ensure consistency and set a more common default for near-real-time monitoring, I suggest updating the current value to 30s to match the selected option.

Suggested change

-        "current": {
-          "selected": true,
-          "text": "5m",
-          "value": "5m"
-        }
+        "current": {
+          "selected": true,
+          "text": "30s",
+          "value": "30s"
+        }
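
A mismatch like this can be caught mechanically. The following is a minimal sketch of such a check, assuming the template variable has the `current` / `options` shape quoted in this comment; the `find_default_mismatch` helper and the `1m` option are hypothetical.

```python
from typing import List


def find_default_mismatch(variable: dict) -> List[str]:
    """Return option values flagged as selected that differ from `current`."""
    current_value = variable.get("current", {}).get("value")
    return [
        opt["value"]
        for opt in variable.get("options", [])
        if opt.get("selected") and opt["value"] != current_value
    ]


# Mirrors the situation described above: `current` says 5m, options say 30s.
interval_variable = {
    "name": "interval",
    "current": {"selected": True, "text": "5m", "value": "5m"},
    "options": [
        {"selected": True, "text": "30s", "value": "30s"},
        {"selected": False, "text": "1m", "value": "1m"},  # hypothetical option
        {"selected": False, "text": "5m", "value": "5m"},
    ],
}

print(find_default_mismatch(interval_variable))
```

An empty list means `current` and the selected option agree.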

anindya-saha · 2026-01-23T01:01:38Z

Thank you so much @nrghosh for driving this.

github-actions · 2026-02-07T12:25:57Z

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

jeffreywang88 · 2026-02-09T18:48:46Z

Validation steps

Panel definition validation

Command:

cd ray && python -B -c "
from ray.dashboard.modules.metrics.dashboards.data_llm_dashboard_panels import DATA_LLM_GRAFANA_PANELS
print(f'Total panels: {len(DATA_LLM_GRAFANA_PANELS)}')
for p in DATA_LLM_GRAFANA_PANELS:
    print(f'  id={p.id:3d}  y={p.grid_pos.y:3d}  x={p.grid_pos.x:2d}  {p.title}')
"

Output:

Total panels: 12
  id=  1  y=  0  x= 0  vLLM: Token Throughput
  id=  2  y=  0  x=12  vLLM: Time Per Output Token Latency
  id=  3  y=  8  x= 0  vLLM: Cache Utilization
  id=  4  y=  8  x=12  vLLM: Prefix Cache Hit Rate
  id=  5  y= 16  x= 0  vLLM: Time To First Token Latency
  id=  6  y= 16  x=12  vLLM: E2E Request Latency
  id=  7  y= 24  x= 0  vLLM: Scheduler State
  id=  8  y= 24  x=12  vLLM: Queue Time
  id=  9  y= 32  x= 0  vLLM: Prompt Length
  id= 10  y= 32  x=12  vLLM: Generation Length
  id= 11  y= 40  x= 0  vLLM: Finish Reason
  id= 12  y= 40  x=12  vLLM: Prefill and Decode Time
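
The listing shows a two-column layout: odd ids at x=0, even ids at x=12, and a new row every 8 grid units. A small sketch that rebuilds those positions and checks them for duplicate ids, overflow past Grafana's 24-column grid, and overlapping panels. `PanelBox` is a hypothetical stand-in for the real panel objects, and the 12x8 size for every panel is an assumption based on the `GridPos(12, 24, 12, 8)` shown for panel 8.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List

GRID_COLUMNS = 24  # Grafana's dashboard grid is 24 columns wide


@dataclass(frozen=True)
class PanelBox:
    """Hypothetical stand-in holding only the fields needed for layout checks."""
    id: int
    x: int
    y: int
    w: int = 12
    h: int = 8


def overlaps(a: PanelBox, b: PanelBox) -> bool:
    """True when the two rectangles share any interior area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def layout_problems(panels: List[PanelBox]) -> List[str]:
    problems = []
    ids = [p.id for p in panels]
    if len(ids) != len(set(ids)):
        problems.append("duplicate panel ids")
    for p in panels:
        if p.x + p.w > GRID_COLUMNS:
            problems.append(f"panel {p.id} exceeds grid width")
    for a, b in combinations(panels, 2):
        if overlaps(a, b):
            problems.append(f"panels {a.id} and {b.id} overlap")
    return problems


# Positions reconstructed from the output above (ids 1 through 12).
PANELS = [
    PanelBox(id=i, x=0 if i % 2 else 12, y=8 * ((i - 1) // 2))
    for i in range(1, 13)
]
print(layout_problems(PANELS))
```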

Dashboard JSON generation

Command:

# generate_data_llm_dashboard.py
# The relative path below assumes the script runs from the root of the Ray checkout.
import importlib.util
import sys

MODULE_NAME = "ray.dashboard.modules.metrics.dashboards.data_llm_dashboard_panels"
local_panels_path = "python/ray/dashboard/modules/metrics/dashboards/data_llm_dashboard_panels.py"

# Load the panels module from the local checkout and register it under its
# installed name, so later imports pick up the local copy rather than the installed one.
spec = importlib.util.spec_from_file_location(MODULE_NAME, local_panels_path)
local_panels_module = importlib.util.module_from_spec(spec)
sys.modules[MODULE_NAME] = local_panels_module
spec.loader.exec_module(local_panels_module)

from ray.dashboard.modules.metrics.grafana_dashboard_factory import _generate_grafana_dashboard

# Render the dashboard JSON from the locally loaded panel config.
config = local_panels_module.data_llm_dashboard_config
content, uid = _generate_grafana_dashboard(config)

# Write the JSON so it can be imported into Grafana.
with open("data_llm_dashboard.json", "w") as f:
    f.write(content)

Live Grafana validation

Imported the generated JSON into Grafana on an Anyscale cluster running a Ray Data LLM workload
Confirmed all 12 data LLM panels render correctly with live data

@anindya-saha

Add Prometheus metrics export for Ray Data LLM batch inference. When enabled, vLLM engine metrics (TTFT, TPOT, prefix cache hit rate, KV cache utilization, etc.) are exported to Ray's metrics endpoint.

- Add `log_engine_metrics` config option (default=True)
- Integrate vLLM's RayPrometheusStatLogger
- Add Data LLM Grafana dashboard

Addresses ray-project#58360. Thanks @anindya-saha for the approach.

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>

kouroshHakha · 2026-02-10T19:04:06Z

Aren't the Data and Serve LLM dashboards supposed to be identical (at least on the vLLM panels)? I am afraid that if we want to add a new vLLM metric (e.g. NIXL transfer metrics), we will have to do it in two places and the two will start to diverge.

I am fine with renaming the current serve_llm dashboard to something like "LLM dashboard" and then having some sort of separation inside the dashboard to distinguish between engine metrics and orchestrator metrics.

For Serve, an orchestration metric is something like Ray Serve QPS, while for Ray Data we might be interested in something else.

If there are fields that differ (e.g. Replica ID), the question is how to maximally share the engine panels between Serve and Data.

jeffreywang88 · 2026-02-11

Refactored the dashboard in the latest revision.

jeffreywang88 · 2026-02-11T02:03:55Z

Renamed the Serve LLM dashboard to LLM dashboard. vLLM engine metrics from both Data LLM and Serve LLM workloads now show up in the same panels. The panel contents are the same as the existing vLLM panels in the Serve LLM dashboard.

Serve LLM workloads additionally show serve orchestrator metrics under the Serve Orchestrator Metrics section. The panel contents are the same as those in the existing Serve LLM dashboard.

QPS per vLLM worker remains filterable by deployment
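
One way to picture the unified layout: the engine panels are defined once and shared, and the Serve-only panels are appended in their own section. The sketch below is not the code in this PR. `PanelSpec`, `build_sections`, and the "vLLM Engine Metrics" section name are hypothetical; the panel titles and the Serve Orchestrator Metrics section name are taken from this thread.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PanelSpec:
    """Hypothetical minimal panel description: a title and the section it lives in."""
    title: str
    section: str


# Engine panels are defined once, so a new vLLM metric is added in one place.
ENGINE_PANELS = [
    PanelSpec("vLLM: Token Throughput", "vLLM Engine Metrics"),
    PanelSpec("vLLM: Queue Time", "vLLM Engine Metrics"),
]

# Panels that only make sense for Serve LLM workloads.
SERVE_ORCHESTRATOR_PANELS = [
    PanelSpec("QPS per vLLM worker", "Serve Orchestrator Metrics"),
]


def build_sections(panels: List[PanelSpec]) -> Dict[str, List[str]]:
    """Group panel titles by section, preserving definition order."""
    sections: Dict[str, List[str]] = {}
    for panel in panels:
        sections.setdefault(panel.section, []).append(panel.title)
    return sections


unified = build_sections(ENGINE_PANELS + SERVE_ORCHESTRATOR_PANELS)
for section, titles in unified.items():
    print(section, titles)
```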

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

kouroshHakha

LGTM

MengjinYan

Looks good from Core side!

Dashboard related changes will need observability team to take a look. cc: @alanwguo

…ay-project#60385) Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com> Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com> Co-authored-by: Jeffrey Wang <jeffreywang@anyscale.com> Signed-off-by: Adel Nour <ans9868@nyu.edu>

…ay-project#60385) Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com> Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com> Co-authored-by: Jeffrey Wang <jeffreywang@anyscale.com>

gemini-code-assist Bot reviewed Jan 21, 2026

nrghosh changed the title from ~~[Data] Add vLLM metrics export and Data LLM Grafana dashboard~~ to [Data][LLM] Add vLLM metrics export and Data LLM Grafana dashboard Jan 21, 2026

nrghosh marked this pull request as ready for review January 23, 2026 23:22

nrghosh requested review from a team as code owners January 23, 2026 23:22

cursor Bot reviewed Jan 23, 2026

Comment thread python/ray/dashboard/modules/metrics/dashboards/llm_grafana_dashboard_base.json Outdated

Comment thread python/ray/llm/_internal/batch/processor/vllm_engine_proc.py

ray-gardener Bot added the data, observability, and llm labels Jan 24, 2026

github-actions Bot added the stale label Feb 7, 2026

jeffreywang88 removed the stale label Feb 9, 2026

jeffreywang88 added the go label Feb 10, 2026

cursor Bot reviewed Feb 10, 2026

Comment thread python/ray/dashboard/modules/metrics/dashboards/data_llm_dashboard_panels.py Outdated

jeffreywang88 mentioned this pull request Feb 10, 2026

[data][llm] Profile the vLLM engine in data LLM release benchmarks #60935

Open

nrghosh and others added 5 commits February 10, 2026 10:18

Fix dashboard query syntax, add data llm under Anyscale managed in gr…

6aa7c37

…afana Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>

Fix data llm dashboard query + filter for data llm workloads - not se…

3ff10e1

…rve llm Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>

wip - improve Queue Time panel with percentile metrics

d390fba

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>

Fix tests

4718409

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

jeffreywang88 force-pushed the nrghosh/data-llm-metrics branch from bd33c5f to 4718409 February 10, 2026 18:31

kouroshHakha reviewed Feb 10, 2026

cursor Bot reviewed Feb 11, 2026

Comment thread python/ray/dashboard/modules/metrics/dashboards/llm_grafana_dashboard_base.json

jeffreywang88 force-pushed the nrghosh/data-llm-metrics branch from a043cf9 to a99f9c4 February 11, 2026 02:08

Unify serve and data LLM dashboard

44f53fa

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

jeffreywang88 force-pushed the nrghosh/data-llm-metrics branch from a99f9c4 to 44f53fa February 11, 2026 02:32

kouroshHakha approved these changes Feb 11, 2026

kouroshHakha enabled auto-merge (squash) February 11, 2026 17:41

MengjinYan approved these changes Feb 12, 2026

kouroshHakha merged commit 529d2f8 into ray-project:master Feb 12, 2026
7 checks passed

jeffreywang88 mentioned this pull request Mar 6, 2026

[llm] Separate data and serve LLM dashboards #61037

Merged

kouroshHakha merged 6 commits into ray-project:master from nrghosh:nrghosh/data-llm-metrics

5 participants
