[Data][LLM] Add vLLM metrics export and Data LLM Grafana dashboard#60385

Merged: kouroshHakha merged 6 commits into ray-project:master from nrghosh:nrghosh/data-llm-metrics on Feb 12, 2026

@nrghosh nrghosh commented Jan 21, 2026

Contributor

Summary

  • Add Prometheus metrics export for Ray Data LLM batch inference
  • Add log_engine_metrics config option to vLLMEngineProcessorConfig (default=True)
  • Integrate vLLM's RayPrometheusStatLogger for metrics export
  • Renamed the Serve LLM dashboard to LLM dashboard. vLLM engine metrics from both Data LLM and Serve LLM workloads now show up in the same panels, as in the screenshot below. The panel contents are the same as the existing vLLM panels under the Serve LLM dashboards.
Screenshot 2026-02-10 at 5 43 14 PM
  • For Serve LLM workloads specifically, additional serve orchestrator metrics show up under the Serve Orchestrator Metrics section. The panel contents are the same as the existing vLLM panels under the Serve LLM dashboards.
Screenshot 2026-02-10 at 5 43 28 PM
  • QPS per vLLM worker remains filterable by deployment.
Screenshot 2026-02-10 at 5 44 07 PM Screenshot 2026-02-10 at 5 44 23 PM

Addresses #58360. Thanks @anindya-saha for the approach.
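
A minimal sketch of how the new option might be set, assuming `log_engine_metrics` is accepted as a keyword argument of `vLLMEngineProcessorConfig` as the summary describes. The model name, concurrency, and batch size are placeholders and are not taken from this PR; `build_config` is a hypothetical helper name.

```python
# Keyword arguments for the processor config. Only `log_engine_metrics` comes
# from this PR; the remaining values are illustrative placeholders.
CONFIG_KWARGS = {
    "model_source": "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    "concurrency": 1,
    "batch_size": 64,
    "log_engine_metrics": True,  # default per this PR; set False to opt out
}


def build_config():
    """Build the config. Requires a Ray installation that provides ray.data.llm."""
    from ray.data.llm import vLLMEngineProcessorConfig

    return vLLMEngineProcessorConfig(**CONFIG_KWARGS)
```

The import is deferred into the helper so the snippet can be read or loaded on a machine without Ray installed.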

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces vLLM metrics exporting for Ray Data LLM batch inference and adds a corresponding Grafana dashboard for visualization. The changes are well-structured, adding a log_engine_metrics configuration option and integrating with vLLM's RayPrometheusStatLogger. The new dashboard provides valuable insights into vLLM engine performance. My review includes a couple of suggestions to enhance the new Grafana dashboard for better clarity and consistency.

Comment on lines +197 to +212
    Panel(
        id=8,
        title="vLLM: Queue Time",
        description="Time requests spend waiting in the queue before processing.",
        unit="s",
        targets=[
            Target(
                expr='sum by(model_name, WorkerId) (rate(ray_vllm_request_queue_time_seconds_sum{{model_name=~"$vllm_model_name", WorkerId=~"$workerid", {global_filters}}}[$interval]))',
                legend="{{model_name}} - {{WorkerId}}",
            ),
        ],
        fill=1,
        linewidth=2,
        stack=False,
        grid_pos=GridPos(12, 24, 12, 8),
    ),

Contributor

medium

The "vLLM: Queue Time" panel currently displays the rate of the sum of queue times (rate(sum)), which can be an unintuitive metric. The panel's description, "Time requests spend waiting in the queue before processing," suggests that a per-request latency metric like average or percentile queue time would be more appropriate and easier to interpret.
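
A small numeric sketch of why the rate of the sum is hard to read on its own: it measures queue-seconds accumulated per wall-clock second, so it scales with traffic even when each request waits the same amount of time. Dividing by the rate of the count recovers the per-request mean. The counter samples below are made up for illustration, and `rate` here is a simplified stand-in that ignores counter resets.

```python
def rate(start: float, end: float, window_s: float) -> float:
    """Per-second increase of a counter over a window (no reset handling)."""
    return (end - start) / window_s


def summarize(sum_samples, count_samples, window_s=60.0):
    """Return (rate of _sum, mean seconds per request) for one window."""
    sum_rate = rate(*sum_samples, window_s)
    count_rate = rate(*count_samples, window_s)
    return sum_rate, sum_rate / count_rate


# Hypothetical (start, end) samples of the _sum and _count series.
low_traffic = summarize((0.0, 30.0), (0.0, 60.0))     # 60 requests, 0.5 s each
high_traffic = summarize((0.0, 300.0), (0.0, 600.0))  # 600 requests, 0.5 s each

print(low_traffic, high_traffic)  # (0.5, 0.5) (5.0, 0.5)
```

The rate of the sum differs tenfold between the two cases while the mean queue time is identical, which is the ambiguity the comment points out.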

Since ray_vllm_request_queue_time_seconds is a histogram, you can create a more informative panel that is consistent with the other latency panels in this dashboard (e.g., TTFT, E2E Latency) by showing P50, P90, P95, P99, and Mean queue times. This would provide a clearer and more comprehensive view of queueing performance.

    Panel(
        id=8,
        title="vLLM: Queue Time",
        description="P50, P90, P95, P99, and Mean time requests spend waiting in the queue before processing.",
        unit="s",
        targets=[
            Target(
                expr='histogram_quantile(0.99, sum by(le, model_name, WorkerId) (rate(ray_vllm_request_queue_time_seconds_bucket{{model_name=~\"$vllm_model_name\", WorkerId=~\"$workerid\", {global_filters}}}[$interval])))',
                legend="P99 - {{model_name}} - {{WorkerId}}",
            ),
            Target(
                expr='histogram_quantile(0.95, sum by(le, model_name, WorkerId) (rate(ray_vllm_request_queue_time_seconds_bucket{{model_name=~\"$vllm_model_name\", WorkerId=~\"$workerid\", {global_filters}}}[$interval])))',
                legend="P95 - {{model_name}} - {{WorkerId}}",
            ),
            Target(
                expr='histogram_quantile(0.90, sum by(le, model_name, WorkerId) (rate(ray_vllm_request_queue_time_seconds_bucket{{model_name=~\"$vllm_model_name\", WorkerId=~\"$workerid\", {global_filters}}}[$interval])))',
                legend="P90 - {{model_name}} - {{WorkerId}}",
            ),
            Target(
                expr='histogram_quantile(0.50, sum by(le, model_name, WorkerId) (rate(ray_vllm_request_queue_time_seconds_bucket{{model_name=~\"$vllm_model_name\", WorkerId=~\"$workerid\", {global_filters}}}[$interval])))',
                legend="P50 - {{model_name}} - {{WorkerId}}",
            ),
            Target(
                expr='(sum by(model_name, WorkerId) (rate(ray_vllm_request_queue_time_seconds_sum{{model_name=~\"$vllm_model_name\", WorkerId=~\"$workerid\", {global_filters}}}[$interval]))) / (sum by(model_name, WorkerId) (rate(ray_vllm_request_queue_time_seconds_count{{model_name=~\"$vllm_model_name\", WorkerId=~\"$workerid\", {global_filters}}}[$interval])))',
                legend="Mean - {{model_name}} - {{WorkerId}}",
            ),
        ],
        fill=1,
        linewidth=2,
        stack=False,
        grid_pos=GridPos(12, 24, 12, 8),
    ),
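
For readers unfamiliar with how `histogram_quantile` arrives at the percentiles suggested above, the following is a simplified Python sketch of quantile estimation over cumulative buckets using linear interpolation. It is not Prometheus's implementation: the bucket bounds and counts are made-up sample data, the lowest bucket is assumed to start at 0, and edge cases such as counter resets or a missing +Inf bucket are ignored.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper) or count == prev_count:
                # Overflow bucket (or empty span): report the last finite bound.
                return prev_bound
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return prev_bound


# Hypothetical queue-time buckets in seconds: (le, cumulative request count).
sample_buckets = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
p50 = histogram_quantile(0.50, sample_buckets)
p75 = histogram_quantile(0.75, sample_buckets)
```

Because the estimate interpolates within a bucket, its accuracy depends on how finely the bucket boundaries cover the range of observed queue times.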
Comment on lines +127 to +131
    "current": {
        "selected": true,
        "text": "5m",
        "value": "5m"
    }

Contributor

medium

There is an inconsistency in the default value for the interval template variable. The options array has 30s marked as "selected": true, but the current value is set to 5m. While Grafana prioritizes the current value for initialization, this discrepancy can be confusing.

To ensure consistency and set a more common default for near-real-time monitoring, I suggest updating the current value to 30s to match the selected option.

Suggested change
-   "current": {
-       "selected": true,
-       "text": "5m",
-       "value": "5m"
-   }
+   "current": {
+       "selected": true,
+       "text": "30s",
+       "value": "30s"
+   }
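
This kind of mismatch can be caught mechanically. The sketch below scans a dashboard dictionary for template variables whose `current` value disagrees with the option flagged as selected. The `templating.list`, `options`, and `current` keys follow Grafana's dashboard JSON layout; the function name and the trimmed-down sample dashboard are hypothetical.

```python
from typing import Dict, List


def find_default_mismatches(dashboard: Dict) -> List[str]:
    """Return names of template variables whose current value is not selected."""
    mismatched = []
    for var in dashboard.get("templating", {}).get("list", []):
        selected = [
            opt.get("value") for opt in var.get("options", []) if opt.get("selected")
        ]
        current = var.get("current", {}).get("value")
        if selected and current not in selected:
            mismatched.append(var.get("name", "<unnamed>"))
    return mismatched


# Trimmed-down sample reproducing the inconsistency described in the review.
sample_dashboard = {
    "templating": {
        "list": [
            {
                "name": "interval",
                "current": {"selected": True, "text": "5m", "value": "5m"},
                "options": [
                    {"selected": True, "text": "30s", "value": "30s"},
                    {"selected": False, "text": "5m", "value": "5m"},
                ],
            }
        ]
    }
}

mismatches = find_default_mismatches(sample_dashboard)
```

Running a check like this against the base JSON in a unit test would flag the discrepancy whichever value is chosen as the default.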
@nrghosh nrghosh changed the title [Data] Add vLLM metrics export and Data LLM Grafana dashboard Jan 21, 2026
@anindya-saha

Thank you so much @nrghosh for driving this.

@nrghosh nrghosh marked this pull request as ready for review January 23, 2026 23:22
@nrghosh nrghosh requested review from a team as code owners January 23, 2026 23:22
Comment thread python/ray/dashboard/modules/metrics/dashboards/llm_grafana_dashboard_base.json Outdated
Comment thread python/ray/llm/_internal/batch/processor/vllm_engine_proc.py
@ray-gardener ray-gardener Bot added the data (Ray Data-related issues), observability (Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling), and llm labels Jan 24, 2026

github-actions Bot commented Feb 7, 2026

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public Slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions Bot added the stale label (The issue is stale. It will be closed within 7 days unless there are further conversation) Feb 7, 2026
@jeffreywang88 jeffreywang88 removed the stale label (The issue is stale. It will be closed within 7 days unless there are further conversation) Feb 9, 2026

jeffreywang88 commented Feb 9, 2026

Contributor

Validation steps

Panel definition validation

Command:

cd ray && python -B -c "
from ray.dashboard.modules.metrics.dashboards.data_llm_dashboard_panels import DATA_LLM_GRAFANA_PANELS
print(f'Total panels: {len(DATA_LLM_GRAFANA_PANELS)}')
for p in DATA_LLM_GRAFANA_PANELS:
    print(f'  id={p.id:3d}  y={p.grid_pos.y:3d}  x={p.grid_pos.x:2d}  {p.title}')
"

Output:

Total panels: 12
  id=  1  y=  0  x= 0  vLLM: Token Throughput
  id=  2  y=  0  x=12  vLLM: Time Per Output Token Latency
  id=  3  y=  8  x= 0  vLLM: Cache Utilization
  id=  4  y=  8  x=12  vLLM: Prefix Cache Hit Rate
  id=  5  y= 16  x= 0  vLLM: Time To First Token Latency
  id=  6  y= 16  x=12  vLLM: E2E Request Latency
  id=  7  y= 24  x= 0  vLLM: Scheduler State
  id=  8  y= 24  x=12  vLLM: Queue Time
  id=  9  y= 32  x= 0  vLLM: Prompt Length
  id= 10  y= 32  x=12  vLLM: Generation Length
  id= 11  y= 40  x= 0  vLLM: Finish Reason
  id= 12  y= 40  x=12  vLLM: Prefill and Decode Time
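
The listing above can also be checked programmatically. The sketch below verifies that panel ids are unique and that no two panels overlap on Grafana's 24-column grid. The `Rect` class and `check_layout` function are hypothetical helpers, not part of Ray; the 12x8 panel size is assumed from `GridPos(12, 24, 12, 8)` in the panel definition, and the positions reproduce the printed output.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List


@dataclass(frozen=True)
class Rect:
    panel_id: int
    x: int
    y: int
    w: int = 12  # assumed panel width
    h: int = 8   # assumed panel height


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share any grid cell."""
    return (
        a.x < b.x + b.w and b.x < a.x + a.w
        and a.y < b.y + b.h and b.y < a.y + a.h
    )


def check_layout(rects: List[Rect], grid_width: int = 24) -> List[str]:
    """Return a list of human-readable layout problems (empty if none)."""
    problems = []
    ids = [r.panel_id for r in rects]
    if len(ids) != len(set(ids)):
        problems.append("duplicate panel ids")
    for r in rects:
        if r.x < 0 or r.x + r.w > grid_width:
            problems.append(f"panel {r.panel_id} exceeds the grid width")
    for a, b in combinations(rects, 2):
        if overlaps(a, b):
            problems.append(f"panels {a.panel_id} and {b.panel_id} overlap")
    return problems


# Two columns (x=0, x=12) and six rows (y=0..40), matching the output above.
rects = [Rect(panel_id=i + 1, x=(i % 2) * 12, y=(i // 2) * 8) for i in range(12)]
problems = check_layout(rects)
```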

Dashboard JSON generation

Command:

# generate_data_llm_dashboard.py
import importlib.util
import sys

# Load the panel definitions from the local checkout instead of the installed ray package.
module_name = "ray.dashboard.modules.metrics.dashboards.data_llm_dashboard_panels"
local_panels_path = "python/ray/dashboard/modules/metrics/dashboards/data_llm_dashboard_panels.py"
spec = importlib.util.spec_from_file_location(module_name, local_panels_path)
local_panels_module = importlib.util.module_from_spec(spec)
# Register the module so later imports of this name resolve to the local copy.
sys.modules[module_name] = local_panels_module
spec.loader.exec_module(local_panels_module)

from ray.dashboard.modules.metrics.grafana_dashboard_factory import _generate_grafana_dashboard

# Render the dashboard config to Grafana JSON and write it to disk.
config = local_panels_module.data_llm_dashboard_config
content, uid = _generate_grafana_dashboard(config)

with open("data_llm_dashboard.json", "w") as f:
    f.write(content)

Live Grafana validation

  • Imported the generated JSON into Grafana on an Anyscale cluster running a Ray Data LLM workload
  • Confirmed all 12 data LLM panels render correctly with live data
Screenshot 2026-02-09 at 8 49 03 PM
@jeffreywang88 jeffreywang88 added the go label (add ONLY when ready to merge, run all tests) Feb 10, 2026
Comment thread python/ray/dashboard/modules/metrics/dashboards/data_llm_dashboard_panels.py Outdated
nrghosh and others added 5 commits February 10, 2026 10:18
Add Prometheus metrics export for Ray Data LLM batch inference.
When enabled, vLLM engine metrics (TTFT, TPOT, prefix cache hit rate,
KV cache utilization, etc.) are exported to Ray's metrics endpoint.

- Add `log_engine_metrics` config option (default=True)
- Integrate vLLM's RayPrometheusStatLogger
- Add Data LLM Grafana dashboard

Addresses ray-project#58360. Thanks @anindya-saha for the approach.

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
…afana

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
…rve llm

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@jeffreywang88 jeffreywang88 force-pushed the nrghosh/data-llm-metrics branch from bd33c5f to 4718409 Compare February 10, 2026 18:31

Contributor

Aren't the Data and Serve LLM dashboards supposed to be identical, at least for the vLLM panels? I am afraid that if we want to add a new vLLM metric (e.g. NIXL transfer metrics), we would have to do it in two places and the two dashboards would start to diverge.

I am fine with renaming the current serve_llm dashboard to something like LLM dashboard and then having some sort of separation inside the dashboard to distinguish between engine metrics and orchestrator metrics.

For Serve, an orchestrator metric is something like Ray Serve QPS, while for Ray Data we might be interested in something else.

Contributor

If some fields differ (e.g. Replica ID), the question is how we maximally share the engine panels between Serve and Data.

Contributor

Refactored the dashboard in the latest revision.
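
One way to picture the sharing requested in this thread is to define the engine panels once and compose dashboard sections from them. This is a sketch of the idea only, not the refactor that was merged: `PanelSpec`, `build_sections`, the "vLLM Engine Metrics" section name, and the placeholder query strings are all hypothetical, while the queue time metric name and the "Serve Orchestrator Metrics" section name come from this PR.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PanelSpec:
    title: str
    expr: str


# Engine panels are declared once and reused by every workload type.
ENGINE_PANELS = [
    PanelSpec("vLLM: Queue Time", "ray_vllm_request_queue_time_seconds_sum"),
    PanelSpec("vLLM: Token Throughput", "<engine throughput query>"),
]

# Orchestrator panels apply only to Serve LLM workloads.
SERVE_ORCHESTRATOR_PANELS = [
    PanelSpec("QPS per vLLM worker", "<serve qps query>"),
]


def build_sections(include_serve: bool) -> Dict[str, List[PanelSpec]]:
    """Group panels into named sections, always including the shared engine panels."""
    sections = {"vLLM Engine Metrics": list(ENGINE_PANELS)}
    if include_serve:
        sections["Serve Orchestrator Metrics"] = list(SERVE_ORCHESTRATOR_PANELS)
    return sections


data_sections = build_sections(include_serve=False)
serve_sections = build_sections(include_serve=True)
```

With this shape, adding a new engine metric means appending to one list, which addresses the concern about the two dashboards diverging.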

Comment thread python/ray/llm/_internal/batch/stages/vllm_engine_stage.py Outdated
Comment thread python/ray/llm/_internal/batch/stages/vllm_engine_stage.py Outdated

jeffreywang88 commented Feb 11, 2026

Contributor

  • Renamed the Serve LLM dashboard to LLM dashboard. vLLM engine metrics from both Data LLM and Serve LLM workloads now show up in the same panels, as in the screenshot below. The panel contents are the same as the existing vLLM panels under the Serve LLM dashboards.
Screenshot 2026-02-10 at 5 43 14 PM
  • For Serve LLM workloads specifically, additional serve orchestrator metrics show up under the Serve Orchestrator Metrics section. The panel contents are the same as the existing vLLM panels under the Serve LLM dashboards.
Screenshot 2026-02-10 at 5 43 28 PM
  • QPS per vLLM worker remains filterable by deployment.
Screenshot 2026-02-10 at 5 44 07 PM Screenshot 2026-02-10 at 5 44 23 PM

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@jeffreywang88 jeffreywang88 force-pushed the nrghosh/data-llm-metrics branch from a043cf9 to a99f9c4 Compare February 11, 2026 02:08
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@jeffreywang88 jeffreywang88 force-pushed the nrghosh/data-llm-metrics branch from a99f9c4 to 44f53fa Compare February 11, 2026 02:32

@kouroshHakha kouroshHakha left a comment

Contributor

LGTM

@kouroshHakha kouroshHakha enabled auto-merge (squash) February 11, 2026 17:41

@MengjinYan MengjinYan left a comment

Contributor

Looks good from Core side!

Dashboard-related changes will need the observability team to take a look. cc: @alanwguo

@kouroshHakha kouroshHakha merged commit 529d2f8 into ray-project:master Feb 12, 2026
7 checks passed
ans9868 pushed a commit to ans9868/ray that referenced this pull request Feb 18, 2026
…ay-project#60385)

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Co-authored-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Adel Nour <ans9868@nyu.edu>
Aydin-ab pushed a commit to kunling-anyscale/ray that referenced this pull request Feb 20, 2026
…ay-project#60385)

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Co-authored-by: Jeffrey Wang <jeffreywang@anyscale.com>
Labels

data (Ray Data-related issues)
go (add ONLY when ready to merge, run all tests)
llm
observability (Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling)

5 participants