docs: update server metrics reference for Dynamo/vLLM/SGLang/TRT-LLM/Triton by ajcasagrande · Pull Request #974 · ai-dynamo/aiperf

ajcasagrande · 2026-05-21T20:00:33Z

Summary

Updates docs/server-metrics/ using a source-grounded audit of upstream server metric definitions rather than relying on grep-only matches.

Exact steps taken

1. Cloned/reused upstream source checkouts and audited the actual metric-definition/exporter source files, not just grep hits:
   - vLLM: https://github.com/vllm-project/vllm.git
   - SGLang: https://github.com/sgl-project/sglang.git
   - TensorRT-LLM: https://github.com/NVIDIA/TensorRT-LLM.git
   - Dynamo: https://github.com/ai-dynamo/dynamo.git
   - Triton Inference Server: https://github.com/triton-inference-server/server.git
   - Triton core metric definitions: https://github.com/triton-inference-server/core.git
2. Compared upstream metric families and labels against:
   - docs/server-metrics/server-metrics.md
   - docs/server-metrics/server-metrics-reference.md
   - docs/server-metrics/server-metrics-json-schema.md
   - docs/server-metrics/server-metrics-parquet-schema.md
3. Updated docs for source-confirmed gaps, including:
   - Correct TensorRT-LLM trtllm_ metric names
   - Dynamo-added TRTLLM metrics using the trtllm_ prefix
   - Dynamo frontend/component/router/runtime/KVBM metrics
   - Dynamo embedding-cache metrics
   - Dynamo KV publisher metrics
   - Dynamo Tokio/event-loop metrics
   - SGLang cached-token, grammar, routing-key, prefill-delayer, EPLB, storage, and eviction/load-back metrics
   - Triton optional labels and response-cache caveats
   - TensorRT-LLM Triton backend nv_trt_llm_* / nv_llm_* families
   - Expanded Parquet dynamic label examples
4. Verified that AIPerf’s Prometheus parser stores counter families without the sample-level trailing _total suffix by running a local parser check with prometheus_client.parser.text_string_to_metric_families.
5. Ran validation:
   - git diff --check -- docs/server-metrics/server-metrics.md docs/server-metrics/server-metrics-reference.md docs/server-metrics/server-metrics-json-schema.md docs/server-metrics/server-metrics-parquet-schema.md
   - uv run python tools/check_docs_index.py
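
The parser check described in the steps above can be approximated with a short script. This is a minimal sketch, not the check that was actually run: the metric name `demo_requests_total` and the helper `family_and_sample_names` are hypothetical, and it only shows how `prometheus_client` names counter families, not how AIPerf stores them afterwards.

```python
from prometheus_client.parser import text_string_to_metric_families

# Hypothetical exposition text; `demo_requests_total` is not a real server metric.
EXPOSITION = """\
# HELP demo_requests_total Requests handled, illustrative only.
# TYPE demo_requests_total counter
demo_requests_total{model="demo"} 3
"""


def family_and_sample_names(text):
    """Return (family name, sample names) for each parsed metric family."""
    return [
        (family.name, [sample.name for sample in family.samples])
        for family in text_string_to_metric_families(text)
    ]


if __name__ == "__main__":
    for family_name, sample_names in family_and_sample_names(EXPOSITION):
        print(family_name, sample_names)
```

With `prometheus_client` installed, the family name should come back without the `_total` suffix while the sample keeps it, which is the distinction the docs rely on when listing counter names.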

Trust / limitations

These docs are more trustworthy than a grep-only update because the audit read upstream metric definition/exporter files and checked AIPerf parser behavior.
This is still a static source audit, not a live scrape from every backend configuration.
Optional metrics may only appear when their upstream feature is enabled.
Some upstream constants were intentionally excluded when no registration/emission path was found.
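
Because optional metrics only appear when their upstream feature is enabled, a static audit can be complemented by checking a live scrape for the families a given deployment is expected to expose. The following is a minimal sketch under stated assumptions: the scraped text has already been fetched, families are declared with `# TYPE` lines, and the helper names and `demo_*` metric names are hypothetical rather than taken from any backend.

```python
import re

# Matches `# TYPE <name> <type>` lines in the Prometheus text exposition format.
TYPE_LINE = re.compile(r"^#\s*TYPE\s+(\S+)\s+(\S+)\s*$")


def declared_families(exposition_text):
    """Map each family name declared via a `# TYPE` line to its metric type."""
    families = {}
    for line in exposition_text.splitlines():
        match = TYPE_LINE.match(line.strip())
        if match:
            families[match.group(1)] = match.group(2)
    return families


def missing_families(exposition_text, expected):
    """Return the expected family names the scrape does not declare, sorted."""
    return sorted(set(expected) - set(declared_families(exposition_text)))


# Hypothetical scrape: `demo_optional_cache_hits` is absent because its feature is off.
SCRAPE = """\
# TYPE demo_queue_depth gauge
demo_queue_depth 4
# TYPE demo_request_seconds histogram
demo_request_seconds_count 2
"""

if __name__ == "__main__":
    expected = ["demo_queue_depth", "demo_optional_cache_hits"]
    print(missing_families(SCRAPE, expected))
```

A family reported as missing is not necessarily a documentation error; it may simply mean the corresponding feature is disabled in that configuration. Samples exposed without a `# TYPE` line are not detected by this sketch.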

🤖 Generated with Claude Code

Summary by CodeRabbit

Documentation
- Expanded server metrics documentation with additional metrics supported across vLLM, Dynamo, SGLang, TRT-LLM, and Triton backends.
- Enhanced dynamic label columns schema for parquet export.
- Improved metrics setup instructions and troubleshooting guidance for TRT-LLM and Triton.

Refresh server metrics documentation against upstream metric definitions so backend-specific names, labels, and optional families match current source behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

github-actions · 2026-05-21T20:00:45Z

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@8967a7d8fd55bc044ce50f1c6543104ed960dd92

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@8967a7d8fd55bc044ce50f1c6543104ed960dd92

Last updated for commit: 8967a7d

github-actions · 2026-05-21T20:01:32Z

Fern Docs Preview: https://nvidia-preview-470eac5d-efae-4a33-ae88-72aff8155886.docs.buildwithfern.com/aiperf/dev

coderabbitai · 2026-05-21T20:03:58Z

Warning

Rate limit exceeded

@ajcasagrande has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 19 minutes and 31 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e647f268-160b-435b-8704-b37118d2ca8b

📥 Commits

Reviewing files that changed from the base of the PR and between 1fd1ef7 and 8967a7d.

📒 Files selected for processing (1)

docs/server-metrics/server-metrics-reference.md

Walkthrough

This PR expands server metrics documentation across three files to add Triton backend coverage and broaden metric definitions for all supported backends (vLLM, Dynamo, SGLang, TRT-LLM, Triton). Schema files now reference Triton; quick reference sections list additional metric families per backend; common metrics tables are significantly expanded with engine, cache, state, and backend-specific metrics; and troubleshooting guidance is updated for Triton endpoints.

Changes

Server Metrics Documentation Expansion

| Layer / File(s) | Summary |
| --- | --- |
| Schema and documentation structure updates: `docs/server-metrics/server-metrics-json-schema.md`, `docs/server-metrics/server-metrics-parquet-schema.md`, `docs/server-metrics/server-metrics.md` | JSON and Parquet schema files updated to document Triton backend and expand Dynamic Label Columns table with additional Prometheus-derived fields. Main documentation opening sentence extended to mention serving frontends. Server Metrics Reference link added to related documentation. |
| Backend quick reference metric families: `docs/server-metrics/server-metrics.md` | Quick reference sections expanded for vLLM (queue and token breakdown families), Dynamo (frontend and component cache metrics), SGLang (TTFT/ITL/e2e/queue and token metrics), TRT-LLM (setup clarification for `return_perf_metrics` and `enable_iter_perf_stats`), and Triton (default Prometheus port `8002` and histogram configuration guidance). |
| Common metrics by server detailed tables: `docs/server-metrics/server-metrics.md` | Common metrics tables significantly expanded with vLLM engine/cache/state and request phase breakdowns, Dynamo GPU cache and KV-publisher metrics, SGLang token and latency metrics, TRT-LLM scheduling and Dynamo-TRTLLM KV-transfer metrics, and Triton inference and cache metrics. |
| Supporting guidance updates: `docs/server-metrics/server-metrics.md` | Prometheus Summary metrics warning clarified to specify AIPerf ignores rare optional Summary families. Troubleshooting row for endpoint unreachability updated to include Triton's metrics URL and port in curl examples. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Metrics bloom like clover in a field so wide,
Triton joins the backends, standing side by side,
vLLM, Dynamo, SGLang in the light,
TRT-LLM and Triton—documentation shines so bright!
Tables expand with cache and queue in sight,
A garden of metrics, documented just right! 📊✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The PR title accurately and concisely summarizes the main change: updating server metrics documentation for five backends (Dynamo, vLLM, SGLang, TRT-LLM, Triton). |

codecov · 2026-05-21T20:19:43Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

Move verbose histogram bucket lists out of wide metric tables so the reference is easier to scan without dropping the bucket details.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

ajcasagrande changed the title ~~Update server metrics reference~~ May 21, 2026

github-actions Bot added the docs label May 21, 2026

ajcasagrande changed the title ~~docs: update server metrics reference~~ May 21, 2026

dynamo-ops reviewed May 21, 2026

Comment thread docs/server-metrics/server-metrics-reference.md Outdated

Comment thread docs/server-metrics/server-metrics-reference.md Outdated

FrankD412 approved these changes May 21, 2026

ajcasagrande added 2 commits May 21, 2026 22:18

remove sglang:eplb_balancedness

8a89155

Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

Merge branch 'main' into ajc/server-metrics-docs-update

8967a7d

ajcasagrande enabled auto-merge (squash) May 22, 2026 05:19

ajcasagrande merged commit f14a7c0 into main May 22, 2026
30 checks passed

ajcasagrande deleted the ajc/server-metrics-docs-update branch May 22, 2026 05:19

ajcasagrande merged 4 commits into main from ajc/server-metrics-docs-update


3 participants
