docs: update server metrics reference for Dynamo/vLLM/SGLang/TRT-LLM/Triton#974

Merged
ajcasagrande merged 4 commits into main from ajc/server-metrics-docs-update on May 22, 2026

@ajcasagrande ajcasagrande commented May 21, 2026

Summary

Updates docs/server-metrics/ using a source-grounded audit of upstream server metric definitions rather than relying on grep-only matches.

Exact steps taken

  1. Cloned/reused upstream source checkouts and audited the actual metric-definition/exporter source files, not just grep hits:

  2. Compared upstream metric families and labels against:

    • docs/server-metrics/server-metrics.md
    • docs/server-metrics/server-metrics-reference.md
    • docs/server-metrics/server-metrics-json-schema.md
    • docs/server-metrics/server-metrics-parquet-schema.md
  3. Updated docs for source-confirmed gaps, including:

    • Correct TensorRT-LLM trtllm_ metric names
    • Dynamo-added TRTLLM metrics using the trtllm_ prefix
    • Dynamo frontend/component/router/runtime/KVBM metrics
    • Dynamo embedding-cache metrics
    • Dynamo KV publisher metrics
    • Dynamo Tokio/event-loop metrics
    • SGLang cached-token, grammar, routing-key, prefill-delayer, EPLB, storage, and eviction/load-back metrics
    • Triton optional labels and response-cache caveats
    • TensorRT-LLM Triton backend nv_trt_llm_* / nv_llm_* families
    • Expanded Parquet dynamic label examples
  4. Verified that AIPerf’s Prometheus parser stores counter families without the sample-level trailing _total suffix by running a local parser check with prometheus_client.parser.text_string_to_metric_families.

  5. Ran validation:

    • git diff --check -- docs/server-metrics/server-metrics.md docs/server-metrics/server-metrics-reference.md docs/server-metrics/server-metrics-json-schema.md docs/server-metrics/server-metrics-parquet-schema.md
    • uv run python tools/check_docs_index.py
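
The parser check from step 4 can be reproduced with a minimal sketch like the one below. It assumes `prometheus_client` is installed, and `demo_requests_total` is a hypothetical metric name chosen for illustration, not one taken from the audited backends. It only shows how the upstream text parser names counter families and samples; it does not exercise AIPerf's own code.

```python
from prometheus_client.parser import text_string_to_metric_families

# Hypothetical exposition text; demo_requests_total is an illustrative name,
# not a metric exported by any of the audited backends.
EXPOSITION = """\
# HELP demo_requests_total Total requests handled.
# TYPE demo_requests_total counter
demo_requests_total{model="m"} 7.0
"""


def family_and_sample_names(text):
    """Return (family name, [sample names]) for each parsed metric family."""
    return [
        (family.name, [sample.name for sample in family.samples])
        for family in text_string_to_metric_families(text)
    ]


names = family_and_sample_names(EXPOSITION)
# The family name drops the trailing _total; the sample name keeps it.
print(names)
```

This is why the docs list counter families without the `_total` suffix even though the raw scrape output shows it on each sample line.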

Trust / limitations

  • These docs are more trustworthy than a grep-only update because the audit read upstream metric definition/exporter files and checked AIPerf parser behavior.
  • This is still a static source audit, not a live scrape from every backend configuration.
  • Optional metrics may only appear when their upstream feature is enabled.
  • Some upstream constants were intentionally excluded when no registration/emission path was found.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation
    • Expanded server metrics documentation with additional metrics supported across vLLM, Dynamo, SGLang, TRT-LLM, and Triton backends.
    • Enhanced dynamic label columns schema for parquet export.
    • Improved metrics setup instructions and troubleshooting guidance for TRT-LLM and Triton.

Review Change Stack

Refresh server metrics documentation against upstream metric definitions so backend-specific names, labels, and optional families match current source behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

github-actions Bot commented May 21, 2026

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@8967a7d8fd55bc044ce50f1c6543104ed960dd92

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@8967a7d8fd55bc044ce50f1c6543104ed960dd92

Last updated for commit: 8967a7d

@ajcasagrande ajcasagrande changed the title Update server metrics reference May 21, 2026
@github-actions github-actions Bot added the docs label May 21, 2026
@ajcasagrande ajcasagrande changed the title docs: update server metrics reference May 21, 2026

coderabbitai Bot commented May 21, 2026

Warning

Rate limit exceeded

@ajcasagrande has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 19 minutes and 31 seconds before requesting another review.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e647f268-160b-435b-8704-b37118d2ca8b

📥 Commits

Reviewing files that changed from the base of the PR and between 1fd1ef7 and 8967a7d.

📒 Files selected for processing (1)
  • docs/server-metrics/server-metrics-reference.md

Walkthrough

This PR expands server metrics documentation across three files to add Triton backend coverage and broaden metric definitions for all supported backends (vLLM, Dynamo, SGLang, TRT-LLM, Triton). Schema files now reference Triton; quick reference sections list additional metric families per backend; common metrics tables are significantly expanded with engine, cache, state, and backend-specific metrics; and troubleshooting guidance is updated for Triton endpoints.

Changes

Server Metrics Documentation Expansion

| Layer / File(s) | Summary |
|---|---|
| Schema and documentation structure updates (docs/server-metrics/server-metrics-json-schema.md, docs/server-metrics/server-metrics-parquet-schema.md, docs/server-metrics/server-metrics.md) | JSON and Parquet schema files updated to document the Triton backend and expand the Dynamic Label Columns table with additional Prometheus-derived fields. Main documentation opening sentence extended to mention serving frontends. Server Metrics Reference link added to related documentation. |
| Backend quick reference metric families (docs/server-metrics/server-metrics.md) | Quick reference sections expanded for vLLM (queue and token breakdown families), Dynamo (frontend and component cache metrics), SGLang (TTFT/ITL/e2e/queue and token metrics), TRT-LLM (setup clarification for return_perf_metrics and enable_iter_perf_stats), and Triton (default Prometheus port 8002 and histogram configuration guidance). |
| Common metrics by server detailed tables (docs/server-metrics/server-metrics.md) | Common metrics tables significantly expanded with vLLM engine/cache/state and request phase breakdowns, Dynamo GPU cache and KV-publisher metrics, SGLang token and latency metrics, TRT-LLM scheduling and Dynamo-TRTLLM KV-transfer metrics, and Triton inference and cache metrics. |
| Supporting guidance updates (docs/server-metrics/server-metrics.md) | Prometheus Summary metrics warning clarified to specify that AIPerf ignores rare optional Summary families. Troubleshooting row for endpoint unreachability updated to include Triton's metrics URL and port in curl examples. |
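
The endpoint-unreachability troubleshooting mentioned above can also be scripted. The following is a minimal sketch using only the Python standard library; the helper names `metrics_url` and `is_reachable` are hypothetical, and the default port 8002 with the `/metrics` path reflects Triton's default metrics endpoint, which may differ in a customized deployment.

```python
import urllib.error
import urllib.request


def metrics_url(host, port=8002, path="/metrics"):
    """Build a metrics URL; 8002 is assumed as Triton's default metrics port."""
    return f"http://{host}:{port}{path}"


def is_reachable(url, timeout=2.0):
    """Return True if the URL answers with HTTP 200, False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, timeouts, HTTP errors, and malformed URLs.
        return False


if __name__ == "__main__":
    url = metrics_url("localhost")
    print(url, "reachable" if is_reachable(url) else "unreachable")
```

Returning a plain boolean keeps the probe easy to drop into a pre-benchmark check; a fuller diagnostic would likely want to surface the underlying error instead of swallowing it.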

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Metrics bloom like clover in a field so wide,
Triton joins the backends, standing side by side,
vLLM, Dynamo, SGLang in the light,
TRT-LLM and Triton—documentation shines so bright!
Tables expand with cache and queue in sight,
A garden of metrics, documented just right! 📊✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The PR title accurately and concisely summarizes the main change: updating server metrics documentation for five backends (Dynamo, vLLM, SGLang, TRT-LLM, Triton). |


codecov Bot commented May 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Move verbose histogram bucket lists out of wide metric tables so the reference is easier to scan without dropping the bucket details.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
@ajcasagrande ajcasagrande enabled auto-merge (squash) May 22, 2026 05:19
@ajcasagrande ajcasagrande merged commit f14a7c0 into main May 22, 2026
30 checks passed
@ajcasagrande ajcasagrande deleted the ajc/server-metrics-docs-update branch May 22, 2026 05:19