docs: update server metrics reference for Dynamo/vLLM/SGLang/TRT-LLM/Triton #974
Refresh server metrics documentation against upstream metric definitions so backend-specific names, labels, and optional families match current source behavior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
Try out this PR
Quick install:
pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@8967a7d8fd55bc044ce50f1c6543104ed960dd92
Recommended with virtual environment (using uv):
uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@8967a7d8fd55bc044ce50f1c6543104ed960dd92
Last updated for commit:
Walkthrough
This PR expands server metrics documentation across three files to add Triton backend coverage and broaden metric definitions for all supported backends (vLLM, Dynamo, SGLang, TRT-LLM, Triton). Schema files now reference Triton; quick reference sections list additional metric families per backend; common metrics tables are significantly expanded with engine, cache, state, and backend-specific metrics; and troubleshooting guidance is updated for Triton endpoints.
Changes
Server Metrics Documentation Expansion
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
Codecov Report
✅ All modified and coverable lines are covered by tests.
Move verbose histogram bucket lists out of wide metric tables so the reference is easier to scan without dropping the bucket details.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
Summary
Updates `docs/server-metrics/` using a source-grounded audit of upstream server metric definitions rather than relying on grep-only matches.

Exact steps taken

1. Cloned/reused upstream source checkouts and audited the actual metric-definition/exporter source files, not just grep hits.
2. Compared upstream metric families and labels against:
   - `docs/server-metrics/server-metrics.md`
   - `docs/server-metrics/server-metrics-reference.md`
   - `docs/server-metrics/server-metrics-json-schema.md`
   - `docs/server-metrics/server-metrics-parquet-schema.md`
3. Updated docs for source-confirmed gaps, including:
   - `trtllm_` metric names
   - `trtllm_` prefix
   - `nv_trt_llm_*` / `nv_llm_*` families
4. Verified that AIPerf’s Prometheus parser stores counter families without the sample-level trailing `_total` suffix by running a local parser check with `prometheus_client.parser.text_string_to_metric_families`.
5. Ran validation:
   - `git diff --check -- docs/server-metrics/server-metrics.md docs/server-metrics/server-metrics-reference.md docs/server-metrics/server-metrics-json-schema.md docs/server-metrics/server-metrics-parquet-schema.md`
   - `uv run python tools/check_docs_index.py`

Trust / limitations
🤖 Generated with Claude Code
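
The local parser check mentioned in the steps above could look roughly like the following. This is a minimal sketch, not the exact script that was run: the metric name `demo_requests_total` and its `model` label are hypothetical, and the only library call used is `prometheus_client.parser.text_string_to_metric_families`, which the description names. It parses a small counter exposition and compares the family name with the sample name.

```python
from prometheus_client.parser import text_string_to_metric_families

# Hypothetical exposition text; the metric name and label are made up
# for illustration and do not come from any real backend.
EXPOSITION = """\
# HELP demo_requests_total Hypothetical request counter.
# TYPE demo_requests_total counter
demo_requests_total{model="m"} 3.0
"""

# Parse the text format into metric families.
families = list(text_string_to_metric_families(EXPOSITION))
family = families[0]

# The family name is expected to drop the trailing "_total",
# while the individual sample keeps it.
sample_names = [sample.name for sample in family.samples]
print("family:", family.name, "type:", family.type)
print("samples:", sample_names)
```

If the family name prints without `_total` while the sample name keeps it, that matches the behavior the docs describe: counter families are keyed by the suffix-free name.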