Skip to content

feat: Initial implementation of power metrics in aiperf.#803

Merged
FrankD412 merged 43 commits into
ai-dynamo:mainfrom
FrankD412:feat-power-efficiency-metrics
May 29, 2026
Merged

feat: Initial implementation of power metrics in aiperf.#803
FrankD412 merged 43 commits into
ai-dynamo:mainfrom
FrankD412:feat-power-efficiency-metrics

Conversation

@FrankD412

@FrankD412 FrankD412 commented Apr 4, 2026

Copy link
Copy Markdown
Contributor

This PR adds some basic power metrics leveraging GPU telemetry to compute the amount of power used during the benchmark.

Summary by CodeRabbit

  • New Features

    • GPU efficiency metrics: total power, total energy, and output-tokens-per-joule.
    • On-demand GPU telemetry collection entrypoint.
    • Baseline GPU metric capture during telemetry setup.
  • Tests

    • Updated unit tests to exercise the public collection entrypoint and support new telemetry behaviors.
  • Chores

    • Expanded test logging mocks to include informational logs.

Open with Devin
@FrankD412 FrankD412 self-assigned this Apr 4, 2026
@copy-pr-bot

copy-pr-bot Bot commented Apr 4, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

github-actions Bot commented Apr 4, 2026

Copy link
Copy Markdown

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@8a9d8aa680fea00293f8856c280c212c2c70cc80

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@8a9d8aa680fea00293f8856c280c212c2c70cc80

Last updated for commit: 8a9d8aaBrowse code

@coderabbitai

coderabbitai Bot commented Apr 4, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds GPU efficiency metric computation and an explicit collector entrypoint: a new accumulator method computes total GPU power, energy, and output-tokens-per-joule; collector collection method made public for on-demand baseline captures; records and manager invoke these flows.

Changes

Cohort / File(s) Summary
Efficiency computation
src/aiperf/gpu_telemetry/accumulator.py
Added imports for EnergyMetricUnit/PowerMetricUnit and new GPUTelemetryAccumulator.compute_efficiency_metrics(metric_results, time_filter) which aggregates per-GPU power and energy across DCGM endpoints and conditionally emits total_gpu_power, total_gpu_energy, and output_tokens_per_joule.
Protocols
src/aiperf/gpu_telemetry/protocols.py
Added async def collect_and_process_metrics(self) -> None to GPUTelemetryCollectorProtocol and def compute_efficiency_metrics(self, metric_results: list[MetricResult], time_filter: TimeRangeFilter) -> list[MetricResult] to GPUTelemetryAccumulatorProtocol; added TimeRangeFilter import under TYPE_CHECKING.
Records integration
src/aiperf/records/records_manager.py
When GPU accumulator exists, builds a TimeRangeFilter from phase stats and calls compute_efficiency_metrics(...), extending records_results with returned efficiency metrics before publishing.
Manager baseline capture
src/aiperf/gpu_telemetry/manager.py
After creating a PyNVML collector, performs a baseline capture via collector.initialize() and collector.collect_and_process_metrics(), logging success at info and warning on failure; DCGM endpoint baseline logs changed from debug→info.
Collector API rename
src/aiperf/gpu_telemetry/pynvml_collector.py
Renamed private _collect_and_process_metrics to public collect_and_process_metrics and updated its invocation in the background loop.
Tests updated
tests/unit/gpu_telemetry/test_pynvml_collector.py, tests/unit/gpu_telemetry/test_telemetry_manager.py
Tests updated to call the public collect_and_process_metrics() entrypoint and to mock manager.info in telemetry manager tests.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through logs at break of dawn,

watts and joules I proudly fawn,
tokens counted, energy measured,
metrics stitched and neatly treasured,
a tiny rabbit cheers the run 🌱⚡

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the main change: adding initial power metrics functionality to aiperf via GPU telemetry integration.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot]

This comment was marked as resolved.

@codecov

codecov Bot commented Apr 4, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 89.07563% with 13 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/aiperf/gpu_telemetry/accumulator.py 85.71% 7 Missing and 2 partials ⚠️
src/aiperf/gpu_telemetry/pynvml_collector.py 33.33% 2 Missing ⚠️
src/aiperf/records/records_manager.py 83.33% 1 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@FrankD412

Copy link
Copy Markdown
Contributor Author
@FrankD412 FrankD412 changed the title Draft: Initial implementation of power metrics in aiperf. Apr 6, 2026
@FrankD412 FrankD412 changed the title Draft: Initial implementation of power metrics in aiperf. [https://linear.app/nvidia/issue/AIP-838/implement-an-energytoken-benchmark-estimation] Apr 6, 2026
@FrankD412

Copy link
Copy Markdown
Contributor Author

After some local testing, some of the metrics show up but not all. I need to figure out how to run a metric calculation for energy/token and have it show up.

@FrankD412 FrankD412 force-pushed the feat-power-efficiency-metrics branch from 1238e5e to deaf3a6 Compare April 15, 2026 17:07

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/superpowers/plans/2026-04-13-monitor.md`:
- Around line 13-28: Replace the ASCII table under "File Map" with a mermaid
diagram block (```mermaid ... ```) that groups the created source files, the
modified source file, and the created test files as subgraphs; include nodes for
monitor/__init__.py (Package marker), monitor/models.py (GpuSnapshot model),
monitor/gpu.py (query_gpus), monitor/api.py (create_app), monitor/poller.py
(CsvPoller), cli_commands/monitor.py (CLI subcommand), the modified cli.py
(Register monitor command), and the three test files
tests/unit/monitor/__init__.py, test_gpu.py, test_api.py, and test_poller.py,
following the suggested structure so the diagram replaces the table exactly.

In `@src/aiperf/gpu_telemetry/manager.py`:
- Around line 212-215: The broad exception handler in manager.py (the "except
Exception as e" that calls self.warning(f"GPU Telemetry: Failed to capture
baseline from pynvml: {e}")) is missing the BLE001 suppression comment; add "#
noqa: BLE001" to that except line to match other handlers and include the same
short explanatory comment used at lines 240/334/428, so the except Exception as
e line becomes explicitly marked as intentionally broad while keeping the
existing self.warning call intact.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cd0f4da4-9c3d-4a42-8ace-c06012576f1b

📥 Commits

Reviewing files that changed from the base of the PR and between 946bb59 and deaf3a6.

📒 Files selected for processing (9)
  • docs/superpowers/plans/2026-04-13-monitor.md
  • docs/superpowers/specs/2026-04-13-monitor-design.md
  • src/aiperf/gpu_telemetry/accumulator.py
  • src/aiperf/gpu_telemetry/manager.py
  • src/aiperf/gpu_telemetry/protocols.py
  • src/aiperf/gpu_telemetry/pynvml_collector.py
  • src/aiperf/records/records_manager.py
  • tests/unit/gpu_telemetry/test_pynvml_collector.py
  • tests/unit/gpu_telemetry/test_telemetry_manager.py
✅ Files skipped from review due to trivial changes (1)
  • docs/superpowers/specs/2026-04-13-monitor-design.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/aiperf/records/records_manager.py
  • src/aiperf/gpu_telemetry/protocols.py
  • src/aiperf/gpu_telemetry/accumulator.py
Comment thread docs/superpowers/plans/2026-04-13-monitor.md Outdated
Comment thread src/aiperf/gpu_telemetry/manager.py Outdated
@FrankD412

Copy link
Copy Markdown
Contributor Author

Ok, I was concerned when aiperf was reporting different numbers than the reference implementation in trtllm-bench, but I ended up just wrapping both calls in a simple script to measure. I expected it to over-estimate slightly in both cases, but it looks like the math works out. I'm really surprised that the trtllm-bench case is quite different under a concurrency of 1024 with a ISL/OSL of 1k/1k. Here's what Llama-3.1-7B on 8xH200 looks like with a simple config:

CSV Export: /home/scratch.fdinatale_coreai/code/aiperf/h200/artifacts/meta-llama_Llama-3.1-70B-Instruct-openai-completions-concurrency1024-request_rateinf/profile_export_aiperf.csv
JSON Export: /home/scratch.fdinatale_coreai/code/aiperf/h200/artifacts/meta-llama_Llama-3.1-70B-Instruct-openai-completions-concurrency1024-request_rateinf/profile_export_aiperf.json
Log File: /home/scratch.fdinatale_coreai/code/aiperf/h200/artifacts/meta-llama_Llama-3.1-70B-Instruct-openai-completions-concurrency1024-request_rateinf/logs/aiperf.log


GPU 0 (NVIDIA H200): 143653.77 J
GPU 1 (NVIDIA H200): 147784.43 J
GPU 2 (NVIDIA H200): 149026.62 J
GPU 3 (NVIDIA H200): 148156.23 J
GPU 4 (NVIDIA H200): 145397.66 J
GPU 5 (NVIDIA H200): 148267.18 J
GPU 6 (NVIDIA H200): 153568.17 J
GPU 7 (NVIDIA H200): 149799.60 J
Total energy: 1185653.65 J

Energy reported by aiperf

  "osl_mismatch_count": {
    "unit": "requests",
    "avg": 1421.0
  },
  "total_token_throughput": {
    "unit": "tokens/sec",
    "avg": 27921.504814287888
  },
  "total_gpu_power": {
    "unit": "W",
    "avg": 4724.019916949152
  },
  "total_gpu_energy": {
    "unit": "J",
    "avg": 1067107.4119999986
  },
  "output_token_throughput_per_watt": {
    "unit": "tps/W",
    "avg": 3.027279746105967

trtllm-bench

-- Energy Metrics --------------------------------------

Total Energy (J):                                 1647268.6290
Output Tokens per Second per Watt (tps/W):         1.8649
Average GPU Power (W):                            666.9243

-- Request Latency Breakdown (ms) -----------------------

Validation output:

GPU 0 (NVIDIA H200): 214223.76 J
GPU 1 (NVIDIA H200): 217174.15 J
GPU 2 (NVIDIA H200): 218900.36 J
GPU 3 (NVIDIA H200): 218245.88 J
GPU 4 (NVIDIA H200): 215508.04 J
GPU 5 (NVIDIA H200): 217891.44 J
GPU 6 (NVIDIA H200): 222311.65 J
GPU 7 (NVIDIA H200): 219760.09 J
Total energy: 1744015.37 J
@FrankD412 FrankD412 changed the title Draft: Initial implementation of power metrics in aiperf. Apr 16, 2026
@github-actions github-actions Bot added the feat label Apr 16, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/aiperf/gpu_telemetry/accumulator.py`:
- Around line 337-368: The current per-GPU metric extraction only catches
NoMetricValue in the blocks calling gpu_data.get_metric_result for
"gpu_power_usage" and "energy_consumption" (power_result / energy_result),
allowing any other exception to bubble up and abort processing; update those
try/except blocks to also catch generic Exception as e (after NoMetricValue) and
log the unexpected error (include the gpu_uuid and exception text) via
self.debug or self.error so the loop continues to the next GPU and overall
efficiency metric aggregation isn't aborted.
- Around line 370-372: The early return when power_gpu_count == 0 in the
accumulator (around power_gpu_count check) is dropping metrics that don't
require instantaneous power (e.g., total_gpu_energy and
output_tokens_per_joule); change the logic in the method containing that check
(look for power_gpu_count, total_gpu_energy, output_tokens_per_joule) so you no
longer return [] when power data is missing—instead compute and include metrics
that can be derived from available energy data and only skip or set to None
those metrics that specifically require gpu_power_usage/instantaneous power;
ensure the final returned list still contains non-power metrics and that
power-dependent fields are omitted or null when power_gpu_count == 0.
- Around line 325-329: The code incorrectly reads
total_output_tokens_result.avg; since TotalOutputTokensMetric inherits
DerivedSumMetric and the derived scalar is stored as a sum, change the
assignment that computes total_output_tokens to read
total_output_tokens_result.sum instead of .avg (locate the
total_output_tokens_result variable and the total_output_tokens assignment near
the TotalOutputTokensMetric / DerivedSumMetric usage) so the value semantically
matches the DerivedSumMetric behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a0bffc0c-ed37-496c-ad35-c82557ac1743

📥 Commits

Reviewing files that changed from the base of the PR and between deaf3a6 and 6676436.

📒 Files selected for processing (1)
  • src/aiperf/gpu_telemetry/accumulator.py
Comment thread src/aiperf/gpu_telemetry/accumulator.py Outdated
Comment thread src/aiperf/gpu_telemetry/accumulator.py
Comment thread src/aiperf/gpu_telemetry/accumulator.py Outdated
@FrankD412 FrankD412 force-pushed the feat-power-efficiency-metrics branch 2 times, most recently from aea6132 to 279c92d Compare April 16, 2026 03:15

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 3 additional findings in Devin Review.

Open in Devin Review
Comment thread src/aiperf/records/records_manager.py Outdated
@FrankD412 FrankD412 force-pushed the feat-power-efficiency-metrics branch from 5da4623 to 1ae073e Compare April 24, 2026 16:29

@debermudez debermudez left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Four issues found during review.

Comment thread src/aiperf/records/records_manager.py Outdated
Comment thread src/aiperf/gpu_telemetry/accumulator.py Outdated
Comment thread src/aiperf/gpu_telemetry/accumulator.py Outdated
Comment thread src/aiperf/gpu_telemetry/protocols.py Outdated
@FrankD412 FrankD412 force-pushed the feat-power-efficiency-metrics branch from db6810f to 42b1fe5 Compare April 28, 2026 04:57
@FrankD412

Copy link
Copy Markdown
Contributor Author

Four issues found during review.

Alright -- fixed the issues.

Comment thread src/aiperf/gpu_telemetry/accumulator.py Outdated
Comment thread src/aiperf/records/records_manager.py Outdated
@FrankD412 FrankD412 force-pushed the feat-power-efficiency-metrics branch 2 times, most recently from 320f376 to 32f2c35 Compare May 6, 2026 23:40
@FrankD412 FrankD412 force-pushed the feat-power-efficiency-metrics branch 2 times, most recently from 2daad4f to 545a3e9 Compare May 14, 2026 20:57
Operator can now point both power-measurement serve drivers at a raw_payload
JSONL file (produced e.g. by tools/trtllm_bench_to_aiperf.py) instead of
using the built-in synthetic dataset generation. When --input-file is set:

  - aiperf is invoked with --custom-dataset-type raw_payload --input-file <path>
  - --synthetic-input-tokens-* and --output-tokens-* flags are dropped
  - --extra-inputs min_tokens/max_tokens trailers are dropped (per-row
    max_tokens in the JSONL dictates generation length)
  - --extra-inputs ignore_eos:true stays (server-behavior knob)

--isl and --osl are now sentinel-default None and validated as mutually
exclusive with --input-file (rc=2 on collision). When --input-file is unset
both are backfilled to 1000 in main(), so existing invocations are
byte-for-byte unchanged.

--endpoint-type is already a CLI flag (default completions); the operator
passes --endpoint-type chat themselves when using --input-file since aiperf's
raw_payload loader hard-requires `messages`-shaped JSONL lines. The driver
does not auto-flip.

summary.json gains an "input_file" key in both drivers so the artifact is
self-describing in both modes. The resolved absolute path is stored back
into args.input_file after the existence check so summary.json and the
aiperf invocation agree on the path.

Spec: docs/superpowers/specs/2026-05-27-power-driver-input-file-design.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

@debermudez debermudez left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #803 (power-efficiency metrics)

Reviewed against b4f1f58c (current PR head). Two passes combined:

  1. Branch code review — 10 findings (3 High / 6 Medium / 1 Low).
  2. LLM-ergonomics review — 8 findings (2 High / 5 Medium / 1 Low).

Most findings anchor to specific lines and are posted inline. Two cross-cutting items don't anchor cleanly:

High — Public metrics added without docs updates (ergonomics Axis 6)

This PR adds three new public metrics (total_gpu_power, total_gpu_energy, output_tokens_per_joule), a new unit string (tokens/J), and a new "externally-injected derived metric" pattern, but no docs are updated. CLAUDE.md mandates: "DOCUMENTATION IS REQUIRED, NOT OPTIONAL ... Metrics definitions or formulas → docs/metrics-reference.md". Suggested:

  • Add a ## GPU Power Efficiency Metrics section to docs/metrics-reference.md with the three tags, units, and formulas (Σ avg(gpu_power_usage), Σ delta(energy_consumption), total_output_tokens / total_gpu_energy).
  • Add an ### Externally-Injected Derived Metric entry to docs/dev/patterns.md citing power_efficiency_metrics.py as the reference file.
  • If the pattern warrants it, sync across the four-file rule (AGENTS.md / CLAUDE.md / .github/copilot-instructions.md / .cursor/rules/python.mdc) and run make check-agent-files-sync.

Low — tools/ debug runner duplication

tools/run_trtllm_bench_with_power.py, tools/run_trtllm_serve_with_power.py, and tools/run_trtllm_serve_with_power_dcgm.py each define their own http_get / http_post / wait_for_server / write_json helpers. Extracting tools/_power_runner_common.py would prevent helper drift (though tools/ is out of strict ergonomics scope).

Runtime reproduction (deferred — should run on a GPU host pre-merge)

# F-04: degenerate phase_stats
aiperf profile -m <model> --no-warmup --request-count 0 --gpu-telemetry pynvml \
  --artifact_dir /tmp/aiperf-f04 --ui simple

# F-08: GPU flags absent when telemetry disabled
aiperf profile -m <model> --request-count 50 --no-gpu-telemetry \
  --artifact_dir /tmp/aiperf-f08 --ui simple
# expect: profile_export_aiperf.json does NOT contain total_gpu_power / total_gpu_energy / output_tokens_per_joule

# F-02: cross-phase energy leak (requires multi-phase plan)
aiperf profile ... --plan two-phase.yaml ...
# expect: phase 2's total_gpu_energy reflects only phase 2 delta

What's good

  • The asymmetric gauge/counter time-window comment at accumulator.py:407-408 is exactly the "why, not what" comment style CLAUDE.md prescribes, and test_energy_filter_widens_end_ns_while_power_filter_stays_bounded (test_accumulator.py:549) locks the invariant in.
  • _capture_collector_baseline (manager.py:266-292) cleanly relocates collector.initialize() to the baseline-capture path so the energy counter has a baseline before profile start.
  • Lambda-wrapped debug logs (accumulator.py:325-338, 364-380) match the "lambda for expensive logs" rule and correctly use uuid=gpu_uuid to dodge the loop-variable closure bug.
  • Mechanical floor is clean: check-ergonomics (257 baselined, 0 new) and ruff-baselined (305 baselined, 0 new) both pass.

Full review artifacts (local-only): artifacts/code-review.md and artifacts/code-review-2026-05-27/llm-ergonomics-pr-803-feat-power-efficiency-metrics.md.

Comment thread src/aiperf/gpu_telemetry/accumulator.py Outdated
Comment thread src/aiperf/gpu_telemetry/accumulator.py
Comment thread src/aiperf/gpu_telemetry/accumulator.py Outdated
Comment thread src/aiperf/gpu_telemetry/accumulator.py
Comment thread src/aiperf/gpu_telemetry/accumulator.py Outdated
Comment thread src/aiperf/metrics/types/power_efficiency_metrics.py
Comment thread src/aiperf/metrics/types/power_efficiency_metrics.py Outdated
Comment thread src/aiperf/records/records_manager.py Outdated
Comment thread src/aiperf/records/records_manager.py Outdated
Comment thread src/aiperf/records/records_manager.py Outdated
FrankD412 and others added 13 commits May 27, 2026 13:53
Replace the magic 1e6 conversion in _sum_gpu_energy_joules with the
unit-defined factor so a future rename or rescale of EnergyMetricUnit
cannot silently drift the energy aggregation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
…ract

Add Returns and Example blocks spelling out the 0-3 per-tag omission
conditions, clarify that metric_results is read-only and scanned only
for total_output_tokens, document the intentional energy-window
widening, and cross-reference sibling summarize() to distinguish sync
cross-GPU aggregate (one per phase) from async per-GPU emission (periodic).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
…cUnit

Replace the hardcoded "tokens/J" literal in compute_efficiency_metrics
with str(GenericMetricUnit.TOKENS_PER_JOULE) so the accumulator-emitted
unit can't drift from OutputTokensPerJouleMetric.unit on a rename.

Add test_emitted_units_match_metric_class_units pinning the
relationship for all three power-efficiency tags rather than the
literal string, and update the existing happy-path assertion to use
the enum form for the same reason.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
…s block

The Protocol's Returns block described an all-or-nothing contract
("Empty if no GPU data is available for any metric.."), but the
implementation in GPUTelemetryAccumulator returns 0-3 results with
independent per-tag omission. Rewrite to match (and fix the trailing
double-period typo) so agents implementing the Protocol model the
actual behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Rename len_records_results -> records_completed and document why the
snapshot is taken before records_results.extend(efficiency_metrics):
ProcessRecordsResult.completed must reflect request-derived records
only, not derived aggregates (total_gpu_power, total_gpu_energy,
output_tokens_per_joule).

Add test_completed_excludes_efficiency_metrics in
TestRecordsManagerEfficiencyMetricsSnapshot to guard against a future
PR reordering or removing the snapshot. Verified the test fails when
the snapshot is moved after the extend.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
…aders

MetricResult.count is stripped to None by to_json_result for DERIVED
metrics (record_models.py:70-90), so a "1 of 2 GPUs reporting" run was
indistinguishable from "2 of 2" in the CLI audit surface. Bake the
valid-data count into each header via a small _gpu_count_suffix helper:

  total_gpu_power           -> "Total GPU Power (N GPUs)"
  total_gpu_energy          -> "Total GPU Energy (N GPUs)"
  output_tokens_per_joule   -> "Output Tokens per Joule (N GPUs)"

Singular form ("1 GPU") is used when N == 1. Existing happy-path tests
gain header assertions; new test_header_reflects_partial_cohort_count
pins the "1 of 2" surface so future regressions are caught.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Three convergent issues across TotalGpuPowerMetric, TotalGpuEnergyMetric,
and OutputTokensPerJouleMetric — same fix per class:

- Class docstring gains an explicit `Invariant:` paragraph naming the
  injection site (GPUTelemetryAccumulator.compute_efficiency_metrics)
  and the catching code path
  (MetricResultsProcessor.update_derived_metrics). Future contributors
  copy-pasting these as "the derived-metric pattern" now see the
  contract spelled out.

- `_derive_value` return annotation `-> float` -> `-> NoReturn` (typing).
  The body unconditionally raises, so the previous signature lied; the
  truthful form lets type-checkers reason about unreachable downstream
  code.

- `NoMetricValue` message rewritten to name the operation
  (derive from MetricResultsDict), the injection site, and the path
  that's expected to catch and skip. The previous form named only the
  source ("X is computed by the GPU telemetry accumulator") and gave
  agents no clue where the contract is enforced.

Add tests/unit/metrics/test_power_efficiency_metrics.py with a
parametrized contract test pinning each of the three error-message
invariants (tag, operation, injection-site) so a future weakening of
the message is caught.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
…window

_process_results previously constructed a TimeRangeFilter for the
efficiency-metrics block with two consecutive time.time_ns() fallbacks
when phase_stats.start_ns / requests_end_ns were both None. The two
calls land nanoseconds apart, producing an effectively zero-width
window. Total GPU power (gauge-based) would then either emit a
misleading 0.0W as a valid-looking metric or be silently dropped,
depending on telemetry sample jitter.

Extract the efficiency-metrics block into _apply_gpu_efficiency_metrics
with an explicit guard: when either bound is None (the degenerate
"no records flowed for this phase" case), log a warning naming the
phase and the missing fields and skip the compute call. When both
bounds are set, honor them directly with no fallback. Extracting also
keeps _process_results under the PLR0912 branch budget that the inline
guard would have pushed it over.

Add TestRecordsManagerEfficiencyMetricsDegeneratePhase with two cases:
both bounds None, and only one None (partial-degenerate). Sanity-checked
by reverting the guard — the test failed with the exact ~1us-wide
TimeRangeFilter the reviewer described.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
…race

compute_efficiency_metrics previously built the energy filter with
end_ns=None — the intent was to widen the window by one collection
cadence so the trailing counter scrape that lands just after
requests_end_ns is included in the delta. The unbounded form, however,
captures every post-phase sample present in the (append-only)
TelemetryHierarchy: late-finalization drift absorbs cooldown samples
into the phase's energy, and in multi-phase runs an earlier-phase
window can reach forward into a later phase's samples if the
finalization ordering ever changes.

Replace end_ns=None with end_ns = phase_end + Environment.GPU.FINAL_SCRAPE_GRACE_NS
(new env var, default 666_000_000 ns ~= 2x the default 333 ms
COLLECTION_INTERVAL, configurable via AIPERF_GPU_FINAL_SCRAPE_GRACE_NS).
Move the rationale from the call site into _sum_gpu_energy_joules
where the counter math actually consumes the filter, per reviewer
preference.

Update test_energy_filter_widens_end_ns_..._stays_bounded to pin the
new bounded form, and add test_repeated_calls_use_bounded_energy_window_per_phase:
simulates two consecutive phases and asserts each call's energy filter
ends at phase.end_ns + grace, not None. The latter test also asserts
the grace window stays inside the phase-2 start.

Sanity-checked by restoring end_ns=None — both tests fail with
"None == (5000000000 + 666000000)", exactly the leak the reviewer
described.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Add a GPU Power Efficiency Metrics section to metrics-reference.md
covering the three tags introduced by this PR — total_gpu_power,
total_gpu_energy, and output_tokens_per_joule. Each entry follows the
existing per-metric format (Type, Formula, Notes) and links to the
new Externally-Injected Derived Metric pattern in docs/dev/patterns.md
(added in a follow-up commit) so readers can find the contract that
explains why _derive_value never fires for these tags.

The section also documents:
  - The bounded energy-window math via FINAL_SCRAPE_GRACE_NS.
  - The "(N GPUs)" header attribution that distinguishes partial-cohort
    runs from full ones.
  - The negative-delta clamp on counter resets.
  - The independent per-tag omission contract.

Update the Table of Contents to include the new section and its three
subsection anchors.

Per CLAUDE.md "DOCUMENTATION IS REQUIRED, NOT OPTIONAL ... Metrics
definitions or formulas -> docs/metrics-reference.md".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Add an Externally-Injected Derived Metric Pattern section to
docs/dev/patterns.md, slotted alongside the other specialized contract
patterns (NaN/Inf Discipline, Safe Filesystem Reads). The new entry
captures the three-part contract introduced by the power_efficiency_metrics
classes:

  1. Invariant: paragraph in the class docstring naming the injection
     site and the catching path.
  2. _derive_value returns NoReturn from typing — the body unconditionally
     raises, so the truthful annotation is NoReturn rather than the
     metric's value type.
  3. NoMetricValue message names the operation (Cannot derive 'X' from
     MetricResultsDict), the injection site
     (GPUTelemetryAccumulator.compute_efficiency_metrics), and the
     catching path (MetricResultsProcessor.update_derived_metrics).

Reference file is power_efficiency_metrics.py; injection site is
GPUTelemetryAccumulator.compute_efficiency_metrics. The new metrics-
reference section added in the previous commit links here.

Skipped the four-file sync (AGENTS.md / CLAUDE.md / copilot-instructions
/ python.mdc): the reviewer hedged with "if the pattern warrants it",
and this is a narrow pattern specific to GPU telemetry + externally
computed values, not a project-wide coding standard. The four files
already direct readers to docs/dev/patterns.md for code-pattern lookups
via the documentation table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
…etrics

Add a fourth power-efficiency metric that reports per-user energy
footprint: total_gpu_energy / configured concurrency. Lower is better
— a more efficient deployment serves the same load for less energy
per concurrent user.

  - New GenericMetricUnit.JOULES_PER_USER ("joules/user").
  - New EnergyPerUserMetric class in power_efficiency_metrics.py
    following the externally-injected derived metric pattern
    (Invariant: docstring, _derive_value returns NoReturn, structured
    error message). display_order=903, flags=MetricFlags.NONE
    (smaller-is-better default).
  - GPUTelemetryAccumulator.compute_efficiency_metrics now reads the
    profiling phase's concurrency via run.cfg.get_profiling_phases()
    and emits energy_per_user when both concurrency is a positive int
    AND energy_count > 0. Concurrency=None (e.g. pure --request-rate
    runs) and missing energy data both yield omission, consistent
    with the other power-efficiency metrics' independent-omission
    contract.

Tests: happy-path now expects 4 metrics (default test config resolves
to concurrency=1); multiple-GPUs gains energy_per_user assertion;
partial-cohort header check covers the new tag; unit-relationship
test extends to EnergyPerUserMetric; three new tests cover concurrency
scaling, None-omission, and no-energy-data omission. The _derive_value
contract test parametrize gains the new class.

Docs: new "Energy per User" subsection in docs/metrics-reference.md
under GPU Power Efficiency Metrics; TOC and section intro updated.

Function size: compute_efficiency_metrics grew past the 80-line
ergonomics cap (101 lines now) and accumulator.py past the 500-line
file cap (511 lines). Both baselined pending a follow-up refactor
that splits efficiency-metrics construction out of the accumulator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
@debermudez debermudez enabled auto-merge (squash) May 28, 2026 18:33
@debermudez debermudez disabled auto-merge May 28, 2026 18:33
@FrankD412 FrankD412 disabled auto-merge May 28, 2026 18:33
@FrankD412

Copy link
Copy Markdown
Contributor Author

Confirmed that the energy_per_user statistic is showing up and things appear to be behaving. Updating and setting to auto-merge once tests complete.

@FrankD412 FrankD412 enabled auto-merge (squash) May 28, 2026 22:36
@FrankD412 FrankD412 disabled auto-merge May 28, 2026 22:36
@FrankD412 FrankD412 enabled auto-merge (squash) May 28, 2026 22:37
@FrankD412 FrankD412 requested a review from debermudez May 28, 2026 22:37

@debermudez debermudez left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work man.

@FrankD412 FrankD412 merged commit a90d154 into ai-dynamo:main May 29, 2026
28 of 29 checks passed
@FrankD412 FrankD412 deleted the feat-power-efficiency-metrics branch May 29, 2026 18:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

4 participants