Skip to content

feat(api): add /api/run endpoint exposing run-identity metadata#997

Merged
FrankD412 merged 3 commits into
mainfrom
mkotila/api-run-endpoint
May 27, 2026
Merged

feat(api): add /api/run endpoint exposing run-identity metadata#997
FrankD412 merged 3 commits into
mainfrom
mkotila/api-run-endpoint

Conversation

@matthewkotila

@matthewkotila matthewkotila commented May 26, 2026

Copy link
Copy Markdown
Contributor

/api/run: how a "regression fix" turned into a design choice

Branch: mkotila/api-run-endpoint · Commit: 047d59dd · Tickets: NVBugs 6204764 / JIRA DYN-699

The story

A QA test broke. The reporter framed it as a regression: curl /api/config | jq .cli_command used to return the redacted launch command, now it returns null. The bug ticket arrived with a one-paragraph "AI suggestion" recommending the obvious fix — stitch cli_command back onto /api/config and call it done.

The obvious fix was wrong. Here's why.

The break came from PR #912 (the YAML-native v2 config refactor). Before that refactor, a single UserConfig class held everything: declarative spec (endpoint, loadgen, tokenizer, …) and per-run identity (benchmark_id, cli_command, …) jammed together at one level. /api/config dumped the whole grab-bag. PR #912 split it cleanly — declarative parts into BenchmarkConfig, identity into a new BenchmarkRun envelope. That split is the right shape: a CLI command isn't part of "the config," it's run-execution metadata — the thing that produced a config. The endpoint was migrated to return only BenchmarkConfig, which is honest, but it silently dropped cli_command and benchmark_id along with it.

So you see the trap. Option A from the bug ticket — stitch cli_command back onto /api/config — would re-create the category error PR #912 just cleaned up. You'd paper over the architectural improvement to preserve a stale API shape.

Option B is the move: add a new endpoint, /api/run, that exposes the BenchmarkRun envelope. cli_command lives there because that's where the field actually lives in the model. /api/config stays pure. As a bonus, the response is shaped as RunInfo — the same model already used in profile_export_aiperf.json — so the live API and the on-disk export can't drift.

The cost is one line in QA's test (/api/config/api/run). No in-tree consumers to migrate; the web dashboard that would have read it was on an unmerged branch and never landed.

What shipped

File Change
src/aiperf/common/models/export_models.py RunInfo.from_run(run) classmethod — single projection from BenchmarkRun
src/aiperf/exporters/metrics_json_exporter.py Replaced private _build_run_metadata with the new classmethod (one source of truth)
src/aiperf/api/routers/core.py New GET /api/runRunInfo
tests/unit/api/routers/test_core.py +5 tests: /api/config no-grab-bag guard; /api/run shape + no-secrets; end-to-end redaction test (sys.argv with --api-key sk-secret-… → asserts secret not in response)
docs/reference/api-endpoints.md (new) Full HTTP API surface documented for the first time
docs/index.yml Fern index registration for the new page

Verification

Check Result
pytest tests/unit/api/ tests/unit/exporters/ 347 passed, 1 skipped
ruff format --check + ruff check (touched files) clean
tools/check_docs_index.py OK: all 102 docs listed
Live FastAPI startup log /api/run present in route table
pre-commit run --all-files (host) passed
Every claim in api-endpoints.md re-checked against code passed (2 minor edits applied)

Decisions worth remembering

  • Endpoint name /api/run — bare singular noun matches /api/config's convention; RunInfo is the internal model name, not the route's job.
  • Reference doc over tutorial — the API surface had been undocumented since PR feat(api): add FastAPI service with plugin-based router system #717 landed without the api-service.md from ajc/websocket-dashboard-ui; a compact reference table closes the gap without expanding scope.
  • Single commit — code + tests + docs in one self-contained unit.

Follow-ups

  • QA test: migrate test_api_key_redaction from /api/config/api/run. The new in-tree test_run_does_not_leak_api_key_in_cli_command mirrors the same contract.
  • Comment on NVBugs 6204764 / DYN-699 with fix summary + the QA migration note.
  • If 0.9.0 backport is needed: cherry-pick 047d59dd.

Summary by CodeRabbit

  • New Features

    • Added /api/run endpoint exposing active run identity/metadata with automatic redaction of API keys and sensitive CLI/credential values; omits unset optional fields and returns 503 when no active run.
  • Bug Fixes

    • /api/config no longer exposes run-identity fields.
  • Documentation

    • Added comprehensive HTTP API reference with endpoints, Prometheus/metrics, health checks, results/artifacts, OpenAPI and curl examples.

Review Change Stack

The YAML-native v2 config refactor (PR #912) correctly moved
cli_command, benchmark_id, and sweep_id from UserConfig onto
BenchmarkRun, leaving /api/config as a pure BenchmarkConfig
projection. External consumers reading cli_command from /api/config
broke as a result (NVBugs 6204764 / JIRA DYN-699).

Restoring cli_command on /api/config would re-create the category
error the refactor cleaned up (run-execution metadata on a
declarative config response). Instead, expose run-identity via a
dedicated endpoint that matches the underlying model and shares
its shape with the run_info block in profile_export_aiperf.json,
so live and on-disk readers see identical fields.

- GET /api/run -> RunInfo (benchmark_id, sweep_id, trial,
  random_seed, run_label, variation_*, redacted cli_command).
- RunInfo.from_run() classmethod as the single projection function
  shared by /api/run and the JSON exporter.
- Tests: /api/config must not expose run-identity; /api/run must
  not expose config or secrets; end-to-end redaction test
  monkey-patches sys.argv with --api-key <secret> and asserts the
  secret never reaches the API response (mirrors QA's
  test_api_key_redaction).
- New docs/reference/api-endpoints.md documenting all HTTP API
  routes (the surface has been undocumented on main since #717
  landed without the api-service tutorial that only existed on
  the ajc/websocket-dashboard-ui branch).
@github-actions

github-actions Bot commented May 26, 2026

Copy link
Copy Markdown

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@6c1f03d6947af25a416e73fd308baa9af08371b6

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@6c1f03d6947af25a416e73fd308baa9af08371b6

Last updated for commit: 6c1f03dBrowse code

@github-actions github-actions Bot added the feat label May 26, 2026
@github-actions

github-actions Bot commented May 26, 2026

Copy link
Copy Markdown
@coderabbitai

coderabbitai Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5cdbf449-c86d-46ae-886a-65cc284533ad

📥 Commits

Reviewing files that changed from the base of the PR and between 047d59d and 1ab2383.

📒 Files selected for processing (2)
  • src/aiperf/api/routers/core.py
  • tests/unit/api/routers/test_core.py

Walkthrough

This PR adds a /api/run endpoint that returns RunInfo projected from the current BenchmarkRun (omitting unset fields), centralizes projection in RunInfo.from_run(), updates the metrics exporter to use it, and adds tests plus API documentation and navigation.

Changes

HTTP API Run Identity Endpoint

Layer / File(s) Summary
RunInfo data model and type safety
src/aiperf/common/models/export_models.py
Adds RunInfo.from_run() classmethod to construct run identity blocks from BenchmarkRun objects; introduces from __future__ import annotations, TYPE_CHECKING-guarded imports, and fixes a forward-reference return annotation.
GET /api/run endpoint implementation
src/aiperf/api/routers/core.py
Adds the /api/run FastAPI endpoint returning RunInfo (with response_model_exclude_none=True) or raising HTTP 503 when no active run exists; updates module and router docstrings and imports.
Exporter integration with RunInfo model
src/aiperf/exporters/metrics_json_exporter.py
Refactors MetricsJsonExporter._generate_content to use RunInfo.from_run(self._run) and removes the module-local _build_run_metadata helper.
Tests, API redaction verification, and documentation
tests/unit/api/routers/test_core.py, docs/index.yml, docs/reference/api-endpoints.md
Adds tests validating /api/run returns only identity fields, omits unset optional fields, returns 503 when no run is active, verifies CLI secret redaction (e.g., --api-key), and adds API reference doc plus navigation entry.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped to the endpoint at /api/run,

where identities bloom and secrets are done.
The CLI trims its whispers to <redacted> light,
Metrics and health hum day and night,
A little rabbit guards the API’s sight.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: adding a new /api/run endpoint that exposes run-identity metadata, which is the primary objective of the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 86.67% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/unit/api/routers/test_core.py (1)

40-97: ⚡ Quick win

Add a negative-path test for /api/run returning 503 when no run is active.

The router has an explicit 503 branch for missing run context, but this contract is not covered here yet. A single test would lock that behavior and prevent regressions.

Proposed test addition
 class TestRunEndpoint:
@@
+    def test_run_without_active_run_returns_503(
+        self,
+        api_test_client: TestClient,
+        mock_fastapi_service: FastAPIService,
+    ) -> None:
+        mock_fastapi_service.run = None
+        response = api_test_client.get("/api/run")
+        assert response.status_code == 503
+        assert response.json() == {"detail": "No active benchmark run."}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/api/routers/test_core.py` around lines 40 - 97, Add a
negative-path test to TestRunEndpoint that verifies GET /api/run returns 503
when no run is active: create a FastAPIService with no run context (e.g.,
FastAPIService(run=None, service_id="no-run-test")), wrap its app in TestClient,
call client.get("/api/run") and assert response.status_code == 503 and the body
indicates a missing run; place this alongside the other TestRunEndpoint tests to
lock the router's 503 behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/api/routers/test_core.py`:
- Around line 40-97: Add a negative-path test to TestRunEndpoint that verifies
GET /api/run returns 503 when no run is active: create a FastAPIService with no
run context (e.g., FastAPIService(run=None, service_id="no-run-test")), wrap its
app in TestClient, call client.get("/api/run") and assert response.status_code
== 503 and the body indicates a missing run; place this alongside the other
TestRunEndpoint tests to lock the router's 503 behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f684b964-faae-408d-a6cb-aeeb1d10fc69

📥 Commits

Reviewing files that changed from the base of the PR and between 03167d0 and 047d59d.

📒 Files selected for processing (6)
  • docs/index.yml
  • docs/reference/api-endpoints.md
  • src/aiperf/api/routers/core.py
  • src/aiperf/common/models/export_models.py
  • src/aiperf/exporters/metrics_json_exporter.py
  • tests/unit/api/routers/test_core.py
Comment thread src/aiperf/api/routers/core.py Outdated
@codecov

codecov Bot commented May 26, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Address review feedback from #997.

response_model_exclude_none=True on the /api/run decorator so the
live response matches profile_export_aiperf.json's exclude_none
projection. The route's docstring promised shape parity; without
this flag, unset Optional fields were serialized as null in the
API but omitted on disk.

Tests:
- assert unset Optional fields (sweep_id, random_seed, variation_*)
  are absent from the response, not serialized as null
- lock the 503 "no active benchmark run" contract by mutating
  mock_fastapi_service.run = None, defensively guarding a branch
  the type system makes unreachable today
Comment thread docs/reference/api-endpoints.md

@dynamo-ops dynamo-ops left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous review comments have been addressed. Approving.

@FrankD412 FrankD412 enabled auto-merge (squash) May 27, 2026 02:11
@FrankD412 FrankD412 merged commit ce24a54 into main May 27, 2026
29 of 30 checks passed
@FrankD412 FrankD412 deleted the mkotila/api-run-endpoint branch May 27, 2026 02:13

@dynamo-ops dynamo-ops left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous review comments have been addressed. Approving.

saturley-hall pushed a commit that referenced this pull request May 27, 2026
…se/0.9.0 (#1003)

Co-authored-by: Matthew Kotila <7692737+matthewkotila@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3 participants