fix(api): keep /api/results listener open after benchmark completes (DYN-701) by FrankD412 · Pull Request #989 · ai-dynamo/aiperf

FrankD412 · 2026-05-22T22:55:25Z

Summary

Bug: NVBug 6204962 / DYN-701 — clients polling /api/results never observe terminal status='complete' because the API server is torn down within ~150-200ms of results arriving. The window is shorter than any practical polling cadence, so complete is effectively unreachable.
Fix: API service now owns its shutdown grace. New env var AIPERF_API_SERVER_POST_COMPLETE_GRACE (default 5.0s, range 0.0-60.0) controls how long _stop_api_server keeps uvicorn accepting connections before flipping should_exit = True. SystemController._stop_system_controller extends its pre-shutdown wait to max(0.5s, GRACE) when the API is enabled so the API child isn't SIGKILL'd mid-grace by the 2s per-process join timeout. Setting the env var to 0 recovers the old behavior.
Attribution correction: the reporter's "AI suggestion" pointed at PR feat: YAML-native v2 config + adaptive sweep orchestrator with BO & search recipes #912 / commit 94a9102, but the immediate-stop pattern actually goes back to PR feat: GPU Telemetry #335 (GPU Telemetry), refactored by PR feat: Complete Timing System Overhaul and Sticky Credit Router #565 (Timing Overhaul). The /api/results endpoint itself was added later in PRs feat(api): add FastAPI service with plugin-based router system #717/feat(api): add progress, results, and workers routers with tests #721. This is a long-latent design mismatch surfaced by the API, not a regression.

Test plan

Unit tests: env-var bounds (4 new), _stop_api_server grace ordering (2 new, asserts sleep happens before should_exit=True), SystemController wait extension (3 new — api-disabled keeps 0.5s, api-enabled stretches to grace, floor at 0.5s preserved).
Full tests/unit/ suite passes (12,732 passing; one unrelated pre-existing mlflow_data_exporter flake reproducible on clean main).
End-to-end verification against the NVBug's exact repro steps using tests/aiperf_mock_server:
- GRACE=5s (default): running ... complete observed for ~5.1s, then connection refused. 31 successful complete responses across the grace window with 100ms polling.
- GRACE=0 (env override = recovers buggy behavior): complete window collapses to ~169ms; only 1 observation caught. Confirms the env knob cleanly toggles back.
Reviewer suggestion: smoke-test with --api-port set against a real LLM endpoint to confirm benchmark wall-clock now includes the expected ~5s shutdown tail.
Cherry-pick to release/0.9.0 after merge.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added a configurable post-complete API listener grace period (default 5.0s, range 0–300s).
Behavior
- Shutdown sequencing now honors the configured grace period; post-broadcast delivery wait is floored to 0.5s when applicable and skipped in specified skip conditions.
Documentation
- Documented the environment variable, default, allowed range, and that 0 disables the grace window.
Tests
- Added unit tests covering grace behavior, skip conditions, validation, and controller shutdown timing.

copy-pr-bot · 2026-05-22T22:55:28Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2026-05-22T22:55:33Z

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@ca265f243ea67bcf15dcb41ae3f8ede76a0b2c99

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@ca265f243ea67bcf15dcb41ae3f8ede76a0b2c99

Last updated for commit: ca265f2 • Browse code

github-actions · 2026-05-22T22:56:25Z

Fern Docs Preview: https://nvidia-preview-f36471d0-5d0f-4c2a-836e-6a85db17305a.docs.buildwithfern.com/aiperf/dev

coderabbitai · 2026-05-22T22:59:05Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Adds a configurable post-benchmark grace period (env AIPERF_API_SERVER_POST_COMPLETE_GRACE) that keeps the API listener open for up to the configured seconds so polling clients can observe final status; integrates config, API shutdown hold, controller coordination, tests, and documentation.

Changes

API Server Grace Window Feature

Layer / File(s)	Summary
Configuration & Validation `src/aiperf/common/environment.py`, `tests/unit/common/test_environment.py`	`_APIServerSettings.POST_COMPLETE_GRACE` is added with range validation (0.0–300.0 seconds, default 5.0). Tests cover default, environment-variable override, zero allowance, and out-of-range rejection.
API Server Grace Window Implementation `src/aiperf/api/api_service.py`, `tests/unit/api/test_api_service.py`	`FastAPIService._stop_api_server` reads `POST_COMPLETE_GRACE`, logs when >0, awaits `asyncio.sleep(grace)` to hold the listener open before shutdown. Tests verify non-zero grace wait, zero fast-path, and skipping when server task already completed.
System Controller Shutdown Coordination `src/aiperf/controller/system_controller.py`, `tests/unit/controller/test_system_controller.py`	`SystemController` adds `_api_enabled` and `_is_api_service_alive()` and computes `delivery_grace` during shutdown as `max(0.5, POST_COMPLETE_GRACE)` when API enabled and alive; otherwise uses 0.5s. Tests check disabled/enabled, minimum floor, and skip conditions.
User Documentation `docs/environment-variables.md`	Documented `AIPERF_API_SERVER_POST_COMPLETE_GRACE` with default, allowed range, and behavior (0 = immediate shutdown).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 Five seconds more I softly stay,
a listening ear for callers' say.
A patient pause before I close,
then gentle threads unbind their bows.
Hooray — the graceful hop, repose.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title directly and accurately summarizes the main change: implementing a grace period to keep the API results listener open after benchmark completion.
Docstring Coverage	✅ Passed	Docstring coverage is 86.21% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unit/api/test_api_service.py`:
- Around line 228-233: The type annotation for sleep_calls is wrong: change its
type from list[float] to list[tuple[float, bool]] (or list[tuple[float, bool]]
using typing.Tuple if preferred) so it matches how values are appended in
fake_sleep and later unpacked; update the declaration of sleep_calls near the
fake_sleep async function (which appends (seconds,
bool(mock_server.should_exit))) to use the tuple type.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3d8b0986-b62a-4dd5-89ad-a4457e0e6c74

📥 Commits

Reviewing files that changed from the base of the PR and between 8b56760 and 1737be7.

📒 Files selected for processing (7)

docs/environment-variables.md
src/aiperf/api/api_service.py
src/aiperf/common/environment.py
src/aiperf/controller/system_controller.py
tests/unit/api/test_api_service.py
tests/unit/common/test_environment.py
tests/unit/controller/test_system_controller.py

✅ Files skipped from review due to trivial changes (1)

docs/environment-variables.md

- Fix sleep_calls type annotation in test_stop_holds_grace_window_before_should_exit (was list[float], now list[tuple[float, bool]] matching the actual append/unpack). - Add docstrings to the new POST_COMPLETE_GRACE env tests and the fake_serve/fake_sleep helpers so docstring coverage meets CodeRabbit's threshold. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

codecov · 2026-05-23T04:24:17Z

Codecov Report

❌ Patch coverage is 95.45455% with 1 line in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/aiperf/controller/system_controller.py	93.75%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

Addresses review feedback on PR #989: - Skip the POST_COMPLETE_GRACE sleep when _server_task is missing or already done (startup failure, server-task crash via _on_server_task_done, or no-server unit-test path). Without the guard, _stop_api_server would burn the full grace window with no listener to keep open. - Raise the upper bound on POST_COMPLETE_GRACE from 60s to 300s per reviewer suggestion; 300s gives generous headroom for slow distributed pollers without being unbounded. - Refactor the two grace tests to use the project's time_traveler fixture (.sleeps_for(duration) context manager) instead of ad-hoc asyncio.sleep patching. Reuses the project's blessed virtual-time infrastructure and is less likely to fight with other sleep patches. - Add test_stop_skips_grace_when_server_task_done as a regression test for the active-server guard. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

…veness Per dynamo-ops bot review on PR #989: SystemController latched _api_enabled right after spawning the API process. If the API never registers (startup failure) or later transitions to FAILED/STOPPED, the extended delivery_grace would still apply — adding 5s+ of pointless shutdown wait for a listener that can't serve any client. Add SystemController._is_api_service_alive() that checks for at least one REGISTERED ServiceRunInfo of type API whose lifecycle state is not in the terminal {STOPPED, FAILED} set. Use it together with _api_enabled to gate the grace extension in _stop_system_controller. Tests: - Renamed the api-enabled tests to make the liveness precondition explicit (now register a RUNNING ServiceRunInfo in service_map). - Added three regression tests: never-registered (startup fail), FAILED state, and STOPPED state — all assert the wait stays at the 0.5s floor. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

coderabbitai

🧹 Nitpick comments (1)

tests/unit/controller/test_system_controller.py (1)

515-577: ⚡ Quick win

Rename new tests to match the required test naming schema.

The newly added test names don’t follow the required test_<function>_<scenario>_<expected> format.

Proposed renames

-async def test_extends_to_grace_when_api_enabled_and_alive(
+async def test_stop_system_controller_api_enabled_and_alive_uses_post_complete_grace(

-async def test_floors_at_05s_when_api_alive_with_small_grace(
+async def test_stop_system_controller_api_alive_with_small_grace_uses_05s_floor(

-async def test_skips_grace_when_api_enabled_but_never_registered(
+async def test_stop_system_controller_api_enabled_never_registered_uses_05s(

-async def test_skips_grace_when_api_failed(
+async def test_stop_system_controller_api_failed_uses_05s(

-async def test_skips_grace_when_api_stopped(
+async def test_stop_system_controller_api_stopped_uses_05s(

As per coding guidelines: tests/**/*.py: Name test functions as test_<function>_<scenario>_<expected>.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/controller/test_system_controller.py` around lines 515 - 577,
Rename the tests to follow the test_<function>_<scenario>_<expected> naming
convention: update test_extends_to_grace_when_api_enabled_and_alive to
test_drive_stop_api_enabled_registered_running_extends_to_post_complete_grace,
test_floors_at_05s_when_api_alive_with_small_grace to
test_drive_stop_api_alive_small_grace_floors_0_5s,
test_skips_grace_when_api_enabled_but_never_registered to
test_drive_stop_api_enabled_never_registered_skips_grace,
test_skips_grace_when_api_failed to
test_drive_stop_api_registered_failed_skips_grace, and
test_skips_grace_when_api_stopped to
test_drive_stop_api_registered_stopped_skips_grace so they match the required
test_<function>_<scenario>_<expected> schema; locate and rename each function by
its current name in the diff (e.g.,
test_extends_to_grace_when_api_enabled_and_alive,
test_floors_at_05s_when_api_alive_with_small_grace, etc.).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/controller/test_system_controller.py`:
- Around line 515-577: Rename the tests to follow the
test_<function>_<scenario>_<expected> naming convention: update
test_extends_to_grace_when_api_enabled_and_alive to
test_drive_stop_api_enabled_registered_running_extends_to_post_complete_grace,
test_floors_at_05s_when_api_alive_with_small_grace to
test_drive_stop_api_alive_small_grace_floors_0_5s,
test_skips_grace_when_api_enabled_but_never_registered to
test_drive_stop_api_enabled_never_registered_skips_grace,
test_skips_grace_when_api_failed to
test_drive_stop_api_registered_failed_skips_grace, and
test_skips_grace_when_api_stopped to
test_drive_stop_api_registered_stopped_skips_grace so they match the required
test_<function>_<scenario>_<expected> schema; locate and rename each function by
its current name in the diff (e.g.,
test_extends_to_grace_when_api_enabled_and_alive,
test_floors_at_05s_when_api_alive_with_small_grace, etc.).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 56c6306d-7403-4dac-80f3-f70eeee178e0

📥 Commits

Reviewing files that changed from the base of the PR and between f3646a1 and 3e17497.

📒 Files selected for processing (2)

src/aiperf/controller/system_controller.py
tests/unit/controller/test_system_controller.py

…DYN-701) The API server's @on_stop hook set uvicorn.should_exit immediately upon receiving ShutdownCommand, so polling clients almost never observed the terminal status='complete' before connection-refused. Add a configurable post-completion grace window (AIPERF_API_SERVER_POST_COMPLETE_GRACE, default 5.0s) during which the listener stays open. SystemController extends its pre-shutdown wait by the same amount when the API is enabled so the API process survives long enough to honor the grace before the parent's per-process join-then-kill timeout fires. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

- Fix sleep_calls type annotation in test_stop_holds_grace_window_before_should_exit (was list[float], now list[tuple[float, bool]] matching the actual append/unpack). - Add docstrings to the new POST_COMPLETE_GRACE env tests and the fake_serve/fake_sleep helpers so docstring coverage meets CodeRabbit's threshold. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

Addresses review feedback on PR #989: - Skip the POST_COMPLETE_GRACE sleep when _server_task is missing or already done (startup failure, server-task crash via _on_server_task_done, or no-server unit-test path). Without the guard, _stop_api_server would burn the full grace window with no listener to keep open. - Raise the upper bound on POST_COMPLETE_GRACE from 60s to 300s per reviewer suggestion; 300s gives generous headroom for slow distributed pollers without being unbounded. - Refactor the two grace tests to use the project's time_traveler fixture (.sleeps_for(duration) context manager) instead of ad-hoc asyncio.sleep patching. Reuses the project's blessed virtual-time infrastructure and is less likely to fight with other sleep patches. - Add test_stop_skips_grace_when_server_task_done as a regression test for the active-server guard. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

…veness Per dynamo-ops bot review on PR #989: SystemController latched _api_enabled right after spawning the API process. If the API never registers (startup failure) or later transitions to FAILED/STOPPED, the extended delivery_grace would still apply — adding 5s+ of pointless shutdown wait for a listener that can't serve any client. Add SystemController._is_api_service_alive() that checks for at least one REGISTERED ServiceRunInfo of type API whose lifecycle state is not in the terminal {STOPPED, FAILED} set. Use it together with _api_enabled to gate the grace extension in _stop_system_controller. Tests: - Renamed the api-enabled tests to make the liveness precondition explicit (now register a RUNNING ServiceRunInfo in service_map). - Added three regression tests: never-registered (startup fail), FAILED state, and STOPPED state — all assert the wait stays at the 0.5s floor. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

…alive Follow-up to 3e17497. BaseComponentService._on_state_change suppresses StatusMessage publishes once self.stop_requested is set, so the controller's service_map[ServiceType.API][*].state stays frozen at RUNNING after the API process self-stops, crashes, or fails - leaving _is_api_service_alive() returning True and the shutdown grace extended on a dead listener. On the multiprocess backend, cross-check process.is_alive() via service_manager.multi_process_info as the authoritative signal. Other backends (K8s manager is currently a stub) fall back to the prior registration/state check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

dynamo-ops

Previous review comments have been addressed. Approving.

… to release/0.9.0 (#1002) Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>

github-actions Bot added the fix label May 22, 2026

FrankD412 requested review from ajcasagrande and debermudez May 22, 2026 22:57

FrankD412 force-pushed the fdinatale/api-shutdown-grace branch from 8b56760 to 1737be7 Compare May 22, 2026 23:04

coderabbitai Bot reviewed May 22, 2026

View reviewed changes

Comment thread tests/unit/api/test_api_service.py Outdated

dynamo-ops reviewed May 22, 2026

View reviewed changes

Comment thread src/aiperf/api/api_service.py Outdated

ajcasagrande reviewed May 22, 2026

View reviewed changes

Comment thread docs/environment-variables.md Outdated

Comment thread src/aiperf/api/api_service.py Outdated

Comment thread tests/unit/api/test_api_service.py Outdated

FrankD412 requested review from ajcasagrande and dynamo-ops May 23, 2026 06:41

dynamo-ops reviewed May 23, 2026

View reviewed changes

Comment thread src/aiperf/controller/system_controller.py Outdated

coderabbitai Bot reviewed May 23, 2026

View reviewed changes

dynamo-ops reviewed May 23, 2026

View reviewed changes

Comment thread src/aiperf/controller/system_controller.py

FrankD412 and others added 5 commits May 26, 2026 16:49

FrankD412 force-pushed the fdinatale/api-shutdown-grace branch from eaefd82 to a806a0a Compare May 26, 2026 23:51

dynamo-ops approved these changes May 26, 2026

View reviewed changes

debermudez approved these changes May 27, 2026

View reviewed changes

Merge branch 'main' into fdinatale/api-shutdown-grace

ca265f2

FrankD412 enabled auto-merge (squash) May 27, 2026 01:10

FrankD412 merged commit 607977c into main May 27, 2026
24 checks passed

FrankD412 deleted the fdinatale/api-shutdown-grace branch May 27, 2026 01:12

debermudez mentioned this pull request May 27, 2026

fix(api): cherry-pick #989 (keep /api/results listener open, DYN-701) to release/0.9.0 #1002

Merged

2 tasks

saturley-hall pushed a commit that referenced this pull request May 27, 2026

fix(api): cherry-pick #989 (keep /api/results listener open, DYN-701)…

45213c3

… to release/0.9.0 (#1002) Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(api): keep /api/results listener open after benchmark completes (DYN-701)#989

fix(api): keep /api/results listener open after benchmark completes (DYN-701)#989
FrankD412 merged 6 commits into
mainfrom
fdinatale/api-shutdown-grace

FrankD412 commented May 22, 2026 •

edited by coderabbitai Bot

Loading

copy-pr-bot Bot commented May 22, 2026

github-actions Bot commented May 22, 2026 •

edited

Loading

github-actions Bot commented May 22, 2026 •

edited

Loading

coderabbitai Bot commented May 22, 2026 •

edited

Loading

Reviews paused

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented May 23, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

dynamo-ops left a comment

Uh oh!

Labels

4 participants

Uh oh!

Conversation

FrankD412 commented May 22, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

copy-pr-bot Bot commented May 22, 2026

github-actions Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Try out this PR

github-actions Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

coderabbitai Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Poem

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

dynamo-ops left a comment

Choose a reason for hiding this comment

Uh oh!

Labels

4 participants

FrankD412 commented May 22, 2026 •

edited by coderabbitai Bot

Loading

github-actions Bot commented May 22, 2026 •

edited

Loading

github-actions Bot commented May 22, 2026 •

edited

Loading

coderabbitai Bot commented May 22, 2026 •

edited

Loading

codecov Bot commented May 23, 2026 •

edited

Loading