Skip to content

fix(api): keep /api/results listener open after benchmark completes (DYN-701)#989

Merged
FrankD412 merged 6 commits into
mainfrom
fdinatale/api-shutdown-grace
May 27, 2026
Merged

fix(api): keep /api/results listener open after benchmark completes (DYN-701)#989
FrankD412 merged 6 commits into
mainfrom
fdinatale/api-shutdown-grace

Conversation

@FrankD412

@FrankD412 FrankD412 commented May 22, 2026

Copy link
Copy Markdown
Contributor

Summary

Test plan

  • Unit tests: env-var bounds (4 new), _stop_api_server grace ordering (2 new, asserts sleep happens before should_exit=True), SystemController wait extension (3 new — api-disabled keeps 0.5s, api-enabled stretches to grace, floor at 0.5s preserved).
  • Full tests/unit/ suite passes (12,732 passing; one unrelated pre-existing mlflow_data_exporter flake reproducible on clean main).
  • End-to-end verification against the NVBug's exact repro steps using tests/aiperf_mock_server:
    • GRACE=5s (default): running ... complete observed for ~5.1s, then connection refused. 31 successful complete responses across the grace window with 100ms polling.
    • GRACE=0 (env override = recovers buggy behavior): complete window collapses to ~169ms; only 1 observation caught. Confirms the env knob cleanly toggles back.
  • Reviewer suggestion: smoke-test with --api-port set against a real LLM endpoint to confirm benchmark wall-clock now includes the expected ~5s shutdown tail.
  • Cherry-pick to release/0.9.0 after merge.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added a configurable post-complete API listener grace period (default 5.0s, range 0–300s).
  • Behavior

    • Shutdown sequencing now honors the configured grace period; post-broadcast delivery wait is floored to 0.5s when applicable and skipped in specified skip conditions.
  • Documentation

    • Documented the environment variable, default, allowed range, and that 0 disables the grace window.
  • Tests

    • Added unit tests covering grace behavior, skip conditions, validation, and controller shutdown timing.

Review Change Stack

@copy-pr-bot

copy-pr-bot Bot commented May 22, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

github-actions Bot commented May 22, 2026

Copy link
Copy Markdown

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@ca265f243ea67bcf15dcb41ae3f8ede76a0b2c99

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@ca265f243ea67bcf15dcb41ae3f8ede76a0b2c99

Last updated for commit: ca265f2Browse code

@github-actions github-actions Bot added the fix label May 22, 2026
@github-actions

github-actions Bot commented May 22, 2026

Copy link
Copy Markdown
@coderabbitai

coderabbitai Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a configurable post-benchmark grace period (env AIPERF_API_SERVER_POST_COMPLETE_GRACE) that keeps the API listener open for up to the configured seconds so polling clients can observe final status; integrates config, API shutdown hold, controller coordination, tests, and documentation.

Changes

API Server Grace Window Feature

Layer / File(s) Summary
Configuration & Validation
src/aiperf/common/environment.py, tests/unit/common/test_environment.py
_APIServerSettings.POST_COMPLETE_GRACE is added with range validation (0.0–300.0 seconds, default 5.0). Tests cover default, environment-variable override, zero allowance, and out-of-range rejection.
API Server Grace Window Implementation
src/aiperf/api/api_service.py, tests/unit/api/test_api_service.py
FastAPIService._stop_api_server reads POST_COMPLETE_GRACE, logs when >0, awaits asyncio.sleep(grace) to hold the listener open before shutdown. Tests verify non-zero grace wait, zero fast-path, and skipping when server task already completed.
System Controller Shutdown Coordination
src/aiperf/controller/system_controller.py, tests/unit/controller/test_system_controller.py
SystemController adds _api_enabled and _is_api_service_alive() and computes delivery_grace during shutdown as max(0.5, POST_COMPLETE_GRACE) when API enabled and alive; otherwise uses 0.5s. Tests check disabled/enabled, minimum floor, and skip conditions.
User Documentation
docs/environment-variables.md
Documented AIPERF_API_SERVER_POST_COMPLETE_GRACE with default, allowed range, and behavior (0 = immediate shutdown).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Poem

🐇 Five seconds more I softly stay,
a listening ear for callers' say.
A patient pause before I close,
then gentle threads unbind their bows.
Hooray — the graceful hop, repose.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title directly and accurately summarizes the main change: implementing a grace period to keep the API results listener open after benchmark completion.
Docstring Coverage ✅ Passed Docstring coverage is 86.21% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@FrankD412 FrankD412 force-pushed the fdinatale/api-shutdown-grace branch from 8b56760 to 1737be7 Compare May 22, 2026 23:04

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unit/api/test_api_service.py`:
- Around line 228-233: The type annotation for sleep_calls is wrong: change its
type from list[float] to list[tuple[float, bool]] (or list[tuple[float, bool]]
using typing.Tuple if preferred) so it matches how values are appended in
fake_sleep and later unpacked; update the declaration of sleep_calls near the
fake_sleep async function (which appends (seconds,
bool(mock_server.should_exit))) to use the tuple type.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3d8b0986-b62a-4dd5-89ad-a4457e0e6c74

📥 Commits

Reviewing files that changed from the base of the PR and between 8b56760 and 1737be7.

📒 Files selected for processing (7)
  • docs/environment-variables.md
  • src/aiperf/api/api_service.py
  • src/aiperf/common/environment.py
  • src/aiperf/controller/system_controller.py
  • tests/unit/api/test_api_service.py
  • tests/unit/common/test_environment.py
  • tests/unit/controller/test_system_controller.py
✅ Files skipped from review due to trivial changes (1)
  • docs/environment-variables.md
Comment thread tests/unit/api/test_api_service.py Outdated
Comment thread src/aiperf/api/api_service.py Outdated
FrankD412 added a commit that referenced this pull request May 22, 2026
- Fix sleep_calls type annotation in test_stop_holds_grace_window_before_should_exit
  (was list[float], now list[tuple[float, bool]] matching the actual append/unpack).
- Add docstrings to the new POST_COMPLETE_GRACE env tests and the fake_serve/fake_sleep
  helpers so docstring coverage meets CodeRabbit's threshold.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Comment thread docs/environment-variables.md Outdated
Comment thread src/aiperf/api/api_service.py Outdated
Comment thread tests/unit/api/test_api_service.py Outdated
@codecov

codecov Bot commented May 23, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 95.45455% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/aiperf/controller/system_controller.py 93.75% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

FrankD412 added a commit that referenced this pull request May 23, 2026
Addresses review feedback on PR #989:

- Skip the POST_COMPLETE_GRACE sleep when _server_task is missing or
  already done (startup failure, server-task crash via _on_server_task_done,
  or no-server unit-test path). Without the guard, _stop_api_server would
  burn the full grace window with no listener to keep open.
- Raise the upper bound on POST_COMPLETE_GRACE from 60s to 300s per
  reviewer suggestion; 300s gives generous headroom for slow distributed
  pollers without being unbounded.
- Refactor the two grace tests to use the project's time_traveler
  fixture (.sleeps_for(duration) context manager) instead of ad-hoc
  asyncio.sleep patching. Reuses the project's blessed virtual-time
  infrastructure and is less likely to fight with other sleep patches.
- Add test_stop_skips_grace_when_server_task_done as a regression test
  for the active-server guard.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Comment thread src/aiperf/controller/system_controller.py Outdated
FrankD412 added a commit that referenced this pull request May 23, 2026
…veness

Per dynamo-ops bot review on PR #989: SystemController latched _api_enabled
right after spawning the API process. If the API never registers (startup
failure) or later transitions to FAILED/STOPPED, the extended delivery_grace
would still apply — adding 5s+ of pointless shutdown wait for a listener
that can't serve any client.

Add SystemController._is_api_service_alive() that checks for at least one
REGISTERED ServiceRunInfo of type API whose lifecycle state is not in the
terminal {STOPPED, FAILED} set. Use it together with _api_enabled to gate
the grace extension in _stop_system_controller.

Tests:
- Renamed the api-enabled tests to make the liveness precondition explicit
  (now register a RUNNING ServiceRunInfo in service_map).
- Added three regression tests: never-registered (startup fail), FAILED
  state, and STOPPED state — all assert the wait stays at the 0.5s floor.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/unit/controller/test_system_controller.py (1)

515-577: ⚡ Quick win

Rename new tests to match the required test naming schema.

The newly added test names don’t follow the required test_<function>_<scenario>_<expected> format.

Proposed renames
-async def test_extends_to_grace_when_api_enabled_and_alive(
+async def test_stop_system_controller_api_enabled_and_alive_uses_post_complete_grace(

-async def test_floors_at_05s_when_api_alive_with_small_grace(
+async def test_stop_system_controller_api_alive_with_small_grace_uses_05s_floor(

-async def test_skips_grace_when_api_enabled_but_never_registered(
+async def test_stop_system_controller_api_enabled_never_registered_uses_05s(

-async def test_skips_grace_when_api_failed(
+async def test_stop_system_controller_api_failed_uses_05s(

-async def test_skips_grace_when_api_stopped(
+async def test_stop_system_controller_api_stopped_uses_05s(

As per coding guidelines: tests/**/*.py: Name test functions as test_<function>_<scenario>_<expected>.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/controller/test_system_controller.py` around lines 515 - 577,
Rename the tests to follow the test_<function>_<scenario>_<expected> naming
convention: update test_extends_to_grace_when_api_enabled_and_alive to
test_drive_stop_api_enabled_registered_running_extends_to_post_complete_grace,
test_floors_at_05s_when_api_alive_with_small_grace to
test_drive_stop_api_alive_small_grace_floors_0_5s,
test_skips_grace_when_api_enabled_but_never_registered to
test_drive_stop_api_enabled_never_registered_skips_grace,
test_skips_grace_when_api_failed to
test_drive_stop_api_registered_failed_skips_grace, and
test_skips_grace_when_api_stopped to
test_drive_stop_api_registered_stopped_skips_grace so they match the required
test_<function>_<scenario>_<expected> schema; locate and rename each function by
its current name in the diff (e.g.,
test_extends_to_grace_when_api_enabled_and_alive,
test_floors_at_05s_when_api_alive_with_small_grace, etc.).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/controller/test_system_controller.py`:
- Around line 515-577: Rename the tests to follow the
test_<function>_<scenario>_<expected> naming convention: update
test_extends_to_grace_when_api_enabled_and_alive to
test_drive_stop_api_enabled_registered_running_extends_to_post_complete_grace,
test_floors_at_05s_when_api_alive_with_small_grace to
test_drive_stop_api_alive_small_grace_floors_0_5s,
test_skips_grace_when_api_enabled_but_never_registered to
test_drive_stop_api_enabled_never_registered_skips_grace,
test_skips_grace_when_api_failed to
test_drive_stop_api_registered_failed_skips_grace, and
test_skips_grace_when_api_stopped to
test_drive_stop_api_registered_stopped_skips_grace so they match the required
test_<function>_<scenario>_<expected> schema; locate and rename each function by
its current name in the diff (e.g.,
test_extends_to_grace_when_api_enabled_and_alive,
test_floors_at_05s_when_api_alive_with_small_grace, etc.).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 56c6306d-7403-4dac-80f3-f70eeee178e0

📥 Commits

Reviewing files that changed from the base of the PR and between f3646a1 and 3e17497.

📒 Files selected for processing (2)
  • src/aiperf/controller/system_controller.py
  • tests/unit/controller/test_system_controller.py
Comment thread src/aiperf/controller/system_controller.py
FrankD412 and others added 5 commits May 26, 2026 16:49
…DYN-701)

The API server's @on_stop hook set uvicorn.should_exit immediately upon
receiving ShutdownCommand, so polling clients almost never observed the
terminal status='complete' before connection-refused. Add a configurable
post-completion grace window (AIPERF_API_SERVER_POST_COMPLETE_GRACE,
default 5.0s) during which the listener stays open. SystemController
extends its pre-shutdown wait by the same amount when the API is enabled
so the API process survives long enough to honor the grace before the
parent's per-process join-then-kill timeout fires.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
- Fix sleep_calls type annotation in test_stop_holds_grace_window_before_should_exit
  (was list[float], now list[tuple[float, bool]] matching the actual append/unpack).
- Add docstrings to the new POST_COMPLETE_GRACE env tests and the fake_serve/fake_sleep
  helpers so docstring coverage meets CodeRabbit's threshold.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Addresses review feedback on PR #989:

- Skip the POST_COMPLETE_GRACE sleep when _server_task is missing or
  already done (startup failure, server-task crash via _on_server_task_done,
  or no-server unit-test path). Without the guard, _stop_api_server would
  burn the full grace window with no listener to keep open.
- Raise the upper bound on POST_COMPLETE_GRACE from 60s to 300s per
  reviewer suggestion; 300s gives generous headroom for slow distributed
  pollers without being unbounded.
- Refactor the two grace tests to use the project's time_traveler
  fixture (.sleeps_for(duration) context manager) instead of ad-hoc
  asyncio.sleep patching. Reuses the project's blessed virtual-time
  infrastructure and is less likely to fight with other sleep patches.
- Add test_stop_skips_grace_when_server_task_done as a regression test
  for the active-server guard.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
…veness

Per dynamo-ops bot review on PR #989: SystemController latched _api_enabled
right after spawning the API process. If the API never registers (startup
failure) or later transitions to FAILED/STOPPED, the extended delivery_grace
would still apply — adding 5s+ of pointless shutdown wait for a listener
that can't serve any client.

Add SystemController._is_api_service_alive() that checks for at least one
REGISTERED ServiceRunInfo of type API whose lifecycle state is not in the
terminal {STOPPED, FAILED} set. Use it together with _api_enabled to gate
the grace extension in _stop_system_controller.

Tests:
- Renamed the api-enabled tests to make the liveness precondition explicit
  (now register a RUNNING ServiceRunInfo in service_map).
- Added three regression tests: never-registered (startup fail), FAILED
  state, and STOPPED state — all assert the wait stays at the 0.5s floor.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
…alive

Follow-up to 3e17497. BaseComponentService._on_state_change suppresses
StatusMessage publishes once self.stop_requested is set, so the controller's
service_map[ServiceType.API][*].state stays frozen at RUNNING after the API
process self-stops, crashes, or fails - leaving _is_api_service_alive()
returning True and the shutdown grace extended on a dead listener.

On the multiprocess backend, cross-check process.is_alive() via
service_manager.multi_process_info as the authoritative signal. Other
backends (K8s manager is currently a stub) fall back to the prior
registration/state check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
@FrankD412 FrankD412 force-pushed the fdinatale/api-shutdown-grace branch from eaefd82 to a806a0a Compare May 26, 2026 23:51

@dynamo-ops dynamo-ops left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous review comments have been addressed. Approving.

@FrankD412 FrankD412 enabled auto-merge (squash) May 27, 2026 01:10
@FrankD412 FrankD412 merged commit 607977c into main May 27, 2026
24 checks passed
@FrankD412 FrankD412 deleted the fdinatale/api-shutdown-grace branch May 27, 2026 01:12
saturley-hall pushed a commit that referenced this pull request May 27, 2026
… to release/0.9.0 (#1002)

Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

4 participants