fix(api): keep /api/results listener open after benchmark completes (DYN-701)#989
Conversation
Try out this PRQuick install: pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@ca265f243ea67bcf15dcb41ae3f8ede76a0b2c99Recommended with virtual environment (using uv): uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@ca265f243ea67bcf15dcb41ae3f8ede76a0b2c99Last updated for commit: |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a configurable post-benchmark grace period (env ChangesAPI Server Grace Window Feature
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
8b56760 to
1737be7
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/unit/api/test_api_service.py`:
- Around line 228-233: The type annotation for sleep_calls is wrong: change its
type from list[float] to list[tuple[float, bool]] (or list[tuple[float, bool]]
using typing.Tuple if preferred) so it matches how values are appended in
fake_sleep and later unpacked; update the declaration of sleep_calls near the
fake_sleep async function (which appends (seconds,
bool(mock_server.should_exit))) to use the tuple type.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3d8b0986-b62a-4dd5-89ad-a4457e0e6c74
📒 Files selected for processing (7)
docs/environment-variables.mdsrc/aiperf/api/api_service.pysrc/aiperf/common/environment.pysrc/aiperf/controller/system_controller.pytests/unit/api/test_api_service.pytests/unit/common/test_environment.pytests/unit/controller/test_system_controller.py
✅ Files skipped from review due to trivial changes (1)
- docs/environment-variables.md
- Fix sleep_calls type annotation in test_stop_holds_grace_window_before_should_exit (was list[float], now list[tuple[float, bool]] matching the actual append/unpack). - Add docstrings to the new POST_COMPLETE_GRACE env tests and the fake_serve/fake_sleep helpers so docstring coverage meets CodeRabbit's threshold. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Addresses review feedback on PR #989: - Skip the POST_COMPLETE_GRACE sleep when _server_task is missing or already done (startup failure, server-task crash via _on_server_task_done, or no-server unit-test path). Without the guard, _stop_api_server would burn the full grace window with no listener to keep open. - Raise the upper bound on POST_COMPLETE_GRACE from 60s to 300s per reviewer suggestion; 300s gives generous headroom for slow distributed pollers without being unbounded. - Refactor the two grace tests to use the project's time_traveler fixture (.sleeps_for(duration) context manager) instead of ad-hoc asyncio.sleep patching. Reuses the project's blessed virtual-time infrastructure and is less likely to fight with other sleep patches. - Add test_stop_skips_grace_when_server_task_done as a regression test for the active-server guard. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
…veness Per dynamo-ops bot review on PR #989: SystemController latched _api_enabled right after spawning the API process. If the API never registers (startup failure) or later transitions to FAILED/STOPPED, the extended delivery_grace would still apply — adding 5s+ of pointless shutdown wait for a listener that can't serve any client. Add SystemController._is_api_service_alive() that checks for at least one REGISTERED ServiceRunInfo of type API whose lifecycle state is not in the terminal {STOPPED, FAILED} set. Use it together with _api_enabled to gate the grace extension in _stop_system_controller. Tests: - Renamed the api-enabled tests to make the liveness precondition explicit (now register a RUNNING ServiceRunInfo in service_map). - Added three regression tests: never-registered (startup fail), FAILED state, and STOPPED state — all assert the wait stays at the 0.5s floor. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/unit/controller/test_system_controller.py (1)
515-577: ⚡ Quick winRename new tests to match the required test naming schema.
The newly added test names don’t follow the required
test_<function>_<scenario>_<expected>format.Proposed renames
-async def test_extends_to_grace_when_api_enabled_and_alive( +async def test_stop_system_controller_api_enabled_and_alive_uses_post_complete_grace( -async def test_floors_at_05s_when_api_alive_with_small_grace( +async def test_stop_system_controller_api_alive_with_small_grace_uses_05s_floor( -async def test_skips_grace_when_api_enabled_but_never_registered( +async def test_stop_system_controller_api_enabled_never_registered_uses_05s( -async def test_skips_grace_when_api_failed( +async def test_stop_system_controller_api_failed_uses_05s( -async def test_skips_grace_when_api_stopped( +async def test_stop_system_controller_api_stopped_uses_05s(As per coding guidelines:
tests/**/*.py: Name test functions astest_<function>_<scenario>_<expected>.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/controller/test_system_controller.py` around lines 515 - 577, Rename the tests to follow the test_<function>_<scenario>_<expected> naming convention: update test_extends_to_grace_when_api_enabled_and_alive to test_drive_stop_api_enabled_registered_running_extends_to_post_complete_grace, test_floors_at_05s_when_api_alive_with_small_grace to test_drive_stop_api_alive_small_grace_floors_0_5s, test_skips_grace_when_api_enabled_but_never_registered to test_drive_stop_api_enabled_never_registered_skips_grace, test_skips_grace_when_api_failed to test_drive_stop_api_registered_failed_skips_grace, and test_skips_grace_when_api_stopped to test_drive_stop_api_registered_stopped_skips_grace so they match the required test_<function>_<scenario>_<expected> schema; locate and rename each function by its current name in the diff (e.g., test_extends_to_grace_when_api_enabled_and_alive, test_floors_at_05s_when_api_alive_with_small_grace, etc.).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/unit/controller/test_system_controller.py`:
- Around line 515-577: Rename the tests to follow the
test_<function>_<scenario>_<expected> naming convention: update
test_extends_to_grace_when_api_enabled_and_alive to
test_drive_stop_api_enabled_registered_running_extends_to_post_complete_grace,
test_floors_at_05s_when_api_alive_with_small_grace to
test_drive_stop_api_alive_small_grace_floors_0_5s,
test_skips_grace_when_api_enabled_but_never_registered to
test_drive_stop_api_enabled_never_registered_skips_grace,
test_skips_grace_when_api_failed to
test_drive_stop_api_registered_failed_skips_grace, and
test_skips_grace_when_api_stopped to
test_drive_stop_api_registered_stopped_skips_grace so they match the required
test_<function>_<scenario>_<expected> schema; locate and rename each function by
its current name in the diff (e.g.,
test_extends_to_grace_when_api_enabled_and_alive,
test_floors_at_05s_when_api_alive_with_small_grace, etc.).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 56c6306d-7403-4dac-80f3-f70eeee178e0
📒 Files selected for processing (2)
src/aiperf/controller/system_controller.pytests/unit/controller/test_system_controller.py
…DYN-701) The API server's @on_stop hook set uvicorn.should_exit immediately upon receiving ShutdownCommand, so polling clients almost never observed the terminal status='complete' before connection-refused. Add a configurable post-completion grace window (AIPERF_API_SERVER_POST_COMPLETE_GRACE, default 5.0s) during which the listener stays open. SystemController extends its pre-shutdown wait by the same amount when the API is enabled so the API process survives long enough to honor the grace before the parent's per-process join-then-kill timeout fires. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
- Fix sleep_calls type annotation in test_stop_holds_grace_window_before_should_exit (was list[float], now list[tuple[float, bool]] matching the actual append/unpack). - Add docstrings to the new POST_COMPLETE_GRACE env tests and the fake_serve/fake_sleep helpers so docstring coverage meets CodeRabbit's threshold. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Addresses review feedback on PR #989: - Skip the POST_COMPLETE_GRACE sleep when _server_task is missing or already done (startup failure, server-task crash via _on_server_task_done, or no-server unit-test path). Without the guard, _stop_api_server would burn the full grace window with no listener to keep open. - Raise the upper bound on POST_COMPLETE_GRACE from 60s to 300s per reviewer suggestion; 300s gives generous headroom for slow distributed pollers without being unbounded. - Refactor the two grace tests to use the project's time_traveler fixture (.sleeps_for(duration) context manager) instead of ad-hoc asyncio.sleep patching. Reuses the project's blessed virtual-time infrastructure and is less likely to fight with other sleep patches. - Add test_stop_skips_grace_when_server_task_done as a regression test for the active-server guard. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
…veness Per dynamo-ops bot review on PR #989: SystemController latched _api_enabled right after spawning the API process. If the API never registers (startup failure) or later transitions to FAILED/STOPPED, the extended delivery_grace would still apply — adding 5s+ of pointless shutdown wait for a listener that can't serve any client. Add SystemController._is_api_service_alive() that checks for at least one REGISTERED ServiceRunInfo of type API whose lifecycle state is not in the terminal {STOPPED, FAILED} set. Use it together with _api_enabled to gate the grace extension in _stop_system_controller. Tests: - Renamed the api-enabled tests to make the liveness precondition explicit (now register a RUNNING ServiceRunInfo in service_map). - Added three regression tests: never-registered (startup fail), FAILED state, and STOPPED state — all assert the wait stays at the 0.5s floor. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
…alive Follow-up to 3e17497. BaseComponentService._on_state_change suppresses StatusMessage publishes once self.stop_requested is set, so the controller's service_map[ServiceType.API][*].state stays frozen at RUNNING after the API process self-stops, crashes, or fails - leaving _is_api_service_alive() returning True and the shutdown grace extended on a dead listener. On the multiprocess backend, cross-check process.is_alive() via service_manager.multi_process_info as the authoritative signal. Other backends (K8s manager is currently a stub) fall back to the prior registration/state check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
eaefd82 to
a806a0a
Compare
dynamo-ops
left a comment
There was a problem hiding this comment.
Previous review comments have been addressed. Approving.
… to release/0.9.0 (#1002) Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
Summary
/api/resultsnever observe terminalstatus='complete'because the API server is torn down within ~150-200ms of results arriving. The window is shorter than any practical polling cadence, socompleteis effectively unreachable.AIPERF_API_SERVER_POST_COMPLETE_GRACE(default 5.0s, range 0.0-60.0) controls how long_stop_api_serverkeeps uvicorn accepting connections before flippingshould_exit = True.SystemController._stop_system_controllerextends its pre-shutdown wait tomax(0.5s, GRACE)when the API is enabled so the API child isn't SIGKILL'd mid-grace by the 2s per-process join timeout. Setting the env var to0recovers the old behavior./api/resultsendpoint itself was added later in PRs feat(api): add FastAPI service with plugin-based router system #717/feat(api): add progress, results, and workers routers with tests #721. This is a long-latent design mismatch surfaced by the API, not a regression.Test plan
_stop_api_servergrace ordering (2 new, asserts sleep happens beforeshould_exit=True),SystemControllerwait extension (3 new — api-disabled keeps 0.5s, api-enabled stretches to grace, floor at 0.5s preserved).tests/unit/suite passes (12,732 passing; one unrelated pre-existingmlflow_data_exporterflake reproducible on cleanmain).tests/aiperf_mock_server:GRACE=5s(default):running ... completeobserved for ~5.1s, then connection refused. 31 successfulcompleteresponses across the grace window with 100ms polling.GRACE=0(env override = recovers buggy behavior):completewindow collapses to ~169ms; only 1 observation caught. Confirms the env knob cleanly toggles back.--api-portset against a real LLM endpoint to confirm benchmark wall-clock now includes the expected ~5s shutdown tail.release/0.9.0after merge.🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Behavior
Documentation
Tests