feat: add cleanup=True support to AsyncSandboxClient#859
Conversation
✅ Deploy Preview for agent-sandbox canceled.
|
|
Hi @RidPra. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Pull request overview
Adds cleanup=True support to the Python AsyncSandboxClient to mirror the synchronous client’s optional atexit-based cleanup behavior, aiming to reduce orphaned sandbox infrastructure when callers forget explicit teardown.
Changes:
- Add
cleanupparameter toAsyncSandboxClientand register anatexithook to rundelete_all()on interpreter exit. - Update
AsyncSandboxClientdocstring/constructor docs to describe the new cleanup behavior. - Add unit tests asserting
atexit.register()is (or is not) called based oncleanup.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
clients/python/agentic-sandbox-client/k8s_agent_sandbox/async_sandbox_client.py |
Introduces cleanup option and atexit hook; updates documentation. |
clients/python/agentic-sandbox-client/k8s_agent_sandbox/test/unit/test_async_sandboxclient.py |
Adds unit tests around atexit registration behavior. |
|
|
||
| Uses a snapshot of the tracked claims and a fresh :class:`AsyncK8sHelper` | ||
| so that no loop-bound objects from the original client are reused across | ||
| event loop boundaries. All errors are silently suppressed — atexit |
There was a problem hiding this comment.
Should we retry before silently failing and logging if the cleanup isn't successful?
There was a problem hiding this comment.
Thanks @SHRUTI6991 for the review, made the change as suggested
barney-s
left a comment
There was a problem hiding this comment.
Thanks for properly using a fresh AsyncK8sHelper in a new event loop via asyncio.run to handle cleanup safely after the main loop has exited.
| if not claims: | ||
| return | ||
|
|
||
| async def _do_cleanup(): |
There was a problem hiding this comment.
Since deletion involves making HTTP requests to the Kubernetes API, deleting active sandboxes sequentially inside the atexit handler can delay program termination, especially if there are multiple sandboxes or network lag.
We can optimize this to run the cleanups concurrently using asyncio.gather:
async def _do_cleanup():
helper = AsyncK8sHelper()
try:
async def _delete_one(ns, claim_name):
try:
await helper.delete_sandbox_claim(claim_name, ns)
except Exception as e:
if sys.stderr is not None:
print(
f"[agent-sandbox] Warning: failed to delete sandbox claim "
f"'{claim_name}' in namespace '{ns}' during atexit cleanup: {e}",
file=sys.stderr,
)
await asyncio.gather(*(_delete_one(ns, claim_name) for ns, claim_name in claims))
finally:
await helper.close()There was a problem hiding this comment.
Thanks @barney-s , it makes sense, made the changes as you suggested
|
|
||
| with patch("k8s_agent_sandbox.async_sandbox_client.AsyncK8sHelper", return_value=mock_helper_instance): | ||
| self.client._atexit_cleanup() | ||
|
|
There was a problem hiding this comment.
Using standard assert_any_call is more idiomatic and readable than manually parsing call_args_list.
For example:
mock_helper_instance.delete_sandbox_claim.assert_any_call("claim-abc", "default")
mock_helper_instance.delete_sandbox_claim.assert_any_call("claim-xyz", "other-ns")There was a problem hiding this comment.
Made the change as mentioned, thanks for the review
| self, | ||
| connection_config: SandboxConnectionConfig | None = None, | ||
| tracer_config: SandboxTracerConfig | None = None, | ||
| cleanup: bool = False, |
There was a problem hiding this comment.
should this be true by default ?
There was a problem hiding this comment.
Since SandboxClient defaults to cleanup=False (sandbox_client.py#L60), I kept the same default in AsyncSandboxClient for consistency. That said, Open to changing it to True if that's the preferred behavior.
There was a problem hiding this comment.
Please open an issue to track this. No need to tackle in this PR
|
/ok-to-test |
|
Please address or comment/close on the copilot comments. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAsyncSandboxClient adds optional process-termination cleanup via a new ChangesProcess Termination Cleanup for AsyncSandboxClient
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@clients/python/agentic-sandbox-client/k8s_agent_sandbox/async_sandbox_client.py`:
- Around line 81-86: The docstring for the public cleanup parameter and the
private method _atexit_cleanup are inaccurate — the cleanup behavior is not
silent but prints warnings to sys.stderr; update the public parameter
description for cleanup to say that failures are best-effort and will emit
warnings to sys.stderr (not silently ignored) and edit the _atexit_cleanup
docstring to document that it writes per-claim and top-level warnings to
sys.stderr on failures, preserving the note that it is best-effort and safe to
call after the main event loop exits.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4680399b-ed54-4c18-b48a-7999ef152414
📒 Files selected for processing (2)
clients/python/agentic-sandbox-client/k8s_agent_sandbox/async_sandbox_client.pyclients/python/agentic-sandbox-client/k8s_agent_sandbox/test/unit/test_async_sandboxclient.py
janetkuo
left a comment
There was a problem hiding this comment.
lgtm
@SHRUTI6991 @barney-s would you like to take another look before we merge this?
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: janetkuo, RidPra The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/lgtm Thanks @RidPra — my earlier requested change (guarding the outer |
…#859) * feat: add cleanup=True support to AsyncSandboxClient * Fix atexit cleanup to use fresh event loop resources * Print stderr warning on atexit cleanup failure instead of silently ignoring * Use asyncio.gather for concurrent atexit cleanup and assert_any_call in tests * Guard outer stderr print with sys.stderr is not None check * Update docstrings to reflect stderr warnings instead of silent suppression
…#859) * feat: add cleanup=True support to AsyncSandboxClient * Fix atexit cleanup to use fresh event loop resources * Print stderr warning on atexit cleanup failure instead of silently ignoring * Use asyncio.gather for concurrent atexit cleanup and assert_any_call in tests * Guard outer stderr print with sys.stderr is not None check * Update docstrings to reflect stderr warnings instead of silent suppression
What this PR does / why we need it:
Adds
cleanup=Truesupport toAsyncSandboxClient, matching the existingbehavior in the synchronous
SandboxClient.When
cleanup=Trueis passed, anatexithandler is registered that callsasyncio.run(self.delete_all())on program termination. This automaticallydeletes all tracked sandboxes when the program exits, preventing orphaned
sandbox infrastructure if the caller forgets to explicitly clean up or if
the program exits unexpectedly.
asyncio.run()is used instead ofloop.run_until_complete()because itcreates a fresh event loop, which is safe to call from an atexit handler
after the main event loop has already exited.
Which issue(s) this PR is related to:
Fixes 595
Examples:
Fixes #
Ref (issue in a different repository)
-->
Release Note
Summary by CodeRabbit
New Features
Documentation
Tests