Skip to content

feat: add cleanup=True support to AsyncSandboxClient#859

Merged
k8s-ci-robot merged 10 commits into
kubernetes-sigs:mainfrom
RidPra:main
Jun 10, 2026
Merged

feat: add cleanup=True support to AsyncSandboxClient#859
k8s-ci-robot merged 10 commits into
kubernetes-sigs:mainfrom
RidPra:main

Conversation

@RidPra

@RidPra RidPra commented May 23, 2026

Copy link
Copy Markdown
Contributor

What this PR does / why we need it:

Adds cleanup=True support to AsyncSandboxClient, matching the existing
behavior in the synchronous SandboxClient.

When cleanup=True is passed, an atexit handler is registered that calls
asyncio.run(self.delete_all()) on program termination. This automatically
deletes all tracked sandboxes when the program exits, preventing orphaned
sandbox infrastructure if the caller forgets to explicitly clean up or if
the program exits unexpectedly.

asyncio.run() is used instead of loop.run_until_complete() because it
creates a fresh event loop, which is safe to call from an atexit handler
after the main event loop has already exited.

Which issue(s) this PR is related to:

Fixes 595

Examples:
Fixes #
Ref (issue in a different repository)
-->

Release Note

Summary by CodeRabbit

  • New Features

    • Optional automatic cleanup of tracked sandbox resources on process termination (opt-in).
  • Documentation

    • Updated usage docs to describe the new cleanup behavior and how to enable it.
  • Tests

    • Added unit tests verifying cleanup registration when enabled, no registration when disabled, deletion of tracked resources, no-op when none exist, and suppressed errors that emit warnings to stderr.
@netlify

netlify Bot commented May 23, 2026

Copy link
Copy Markdown

Deploy Preview for agent-sandbox canceled.

Name Link
🔨 Latest commit e2e2e5b
🔍 Latest deploy log https://app.netlify.com/projects/agent-sandbox/deploys/6a275dc9f7542d00087c6d7f
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 23, 2026
@k8s-ci-robot

Copy link
Copy Markdown
Contributor

Hi @RidPra. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 23, 2026
@janetkuo janetkuo requested a review from Copilot May 23, 2026 21:00

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds cleanup=True support to the Python AsyncSandboxClient to mirror the synchronous client’s optional atexit-based cleanup behavior, aiming to reduce orphaned sandbox infrastructure when callers forget explicit teardown.

Changes:

  • Add cleanup parameter to AsyncSandboxClient and register an atexit hook to run delete_all() on interpreter exit.
  • Update AsyncSandboxClient docstring/constructor docs to describe the new cleanup behavior.
  • Add unit tests asserting atexit.register() is (or is not) called based on cleanup.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
clients/python/agentic-sandbox-client/k8s_agent_sandbox/async_sandbox_client.py Introduces cleanup option and atexit hook; updates documentation.
clients/python/agentic-sandbox-client/k8s_agent_sandbox/test/unit/test_async_sandboxclient.py Adds unit tests around atexit registration behavior.
Comment thread clients/python/agentic-sandbox-client/k8s_agent_sandbox/async_sandbox_client.py Outdated
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 24, 2026

Uses a snapshot of the tracked claims and a fresh :class:`AsyncK8sHelper`
so that no loop-bound objects from the original client are reused across
event loop boundaries. All errors are silently suppressed — atexit

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we retry before silently failing and logging if the cleanup isn't successful?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @SHRUTI6991 for the review, made the change as suggested

@RidPra RidPra requested a review from SHRUTI6991 June 1, 2026 16:43

@barney-s barney-s left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for properly using a fresh AsyncK8sHelper in a new event loop via asyncio.run to handle cleanup safely after the main loop has exited.

if not claims:
return

async def _do_cleanup():

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since deletion involves making HTTP requests to the Kubernetes API, deleting active sandboxes sequentially inside the atexit handler can delay program termination, especially if there are multiple sandboxes or network lag.

We can optimize this to run the cleanups concurrently using asyncio.gather:

async def _do_cleanup():
    helper = AsyncK8sHelper()
    try:
        async def _delete_one(ns, claim_name):
            try:
                await helper.delete_sandbox_claim(claim_name, ns)
            except Exception as e:
                if sys.stderr is not None:
                    print(
                        f"[agent-sandbox] Warning: failed to delete sandbox claim "
                        f"'{claim_name}' in namespace '{ns}' during atexit cleanup: {e}",
                        file=sys.stderr,
                    )

        await asyncio.gather(*(_delete_one(ns, claim_name) for ns, claim_name in claims))
    finally:
        await helper.close()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @barney-s , it makes sense, made the changes as you suggested


with patch("k8s_agent_sandbox.async_sandbox_client.AsyncK8sHelper", return_value=mock_helper_instance):
self.client._atexit_cleanup()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using standard assert_any_call is more idiomatic and readable than manually parsing call_args_list.

For example:

mock_helper_instance.delete_sandbox_claim.assert_any_call("claim-abc", "default")
mock_helper_instance.delete_sandbox_claim.assert_any_call("claim-xyz", "other-ns")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the change as mentioned, thanks for the review

self,
connection_config: SandboxConnectionConfig | None = None,
tracer_config: SandboxTracerConfig | None = None,
cleanup: bool = False,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be true by default ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since SandboxClient defaults to cleanup=False (sandbox_client.py#L60), I kept the same default in AsyncSandboxClient for consistency. That said, Open to changing it to True if that's the preferred behavior.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please open an issue to track this. No need to tackle in this PR

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@barney-s barney-s self-assigned this Jun 2, 2026
@barney-s

barney-s commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 3, 2026
@barney-s

barney-s commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator

Please address or comment/close on the copilot comments.

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 27065c97-4108-45ce-b8ed-b5fdc6741dbe

📥 Commits

Reviewing files that changed from the base of the PR and between 1834b05 and dc9f0d1.

📒 Files selected for processing (1)
  • clients/python/agentic-sandbox-client/k8s_agent_sandbox/async_sandbox_client.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • clients/python/agentic-sandbox-client/k8s_agent_sandbox/async_sandbox_client.py

📝 Walkthrough

Walkthrough

AsyncSandboxClient adds optional process-termination cleanup via a new cleanup constructor parameter. When enabled, an atexit hook triggers _atexit_cleanup, which snapshots tracked claims, uses a fresh event loop to delete them concurrently via a new AsyncK8sHelper, suppresses errors, and warns to stderr. Tests validate handler registration, claim deletion, no-op on empty registry, and exception suppression.

Changes

Process Termination Cleanup for AsyncSandboxClient

Layer / File(s) Summary
Constructor parameter and documentation
clients/python/agentic-sandbox-client/k8s_agent_sandbox/async_sandbox_client.py
Module imports atexit and sys; constructor docstring documents the new cleanup parameter's behavior, fresh event loop boundary, and best-effort error suppression; class docstring mentions automatic claim cleanup on program termination.
Atexit hook registration
clients/python/agentic-sandbox-client/k8s_agent_sandbox/async_sandbox_client.py
Constructor registers _atexit_cleanup via atexit.register when the cleanup flag is enabled.
Cleanup method implementation
clients/python/agentic-sandbox-client/k8s_agent_sandbox/async_sandbox_client.py
_atexit_cleanup captures tracked claim keys, instantiates a fresh AsyncK8sHelper, deletes all claims concurrently using asyncio.gather within asyncio.run, ensures helper shutdown, and emits stderr warnings if exceptions occur during deletion.
Cleanup behavior tests
clients/python/agentic-sandbox-client/k8s_agent_sandbox/test/unit/test_async_sandboxclient.py
Five tests verify that atexit handler registration is gated on the cleanup flag, all tracked claims are deleted using a fresh AsyncK8sHelper, no cleanup occurs when the sandbox registry is empty, and exceptions during deletion are suppressed with warnings to sys.stderr.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A rabbit pads where programs close the day,

It counts the claims it keeps in gentle rows.
Fresh loops and helpers tidy every nook,
Quietly suppressing errors with a look,
Hopping off while stderr softly shows.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 61.54% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: adding cleanup=True support to AsyncSandboxClient, which is the primary feature introduced in this PR.
Description check ✅ Passed The description adequately covers what the PR does, why it's needed, and references the related issue. However, the release-note section is incomplete and should be filled in.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@clients/python/agentic-sandbox-client/k8s_agent_sandbox/async_sandbox_client.py`:
- Around line 81-86: The docstring for the public cleanup parameter and the
private method _atexit_cleanup are inaccurate — the cleanup behavior is not
silent but prints warnings to sys.stderr; update the public parameter
description for cleanup to say that failures are best-effort and will emit
warnings to sys.stderr (not silently ignored) and edit the _atexit_cleanup
docstring to document that it writes per-claim and top-level warnings to
sys.stderr on failures, preserving the note that it is best-effort and safe to
call after the main event loop exits.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4680399b-ed54-4c18-b48a-7999ef152414

📥 Commits

Reviewing files that changed from the base of the PR and between ba47882 and 1834b05.

📒 Files selected for processing (2)
  • clients/python/agentic-sandbox-client/k8s_agent_sandbox/async_sandbox_client.py
  • clients/python/agentic-sandbox-client/k8s_agent_sandbox/test/unit/test_async_sandboxclient.py
Comment thread clients/python/agentic-sandbox-client/k8s_agent_sandbox/async_sandbox_client.py Outdated

@janetkuo janetkuo left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@SHRUTI6991 @barney-s would you like to take another look before we merge this?

@k8s-ci-robot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: janetkuo, RidPra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 10, 2026
@aditya-shantanu

Copy link
Copy Markdown
Collaborator

/lgtm

Thanks @RidPra — my earlier requested change (guarding the outer sys.stderr is not None check in the top-level _atexit_cleanup exception handler) is addressed in commit 1834b05, and the docstring now accurately documents the best-effort stderr-warning behavior. The fresh-AsyncK8sHelper/asyncio.run cleanup path, concurrent asyncio.gather deletion, and argument ordering all look correct. No further substantive concerns from me.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 10, 2026
@k8s-ci-robot k8s-ci-robot merged commit c1ce193 into kubernetes-sigs:main Jun 10, 2026
11 checks passed
@github-project-automation github-project-automation Bot moved this from Linked to Done in Agent Sandbox Jun 10, 2026
khirotaka pushed a commit to khirotaka/agent-sandbox that referenced this pull request Jun 12, 2026
…#859)

* feat: add cleanup=True support to AsyncSandboxClient

* Fix atexit cleanup to use fresh event loop resources

* Print stderr warning on atexit cleanup failure instead of silently ignoring

* Use asyncio.gather for concurrent atexit cleanup and assert_any_call in tests

* Guard outer stderr print with sys.stderr is not None check

* Update docstrings to reflect stderr warnings instead of silent suppression
alexatakvelon pushed a commit to volatilemolotov/agent-sandbox that referenced this pull request Jun 24, 2026
…#859)

* feat: add cleanup=True support to AsyncSandboxClient

* Fix atexit cleanup to use fresh event loop resources

* Print stderr warning on atexit cleanup failure instead of silently ignoring

* Use asyncio.gather for concurrent atexit cleanup and assert_any_call in tests

* Guard outer stderr print with sys.stderr is not None check

* Update docstrings to reflect stderr warnings instead of silent suppression
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. ready-for-review size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

7 participants