
Preserve sandbox selector label when replicas is 0#754

Merged
k8s-ci-robot merged 6 commits into kubernetes-sigs:main from shrutiyam-glitch:bug-749
May 12, 2026
Conversation

@shrutiyam-glitch shrutiyam-glitch commented May 7, 2026

Contributor

What this PR does / why we need it:

This change addresses an issue where the sandbox name hash (selector label) is not available after a sandbox is scaled down to zero replicas during suspension.
Updated sandbox_controller so that it no longer unsets status.labelselector when replicas is 0.
If the hash cannot be resolved, the suspension fails gracefully with a clear error reason.
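
As a rough illustration of the client-side behavior described above, the sketch below reads the name hash out of a selector string and returns a failure result with a reason when it is absent. `SuspendResult`, `extract_name_hash`, and `suspend` are hypothetical names used only for illustration and are not the SDK's actual API; the label key is the one that appears in the integration test log in this description.

```python
from dataclasses import dataclass
from typing import Optional

# Label key as it appears in the integration test log.
NAME_HASH_LABEL = "agents.x-k8s.io/sandbox-name-hash"


@dataclass
class SuspendResult:
    """Hypothetical result type; the real SDK may report errors differently."""
    success: bool
    error_reason: Optional[str] = None


def extract_name_hash(selector: Optional[str]) -> Optional[str]:
    """Return the name hash from a 'key=value,key=value' selector, or None."""
    if not selector:
        return None
    for term in selector.split(","):
        key, sep, value = term.strip().partition("=")
        if sep and key == NAME_HASH_LABEL and value:
            return value
    return None


def suspend(selector: Optional[str]) -> SuspendResult:
    """Refuse to suspend when the hash is unknown, so snapshots stay discoverable."""
    name_hash = extract_name_hash(selector)
    if name_hash is None:
        return SuspendResult(False, "sandbox name hash could not be resolved")
    # A real implementation would trigger the snapshot and scale replicas to 0 here.
    return SuspendResult(True)
```

Returning a result object instead of raising is one possible design; it keeps the "clear error reason" visible to the caller without requiring exception handling.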

Additionally, this PR:

  • Adds unit tests to verify graceful failure when the sandbox name hash is missing.
  • Introduces an integration test phase for testing suspend and resume on a new sandbox client instance.
  • Updates the documentation to reflect the expanded testing phases.

Which issue(s) this PR is related to:

Fixes #749

Integration Test: test_podsnapshot_extension.py

....
....
***** Phase 2: Testing Suspend/Resume on a NEW Sandbox client *****

======= Testing Suspend and Resume in a new Sandbox =======
Creating initial sandbox from template 'python-counter-template'...
2026-05-11 17:46:58,250 - INFO - Creating SandboxClaim 'sandbox-claim-052be361' in namespace 'sandbox-test' using template 'python-counter-template'...
2026-05-11 17:46:58,384 - INFO - Resolving sandbox name from claim 'sandbox-claim-052be361'...
2026-05-11 17:46:58,463 - INFO - Resolved sandbox name 'sandbox-claim-052be361' from claim status
2026-05-11 17:46:58,463 - INFO - Watching for Sandbox sandbox-claim-052be361 to become ready...
2026-05-11 17:46:59,439 - INFO - Sandbox sandbox-claim-052be361 is ready.
Initial sandbox 'sandbox-claim-052be361' ready.

Suspending current sandbox 'sandbox-claim-052be361'...
2026-05-11 17:46:59,734 - INFO - Waiting for snapshot manual trigger 'suspend-sandbox-claim-052be361-20260511-174659-9f789fde' to be processed...
2026-05-11 17:47:02,429 - INFO - Snapshot manual trigger 'suspend-sandbox-claim-052be361-20260511-174659-9f789fde' processed successfully. Created Snapshot UID: a3d3ddf1-655e-4098-b337-8273dbb42eed
2026-05-11 17:47:02,621 - INFO - Sandbox 'sandbox-claim-052be361' suspended (scaled down to 0 replicas).
2026-05-11 17:47:02,621 - INFO - Waiting up to 180s for pod 'sandbox-claim-052be361' (UID: 4c73f31d-2341-4de7-b01b-a21fde15ddc1) to terminate...
2026-05-11 17:47:04,716 - INFO - Sandbox 'sandbox-claim-052be361' pod successfully terminated.
Sandbox suspended. Snapshot UID: a3d3ddf1-655e-4098-b337-8273dbb42eed
Waiting for suspend snapshot 'a3d3ddf1-655e-4098-b337-8273dbb42eed' to become ready...
2026-05-11 17:47:04,766 - INFO - Listing snapshots with label selector: agents.x-k8s.io/sandbox-name-hash=aea8e3fe,tenant-id=test-tenant,user-id=test-user
2026-05-11 17:47:04,821 - INFO - Found 1 snapshots.
Snapshot 'a3d3ddf1-655e-4098-b337-8273dbb42eed' is ready.
Closing connection for the old sandbox handle...
2026-05-11 17:47:04,822 - INFO - Connection to sandbox claim 'sandbox-claim-052be361' has been closed.

Re-attaching to sandbox claim 'sandbox-claim-052be361' to get a fresh handle...
2026-05-11 17:47:04,822 - INFO - Resolving sandbox name from claim 'sandbox-claim-052be361'...
2026-05-11 17:47:04,874 - INFO - Resolved sandbox name 'sandbox-claim-052be361' from claim status

Resuming sandbox 'sandbox-claim-052be361'...
2026-05-11 17:47:05,160 - INFO - Listing snapshots with label selector: agents.x-k8s.io/sandbox-name-hash=aea8e3fe
2026-05-11 17:47:05,215 - INFO - Found 1 snapshots.
2026-05-11 17:47:05,286 - INFO - Sandbox 'sandbox-claim-052be361' resumed (scaled up to 1 replica).
2026-05-11 17:47:05,286 - INFO - Waiting up to 180s for pod to become ready...
2026-05-11 17:47:09,486 - INFO - Sandbox 'sandbox-claim-052be361' successfully restored from snapshot 'a3d3ddf1-655e-4098-b337-8273dbb42eed'.
Sandbox successfully resumed and restored from Snapshot UID: a3d3ddf1-655e-4098-b337-8273dbb42eed

Running cleanup of remaining snapshots on the restored sandbox 'sandbox-claim-052be361'...
2026-05-11 17:47:09,536 - INFO - Listing snapshots with label selector: agents.x-k8s.io/sandbox-name-hash=aea8e3fe
2026-05-11 17:47:09,590 - INFO - Found 1 snapshots.
Found 1 snapshots remaining for the sandbox.
2026-05-11 17:47:09,590 - INFO - Deleting every snapshot for this pod...
2026-05-11 17:47:09,590 - INFO - Deleting ALL snapshots for this pod.
2026-05-11 17:47:09,590 - INFO - Listing snapshots with label selector: agents.x-k8s.io/sandbox-name-hash=aea8e3fe
2026-05-11 17:47:09,690 - INFO - Found 1 snapshots.
2026-05-11 17:47:09,690 - INFO - Deleting PodSnapshot 'a3d3ddf1-655e-4098-b337-8273dbb42eed'...
2026-05-11 17:47:09,759 - INFO - PodSnapshot 'a3d3ddf1-655e-4098-b337-8273dbb42eed' deletion requested. Waiting for confirmation...
2026-05-11 17:47:09,809 - INFO - Waiting for PodSnapshot 'a3d3ddf1-655e-4098-b337-8273dbb42eed' to be deleted...
2026-05-11 17:47:10,537 - INFO - PodSnapshot 'a3d3ddf1-655e-4098-b337-8273dbb42eed' confirmed deleted.
2026-05-11 17:47:10,537 - INFO - Snapshot deletion process completed. Deleted 1 snapshots.
Cleaned up remaining snapshots.
--- Pod Snapshot Test Passed! ---
....
....
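
The label selectors in the log above are comma-separated `key=value` pairs. A minimal sketch of how such a string could be assembled from a dictionary follows; `build_label_selector` is a hypothetical helper, not a function from the SDK, and the label values are copied from the log.

```python
from typing import Mapping


def build_label_selector(labels: Mapping[str, str]) -> str:
    """Join label pairs into an equality-based selector string, in insertion order."""
    return ",".join(f"{key}={value}" for key, value in labels.items())


selector = build_label_selector({
    "agents.x-k8s.io/sandbox-name-hash": "aea8e3fe",
    "tenant-id": "test-tenant",
    "user-id": "test-user",
})
print(selector)
```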
netlify Bot commented May 7, 2026


Deploy Preview for agent-sandbox canceled.

Name Link
🔨 Latest commit c5eff65
🔍 Latest deploy log https://app.netlify.com/projects/agent-sandbox/deploys/6a021d2a97ed850009aea9df
@k8s-ci-robot k8s-ci-robot requested review from justinsb and vicentefb May 7, 2026 18:59
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 7, 2026
@janetkuo janetkuo requested a review from Copilot May 7, 2026 19:00

Copilot AI left a comment

Pull request overview

This PR addresses a Python SDK suspend/resume failure mode where the sandbox name-hash selector (status.selector) may be cleared when the sandbox is suspended (replicas scaled to 0). The approach is to proactively fetch/cache the name-hash while the sandbox is still active, and to fail suspend early with a clear error if the hash can’t be resolved.

Changes:

  • Eagerly fetches and caches the sandbox name-hash in Sandbox.__init__, and retries hash resolution during suspend() before scaling to 0.
  • Adds a unit test covering graceful failure when the name-hash cannot be resolved during suspend.
  • Extends the PodSnapshot integration test script with an additional “phase” and updates snapshot README test-phase documentation.
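
The eager fetch with a later retry summarized in this review could look roughly like the sketch below. `NameHashCache` and its `fetch` callable are hypothetical and stand in for whatever lookup the SDK performs; this is an illustration of the pattern under those assumptions, not the code in this PR.

```python
from typing import Callable, Optional


class NameHashCache:
    """Hypothetical helper: caches the name hash and retries the lookup on demand."""

    def __init__(self, fetch: Callable[[], Optional[str]]):
        self._fetch = fetch
        # Eager attempt at construction time, while the sandbox is presumably active.
        self._value: Optional[str] = self._safe_fetch()

    def _safe_fetch(self) -> Optional[str]:
        try:
            return self._fetch()
        except Exception:
            # Treat lookup errors as "not resolved yet" so construction does not fail.
            return None

    def get(self) -> Optional[str]:
        """Return the cached hash, retrying the fetch once if it is still unknown."""
        if self._value is None:
            self._value = self._safe_fetch()
        return self._value
```

A caller would invoke `get()` just before scaling to 0 and abort the suspend if it still returns `None`.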

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| clients/python/agentic-sandbox-client/test_podsnapshot_extension.py | Adds a second integration-test phase for suspend/resume flow and snapshot cleanup. |
| clients/python/agentic-sandbox-client/k8s_agent_sandbox/sandbox.py | Eagerly fetches/caches sandbox name-hash during Sandbox initialization. |
| clients/python/agentic-sandbox-client/k8s_agent_sandbox/gke_extensions/snapshots/sandbox_with_snapshot_support.py | Ensures name-hash is resolvable before suspend, otherwise fails gracefully. |
| clients/python/agentic-sandbox-client/k8s_agent_sandbox/gke_extensions/snapshots/test/unit/test_sandbox_with_snapshot_support.py | Updates unit test setup for eager fetch; adds test for missing name-hash during suspend. |
| clients/python/agentic-sandbox-client/k8s_agent_sandbox/gke_extensions/snapshots/README.md | Updates documented integration test phases to include the new phase. |
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 9, 2026
@shrutiyam-glitch shrutiyam-glitch changed the title Ensure sandbox selector label is available before suspend in Python SDK May 9, 2026
@shrutiyam-glitch shrutiyam-glitch changed the title Preserver sandbox selector label when replicas is 0 May 11, 2026

@igooch igooch left a comment

Contributor

Small nit, otherwise LGTM

@igooch igooch left a comment

Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 11, 2026
@shrutiyam-glitch

Contributor Author

/assign @janetkuo

@barney-s

Collaborator

thanks
/lgtm

@barney-s

Collaborator

/approve

@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: barney-s, igooch, shrutiyam-glitch

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 12, 2026
@k8s-ci-robot k8s-ci-robot merged commit 473faf3 into kubernetes-sigs:main May 12, 2026
11 checks passed
@github-project-automation github-project-automation Bot moved this from Backlog to Done in Agent Sandbox May 12, 2026
@janetkuo janetkuo moved this from Done to Linked in Agent Sandbox Jun 5, 2026
khirotaka pushed a commit to khirotaka/agent-sandbox that referenced this pull request Jun 12, 2026
* Ensure sandbox selector label is available

* Enable labelselector in sandbox when replicas is 0

* fix: test

* Update test

* Update log

* Update comment
alexatakvelon pushed a commit to volatilemolotov/agent-sandbox that referenced this pull request Jun 24, 2026
* Ensure sandbox selector label is available

* Enable labelselector in sandbox when replicas is 0

* fix: test

* Update test

* Update log

* Update comment
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ready-for-review size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

6 participants