Preserve sandbox selector label when replicas is 0 #754
Pull request overview
This PR addresses a Python SDK suspend/resume failure mode where the sandbox name-hash selector (status.selector) may be cleared when the sandbox is suspended (replicas scaled to 0). The approach is to proactively fetch/cache the name-hash while the sandbox is still active, and to fail suspend early with a clear error if the hash can’t be resolved.
Changes:
- Eagerly fetches and caches the sandbox name-hash in `Sandbox.__init__`, and retries hash resolution during `suspend()` before scaling to 0.
- Adds a unit test covering graceful failure when the name-hash cannot be resolved during suspend.
- Extends the PodSnapshot integration test script with an additional “phase” and updates snapshot README test-phase documentation.
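The eager-cache and fail-early behavior summarized above can be illustrated with a minimal sketch. This is not the SDK's actual code: `SandboxSketch`, `SuspendResult`, and the injected `fetch_hash` and `scale` callables are hypothetical names chosen for illustration, and the real `Sandbox` class talks to the Kubernetes API instead.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SuspendResult:
    """Hypothetical result type; the real SDK's return type may differ."""
    success: bool
    reason: str = ""


class SandboxSketch:
    """Hypothetical stand-in for the SDK's Sandbox class."""

    def __init__(self, fetch_hash: Callable[[], Optional[str]],
                 scale: Callable[[int], None]) -> None:
        self._fetch_hash = fetch_hash
        self._scale = scale
        # Eager fetch: cache the name-hash while the sandbox is still active.
        self._name_hash: Optional[str] = self._try_fetch()

    def _try_fetch(self) -> Optional[str]:
        # Treat lookup errors and empty values alike as "not resolved".
        try:
            return self._fetch_hash() or None
        except Exception:
            return None

    def suspend(self) -> SuspendResult:
        # Retry resolution once more before scaling down.
        if self._name_hash is None:
            self._name_hash = self._try_fetch()
        if self._name_hash is None:
            # Fail early: do not scale to 0 without a cached hash.
            return SuspendResult(False, "sandbox name-hash could not be resolved")
        self._scale(0)
        return SuspendResult(True)
```

The point of the ordering is that the scale-down never happens unless the hash is already cached, so a later resume does not depend on the status field still being populated.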
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| clients/python/agentic-sandbox-client/test_podsnapshot_extension.py | Adds a second integration-test phase for suspend/resume flow and snapshot cleanup. |
| clients/python/agentic-sandbox-client/k8s_agent_sandbox/sandbox.py | Eagerly fetches/caches sandbox name-hash during Sandbox initialization. |
| clients/python/agentic-sandbox-client/k8s_agent_sandbox/gke_extensions/snapshots/sandbox_with_snapshot_support.py | Ensures name-hash is resolvable before suspend, otherwise fails gracefully. |
| clients/python/agentic-sandbox-client/k8s_agent_sandbox/gke_extensions/snapshots/test/unit/test_sandbox_with_snapshot_support.py | Updates unit test setup for eager fetch; adds test for missing name-hash during suspend. |
| clients/python/agentic-sandbox-client/k8s_agent_sandbox/gke_extensions/snapshots/README.md | Updates documented integration test phases to include the new phase. |
igooch left a comment:
Small nit, otherwise LGTM
/assign @janetkuo
thanks
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: barney-s, igooch, shrutiyam-glitch.
* Ensure sandbox selector label is available
* Enable labelselector in sandbox when replicas is 0
* fix: test
* Update test
* Update log
* Update comment
What this PR does / why we need it:
This change addresses an issue where the sandbox name hash (selector label) is not available when a sandbox is scaled down to zero replicas during suspension.
Updated `sandbox_controller` to not unset the value of `status.labelselector` when replicas is 0. If the hash cannot be resolved, the suspension fails gracefully with a clear error reason.
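The controller-side rule can be modeled as a small pure function. This is only a Python sketch of the described behavior, not the controller's actual implementation: the function name, its parameters, and the example selector string are all hypothetical.

```python
from typing import Optional


def selector_after_reconcile(replicas: int,
                             computed: Optional[str],
                             previous: Optional[str]) -> Optional[str]:
    """Model of which selector value ends up in status after a reconcile.

    Assumption: `computed` is whatever the controller derives on this pass
    and `previous` is the value already stored in status.
    """
    if replicas == 0:
        # Suspended: keep the stored selector instead of unsetting it.
        return previous
    return computed


# Hypothetical selector string, for illustration only.
stored = "name-hash=abc123"
print(selector_after_reconcile(0, None, stored))
print(selector_after_reconcile(1, stored, None))
```

Under this model a client that reads the selector after suspension still sees the value published while the sandbox was active.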
Additionally, this PR:
Which issue(s) this PR is related to:
Fixes #749
Integration Test:
test_podsnapshot_extension.py