api: Replace spec.templateRef in SandboxClaim with spec.warmpoolRef.#899
Conversation
✅ Deploy Preview for agent-sandbox canceled.
|
70d6ee3 to
6a3beb5
Compare
There was a problem hiding this comment.
Pull request overview
This PR implements KEP-0208's API change to remove the ambiguous combination of templateRef + warmpool policy in SandboxClaimSpec, replacing them with a single required warmPoolRef. The reconciler now resolves the underlying SandboxTemplate indirectly through the referenced SandboxWarmPool, and the in-memory warm-pool queue is keyed by warm-pool name instead of by template hash. A claim that supplies spec.env now implicitly bypasses the warm pool and triggers a cold start. All Go/Python clients, e2e tests, CRDs, and generated docs are updated for the rename.
Changes:
- API: drop
TemplateRef+WarmPoolPolicyfromSandboxClaimSpec; introduceSandboxWarmPoolRefandWarmPoolRefField; liftSandboxTemplateRefintosandboxtemplate_types.go. - Controller: route adoption through
WarmPoolRef, resolve template via warm pool, keySimpleSandboxQueueby warm-pool name, watch/indexSandboxWarmPool, addErrWarmPoolNotFoundcondition reason, replace "env + warmpool ⇒ error" with implicit cold start, and derive metrictemplate_namefrom sandbox annotation or template lookup. - Clients/tests: rename
template/TemplateName→warmpool/WarmPoolNameacross Go/Python SDKs and all unit/e2e tests; introduceSandboxWarmPoolNotFoundError; create explicitSandboxWarmPoolobjects for shutdown-policy and metrics e2e tests.
Reviewed changes
Copilot reviewed 37 out of 38 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| extensions/api/v1beta1/sandboxclaim_types.go | Drops WarmPoolPolicy/TemplateRef, adds SandboxWarmPoolRef and WarmPoolRefField index key. |
| extensions/api/v1beta1/sandboxtemplate_types.go | Moves SandboxTemplateRef definition next to the template type it references. |
| extensions/api/v1beta1/zz_generated.deepcopy.go | Regenerated deepcopy for the new SandboxWarmPoolRef/updated spec. |
| extensions/controllers/sandboxclaim_controller.go | Core rewrite: warm-pool-keyed queue, warm-pool watcher, implicit cold start on env, template resolution via warm pool, new error/reason. |
| extensions/controllers/queue/simple_sandbox_queue.go | Renames templateHash parameters to warmPoolName and updates comments. |
| extensions/controllers/sandboxclaim_controller_test.go | Adds warm-pool fixtures everywhere, replaces template-hash assertions with controller-ref/annotation lookups, removes the old WarmPoolPolicy test suite. |
| extensions/controllers/sandboxclaim_pod_exclusivity_test.go | Adds SandboxWarmPool object and keys the seeded queue by pool name. |
| k8s/crds & helm/crds sandboxclaims YAML | CRD schema updated: warmPoolRef required, sandboxTemplateRef/warmpool removed. |
| docs/api.md | Regenerated API reference for the new spec and SandboxWarmPoolRef type. |
| clients/go/sandbox/{options,client,sandbox,k8s,*_test}.go | Renames TemplateName→WarmPoolName and updates validation/error messages. |
| clients/go/examples/gateway/main.go | Updated field name (value still references "my-sandbox-template"). |
| clients/python/.../sandbox_client.py, async_sandbox_client.py, k8s_helper.py, async_k8s_helper.py | Replace template/warmpool policy params with a single warmpool (name); add SandboxWarmPoolNotFoundError handling. |
| clients/python/.../exceptions.py, init.py | Export new SandboxWarmPoolNotFoundError. |
| clients/python/.../test/unit/*.py, test_client.py | Tests updated to the new API; some legacy warmpool policy tests retained but now exercise only naming. |
| test/e2e/extensions/{shutdown_policy_test,sandboxclaim_metric_test,warmpool_sandbox_watcher_test,pythonruntime_test}.go | Create a SandboxWarmPool per test and point claims at it. |
| test/e2e/{parallelism_test,chromesandbox_claim_test}.go | Switch parameters from template to warm-pool name. |
Files not reviewed (1)
- extensions/api/v1beta1/zz_generated.deepcopy.go: Language not supported
aditya-shantanu
left a comment
There was a problem hiding this comment.
Reviewed the KEP-0208 implementation (replacing spec.templateRef/warmpool with spec.warmPoolRef). Overall this matches the KEP's preferred solution and the mechanics line up well:
- Generated artifacts are in sync — the CRD (
extensions.agents.x-k8s.io_sandboxclaims.yaml) anddocs/api.mdwere regenerated correctly, andWarmPoolPolicyis fully removed. - The warm-pool controller sets both the
SandboxWarmPoolowner reference and theSandboxTemplateRefAnnotationon warm sandboxes, sogetWarmPoolName()and the metrics annotation fast-path both resolve as intended. - Watches/indexers/event handlers were consistently re-pointed from
SandboxTemplatetoSandboxWarmPool.
A few things worth addressing before merge — left inline. Nothing is a hard blocker except possibly the missing regression test for the central new behavior (implicit env-based cold start). Severity is noted per comment: most are minor/cleanup, one design point about the in-memory queue key is worth a maintainer decision.
(Note: this is an ACTION REQUIRED breaking change to the v1beta1 API made in-place — no conversion for already-stored SandboxClaim objects. That's consistent with the KEP's stated migration plan, just flagging it explicitly for the release notes / reviewers.)
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: janetkuo, SHRUTI6991 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
SandboxClaim API PR kubernetes-sigs/agent-sandbox#899 replaced the SandboxClaim spec.sandboxTemplateRef and spec.warmpool fields with a single spec.warmPoolRef; the controller now resolves the template through the warm pool. Update the load generator to emit warmPoolRef and bump the default manifest ref to the post-GoogleCloudPlatform#899 main HEAD so the installed CRDs match.
SandboxClaim API PR kubernetes-sigs/agent-sandbox#899 replaced the SandboxClaim spec.sandboxTemplateRef and spec.warmpool fields with a single spec.warmPoolRef; the controller now resolves the template through the warm pool. Update the load generator to emit warmPoolRef and bump the default manifest ref to the post-GoogleCloudPlatform#899 main HEAD so the installed CRDs match.
Make the agent_sandbox benchmark run: a SandboxClaim load generator, the metrics it produces, the Run wiring that drives them, and a provision/prepare install split for fast iteration. The load generator (agent_sandbox_loadgen.py) submits SandboxClaim custom resources at a target QPS through a single shared Kubernetes Watch stream (no per-claim polling). ClaimDriver handles create/watch with 429 retry and separate connection pools, LoadGenerator paces submission, and readiness is tracked with bounded concurrency. Claims reference the warm pool directly via spec.warmPoolRef (kubernetes-sigs/agent-sandbox#899 replaced sandboxTemplateRef/warmpool with a single warmPoolRef; the controller resolves the template through the warm pool), and the default manifest ref is bumped to the post-GoogleCloudPlatform#899 main HEAD so the installed CRDs match. The metrics module (agent_sandbox_metrics.py) computes startup-time percentiles, submit/completion QPS, peak concurrency, warm_served_fraction, error counts, and lifecycle/exec-duration percentiles from the recorded events. The benchmark Run constructs the load generator from the load-shape flags, runs it, and converts the recorded events into PKB samples (the stub Run from the resource PR returned nothing). Install is split across provision and prepare: provision installs only the cluster scaffolding (gVisor, CRDs, RBAC); the controller Deployment, sandbox template, and warm pool move to the prepare stage via a new K8sAgentSandbox.InstallWorkload. This lets the controller be reinstalled against an existing cluster with --run_stage=prepare to iterate on controller settings without recreating it. Because the benchmark spec is pickled at provision and unpickled without re-applying flags, Prepare calls RefreshSpecFromFlags on a resume so the controller, template, and warm pool config reflect the current command-line flags. Note: --run_stage=provision alone no longer installs the controller; run provision,prepare for a full setup. Adds the kubernetes Python client to requirements.txt, plus unit tests for the load generator, the metrics, and the provision/prepare split.
…Ref`. (kubernetes-sigs#899) * api: Replace templateRef in SandboxClaim with warmpoolRef. * Generate api doc. * Fix rebase conflict. * Address co-pilot comments. * Address all comments. * Add a TODO for requeueing logic.
…ollow-up) Replace sandboxTemplateRef with the mandatory warmPoolRef field in SandboxClaim manifests, matching the upstream API change introduced in commit 2feb713. - Rename createSandbox() parameter template → warmpool - Add inspectClaimConditions() to surface WarmPoolNotFound / TemplateNotFound terminal failures as typed errors - Add SandboxTemplateNotFoundError and SandboxWarmPoolNotFoundError - Add getSandboxClaimWarmpoolName() public method - Update unit tests to cover new error paths and renamed parameters
…Ref`. (kubernetes-sigs#899) * api: Replace templateRef in SandboxClaim with warmpoolRef. * Generate api doc. * Fix rebase conflict. * Address co-pilot comments. * Address all comments. * Add a TODO for requeueing logic.
…ollow-up) Replace sandboxTemplateRef with the mandatory warmPoolRef field in SandboxClaim manifests, matching the upstream API change introduced in commit 2feb713. - Rename createSandbox() parameter template → warmpool - Add inspectClaimConditions() to surface WarmPoolNotFound / TemplateNotFound terminal failures as typed errors - Add SandboxTemplateNotFoundError and SandboxWarmPoolNotFoundError - Add getSandboxClaimWarmpoolName() public method - Update unit tests to cover new error paths and renamed parameters
…Ref`. (kubernetes-sigs#899) * api: Replace templateRef in SandboxClaim with warmpoolRef. * Generate api doc. * Fix rebase conflict. * Address co-pilot comments. * Address all comments. * Add a TODO for requeueing logic.
* Refactor the Sandbox Template hash generation to include the namespace * tweaks after review * expand tests * Revert back to main * Partition SimpleSandboxQueue by Namespace * Log, conciseness, test * Pop and remove legacy as well as namespaced * Clean up state after rebase on PRs #899 and 864 * test GetWithStrategy fallback case * Remove unnecessary fallback cases, and nits * Clean up SimpleSandboxQueue changes from previous revisions
What this PR does / why we need it:
This PR implements the API changes proposed in KEP-0208: Resolving Mutually Exclusive Fields in SandboxClaim for Beta.
Previously, a user creating a
SandboxClaimhad to provide aTemplateRefand could optionally specify aWarmPoolPolicy. This created ambiguity and conflicting states if the claim requested a template that did not match the underlying warm pool's template, and also risked naming collisions with custom warm pools.To provide a cleaner, more predictable API contract for end-users, this PR removes the
templateRefandwarmpoolpolicy fields fromSandboxClaimSpecand replaces them with a singlewarmPoolRef.Key Changes:
TemplateRefandWarmPoolPolicywithWarmPoolRefinSandboxClaimSpec.len(claim.Spec.Env) > 0).SimpleSandboxQueueto route and key off theWarmPoolNameinstead of theTemplateRefHash, enabling O(1) lookups directly against the requested pool.SandboxClaimcontroller now looks up theSandboxWarmPoolfirst, and dynamically resolves the associatedSandboxTemplatewhen falling back to a cold start (i.e., when the pool is empty or bypassed).chromesandbox_claim_test.go,pythonruntime_test.go, etc.) and the metrics collection to align with the new schema.Which issue(s) this PR fixes:
Working on #740
Release note: