api: Replace spec.templateRef in SandboxClaim with spec.warmpoolRef. #899

Merged
k8s-ci-robot merged 6 commits into kubernetes-sigs:main from SHRUTI6991:warmpool_policy_kep_impl
Jun 1, 2026

Conversation

@SHRUTI6991

Contributor

What this PR does / why we need it:

This PR implements the API changes proposed in KEP-0208: Resolving Mutually Exclusive Fields in SandboxClaim for Beta.

Previously, a user creating a SandboxClaim had to provide a TemplateRef and could optionally specify a WarmPoolPolicy. This created ambiguity and conflicting states if the claim requested a template that did not match the underlying warm pool's template, and also risked naming collisions with custom warm pools.

To provide a cleaner, more predictable API contract for end-users, this PR removes the templateRef and warmpool policy fields from SandboxClaimSpec and replaces them with a single warmPoolRef.
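
As a rough illustration of the resulting spec shape, the Go sketch below models a claim that carries only a pool reference. The type layout is an assumption made for illustration (in particular the `name` field inside the reference and the simplified `EnvVar` type); the authoritative definitions are in extensions/api/v1beta1/sandboxclaim_types.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SandboxWarmPoolRef names the pool a claim draws from.
// Assumption: a single Name field; the real type may differ.
type SandboxWarmPoolRef struct {
	Name string `json:"name"`
}

// EnvVar is a simplified stand-in for the env entries a claim may carry.
type EnvVar struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// SandboxClaimSpec after this PR: no templateRef, no warmpool policy,
// just a reference to a SandboxWarmPool plus optional env.
type SandboxClaimSpec struct {
	WarmPoolRef SandboxWarmPoolRef `json:"warmPoolRef"`
	Env         []EnvVar           `json:"env,omitempty"`
}

// renderSpec serializes a minimal spec that points at the given pool.
func renderSpec(pool string) (string, error) {
	spec := SandboxClaimSpec{WarmPoolRef: SandboxWarmPoolRef{Name: pool}}
	b, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := renderSpec("python-pool")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The point of the sketch is that the template no longer appears anywhere in the claim, so a claim cannot name a template that disagrees with its pool.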

Key Changes:

  • API Schema Update: Replaced TemplateRef and WarmPoolPolicy with WarmPoolRef in SandboxClaimSpec.
  • Implicit Cold Starts: Added logic to implicitly bypass the warm pool queue and trigger a cold start if the user provides custom environment variables (len(claim.Spec.Env) > 0).
  • Queue Refactoring: Refactored SimpleSandboxQueue to route and key off the WarmPoolName instead of the TemplateRefHash, enabling O(1) lookups directly against the requested pool.
  • Controller Updates: The SandboxClaim controller now looks up the SandboxWarmPool first, and dynamically resolves the associated SandboxTemplate when falling back to a cold start (i.e., when the pool is empty or bypassed).
  • Metrics & E2E: Updated E2E tests (chromesandbox_claim_test.go, pythonruntime_test.go, etc.) and the metrics collection to align with the new schema.
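
The queue and cold-start behavior described above can be sketched roughly as follows. This is a simplified stand-in, not the actual SimpleSandboxQueue: `poolQueue`, `pickSandbox`, and the use of a `map[string]string` for env are hypothetical names and simplifications chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// poolQueue is a simplified stand-in for SimpleSandboxQueue: ready sandboxes
// are bucketed by warm pool name, so finding a pool's bucket is one map lookup.
type poolQueue struct {
	mu     sync.Mutex
	byPool map[string][]string // warm pool name -> ready sandbox names (FIFO)
}

func newPoolQueue() *poolQueue {
	return &poolQueue{byPool: make(map[string][]string)}
}

// Add appends a ready sandbox to the named pool's bucket.
func (q *poolQueue) Add(pool, sandbox string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.byPool[pool] = append(q.byPool[pool], sandbox)
}

// Pop removes and returns the oldest ready sandbox for the pool, if any.
func (q *poolQueue) Pop(pool string) (string, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	items := q.byPool[pool]
	if len(items) == 0 {
		return "", false
	}
	q.byPool[pool] = items[1:]
	return items[0], true
}

// pickSandbox returns a warm sandbox name, or coldStart=true when the claim
// carries custom env vars or the pool has nothing ready.
func pickSandbox(q *poolQueue, pool string, env map[string]string) (name string, coldStart bool) {
	if len(env) > 0 {
		return "", true // custom env: bypass the warm pool entirely
	}
	if sb, ok := q.Pop(pool); ok {
		return sb, false
	}
	return "", true // pool empty: fall back to a cold start
}

func main() {
	q := newPoolQueue()
	q.Add("python-pool", "sb-1")
	fmt.Println(pickSandbox(q, "python-pool", map[string]string{"DEBUG": "1"}))
	fmt.Println(pickSandbox(q, "python-pool", nil))
	fmt.Println(pickSandbox(q, "python-pool", nil))
}
```

Note that in this sketch a claim with env never pops from the queue, so the warm sandbox stays available for the next claim without env.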

Which issue(s) this PR fixes:

Working on #740

Release note:

ACTION REQUIRED: The `SandboxClaim` API has been updated to use `warmPoolRef` instead of `templateRef`. The `warmpool` policy field has been removed. Users must now explicitly point their SandboxClaims to a `SandboxWarmPool`. To perform a cold start without pre-warming, cluster administrators should create a SandboxWarmPool with `replicas: 0` for users to reference.
@netlify

netlify Bot commented May 30, 2026


Deploy Preview for agent-sandbox canceled.

Name Link
🔨 Latest commit cd231a1
🔍 Latest deploy log https://app.netlify.com/projects/agent-sandbox/deploys/6a1deab96e49a9000848384b
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 30, 2026
@SHRUTI6991 SHRUTI6991 force-pushed the warmpool_policy_kep_impl branch from 70d6ee3 to 6a3beb5 Compare May 30, 2026 23:05
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 30, 2026
@SHRUTI6991 SHRUTI6991 changed the title api: Replace templateRef in SandboxClaim with warmpoolRef. May 30, 2026

Copilot AI left a comment


Pull request overview

This PR implements KEP-0208's API change to remove the ambiguous combination of templateRef + warmpool policy in SandboxClaimSpec, replacing them with a single required warmPoolRef. The reconciler now resolves the underlying SandboxTemplate indirectly through the referenced SandboxWarmPool, and the in-memory warm-pool queue is keyed by warm-pool name instead of by template hash. A claim that supplies spec.env now implicitly bypasses the warm pool and triggers a cold start. All Go/Python clients, e2e tests, CRDs, and generated docs are updated for the rename.

Changes:

  • API: drop TemplateRef + WarmPoolPolicy from SandboxClaimSpec; introduce SandboxWarmPoolRef and WarmPoolRefField; lift SandboxTemplateRef into sandboxtemplate_types.go.
  • Controller: route adoption through WarmPoolRef, resolve template via warm pool, key SimpleSandboxQueue by warm-pool name, watch/index SandboxWarmPool, add ErrWarmPoolNotFound condition reason, replace "env + warmpool ⇒ error" with implicit cold start, and derive metric template_name from sandbox annotation or template lookup.
  • Clients/tests: rename template/TemplateName → warmpool/WarmPoolName across Go/Python SDKs and all unit/e2e tests; introduce SandboxWarmPoolNotFoundError; create explicit SandboxWarmPool objects for shutdown-policy and metrics e2e tests.

Reviewed changes

Copilot reviewed 37 out of 38 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| extensions/api/v1beta1/sandboxclaim_types.go | Drops WarmPoolPolicy/TemplateRef, adds SandboxWarmPoolRef and WarmPoolRefField index key. |
| extensions/api/v1beta1/sandboxtemplate_types.go | Moves SandboxTemplateRef definition next to the template type it references. |
| extensions/api/v1beta1/zz_generated.deepcopy.go | Regenerated deepcopy for the new SandboxWarmPoolRef/updated spec. |
| extensions/controllers/sandboxclaim_controller.go | Core rewrite: warm-pool-keyed queue, warm-pool watcher, implicit cold start on env, template resolution via warm pool, new error/reason. |
| extensions/controllers/queue/simple_sandbox_queue.go | Renames templateHash parameters to warmPoolName and updates comments. |
| extensions/controllers/sandboxclaim_controller_test.go | Adds warm-pool fixtures everywhere, replaces template-hash assertions with controller-ref/annotation lookups, removes the old WarmPoolPolicy test suite. |
| extensions/controllers/sandboxclaim_pod_exclusivity_test.go | Adds SandboxWarmPool object and keys the seeded queue by pool name. |
| k8s/crds & helm/crds sandboxclaims YAML | CRD schema updated: warmPoolRef required, sandboxTemplateRef/warmpool removed. |
| docs/api.md | Regenerated API reference for the new spec and SandboxWarmPoolRef type. |
| clients/go/sandbox/{options,client,sandbox,k8s,*_test}.go | Renames TemplateName → WarmPoolName and updates validation/error messages. |
| clients/go/examples/gateway/main.go | Updated field name (value still references "my-sandbox-template"). |
| clients/python/.../sandbox_client.py, async_sandbox_client.py, k8s_helper.py, async_k8s_helper.py | Replace template/warmpool policy params with a single warmpool (name); add SandboxWarmPoolNotFoundError handling. |
| clients/python/.../exceptions.py, `__init__.py` | Export new SandboxWarmPoolNotFoundError. |
| clients/python/.../test/unit/*.py, test_client.py | Tests updated to the new API; some legacy warmpool policy tests retained but now exercise only naming. |
| test/e2e/extensions/{shutdown_policy_test,sandboxclaim_metric_test,warmpool_sandbox_watcher_test,pythonruntime_test}.go | Create a SandboxWarmPool per test and point claims at it. |
| test/e2e/{parallelism_test,chromesandbox_claim_test}.go | Switch parameters from template to warm-pool name. |
Files not reviewed (1)
  • extensions/api/v1beta1/zz_generated.deepcopy.go: Language not supported
Comment thread clients/go/examples/gateway/main.go Outdated
Comment thread extensions/controllers/sandboxclaim_controller.go Outdated
Comment thread extensions/controllers/sandboxclaim_controller_test.go Outdated

@aditya-shantanu aditya-shantanu left a comment

Collaborator


Reviewed the KEP-0208 implementation (replacing spec.templateRef/warmpool with spec.warmPoolRef). Overall this matches the KEP's preferred solution and the mechanics line up well:

  • Generated artifacts are in sync — the CRD (extensions.agents.x-k8s.io_sandboxclaims.yaml) and docs/api.md were regenerated correctly, and WarmPoolPolicy is fully removed.
  • The warm-pool controller sets both the SandboxWarmPool owner reference and the SandboxTemplateRefAnnotation on warm sandboxes, so getWarmPoolName() and the metrics annotation fast-path both resolve as intended.
  • Watches/indexers/event handlers were consistently re-pointed from SandboxTemplate to SandboxWarmPool.
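
To make the second point concrete, here is a hedged sketch of how a warm pool name and a metrics template name could be derived from a warm sandbox. It uses local stand-in types rather than the real Kubernetes metadata types; `warmPoolNameOf`, `templateNameFor`, and the annotation key are hypothetical and are not the controller's actual getWarmPoolName() or SandboxTemplateRefAnnotation value.

```go
package main

import "fmt"

// ownerReference and sandbox are local stand-ins for the Kubernetes metadata
// the controller would read; only the fields used here are modeled.
type ownerReference struct {
	Kind, Name string
	Controller bool
}

type sandbox struct {
	OwnerReferences []ownerReference
	Annotations     map[string]string
}

// templateRefAnnotation is a hypothetical key standing in for the real
// SandboxTemplateRefAnnotation constant.
const templateRefAnnotation = "example.invalid/sandbox-template-ref"

// warmPoolNameOf returns the controlling SandboxWarmPool's name, if any.
func warmPoolNameOf(sb sandbox) (string, bool) {
	for _, ref := range sb.OwnerReferences {
		if ref.Controller && ref.Kind == "SandboxWarmPool" {
			return ref.Name, true
		}
	}
	return "", false
}

// templateNameFor prefers the annotation (fast path) and otherwise resolves
// the template through the owning warm pool via the supplied lookup.
func templateNameFor(sb sandbox, lookup func(pool string) (string, bool)) string {
	if name := sb.Annotations[templateRefAnnotation]; name != "" {
		return name
	}
	if pool, ok := warmPoolNameOf(sb); ok && lookup != nil {
		if name, found := lookup(pool); found {
			return name
		}
	}
	return ""
}

func main() {
	warm := sandbox{
		OwnerReferences: []ownerReference{{Kind: "SandboxWarmPool", Name: "python-pool", Controller: true}},
		Annotations:     map[string]string{templateRefAnnotation: "python-tmpl"},
	}
	pool, _ := warmPoolNameOf(warm)
	fmt.Println(pool, templateNameFor(warm, nil))
}
```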

A few things worth addressing before merge — left inline. Nothing is a hard blocker except possibly the missing regression test for the central new behavior (implicit env-based cold start). Severity is noted per comment: most are minor/cleanup, one design point about the in-memory queue key is worth a maintainer decision.

(Note: this is an ACTION REQUIRED breaking change to the v1beta1 API made in-place — no conversion for already-stored SandboxClaim objects. That's consistent with the KEP's stated migration plan, just flagging it explicitly for the release notes / reviewers.)

Comment thread extensions/controllers/queue/simple_sandbox_queue.go
Comment thread extensions/controllers/sandboxclaim_controller.go
Comment thread extensions/controllers/sandboxclaim_controller.go Outdated
Comment thread extensions/controllers/sandboxclaim_controller.go Outdated
Comment thread extensions/api/v1beta1/sandboxclaim_types.go Outdated
Comment thread extensions/api/v1beta1/sandboxtemplate_types.go Outdated
Comment thread extensions/controllers/sandboxclaim_controller.go Outdated
Comment thread extensions/controllers/sandboxclaim_controller.go Outdated
Comment thread extensions/controllers/sandboxclaim_controller.go Outdated
Comment thread extensions/controllers/sandboxclaim_controller.go Outdated
Comment thread extensions/controllers/sandboxclaim_controller.go Outdated
Comment thread extensions/controllers/sandboxclaim_controller.go

@janetkuo janetkuo left a comment

Member


/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 1, 2026
@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: janetkuo, SHRUTI6991

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 1, 2026
@k8s-ci-robot k8s-ci-robot merged commit 2feb713 into kubernetes-sigs:main Jun 1, 2026
13 checks passed
@github-project-automation github-project-automation Bot moved this from Backlog to Done in Agent Sandbox Jun 1, 2026
geojaz added a commit to onix-net/PerfKitBenchmarker that referenced this pull request Jun 4, 2026
 SandboxClaim API

PR kubernetes-sigs/agent-sandbox#899 replaced the SandboxClaim spec.sandboxTemplateRef and spec.warmpool fields with a single spec.warmPoolRef; the controller now resolves the template through the warm pool. Update the load generator to emit warmPoolRef and bump the default manifest ref to the post-#899 main HEAD so the installed CRDs match.
geojaz added a commit to onix-net/PerfKitBenchmarker that referenced this pull request Jun 4, 2026
Make the agent_sandbox benchmark run: a SandboxClaim load generator, the
metrics it produces, the Run wiring that drives them, and a provision/prepare
install split for fast iteration.

The load generator (agent_sandbox_loadgen.py) submits SandboxClaim custom
resources at a target QPS through a single shared Kubernetes Watch stream (no
per-claim polling). ClaimDriver handles create/watch with 429 retry and
separate connection pools, LoadGenerator paces submission, and readiness is
tracked with bounded concurrency. Claims reference the warm pool directly via
spec.warmPoolRef (kubernetes-sigs/agent-sandbox#899 replaced
sandboxTemplateRef/warmpool with a single warmPoolRef; the controller resolves
the template through the warm pool), and the default manifest ref is bumped to
the post-#899 main HEAD so the installed CRDs match.

The metrics module (agent_sandbox_metrics.py) computes startup-time
percentiles, submit/completion QPS, peak concurrency, warm_served_fraction,
error counts, and lifecycle/exec-duration percentiles from the recorded
events. The benchmark Run constructs the load generator from the load-shape
flags, runs it, and converts the recorded events into PKB samples (the stub
Run from the resource PR returned nothing).

Install is split across provision and prepare: provision installs only the
cluster scaffolding (gVisor, CRDs, RBAC); the controller Deployment, sandbox
template, and warm pool move to the prepare stage via a new
K8sAgentSandbox.InstallWorkload. This lets the controller be reinstalled
against an existing cluster with --run_stage=prepare to iterate on controller
settings without recreating it. Because the benchmark spec is pickled at
provision and unpickled without re-applying flags, Prepare calls
RefreshSpecFromFlags on a resume so the controller, template, and warm pool
config reflect the current command-line flags. Note: --run_stage=provision
alone no longer installs the controller; run provision,prepare for a full
setup.

Adds the kubernetes Python client to requirements.txt, plus unit tests for the
load generator, the metrics, and the provision/prepare split.
carlossg pushed a commit to carlossg/agent-sandbox that referenced this pull request Jun 5, 2026
…Ref`. (kubernetes-sigs#899)

* api: Replace templateRef in SandboxClaim with warmpoolRef.

* Generate api doc.

* Fix rebase conflict.

* Address co-pilot comments.

* Address all comments.

* Add a TODO for requeueing logic.
khirotaka added a commit to khirotaka/agent-sandbox that referenced this pull request Jun 12, 2026
…ollow-up)

Replace sandboxTemplateRef with the mandatory warmPoolRef field in SandboxClaim manifests, matching the upstream API change introduced in commit 2feb713.

- Rename createSandbox() parameter template → warmpool
- Add inspectClaimConditions() to surface WarmPoolNotFound / TemplateNotFound terminal failures as typed errors
- Add SandboxTemplateNotFoundError and SandboxWarmPoolNotFoundError
- Add getSandboxClaimWarmpoolName() public method
- Update unit tests to cover new error paths and renamed parameters
khirotaka pushed a commit to khirotaka/agent-sandbox that referenced this pull request Jun 12, 2026
…Ref`. (kubernetes-sigs#899)

* api: Replace templateRef in SandboxClaim with warmpoolRef.

* Generate api doc.

* Fix rebase conflict.

* Address co-pilot comments.

* Address all comments.

* Add a TODO for requeueing logic.
lauragalbraith added a commit to lauragalbraith/agent-sandbox that referenced this pull request Jun 16, 2026
lauragalbraith added a commit to lauragalbraith/agent-sandbox that referenced this pull request Jun 23, 2026
alexatakvelon pushed a commit to volatilemolotov/agent-sandbox that referenced this pull request Jun 24, 2026
…Ref`. (kubernetes-sigs#899)

* api: Replace templateRef in SandboxClaim with warmpoolRef.

* Generate api doc.

* Fix rebase conflict.

* Address co-pilot comments.

* Address all comments.

* Add a TODO for requeueing logic.
kubernetes-prow Bot pushed a commit that referenced this pull request Jun 29, 2026
* Refactor the Sandbox Template hash generation to include the namespace

* tweaks after review

* expand tests

* Revert back to main

* Partition SimpleSandboxQueue by Namespace

* Log, conciseness, test

* Pop and remove legacy as well as namespaced

* Clean up state after rebase on PRs #899 and 864

* test GetWithStrategy fallback case

* Remove unnecessary fallback cases, and nits

* Clean up SimpleSandboxQueue changes from previous revisions
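
One way to picture the namespace partitioning mentioned in this follow-up commit is a queue keyed by a (namespace, pool name) pair, so that identically named pools in different namespaces never share a bucket. This is only an illustration under that assumption; `poolKey` and `namespacedQueue` are hypothetical names, the sketch is not safe for concurrent use, and the actual change may partition the queue differently.

```go
package main

import "fmt"

// poolKey identifies a warm pool by namespace and name together.
type poolKey struct{ Namespace, Name string }

// namespacedQueue buckets ready sandboxes per (namespace, pool) pair.
type namespacedQueue struct {
	byPool map[poolKey][]string
}

func newNamespacedQueue() *namespacedQueue {
	return &namespacedQueue{byPool: make(map[poolKey][]string)}
}

// Add appends a ready sandbox to the bucket for the given namespace and pool.
func (q *namespacedQueue) Add(namespace, pool, sandbox string) {
	k := poolKey{Namespace: namespace, Name: pool}
	q.byPool[k] = append(q.byPool[k], sandbox)
}

// Pop removes and returns the oldest ready sandbox for that namespace and pool.
func (q *namespacedQueue) Pop(namespace, pool string) (string, bool) {
	k := poolKey{Namespace: namespace, Name: pool}
	items := q.byPool[k]
	if len(items) == 0 {
		return "", false
	}
	q.byPool[k] = items[1:]
	return items[0], true
}

func main() {
	q := newNamespacedQueue()
	q.Add("team-a", "python-pool", "sb-a")
	q.Add("team-b", "python-pool", "sb-b")
	fmt.Println(q.Pop("team-b", "python-pool"))
	fmt.Println(q.Pop("team-a", "python-pool"))
}
```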

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ready-for-review size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

5 participants