api: Remove spec.replicas and introduce spec.operatingMode for suspend and resume #801

Merged: k8s-ci-robot merged 7 commits into kubernetes-sigs:main from SHRUTI6991:field_name on May 29, 2026

SHRUTI6991 (Contributor) commented May 13, 2026

Working on: #740

Description

  • Remove spec.replicas and introduce spec.operatingMode to represent suspension and resume behavior in Sandbox.

  • Suspension and resume are now represented by a new field, spec.operatingMode, which has two modes: Running and Suspended. The design was settled in https://github.com/kubernetes-sigs/agent-sandbox/pull/762/changes. This PR also adds a KEP that documents the decisions behind spec.operatingMode.
    The reconciler, extension controllers, generated CRDs, Python SDK suspend/resume logic, e2e tests, docs, and the roadmap are all updated to use the new mode-based vocabulary.

Changes

  • Introduce SandboxMode type with Running/Suspended constants; remove Replicas and the scale subresource from the Sandbox API and generated CRDs.
  • Update the Sandbox controller, SandboxClaim controller, SandboxWarmPool controller, and their tests to set/check Spec.OperatingMode instead of Spec.Replicas.
  • Update the Python gke_extensions snapshot support (is_suspended, suspend, resume and tests/README) to patch spec.operatingMode instead of spec.replicas, and refresh comments in sandbox.py / async_sandbox.py and the roadmap.
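
As a rough illustration of the last item, the suspend and resume paths reduce to sending a patch that sets spec.operatingMode. The sketch below only builds and inspects that patch body; the helper names build_mode_patch and is_suspended_obj are hypothetical, and the real SDK code in sandbox_with_snapshot_support.py may be structured differently. Sending the patch through a Kubernetes client is deliberately left out.

```python
import json

# Mode values named in the PR description.
RUNNING = "Running"
SUSPENDED = "Suspended"


def build_mode_patch(mode: str) -> dict:
    """Return a JSON merge patch body that sets spec.operatingMode.

    Hypothetical helper for illustration; not the SDK's actual function.
    """
    if mode not in (RUNNING, SUSPENDED):
        raise ValueError(f"unsupported operating mode: {mode!r}")
    return {"spec": {"operatingMode": mode}}


def is_suspended_obj(sandbox: dict) -> bool:
    """Report whether a Sandbox object (as a plain dict) is set to Suspended."""
    spec = sandbox.get("spec") or {}
    return spec.get("operatingMode") == SUSPENDED


# Usage: the body a suspend call would send, serialized as JSON.
suspend_body = json.dumps(build_mode_patch(SUSPENDED))
print(suspend_body)
```

Validating the mode on the client side is a design choice in this sketch; the CRD enum would reject other values on the server in any case.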

Release Notes [Breaking Changes]

Removing the scale subresource is a breaking change: kubectl scale commands and HorizontalPodAutoscalers (HPA) that target a Sandbox no longer work. It also affects PodDisruptionBudget.

Backward Compatibility

We don't have full backward compatibility at this stage. However, existing Sandboxes can still be managed through defaulting: when spec.operatingMode is not set, it defaults to "Running".
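
A minimal sketch of what that defaulting means for a reader of the object, assuming the Sandbox is available as a plain dict. The function name effective_operating_mode is hypothetical; the real default is applied by the API server from the CRD schema, not by client code like this.

```python
# Default named in the PR description.
DEFAULT_OPERATING_MODE = "Running"


def effective_operating_mode(sandbox: dict) -> str:
    """Return the mode a Sandbox is treated as having.

    A Sandbox created before this change has no spec.operatingMode,
    so it falls back to the default and is treated as Running.
    """
    spec = sandbox.get("spec") or {}
    return spec.get("operatingMode") or DEFAULT_OPERATING_MODE


# Usage: an object written under the old schema, with spec.replicas only.
legacy = {"spec": {"replicas": 1}}
print(effective_operating_mode(legacy))
```

Note that the old spec.replicas value plays no part in the result, which is why the defaulting alone cannot preserve a suspended state.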

netlify Bot commented May 13, 2026

Deploy Preview for agent-sandbox canceled.

Name Link
🔨 Latest commit ff48aa0
🔍 Latest deploy log https://app.netlify.com/projects/agent-sandbox/deploys/6a1902ef28d756000848c867
@k8s-ci-robot k8s-ci-robot requested review from barney-s and justinsb May 13, 2026 22:12
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 13, 2026
@janetkuo janetkuo requested a review from Copilot May 13, 2026 23:00

Copilot AI left a comment

Pull request overview

This PR removes the spec.replicas field (and the corresponding status.replicas / status.selector and scale subresource) from the v1alpha1 Sandbox API and replaces it with an explicit spec.mode enum (Running | Suspended, default Running). The reconciler, extension controllers, generated CRDs, Python SDK suspend/resume logic, e2e tests, docs, and the roadmap are all updated to use the new mode-based vocabulary. This is explicitly called out as a breaking change.

Changes:

  • Introduce SandboxMode type with Running/Suspended constants; remove Replicas, LabelSelector, and the scale subresource from the Sandbox API and generated CRDs.
  • Update the Sandbox controller, SandboxClaim controller, SandboxWarmPool controller, and their tests to set/check Spec.Mode instead of Spec.Replicas, including new condition message "Pod does not exist, mode is Suspended".
  • Update the Python gke_extensions snapshot support (is_suspended, suspend, resume and tests/README) to patch spec.mode instead of spec.replicas, and refresh comments in sandbox.py / async_sandbox.py and the roadmap.

Reviewed changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| api/v1alpha1/sandbox_types.go | Adds SandboxMode enum and Spec.Mode; removes Spec.Replicas, Status.Replicas, Status.LabelSelector, and the scale subresource marker. |
| api/v1alpha1/zz_generated.deepcopy.go | Drops the now-removed Replicas deepcopy block. |
| k8s/crds/agents.x-k8s.io_sandboxes.yaml, helm/crds/agents.x-k8s.io_sandboxes.yaml | Regenerated CRDs reflecting the new schema (no replicas/selector, no scale subresource, new mode enum with default Running). |
| controllers/sandbox_controller.go | Defaults Spec.Mode; deletes pod when Mode == Suspended; stops populating Status.Replicas/Status.LabelSelector; updates log/condition messages; removes k8s.io/utils/ptr import. |
| controllers/sandbox_controller_test.go | Test cases switched to Mode: Running/Suspended and expected statuses no longer assert Replicas/LabelSelector. |
| extensions/controllers/sandboxclaim_controller.go | Sets sandbox.Spec.Mode = Running instead of the old replicas workaround. |
| extensions/controllers/sandboxclaim_controller_test.go, sandboxclaim_pod_exclusivity_test.go, sandboxwarmpool_controller.go, sandboxwarmpool_controller_test.go | Updated to use Mode in test fixtures and pool sandbox creation. |
| test/e2e/basic_test.go, shutdown_test.go, volumeclaimtemplate_test.go, mode_test.go | Removed assertions on Replicas/LabelSelector; renamed TestSandboxReplicas → TestSandboxMode and updated suspend flow. |
| clients/python/.../sandbox_with_snapshot_support.py + tests + README | _set_replicas → _set_mode; is_suspended reads spec.mode; messages and docs updated. |
| clients/python/.../sandbox.py, async_sandbox.py | Comment updates referring to spec.mode instead of spec.replicas. |
| roadmap.md | Wording change from "replicas scale to 0" to "mode is set to Suspended/Running". |
Files not reviewed (1)
  • api/v1alpha1/zz_generated.deepcopy.go: Language not supported
Comments suppressed due to low confidence (1)

test/e2e/mode_test.go:11

  • Two lines of the standard Apache 2.0 license header were accidentally deleted in this file. The current header now jumps from "...distributed on an "AS IS" BASIS," straight to "limitations under the License." — the lines "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied." and "See the License for the specific language governing permissions and" need to be restored. This deletion is unrelated to the spec.replicas → spec.mode rename and should be reverted.
Comment thread api/v1alpha1/sandbox_types.go Outdated
Comment thread controllers/sandbox_controller.go Outdated
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 13, 2026
Comment thread controllers/sandbox_controller.go Outdated
Comment thread k8s/crds/agents.x-k8s.io_sandboxes.yaml
Comment thread test/e2e/mode_test.go Outdated
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 14, 2026
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 14, 2026
Comment thread docs/keps/694-kep-for-suspend-and-resume-for-beta/README.md Outdated
Comment thread docs/keps/694-kep-for-suspend-and-resume-for-beta/README.md Outdated
Comment thread docs/keps/694-kep-for-suspend-and-resume-for-beta/README.md Outdated
Comment thread clients/python/agentic-sandbox-client/k8s_agent_sandbox/sandbox.py Outdated
Comment thread clients/python/agentic-sandbox-client/k8s_agent_sandbox/async_sandbox.py Outdated
Comment thread controllers/sandbox_controller.go Outdated
Comment thread api/v1beta1/sandbox_types.go
Comment thread docs/keps/694-kep-for-suspend-and-resume-for-beta/README.md Outdated
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 28, 2026
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 28, 2026
Upgrading from Alpha to Beta is designed to be seamless for end-users, relying heavily on native Kubernetes API defaulting mechanisms to prevent disruption.

1. **CRD Update:** The cluster administrator applies the updated `Sandbox` CRD containing the new `spec.operatingMode` Enum field.
2. **Defaulting Behavior:** Because the `spec.operatingMode` field is defined with `// +kubebuilder:default=Running`, all existing Sandbox resources in the cluster will automatically be treated as `Running` by the API server.

Member

The defaulting argument only holds for Sandboxes that were running at upgrade time. What about Sandboxes a user had explicitly suspended (spec.replicas: 0) before the upgrade?

Contributor Author

Very good point.

Because of the automatic defaulting to Running, any Sandbox that was explicitly suspended (spec.replicas: 0) in the Alpha version will be automatically resumed. If administrators or users wish to keep these Sandboxes suspended across the upgrade, they must patch the existing Sandbox resources to explicitly set spec.operatingMode: Suspended prior to upgrading the controller. I have updated the details in the KEP and kept it brief.

The full migration details will be added in their own PR, https://github.com/kubernetes-sigs/agent-sandbox/pull/848/changes, which covers how to delete alpha Sandboxes and how to patch existing Sandboxes correctly.
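
To make the patching step in this reply concrete, here is a small sketch of the selection logic only: given alpha Sandbox objects as plain dicts, pick the ones that were suspended with spec.replicas: 0 and pair each with a patch setting spec.operatingMode: Suspended. The helper name plan_suspend_patches is hypothetical, how the objects are listed and how the patches are applied is left out, and the authoritative procedure is whatever the migration PR linked above describes.

```python
def plan_suspend_patches(sandboxes: list) -> list:
    """Return (namespace, name, patch) for every Sandbox suspended via replicas: 0.

    Hypothetical helper for illustration; it only plans patches and
    does not talk to a cluster.
    """
    plans = []
    for sandbox in sandboxes:
        spec = sandbox.get("spec") or {}
        # Only an explicit 0 counts as suspended; a missing field does not.
        if spec.get("replicas") == 0:
            meta = sandbox.get("metadata") or {}
            patch = {"spec": {"operatingMode": "Suspended"}}
            plans.append((meta.get("namespace"), meta.get("name"), patch))
    return plans


# Usage with two made-up objects: one suspended, one running.
existing = [
    {"metadata": {"namespace": "default", "name": "sb-a"}, "spec": {"replicas": 0}},
    {"metadata": {"namespace": "default", "name": "sb-b"}, "spec": {"replicas": 1}},
]
for namespace, name, patch in plan_suspend_patches(existing):
    print(namespace, name, patch)
```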

Comment thread extensions/controllers/sandboxwarmpool_controller_test.go Outdated
Comment thread controllers/sandbox_controller.go Outdated
Comment thread roadmap.md Outdated
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 29, 2026

@janetkuo janetkuo left a comment

Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 29, 2026
@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aditya-shantanu, janetkuo, SHRUTI6991, vicentefb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 29, 2026
@k8s-ci-robot k8s-ci-robot merged commit e461edb into kubernetes-sigs:main May 29, 2026
14 checks passed
@github-project-automation github-project-automation Bot moved this from Backlog to Done in Agent Sandbox May 29, 2026
geojaz added a commit to geojaz/agent-sandbox that referenced this pull request May 30, 2026
Upstream PR kubernetes-sigs#801 removed SandboxStatus.Replicas in favor of
OperatingMode. Replace the mutable status field used in the conflict
tests with Status.ServiceFQDN (a string), and add the Apache 2.0
license header required by the boilerplate check.
geojaz added a commit to geojaz/agent-sandbox that referenced this pull request Jun 2, 2026
khirotaka pushed a commit to khirotaka/agent-sandbox that referenced this pull request Jun 12, 2026
…d and resume (kubernetes-sigs#801)

* Remove spec.replicas and introduce spec.operatingMode to suspend and resume a sandbox.

* temp work.

* Address comments.

* Update migration plan.

* Update migration plan.

* fix the shutdown time test.

* retrigger test.
alexatakvelon pushed a commit to volatilemolotov/agent-sandbox that referenced this pull request Jun 24, 2026
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ready-for-review size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

7 participants