example: scale to zero using KEDA by shrutiyam-glitch · Pull Request #1048 · kubernetes-sigs/agent-sandbox

shrutiyam-glitch · 2026-06-26T17:55:21Z

What this PR does / why we need it:

This pull request introduces a complete end-to-end guide and resource templates demonstrating how to implement scale-to-zero capabilities for warm sandbox pools on GKE using KEDA.

By default, warm pools must balance active instances with resource consumption. This example provides ready-to-use configurations to dynamically scale warm pools based on claim rates, allowing them to scale down to zero when idle.

Which issue(s) this PR is related to:

Ref: #677
Related issues: #1050

Release Note

Added a complete end-to-end example and guide for scaling GKE sandbox warm pools to zero using KEDA.

Summary by CodeRabbit

New Features
- Added a complete KEDA scale-to-zero example for warm pool workloads on GKE.
- Included ready-to-use manifests for warm pools, workload templates, monitoring, and autoscaling.
- Added a sample load generator to help test scaling behavior.
Documentation
- Added step-by-step setup, verification, and troubleshooting guidance.
- Documented an alternate Cloud Monitoring-based scaling option and when to use it.

netlify · 2026-06-26T17:55:26Z

✅ Deploy Preview for agent-sandbox canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | `fda2802` |
| 🔍 Latest deploy log | https://app.netlify.com/projects/agent-sandbox/deploys/6a4319f273cf2f0008f6e6f3 |

coderabbitai · 2026-06-26T18:00:14Z

📝 Walkthrough

Adds a new examples/keda-scale-to-zero/ directory containing a SandboxTemplate, SandboxWarmPool, PodMonitoring, two KEDA ScaledObject manifests (Prometheus and Stackdriver variants), a Python load-generator script, and a README with an end-to-end GKE runbook.

Changes

KEDA warm pool scale-to-zero example

| Layer / File(s) | Summary |
|---|---|
| Sandbox template and warm pool manifests: `examples/keda-scale-to-zero/python-sandbox-template.yaml`, `examples/keda-scale-to-zero/sandboxwarmpool.yaml` | Defines a `SandboxTemplate` with a `python-runtime` container and a `SandboxWarmPool` initialized to `replicas: 0` for KEDA-managed scaling. |
| Prometheus-based ScaledObject and metrics scrape: `examples/keda-scale-to-zero/pod-monitoring.yaml`, `examples/keda-scale-to-zero/scaledobject-prometheus.yaml` | Adds a `PodMonitoring` resource to scrape controller metrics into GMP and a KEDA `ScaledObject` that scales the warm pool based on the `agent_sandbox_claim_creation_total` rate via GMP Prometheus frontend. |
| Stackdriver-based ScaledObject: `examples/keda-scale-to-zero/scaledobject-stackdriver.yaml` | Adds a `TriggerAuthentication` using GKE Workload Identity and a `ScaledObject` with Cloud Monitoring trigger, fallback for transient errors, and per-second rate alignment with `activationTargetValue` gating. |
| Claim load generator script: `examples/keda-scale-to-zero/create-claim.py` | Python script that creates `SandboxClaim` CRs at a configurable rate using daemon threads, reports progress, and waits for TTL-based cleanup after the test loop. |
| End-to-end runbook and troubleshooting docs: `examples/keda-scale-to-zero/README.md` | README covering rationale, prerequisites, full Prometheus and Stackdriver runbooks (KEDA install, IAM, Workload Identity, load generation, verification), scaling mechanics, troubleshooting, and sources. |
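
The Prometheus-based `ScaledObject` summarized above scales the warm pool on a claim-creation rate. The following is a rough sketch of how such a rate could map to a replica count. It assumes the usual KEDA/HPA behaviour: activation from zero once the metric exceeds an activation threshold, then roughly `ceil(rate / target)` replicas clamped to a maximum. The function name, parameter names, and all numeric values are hypothetical and are not taken from the PR's manifests; real autoscaling also applies tolerances and stabilization windows that are ignored here.

```python
import math


def desired_replicas(rate_per_sec, target_per_replica, activation_threshold=0.0,
                     min_replicas=0, max_replicas=10):
    """Approximate replica count for an average-value style metric.

    Assumptions (not from the PR): the scaler activates only when the metric
    is strictly above activation_threshold, and the autoscaler then aims for
    ceil(rate / target) replicas, never more than max_replicas.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    if rate_per_sec <= activation_threshold:
        # Idle: stay at (or return to) the floor, which is zero for scale-to-zero.
        return min_replicas
    wanted = math.ceil(rate_per_sec / target_per_replica)
    # Once active, at least one replica is needed to serve claims.
    return min(max(wanted, min_replicas, 1), max_replicas)
```

With a target of one claim per second per replica, a sustained rate of five claims per second would ask for five replicas under these assumptions, and an idle pool would return to zero.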

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Possibly related PRs

kubernetes-sigs/agent-sandbox#1050: Changes agent_sandbox_claim_creation_total to use the real claim.Spec.WarmPoolRef.Name label, directly aligning the metric with the filter queries used in the ScaledObject manifests added here.

Suggested reviewers

igooch
janetkuo
barney-s
justinsb

🐇 A warm pool sleeping at zero,
Til claims arrive and KEDA says "go!"
The Prometheus rate ticks up fast,
Replicas wake from the past—
Scale up, scale down, what a show! 🎉

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title is concise and clearly reflects the main change: adding a KEDA-based scale-to-zero example.
Description check	✅ Passed	The description follows the template, covers the change, links related issues, and includes a release note.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (1)

examples/keda-swp-scaling/python-sandbox-template.yaml (1)
4-5: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Point this comment at the actual dependent file.

create-claim.py never references python-sandbox-template; the real contract is examples/keda-swp-scaling/sandboxwarmpool.yaml via spec.sandboxTemplateRef.name. As written, the comment sends readers to the wrong file when they rename resources.
📝 Suggested fix
-  # The create-claim.py expects the template to have this name
+  # sandboxwarmpool.yaml references this via spec.sandboxTemplateRef.name
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/keda-swp-scaling/python-sandbox-template.yaml` around lines 4 - 5,
The comment is attached to the wrong template file and should point to the
actual dependency used by create-claim.py. Update the reference so it documents
examples/keda-swp-scaling/sandboxwarmpool.yaml and the
spec.sandboxTemplateRef.name contract, since that is what the claim script
relies on when matching resource names. Keep the note aligned with the real
consumer and remove the misleading link to python-sandbox-template.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/keda-swp-scaling/create-claim.py`:
- Around line 28-31: The kubeconfig fallback in create-claim.py is too broad
because the current try/except around config.load_kube_config() catches all
failures and can hide real local config/auth errors. Narrow the catch in that
startup block to ConfigException, and only call config.load_incluster_config()
when KUBERNETES_SERVICE_HOST is set so the fallback happens only in-cluster; use
the existing config.load_kube_config and config.load_incluster_config calls to
locate the change.
- Around line 53-94: The load loop in create-claim.py is counting scheduled
threads as completed claims and can launch unbounded daemon workers. Update
create_claim and the main rate loop so concurrency is bounded with a worker
limit or thread pool, and only increment/report progress after
create_namespaced_custom_object finishes successfully. Keep the progress and
final totals tied to completed claim creations rather than thread starts.
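
The first comment above asks for a narrower kubeconfig fallback. A minimal sketch of that idea follows, written so it runs without the Kubernetes client installed: the two loader callables are passed in, and `ConfigError` is a stand-in for `kubernetes.config.ConfigException`. The helper name `load_cluster_config` and its return values are hypothetical.

```python
import os


class ConfigError(Exception):
    """Stand-in for kubernetes.config.ConfigException (assumption for this sketch)."""


def load_cluster_config(load_kube_config, load_incluster_config,
                        config_error=ConfigError, environ=None):
    """Try the local kubeconfig first; fall back only when running in a pod."""
    environ = os.environ if environ is None else environ
    try:
        load_kube_config()
        return "kubeconfig"
    except config_error:
        # Only a missing or invalid kubeconfig lands here; any other
        # exception type (for example an auth failure) propagates unchanged.
        if "KUBERNETES_SERVICE_HOST" not in environ:
            raise
        load_incluster_config()
        return "in-cluster"
```

With the real client this might be called as `load_cluster_config(config.load_kube_config, config.load_incluster_config, config_error=config.ConfigException)`.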

In `@examples/keda-swp-scaling/README.md`:
- Around line 171-175: The Stackdriver IAM example in the README uses a
hardcoded PROJECT_ID inside the principal URI, so update the command in the KEDA
IAM binding example to interpolate the actual $PROJECT_ID consistently. Make the
principal string in the gcloud projects add-iam-policy-binding example match the
same project variable used elsewhere in the snippet so the workload identity
principal resolves correctly for the KEDA operator.

In `@examples/keda-swp-scaling/scaledobject-stackdriver.yaml`:
- Around line 59-60: The Stackdriver ScaledObject’s target setting is
inconsistent with the Prometheus variant, so update the `targetValue` in
`scaledobject-stackdriver.yaml` to match the same claims/sec per replica
threshold used by the Prometheus example. Keep the `ScaledObject` configuration
aligned with the HPA/Prometheus semantics and adjust the nearby comment so it no
longer claims a different value “matches the HPA example.”
- Line 47: The Stackdriver scaledobject manifest currently hardcodes a specific
GCP project ID, so replace the projectId value in the scaledobject-stackdriver
YAML with a placeholder such as PROJECT_ID or YOUR_PROJECT_ID and make sure any
related example references use the same placeholder. Keep the manifest generic
by updating the field in the Stackdriver configuration block, and add a brief
note in the README explaining that users must substitute their own project ID
before applying the example.
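
One way to keep the manifests generic, as the comments above request, is to leave `${PROJECT_ID}`-style placeholders in the YAML and fill them in just before applying. The sketch below uses Python's `string.Template` for that. The helper name and the manifest fragment are hypothetical, and a manifest containing other literal `$` characters would need them escaped as `$$`.

```python
import os
from string import Template


def render_manifest(text, values=None):
    """Replace ${NAME} placeholders; raises KeyError if a value is missing."""
    values = dict(os.environ) if values is None else values
    return Template(text).substitute(values)


# Hypothetical fragment; the real manifest's fields may differ.
fragment = 'metadata:\n  projectId: "${PROJECT_ID}"\n'
print(render_manifest(fragment, {"PROJECT_ID": "my-project"}), end="")
```

Because `substitute` fails on a missing value instead of leaving the placeholder in place, a forgotten `PROJECT_ID` is caught before the manifest reaches the cluster.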

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 19b2de5d-0e53-4ca1-8abb-d9454a23c09e

📥 Commits

Reviewing files that changed from the base of the PR and between bce2dda and f83dec5.

📒 Files selected for processing (7)

examples/keda-swp-scaling/README.md
examples/keda-swp-scaling/create-claim.py
examples/keda-swp-scaling/pod-monitoring.yaml
examples/keda-swp-scaling/python-sandbox-template.yaml
examples/keda-swp-scaling/sandboxwarmpool.yaml
examples/keda-swp-scaling/scaledobject-prometheus.yaml
examples/keda-swp-scaling/scaledobject-stackdriver.yaml

dongjiang1989 · 2026-06-29T02:06:31Z

@@ -0,0 +1,312 @@
+# SandboxWarmPool Scale-to-Zero with KEDA on GKE


Add this to the site so that it shows up on the website.

Copilot

Pull request overview

This PR adds a new end-to-end example showing how to scale SandboxWarmPool replicas down to zero on GKE using KEDA, and updates controller metrics so cold-start claim creation is labeled with the referenced warm pool name (enabling warmpool_name-scoped scaling queries).

Changes:

- Record cold-start SandboxClaim creation metrics with claim.spec.warmPoolRef.name instead of the hardcoded "none".
- Update the existing controller test to assert the new warmpool_name label value.
- Add a complete examples/keda-scale-to-zero/ walkthrough with manifests and a load generator script.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| extensions/controllers/sandboxclaim_controller.go | Updates cold-start metric recording to use the claim’s referenced warm pool name. |
| extensions/controllers/sandboxclaim_controller_test.go | Adjusts assertions to validate the new `warmpool_name` label value. |
| examples/keda-scale-to-zero/README.md | Adds an end-to-end guide for KEDA-based scale-to-zero on GKE (GMP + optional Stackdriver path). |
| examples/keda-scale-to-zero/scaledobject-prometheus.yaml | Adds KEDA `ScaledObject` using the Prometheus scaler against the GMP frontend. |
| examples/keda-scale-to-zero/scaledobject-stackdriver.yaml | Adds an alternative KEDA `ScaledObject` using Cloud Monitoring (Stackdriver) directly. |
| examples/keda-scale-to-zero/sandboxwarmpool.yaml | Adds a `SandboxWarmPool` manifest starting at `replicas: 0` for KEDA control. |
| examples/keda-scale-to-zero/python-sandbox-template.yaml | Adds a `SandboxTemplate` manifest for the example warm pool. |
| examples/keda-scale-to-zero/pod-monitoring.yaml | Adds GMP `PodMonitoring` manifest to scrape controller metrics. |
| examples/keda-scale-to-zero/create-claim.py | Adds a Python script to generate `SandboxClaim` load (with TTL via lifecycle shutdown time). |
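
The last row describes a load generator that creates `SandboxClaim` objects with a TTL. As a sketch of what such a claim body might look like: the API group, version, and `spec.warmPoolRef.name` come from snippets quoted elsewhere in this review, while the `spec.lifecycle.shutdownTime` field path and the helper name `build_claim` are assumptions that should be checked against the CRD.

```python
from datetime import datetime, timedelta, timezone


def build_claim(name, warm_pool, ttl_seconds, now=None):
    """Return a SandboxClaim body that expires ttl_seconds from now."""
    now = now or datetime.now(timezone.utc)
    shutdown = now + timedelta(seconds=ttl_seconds)
    return {
        "apiVersion": "extensions.agents.x-k8s.io/v1beta1",
        "kind": "SandboxClaim",
        "metadata": {"name": name},
        "spec": {
            "warmPoolRef": {"name": warm_pool},
            # Assumed field path for the TTL; verify against the CRD schema.
            "lifecycle": {"shutdownTime": shutdown.strftime("%Y-%m-%dT%H:%M:%SZ")},
        },
    }
```

A body like this could then be passed as the `body` argument of `create_namespaced_custom_object`.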

kubernetes-prow · 2026-06-30T01:20:56Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shrutiyam-glitch
Once this PR has been reviewed and has the lgtm label, please assign vicentefb for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

examples/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

examples/keda-scale-to-zero/README.md (1)
1-319: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Add a docs page for keda-scale-to-zero.

The README is mounted into assets/additional/examples, but there’s no site/content/docs/use-cases/examples/keda-scale-to-zero/_index.md or landing-page link, so it won’t show up in the examples nav/index yet.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/keda-scale-to-zero/README.md` around lines 1 - 319, The
`keda-scale-to-zero` example is documented only in the README and won’t appear
in the site navigation/index yet. Add the missing docs page at
`site/content/docs/use-cases/examples/keda-scale-to-zero/_index.md` and wire it
into the examples landing page so it is discoverable alongside the other
examples; use the existing `examples/keda-scale-to-zero/README.md` content as
the source and keep the page title/metadata aligned with the examples section.
Source: Path instructions

🧹 Nitpick comments (2)

examples/keda-scale-to-zero/README.md (1)
269-269: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Hyphenate compound modifier for grammar correctness.

"~1 minute window" should be "~1-minute window" (compound modifier before a noun).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/keda-scale-to-zero/README.md` at line 269, Update the wording in the
KEDA scale-to-zero README to hyphenate the compound modifier in the sentence
containing “~1 minute window.” Adjust the text so the modifier before the noun
reads as “~1-minute window,” keeping the rest of the sentence unchanged.
Source: Linters/SAST tools
examples/keda-scale-to-zero/create-claim.py (1)
21-26: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Expose the load knobs instead of hard-coding them.

The PR describes this as a configurable load generator, but rate, duration, and TTL can only be changed by editing the file. Pulling them from env vars would make the example reusable as documented.
Suggested edit
 NAMESPACE = os.getenv("NAMESPACE", "keda-test")
 WARMPOOL = os.getenv("WARM_POOL_NAME", "python-sdk-warmpool")
-RATE_PER_SECOND = 5
-TEST_DURATION_MINUTES = 10
-CLAIM_TTL_SECONDS = 60
+RATE_PER_SECOND = int(os.getenv("RATE_PER_SECOND", "5"))
+TEST_DURATION_MINUTES = int(os.getenv("TEST_DURATION_MINUTES", "10"))
+CLAIM_TTL_SECONDS = int(os.getenv("CLAIM_TTL_SECONDS", "60"))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/keda-scale-to-zero/create-claim.py` around lines 21 - 26, The
load-generator settings are still hard-coded in the create-claim.py
configuration block, so the example is not actually configurable. Update the
top-level constants in create-claim.py (for example RATE_PER_SECOND,
TEST_DURATION_MINUTES, and CLAIM_TTL_SECONDS alongside NAMESPACE and WARMPOOL)
to read from environment variables with sensible defaults, and make sure the
rest of the script uses those symbols so the load knobs can be changed without
editing the file.
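
The suggested edit above calls `int(os.getenv(...))` directly, which accepts zero or negative values and raises an unlabelled `ValueError` on bad input. A slightly more defensive variant could look like the sketch below; the helper name `env_int` and its `minimum` parameter are hypothetical and not part of the PR.

```python
import os


def env_int(name, default, environ=None, minimum=1):
    """Read an integer setting from the environment, with a lower bound."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name, str(default))
    try:
        value = int(raw)
    except ValueError:
        # Name the offending variable so a typo is easy to spot.
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if value < minimum:
        raise ValueError(f"{name} must be >= {minimum}, got {value}")
    return value
```

In the script this might be used as `RATE_PER_SECOND = env_int("RATE_PER_SECOND", 5)`, keeping the defaults shown in the suggested edit.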

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/keda-scale-to-zero/create-claim.py`:
- Around line 60-69: The create flow in create_claim() and the surrounding loop
should not fire-and-forget background workers while incrementing totals upfront.
Replace the unbounded thread spawning with a bounded executor or equivalent
ownership mechanism, wait for all submitted create tasks to complete, and count
only successful create_namespaced_custom_object calls as created. Also surface
failures instead of swallowing them in the except block so the caller can
observe RBAC/CRD/config errors and apply backpressure when API latency is high.

In `@examples/keda-scale-to-zero/python-sandbox-template.yaml`:
- Around line 4-5: Update the comment in python-sandbox-template.yaml so it
correctly describes the linkage: the template name is consumed by
sandboxwarmpool.yaml through ${TEMPLATE_NAME}, not directly by create-claim.py.
Adjust the wording near the name field to reference the warm pool manifest and
keep the explanation aligned with the actual consumer, using the unique symbols
TEMPLATE_NAME and sandboxwarmpool.yaml to locate the spot.

In `@examples/keda-scale-to-zero/README.md`:
- Line 111: The troubleshooting guidance in the README uses inconsistent metric
label names for the controller metric exposed by the KEDA scale-to-zero example.
Update the references in the troubleshooting section and the Cloud Console query
example to use the same label as the metric description in the document, and
align the wording around the controller metric exposed by the relevant README
sections such as the metric description and troubleshooting/query examples. If
both labels are intended for different resources, explicitly distinguish them so
users know which label to query when investigating scale-from-zero issues.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3407863e-eb6d-4ca1-b185-73c735f741cd

📥 Commits

Reviewing files that changed from the base of the PR and between f83dec5 and fda2802.

📒 Files selected for processing (7)

examples/keda-scale-to-zero/README.md
examples/keda-scale-to-zero/create-claim.py
examples/keda-scale-to-zero/pod-monitoring.yaml
examples/keda-scale-to-zero/python-sandbox-template.yaml
examples/keda-scale-to-zero/sandboxwarmpool.yaml
examples/keda-scale-to-zero/scaledobject-prometheus.yaml
examples/keda-scale-to-zero/scaledobject-stackdriver.yaml

coderabbitai · 2026-06-30T01:31:06Z

+    try:
+        custom_api.create_namespaced_custom_object(
+            group="extensions.agents.x-k8s.io",
+            version="v1beta1",
+            namespace=NAMESPACE,
+            plural="sandboxclaims",
+            body=body
+        )
+    except Exception as e:
+        print(f"Error creating {name}: {e}")


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Own the create lifecycle instead of fire-and-forget threads.

This loop increments the total before each background request finishes, suppresses failures inside create_claim(), and never waits for the spawned workers. A bad RBAC/CRD/config setup can therefore report thousands of “claims created” while creating none, and sustained API latency can accumulate thousands of live threads with no backpressure. Use a bounded executor, wait for completion, and count successes separately from attempts. As per coding guidelines, "Concurrency: respect context.Context cancellation; avoid goroutines without lifetime ownership; protect shared state."

Suggested direction

-import threading
+from concurrent.futures import ThreadPoolExecutor, as_completed
@@
 def create_claim(index):
@@
     try:
         custom_api.create_namespaced_custom_object(
@@
             body=body
         )
+        return True
     except Exception as e:
         print(f"Error creating {name}: {e}")
+        return False
@@
-    counter = 0
+    attempts = 0
+    successes = 0
@@
-    while time.time() < end_time:
-        # Fire and forget the creation in a thread to avoid blocking the clock
-        threading.Thread(target=create_claim, args=(counter,), daemon=True).start()
-        counter += 1
+    futures = []
+    with ThreadPoolExecutor(max_workers=RATE_PER_SECOND * 2) as executor:
+        while time.time() < end_time:
+            futures.append(executor.submit(create_claim, attempts))
+            attempts += 1
@@
-        if counter % 10 == 0:
-            print(f"Progress: {counter} claims created...")
+            if attempts % 10 == 0:
+                print(f"Progress: {attempts} claim attempts scheduled...")
+
+    for future in as_completed(futures):
+        if future.result():
+            successes += 1
@@
-    print(f"Load test complete. Total claims created: {counter}")
+    print(f"Load test complete. {successes}/{attempts} claims created successfully.")

Also applies to: 80-103

🧰 Tools

🪛 Ruff (0.15.20)

[warning] 68-68: Do not catch blind exception: Exception (BLE001)

Sources: Coding guidelines, Linters/SAST tools
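
The review comment above can be illustrated in isolation with a self-contained sketch. It assumes a hypothetical `create_one(index)` callable that raises on failure, and a hypothetical helper name `run_load`. It omits the per-second pacing of the real script and only bounds the number of concurrent workers, not the length of the submission queue.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_load(create_one, total, max_workers=10):
    """Submit `total` create calls through a bounded thread pool.

    Returns (attempts, successes, errors). Only calls that return without
    raising count as successes; failures are collected for the caller.
    """
    successes, errors = 0, []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(create_one, i): i for i in range(total)}
        attempts = len(futures)
        for future in as_completed(futures):
            try:
                future.result()
                successes += 1
            except Exception as exc:  # collected and reported, not swallowed
                errors.append((futures[future], exc))
    return attempts, successes, errors
```

Leaving the `with` block waits for every worker to finish, so the final totals reflect completed requests instead of scheduled ones.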

coderabbitai · 2026-06-30T01:31:06Z

+  # The create-claim.py expects the template to have this name
+  name: ${TEMPLATE_NAME}


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Fix the name-linkage comment.

create-claim.py never references the template directly; it targets the warm pool, and examples/keda-scale-to-zero/sandboxwarmpool.yaml is the manifest that consumes ${TEMPLATE_NAME}. As written, this comment points readers at the wrong object to keep in sync.

Suggested edit

-  # The create-claim.py expects the template to have this name
+  # sandboxwarmpool.yaml refers to this template via spec.sandboxTemplateRef.name

coderabbitai · 2026-06-30T01:31:06Z

+
+3. **Expose the controller metric via GKE Managed Service for Prometheus**:
+   Apply the `pod-monitoring.yaml` to scrape the controller's `/metrics` endpoint. This exposes
+   `agent_sandbox_claim_creation_total{warmpool_name="..."}` into GMP.


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Fix inconsistent metric label names in troubleshooting guidance.

The document states the controller exposes agent_sandbox_claim_creation_total{warmpool_name="..."} (Line 111) and explains that the controller records warmpool_name from spec.warmPoolRef.name (Line 265). However, the troubleshooting section tells users to "verify you filtered on sandbox_template" (Line 280) and the Cloud Console query example uses {sandbox_template="$TEMPLATE_NAME"} (Line 305). These are contradictory — warmpool_name and sandbox_template are different labels on different resources. Users following the troubleshooting steps will query a non-existent label and fail to diagnose scale-from-zero issues.

Use warmpool_name consistently throughout, or if the metric actually carries both labels, clarify which label is used for which purpose.

Also applies to: 265-265, 280-280, 305-305
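
To make the label point concrete, the sketch below builds a PromQL query that filters on `warmpool_name`, the label the README says the controller records. The metric and label names come from the comment above. The `sum(rate(...))` shape, the one-minute window, and the helper name `claim_rate_query` are assumptions, since the exact query in the PR's manifests is not shown here.

```python
def claim_rate_query(warmpool_name, window="1m"):
    """Build a PromQL rate query filtered on the warmpool_name label."""
    # Escape backslashes and double quotes so the label value stays valid PromQL.
    escaped = warmpool_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "sum(rate(agent_sandbox_claim_creation_total"
        f'{{warmpool_name="{escaped}"}}[{window}]))'
    )


print(claim_rate_query("python-sdk-warmpool"))
```

A query filtered on `sandbox_template` would match nothing if the metric only carries `warmpool_name`, which is why the troubleshooting text needs to use the same label.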

github-project-automation Bot added this to Agent Sandbox Jun 26, 2026

github-project-automation Bot moved this to Backlog in Agent Sandbox Jun 26, 2026

kubernetes-prow Bot requested review from aditya-shantanu and igooch June 26, 2026 17:55

shrutiyam-glitch marked this pull request as draft June 26, 2026 17:55

kubernetes-prow Bot added the do-not-merge/work-in-progress, size/L, and cncf-cla: yes labels Jun 26, 2026

coderabbitai Bot requested changes Jun 26, 2026

Keda scale to zero

b16c768

shrutiyam-glitch force-pushed the keda branch from f83dec5 to b16c768 Compare June 26, 2026 21:16

kubernetes-prow Bot added the size/XL label and removed the size/L label Jun 26, 2026

shrutiyam-glitch mentioned this pull request Jun 26, 2026

metric: add warmpool_name label for sandboxclaim metrics #1050

Merged

dongjiang1989 reviewed Jun 29, 2026

janetkuo requested a review from Copilot June 29, 2026 17:31

Copilot started reviewing on behalf of janetkuo June 29, 2026 17:32

Copilot AI reviewed Jun 29, 2026

janetkuo added the action-required: resolve-copilot-comments label Jun 29, 2026

shrutiyam-glitch added 2 commits June 30, 2026 01:19

Address comments

fc0c47c

Merge branch 'main' into keda

fda2802

coderabbitai Bot approved these changes Jun 30, 2026

shrutiyam-glitch marked this pull request as ready for review June 30, 2026 01:23

kubernetes-prow Bot removed the do-not-merge/work-in-progress label Jun 30, 2026

kubernetes-prow Bot requested review from janetkuo and vicentefb June 30, 2026 01:23

shrutiyam-glitch marked this pull request as draft June 30, 2026 01:24

kubernetes-prow Bot added the do-not-merge/work-in-progress label Jun 30, 2026

janetkuo added ready-for-review and removed action-required: resolve-copilot-comments labels Jun 30, 2026

coderabbitai Bot requested changes Jun 30, 2026

example: scale to zero using KEDA #1048
shrutiyam-glitch wants to merge 3 commits into kubernetes-sigs:main from shrutiyam-glitch:keda
