
example: scale to zero using KEDA #1048

Draft
shrutiyam-glitch wants to merge 3 commits into kubernetes-sigs:main from shrutiyam-glitch:keda

Conversation

@shrutiyam-glitch shrutiyam-glitch commented Jun 26, 2026

Contributor

What this PR does / why we need it:

This pull request introduces a complete end-to-end guide and resource templates demonstrating how to implement scale-to-zero capabilities for warm sandbox pools on GKE using KEDA.

By default, warm pools must balance active instances with resource consumption. This example provides ready-to-use configurations to dynamically scale warm pools based on claim rates, allowing them to scale down to zero when idle.
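
As a rough illustration of the scaling arithmetic described above, the sketch below models how a claim rate could map to a warm pool size. It assumes KEDA's usual two-phase behavior, where an activation threshold decides between zero and one replica and a per-replica target drives scaling beyond that. The function name, parameter names, and numbers are hypothetical and are not taken from the manifests in this PR.

```python
import math


def desired_replicas(claims_per_second: float,
                     target_per_replica: float,
                     activation_threshold: float = 0.0,
                     max_replicas: int = 10) -> int:
    """Approximate the replica count for a given claim rate.

    Hypothetical helper: the parameters mirror the ideas of a per-replica
    target and an activation threshold, not fields read from this PR.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    # Activation phase: at or below the threshold the pool stays at zero.
    if claims_per_second <= activation_threshold:
        return 0
    # Scaling phase: one replica per `target_per_replica` claims/sec, rounded up.
    wanted = math.ceil(claims_per_second / target_per_replica)
    return max(1, min(wanted, max_replicas))


if __name__ == "__main__":
    for rate in (0.0, 0.2, 1.0, 5.0):
        print(rate, desired_replicas(rate, target_per_replica=0.5))
```

The sketch deliberately ignores polling intervals, cooldown periods, and autoscaler tolerance, all of which affect how quickly a real pool reaches the computed size.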

Which issue(s) this PR is related to:

Ref: #677
Related issues: #1050

Release Note

Added a complete end-to-end example and guide for scaling GKE sandbox warm pools to zero using KEDA. 

Summary by CodeRabbit

  • New Features
    • Added a complete KEDA scale-to-zero example for warm pool workloads on GKE.
    • Included ready-to-use manifests for warm pools, workload templates, monitoring, and autoscaling.
    • Added a sample load generator to help test scaling behavior.
  • Documentation
    • Added step-by-step setup, verification, and troubleshooting guidance.
    • Documented an alternate Cloud Monitoring-based scaling option and when to use it.
@netlify

netlify Bot commented Jun 26, 2026


Deploy Preview for agent-sandbox canceled.

Name Link
🔨 Latest commit fda2802
🔍 Latest deploy log https://app.netlify.com/projects/agent-sandbox/deploys/6a4319f273cf2f0008f6e6f3
@shrutiyam-glitch shrutiyam-glitch marked this pull request as draft June 26, 2026 17:55
@kubernetes-prow kubernetes-prow Bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 26, 2026
@coderabbitai

coderabbitai Bot commented Jun 26, 2026


📝 Walkthrough

Walkthrough

Adds a new examples/keda-scale-to-zero/ directory containing a SandboxTemplate, SandboxWarmPool, PodMonitoring, two KEDA ScaledObject manifests (Prometheus and Stackdriver variants), a Python load-generator script, and a README with an end-to-end GKE runbook.

Changes

KEDA warm pool scale-to-zero example

| Layer / File(s) | Summary |
|---|---|
| Sandbox template and warm pool manifests: examples/keda-scale-to-zero/python-sandbox-template.yaml, examples/keda-scale-to-zero/sandboxwarmpool.yaml | Defines a SandboxTemplate with a python-runtime container and a SandboxWarmPool initialized to replicas: 0 for KEDA-managed scaling. |
| Prometheus-based ScaledObject and metrics scrape: examples/keda-scale-to-zero/pod-monitoring.yaml, examples/keda-scale-to-zero/scaledobject-prometheus.yaml | Adds a PodMonitoring resource to scrape controller metrics into GMP and a KEDA ScaledObject that scales the warm pool based on the agent_sandbox_claim_creation_total rate via GMP Prometheus frontend. |
| Stackdriver-based ScaledObject: examples/keda-scale-to-zero/scaledobject-stackdriver.yaml | Adds a TriggerAuthentication using GKE Workload Identity and a ScaledObject with Cloud Monitoring trigger, fallback for transient errors, and per-second rate alignment with activationTargetValue gating. |
| Claim load generator script: examples/keda-scale-to-zero/create-claim.py | Python script that creates SandboxClaim CRs at a configurable rate using daemon threads, reports progress, and waits for TTL-based cleanup after the test loop. |
| End-to-end runbook and troubleshooting docs: examples/keda-scale-to-zero/README.md | README covering rationale, prerequisites, full Prometheus and Stackdriver runbooks (KEDA install, IAM, Workload Identity, load generation, verification), scaling mechanics, troubleshooting, and sources. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Possibly related PRs

  • kubernetes-sigs/agent-sandbox#1050: Changes agent_sandbox_claim_creation_total to use the real claim.Spec.WarmPoolRef.Name label, directly aligning the metric with the filter queries used in the ScaledObject manifests added here.

Suggested reviewers

  • igooch
  • janetkuo
  • barney-s
  • justinsb

🐇 A warm pool sleeping at zero,
Til claims arrive and KEDA says "go!"
The Prometheus rate ticks up fast,
Replicas wake from the past—
Scale up, scale down, what a show! 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title is concise and clearly reflects the main change: adding a KEDA-based scale-to-zero example. |
| Description check | ✅ Passed | The description follows the template, covers the change, links related issues, and includes a release note. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (1)
examples/keda-swp-scaling/python-sandbox-template.yaml (1)

4-5: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Point this comment at the actual dependent file.

create-claim.py never references python-sandbox-template; the real contract is examples/keda-swp-scaling/sandboxwarmpool.yaml via spec.sandboxTemplateRef.name. As written, the comment sends readers to the wrong file when they rename resources.

📝 Suggested fix
-  # The create-claim.py expects the template to have this name
+  # sandboxwarmpool.yaml references this via spec.sandboxTemplateRef.name
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/keda-swp-scaling/python-sandbox-template.yaml` around lines 4 - 5,
The comment is attached to the wrong template file and should point to the
actual dependency used by create-claim.py. Update the reference so it documents
examples/keda-swp-scaling/sandboxwarmpool.yaml and the
spec.sandboxTemplateRef.name contract, since that is what the claim script
relies on when matching resource names. Keep the note aligned with the real
consumer and remove the misleading link to python-sandbox-template.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/keda-swp-scaling/create-claim.py`:
- Around line 28-31: The kubeconfig fallback in create-claim.py is too broad
because the current try/except around config.load_kube_config() catches all
failures and can hide real local config/auth errors. Narrow the catch in that
startup block to ConfigException, and only call config.load_incluster_config()
when KUBERNETES_SERVICE_HOST is set so the fallback happens only in-cluster; use
the existing config.load_kube_config and config.load_incluster_config calls to
locate the change.
- Around line 53-94: The load loop in create-claim.py is counting scheduled
threads as completed claims and can launch unbounded daemon workers. Update
create_claim and the main rate loop so concurrency is bounded with a worker
limit or thread pool, and only increment/report progress after
create_namespaced_custom_object finishes successfully. Keep the progress and
final totals tied to completed claim creations rather than thread starts.

In `@examples/keda-swp-scaling/README.md`:
- Around line 171-175: The Stackdriver IAM example in the README uses a
hardcoded PROJECT_ID inside the principal URI, so update the command in the KEDA
IAM binding example to interpolate the actual $PROJECT_ID consistently. Make the
principal string in the gcloud projects add-iam-policy-binding example match the
same project variable used elsewhere in the snippet so the workload identity
principal resolves correctly for the KEDA operator.

In `@examples/keda-swp-scaling/scaledobject-stackdriver.yaml`:
- Around line 59-60: The Stackdriver ScaledObject’s target setting is
inconsistent with the Prometheus variant, so update the `targetValue` in
`scaledobject-stackdriver.yaml` to match the same claims/sec per replica
threshold used by the Prometheus example. Keep the `ScaledObject` configuration
aligned with the HPA/Prometheus semantics and adjust the nearby comment so it no
longer claims a different value “matches the HPA example.”
- Line 47: The Stackdriver scaledobject manifest currently hardcodes a specific
GCP project ID, so replace the projectId value in the scaledobject-stackdriver
YAML with a placeholder such as PROJECT_ID or YOUR_PROJECT_ID and make sure any
related example references use the same placeholder. Keep the manifest generic
by updating the field in the Stackdriver configuration block, and add a brief
note in the README explaining that users must substitute their own project ID
before applying the example.
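
The narrowed kubeconfig fallback requested for create-claim.py in the inline comments above could look roughly like the sketch below. To keep it runnable without the Kubernetes client installed, the two loader functions are passed in as arguments and `ConfigException` is a local stand-in for `kubernetes.config.ConfigException`; the function name `load_cluster_config` and its return values are hypothetical.

```python
import os


class ConfigException(Exception):
    """Stand-in for kubernetes.config.ConfigException so the sketch runs without the client."""


def load_cluster_config(load_kube_config, load_incluster_config, environ=None):
    """Prefer the local kubeconfig; fall back to in-cluster config only inside a pod."""
    environ = os.environ if environ is None else environ
    try:
        load_kube_config()
        return "kubeconfig"
    except ConfigException:
        # Only a missing or invalid kubeconfig reaches this branch;
        # any other error type propagates to the caller unchanged.
        if "KUBERNETES_SERVICE_HOST" not in environ:
            raise
        load_incluster_config()
        return "in-cluster"
```

In the real script the arguments would presumably be `config.load_kube_config` and `config.load_incluster_config`, with the exception class imported from the client instead of being defined locally.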


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 19b2de5d-0e53-4ca1-8abb-d9454a23c09e

📥 Commits

Reviewing files that changed from the base of the PR and between bce2dda and f83dec5.

📒 Files selected for processing (7)
  • examples/keda-swp-scaling/README.md
  • examples/keda-swp-scaling/create-claim.py
  • examples/keda-swp-scaling/pod-monitoring.yaml
  • examples/keda-swp-scaling/python-sandbox-template.yaml
  • examples/keda-swp-scaling/sandboxwarmpool.yaml
  • examples/keda-swp-scaling/scaledobject-prometheus.yaml
  • examples/keda-swp-scaling/scaledobject-stackdriver.yaml
Comment thread examples/keda-scale-to-zero/create-claim.py Outdated
Comment thread examples/keda-swp-scaling/create-claim.py Outdated
Comment thread examples/keda-swp-scaling/README.md Outdated
Comment thread examples/keda-swp-scaling/scaledobject-stackdriver.yaml Outdated
Comment thread examples/keda-scale-to-zero/scaledobject-stackdriver.yaml Outdated
@kubernetes-prow kubernetes-prow Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 26, 2026
@@ -0,0 +1,312 @@
# SandboxWarmPool Scale-to-Zero with KEDA on GKE

Member

Add this example to the site so that it shows up on the website.

Copilot AI left a comment


Pull request overview

This PR adds a new end-to-end example showing how to scale SandboxWarmPool replicas down to zero on GKE using KEDA, and updates controller metrics so cold-start claim creation is labeled with the referenced warm pool name (enabling warmpool_name-scoped scaling queries).

Changes:

  • Record cold-start SandboxClaim creation metrics with claim.spec.warmPoolRef.name instead of the hardcoded "none".
  • Update the existing controller test to assert the new warmpool_name label value.
  • Add a complete examples/keda-scale-to-zero/ walkthrough with manifests and a load generator script.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

Show a summary per file
| File | Description |
|---|---|
| extensions/controllers/sandboxclaim_controller.go | Updates cold-start metric recording to use the claim’s referenced warm pool name. |
| extensions/controllers/sandboxclaim_controller_test.go | Adjusts assertions to validate the new warmpool_name label value. |
| examples/keda-scale-to-zero/README.md | Adds an end-to-end guide for KEDA-based scale-to-zero on GKE (GMP + optional Stackdriver path). |
| examples/keda-scale-to-zero/scaledobject-prometheus.yaml | Adds KEDA ScaledObject using the Prometheus scaler against the GMP frontend. |
| examples/keda-scale-to-zero/scaledobject-stackdriver.yaml | Adds an alternative KEDA ScaledObject using Cloud Monitoring (Stackdriver) directly. |
| examples/keda-scale-to-zero/sandboxwarmpool.yaml | Adds a SandboxWarmPool manifest starting at replicas: 0 for KEDA control. |
| examples/keda-scale-to-zero/python-sandbox-template.yaml | Adds a SandboxTemplate manifest for the example warm pool. |
| examples/keda-scale-to-zero/pod-monitoring.yaml | Adds GMP PodMonitoring manifest to scrape controller metrics. |
| examples/keda-scale-to-zero/create-claim.py | Adds a Python script to generate SandboxClaim load (with TTL via lifecycle shutdown time). |
Comment thread examples/keda-scale-to-zero/scaledobject-prometheus.yaml
Comment thread examples/keda-scale-to-zero/scaledobject-prometheus.yaml Outdated
Comment thread examples/keda-scale-to-zero/scaledobject-stackdriver.yaml Outdated
Comment thread examples/keda-scale-to-zero/README.md
Comment thread examples/keda-scale-to-zero/README.md Outdated
@kubernetes-prow


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shrutiyam-glitch
Once this PR has been reviewed and has the lgtm label, please assign vicentefb for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@shrutiyam-glitch shrutiyam-glitch marked this pull request as ready for review June 30, 2026 01:23
@kubernetes-prow kubernetes-prow Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 30, 2026
@kubernetes-prow kubernetes-prow Bot requested review from janetkuo and vicentefb June 30, 2026 01:23
@shrutiyam-glitch shrutiyam-glitch marked this pull request as draft June 30, 2026 01:24
@kubernetes-prow kubernetes-prow Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 30, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/keda-scale-to-zero/README.md (1)

1-319: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Add a docs page for keda-scale-to-zero.

The README is mounted into assets/additional/examples, but there’s no site/content/docs/use-cases/examples/keda-scale-to-zero/_index.md or landing-page link, so it won’t show up in the examples nav/index yet.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/keda-scale-to-zero/README.md` around lines 1 - 319, The
`keda-scale-to-zero` example is documented only in the README and won’t appear
in the site navigation/index yet. Add the missing docs page at
`site/content/docs/use-cases/examples/keda-scale-to-zero/_index.md` and wire it
into the examples landing page so it is discoverable alongside the other
examples; use the existing `examples/keda-scale-to-zero/README.md` content as
the source and keep the page title/metadata aligned with the examples section.

Source: Path instructions

🧹 Nitpick comments (2)
examples/keda-scale-to-zero/README.md (1)

269-269: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Hyphenate compound modifier for grammar correctness.

"~1 minute window" should be "~1-minute window" (compound modifier before a noun).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/keda-scale-to-zero/README.md` at line 269, Update the wording in the
KEDA scale-to-zero README to hyphenate the compound modifier in the sentence
containing “~1 minute window.” Adjust the text so the modifier before the noun
reads as “~1-minute window,” keeping the rest of the sentence unchanged.

Source: Linters/SAST tools

examples/keda-scale-to-zero/create-claim.py (1)

21-26: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Expose the load knobs instead of hard-coding them.

The PR describes this as a configurable load generator, but rate, duration, and TTL can only be changed by editing the file. Pulling them from env vars would make the example reusable as documented.

Suggested edit
 NAMESPACE = os.getenv("NAMESPACE", "keda-test")
 WARMPOOL = os.getenv("WARM_POOL_NAME", "python-sdk-warmpool")
-RATE_PER_SECOND = 5
-TEST_DURATION_MINUTES = 10
-CLAIM_TTL_SECONDS = 60
+RATE_PER_SECOND = int(os.getenv("RATE_PER_SECOND", "5"))
+TEST_DURATION_MINUTES = int(os.getenv("TEST_DURATION_MINUTES", "10"))
+CLAIM_TTL_SECONDS = int(os.getenv("CLAIM_TTL_SECONDS", "60"))
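
One possible refinement of the suggested edit is to validate the values as they are read, so that a typo in an environment variable fails with a clear message instead of a bare `int()` traceback or a zero rate. The helper below is a minimal sketch; the name `env_int` and its `minimum` parameter are hypothetical and do not appear in the PR.

```python
import os


def env_int(name: str, default: int, minimum: int = 1, environ=None) -> int:
    """Read an integer setting from the environment, falling back to `default`."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if value < minimum:
        raise ValueError(f"{name} must be >= {minimum}, got {value}")
    return value


# Same defaults as the suggested edit above.
RATE_PER_SECOND = env_int("RATE_PER_SECOND", 5)
TEST_DURATION_MINUTES = env_int("TEST_DURATION_MINUTES", 10)
CLAIM_TTL_SECONDS = env_int("CLAIM_TTL_SECONDS", 60)
```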
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/keda-scale-to-zero/create-claim.py` around lines 21 - 26, The
load-generator settings are still hard-coded in the create-claim.py
configuration block, so the example is not actually configurable. Update the
top-level constants in create-claim.py (for example RATE_PER_SECOND,
TEST_DURATION_MINUTES, and CLAIM_TTL_SECONDS alongside NAMESPACE and WARMPOOL)
to read from environment variables with sensible defaults, and make sure the
rest of the script uses those symbols so the load knobs can be changed without
editing the file.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/keda-scale-to-zero/create-claim.py`:
- Around line 60-69: The create flow in create_claim() and the surrounding loop
should not fire-and-forget background workers while incrementing totals upfront.
Replace the unbounded thread spawning with a bounded executor or equivalent
ownership mechanism, wait for all submitted create tasks to complete, and count
only successful create_namespaced_custom_object calls as created. Also surface
failures instead of swallowing them in the except block so the caller can
observe RBAC/CRD/config errors and apply backpressure when API latency is high.

In `@examples/keda-scale-to-zero/python-sandbox-template.yaml`:
- Around line 4-5: Update the comment in python-sandbox-template.yaml so it
correctly describes the linkage: the template name is consumed by
sandboxwarmpool.yaml through ${TEMPLATE_NAME}, not directly by create-claim.py.
Adjust the wording near the name field to reference the warm pool manifest and
keep the explanation aligned with the actual consumer, using the unique symbols
TEMPLATE_NAME and sandboxwarmpool.yaml to locate the spot.

In `@examples/keda-scale-to-zero/README.md`:
- Line 111: The troubleshooting guidance in the README uses inconsistent metric
label names for the controller metric exposed by the KEDA scale-to-zero example.
Update the references in the troubleshooting section and the Cloud Console query
example to use the same label as the metric description in the document, and
align the wording around the controller metric exposed by the relevant README
sections such as the metric description and troubleshooting/query examples. If
both labels are intended for different resources, explicitly distinguish them so
users know which label to query when investigating scale-from-zero issues.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3407863e-eb6d-4ca1-b185-73c735f741cd

📥 Commits

Reviewing files that changed from the base of the PR and between f83dec5 and fda2802.

📒 Files selected for processing (7)
  • examples/keda-scale-to-zero/README.md
  • examples/keda-scale-to-zero/create-claim.py
  • examples/keda-scale-to-zero/pod-monitoring.yaml
  • examples/keda-scale-to-zero/python-sandbox-template.yaml
  • examples/keda-scale-to-zero/sandboxwarmpool.yaml
  • examples/keda-scale-to-zero/scaledobject-prometheus.yaml
  • examples/keda-scale-to-zero/scaledobject-stackdriver.yaml
Comment on lines +60 to +69
    try:
        custom_api.create_namespaced_custom_object(
            group="extensions.agents.x-k8s.io",
            version="v1beta1",
            namespace=NAMESPACE,
            plural="sandboxclaims",
            body=body
        )
    except Exception as e:
        print(f"Error creating {name}: {e}")


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Own the create lifecycle instead of fire-and-forget threads.

This loop increments the total before each background request finishes, suppresses failures inside create_claim(), and never waits for the spawned workers. A bad RBAC/CRD/config setup can therefore report thousands of “claims created” while creating none, and sustained API latency can accumulate thousands of live threads with no backpressure. Use a bounded executor, wait for completion, and count successes separately from attempts. As per coding guidelines, "Concurrency: respect context.Context cancellation; avoid goroutines without lifetime ownership; protect shared state."

Suggested direction
-import threading
+from concurrent.futures import ThreadPoolExecutor, as_completed
@@
 def create_claim(index):
@@
     try:
         custom_api.create_namespaced_custom_object(
@@
             body=body
         )
+        return True
     except Exception as e:
         print(f"Error creating {name}: {e}")
+        return False
@@
-    counter = 0
+    attempts = 0
+    successes = 0
@@
-        while time.time() < end_time:
-            # Fire and forget the creation in a thread to avoid blocking the clock
-            threading.Thread(target=create_claim, args=(counter,), daemon=True).start()
-            counter += 1
+        futures = []
+        with ThreadPoolExecutor(max_workers=RATE_PER_SECOND * 2) as executor:
+            while time.time() < end_time:
+                futures.append(executor.submit(create_claim, attempts))
+                attempts += 1
@@
-            if counter % 10 == 0:
-                print(f"Progress: {counter} claims created...")
+                if attempts % 10 == 0:
+                    print(f"Progress: {attempts} claim attempts scheduled...")
+
+            for future in as_completed(futures):
+                if future.result():
+                    successes += 1
@@
-    print(f"Load test complete. Total claims created: {counter}")
+    print(f"Load test complete. {successes}/{attempts} claims created successfully.")

Also applies to: 80-103
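
A self-contained version of the direction above, reduced to the counting logic, might look like the following sketch. It leaves out rate pacing and replaces the Kubernetes API call with `fake_create`, a hypothetical stand-in that fails on every fifth index so the success and failure paths are both exercised; `run_load` is likewise an illustrative name, not a function from the PR.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_load(create_one, total_attempts: int, max_workers: int = 4):
    """Submit `total_attempts` create calls and return (successes, failures)."""
    successes, failures = 0, []
    # The executor caps the number of live threads; leaving the `with` block
    # waits for every submitted task, so no work is left running unowned.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(create_one, i): i for i in range(total_attempts)}
        for future in as_completed(futures):
            index = futures[future]
            try:
                future.result()
                successes += 1
            except Exception as exc:  # collected and reported, not swallowed
                failures.append((index, exc))
    return successes, failures


def fake_create(index: int) -> None:
    """Hypothetical stand-in for the API call; every fifth attempt fails."""
    if index % 5 == 0:
        raise RuntimeError(f"simulated API error for claim {index}")


if __name__ == "__main__":
    ok, failed = run_load(fake_create, total_attempts=20)
    print(f"{ok}/{ok + len(failed)} claims created successfully")
```

The worker cap bounds concurrent threads but not the queue of pending submissions, so a paced loop that needs real backpressure would also have to limit how many futures are outstanding at once.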

🧰 Tools
🪛 Ruff (0.15.20)

[warning] 68-68: Do not catch blind exception: Exception

(BLE001)


Sources: Coding guidelines, Linters/SAST tools

Comment on lines +4 to +5
# The create-claim.py expects the template to have this name
name: ${TEMPLATE_NAME}


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Fix the name-linkage comment.

create-claim.py never references the template directly; it targets the warm pool, and examples/keda-scale-to-zero/sandboxwarmpool.yaml is the manifest that consumes ${TEMPLATE_NAME}. As written, this comment points readers at the wrong object to keep in sync.

Suggested edit
-  # The create-claim.py expects the template to have this name
+  # sandboxwarmpool.yaml refers to this template via spec.sandboxTemplateRef.name

3. **Expose the controller metric via GKE Managed Service for Prometheus**:
Apply the `pod-monitoring.yaml` to scrape the controller's `/metrics` endpoint. This exposes
`agent_sandbox_claim_creation_total{warmpool_name="..."}` into GMP.


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Fix inconsistent metric label names in troubleshooting guidance.

The document states the controller exposes agent_sandbox_claim_creation_total{warmpool_name="..."} (Line 111) and explains that the controller records warmpool_name from spec.warmPoolRef.name (Line 265). However, the troubleshooting section tells users to "verify you filtered on sandbox_template" (Line 280) and the Cloud Console query example uses {sandbox_template="$TEMPLATE_NAME"} (Line 305). These are contradictory — warmpool_name and sandbox_template are different labels on different resources. Users following the troubleshooting steps will query a non-existent label and fail to diagnose scale-from-zero issues.

Use warmpool_name consistently throughout, or if the metric actually carries both labels, clarify which label is used for which purpose.

Also applies to: 265-265, 280-280, 305-305
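
One way to keep the label consistent across the manifests, the troubleshooting text, and the Cloud Console example is to build every query from a single helper. The sketch below assumes the queries take a `sum(rate(...))` shape over a one-minute window, which is not confirmed by the excerpts shown here; only the metric name and the `warmpool_name` label come from the README text quoted above. The function name `claim_rate_query` is hypothetical.

```python
def claim_rate_query(warmpool_name: str, window: str = "1m") -> str:
    """Build a PromQL claim-rate query filtered on the warmpool_name label."""
    if '"' in warmpool_name or "\\" in warmpool_name:
        raise ValueError("warm pool name must not contain quotes or backslashes")
    # Doubled braces are literal braces inside an f-string.
    selector = f'agent_sandbox_claim_creation_total{{warmpool_name="{warmpool_name}"}}'
    return f"sum(rate({selector}[{window}]))"


if __name__ == "__main__":
    print(claim_rate_query("python-sdk-warmpool"))
```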


Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. ready-for-review size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

4 participants