[RFC] Add claimed label to agent_sandboxes metric#711
Conversation
✅ Deploy Preview for agent-sandbox ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
Welcome @rainwoodman! |
|
Hi @rainwoodman. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Pull request overview
This RFC improves observability of sandbox utilization by extending the existing agent_sandboxes Prometheus metric to explicitly distinguish claimed (in-use) sandboxes from unclaimed ones, using OwnerReferences to detect Sandboxes controlled by a SandboxClaim.
Changes:
- Add a
claimed="true|false"label to theagent_sandboxesmetric descriptor. - Populate
claimedduring collection by checking the Sandbox controller OwnerReference Kind. - Extend unit tests to assert the new label behavior, including a claimed Sandbox case.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| internal/metrics/metrics.go | Adds claimed to the agent_sandboxes descriptor label set. |
| internal/metrics/sandbox_collector.go | Computes claimed based on controller OwnerReference and includes it in metric key/labels. |
| internal/metrics/sandbox_collector_test.go | Updates existing expectations to include claimed and adds a claimed-sandbox test case. |
igooch
left a comment
There was a problem hiding this comment.
A couple comments, overall the PR looks look and is an improvement on the current metric.
Goal:
The goal of this change is to add a claimed label to the agent_sandboxes
metric to distinguish claimed sandboxes from those in the warm pool. This
helps in monitoring and understanding the state of sandboxes in the cluster
more accurately.
Summary of Changes:
- Updated internal/metrics/metrics.go to include the claimed label in the
AgentSandboxesDesc descriptor.
- Updated internal/metrics/sandbox_collector.go to populate the claimed
label by checking the OwnerReferences of each Sandbox. If a Sandbox is
owned by a SandboxClaim, it is marked as claimed ("true"), otherwise
"false".
- Updated internal/metrics/sandbox_collector_test.go to cover the new
label, including a specific test case for claimed sandboxes.
Verification Results:
- Unit tests passed successfully: go test -v ./internal/metrics/...
- Linter passed with 0 issues: ./dev/tools/lint-go
We have addressed the review comments from igooch on PR 711. Goal of Changes: The goal was to refactor the sandbox metrics collector to: 1. Avoid hardcoded API versions by dynamically retrieving them from the extensions API package. 2. Improve code readability by using a switch statement instead of nested if/else blocks. 3. Ensure consistent naming for the warm pool owned_by metric label value (SandboxWarmPool instead of Warmpool). Changes Made: - Sandbox Collector Refactoring: Imported sigs.k8s.io/agent-sandbox/extensions/api/v1alpha1 and used it to dynamically determine the API version. Refactored the owner reference check to use a switch statement and updated the warm pool label value. - Metrics Documentation Update: Updated the documentation comments for the agent_sandboxes metric descriptor to reflect the new SandboxWarmPool label value. - Test Updates: Updated the unit tests to assert the new SandboxWarmPool label value instead of Warmpool. Verification: Ran the metrics package tests and verified they all passed: go test -v ./internal/metrics/...
13438b8 to
a42f807
Compare
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: igooch, rainwoodman The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* metrics: add claimed label to agent_sandboxes metric
Goal:
The goal of this change is to add a claimed label to the agent_sandboxes
metric to distinguish claimed sandboxes from those in the warm pool. This
helps in monitoring and understanding the state of sandboxes in the cluster
more accurately.
Summary of Changes:
- Updated internal/metrics/metrics.go to include the claimed label in the
AgentSandboxesDesc descriptor.
- Updated internal/metrics/sandbox_collector.go to populate the claimed
label by checking the OwnerReferences of each Sandbox. If a Sandbox is
owned by a SandboxClaim, it is marked as claimed ("true"), otherwise
"false".
- Updated internal/metrics/sandbox_collector_test.go to cover the new
label, including a specific test case for claimed sandboxes.
Verification Results:
- Unit tests passed successfully: go test -v ./internal/metrics/...
- Linter passed with 0 issues: ./dev/tools/lint-go
* metrics: replace claimed label with owned_by on agent_sandboxes metric
* metrics: address Copilot comments on PR 711
* metrics: fix linter issues on PR 711
* Address PR 711 review comments from igooch
We have addressed the review comments from igooch on PR 711.
Goal of Changes:
The goal was to refactor the sandbox metrics collector to:
1. Avoid hardcoded API versions by dynamically retrieving them from the
extensions API package.
2. Improve code readability by using a switch statement instead of nested
if/else blocks.
3. Ensure consistent naming for the warm pool owned_by metric label value
(SandboxWarmPool instead of Warmpool).
Changes Made:
- Sandbox Collector Refactoring:
Imported sigs.k8s.io/agent-sandbox/extensions/api/v1alpha1 and used it to
dynamically determine the API version. Refactored the owner reference check
to use a switch statement and updated the warm pool label value.
- Metrics Documentation Update:
Updated the documentation comments for the agent_sandboxes metric descriptor
to reflect the new SandboxWarmPool label value.
- Test Updates:
Updated the unit tests to assert the new SandboxWarmPool label value instead
of Warmpool.
Verification:
Ran the metrics package tests and verified they all passed:
go test -v ./internal/metrics/...
* metrics: add claimed label to agent_sandboxes metric
Goal:
The goal of this change is to add a claimed label to the agent_sandboxes
metric to distinguish claimed sandboxes from those in the warm pool. This
helps in monitoring and understanding the state of sandboxes in the cluster
more accurately.
Summary of Changes:
- Updated internal/metrics/metrics.go to include the claimed label in the
AgentSandboxesDesc descriptor.
- Updated internal/metrics/sandbox_collector.go to populate the claimed
label by checking the OwnerReferences of each Sandbox. If a Sandbox is
owned by a SandboxClaim, it is marked as claimed ("true"), otherwise
"false".
- Updated internal/metrics/sandbox_collector_test.go to cover the new
label, including a specific test case for claimed sandboxes.
Verification Results:
- Unit tests passed successfully: go test -v ./internal/metrics/...
- Linter passed with 0 issues: ./dev/tools/lint-go
* metrics: replace claimed label with owned_by on agent_sandboxes metric
* metrics: address Copilot comments on PR 711
* metrics: fix linter issues on PR 711
* Address PR 711 review comments from igooch
We have addressed the review comments from igooch on PR 711.
Goal of Changes:
The goal was to refactor the sandbox metrics collector to:
1. Avoid hardcoded API versions by dynamically retrieving them from the
extensions API package.
2. Improve code readability by using a switch statement instead of nested
if/else blocks.
3. Ensure consistent naming for the warm pool owned_by metric label value
(SandboxWarmPool instead of Warmpool).
Changes Made:
- Sandbox Collector Refactoring:
Imported sigs.k8s.io/agent-sandbox/extensions/api/v1alpha1 and used it to
dynamically determine the API version. Refactored the owner reference check
to use a switch statement and updated the warm pool label value.
- Metrics Documentation Update:
Updated the documentation comments for the agent_sandboxes metric descriptor
to reflect the new SandboxWarmPool label value.
- Test Updates:
Updated the unit tests to assert the new SandboxWarmPool label value instead
of Warmpool.
Verification:
Ran the metrics package tests and verified they all passed:
go test -v ./internal/metrics/...
Problem Statement
Currently, the
agent_sandboxesmetric does not distinguish between sandboxes that are actively claimed by a user/agent and those that are sitting idle in the warm pool. This makes it difficult to track actual utilization of the sandbox pool from metrics alone.Proposed Solution (RFC)
To provide a more reliable and explicit way to track this state, this PR proposes adding a
claimed="true|false"label to theagent_sandboxesmetric.Instead of relying on annotations, this implementation checks the
OwnerReferencesof theSandboxresource. If theSandboxis controlled by aSandboxClaim, it is marked asclaimed="true". Otherwise, it defaults to"false".This approach keeps the metric cardinality low (only two values for
claimed) while providing high-value visibility into the actual utilization of the sandbox pool.Intention
This PR is submitted as an RFC (Request for Comments). While the code is functional and includes unit tests, the primary goal is to start a discussion on the best way to model this state in our metrics and to seek general advice on actionable metrics for the Sandbox system.
Specifically, we are interested in metrics that can help answer operational questions such as:
We are also looking for feedback on:
claimedlabel is the right abstraction for this specific metric.We hope this sparks some inspiration and discussion!
Changes Included
internal/metrics/metrics.go: Added"claimed"toAgentSandboxesDesclabels.internal/metrics/sandbox_collector.go: UpdatedCollectmethod to checkOwnerReferencesand set theclaimedlabel.internal/metrics/sandbox_collector_test.go: Added test cases to verify the new label behavior.Verification Results
go test -v ./internal/metrics/...