Re-Implement worker thread collision avoidance for warm pool adoption #437

Merged
k8s-ci-robot merged 1 commit into kubernetes-sigs:main from igooch:collision-avoidance on Mar 20, 2026
Conversation

@igooch igooch commented Mar 19, 2026

Contributor

This pull request restores the deterministic "window" selection logic for SandboxClaim workers when adopting resources from the warm pool. This mechanism was previously introduced in PR #391 but was inadvertently removed during the refactor in PR #395, which transitioned the warm pool to use full Sandbox CRs instead of bare pods.

@k8s-ci-robot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Mar 19, 2026
@k8s-ci-robot

Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

netlify Bot commented Mar 19, 2026

Deploy Preview for agent-sandbox canceled.

🔨 Latest commit: ad3adc2
🔍 Latest deploy log: https://app.netlify.com/projects/agent-sandbox/deploys/69bdb32febbd7e000869b2e7
@k8s-ci-robot added the approved (indicates a PR has been approved by an approver from all required OWNERS files), size/L (denotes a PR that changes 100-499 lines, ignoring generated files), and cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) labels on Mar 19, 2026

}
// Update uses optimistic concurrency (resourceVersion) so concurrent
// claims racing to adopt the same sandbox will conflict and retry.
if err := r.Update(ctx, adopted); err != nil {

Collaborator

Should this use Patch instead of Update?

Contributor Author

Should we switch the controller to server-side apply in a separate PR?

Member

Yes, I'm working on that PR.
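For readers following this thread, the conflict-and-retry behavior mentioned in the code comment can be sketched with an in-memory store. This is a simplified illustration, not the controller's code: `store`, `sandbox`, `adopt`, and `errConflict` are hypothetical names, and the integer `ResourceVersion` stands in for the opaque string the Kubernetes API server uses.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict is a hypothetical stand-in for the API server's conflict error.
var errConflict = errors.New("conflict: resourceVersion is stale")

// sandbox is a hypothetical, minimal stand-in for a warm pool Sandbox object.
type sandbox struct {
	Name            string
	Owner           string
	ResourceVersion int // real resourceVersions are opaque strings
}

// store is a single-threaded, in-memory stand-in for the API server.
type store struct {
	objects map[string]sandbox
}

func newStore(objs ...sandbox) *store {
	s := &store{objects: make(map[string]sandbox)}
	for _, o := range objs {
		s.objects[o.Name] = o
	}
	return s
}

// get returns a copy of the stored object, like a read from the API server.
func (s *store) get(name string) (sandbox, bool) {
	o, ok := s.objects[name]
	return o, ok
}

// update succeeds only if the caller's copy carries the current version.
func (s *store) update(obj sandbox) error {
	current, ok := s.objects[obj.Name]
	if !ok {
		return fmt.Errorf("sandbox %q not found", obj.Name)
	}
	if current.ResourceVersion != obj.ResourceVersion {
		return errConflict
	}
	obj.ResourceVersion++
	s.objects[obj.Name] = obj
	return nil
}

// adopt sets the owner on a previously read snapshot and writes it back.
func adopt(s *store, snapshot sandbox, claim string) error {
	snapshot.Owner = claim
	return s.update(snapshot)
}

func main() {
	s := newStore(sandbox{Name: "sb-0"})

	// Two claims read the same sandbox before either one writes.
	snapA, _ := s.get("sb-0")
	snapB, _ := s.get("sb-0")

	fmt.Println("claim-a:", adopt(s, snapA, "claim-a"))
	err := adopt(s, snapB, "claim-b")
	fmt.Println("claim-b conflicted:", errors.Is(err, errConflict))
}
```

In this sketch the second writer loses because its snapshot is stale; a caller would typically re-read the candidate list and try again with a different sandbox.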

@aditya-shantanu

Collaborator

/ok-to-test

@k8s-ci-robot added the ok-to-test label (indicates a non-member PR verified by an org member that is safe to test) on Mar 19, 2026
@igooch igooch changed the title Implement worker thread collision avoidance for warm pool adoption Mar 20, 2026
@igooch igooch force-pushed the collision-avoidance branch from 7305fdd to ad3adc2 Compare March 20, 2026 20:50
@igooch igooch marked this pull request as ready for review March 20, 2026 20:54
@k8s-ci-robot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Mar 20, 2026
@k8s-ci-robot k8s-ci-robot requested a review from justinsb March 20, 2026 20:54
if len(readyCandidates) == 0 {
log.Info("No ready warm pool candidates, falling through to cold start",
"totalCandidates", len(candidates))
if len(candidates) == 0 {

Member

Just to confirm: if the warm pool contains a mix of ready and non-ready sandboxes, the hashed startIndex might point to a non-ready sandbox, causing the controller to adopt it even if fully ready ones exist earlier in the list. Is that OK?

Contributor Author

It depends on the number of worker threads relative to the number of ready sandboxes. The controller picks a starting index within a window whose size equals the worker count (MaxConcurrentReconciles). If that window is wider than the number of ready sandboxes, the controller may adopt a non-ready sandbox first, even if ready ones exist earlier in the sorted list.
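The trade-off described above can be illustrated with a small sketch. It assumes details the thread does not spell out: the candidate list is sorted with ready sandboxes first, the start index is an FNV-1a hash of a claim key, and the window is clamped to the list length. `candidate`, `startIndex`, and `pick` are hypothetical names for illustration, not the PR's actual functions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// candidate is a hypothetical stand-in for a warm pool sandbox.
type candidate struct {
	Name  string
	Ready bool
}

// startIndex deterministically maps a claim key into [0, window), where
// window is the worker count clamped to the number of candidates.
// The hash choice (FNV-1a) and the clamping are assumptions of this sketch.
func startIndex(claimKey string, workers, candidates int) int {
	if workers <= 0 || candidates <= 0 {
		return 0
	}
	window := workers
	if candidates < window {
		window = candidates
	}
	h := fnv.New32a()
	h.Write([]byte(claimKey)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(window))
}

// pick returns the candidate at the hashed start index of a sorted list.
func pick(claimKey string, workers int, sorted []candidate) (candidate, bool) {
	if len(sorted) == 0 {
		return candidate{}, false
	}
	return sorted[startIndex(claimKey, workers, len(sorted))], true
}

func main() {
	// Sorted so that ready sandboxes come first (assumed ordering).
	pool := []candidate{
		{Name: "sb-0", Ready: true},
		{Name: "sb-1", Ready: true},
		{Name: "sb-2", Ready: false},
		{Name: "sb-3", Ready: false},
	}
	keys := []string{"ns/claim-a", "ns/claim-b", "ns/claim-c"}

	// With 2 workers the window covers only ready sandboxes; with 4 workers
	// the window also spans the non-ready ones.
	for _, workers := range []int{2, 4} {
		for _, key := range keys {
			c, _ := pick(key, workers, pool)
			fmt.Printf("workers=%d %s -> %s (ready=%v)\n", workers, key, c.Name, c.Ready)
		}
	}
}
```

With two ready sandboxes, a window of 2 can only land on ready entries, while a window of 4 can land on `sb-2` or `sb-3`, which is the case the question above describes.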

@vicentefb vicentefb left a comment

Member

/lgtm

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Mar 20, 2026
@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: igooch, vicentefb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 1f109cf into kubernetes-sigs:main Mar 20, 2026
10 checks passed
Labels

approved - Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
lgtm - "Looks good to me", indicates that a PR is ready to be merged.
ok-to-test - Indicates a non-member PR verified by an org member that is safe to test.
size/L - Denotes a PR that changes 100-499 lines, ignoring generated files.

4 participants