ci: optimize staging image builds and increase promotion timeout#1021

Merged
janetkuo merged 3 commits into
kubernetes-sigs:mainfrom
janetkuo:ci/optimize-staging-builds
Jun 23, 2026

Conversation

@janetkuo janetkuo commented Jun 23, 2026

Member

What this PR does / why we need it:

Optimizes staging image builds and increases the release promotion polling timeout to prevent CI pipeline failures during scheduled releases. Specifically:

  1. Increase polling timeout: Bumps the per-image staging registry polling timeout in dev/tools/tag-promote-images from 45 to 90 minutes to accommodate slower Prow postsubmit builds.
  2. Increase compute resources: Bumps the Google Cloud Build machine type in cloudbuild.yaml from E2_HIGHCPU_8 to E2_HIGHCPU_32 (32 vCPUs, 32 GB RAM) to accelerate C/Python dependency compilation.
  3. Enable registry caching: Configures docker buildx in dev/tools/push-images to push and pull layer cache manifests (type=registry) to/from the K8s staging Artifact Registry (:buildcache), allowing subsequent Prow builds to instantly reuse pre-compiled Python wheels and base layers.
  4. Fix release workflow: Adds the pyyaml dependency that CRD sorting needs during release publish.
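
As a rough illustration of item 3, the registry cache flags could be assembled as shown here. This is a minimal Python sketch, not the actual dev/tools/push-images code: the function name `buildx_cache_args`, its parameters, and the example repository path are hypothetical. The `--cache-from`/`--cache-to` flags, `type=registry`, `mode=max`, and the `:buildcache` tag are the ones named in this PR.

```python
def buildx_cache_args(image_repo: str, output_type: str) -> list[str]:
    """Return extra `docker buildx build` flags for registry-backed layer caching.

    `image_repo` is the image name without a tag (hypothetical parameter).
    `output_type` is the buildx output type the caller intends to use.
    """
    if output_type != "registry":
        # The PR only enables caching when the output type is registry.
        return []

    # Cache manifests live next to the image under a dedicated tag.
    cache_ref = f"{image_repo}:buildcache"
    return [
        "--cache-from", f"type=registry,ref={cache_ref}",
        # mode=max also exports layers from intermediate build stages.
        "--cache-to", f"type=registry,ref={cache_ref},mode=max",
    ]


if __name__ == "__main__":
    # Placeholder repository path, for illustration only.
    print(buildx_cache_args("example.test/staging/my-image", "registry"))
```

Keeping the flags in a small pure function makes the registry-only condition easy to check without invoking Docker.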

Which issue(s) this PR is related to:

NONE

Release Note

NONE

Summary by CodeRabbit

  • Chores
    • Increased Cloud Build machine capacity to improve compilation performance.
    • Enabled Docker Buildx registry caching to speed up image builds.
    • Extended image promotion registry polling timeout to 90 minutes to improve reliability.
  • Documentation
    • Added an extra Python dependency during draft release generation.
Copilot AI review requested due to automatic review settings June 23, 2026 20:57
@netlify

netlify Bot commented Jun 23, 2026


Deploy Preview for agent-sandbox failed. Why did it fail? →

Name Link
🔨 Latest commit d2a8ee3
🔍 Latest deploy log https://app.netlify.com/projects/agent-sandbox/deploys/6a3afad393eb5d000824cb97
@kubernetes-prow kubernetes-prow Bot requested review from SHRUTI6991 and barney-s June 23, 2026 20:57
@kubernetes-prow kubernetes-prow Bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 23, 2026
@coderabbitai

coderabbitai Bot commented Jun 23, 2026


Walkthrough

Cloud Build VM machineType is upgraded from E2_HIGHCPU_8 to E2_HIGHCPU_32. The push-images script gains Docker Buildx --cache-from/--cache-to registry cache flags when using registry output. The tag-promote-images staging-registry polling loop is doubled from 45 to 90 iterations (90-minute max wait), with the timeout status message updated accordingly. The release workflow's manifest generation step adds pyyaml==6.0.2 as a Python dependency.
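
The polling change can be pictured with a short Python sketch. It assumes one attempt per minute, which is what 90 iterations for a 90-minute maximum implies. The names `wait_for_digest`, `fetch_digest`, and the injectable `sleep` are hypothetical and do not reflect the real get_latest_digest implementation.

```python
import time
from typing import Callable, Optional


def wait_for_digest(
    fetch_digest: Callable[[], Optional[str]],
    attempts: int = 90,
    interval_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll `fetch_digest` until it returns a digest or the attempts run out.

    The upper bound on waiting is attempts * interval_seconds, so the
    defaults give 90 minutes (the previous value of 45 attempts gave 45).
    """
    for _ in range(attempts):
        digest = fetch_digest()
        if digest:
            return digest
        # Image not in the staging registry yet: wait before the next try.
        sleep(interval_seconds)
    # Timed out; the caller decides how to report the failure.
    return None
```

Passing `sleep` as a parameter lets a test substitute a recorder for `time.sleep`, so the timeout path can be exercised without waiting.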

Changes

CI Build Pipeline Improvements

| Layer / File(s) | Summary |
| --- | --- |
| Build machine size and Buildx registry cache: cloudbuild.yaml, dev/tools/push-images | machineType changed to E2_HIGHCPU_32; push-images computes a buildcache image reference and appends --cache-from and --cache-to type=registry,mode=max to the buildx command when output type is registry. |
| Staging registry polling timeout doubled: dev/tools/tag-promote-images | get_latest_digest loop count changed from 45 to 90, doubling the maximum wait from 45 to 90 minutes; adjacent timeout comment updated to match. |
| Release workflow Python dependency: .github/workflows/release.yml | Added pyyaml==6.0.2 to the pip install command in the publish-draft job's manifest generation step. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐇 Hop, hop! The machine grows strong and wide,
With 32 cores to power the ride.
Cache layers stack up, registry-bound,
While polling waits longer — patience profound.
And yaml comes too, for releases to glide! 🏗️

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main changes: optimizing staging image builds and increasing the promotion timeout, which aligns with all three key modifications in the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description follows the template structure with clear sections explaining what the PR does, why it's needed, related issues, and release notes. |

@kubernetes-prow kubernetes-prow Bot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 23, 2026

Copilot AI left a comment

Pull request overview

This PR tunes the release/CI image pipeline to be more resilient and faster in staging by increasing promotion polling timeouts, allocating more Cloud Build CPU, and enabling docker buildx registry-backed layer caching for repeat builds.

Changes:

  • Increase staging registry polling window in tag-promote-images from 45 to 90 minutes per image.
  • Configure dev/tools/push-images to publish and consume a registry cache manifest (:buildcache) when pushing images via buildx.
  • Increase Cloud Build machine type to E2_HIGHCPU_32 to speed dependency compilation-heavy image builds.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| dev/tools/tag-promote-images | Extends the staging registry polling loop to wait longer for postsubmit-built images. |
| dev/tools/push-images | Adds buildx --cache-from/--cache-to registry caching to accelerate subsequent image builds. |
| cloudbuild.yaml | Upsizes Cloud Build machine type to reduce overall image build time. |
Comment thread dev/tools/tag-promote-images

@aditya-shantanu aditya-shantanu left a comment

Collaborator

/lgtm
/approve

@kubernetes-prow kubernetes-prow Bot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 23, 2026
@kubernetes-prow


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aditya-shantanu, janetkuo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@janetkuo

Member Author

I suspect the Netlify build failures are coming from #660 (review). This PR didn't make any site changes.

Copilot AI review requested due to automatic review settings June 23, 2026 21:29
@kubernetes-prow kubernetes-prow Bot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 23, 2026
@janetkuo

Member Author

/label tide/merge-method-rebase

@kubernetes-prow kubernetes-prow Bot added the tide/merge-method-rebase Denotes a PR that should be rebased by tide when it merges. label Jun 23, 2026
@shrutiyam-glitch

Contributor

/lgtm

@kubernetes-prow kubernetes-prow Bot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 23, 2026

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread .github/workflows/release.yml
@janetkuo janetkuo merged commit bed7e5c into kubernetes-sigs:main Jun 23, 2026
7 of 12 checks passed
@github-project-automation github-project-automation Bot moved this from Backlog to Done in Agent Sandbox Jun 23, 2026

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. tide/merge-method-rebase Denotes a PR that should be rebased by tide when it merges.

4 participants