fix: retry on errors when watching pods #9373

Merged
plumpy merged 7 commits into
GoogleContainerTools:mainfrom
mikedld:bugfix/gh8658-pod-wait-misses-events-after-timeout
Jan 14, 2025

@mikedld mikedld commented Apr 1, 2024

Contributor

Fixes: #8658

Description
If a timeout (or some network error) occurs while waiting for a pod initialization or termination event, e.g. when a build takes a long time, skaffold gets stuck and never finishes the operation. Use a retry watcher to handle such errors gracefully.

This PR is based on the patch I posted in #8658 last year; I never got any feedback on it there, so I decided to go ahead. I've been using this patch since then and it works fine on my end. To reiterate,

Also note that the same issue affects WaitForDeploymentToStabilize (and probably some other places where Watch is used), but I can't test it, so I didn't patch it.

I only managed to fix the existing unit test, not add any new tests, as I'm not at all comfortable with Go. If that's an issue, I'm okay with someone else picking this up.

google-cla Bot commented Apr 1, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

mikedld added 3 commits April 2, 2024 00:11
If timeout (or some network error?) occurs while waiting for a pod
initialization or termination event, e.g. when build takes a long time,
skaffold becomes stuck and never finishes the operation. Use retry
watcher to handle the errors gracefully.
@mikedld mikedld force-pushed the bugfix/gh8658-pod-wait-misses-events-after-timeout branch from 6f3b074 to e1ec0c5 Compare April 1, 2024 23:11
@mikedld mikedld changed the title Retry on errors when watching pods Apr 1, 2024
@certifiedloud

How can we encourage this fix to be merged? This bug is causing significant problems for skaffold users who want to use kaniko.

@alphanota alphanota self-assigned this Dec 17, 2024
@alphanota

Contributor

@mikedld Thank you for this PR. Would you mind resolving the conflicting files and making sure the PR is synced with skaffold main?

@mikedld mikedld requested a review from a team as a code owner December 17, 2024 21:58
@mikedld mikedld requested a review from plumpy December 17, 2024 21:58
alphanota previously approved these changes Jan 7, 2025
@alphanota alphanota enabled auto-merge (squash) January 14, 2025 20:29
@alphanota alphanota requested review from alphanota and removed request for plumpy January 14, 2025 20:55
@alphanota alphanota disabled auto-merge January 14, 2025 20:56
@alphanota alphanota requested a review from plumpy January 14, 2025 20:56
@plumpy plumpy enabled auto-merge (squash) January 14, 2025 22:05
@plumpy plumpy merged commit cd7c1fb into GoogleContainerTools:main Jan 14, 2025
@menahyouyeah menahyouyeah mentioned this pull request Mar 3, 2026
4 participants