Skip to content

feat: add keep-running-on-failure-implementation#8270

Merged
aaron-prindle merged 2 commits into
GoogleContainerTools:mainfrom
ericzzzzzzz:feat-keep-running-on-failure
Jan 20, 2023
Merged

feat: add keep-running-on-failure-implementation#8270
aaron-prindle merged 2 commits into
GoogleContainerTools:mainfrom
ericzzzzzzz:feat-keep-running-on-failure

Conversation

@ericzzzzzzz

@ericzzzzzzz ericzzzzzzz commented Dec 21, 2022

Copy link
Copy Markdown
Contributor

fixes: #4158

Description

  • added support to enable skaffold session keep running on failure in the first dev loop with a flag
  • put registering files logic after the first devLoop as
    • When first devloop fails, users could make some changes on their project, if we register files to monitor first, the listener will detect changes made for fixing up the first devLoop, so the callback(devloop) in the listener will get triggered again
  • added v2 event test helper functions so we can listen v2 events for our integration tests.
  • Refactor integration tests that use API trigger as now registering file to monitor happens after the first devLoop, we need to wait the registering finishes before updating any files, so file changes can be captured.
  • Split TestDevAPITrigger into TestBuildAPITrigger and TestDeployAPITrigger, the original one test was hanging after we upgraded skaffold to go1.19, I think this test was a flake even before we upgrade to go1.19, we might have seen this failing before, but I'm not too sure about it. The cause of hanging was that the first build api trigger triggers a rebuild in devloop -- this build can take very long, then the test sends a deploy trigger to the server, Deploy intent was set to true, then the rebuild finishes and reset all Dev Intents, so Deploy intent was reset to false even devLoop runs, the deploy is not happening. I think the root problem might be that we set intent immediately after the skaffold server receives the intent, we could set the intent in skaffold session when the listener gets intent from the intent channel, that might require large change, so not doing it in this pr.

Test Plan

  • use examples/getting-started project, make a change on main.go file to let compile fail.
  • run skaffold dev --keep-running-on-failure
    • expect to see WARN[0008] Failed to build artifacts, please fix the error and press any key to continue. subtask=-1 task=DevLoop from console
  • fix the the compile error then press any key to continue the dev session, dev should continue
@ericzzzzzzz ericzzzzzzz marked this pull request as draft December 21, 2022 01:04
@codecov

codecov Bot commented Dec 21, 2022

Copy link
Copy Markdown

Codecov Report

Merging #8270 (4a4c067) into main (290280e) will decrease coverage by 4.47%.
The diff coverage is 54.68%.

@@            Coverage Diff             @@
##             main    #8270      +/-   ##
==========================================
- Coverage   70.48%   66.01%   -4.48%     
==========================================
  Files         515      605      +90     
  Lines       23150    29694    +6544     
==========================================
+ Hits        16317    19602    +3285     
- Misses       5776     8620    +2844     
- Partials     1057     1472     +415     
Impacted Files Coverage Δ
cmd/skaffold/app/cmd/completion.go 13.04% <0.00%> (-1.25%) ⬇️
cmd/skaffold/app/cmd/config/list.go 65.21% <ø> (ø)
cmd/skaffold/app/cmd/config/set.go 88.72% <ø> (ø)
cmd/skaffold/app/cmd/config/util.go 54.28% <ø> (ø)
cmd/skaffold/app/cmd/credits.go 100.00% <ø> (ø)
cmd/skaffold/app/cmd/credits/export.go 0.00% <0.00%> (ø)
cmd/skaffold/app/cmd/deploy.go 40.90% <0.00%> (-12.94%) ⬇️
cmd/skaffold/app/cmd/generate_pipeline.go 60.00% <ø> (ø)
cmd/skaffold/app/cmd/inspect_modules.go 65.00% <ø> (ø)
cmd/skaffold/app/cmd/inspect_profiles.go 66.66% <ø> (ø)
... and 416 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@ericzzzzzzz ericzzzzzzz force-pushed the feat-keep-running-on-failure branch 3 times, most recently from b6059dc to 8b3d916 Compare December 22, 2022 15:15
@ericzzzzzzz ericzzzzzzz force-pushed the feat-keep-running-on-failure branch 2 times, most recently from 225c76e to 18f9fd3 Compare January 4, 2023 14:34
@ericzzzzzzz ericzzzzzzz force-pushed the feat-keep-running-on-failure branch from 18f9fd3 to bc98610 Compare January 4, 2023 14:37
@ericzzzzzzz ericzzzzzzz marked this pull request as ready for review January 4, 2023 15:53
@ericzzzzzzz ericzzzzzzz changed the title feat: add keep-running-on-failure-implementation(wip) Jan 4, 2023
@aaron-prindle

aaron-prindle commented Jan 12, 2023

Copy link
Copy Markdown
Contributor

Just tried it out, seems to be working well for most cases! I noticed though that the error is not necessarily always the same as what would be output w/o the --keep-running-on-failure flag. For example if i have a remote kubecontext:

aprindle@aprindle-ssd ~/skaffold/examples/getting-started  [feat-keep-running-on-failure]$ kubectl config current-context
gke_aprindle-test-cluster_us-west1-a_cluster-3

And I attempt skaffold dev (w/ no --default-repo flag), I get this error:

...
Build Failed. No push access to specified image repository. Try running with `--default-repo` flag. Otherwise start a local kubernetes cluster like `minikube`.

If I run this with skaffold dev --keep-running-on-failure I get:

The push refers to repository [docker.io/library/skaffold-example]
9d6349f27163: Preparing
994393dc58e7: Preparing
WARN[0004] Failed to build artifacts, please fix the error and press any key to continue.  subtask=-1 task=DevLoop

Is there a way that we can preserve these actionable errors that we surface, it seems currently using this flag can sometimes mask more actionable error messages that we would normally surface

@aaron-prindle

aaron-prindle commented Jan 12, 2023

Copy link
Copy Markdown
Contributor

Also we should create a new feature-request github issue around having skaffold surface the compiler error (docker error logs) if that can be grabbed from the docker build logs when using --keep-running-on-failure (vs the generic error message Failed to build artifacts, please fix the error and press any key to continue.)

Comment thread pkg/skaffold/runner/dev.go Outdated
var err error
bRes, err := r.Build(ctx, out, artifacts)
for ; err != nil && r.runCtx.Opts.KeepRunningOnFailure; bRes, err = r.Build(ctx, out, artifacts) {
log.Entry(ctx).Warnf("Failed to build artifacts, please fix the error and press any key to continue.")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to print the err as well

Comment thread pkg/skaffold/runner/dev.go Outdated
if r.runCtx.IsTestPhaseActive() {
err = r.Test(ctx, out, bRes)
for ; err != nil && r.runCtx.Opts.KeepRunningOnFailure; err = r.Test(ctx, out, bRes) {
log.Entry(ctx).Warnf("Failed to run tests, please fix the error and press any key to continue.")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, the err message should be outputted

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhh.. you're right! I'll add err message.. Thanks!!

@gsquared94

Copy link
Copy Markdown
Contributor

I have a case where log tailing stops.
Repro:

  1. In the examples/getting-started project make a typo like renaming containers to containers2 in the skaffold.yaml file.
  2. Run skaffold dev --keep-alive-on-error
  3. Should see the message:
    WARN[0002] Failed to deploy, please fix the error and press any key to continue.  subtask=-1 task=DevLoop
    
  4. Fix the typo in skaffold.yaml file. The deployment should succeed.
    Deployments stabilized in 33.811792ms
    Listing files to watch...
     - skaffold-example
    Press Ctrl+C to exit
    Watching for changes...
    
  5. But the log streaming hasn't started.
  6. Make some other change to retrigger a dev loop. Now after deploy the log streaming starts.
    Deployments stabilized in 3.191 seconds
    Watching for changes...
    [getting-started3] Hello world!
    [getting-started3] Hello world!
    
@ericzzzzzzz

Copy link
Copy Markdown
Contributor Author

I have a case where log tailing stops. Repro:

  1. In the examples/getting-started project make a typo like renaming containers to containers2 in the skaffold.yaml file.
  2. Run skaffold dev --keep-alive-on-error
  3. Should see the message:
    WARN[0002] Failed to deploy, please fix the error and press any key to continue.  subtask=-1 task=DevLoop
    
  4. Fix the typo in skaffold.yaml file. The deployment should succeed.
    Deployments stabilized in 33.811792ms
    Listing files to watch...
     - skaffold-example
    Press Ctrl+C to exit
    Watching for changes...
    
  5. But the log streaming hasn't started.
  6. Make some other change to retrigger a dev loop. Now after deploy the log streaming starts.
    Deployments stabilized in 3.191 seconds
    Watching for changes...
    [getting-started3] Hello world!
    [getting-started3] Hello world!
    

@gsquared94 great catch! Thank you!
This is actually not log stop tailing..

  • The cause was after fixing the manifests, Deploy is was called without detecting those changes.. so the actually didn't happen(we have a logic to compare previous deployed manifests vs current manifests we want to deploy, the deployed manifests are tracked even deploy failed), even deploy happened, it should fail as manifests changes were not updated in the session, we should probably
    • check if error is bad request from k8s server , if so do the re-render or..
    • re-render anyway???? doesn't seem too bad..

@aaron-prindle aaron-prindle left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3 participants