Coder Social home page Coder Social logo

Comments (9)

seaneagan avatar seaneagan commented on June 11, 2024

I have noticed that one of our end-to-end tests is flaky (install-test-fail), and sometimes acts in such a way that the failures counter suddenly resets and the wrong remediation strategy is taken.

It looks to me like the status update which increments the failure count fails, and so the retry is not aware of any failed installs, but does find an existing release, so it tries to upgrade it.

From a first observation of the logs this problem does not seem to just exist for the end-to-end tests, but may also occur in real environments. I have a suspicion that it may have to do with the fact that we do not retry failed status updates, and seem to be running into the object has been modified; please apply your changes to the latest version and try again errors quote often, especially for the install-test-fail scenario.

This may be an indication that:

1. Multiple processes are touching the same resource

2. The (during reconciliation) modified object is not passed on correctly everywhere, and we have a split brain problem (aggravated by the fact that we do not perform retries on failed updates)

The kustomize-controller e2e tests show the same status update error messages, I wonder if the issue is the reconciliation requests when the undlerying source is updated etc:

https://github.com/fluxcd/helm-controller/blob/main/controllers/helmchart_watcher.go#L73
https://github.com/fluxcd/kustomize-controller/blob/main/controllers/gitrepository_watcher.go#L80

  1. Rather than setting the annotation can we just directly queue a reconciliation from Go?
  2. Do we want to retry status updates regardless of this issue?

I'll start looking into 1.

from helm-controller.

seaneagan avatar seaneagan commented on June 11, 2024

The kustomize-controller e2e tests show the same status update error messages, I wonder if the issue is the reconciliation requests when the undlerying source is updated etc:

https://github.com/fluxcd/helm-controller/blob/main/controllers/helmchart_watcher.go#L73

I guess not, still seeing the status update errors when disabling the helm chart watcher: #126

from helm-controller.

hiddeco avatar hiddeco commented on June 11, 2024

Rather than setting the annotation can we just directly queue a reconciliation from Go?

I read a lot of the kubernetes-sigs/cluster-api code this week, and I think something like this is possible using https://godoc.org/sigs.k8s.io/controller-runtime/pkg/handler#EnqueueRequestsFromMapFunc.

Also a quick heads up that I am working on a bigger overhaul of the controller code, due to lessons learned from the above. Focused around splitting the reconcileRelease logic into smaller bits, more resilient state observations, and making it testable based on the advises in this document and some of this.

I guess not, still seeing the status update errors when disabling the helm chart watcher

We are probably losing state somewhere due to not working with pointers everywhere, this seems to be one of the reasons cluster-api is making extensive use of pointers.

from helm-controller.

hiddeco avatar hiddeco commented on June 11, 2024

Looking at the recent commit history, it seems to have worsened when #115 was introduced. I can't directly see why however.

from helm-controller.

hiddeco avatar hiddeco commented on June 11, 2024

Based on the work in #127, the quick fix for this particular issue (without waiting for my refactor) is likey to use Client#Status#Update as much as we can to push status updates, instead of fully updating the object using Client#Update.

from helm-controller.

stefanprodan avatar stefanprodan commented on June 11, 2024

kustomize-controller uses Client#Status#Update everywhere and the conflict error is still there

from helm-controller.

hiddeco avatar hiddeco commented on June 11, 2024

@stefanprodan I think we should then try to use https://godoc.org/sigs.k8s.io/controller-runtime/pkg/handler#EnqueueRequestsFromMapFunc instead of annotating the object there, to see if this improves the situation.

from helm-controller.

squaremo avatar squaremo commented on June 11, 2024

I think something like this is possible using https://godoc.org/sigs.k8s.io/controller-runtime/pkg/handler#EnqueueRequestsFromMapFunc.

This is what I use in image-reflector and image-automation, rather than annotating objects.

from helm-controller.

hiddeco avatar hiddeco commented on June 11, 2024

This seems to have been solved by the changes in #131, #132 and #133.

from helm-controller.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.