Comments (9)
I have noticed that one of our end-to-end tests is flaky (
install-test-fail
), and sometimes acts in such a way that the failures counter suddenly resets and the wrong remediation strategy is taken.
It looks to me like the status update which increments the failure count fails, and so the retry is not aware of any failed installs, but does find an existing release, so it tries to upgrade it.
From a first observation of the logs this problem does not seem to just exist for the end-to-end tests, but may also occur in real environments. I have a suspicion that it may have to do with the fact that we do not retry failed status updates, and seem to be running into
the object has been modified; please apply your changes to the latest version and try again
errors quote often, especially for theinstall-test-fail
scenario.This may be an indication that:
1. Multiple processes are touching the same resource 2. The (during reconciliation) modified object is not passed on correctly everywhere, and we have a split brain problem (aggravated by the fact that we do not perform retries on failed updates)
The kustomize-controller e2e tests show the same status update error messages, I wonder if the issue is the reconciliation requests when the undlerying source is updated etc:
https://github.com/fluxcd/helm-controller/blob/main/controllers/helmchart_watcher.go#L73
https://github.com/fluxcd/kustomize-controller/blob/main/controllers/gitrepository_watcher.go#L80
- Rather than setting the annotation can we just directly queue a reconciliation from Go?
- Do we want to retry status updates regardless of this issue?
I'll start looking into 1.
from helm-controller.
The kustomize-controller e2e tests show the same status update error messages, I wonder if the issue is the reconciliation requests when the undlerying source is updated etc:
https://github.com/fluxcd/helm-controller/blob/main/controllers/helmchart_watcher.go#L73
I guess not, still seeing the status update errors when disabling the helm chart watcher: #126
from helm-controller.
Rather than setting the annotation can we just directly queue a reconciliation from Go?
I read a lot of the kubernetes-sigs/cluster-api
code this week, and I think something like this is possible using https://godoc.org/sigs.k8s.io/controller-runtime/pkg/handler#EnqueueRequestsFromMapFunc.
Also a quick heads up that I am working on a bigger overhaul of the controller code, due to lessons learned from the above. Focused around splitting the reconcileRelease
logic into smaller bits, more resilient state observations, and making it testable based on the advises in this document and some of this.
I guess not, still seeing the status update errors when disabling the helm chart watcher
We are probably losing state somewhere due to not working with pointers everywhere, this seems to be one of the reasons cluster-api is making extensive use of pointers.
from helm-controller.
Looking at the recent commit history, it seems to have worsened when #115 was introduced. I can't directly see why however.
from helm-controller.
Based on the work in #127, the quick fix for this particular issue (without waiting for my refactor) is likey to use Client#Status#Update
as much as we can to push status updates, instead of fully updating the object using Client#Update
.
from helm-controller.
kustomize-controller uses Client#Status#Update
everywhere and the conflict error is still there
from helm-controller.
@stefanprodan I think we should then try to use https://godoc.org/sigs.k8s.io/controller-runtime/pkg/handler#EnqueueRequestsFromMapFunc instead of annotating the object there, to see if this improves the situation.
from helm-controller.
I think something like this is possible using https://godoc.org/sigs.k8s.io/controller-runtime/pkg/handler#EnqueueRequestsFromMapFunc.
This is what I use in image-reflector and image-automation, rather than annotating objects.
from helm-controller.
This seems to have been solved by the changes in #131, #132 and #133.
from helm-controller.
Related Issues (20)
- HelmRelease does redundant validation on chart name HOT 1
- Missing some crucial events HOT 2
- HelmRelease verify provider gpg HOT 1
- Drift mode should detect extra properties HOT 1
- Chart version only includes git SHA at root chart HOT 2
- Only deploy prerelease versions HOT 1
- Feature Request: Replace reconciliation interval with cron schedule in HelmRelease CRD HOT 1
- [BUG] Drift Detection can not be disabled for specific resources using annotations or labels
- [BUG] memory usage grows exponentially when there are lots of CRDs HOT 2
- [BUG] Helm drift detection on configmap patching '*** (after)' instead of the actual template from the helm chart HOT 13
- Backward compatibility of helm-controller HOT 6
- FEATURE: First-class support for secret decryption HOT 1
- Unable to detect server capabilities HOT 16
- HelmRelease: CRDs of disabled subcharts get installed anyway HOT 8
- Failed to reconcile HelmRelease field immutable HOT 1
- DependsOn readiness check should only rely on Ready condition HOT 10
- (site) DependsOn does not document cross-namespace dependencies HOT 2
- Changes in postRenderers are ingored HOT 6
- v0.37.4 has CVE-2024-26147 high vulnerability HOT 1
- Flux Helm Not Removing HPA objects on upgrade HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from helm-controller.