
Sync can be "waiting for healthy state of /ConfigMap/<name>" for a while even though the config map has already been created (argo-cd issue, open)

andrii-korotkov-verkada commented on June 18, 2024
Sync can be "waiting for healthy state of /ConfigMap/<name>" for a while even though the config map has already been created.

from argo-cd.

Comments (12)

tooptoop4 commented on June 18, 2024

How many apps do you have? What k8s version?

andrii-korotkov-verkada commented on June 18, 2024

There are about 200 apps; the architecture is one EKS cluster per AWS account. The EKS k8s version is 1.26.

andrii-korotkov-verkada commented on June 18, 2024

One observation from the logs with the task list is that the config map in question has live object nil for a while and target object obj. I'm looking at this code, which might populate the live object but doesn't for some time: https://github.com/argoproj/gitops-engine/blob/8a3ce6d85caa4220cfcaa8aa8b6d6dff476909ec/pkg/sync/sync_context.go#L707-L712.

andrii-korotkov-verkada commented on June 18, 2024

Looking at the K8s audit logs for one such case, there are a couple of events with a 404 response, e.g. (filtered):

{
    "kind": "Event",
    "apiVersion": "audit.k8s.io/v1",
    "level": "Metadata",
    "stage": "ResponseComplete",
    "requestURI": "/api/v1/namespaces/default/configmaps/<name>",
    "verb": "get",
    "objectRef": {
        "resource": "configmaps",
        "namespace": "default",
        "name": "<name>",
        "apiVersion": "v1"
    },
    "responseStatus": {
        "metadata": {},
        "status": "Failure",
        "message": "configmaps \"<name>\" not found",
        "reason": "NotFound",
        "details": {
            "name": "<name>",
            "kind": "configmaps"
        },
        "code": 404
    }
}

Then there's a successful response with a 201 code, which should mean the config map was created:

{
    "kind": "Event",
    "apiVersion": "audit.k8s.io/v1",
    "level": "Metadata",
    "stage": "ResponseComplete",
    "requestURI": "/api/v1/namespaces/default/configmaps?fieldManager=argocd-controller",
    "verb": "create",
    "objectRef": {
        "resource": "configmaps",
        "namespace": "default",
        "name": "<name>",
        "apiVersion": "v1"
    },
    "responseStatus": {
        "metadata": {},
        "code": 201
    }
}

Note that the latter URL doesn't include the config map name, which is expected for a create request since it is sent to the collection endpoint.
Then, ~7 min later (it can be longer), there's an entry with a 200 code:

{
    "kind": "Event",
    "apiVersion": "audit.k8s.io/v1",
    "level": "Metadata",
    "stage": "ResponseComplete",
    "requestURI": "/api/v1/namespaces/default/configmaps/<name>?dryRun=All&fieldManager=argocd-controller&force=true",
    "verb": "patch",
    "objectRef": {
        "resource": "configmaps",
        "namespace": "default",
        "name": "<name>",
        "apiVersion": "v1"
    },
    "responseStatus": {
        "metadata": {},
        "code": 200
    }
}

The latter is followed by the Argo log message: Updating resource result, status: 'Synced' -> 'Synced', phase 'Running' -> 'Succeeded', message 'configmap/<name> created' -> 'configmap/<name> created'.
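For anyone repeating this investigation, here is a minimal Go sketch that reduces an audit event to a one-line summary so the get/create/patch sequence is easier to read as a timeline. The struct covers only the fields shown in the filtered events above. The names auditEvent and summarize and the config map name example-config are made up for illustration, and the trailing commas from the filtered events are left out because encoding/json rejects them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// auditEvent mirrors only the fields visible in the filtered events above.
// It is an illustrative type, not the official audit.k8s.io/v1 Go type.
type auditEvent struct {
	Verb      string `json:"verb"`
	ObjectRef struct {
		Resource  string `json:"resource"`
		Namespace string `json:"namespace"`
		Name      string `json:"name"`
	} `json:"objectRef"`
	ResponseStatus struct {
		Code int `json:"code"`
	} `json:"responseStatus"`
}

// summarize decodes one audit event and returns
// "verb resource/namespace/name -> code".
func summarize(raw string) (string, error) {
	var ev auditEvent
	if err := json.Unmarshal([]byte(raw), &ev); err != nil {
		return "", err
	}
	return fmt.Sprintf("%s %s/%s/%s -> %d",
		ev.Verb, ev.ObjectRef.Resource, ev.ObjectRef.Namespace,
		ev.ObjectRef.Name, ev.ResponseStatus.Code), nil
}

func main() {
	// Hypothetical sample events; "example-config" stands in for <name>.
	events := []string{
		`{"verb":"get","objectRef":{"resource":"configmaps","namespace":"default","name":"example-config"},"responseStatus":{"code":404}}`,
		`{"verb":"create","objectRef":{"resource":"configmaps","namespace":"default","name":"example-config"},"responseStatus":{"code":201}}`,
		`{"verb":"patch","objectRef":{"resource":"configmaps","namespace":"default","name":"example-config"},"responseStatus":{"code":200}}`,
	}
	for _, raw := range events {
		line, err := summarize(raw)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Println(line)
	}
}
```

Adding the event timestamp field to the struct would make the multi-minute gap between the create and the patch visible in the same output.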

andrii-korotkov-verkada commented on June 18, 2024

Looks like it tries to get the config map (a couple of times), then very shortly after makes a call to create it, but doesn't get an update about it until there's some patch later.

andrii-korotkov-verkada commented on June 18, 2024

I'm curious about this code: https://github.com/argoproj/gitops-engine/blob/master/pkg/sync/sync_context.go#L1242-L1250. It seems that for config map creation we could also set the phase to completed if it was running, even in a wet run.

andrii-korotkov-verkada commented on June 18, 2024

One more discovery. This code https://github.com/argoproj/gitops-engine/blob/8a3ce6d85caa4220cfcaa8aa8b6d6dff476909ec/pkg/sync/sync_context.go#L635-L654 creates tasks for sc.resources, the same collection that is used to enrich tasks with live objects. These tasks appear in the "Tasks from managed resources" log, and they don't contain a task for the new config map.

Apparently the new config map is not a managed resource yet, so the task won't be enriched with a live object.
One difference I found in our manifests is that app.kubernetes.io/instance is not set in the metadata labels of the repo manifests (while app is set), but it is eventually present on the resource in the cluster. Maybe adding it to the manifests will solve this issue.
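The enrichment behavior described above can be illustrated with a toy model. This is only a sketch with hypothetical types (object, syncTask, enrich); it is not the gitops-engine code, and it assumes tasks and managed resources are matched by a simple string key.

```go
package main

import "fmt"

// object and syncTask are hypothetical simplifications, not gitops-engine types.
type object struct{ name string }

type syncTask struct {
	targetObj *object
	liveObj   *object
}

// enrich copies live objects from the managed-resources map onto tasks with
// the same key. A task whose resource is not in the map keeps a nil liveObj.
func enrich(tasks map[string]*syncTask, managed map[string]*object) {
	for key, task := range tasks {
		if live, ok := managed[key]; ok {
			task.liveObj = live
		}
	}
}

func main() {
	tasks := map[string]*syncTask{
		"existing-cm": {targetObj: &object{name: "existing-cm"}},
		"new-cm":      {targetObj: &object{name: "new-cm"}},
	}
	// Only the already-known config map is in the managed set, as it would be
	// if the set was gathered before the new config map was created.
	managed := map[string]*object{
		"existing-cm": {name: "existing-cm"},
	}
	enrich(tasks, managed)
	fmt.Println("existing-cm has live object:", tasks["existing-cm"].liveObj != nil)
	fmt.Println("new-cm has live object:", tasks["new-cm"].liveObj != nil)
}
```

In this model the task for the new config map only gets a live object once the managed set itself is rebuilt, which matches the observation that the sync waits until something refreshes the resource list.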

andrii-korotkov-verkada commented on June 18, 2024

Nope, adding that label doesn't help. I have to figure out some other way to add a target config map to the managed app resources that works.

andrii-korotkov-verkada commented on June 18, 2024

Hm, the instance label seems to be the correct default way to specify that a resource belongs to an application: https://argo-cd.readthedocs.io/en/stable/user-guide/resource_tracking/.

andrii-korotkov-verkada commented on June 18, 2024

Seems like both app.kubernetes.io/instance and app.kubernetes.io/part-of put the corresponding task under the tasks for managed resources, but the tasks don't seem to be updated; at least the logs show just (,,) for their state.

andrii-korotkov-verkada commented on June 18, 2024

Apparently resource.Live is still nil in those cases for some time https://github.com/argoproj/gitops-engine/blob/8a3ce6d85caa4220cfcaa8aa8b6d6dff476909ec/pkg/sync/sync_context.go#L913

andrii-korotkov commented on June 18, 2024

I think I’ve finally figured it out. The root cause seems to be that app operation and refresh events are added to their queues at the same time, e.g. ctrl.appRefreshQueue.AddRateLimited(key). Sync state operates on resources gathered from reconciliation, so if the app operation event is processed before the refresh one (when triggered on resource update/creation), the refresh doesn’t help the sync progress, and the sync essentially needs to wait for another app refresh.
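The ordering problem can be shown with a small deterministic model. This is a sketch under simplifying assumptions: the app type and the refresh, operation, and run functions are hypothetical stand-ins, and events are processed sequentially here, whereas the real controller uses separate work queues.

```go
package main

import "fmt"

// app is a hypothetical, simplified stand-in for the controller's view of an
// application.
type app struct {
	liveKnown bool // set by a refresh: the live object has been observed
	synced    bool // set by an operation: the sync was able to complete
}

// refresh models reconciliation, which gathers the live resources.
func refresh(a *app) { a.liveKnown = true }

// operation models a sync step, which can only complete if an earlier
// refresh already recorded the live object.
func operation(a *app) { a.synced = a.liveKnown }

// run processes the events in the given order and reports whether the sync
// completed.
func run(order []string) bool {
	a := &app{}
	for _, ev := range order {
		switch ev {
		case "refresh":
			refresh(a)
		case "operation":
			operation(a)
		}
	}
	return a.synced
}

func main() {
	// Operation processed first: the later refresh does not help this sync.
	fmt.Println(run([]string{"operation", "refresh"})) // false
	// Refresh processed first: the operation sees the live object.
	fmt.Println(run([]string{"refresh", "operation"})) // true
}
```

In the first ordering the sync stays incomplete until some later event triggers another operation after a refresh, which is the multi-minute wait seen in the audit logs.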

The fix seems to be to schedule the app operation event after the refresh event has finished processing. There’s one place where the operation event is scheduled without a refresh event (which can be kept as is), and one place where the refresh event is scheduled without the operation one, during the app deletion handling: ctrl.appRefreshQueue.Add(key). It’s probably safe to schedule the operation event after that, since it has some code to check that the app was deleted, but if not, the refresh queue can be modified to store tuples whose second element is a bool indicating whether to schedule the operation event after processing the refresh event.
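A rough sketch of the tuple idea follows, assuming plain slices in place of the real work queues. The names refreshItem, controller, requestRefresh, processRefresh, and processOperation are hypothetical and only illustrate the proposed ordering; they are not the Argo CD implementation.

```go
package main

import "fmt"

// refreshItem is a hypothetical queue element: the app key plus a flag saying
// whether an operation event should be scheduled once the refresh is done.
type refreshItem struct {
	key               string
	scheduleOperation bool
}

// controller is a toy model with two FIFO queues.
type controller struct {
	refreshQueue   []refreshItem
	operationQueue []string
	processed      []string
}

// requestRefresh enqueues only a refresh; any operation is deferred.
func (c *controller) requestRefresh(key string, scheduleOperation bool) {
	c.refreshQueue = append(c.refreshQueue,
		refreshItem{key: key, scheduleOperation: scheduleOperation})
}

// processRefresh handles one refresh item and, only after it has finished,
// schedules the operation event if the item asked for it.
func (c *controller) processRefresh() bool {
	if len(c.refreshQueue) == 0 {
		return false
	}
	item := c.refreshQueue[0]
	c.refreshQueue = c.refreshQueue[1:]
	c.processed = append(c.processed, "refresh:"+item.key)
	if item.scheduleOperation {
		c.operationQueue = append(c.operationQueue, item.key)
	}
	return true
}

// processOperation handles one operation event.
func (c *controller) processOperation() bool {
	if len(c.operationQueue) == 0 {
		return false
	}
	key := c.operationQueue[0]
	c.operationQueue = c.operationQueue[1:]
	c.processed = append(c.processed, "operation:"+key)
	return true
}

func main() {
	c := &controller{}
	c.requestRefresh("app-a", true)  // e.g. a resource update: refresh, then operate
	c.requestRefresh("app-b", false) // e.g. a case where only a refresh is wanted
	for c.processRefresh() || c.processOperation() {
	}
	fmt.Println(c.processed) // [refresh:app-a refresh:app-b operation:app-a]
}
```

Because the operation event for an app only enters its queue after that app's refresh has been processed, an operation can no longer run ahead of the refresh that triggered it.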

I’ll write a fix later this week.
