Comments (6)

k8s-ci-robot commented on July 29, 2024

This issue is currently awaiting triage.

If CAPI Operator contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

from cluster-api-operator.

guettli commented on July 29, 2024

We are seeing something that could be related. The controlplaneproviders CRD got deleted and has now been in that state for hours.

We have no clue why this happened.

@dtzar were you able to solve that?

guettli commented on July 29, 2024

In our case ArgoCD was OutOfSync because the ca-bundle was injected.

dtzar commented on July 29, 2024

We install cert-manager separately and have a sleep hook that checks that it is available before the install. We've tried various ways of doing the install (raw manifest YAML and Helm chart), to no avail. The most painful part is that the exact same configuration will work one time and fail another. There is some kind of race condition that puts things into this state. I've lost many painful hours to this problem and am still not sure what the cause is. I haven't been able to reproduce it without ArgoCD, but as far as I can see ArgoCD should NOT automatically delete anything. It should only keep trying to apply the YAML until the cluster matches what it sees in git.

dtzar commented on July 29, 2024

It seems that for some reason the namespaces are getting deleted, but I can't figure out what is deleting them.

From ArgoCD UI:

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    argocd.argoproj.io/sync-options: Prune=false
    argocd.argoproj.io/tracking-id: 'addon-gitops-aks-capi-operator:/Namespace:capi-operator-system/capi-system'
    helm.sh/hook: post-install
    helm.sh/hook-weight: '1'
  creationTimestamp: '2024-07-08T20:27:35Z'
  deletionTimestamp: '2024-07-08T20:28:18Z'
  labels:
    kubernetes.io/metadata.name: capi-system
  name: capi-system
  resourceVersion: '51476'
  uid: c7b7379d-2368-4ea3-9fe0-fa7d9e42bbac
spec:
  finalizers:
    - kubernetes
status:
  conditions:
    - lastTransitionTime: '2024-07-08T20:28:27Z'
      message: All resources successfully discovered
      reason: ResourcesDiscovered
      status: 'False'
      type: NamespaceDeletionDiscoveryFailure
    - lastTransitionTime: '2024-07-08T20:28:27Z'
      message: All legacy kube types successfully parsed
      reason: ParsedGroupVersions
      status: 'False'
      type: NamespaceDeletionGroupVersionParsingFailure
    - lastTransitionTime: '2024-07-08T20:28:32Z'
      message: >-
        Failed to delete all resource types, 2 remaining: Internal error
        occurred: error resolving resource, Internal error occurred: error
        resolving resource
      reason: ContentDeletionFailed
      status: 'True'
      type: NamespaceDeletionContentFailure
    - lastTransitionTime: '2024-07-08T20:28:32Z'
      message: All content successfully removed
      reason: ContentRemoved
      status: 'False'
      type: NamespaceContentRemaining
    - lastTransitionTime: '2024-07-08T20:28:32Z'
      message: All content-preserving finalizers finished
      reason: ContentHasNoFinalizers
      status: 'False'
      type: NamespaceFinalizersRemaining
  phase: Terminating
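
With this many conditions it is easy to miss which one is actually reporting a problem. A minimal sketch, assuming the status has already been loaded into plain Python dicts (the `conditions` list below is copied by hand from the output above with messages omitted, and `blocking_conditions` is a hypothetical helper name). For these deletion conditions a status of 'True' is the one that signals trouble:

```python
# Conditions copied from the Namespace status above (messages omitted).
conditions = [
    {"type": "NamespaceDeletionDiscoveryFailure", "status": "False", "reason": "ResourcesDiscovered"},
    {"type": "NamespaceDeletionGroupVersionParsingFailure", "status": "False", "reason": "ParsedGroupVersions"},
    {"type": "NamespaceDeletionContentFailure", "status": "True", "reason": "ContentDeletionFailed"},
    {"type": "NamespaceContentRemaining", "status": "False", "reason": "ContentRemoved"},
    {"type": "NamespaceFinalizersRemaining", "status": "False", "reason": "ContentHasNoFinalizers"},
]


def blocking_conditions(conds):
    """Return the conditions whose status is 'True' (hypothetical helper)."""
    return [c for c in conds if c["status"] == "True"]


for c in blocking_conditions(conditions):
    print(f'{c["type"]}: {c["reason"]}')
# prints: NamespaceDeletionContentFailure: ContentDeletionFailed
```

Here only NamespaceDeletionContentFailure is 'True', which matches the "error resolving resource" message: the namespace controller could not delete two resource types.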

From the cluster itself:

NAME                                STATUS        AGE
argo-events                         Active        144m
argo-rollouts                       Active        144m
argo-workflows                      Active        144m
argocd                              Active        145m
azure-infrastructure-system         Terminating   12m
capi-kubeadm-bootstrap-system       Terminating   12m
capi-kubeadm-control-plane-system   Terminating   12m
capi-operator-system                Active        143m
capi-system                         Terminating   12m
cert-manager                        Active        144m
crossplane-system                   Active        145m
default                             Active        150m
helm-addon-system                   Terminating   12m
kube-node-lease                     Active        150m
kube-public                         Active        150m
kube-system                         Active        150m
workload                            Active        144m
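
To track which namespaces are stuck across repeated runs, the listing can be filtered by its STATUS column. A rough sketch, assuming the text is the default column output shown above (the `sample` string is a subset of that listing, and `terminating_namespaces` is a hypothetical helper name):

```python
def terminating_namespaces(table: str) -> list[str]:
    """Return the names of namespaces whose STATUS column is 'Terminating'."""
    names = []
    for line in table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "Terminating":
            names.append(fields[0])
    return names


# Subset of the listing above.
sample = """\
NAME                                STATUS        AGE
argocd                              Active        145m
azure-infrastructure-system         Terminating   12m
capi-kubeadm-bootstrap-system       Terminating   12m
capi-kubeadm-control-plane-system   Terminating   12m
capi-operator-system                Active        143m
capi-system                         Terminating   12m
helm-addon-system                   Terminating   12m
"""

print(terminating_namespaces(sample))
```

For this sample the result is the five provider and add-on namespaces, while capi-operator-system itself stays Active.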

From the capi-operator-system log:

E0708 20:28:25.368707       1 controller.go:329] "Reconciler error" err="failed to create config map for provider \"kubeadm\": configmaps \"controlplane-kubeadm-v1.7.3\" is forbidden: unable to create new content in namespace capi-kubeadm-control-plane-system because it is being terminated" controller="controlplaneprovider" controllerGroup="operator.cluster.x-k8s.io" controllerKind="ControlPlaneProvider" ControlPlaneProvider="capi-kubeadm-control-plane-system/kubeadm" namespace="capi-kubeadm-control-plane-system" name="kubeadm" reconcileID="55fb76de-9194-434b-bd3c-03c8c07445ed"
I0708 20:28:25.368868       1 genericprovider_controller.go:62] "Reconciling provider" controller="controlplaneprovider" controllerGroup="operator.cluster.x-k8s.io" controllerKind="ControlPlaneProvider" ControlPlaneProvider="capi-kubeadm-control-plane-system/kubeadm" namespace="capi-kubeadm-control-plane-system" name="kubeadm" reconcileID="e5bc03d9-3c4c-4619-9b2f-7cef46e6e3cf"
I0708 20:28:25.369043       1 preflight_checks.go:58] "Performing preflight checks" controller="controlplaneprovider" controllerGroup="operator.cluster.x-k8s.io" controllerKind="ControlPlaneProvider" ControlPlaneProvider="capi-kubeadm-control-plane-system/kubeadm" namespace="capi-kubeadm-control-plane-system" name="kubeadm" reconcileID="e5bc03d9-3c4c-4619-9b2f-7cef46e6e3cf"
I0708 20:28:25.369288       1 preflight_checks.go:199] "Preflight checks passed" controller="controlplaneprovider" controllerGroup="operator.cluster.x-k8s.io" controllerKind="ControlPlaneProvider" ControlPlaneProvider="capi-kubeadm-control-plane-system/kubeadm" namespace="capi-kubeadm-control-plane-system" name="kubeadm" reconcileID="e5bc03d9-3c4c-4619-9b2f-7cef46e6e3cf"
I0708 20:28:25.369509       1 phases.go:240] "No configuration secret was specified" controller="controlplaneprovider" controllerGroup="operator.cluster.x-k8s.io" controllerKind="ControlPlaneProvider" ControlPlaneProvider="capi-kubeadm-control-plane-system/kubeadm" namespace="capi-kubeadm-control-plane-system" name="kubeadm" reconcileID="e5bc03d9-3c4c-4619-9b2f-7cef46e6e3cf"
I0708 20:28:26.075335       1 genericprovider_controller.go:62] "Reconciling provider" controller="coreprovider" controllerGroup="operator.cluster.x-k8s.io" controllerKind="CoreProvider" CoreProvider="capi-system/cluster-api" namespace="capi-system" name="cluster-api" reconcileID="e0c1d4ba-c81a-4a1d-ba87-d6eff3892d91"
I0708 20:28:26.164419       1 genericprovider_controller.go:190] "Deleting provider resources" controller="coreprovider" controllerGroup="operator.cluster.x-k8s.io" controllerKind="CoreProvider" CoreProvider="capi-system/cluster-api" namespace="capi-system" name="cluster-api" reconcileID="e0c1d4ba-c81a-4a1d-ba87-d6eff3892d91"
I0708 20:28:26.164461       1 phases.go:547] "Deleting provider" controller="coreprovider" controllerGroup="operator.cluster.x-k8s.io" controllerKind="CoreProvider" CoreProvider="capi-system/cluster-api" namespace="capi-system" name="cluster-api" reconcileID="e0c1d4ba-c81a-4a1d-ba87-d6eff3892d91"
I0708 20:28:26.770877       1 manifests_downloader.go:80] "Downloading provider manifests" controller="controlplaneprovider" controllerGroup="operator.cluster.x-k8s.io" controllerKind="ControlPlaneProvider" ControlPlaneProvider="capi-kubeadm-control-plane-system/kubeadm" namespace="capi-kubeadm-control-plane-system" name="kubeadm" reconcileID="e5bc03d9-3c4c-4619-9b2f-7cef46e6e3cf"
I0708 20:28:26.966645       1 healthcheck_controller.go:122] "Checking provider health" controller="deployment" controllerGroup="apps" controllerKind="Deployment" Deployment="capi-system/capi-controller-manager" namespace="capi-system" name="capi-controller-manager" reconcileID="e6a6c932-b3a5-411a-95dc-c62ba148a5ce"
E0708 20:28:26.975151       1 controller.go:329] "Reconciler error" err="failed to create config map for provider \"kubeadm\": configmaps \"controlplane-kubeadm-v1.7.3\" is forbidden: unable to create new content in namespace capi-kubeadm-control-plane-system because it is being terminated" controller="controlplaneprovider" controllerGroup="operator.cluster.x-k8s.io" controllerKind="ControlPlaneProvider" ControlPlaneProvider="capi-kubeadm-control-plane-system/kubeadm" namespace="capi-kubeadm-control-plane-system" name="kubeadm" reconcileID="e5bc03d9-3c4c-4619-9b2f-7cef46e6e3cf"
I0708 20:28:26.975293       1 genericprovider_controller.go:62] "Reconciling provider" controller="controlplaneprovider" controllerGroup="operator.cluster.x-k8s.io" controllerKind="ControlPlaneProvider" ControlPlaneProvider="capi-kubeadm-control-plane-system/kubeadm" namespace="capi-kubeadm-control-plane-system" name="kubeadm" reconcileID="6d512c3a-7fe7-475a-8db7-436f5d876508"
I0708 20:28:26.975740       1 genericprovider_controller.go:190] "Deleting provider resources" controller="controlplaneprovider" controllerGroup="operator.cluster.x-k8s.io" controllerKind="ControlPlaneProvider" ControlPlaneProvider="capi-kubeadm-control-plane-system/kubeadm" namespace="capi-kubeadm-control-plane-system" name="kubeadm" reconcileID="6d512c3a-7fe7-475a-8db7-436f5d876508"
I0708 20:28:26.975814       1 phases.go:547] "Deleting provider" controller="controlplaneprovider" controllerGroup="operator.cluster.x-k8s.io" controllerKind="ControlPlaneProvider" ControlPlaneProvider="capi-kubeadm-control-plane-system/kubeadm" namespace="capi-kubeadm-control-plane-system" name="kubeadm" reconcileID="6d512c3a-7fe7-475a-8db7-436f5d876508"
E0708 20:28:27.408965       1 controller.go:329] "Reconciler error" err="failed to create config map for provider \"azure\": configmaps \"infrastructure-azure-v1.15.2\" is forbidden: unable to create new content in namespace azure-infrastructure-system because it is being terminated" controller="infrastructureprovider" controllerGroup="operator.cluster.x-k8s.io" controllerKind="InfrastructureProvider" InfrastructureProvider="azure-infrastructure-system/azure" namespace="azure-infrastructure-system" name="azure" reconcileID="3eed34cb-ae60-4e31-a2d2-62b3e9aeab40"
I0708 20:28:27.409030       1 genericprovider_controller.go:62] "Reconciling provider" controller="infrastructureprovider" controllerGroup="operator.cluster.x-k8s.io" controllerKind="InfrastructureProvider" InfrastructureProvider="azure-infrastructure-system/azure" namespace="azure-infrastructure-system" name="azure" reconcileID="950f078b-c43d-4d94-b49e-662eb491a1e3"
I0708 20:28:27.409174       1 genericprovider_controller.go:190] "Deleting provider resources" controller="infrastructureprovider" controllerGroup="operator.cluster.x-k8s.io" controllerKind="InfrastructureProvider" InfrastructureProvider="azure-infrastructure-system/azure" namespace="azure-infrastructure-system" name="azure" reconcileID="950f078b-c43d-4d94-b49e-662eb491a1e3"
I0708 20:28:27.409192       1 phases.go:547] "Deleting provider" controller="infrastructureprovider" controllerGroup="operator.cluster.x-k8s.io" controllerKind="InfrastructureProvider" InfrastructureProvider="azure-infrastructure-system/azure" namespace="azure-infrastructure-system" name="azure" reconcileID="950f078b-c43d-4d94-b49e-662eb491a1e3"
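
The log is dense, so counting the error lines per namespace makes the pattern easier to see. A small sketch, assuming klog-style lines where error entries start with "E" and the structured fields include a quoted `namespace="..."` key (the `sample` lines are trimmed copies of the entries above, and `errors_by_namespace` is a hypothetical helper name):

```python
import re
from collections import Counter

# Matches the structured key only: the free-text "in namespace foo" inside
# err="..." has no '=' after "namespace", so it is not picked up.
NAMESPACE_RE = re.compile(r' namespace="([^"]+)"')


def errors_by_namespace(log_text: str) -> Counter:
    """Count klog error lines (prefix 'E') per structured namespace field."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.startswith("E"):
            continue
        match = NAMESPACE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Trimmed copies of log lines from above (most fields removed for brevity).
sample = r'''
E0708 20:28:25.368707       1 controller.go:329] "Reconciler error" err="failed to create config map for provider \"kubeadm\"" namespace="capi-kubeadm-control-plane-system" name="kubeadm"
I0708 20:28:25.368868       1 genericprovider_controller.go:62] "Reconciling provider" namespace="capi-kubeadm-control-plane-system" name="kubeadm"
E0708 20:28:26.975151       1 controller.go:329] "Reconciler error" err="failed to create config map for provider \"kubeadm\"" namespace="capi-kubeadm-control-plane-system" name="kubeadm"
E0708 20:28:27.408965       1 controller.go:329] "Reconciler error" err="failed to create config map for provider \"azure\"" namespace="azure-infrastructure-system" name="azure"
'''

for namespace, count in errors_by_namespace(sample).items():
    print(namespace, count)
```

On the trimmed sample this reports two errors for capi-kubeadm-control-plane-system and one for azure-infrastructure-system, all of them the operator failing to create ConfigMaps in namespaces that are already terminating.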