openshift / cluster-kube-controller-manager-operator

The kube-controller-manager operator installs and maintains the kube-controller-manager on a cluster

License: Apache License 2.0

Languages: Go 99.37%, Makefile 0.63%

cluster-kube-controller-manager-operator's Introduction

Kubernetes Controller Manager operator

The Kubernetes Controller Manager operator manages and updates the Kubernetes Controller Manager deployed on top of OpenShift. The operator is based on the OpenShift library-go framework and is installed via the Cluster Version Operator (CVO).

It contains the following components:

  • Operator
  • Bootstrap manifest renderer
  • Installer based on static pods
  • Configuration observer

By default, the operator exposes Prometheus metrics via a metrics service. The metrics are collected from the following components:

  • Kubernetes Controller Manager operator
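
A quick way to confirm that the metrics wiring exists (this simply lists the Service and ServiceMonitor objects the operator's manifests create, whatever they are named):

$ oc get service,servicemonitor -n openshift-kube-controller-manager-operator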

Configuration

The configuration for the Kubernetes Controller Manager comes from:

Debugging

The operator also exposes events that can help debug issues. To get the operator events, run the following command:

$ oc get events -n openshift-kube-controller-manager-operator
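
To narrow this down to problems only, the events can be filtered by type using a standard field selector:

$ oc get events -n openshift-kube-controller-manager-operator --field-selector type=Warning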

This operator is configured via the KubeControllerManager custom resource:

$ oc get kubecontrollermanager cluster -o yaml
apiVersion: operator.openshift.io/v1
kind: KubeControllerManager
metadata:
  name: cluster
spec:
  managementState: Managed
  ...
Run the following to learn more about the resource itself:

$ oc explain kubecontrollermanager
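
For example, the operator's log verbosity can be raised through the same resource (logLevel is a standard field on OpenShift operator specs; Normal, Debug, Trace, and TraceAll are the accepted values):

$ oc patch kubecontrollermanager/cluster --type=merge -p '{"spec":{"logLevel":"Debug"}}'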

The current operator status is reported using the ClusterOperator resource. To get the current status, you can run the following command:

$ oc get clusteroperator/kube-controller-manager
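
To see the individual status conditions rather than the summary columns:

$ oc get clusteroperator/kube-controller-manager -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'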

Developing and debugging the operator

In a running cluster, cluster-version-operator is responsible for keeping the managed elements functioning and unaltered. To use a custom operator image, you therefore have to perform one of these operations:

  1. Set your operator into unmanaged state, see here for details; in short:
oc patch clusterversion/version --type='merge' -p "$(cat <<- EOF
spec:
  overrides:
  - group: apps
    kind: Deployment
    name: kube-controller-manager-operator
    namespace: openshift-kube-controller-manager-operator
    unmanaged: true
EOF
)"
  2. Scale down cluster-version-operator:
oc scale --replicas=0 deploy/cluster-version-operator -n openshift-cluster-version

IMPORTANT: This approach disables cluster-version-operator completely, whereas the previous one only tells it not to manage the kube-controller-manager-operator!
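
To hand control back to cluster-version-operator afterwards, scale it up again:

oc scale --replicas=1 deploy/cluster-version-operator -n openshift-cluster-version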

After doing this, you can change the operator image to the desired one:

oc patch deployment/kube-controller-manager-operator -n openshift-kube-controller-manager-operator -p '{"spec":{"template":{"spec":{"containers":[{"name":"kube-controller-manager-operator","image":"<user>/cluster-kube-controller-manager-operator","env":[{"name":"OPERATOR_IMAGE","value":"<user>/cluster-kube-controller-manager-operator"}]}]}}}}'
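
To confirm the patch rolled out with the expected image (the deployment and container names below are the same ones patched above):

oc rollout status deployment/kube-controller-manager-operator -n openshift-kube-controller-manager-operator
oc get deployment/kube-controller-manager-operator -n openshift-kube-controller-manager-operator -o jsonpath='{.spec.template.spec.containers[0].image}'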

Developing and debugging the bootkube bootstrap phase

The operator image version used by the installer bootstrap phase can be overridden by creating a custom origin-release image pointing to the developer's operator :latest image:

$ IMAGE_ORG=<user> make images
$ docker push <user>/origin-cluster-kube-controller-manager-operator

$ cd ../cluster-kube-apiserver-operator
$ IMAGES=cluster-kube-controller-manager-operator IMAGE_ORG=<user> make origin-release
$ docker push <user>/origin-release:latest

$ cd ../installer
$ OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=docker.io/<user>/origin-release:latest bin/openshift-install cluster ...

cluster-kube-controller-manager-operator's People

Contributors

abhinavdahiya, atiratree, bertinatto, csrwng, damemi, deads2k, fedosin, gnufied, ingvagabund, joelspeed, juanvallejo, marun, mfojtik, openshift-ci[bot], openshift-merge-bot[bot], openshift-merge-robot, p0lyn0mial, ravisantoshgudimetla, s-urbaniak, sallyom, sanchezl, sjenning, smarterclayton, soltysh, stlaz, sttts, swghosh, tkashem, tnozicka, vrutkovs


cluster-kube-controller-manager-operator's Issues

How to sync the timezone with running hosts?

The controller manager pod is not aware of the running host's timezone. This affects CronJob schedule times, service serving cert generation and evaluation timing, log timestamps, and potentially other time-aware tasks. I'm wondering how to sync the timezone of the pods managed by this operator with the running host.

panic: asset v3.11.0/kube-controller-manager/ns.yaml not found

After removing openshift-cluster-kube-controller-manager from the cvoconfig.overrides, the operator works for a little while (minutes), but then I see this panic and the whole openshift-kube-controller-manager namespace goes away.

I1012 02:02:45.252594       1 reflector.go:286] github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/client-go/informers/factory.go:130: forcing resync
I1012 02:02:45.254090       1 reflector.go:286] github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/client-go/informers/factory.go:130: forcing resync
E1012 02:02:45.255370       1 runtime.go:66] Observed a panic: "asset: Asset(v3.11.0/kube-controller-manager/ns.yaml): Asset v3.11.0/kube-controller-manager/ns.yaml not found" (asset: Asset(v3.11.0/kube-controller-manager/ns.yaml): Asset v3.11.0/kube-controller-manager/ns.yaml not found)
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:573
/usr/local/go/src/runtime/panic.go:502
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/v311_00_assets/bindata.go:458
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/sync_kubecontrollermanager_v311_00.go:31
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:129
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:174
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:163
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/operator.go:157
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:2361
I1012 02:02:45.255499       1 reflector.go:286] github.com/openshift/cluster-kube-controller-manager-operator/pkg/generated/informers/externalversions/factory.go:101: forcing resync
I1012 02:02:45.258724       1 reflector.go:286] github.com/openshift/cluster-kube-controller-manager-operator/vendor/k8s.io/client-go/informers/factory.go:130: forcing resync
E1012 02:02:46.257355       1 runtime.go:66] Observed a panic: "asset: Asset(v3.11.0/kube-controller-manager/ns.yaml): Asset v3.11.0/kube-controller-manager/ns.yaml not found" (asset: Asset(v3.11.0/kube-controller-manager/ns.yaml): Asset v3.11.0/kube-controller-manager/ns.yaml not found)
[identical stack trace repeats]

--allocate-node-cidrs is always set to false

We are using the Terway CNI on Alibaba Cloud, so we need the kube-controller-manager to allocate the PodCIDRs. We set the following config in KubeControllerManager.operator.openshift.io/v1, but it has no effect for us.

unsupportedConfigOverrides:
  extendedArguments:
    allocate-node-cidrs:
    - "true"
    node-cidr-mask-size:
    - "23"

$ cat /etc/kubernetes/static-pod-resources/kube-controller-manager-pod-9/configmaps/config/config.yaml | jq
{
  "apiVersion": "kubecontrolplane.config.openshift.io/v1",
  "extendedArguments": {
    "allocate-node-cidrs": [
      "true"
    ],
    "cert-dir": [
      "/var/run/kubernetes"
    ],
    "cluster-cidr": [
      "10.108.128.0/18"
    ],
    "cluster-name": [
      "staging-mpcr4"
    ],
    "cluster-signing-cert-file": [
      "/etc/kubernetes/static-pod-certs/secrets/csr-signer/tls.crt"
    ],
    "cluster-signing-key-file": [
      "/etc/kubernetes/static-pod-certs/secrets/csr-signer/tls.key"
    ],
    "configure-cloud-routes": [
      "false"
    ],
    "controllers": [
      "*",
      "-ttl",
      "-bootstrapsigner",
      "-tokencleaner"
    ],
    "enable-dynamic-provisioning": [
      "true"
    ],
    "experimental-cluster-signing-duration": [
      "720h"
    ],
    "feature-gates": [
      "RotateKubeletServerCertificate=true",
      "SupportPodPidsLimit=true",
      "NodeDisruptionExclusion=true",
      "ServiceNodeExclusion=true",
      "SCTPSupport=true",
      "LegacyNodeRoleBehavior=false"
    ],
    "flex-volume-plugin-dir": [
      "/etc/kubernetes/kubelet-plugins/volume/exec"
    ],
    "kube-api-burst": [
      "300"
    ],
    "kube-api-qps": [
      "150"
    ],
    "leader-elect": [
      "true"
    ],
    "leader-elect-resource-lock": [
      "configmaps"
    ],
    "leader-elect-retry-period": [
      "3s"
    ],
    "node-cidr-mask-size": [
      "23"
    ],
    "port": [
      "0"
    ],
    "root-ca-file": [
      "/etc/kubernetes/static-pod-resources/configmaps/serviceaccount-ca/ca-bundle.crt"
    ],
    "secure-port": [
      "10257"
    ],
    "service-account-private-key-file": [
      "/etc/kubernetes/static-pod-resources/secrets/service-account-private-key/service-account.key"
    ],
    "service-cluster-ip-range": [
      "172.30.0.0/18"
    ],
    "use-service-account-credentials": [
      "true"
    ]
  },
  "kind": "KubeControllerManagerConfig",
  "serviceServingCert": {
    "certFile": "/etc/kubernetes/static-pod-resources/configmaps/service-ca/ca-bundle.crt"
  }
}

kcm logs:
I0417 05:50:25.193690 1 patch.go:65] FLAGSET: generic
I0417 05:50:25.193699 1 flags.go:33] FLAG: --allocate-node-cidrs="false"
I0417 05:50:25.193701 1 flags.go:33] FLAG: --allow-untagged-cloud="false"
I0417 05:50:25.193704 1 flags.go:33] FLAG: --cidr-allocator-type="RangeAllocator"
I0417 05:50:25.193707 1 flags.go:33] FLAG: --cloud-config=""
I0417 05:50:25.193709 1 flags.go:33] FLAG: --cloud-provider=""
I0417 05:50:25.193712 1 flags.go:33] FLAG: --cluster-cidr="10.108.128.0/18"
I0417 05:50:25.193715 1 flags.go:33] FLAG: --cluster-name="staging-mpcr4"
I0417 05:50:25.193717 1 flags.go:33] FLAG: --configure-cloud-routes="true"
I0417 05:50:25.193720 1 flags.go:33] FLAG: --controller-start-interval="0s"
I0417 05:50:25.193722 1 flags.go:33] FLAG: --controllers="[*,-ttl,-bootstrapsigner,-tokencleaner]"
I0417 05:50:25.193728 1 flags.go:33] FLAG: --external-cloud-volume-plugin=""
I0417 05:50:25.193731 1 flags.go:33] FLAG: --feature-gates="LegacyNodeRoleBehavior=false,NodeDisruptionExclusion=true,RotateKubeletServerCertificate=true,SCTPSupport=true,ServiceNodeExclusion=true,SupportPodPidsLimit=true"

kube-controller-manager certs permissions

Running OKD 4.9

After a week or so of uptime, internal certs seem to be refreshed. The following files on the controllers are left with permissions 644 rather than 600.

This causes the OKD 4 compliance operator to flag an alert via the following rules, which expect all files to be 600 (as they are, with these two exceptions):

ocp4-file-permissions-openshift-pki-cert-files
ocp4-file-permissions-openshift-pki-key-files

/etc/kubernetes/static-pod-resources/kube-controller-manager-certs/secrets/csr-signer/tls.crt
/etc/kubernetes/static-pod-resources/kube-controller-manager-certs/secrets/csr-signer/tls.key

Is it possible to fix or control the permissions on these files when they are created?

I'm not entirely sure where in the OKD 4 operator chain the creation of these files occurs, but this seems like the best place to start!
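
A quick way to check the current permissions from outside the node is the standard oc debug pattern (substitute a real control-plane node name):

$ oc debug node/<master-node> -- chroot /host stat -c '%a %n' /etc/kubernetes/static-pod-resources/kube-controller-manager-certs/secrets/csr-signer/tls.crt /etc/kubernetes/static-pod-resources/kube-controller-manager-certs/secrets/csr-signer/tls.key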

PodDisruptionBudgetAtLimit alert should not be raised when no disruption is allowed

The PodDisruptionBudgetAtLimit alert should not be raised when no disruption is allowed. Example: when there is only one pod and no disruption is allowed on it, this alert keeps showing all the time.

In this kind of situation the alert should not be raised, as it can be a conscious decision to set the expected number of pods equal to the minimum number of pods and not allow any disruption.
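
A minimal illustration of the situation (hypothetical app; with a single replica and minAvailable: 1, status.disruptionsAllowed is permanently 0, so the alert fires continuously):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: single-replica-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: single-replica-app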

kcm rootCA missing apiserver ca leads to kube-root-ca problem

1 Bug phenomenon

  1. The kcm (kube-controller-manager) rootCA is generated by manageServiceAccountCABundle in targetconfigcontroller. This function first gets the kube-apiserver-server-ca configmap, then combines it with two other configmaps to generate the kcm rootCA.
  2. Occasionally (very unlikely to happen, but I have hit it exactly once) the kube-apiserver-server-ca configmap is missing, so manageServiceAccountCABundle generates the rootCA without it. The kcm leader then holds the wrong rootCA, which leads to the kube-root-ca problem in every pod, and the OCP release will not work until I stop the wrong kcm leader.

2 Bug fix

2.1. This bug can be resolved by adding a kube-apiserver-server-ca check in manageServiceAccountCABundle of targetconfigcontroller, as follows:

func manageServiceAccountCABundle(ctx context.Context, lister corev1listers.ConfigMapLister, client corev1client.ConfigMapsGetter, recorder events.Recorder) (*corev1.ConfigMap, bool, error) {
	requiredConfigMap, err := resourcesynccontroller.CombineCABundleConfigMaps(
		resourcesynccontroller.ResourceLocation{Namespace: operatorclient.TargetNamespace, Name: "serviceaccount-ca"},
		lister,
		// include the ca bundle needed to recognize the server
		resourcesynccontroller.ResourceLocation{Namespace: operatorclient.GlobalMachineSpecifiedConfigNamespace, Name: "kube-apiserver-server-ca"},
		// include the ca bundle needed to recognize default
		// certificates generated by cluster-ingress-operator
		resourcesynccontroller.ResourceLocation{Namespace: operatorclient.GlobalMachineSpecifiedConfigNamespace, Name: "default-ingress-cert"},
	)
	if err != nil {
		return nil, false, err
	}
	return resourceapply.ApplyConfigMap(ctx, client, recorder, requiredConfigMap)
}
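
A minimal sketch of what such a check might look like (the helper name ensureKubeAPIServerServerCA is hypothetical; it would be called at the top of manageServiceAccountCABundle with operatorclient.GlobalMachineSpecifiedConfigNamespace before combining the bundles):

package targetconfigcontroller

import (
	"fmt"

	corev1listers "k8s.io/client-go/listers/core/v1"
)

// ensureKubeAPIServerServerCA fails fast when the kube-apiserver-server-ca
// configmap is not yet available, instead of letting CombineCABundleConfigMaps
// silently skip it and publish an incomplete rootCA.
func ensureKubeAPIServerServerCA(lister corev1listers.ConfigMapLister, namespace string) error {
	if _, err := lister.ConfigMaps(namespace).Get("kube-apiserver-server-ca"); err != nil {
		return fmt.Errorf("serviceaccount-ca cannot be built yet: %w", err)
	}
	return nil
}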

2.2. Another variant of the fix (shown as a screenshot in the original issue) is possible, but it is not the best way to resolve this problem.

2.3. This can also be resolved by modifying the OpenShift library-go function CombineCABundleConfigMaps in resourcesynccontroller as follows:

func CombineCABundleConfigMaps(destinationConfigMap ResourceLocation, lister corev1listers.ConfigMapLister, inputConfigMaps ...ResourceLocation) (*corev1.ConfigMap, error) {
	certificates := []*x509.Certificate{}
	for _, input := range inputConfigMaps {
		inputConfigMap, err := lister.ConfigMaps(input.Namespace).Get(input.Name)
		if apierrors.IsNotFound(err) {
			continue
		}
		if err != nil {
			return nil, err
		}
		// configmaps must conform to this
		inputContent := inputConfigMap.Data["ca-bundle.crt"]
		if len(inputContent) == 0 {
			continue
		}
		inputCerts, err := cert.ParseCertsPEM([]byte(inputContent))
		if err != nil {
			return nil, fmt.Errorf("configmap/%s in %q is malformed: %v", input.Name, input.Namespace, err)
		}
		certificates = append(certificates, inputCerts...)
	}

	certificates = crypto.FilterExpiredCerts(certificates...)
	finalCertificates := []*x509.Certificate{}
	// now check for duplicates. n^2, but super simple
	for i := range certificates {
		found := false
		for j := range finalCertificates {
			if reflect.DeepEqual(certificates[i].Raw, finalCertificates[j].Raw) {
				found = true
				break
			}
		}
		if !found {
			finalCertificates = append(finalCertificates, certificates[i])
		}
	}

	caBytes, err := crypto.EncodeCertificates(finalCertificates...)
	if err != nil {
		return nil, err
	}

	return &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Namespace: destinationConfigMap.Namespace, Name: destinationConfigMap.Name},
		Data: map[string]string{
			"ca-bundle.crt": string(caBytes),
		},
	}, nil
}


3 This is related to openshift/library-go issue #1472 (github.com/openshift/library-go: missing key configmap).

Operator won't accept installs without a cloudprovider

On a BYOR install using libvirt configs on GCE, the installer keeps restarting:

I1203 14:41:51.153742       1 controller.go:154] clusterOperator openshift-cluster-kube-controller-manager-operator/openshift-cluster-kube-controller-manager-operator set to {"apiVersion":"config.openshift.io/v1","kind":"ClusterOperator","metadata":{"creationTimestamp":"2018-12-03T12:21:50Z","generation":1,"name":"openshift-cluster-kube-controller-manager-operator","namespace":"openshift-cluster-kube-controller-manager-operator","resourceVersion":"128030","selfLink":"/apis/config.openshift.io/v1/clusteroperators/openshift-cluster-kube-controller-manager-operator","uid":"031e8a98-f6f6-11e8-99a8-42010af00002"},"status":{"conditions":[{"Message":"ConfigObservationFailing: configmap/cluster-config-v1.kube-system: no recognized cloud provider platform found","Reason":"ConfigObservationFailing","Status":"True","Type":"Failing"}]}}
I1203 14:41:51.282575       1 request.go:485] Throttling request took 795.972047ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/secrets/serving-cert
I1203 14:41:51.482642       1 request.go:485] Throttling request took 787.843176ms, request: POST:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/pods
I1203 14:41:51.582621       1 leaderelection.go:209] successfully renewed lease openshift-cluster-kube-controller-manager-operator/openshift-cluster-kube-controller-manager-operator-lock
I1203 14:41:51.684204       1 request.go:485] Throttling request took 797.718808ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/configmaps/kube-controller-manager-pod
I1203 14:41:51.882578       1 request.go:485] Throttling request took 732.083152ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/serviceaccounts/installer-sa
I1203 14:41:51.904727       1 controller.go:154] clusterOperator openshift-cluster-kube-controller-manager-operator/openshift-cluster-kube-controller-manager-operator set to {"apiVersion":"config.openshift.io/v1","kind":"ClusterOperator","metadata":{"creationTimestamp":"2018-12-03T12:21:50Z","generation":1,"name":"openshift-cluster-kube-controller-manager-operator","namespace":"openshift-cluster-kube-controller-manager-operator","resourceVersion":"128042","selfLink":"/apis/config.openshift.io/v1/clusteroperators/openshift-cluster-kube-controller-manager-operator","uid":"031e8a98-f6f6-11e8-99a8-42010af00002"},"status":{"conditions":[{"Message":"ConfigObservationFailing: configmap/cluster-config-v1.kube-system: no recognized cloud provider platform found","Reason":"ConfigObservationFailing","Status":"True","Type":"Failing"}]}}
I1203 14:41:52.082588       1 request.go:485] Throttling request took 796.323758ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/secrets/serving-cert-1
I1203 14:41:52.282626       1 request.go:485] Throttling request took 787.73166ms, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-controller-manager/pods/installer-1-vrutkovs-ig-m-0q7p

This makes the operator rewrite services and rolebindings, which puts a large load on the cluster.
