
nidhogg's Introduction

🚨 THIS REPO IS NOW BEING ABANDONED 🚨

⚠️⚠️⚠️⚠️⚠️⚠️ Please read ⚠️⚠️⚠️⚠️⚠️⚠️

If you are interested in using a maintained fork of this repo, please use pelotech-nidhogg, the fork that the engineering team at Pelotech have kindly agreed to maintain.

See this issue for more details

⚠️⚠️⚠️⚠️⚠️⚠️ END of please read ⚠️⚠️⚠️⚠️⚠️⚠️

We are only a small team and we don't have the capacity to maintain a tool we no longer use, so from now on this repo will be marked as abandoned. We are, however, happy that a team like the one at Pelotech is investing in this solution and giving back to the open source community by maintaining an active fork.

Nidhogg

Nidhogg is a controller that taints nodes based on whether a Pod from a specific Daemonset is running on them.

Sometimes you have a Daemonset that is so important that you don't want other pods to run on your node until that Daemonset is up and running on the node. Nidhogg solves this problem by tainting the node until your Daemonset pod is ready, preventing pods that don't tolerate the taint from scheduling there.

Nidhogg annotates the node once all the required taints have been removed: nidhogg.uswitch.com/first-time-ready: 2006-01-02T15:04:05Z
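For illustration only (the node name is a placeholder, and in practice the taint is removed before the annotation is added), the relevant parts of a Node object look roughly like this:

apiVersion: v1
kind: Node
metadata:
  name: example-node                       # placeholder node name
  annotations:
    # set by Nidhogg the first time all required taints have been removed
    nidhogg.uswitch.com/first-time-ready: "2006-01-02T15:04:05Z"
spec:
  taints:
  # present only while a required daemonset pod (e.g. kube-system/kiam, see Usage below) is missing or not ready
  - key: nidhogg.uswitch.com/kube-system.kiam
    effect: NoSchedule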

Nidhogg was built using Kubebuilder

Usage

Nidhogg requires a YAML/JSON config file to tell it which Daemonsets to watch and which nodes to act on. nodeSelector is a list of label selectors describing the nodes to act on. daemonsets is an array of Daemonsets to watch, each containing two fields, name and namespace. Nodes are tainted with a taint that follows the format nidhogg.uswitch.com/namespace.name:NoSchedule.

Example:

YAML:

nodeSelector:
  - node-role.kubernetes.io/node
daemonsets:
  - name: kiam
    namespace: kube-system

JSON:

{
  "nodeSelector": [
    "node-role.kubernetes.io/node",
    "!node-role.kubernetes.io/master",
    "aws.amazon.com/ec2.asg.name in (standard, special)"
  ],
  "daemonsets": [
    {
      "name": "kiam",
      "namespace": "kube-system"
    }
  ]
}

This example will select any nodes in AWS ASGs named "standard" or "special" that have the label node-role.kubernetes.io/node present and do not have the label node-role.kubernetes.io/master. If a matching node does not have a running and ready pod from the kiam daemonset in the kube-system namespace, Nidhogg will add the taint nidhogg.uswitch.com/kube-system.kiam:NoSchedule until there is a ready kiam pod on the node.

If you want pods to be able to run on the nidhogg tainted nodes you can add a toleration:

spec:
  tolerations:
  - key: nidhogg.uswitch.com/kube-system.kiam
    operator: "Exists"
    effect: NoSchedule

Deploying

Docker images can be found at https://quay.io/uswitch/nidhogg

Example Kustomize manifests are included in the repository to quickly deploy this to a cluster.

Flags

-config-file string
    Path to config file (default "config.json")
-kubeconfig string
    Paths to a kubeconfig. Only required if out-of-cluster.
-leader-configmap string
    Name of configmap to use for leader election
-leader-election
    enable leader election
-leader-namespace string
    Namespace where leader configmap located
-master string
    The address of the Kubernetes API server. Overrides any value in kubeconfig. Only required if out-of-cluster.
-metrics-addr string
    The address the metric endpoint binds to. (default ":8080")
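
As a sketch of how these flags might be wired together in a Deployment (the image tag, config mount path, and leader-election configmap name/namespace below are assumptions, not prescribed values):

containers:
- name: nidhogg
  image: quay.io/uswitch/nidhogg:latest      # assumption: pin a concrete tag in practice
  args:
  - -config-file=/etc/nidhogg/config.json    # config mounted from a ConfigMap (volume definition omitted)
  - -metrics-addr=:8080
  - -leader-election
  - -leader-configmap=nidhogg-leader         # assumed configmap name
  - -leader-namespace=kube-system            # assumed namespace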

nidhogg's People

Contributors

giorgia-amici, joseph-irving, julienbalestra, pnovotnak, rvu-sre[bot], tombooth


nidhogg's Issues

Nidhogg shows error `Could not construct reference` when logging node events

I'm running the latest image build of Nidhogg (14f4e815a27dc8caaed62e0e9a5341e0f3b7fb31) with Kubernetes v1.21 and I am seeing lots of errors in the logs, even though Nidhogg seems to be working. I think the error is due to the deprecation of selfLink in Kubernetes v1.20: kubernetes/enhancements#1164

Here is the error (truncated for brevity):

E0622 23:00:36.723752       1 event.go:259] Could not construct reference to: '&v1.Node{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}
...
due to: 'selfLink was empty, can't make reference'. Will not report event: 'Normal' 'TaintsChanged'
...

Indirect node tainting using Conditions

Rather than directly setting a taint on a Node, manage conditions on that Node's status, and have another part of the controller (or the control plane) rely on that condition information to taint the node.

To make this work, admission control could mutate incoming Nodes to have that condition imposed immediately, leaving that in place until there is evidence that the required DaemonSets are healthy and ready.
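
A purely hypothetical sketch of what such a condition might look like in a Node's status (the condition type and reason are invented for illustration, not part of Nidhogg today):

status:
  conditions:
  - type: NidhoggRequiredDaemonsetsReady     # hypothetical condition type
    status: "False"
    reason: DaemonsetPodNotReady             # hypothetical reason
    message: kube-system/kiam pod is not ready on this node
    lastTransitionTime: "2006-01-02T15:04:05Z"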

Daemonset selector

I'd like to know if it would be possible to add a feature that matches nodes using the selector of the daemonset itself instead of using the global nidhogg configuration for it.

Happy to work on this if it makes sense 👍
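
To make the idea concrete, here is a purely illustrative (not implemented) sketch: Nidhogg would derive its node matching from the nodeSelector already declared on the daemonset, rather than from its own config file. The image reference is a placeholder.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kiam
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: kiam
  template:
    metadata:
      labels:
        app: kiam
    spec:
      # the proposal: Nidhogg acts on exactly the nodes this daemonset targets
      nodeSelector:
        node-role.kubernetes.io/node: ""
      containers:
      - name: kiam
        image: quay.io/uswitch/kiam:latest   # placeholder image reference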

Small window where nodes are untainted?

Thank you very much for Nidhogg!

I've just implemented it for the first time, but I'm seeing some behaviour that I'm not sure is correct, so would like to run it by you and see if there is something that I can do to fix it up.

I have the following YAML configuration:

    daemonsets:
    - name: fluentd-fluentd-elasticsearch
      namespace: fluentd
    - name: kiam-agent
      namespace: kiam
    - name: node-local-dns
      namespace: kube-system
    - name: weave-net
      namespace: kube-system
    nodeSelector:
      node-role.kubernetes.io/node: ""

With the above, I'm waiting on 4 critical daemonsets.

I have been looking at the Nidhogg logs as nodes are added to the cluster by the cluster-autoscaler.
What I'm finding is that initially taints are added by Nidhogg, but not for all 4 of the daemonsets.
Often only 3 of them are added; those then clear and firstTimeReady gets set, and then a few seconds later the missing 4th taint is added, along with the other 3, and they all proceed to get removed again as things become ready.
This appears to give a 2 or 3 second window where pods may be able to schedule onto the node even though it isn't quite ready yet.

An example of logs showing this follows:

{"level":"info","ts":1593498895.9255075,"logger":"nidhogg","msg":"Updating Node taints","instance":"ip-10-8-108-84.ap-southeast-2.compute.internal","taints added":["nidhogg.uswitch.com/kiam.kiam-agent","nidhogg.uswitch.com/kube-system.node-local-dns","nidhogg.uswitch.com/kube-system.weave-net"],"taints removed":[],"taintLess":false,"firstTimeReady":""}
{"level":"info","ts":1593498895.9452868,"logger":"nidhogg","msg":"Updating Node taints","instance":"ip-10-8-108-84.ap-southeast-2.compute.internal","taints added":[],"taints removed":["nidhogg.uswitch.com/kube-system.weave-net"],"taintLess":false,"firstTimeReady":""}
{"level":"info","ts":1593498896.1045887,"logger":"nidhogg","msg":"Updating Node taints","instance":"ip-10-8-108-84.ap-southeast-2.compute.internal","taints added":[],"taints removed":["nidhogg.uswitch.com/kube-system.node-local-dns"],"taintLess":false,"firstTimeReady":""}
{"level":"info","ts":1593498896.1525385,"logger":"nidhogg","msg":"Updating Node taints","instance":"ip-10-8-108-84.ap-southeast-2.compute.internal","taints added":[],"taints removed":["nidhogg.uswitch.com/kiam.kiam-agent"],"taintLess":true,"firstTimeReady":"2020-06-30T06:34:56Z"}
{"level":"info","ts":1593498899.199731,"logger":"nidhogg","msg":"Updating Node taints","instance":"ip-10-8-108-84.ap-southeast-2.compute.internal","taints added":["nidhogg.uswitch.com/fluentd.fluentd-fluentd-elasticsearch"],"taints removed":[],"taintLess":false,"firstTimeReady":"2020-06-30T06:34:56Z"}
{"level":"info","ts":1593498899.5885453,"logger":"nidhogg","msg":"Updating Node taints","instance":"ip-10-8-108-84.ap-southeast-2.compute.internal","taints added":["nidhogg.uswitch.com/kube-system.weave-net"],"taints removed":[],"taintLess":false,"firstTimeReady":"2020-06-30T06:34:56Z"}
{"level":"info","ts":1593498900.7854457,"logger":"nidhogg","msg":"Updating Node taints","instance":"ip-10-8-108-84.ap-southeast-2.compute.internal","taints added":["nidhogg.uswitch.com/kube-system.node-local-dns"],"taints removed":[],"taintLess":false,"firstTimeReady":"2020-06-30T06:34:56Z"}
{"level":"info","ts":1593498901.1978955,"logger":"nidhogg","msg":"Updating Node taints","instance":"ip-10-8-108-84.ap-southeast-2.compute.internal","taints added":["nidhogg.uswitch.com/kiam.kiam-agent"],"taints removed":[],"taintLess":false,"firstTimeReady":"2020-06-30T06:34:56Z"}
{"level":"info","ts":1593498916.6390643,"logger":"nidhogg","msg":"Updating Node taints","instance":"ip-10-8-108-84.ap-southeast-2.compute.internal","taints added":[],"taints removed":["nidhogg.uswitch.com/kube-system.node-local-dns"],"taintLess":false,"firstTimeReady":"2020-06-30T06:34:56Z"}
{"level":"info","ts":1593498927.831887,"logger":"nidhogg","msg":"Updating Node taints","instance":"ip-10-8-108-84.ap-southeast-2.compute.internal","taints added":[],"taints removed":["nidhogg.uswitch.com/kiam.kiam-agent"],"taintLess":false,"firstTimeReady":"2020-06-30T06:34:56Z"}
{"level":"info","ts":1593498935.2295878,"logger":"nidhogg","msg":"Updating Node taints","instance":"ip-10-8-108-84.ap-southeast-2.compute.internal","taints added":[],"taints removed":["nidhogg.uswitch.com/kube-system.weave-net"],"taintLess":false,"firstTimeReady":"2020-06-30T06:34:56Z"}
{"level":"info","ts":1593498949.044196,"logger":"nidhogg","msg":"Updating Node taints","instance":"ip-10-8-108-84.ap-southeast-2.compute.internal","taints added":[],"taints removed":["nidhogg.uswitch.com/fluentd.fluentd-fluentd-elasticsearch"],"taintLess":true,"firstTimeReady":"2020-06-30T06:34:56Z"}

The first line adds the taints for kiam, node-local-dns, and weave-net, but no taint is added for fluentd yet.
The next 3 lines show those 3 taints being removed one by one, with the final removal marking the node as taintLess and setting firstTimeReady.
Then the next line (roughly 3 seconds later) adds the fluentd taint that was previously missing.
It is these 3 seconds that I'm concerned about.
The next 3 lines re-add the taints that were previously removed, and then all the taints are removed again as the node becomes ready.

It's not always fluentd that is left until later; sometimes it is node-local-dns instead.
And it's not always 3 taints that are initially added either; I've also seen just 2 of the taints added in the first line.
I haven't had it installed for long, but if it is useful I can collect more details and pass them on.

Intending to fork & update; link to fork in README?

Howdy 👋

We (Pelotech) have recently started using Nidhogg. Our use-case is to manage taints to control pod scheduling while using Multus (along with other CNI plugins). We've encountered an issue or two related to outdated APIs (resulting in noisy warnings in logs), and have found some flakiness (seems to be due to an issue with the logic used to detect DaemonSet readiness). We also have thoughts about updating the config approach (to eliminate the ConfigMap).

From the unanswered issues and the time since the last commit, it seems like this project is more or less done/complete (and/or abandoned). Our intention is to update the code to current idiomatic Go (e.g., Go modules), to update the dependencies (especially the Kubernetes client), and to update the readiness check.

In addition, we plan to update the API to remove everything related to NodeSelector and DaemonSet name, and to move that functionality into annotations/labels on the relevant DaemonSet. This approach would move all scheduling logic to the monitored DaemonSet, including which taint relates to which DaemonSet (instead of the current fixed taint naming: nidhogg.uswitch.com/$DS_NAMESPACE.$DS_NAME). This change, in particular, will be a major, breaking API change. In our view it will increase the flexibility and generality of Nidhogg.
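
As a hypothetical illustration of that direction (the annotation keys below are invented and not part of any released version; the spec is omitted for brevity), configuration could live on the daemonset itself:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kiam
  namespace: kube-system
  annotations:
    nidhogg.uswitch.com/enabled: "true"                      # hypothetical: opt this daemonset in
    nidhogg.uswitch.com/taint: "example.com/kiam-not-ready"  # hypothetical: taint key to apply instead of the fixed naming
# spec omitted for brevity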

If in fact USwitch is not continuing development of Nidhogg, once we've released an updated version, we'd be glad if you were willing to point to the new version in the README here. There are many places that link to uswitch/nidhogg and it would be ideal to ease discovery of a new iteration.

Will you accept a PR to direct future users to an updated and actively maintained fork?

Feature Request: expose the firstTimeReady timestamp (Unix timestamp) as a gauge metric

Reason: we want to measure how quickly our nodes become ready for scheduling.
The node creation time is already available from kube-state-metrics as a Unix timestamp (kube_node_created).
If the taint controller also exposed a metric giving the time (Unix timestamp) at which a node is marked ready, we could calculate the readiness duration by subtracting the two metrics.

Cluster-autoscaler runaway scale-up and daemonsets

How do you make this work with the cluster-autoscaler so it will not just keep spinning up nodes in the event of a daemonset failure? For example, if we set taints based on daemonsets A and B, and then there's an issue where daemonset B is broken and crashlooping, the autoscaler will just keep spinning up tainted nodes because it sees that there are pending pods. Is there a way to prevent this from happening?

Cannot Invert Selector Match or Use Existence

It's not possible to invert a selector match or check for label existence, which would be nice for us.

Examples with kubectl:

kubectl get node -l '!node-role.kubernetes.io/master'

kubectl get node -l node-role.kubernetes.io/node

question: race conditions with untainted nodes

Thanks for your great work, @Joseph-Irving.

I have a question about nidhogg, if you don't mind. When kubernetes adds new nodes to the cluster (e.g. cluster-autoscaler), is it possible to have an untainted node during a small period of time (until the nidhogg controller applies the taint to it)?

I'm interested in knowing this because during that small period of time a pod might be scheduled on the node where the daemonsets are not ready yet.
