Coder Social home page Coder Social logo

ngatia / draino Goto Github PK

View Code? Open in Web Editor NEW

This project forked from planetlabs/draino

0.0 1.0 0.0 91 KB

Automatically cordon and drain Kubernetes nodes based on node conditions

License: Apache License 2.0

Dockerfile 0.54% Go 97.13% Shell 0.79% Smarty 1.55%

draino's Introduction

draino Docker Pulls Godoc Travis Codecov

Draino automatically drains Kubernetes nodes based on labels and node conditions. Nodes that match all of the supplied labels and any of the supplied node conditions will be cordoned immediately and drained after a configurable drain-buffer time.

Draino is intended for use alongside the Kubernetes Node Problem Detector and Cluster Autoscaler. The Node Problem Detector can set a node condition when it detects something wrong with a node - for instance by watching node logs or running a script. The Cluster Autoscaler can be configured to delete nodes that are underutilised. Adding Draino to the mix enables autoremediation:

  1. The Node Problem Detector detects a permanent node problem and sets the corresponding node condition.
  2. Draino notices the node condition. It immediately cordons the node to prevent new pods being scheduled there, and schedules a drain of the node.
  3. Once the node has been drained the Cluster Autoscaler will consider it underutilised. It will be eligible for scale down (i.e. termination) by the Autoscaler after a configurable period of time.

Usage

$ docker run planetlabs/draino /draino --help
usage: draino [<flags>] <node-conditions>...

Automatically cordons and drains nodes that match the supplied conditions.

Flags:
      --help                     Show context-sensitive help (also try --help-long and --help-man).
  -d, --debug                    Run with debug logging.
      --listen=":10002"          Address at which to expose /metrics and /healthz.
      --kubeconfig=KUBECONFIG    Path to kubeconfig file. Leave unset to use in-cluster config.
      --master=MASTER            Address of Kubernetes API server. Leave unset to use in-cluster config.
      --dry-run                  Emit an event without cordoning or draining matching nodes.
      --max-grace-period=8m0s    Maximum time evicted pods will be given to terminate gracefully.
      --eviction-headroom=30s    Additional time to wait after a pod's termination grace period for it to have been deleted.
      --drain-buffer=10m0s       Minimum time between starting each drain. Nodes are always cordoned immediately.
      --node-label=KEY=VALUE ...
                                 Only nodes with this label will be eligible for cordoning and draining. May be specified multiple times.
      --evict-daemonset-pods     Evict pods that were created by an extant DaemonSet.
      --evict-emptydir-pods      Evict pods with local storage, i.e. with emptyDir volumes.
      --evict-unreplicated-pods  Evict pods that were not created by a replication controller.
      --protected-pod-annotation=KEY[=VALUE] ...
                                 Protect pods with this annotation from eviction. May be specified multiple times.

Args:
  <node-conditions>  Nodes for which any of these conditions are true will be cordoned and drained.

Considerations

Keep the following in mind before deploying Draino:

  • Always run Draino in --dry-run mode first to ensure it would drain the nodes you expect it to. In dry run mode Draino will emit logs, metrics, and events but will not actually cordon or drain nodes.
  • Draino immediately cordons nodes that match its configured labels and node conditions, but will wait a configurable amount of time (10 minutes by default) between draining nodes. i.e. If two nodes begin exhibiting a node condition simultaneously one node will be drained immediately and the other in 10 minutes.
  • Draino considers a drain to have failed if at least one pod eviction triggered by that drain fails. If Draino fails to evict two of five pods it will consider the Drain to have failed, but the remaining three pods will always be evicted.

Deployment

Draino is automatically built from master and pushed to the Docker Hub. Builds are tagged planetlabs/draino:latest and planetlabs/draino:$(git rev-parse --short HEAD). An example Kubernetes deployment manifest is provided.

Monitoring

Draino provides a simple healthcheck endpoint at /healthz and Prometheus metrics at /metrics. The following metrics exist:

$ kubectl -n kube-system exec -it ${DRAINO_POD} -- apk add curl
$ kubectl -n kube-system exec -it ${DRAINO_POD} -- curl http://localhost:10002/metrics
# HELP draino_cordoned_nodes_total Number of nodes cordoned.
# TYPE draino_cordoned_nodes_total counter
draino_cordoned_nodes_total{result="succeeded"} 2
draino_cordoned_nodes_total{result="failed"} 1
# HELP draino_drained_nodes_total Number of nodes drained.
# TYPE draino_drained_nodes_total counter
draino_drained_nodes_total{result="succeeded"} 1
draino_drained_nodes_total{result="failed"} 1

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.