mattmattox / drain-node-on-crash



This app automatically drains a node after a crash when the node fails to recover within 5 minutes.

Dockerfile 14.45% Shell 63.53% HTML 22.03%


drain-node-on-crash's Issues

Failed Chart deployment

Chart: v1.0-rc8
Settings: Defaults
k8s: v1.15.12
Error message:

Failed to install app drain-node-on-crash. Error: Deployment.apps "drain-node_manager" is invalid: metadata.name: Invalid value: "drain-node_manager": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
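
The error above rejects the name because it contains an underscore, which the DNS-1123 subdomain rule does not allow. A minimal sketch of a metadata block that would pass validation, assuming the intended name is the hyphenated `drain-node-manager`:

```yaml
# Sketch only, not the chart's actual template.
apiVersion: apps/v1
kind: Deployment
metadata:
  # "drain-node_manager" fails validation because "_" is not permitted.
  # Allowed characters: lower-case alphanumerics, "-" and ".";
  # the name must start and end with an alphanumeric character.
  name: drain-node-manager
```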

Images failing to pull

The leader image fails to pull from Docker Hub.
Error message:

ImagePullBackOff: Back-off pulling image "drainnode/leader:v1.0-rc9"
ErrImagePull: rpc error: code = Unknown desc = Error response from daemon: manifest for drainnode/leader:v1.0-rc9 not found: manifest unknown: manifest unknown

How to install latest master release?

Please provide instructions for installing from master. It looks like there were changes to the build stack.

Why deployment, not daemonset?

Hi,

Any reason you decided to use a Deployment rather than a DaemonSet? A DaemonSet would be more reliable IMO.
For example, in a two-node cluster, if node A goes down, all pods in the Deployment will be scheduled on node B. Then, when A returns, if B goes down, drain-node-on-crash is slower to react, since it must first wait for a pod to be scheduled on A. With a DaemonSet there is already a pod running on A, so drain-node-on-crash doesn't suffer this delay.

Thanks and nice work BTW.
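
For illustration, a DaemonSet along the lines suggested above might look like the following. This is a sketch under assumptions: the image names, arguments, labels, and service account are copied from the Deployment manifest reported in another issue on this page, and nothing here confirms that the project runs correctly as a DaemonSet.

```yaml
# Hypothetical DaemonSet variant; not the project's published manifest.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: drain-node-manager
  namespace: drain-node-on-crash
spec:
  selector:
    matchLabels:
      app: drain-node-on-crash
  template:
    metadata:
      labels:
        app: drain-node-on-crash
    spec:
      serviceAccountName: drain-node
      containers:
      # Manager container that performs the drain.
      - name: drain
        image: drainnode/manager:v1.0-rc9
        env:
        - name: NODE_TIMEOUT
          value: "360"
      # Leader-election sidecar, so only one pod acts at a time.
      - name: leader
        image: drainnode/leader:v1.0-rc9
        args:
        - --election=drain-node-on-crash
        - --http=0.0.0.0:4040
```

A DaemonSet schedules one pod per eligible node, so the hostname anti-affinity rule and the replica count used by the Deployment are no longer needed.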

Replicas not getting set correctly for manager

Replicas are being set to 1 by default. values.yaml and deployment.yaml need to be fixed.

YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
    meta.helm.sh/release-name: drain-node-on-crash
    meta.helm.sh/release-namespace: drain-node-on-crash
  creationTimestamp: "2020-09-07T23:08:19Z"
  generation: 2
  labels:
    app: drain-node-on-crash
    app.kubernetes.io/managed-by: Helm
    io.cattle.field/appId: drain-node-on-crash
  name: drain-node-manager
  namespace: drain-node-on-crash
  resourceVersion: "6331"
  selfLink: /apis/apps/v1/namespaces/drain-node-on-crash/deployments/drain-node-manager
  uid: a49650e1-d1c0-4133-95e6-21edc1cbb5a8
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: drain-node-on-crash
      io.cattle.field/appId: drain-node-on-crash
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        cattle.io/timestamp: "2020-09-07T23:16:27Z"
        field.cattle.io/ports: '[[{"containerPort":4040,"dnsName":"drain-node-manager","hostPort":0,"kind":"ClusterIP","name":"4040tcp02","protocol":"TCP","sourcePort":0}]]'
      creationTimestamp: null
      labels:
        app: drain-node-on-crash
        io.cattle.field/appId: drain-node-on-crash
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - drain-node-on-crash
            topologyKey: kubernetes.io/hostname
      containers:
      - env:
        - name: AUTO_UNCORDON
          value: "true"
        - name: NODE_TIMEOUT
          value: "360"
        image: drainnode/manager:v1.0-rc9
        imagePullPolicy: IfNotPresent
        name: drain
        resources: {}
        securityContext:
          capabilities: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - --election=drain-node-on-crash
        - --http=0.0.0.0:4040
        image: docker.io/drainnode/leader:v1.0-rc9
        imagePullPolicy: IfNotPresent
        name: leader
        ports:
        - containerPort: 4040
          name: 4040tcp02
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: drain-node
      serviceAccountName: drain-node
      terminationGracePeriodSeconds: 30
status:
  conditions:
  - lastTransitionTime: "2020-09-07T23:08:19Z"
    lastUpdateTime: "2020-09-07T23:08:19Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2020-09-07T23:08:19Z"
    lastUpdateTime: "2020-09-07T23:17:12Z"
    message: ReplicaSet "drain-node-manager-9847c878d" is progressing.
    reason: ReplicaSetUpdated
    status: "True"
    type: Progressing
  observedGeneration: 2
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1
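
A minimal sketch of how the replica count could be wired through the chart. The `manager.replicas` key and the value 2 are hypothetical; the chart's real key names are not shown in this issue.

```yaml
# values.yaml (hypothetical key name, illustrative value)
manager:
  replicas: 2

# deployment.yaml (Helm template fragment; Helm renders it into valid YAML)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: drain-node-manager
spec:
  replicas: {{ .Values.manager.replicas }}
```

Because the manifest above uses a required pod anti-affinity rule keyed on `kubernetes.io/hostname`, any replicas beyond the number of schedulable nodes would stay Pending.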

Leader can't create endpoint

The permissions for the service account system:serviceaccount:drain-node-on-crash:drain-node need to be fixed.
Error message:

F0907 23:42:27.979181       8 main.go:108] failed to create election: endpoints "drain-node-on-crash" is forbidden: User "system:serviceaccount:drain-node-on-crash:drain-node" cannot get resource "endpoints" in API group "" in the namespace "default"
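
One possible fix is to grant the service account access to endpoints in the namespace named in the error. The sketch below assumes the election keeps using the `default` namespace; the Role name is hypothetical, and the verbs beyond `get` are an assumption, since the log only confirms that `get` was denied.

```yaml
# Sketch only: Role and RoleBinding in the "default" namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: drain-node-leader-election   # hypothetical name
  namespace: default
rules:
- apiGroups: [""]                    # core API group, as in the error message
  resources: ["endpoints"]
  verbs: ["get", "create", "update"] # "create" and "update" are assumed
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: drain-node-leader-election
  namespace: default
subjects:
- kind: ServiceAccount
  name: drain-node
  namespace: drain-node-on-crash
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: drain-node-leader-election
```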
