drain-node-on-crash's People

Contributors

mattmattox

drain-node-on-crash's Issues

Images failing to pull

The leader image is failing to pull from Docker Hub.
Error message:

ImagePullBackOff: Back-off pulling image "drainnode/leader:v1.0-rc9"
ErrImagePull: rpc error: code = Unknown desc = Error response from daemon: manifest for drainnode/leader:v1.0-rc9 not found: manifest unknown: manifest unknown

Leader can't create endpoint

The permissions for the service account system:serviceaccount:drain-node-on-crash:drain-node need to be fixed. The leader cannot read the endpoints resource it uses for the election.
Error message:

F0907 23:42:27.979181       8 main.go:108] failed to create election: endpoints "drain-node-on-crash" is forbidden: User "system:serviceaccount:drain-node-on-crash:drain-node" cannot get resource "endpoints" in API group "" in the namespace "default"
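
One possible way to grant the missing access is a Role and RoleBinding in the namespace named in the error ("default"). This is a minimal sketch, not the chart's actual fix: the object name drain-node-leader-election is hypothetical, and the verb list assumes the election only needs to read, create, and update the Endpoints object.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: drain-node-leader-election   # hypothetical name, not from the chart
  namespace: default                 # namespace reported in the error message
rules:
- apiGroups: [""]                    # core API group, where endpoints live
  resources: ["endpoints"]
  verbs: ["get", "create", "update"] # assumed minimum for the election
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: drain-node-leader-election   # hypothetical name
  namespace: default
subjects:
- kind: ServiceAccount
  name: drain-node                   # service account from the error message
  namespace: drain-node-on-crash
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: drain-node-leader-election
```

The RoleBinding lives in "default" but names a service account from the drain-node-on-crash namespace, which RBAC allows.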

Failed Chart deployment

Chart: v1.0-rc8
Settings: Defaults
k8s: v1.15.12
Error message:

Failed to install app drain-node-on-crash. Error: Deployment.apps "drain-node_manager" is invalid: metadata.name: Invalid value: "drain-node_manager": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
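
The name is rejected because of the underscore, which is not in the character set allowed by the regex in the error. A sketch of a valid metadata block follows; the hyphenated name matches the Deployment name that appears in the manifest further down this page, but whether the chart was fixed this way is an assumption.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  # "drain-node_manager" fails validation: '_' is not permitted.
  # Lower-case alphanumerics, '-' and '.' are the only allowed characters.
  name: drain-node-manager
```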

Replicas not getting set correctly for manager

Replicas for the manager Deployment are getting set to 1 by default. values.yaml and deployment.yaml need to be fixed.

YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
    meta.helm.sh/release-name: drain-node-on-crash
    meta.helm.sh/release-namespace: drain-node-on-crash
  creationTimestamp: "2020-09-07T23:08:19Z"
  generation: 2
  labels:
    app: drain-node-on-crash
    app.kubernetes.io/managed-by: Helm
    io.cattle.field/appId: drain-node-on-crash
  name: drain-node-manager
  namespace: drain-node-on-crash
  resourceVersion: "6331"
  selfLink: /apis/apps/v1/namespaces/drain-node-on-crash/deployments/drain-node-manager
  uid: a49650e1-d1c0-4133-95e6-21edc1cbb5a8
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: drain-node-on-crash
      io.cattle.field/appId: drain-node-on-crash
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        cattle.io/timestamp: "2020-09-07T23:16:27Z"
        field.cattle.io/ports: '[[{"containerPort":4040,"dnsName":"drain-node-manager","hostPort":0,"kind":"ClusterIP","name":"4040tcp02","protocol":"TCP","sourcePort":0}]]'
      creationTimestamp: null
      labels:
        app: drain-node-on-crash
        io.cattle.field/appId: drain-node-on-crash
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - drain-node-on-crash
            topologyKey: kubernetes.io/hostname
      containers:
      - env:
        - name: AUTO_UNCORDON
          value: "true"
        - name: NODE_TIMEOUT
          value: "360"
        image: drainnode/manager:v1.0-rc9
        imagePullPolicy: IfNotPresent
        name: drain
        resources: {}
        securityContext:
          capabilities: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - --election=drain-node-on-crash
        - --http=0.0.0.0:4040
        image: docker.io/drainnode/leader:v1.0-rc9
        imagePullPolicy: IfNotPresent
        name: leader
        ports:
        - containerPort: 4040
          name: 4040tcp02
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: drain-node
      serviceAccountName: drain-node
      terminationGracePeriodSeconds: 30
status:
  conditions:
  - lastTransitionTime: "2020-09-07T23:08:19Z"
    lastUpdateTime: "2020-09-07T23:08:19Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2020-09-07T23:08:19Z"
    lastUpdateTime: "2020-09-07T23:17:12Z"
    message: ReplicaSet "drain-node-manager-9847c878d" is progressing.
    reason: ReplicaSetUpdated
    status: "True"
    type: Progressing
  observedGeneration: 2
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1
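
A common cause of this symptom is a template that hard-codes the replica count or reads a values key that is never set. The fragment below is only a sketch of how the two files might be wired together. The key manager.replicaCount is hypothetical, since the chart's real values layout is not shown in this issue.

```yaml
# values.yaml (hypothetical key name)
manager:
  replicaCount: 2
---
# templates/deployment.yaml (fragment, Helm template syntax)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: drain-node-manager
spec:
  # Read the replica count from values.yaml instead of a fixed number.
  replicas: {{ .Values.manager.replicaCount }}
```

Because the manifest above uses required pod anti-affinity on kubernetes.io/hostname, any replicas beyond the number of schedulable nodes would stay Pending.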

Why deployment, not daemonset?

Hi,

Any reason you decided to use a Deployment rather than a DaemonSet? A DaemonSet would be more reliable IMO.
For example, in a two-node cluster, if node A goes down, all pods in the Deployment will be scheduled on node B. Then, when A returns, if B goes down, drain-node-on-crash is slower to react since it must first wait for a pod to be scheduled on A. Using a DaemonSet means there is already an available pod on A, so drain-node-on-crash doesn't suffer this delay.

Thanks and nice work BTW.
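
For illustration, a DaemonSet variant of the manager might look roughly like the sketch below. It reuses the images, arguments, environment variables, and service account from the Deployment manifest shown earlier on this page. Whether the project actually behaves correctly when run this way is an assumption and has not been verified.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: drain-node-manager            # reusing the Deployment's name is an assumption
  namespace: drain-node-on-crash
spec:
  selector:
    matchLabels:
      app: drain-node-on-crash
  template:
    metadata:
      labels:
        app: drain-node-on-crash
    spec:
      serviceAccountName: drain-node
      containers:
      - name: drain
        image: drainnode/manager:v1.0-rc9
        env:
        - name: AUTO_UNCORDON
          value: "true"
        - name: NODE_TIMEOUT
          value: "360"
      - name: leader
        image: docker.io/drainnode/leader:v1.0-rc9
        args:
        - --election=drain-node-on-crash
        - --http=0.0.0.0:4040
        ports:
        - containerPort: 4040
          protocol: TCP
```

A DaemonSet has no replicas field and needs no anti-affinity rule, since the controller places one pod on each eligible node.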
