sfowl / external-health-monitor

This project forked from kubernetes-csi/external-health-monitor

This repo contains sidecar controller and agent for volume health monitoring.

License: Apache License 2.0

External Health Monitor

The External Health Monitor is part of the Kubernetes implementation of the Container Storage Interface (CSI). It was introduced as an Alpha feature in Kubernetes v1.19.

Overview

The External Health Monitor is implemented as two components: External Health Monitor Controller and External Health Monitor Agent.

  • External Health Monitor Controller:

    • The external health monitor controller will be deployed as a sidecar together with the CSI controller driver, similar to how the external-provisioner sidecar is deployed.
    • It triggers controller RPCs to check the health condition of the CSI volumes.
    • It can also watch for node failure events. This node watcher is optional and is enabled with the enable-node-watcher flag.
  • External Health Monitor Agent:

    • The external health monitor agent will be deployed as a sidecar together with the CSI node driver on every Kubernetes worker node.
    • It triggers the node RPC to check the mount condition of each volume.

The External Health Monitor needs to invoke the following CSI interfaces.

  • External Health Monitor Controller:
    • ListVolumes (If both ListVolumes and ControllerGetVolume are supported, ListVolumes will be used)
    • ControllerGetVolume
  • External Health Monitor Agent:
    • NodeGetVolumeStats
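The preference between the two controller RPCs can be sketched as follows. This is a minimal illustration of the rule stated above, not code from this repository: the driverCaps type and the chooseControllerRPC function are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// driverCaps is a hypothetical summary of the controller capabilities a CSI
// driver reports. It is not a type from this repository.
type driverCaps struct {
	ListVolumes         bool
	ControllerGetVolume bool
}

// chooseControllerRPC returns the name of the RPC the controller sidecar
// would call, following the documented preference: ListVolumes is used
// whenever it is supported, even if ControllerGetVolume is supported too.
func chooseControllerRPC(c driverCaps) (string, error) {
	switch {
	case c.ListVolumes:
		return "ListVolumes", nil
	case c.ControllerGetVolume:
		return "ControllerGetVolume", nil
	default:
		return "", fmt.Errorf("driver supports neither ListVolumes nor ControllerGetVolume")
	}
}

func main() {
	cases := []driverCaps{
		{ListVolumes: true, ControllerGetVolume: true},
		{ControllerGetVolume: true},
		{},
	}
	for _, c := range cases {
		rpc, err := chooseControllerRPC(c)
		fmt.Printf("%+v -> %q (err: %v)\n", c, rpc, err)
	}
}
```

The third case returns an error because a driver with neither capability gives the controller nothing to poll.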

Compatibility

This information reflects the head of this branch.

Compatible with CSI Version  Container Image                                                Min K8s Version  Recommended K8s Version
CSI Spec v1.3.0              k8s.gcr.io/sig-storage/csi-external-health-monitor-controller  1.19             1.19
CSI Spec v1.3.0              k8s.gcr.io/sig-storage/csi-external-health-monitor-agent       1.19             1.19

Driver Support

Currently, the CSI volume health monitoring interfaces are only implemented in the Mock Driver.

Usage

The External Health Monitor must be deployed together with a CSI driver.

Build && Push Image

You can run the command below in the root directory of the project.

make container GOFLAGS_VENDOR=$( [ -d vendor ] && echo '-mod=vendor' )

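Then tag the images and push them to your own image repository.

docker tag csi-external-health-monitor-controller:latest <custom-image-repo-addr>/csi-external-health-monitor-controller:<custom-image-tag>
docker push <custom-image-repo-addr>/csi-external-health-monitor-controller:<custom-image-tag>

docker tag csi-external-health-monitor-agent:latest <custom-image-repo-addr>/csi-external-health-monitor-agent:<custom-image-tag>
docker push <custom-image-repo-addr>/csi-external-health-monitor-agent:<custom-image-tag>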

External Health Monitor Controller

cd external-health-monitor
kubectl create -f deploy/kubernetes/external-health-monitor-controller

External Health Monitor Agent

kubectl create -f deploy/kubernetes/external-health-monitor-agent

Run kubectl get pods to confirm that both components were deployed successfully on your cluster.

Check logs of external health monitor controller and agent as follows:

  • kubectl logs <leader-of-external-health-monitor-controller-pod-name> -c csi-external-health-monitor-controller
  • kubectl logs <external-health-monitor-agent-pod-name> -c csi-external-health-monitor-agent

When a volume you are using becomes abnormal, check whether events reporting the abnormal volume condition appear on the corresponding PVCs or Pods.

csi-external-health-monitor-controller-sidecar-command-line-options

Important optional arguments that are highly recommended to be used

  • leader-election: Enables leader election. This is useful when multiple replicas of the same external-health-monitor-controller run for one CSI driver. Only one of them may be active (the leader). A new leader is elected when the current leader dies or becomes unresponsive for ~15 seconds.

  • leader-election-namespace <namespace>: The namespace where the leader election resource exists. Defaults to the pod namespace if not set.

  • http-endpoint: The TCP network address where the HTTP server for diagnostics, including metrics and leader election health check, will listen (example: :8080 which corresponds to port 8080 on local host). The default is empty string, which means the server is disabled.

  • metrics-path: The HTTP path where prometheus metrics will be exposed. Default is /metrics.

  • worker-threads: Number of worker threads for running volume checker when CSI Driver supports ControllerGetVolume, but not ListVolumes. The default value is 10.

Other recognized arguments

  • kubeconfig <path>: Path to the Kubernetes client configuration that the external-health-monitor-controller uses to connect to the Kubernetes API server. When omitted, the default token provided by Kubernetes is used. This option is useful only when the external-health-monitor-controller does not run as a Kubernetes pod, e.g. for debugging.

  • resync <duration>: Internal resync interval at which the monitor controller re-evaluates all existing resource objects that it was watching and tries to fulfill them. It does not affect retries of failed calls. It should be used only when there is a bug in the Kubernetes watch logic. The default is ten minutes.

  • csiAddress <path-to-csi>: This is the path to the CSI Driver socket inside the pod that the external-health-monitor-controller container will use to issue CSI operations (/run/csi/socket is used by default).

  • version: Prints the current version of external-health-monitor-controller.

  • timeout <duration>: Timeout of all calls to the CSI driver. It should be set to a value that accommodates the majority of ListVolumes and ControllerGetVolume calls. 15 seconds is used by default.

  • list-volumes-interval <duration>: Interval at which volume health is monitored by invoking the ListVolumes RPC. Adjust it to change the frequency of the evaluation process. The default is five minutes.

  • enable-node-watcher <boolean>: Enable node-watcher. node-watcher evaluates volume health condition by checking node status periodically.

  • monitor-interval <duration>: Interval of monitoring volume health condition when CSI Driver supports ControllerGetVolume, but not ListVolumes. It is also used by nodeWatcher. You can adjust it to change the frequency of the evaluation process. One minute by default if not set.

  • volume-list-add-interval <duration>: Interval of listing volumes and adding them to the queue when CSI driver supports ControllerGetVolume, but not ListVolumes.

  • node-list-add-interval <duration>: Interval of listing nodes and adding them. It is used together with monitor-interval and enable-node-watcher by nodeWatcher.

  • metrics-address: (deprecated) The TCP network address where the Prometheus metrics endpoint will run (example: :8080, which corresponds to port 8080 on local host). The default is the empty string, which means the metrics and leader election check endpoint is disabled.

csi-external-health-monitor-agent-sidecar-command-line-options

Important optional arguments that are highly recommended to be used

  • http-endpoint: The TCP network address where the HTTP server for diagnostics, including metrics and leader election health check, will listen (example: :8080 which corresponds to port 8080 on local host). The default is empty string, which means the server is disabled.

  • metrics-path: The HTTP path where prometheus metrics will be exposed. Default is /metrics.

  • worker-threads: Number of worker threads for running volume checker by invoking RPC interface NodeGetVolumeStats. Default value is 10.

Other recognized arguments

  • kubeconfig <path>: Path to Kubernetes client configuration that the external-health-monitor-agent uses to connect to Kubernetes API server. When omitted, the default token provided by Kubernetes will be used. This option is useful only when the external-health-monitor-agent does not run as a Kubernetes pod, e.g. for debugging.

  • resync <duration>: Internal resync interval at which the monitor agent re-evaluates all existing resource objects that it was watching and tries to fulfill them. It does not affect retries of failed calls. It should be used only when there is a bug in the Kubernetes watch logic. The default is ten minutes.

  • monitor-interval <duration>: Interval of monitoring volume health condition by invoking RPC interface NodeGetVolumeStats. You can adjust it to change the frequency of the evaluation process. One minute by default if not set.

  • csiAddress <path-to-csi>: This is the path to the CSI Driver socket inside the pod that the external-health-monitor-agent container will use to issue CSI operations (/run/csi/socket is used by default).

  • version: Prints the current version of external-health-monitor-agent.

  • timeout <duration>: Timeout of all calls to the CSI driver. It should be set to a value that accommodates the majority of NodeGetVolumeStats calls. 15 seconds is used by default.

  • kubelet-root-path: Path to the kubelet root directory. It is used to generate the volume path. The default is /var/lib/kubelet.

  • metrics-address: (deprecated) The TCP network address where the prometheus metrics endpoint will run (example: :8080, which corresponds to port 8080 on localhost). The default is the empty string, which means the metrics endpoint is disabled.
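The kubelet-root-path option matters because the agent has to locate each volume's mount point on the node. The sketch below shows one way such a path could be built. It assumes the usual kubelet layout for CSI volumes (pods/<pod-uid>/volumes/kubernetes.io~csi/<pv-name>/mount under the kubelet root). The volumePath function and the sample pod UID and PV name are hypothetical, and the agent's own implementation may differ.

```go
package main

import (
	"fmt"
	"path"
)

// volumePath builds the mount path of a CSI volume for a given pod.
// Assumption: the node uses the conventional kubelet directory layout;
// this helper is illustrative and not taken from the agent's source.
func volumePath(kubeletRoot, podUID, pvName string) string {
	return path.Join(kubeletRoot, "pods", podUID, "volumes", "kubernetes.io~csi", pvName, "mount")
}

func main() {
	// "/var/lib/kubelet" is the documented default of kubelet-root-path.
	// The pod UID and PV name are made-up sample values.
	fmt.Println(volumePath("/var/lib/kubelet", "0a1b2c3d", "pvc-example"))
}
```

If the kubelet on your nodes runs with a non-default root directory, the same value has to be passed through kubelet-root-path so that the generated paths exist on the node.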

HTTP endpoint

Both sidecars optionally expose an HTTP endpoint at address:port, specified by the --http-endpoint argument. When set, these two paths may be exposed:

  • Metrics path, as set by --metrics-path argument (default is /metrics) - both sidecars.
  • Leader election health check at /healthz/leader-election - only in the External Health Monitor Controller. When leader election is used, it is recommended to run a liveness probe against this endpoint so that an external-health-monitor-controller leader that fails to connect to the API server to renew its leadership is killed. See kubernetes-csi/csi-lib-utils#66 for details.

Community, discussion, contribution, and support

Learn how to engage with the Kubernetes community on the community page.

You can reach the maintainers of this project at:

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.
