invisionapp / go-health

Library for enabling asynchronous health checks in your service

License: MIT License

go-health's Introduction

go-health

A library that enables async dependency health checking for services running on an orchestrated container platform such as Kubernetes or Mesos.

Why is this important?

Container orchestration platforms require that the underlying service(s) expose a "health check" which is used by the platform to determine whether the container is in a good or bad state.

While this can be achieved by simply exposing a /status endpoint that performs synchronous checks against its dependencies (followed by returning a 200 or non-200 status code), it is not optimal for a number of reasons:

It does not scale
- The more dependencies you add, the longer your health check will take to complete (and potentially cause your service to be killed off by the orchestration platform).
- Depending on the complexity of a given dependency, a check may be fairly involved and can legitimately take 30s+ to complete.
It adds unnecessary load on your deps or, at worst, becomes a DoS target
- Non-malicious scenario
  - Thundering herd problem -- in the event of a deployment (or restart, etc.), all of your service containers are likely to have their /status endpoints checked by the orchestration platform as soon as they come up. Depending on the complexity of the checks, running that many simultaneous checks against your dependencies will at minimum add unnecessary load and at worst cause the dependencies to experience problems.
  - Security scanners -- if your organization runs periodic security scans, they may hit your /status endpoint and trigger unnecessary dep checks.
- Malicious scenario
  - Loading up any basic HTTP benchmarking tool and pointing it at your /status endpoint could choke your dependencies (and potentially your service).

With that said, not everyone needs asynchronous checks. If your service has one dependency (and that is unlikely to change), it is trivial to write a basic, synchronous check and it will probably suffice.

However, if you anticipate that your service will have several dependencies, with varying degrees of complexity for determining their health state, you should probably think about introducing asynchronous health checks.
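The difference between the two approaches can be sketched with the standard library alone. This is a minimal illustration of the asynchronous idea, not go-health's implementation: a background goroutine runs the check on an interval and caches the result, and the /status handler only reads the cache. The names asyncCheck, runOnce, statusCode and errDown are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"sync"
	"time"
)

// errDown is a stand-in failure used by the demo below.
var errDown = errors.New("connection refused")

// result is the cached outcome of the most recent dependency check.
type result struct {
	Err       error
	CheckedAt time.Time
}

// asyncCheck runs a dependency check in the background and caches the
// latest result so that readers never wait on the dependency itself.
type asyncCheck struct {
	mu     sync.RWMutex
	latest result
	check  func() error
}

// runOnce executes the check and stores the outcome.
func (a *asyncCheck) runOnce() {
	err := a.check()
	a.mu.Lock()
	a.latest = result{Err: err, CheckedAt: time.Now()}
	a.mu.Unlock()
}

// start runs the check immediately, then on every tick until stop is closed.
func (a *asyncCheck) start(interval time.Duration, stop <-chan struct{}) {
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			a.runOnce()
			select {
			case <-ticker.C:
			case <-stop:
				return
			}
		}
	}()
}

// statusCode maps the cached result to an HTTP status code.
func (a *asyncCheck) statusCode() int {
	a.mu.RLock()
	defer a.mu.RUnlock()
	if a.latest.Err != nil {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

// ServeHTTP answers from the cache; it never calls the dependency.
func (a *asyncCheck) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(a.statusCode())
}

func main() {
	healthy := &asyncCheck{check: func() error { return nil }}
	healthy.runOnce()

	broken := &asyncCheck{check: func() error { return errDown }}
	broken.runOnce()

	fmt.Println(healthy.statusCode(), broken.statusCode()) // prints "200 503"
}
```

However many times /status is requested, the dependency sees one call per interval, which is what removes the thundering-herd and DoS concerns described above.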

How does this library help?

Writing an async health checking framework for your service is not a trivial task, especially if Go is not your primary language.

This library:

Allows you to define how to check your dependencies.
Allows you to define warning and fatal thresholds.
Will run your dependency checks on a given interval, in the background. [1]
Exposes a way for you to gather the check results in a fast and thread-safe manner to help determine the final status of your /status endpoint. [2]
Comes bundled with pre-built checkers for well-known dependencies such as Redis, Mongo, HTTP and more.
Makes it simple to implement and provide your own checkers (by adhering to the checker interface).
Allows you to trigger listener functions when your health checks fail or recover using the IStatusListener interface.
Allows you to run custom logic when a specific health check completes by using the OnComplete hook.

[1] Make sure to run your checks on a "sane" interval, i.e. if you are checking your Redis dependency once every five minutes, your service is essentially running blind for about 4 minutes 59 seconds out of every 5 minutes. Unless you have a really good reason, check your dependencies every X seconds, rather than every X minutes.

[2] go-health continuously writes dependency health state data and allows you to query that data via .State(). Alternatively, you can use one of the pre-built HTTP handlers for your /healthcheck endpoint (and thus not have to manually inspect the state data).
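To give a feel for what "adhering to the checker interface" might involve, here is a small sketch. It assumes the interface consists of a single Status method that returns optional details plus an error; the local checkable interface only mirrors that assumed shape, and dirChecker is a hypothetical checker invented for this example. Confirm the real interface in the library's documentation before relying on it.

```go
package main

import (
	"fmt"
	"os"
)

// checkable mirrors the shape a checker is assumed to have: Status returns
// optional details and a non-nil error when the dependency is unhealthy.
type checkable interface {
	Status() (interface{}, error)
}

// dirChecker is a hypothetical checker that verifies a directory exists.
type dirChecker struct {
	Path string
}

// Status reports an error if Path is missing or is not a directory.
func (d dirChecker) Status() (interface{}, error) {
	info, err := os.Stat(d.Path)
	if err != nil {
		return nil, fmt.Errorf("stat %q: %w", d.Path, err)
	}
	if !info.IsDir() {
		return nil, fmt.Errorf("%q is not a directory", d.Path)
	}
	return map[string]string{"path": d.Path}, nil
}

// Compile-time check that dirChecker satisfies the interface.
var _ checkable = dirChecker{}

func main() {
	checks := []checkable{
		dirChecker{Path: "."},
		dirChecker{Path: "/definitely/not/here"},
	}
	for _, c := range checks {
		details, err := c.Status()
		fmt.Println(details, err)
	}
}
```

Keeping the checker a plain value with one method makes it easy to unit test without starting any background machinery.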

Example

For full examples, look through the examples directory.

Create an instance of health and configure a checker (or two)

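import (
	"log"
	"net/url"
	"time"

	health "github.com/InVisionApp/go-health/v2"
	"github.com/InVisionApp/go-health/v2/checkers"
	"github.com/InVisionApp/go-health/v2/handlers"
)

// Create a new health instance
h := health.New()

// Create a checker that issues an HTTP request against the given URL
myURL, err := url.Parse("https://google.com")
if err != nil {
    log.Fatalf("unable to parse URL: %v", err)
}

myCheck, err := checkers.NewHTTP(&checkers.HTTPConfig{
    URL: myURL,
})
if err != nil {
    log.Fatalf("unable to create HTTP checker: %v", err)
}

// Register the check: it runs every two seconds and is marked as fatal
err = h.AddChecks([]*health.Config{
    {
        Name:     "my-check",
        Checker:  myCheck,
        Interval: 2 * time.Second,
        Fatal:    true,
    },
})
if err != nil {
    log.Fatalf("unable to add checks: %v", err)
}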

Start the health check

h.Start()

From here on, you can either configure an endpoint such as /healthcheck to use a built-in handler such as handlers.NewJSONHandlerFunc() or get the current health state of all your deps by traversing the data returned by h.State().

Sample /healthcheck output

Assuming you have configured go-health with two HTTP checkers, your /healthcheck output would look something like this:

{
    "details": {
        "bad-check": {
            "name": "bad-check",
            "status": "failed",
            "error": "Ran into error while performing 'GET' request: Get google.com: unsupported protocol scheme \"\"",
            "check_time": "2017-12-30T16:20:13.732240871-08:00"
        },
        "good-check": {
            "name": "good-check",
            "status": "ok",
            "check_time": "2017-12-30T16:20:13.80109931-08:00"
        }
    },
    "status": "ok"
}
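A client or monitoring script could consume this payload along the following lines. The struct names (healthResponse, checkDetail) and the failedChecks helper are hypothetical; the field names are taken from the sample output above, and the sketch assumes check_time is always an RFC 3339 timestamp as it is in that sample.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"time"
)

// checkDetail models one entry under "details" in the sample output.
type checkDetail struct {
	Name      string    `json:"name"`
	Status    string    `json:"status"`
	Error     string    `json:"error,omitempty"`
	CheckTime time.Time `json:"check_time"`
}

// healthResponse models the top-level /healthcheck document.
type healthResponse struct {
	Details map[string]checkDetail `json:"details"`
	Status  string                 `json:"status"`
}

// sample is a shortened copy of the output shown above.
const sample = `{
    "details": {
        "bad-check": {
            "name": "bad-check",
            "status": "failed",
            "error": "unsupported protocol scheme",
            "check_time": "2017-12-30T16:20:13.732240871-08:00"
        },
        "good-check": {
            "name": "good-check",
            "status": "ok",
            "check_time": "2017-12-30T16:20:13.80109931-08:00"
        }
    },
    "status": "ok"
}`

// failedChecks returns the sorted names of checks whose status is not "ok".
func failedChecks(body []byte) ([]string, error) {
	var resp healthResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	var failed []string
	for name, d := range resp.Details {
		if d.Status != "ok" {
			failed = append(failed, name)
		}
	}
	sort.Strings(failed)
	return failed, nil
}

func main() {
	failed, err := failedChecks([]byte(sample))
	fmt.Println(failed, err) // prints "[bad-check] <nil>"
}
```

Note that the top-level status in the sample is "ok" even though bad-check failed, so a consumer that cares about individual dependencies has to look inside details.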

Additional Documentation

OnComplete Hook vs. IStatusListener

At first glance it may seem that these two features provide the same functionality. However, they are meant for two different use cases:

The IStatusListener is useful when you want to run a custom function whenever the overall status of your health checks changes. For example, if go-health is currently checking the health of two different dependencies, A and B, you may want to trip a circuit breaker for A and/or B when its check fails. You could also put your service in a state where it notifies callers that it is not currently operating correctly. The opposite can be done when your service recovers.

The OnComplete hook is called whenever a health check for an individual dependency completes. This means that the function you register with the hook gets called every single time go-health completes that check. You can register a different function with each configured health check, or skip the hook entirely for certain health checks. For instance, the hook can be useful if you want to perform cleanup after a complex health check or send metrics to your APM software when a health check completes. Keep in mind that this hook effectively gets called on roughly the same interval you define for the health check.
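The contrast between the two mechanisms can be illustrated with a toy model. Nothing here is go-health's API: listener, recorder and run are hypothetical, simplified stand-ins that only demonstrate the calling pattern, with a completion callback firing on every run and listener callbacks firing only on state transitions.

```go
package main

import "fmt"

// listener is a simplified stand-in for a status listener: it is notified
// only when a check changes state.
type listener interface {
	Failed(check string)
	Recovered(check string)
}

// recorder collects listener events so they can be inspected afterwards.
type recorder struct {
	events []string
}

func (r *recorder) Failed(check string)    { r.events = append(r.events, "failed:"+check) }
func (r *recorder) Recovered(check string) { r.events = append(r.events, "recovered:"+check) }

// run replays a sequence of check outcomes (true means healthy).
// onComplete fires after every run; the listener fires only on the
// healthy-to-failed and failed-to-healthy transitions.
func run(check string, outcomes []bool, l listener, onComplete func(ok bool)) {
	healthy := true
	for _, ok := range outcomes {
		onComplete(ok)
		switch {
		case healthy && !ok:
			l.Failed(check)
		case !healthy && ok:
			l.Recovered(check)
		}
		healthy = ok
	}
}

func main() {
	rec := &recorder{}
	completions := 0
	run("redis", []bool{true, false, false, true}, rec, func(bool) { completions++ })
	fmt.Println(completions, rec.events) // prints "4 [failed:redis recovered:redis]"
}
```

Four check runs produce four completion callbacks but only two listener events, which is why per-run work such as metrics fits the hook and state-change work such as circuit breaking fits the listener.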

Contributing

All PRs are welcome, as long as they are well tested. Follow the typical fork->branch->PR flow.

go-health's Issues

Create fake for IHealth interface

Allow fatal checks to recover

As it stands right now, if a check that's configured as fatal fails, the healthcheck will be marked as failed, and even if the check recovers, the fatal state will not get flipped back.

This is OK for those who use go-health with services that live on an orchestrated platform such as k8s or Mesos (where the platform will kill the service when it sees a bad healthcheck), but not good for those who live outside of that: the service in essence will never "recover".

Update go-health so that if a check that's configured as fatal recovers, it first checks to see if the overall state is failed and then flips the failed state back to normal.

GRPC Checkers

Hi Team, thanks for this awesome library. Is there a way to add gRPC checkers with the current package?

Update documentation

Add links to godoc, report, cov. Setup CI.

Indentation for JSON in NewJSONHandlerFunc

When using go-health together with gin-gonic I'm finding it hard to format the JSON output from NewJSONHandlerFunc().
Would it be possible to update the signature of NewJSONHandlerFunc() to include some kind of config? I don't really know which approach would be best, but here are some pseudo examples:

func NewJSONHandlerFunc(h health.IHealth, custom map[string]interface{}, formatConfig *health.FormatConfig) http.HandlerFunc {
    ...

    encoder := json.NewEncoder(...)
    if formatConfig != nil {
        encoder.SetIndent(formatConfig.Prefix, formatConfig.Indent)
    }
    encoder.Encode(...)
}

Or something simple like:

func NewJSONHandlerFunc(h health.IHealth, custom map[string]interface{}, prefix, indent string) http.HandlerFunc {
    ...
    data, err := json.MarshalIndent(fullBody.data, prefix, indent)
}

Like I said, I don't know the best (go) approach for this. And if this is possible somehow already I would be happy for all the information I can get.

Thanks for a very nice project.
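Until something like this exists in the library, one possible workaround that needs no signature change is to wrap the handler and re-indent whatever JSON it emits. The sketch below uses only the standard library; indentJSON, compactHandler and render are hypothetical names, and compactHandler merely stands in for the handler returned by NewJSONHandlerFunc(). It borrows httptest.ResponseRecorder as a convenient response buffer, which is a shortcut rather than a recommended production pattern.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// indentJSON wraps a handler that emits JSON and re-indents its body.
func indentJSON(next http.Handler, prefix, indent string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Capture the wrapped handler's response in memory.
		rec := httptest.NewRecorder()
		next.ServeHTTP(rec, r)

		var out bytes.Buffer
		if err := json.Indent(&out, rec.Body.Bytes(), prefix, indent); err != nil {
			// Not valid JSON: pass the original body through untouched.
			out.Reset()
			out.Write(rec.Body.Bytes())
		}

		// Copy headers and status, dropping any stale Content-Length.
		for k, v := range rec.Header() {
			w.Header()[k] = v
		}
		w.Header().Del("Content-Length")
		w.WriteHeader(rec.Code)
		w.Write(out.Bytes())
	})
}

// compactHandler stands in for a handler that writes compact JSON.
func compactHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Write([]byte(`{"status":"ok"}`))
	})
}

// render serves one request against h and returns the response body.
func render(h http.Handler) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/healthcheck", nil))
	return rec.Body.String()
}

func main() {
	fmt.Println(render(indentJSON(compactHandler(), "", "  ")))
}
```

Because the wrapper works on any http.Handler, it can be combined with gin-gonic or any other router that accepts standard handlers.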

Is this repo maintained?

Is this repo actively maintained?

Uses no longer accessible dependencies

Describe the bug
go-health depends on mongodb-boltdb-mock which is no longer accessible

Although this dependency is not used directly, it stops dependabot from working properly as it appears in the go.sum file of the projects depending on go-health

Could the dependency be removed and an alternative chosen?

Write Redis check

Module migration did not encode major version

Release v2.1.1 cannot be used because the module migration was done incorrectly. Starting at v2, the module path must end in the major version.

See https://github.com/golang/go/wiki/Modules#semantic-import-versioning.

Data race in states handling

Hello,

You may want to review your mutex usage around map[string]State in health.go, as safeGetStates() is not safe: you safely copy a pointer to the map by returning the map value, but it is the access to the individual elements of the map that you should protect (or safely deep-copy the map on each read access, but it may be heavy-handed ;). Confirmed with the go race detector with a dummy check, and polling a /ready route.

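// foo is a dummy checker; it assumes the checker interface is a single
// Status() (interface{}, error) method and always reports success.
type foo struct{}

func (foo) Status() (interface{}, error) { return nil, nil }

func main() {
        h := health.New()

        h.AddChecks([]*health.Config{
                {
                        Name:     "my-check",
                        Checker:  foo{},
                        Interval: time.Duration(1) * time.Second,
                        Fatal:    true,
                },
        })
        h.Start()
        http.Handle("/ready", handlers.NewJSONHandlerFunc(h, nil))
        http.ListenAndServe(":8020", nil)
}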

Side-note, I was taking a look at the various health packages, maybe https://github.com/heptiolabs/healthcheck/ could be a good fit for you, it seems simpler (fewer packages/smaller API) but powerful and bug-free, and fewer open source packages to maintain (for you) or evaluate (for me) is often better ;)

Best regards,
David

(updated: no need for a 1 ms check interval and using wrk in order to trigger the go race detector, 1s and a single curl is enough ;)
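The "copy on read" option mentioned in this report can be sketched as follows. This is not the library's code: store, state, set and snapshot are hypothetical names, and the sketch assumes the stored values contain no pointers, slices or maps, so that copying the map entries is enough to isolate readers from writers.

```go
package main

import (
	"fmt"
	"sync"
)

// state is a simplified value type with no pointers or reference fields.
type state struct {
	Name   string
	Status string
}

// store guards every map access, reads included, with one RWMutex.
type store struct {
	mu     sync.RWMutex
	states map[string]state
}

// set writes a single entry under the write lock.
func (s *store) set(st state) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.states[st.Name] = st
}

// snapshot copies the map under the read lock. Callers get their own map,
// so they never touch the one that background writers keep mutating.
func (s *store) snapshot() map[string]state {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make(map[string]state, len(s.states))
	for k, v := range s.states {
		out[k] = v
	}
	return out
}

func main() {
	s := &store{states: map[string]state{}}

	// Hammer the store from many goroutines; run with -race to verify.
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.set(state{Name: "my-check", Status: fmt.Sprint(i)})
			_ = s.snapshot()
		}(i)
	}
	wg.Wait()

	snap := s.snapshot()
	s.set(state{Name: "other", Status: "ok"})
	fmt.Println(len(snap), len(s.snapshot())) // prints "1 2"
}
```

Returning the map itself, even from inside a locked section, only protects the act of reading the map header; the copy is what keeps later element reads from racing with writes.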

Ideally checkers with external dependencies should live in a different package

For instance, I just need to use the reachable checker and my project doesn't have any mongo dep. But, to use the reachable checker I need to install all dependencies in InVisionApp/go-health/checker and the mongo one has a dependency on github.com/globalsign/mgo. If the checkers that have some external dependency were in a different package it wouldn't force me to have a dependency on something that I'm not using.
No need to move them to a different repository; just having them in a subfolder would be enough to avoid this dependency issue.
I can create a PR for it, but it would cause a breaking change in the project.

Suggested packages:

InVisionApp/go-health/checker no external deps
InVisionApp/go-health/checker/mongo
InVisionApp/go-health/checker/redis
InVisionApp/go-health/checker/disk
InVisionApp/go-health/checker/memcache

syntax question

I was looking at the example and saw this:

handlers.NewJSONHandlerFunc(h)

I'm new to golang, so this might just be my lack of experience. Could NewJSONHandlerFunc have been attached to the struct (so it would be h.NewJSONHandlerFunc())?

Write DB check

Metrics Endpoint

Would a pull request be accepted that adds Prometheus support?

I was thinking of something that returns:

# HELP healthcheck_failed A counter that shows number of failed checks.
# TYPE healthcheck_failed counter
healthcheck_failed{check="ping", fatal="true"} 3
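Output in this shape can be produced without pulling in a metrics client. The sketch below is a standard-library illustration only, not a proposal for how the library should do it; failureCounter, inc and render are hypothetical names, and the %q verb is used as a simple approximation of label-value quoting that is adequate for plain check names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// failureCounter counts failed runs per check name.
type failureCounter struct {
	mu     sync.Mutex
	counts map[string]uint64
	fatal  map[string]bool
}

func newFailureCounter() *failureCounter {
	return &failureCounter{counts: map[string]uint64{}, fatal: map[string]bool{}}
}

// inc records one failed run of the named check.
func (c *failureCounter) inc(check string, fatal bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[check]++
	c.fatal[check] = fatal
}

// render returns the counters in the text exposition format, one sample
// line per check, sorted by name so the output is stable.
func (c *failureCounter) render() string {
	c.mu.Lock()
	defer c.mu.Unlock()

	names := make([]string, 0, len(c.counts))
	for name := range c.counts {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	b.WriteString("# HELP healthcheck_failed A counter that shows number of failed checks.\n")
	b.WriteString("# TYPE healthcheck_failed counter\n")
	for _, name := range names {
		fmt.Fprintf(&b, "healthcheck_failed{check=%q,fatal=\"%t\"} %d\n",
			name, c.fatal[name], c.counts[name])
	}
	return b.String()
}

func main() {
	c := newFailureCounter()
	for i := 0; i < 3; i++ {
		c.inc("ping", true)
	}
	fmt.Print(c.render())
}
```

A counter like this could be incremented from a per-check completion callback and served from a plain HTTP handler at a path such as /metrics.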

Write HTTP Check

add remote server health checker

In many cases, we need to check remote server status, such as CPU usage percentage and memory usage percentage. I hope this project can add this feature. Thanks.

Write examples

Add retryable support to http checker

Wanted to understand more on the ping timeout for the redis healthcheck

👋🏼 I just wanted to know the value of ping timeout for the redis healthcheck. Thank you!

Write Mongo check

Degraded state?

When a non-fatal check fails, the top level status is still "ok". Is the intent that you have to check each component for success in order to determine if the service is actually 100% ok?

Would it be possible to add support for a degraded state or something along the lines of "there are failures, but nothing fatal"?
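A caller can already derive such a state from the per-check results. The sketch below shows one possible rule, assuming each result carries a fatal flag and an error; checkState and overall are hypothetical names, and "degraded" is not a status the library is known to emit.

```go
package main

import (
	"errors"
	"fmt"
)

// errDown is a stand-in failure used by the demo below.
var errDown = errors.New("dependency down")

// checkState is a simplified per-check result.
type checkState struct {
	Fatal bool
	Err   error
}

// overall folds per-check results into one of three states:
// "failed" if any fatal check fails, "degraded" if only non-fatal
// checks fail, and "ok" otherwise.
func overall(states map[string]checkState) string {
	status := "ok"
	for _, s := range states {
		if s.Err == nil {
			continue
		}
		if s.Fatal {
			return "failed"
		}
		status = "degraded"
	}
	return status
}

func main() {
	fmt.Println(overall(map[string]checkState{
		"db":    {Fatal: true},
		"cache": {Fatal: false, Err: errDown},
	})) // prints "degraded"
}
```

Mapping "degraded" to a 200 response with a distinct body keeps orchestrators from killing the container while still surfacing the problem to dashboards.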

Add a `StopWithStatus(...)`

Add an optional StopWithStatus(...) method that on top of calling Stop() will also change the status + message of the built-in /healthcheck endpoints.

Persist checkers state?

Hi,

I would like to persist the checkers' state, so that if a check was in the failed state before I restart the service, it is still in the failed state afterwards and waits to recover. I can use the map from State() to save it to a JSON file, but I don't see a way to set it for a checker before Start(). Is this possible?

Thanks,
Milan
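The file-handling half of this request can be sketched independently of the library. The example below only shows saving and reloading a state map as JSON; savedState, save, load and roundTrip are hypothetical names, and feeding the loaded state back into go-health before Start() is exactly the part the question identifies as missing, so it is not attempted here.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// savedState is a simplified, serializable per-check record.
type savedState struct {
	Name   string `json:"name"`
	Status string `json:"status"`
	Err    string `json:"error,omitempty"`
}

// save writes the state map to path as indented JSON.
func save(path string, states map[string]savedState) error {
	data, err := json.MarshalIndent(states, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o600)
}

// load reads the state map back; a missing file means "no saved state".
func load(path string) (map[string]savedState, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return map[string]savedState{}, nil
	}
	if err != nil {
		return nil, err
	}
	states := map[string]savedState{}
	if err := json.Unmarshal(data, &states); err != nil {
		return nil, err
	}
	return states, nil
}

// roundTrip saves states to a temporary file and loads them again.
func roundTrip(states map[string]savedState) (map[string]savedState, error) {
	dir, err := os.MkdirTemp("", "go-health-state")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "state.json")
	if err := save(path, states); err != nil {
		return nil, err
	}
	return load(path)
}

func main() {
	got, err := roundTrip(map[string]savedState{
		"redis": {Name: "redis", Status: "failed", Err: "dial timeout"},
	})
	fmt.Println(got["redis"].Status, err) // prints "failed <nil>"
}
```

Until the library offers a way to seed state, one option is to have the /healthcheck handler consult the loaded map for any check that has not yet completed its first run.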