inovex / trovilo

trovilo collects and prepares files from Kubernetes ConfigMaps for Prometheus & friends

License: Apache License 2.0

Go 89.27% Makefile 9.42% Dockerfile 1.31%
prometheus alertmanager alerts kubernetes monitoring grafana dashboards configmap


trovilo's Issues

Fix travis CI

Currently the CI run fails in the publish stage. This should be fixed.

Refactor trovilo job design

Currently trovilo supports multiple jobs (e.g. to gather information for Prometheus and for Grafana). Since trovilo is designed to run inside Kubernetes environments, I don't actually see a benefit in supporting multiple jobs (and the added complexity). From my perspective, trovilo should always be used as a sidecar to the corresponding service, as in the Prometheus example: https://github.com/inovex/trovilo/blob/master/examples/k8s/deployment.yaml#L60

Are there any reasons why we should keep supporting multiple jobs?

add docs

Add README and useful examples

Possible race condition?

Currently there is a possible race condition in the way trovilo is implemented (AFAIK):

Imagine the following flow:

1.) trovilo gets started
2.) A ConfigMap with the labels trovilo expects is added
3.) trovilo writes the ConfigMap (or more precisely, the content of the ConfigMap) to "disk"
4.) trovilo crashes
5.) The ConfigMap from above is deleted
6.) trovilo recovers

--> If a ConfigMap is deleted while trovilo is down, it will never be cleaned up, correct? This line will never be called: https://github.com/inovex/trovilo/blob/master/cmd/trovilo/main.go#L104 — or, to be precise, trovilo never checks the initial state of the targetDir.
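
A minimal Go sketch of how a startup reconciliation could close this gap, assuming the expected file set can be derived from an initial ConfigMap list using the configured label selector (the helper name and paths below are hypothetical, not trovilo's actual API):

```go
package main

import (
	"log"
	"os"
	"path/filepath"
)

// cleanupOrphans removes every regular file in targetDir that is not part of
// the expected set derived from the ConfigMaps currently matching the label
// selector. Running this once at startup, before the watch begins, would also
// catch ConfigMaps that were deleted while trovilo was down.
func cleanupOrphans(targetDir string, expected map[string]bool) error {
	return filepath.Walk(targetDir, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if info.IsDir() {
			return nil
		}
		if !expected[path] {
			log.Printf("removing orphaned file %s (no matching ConfigMap)", path)
			return os.Remove(path)
		}
		return nil
	})
}

func main() {
	// Hypothetical: in trovilo this set would be built from an initial
	// ConfigMap List() call using the configured label selector.
	expected := map[string]bool{
		"/etc/prometheus/rules/team-a.yaml": true,
	}
	if err := cleanupOrphans("/etc/prometheus/rules", expected); err != nil {
		log.Fatalf("startup reconciliation failed: %v", err)
	}
}
```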

Allow for a decorator phase / command to i.e. force-tag alerts

Thanks for this very helpful tool!

I'd love to be able to define a decorator that applies changes to the collected data from configmaps before feeding it to e.g. Alertmanager or Grafana.

One very distinct use case is Prometheus alert definitions that are collected from multiple Kubernetes namespaces. If one wants to route the alerts based on the source namespace the configmap was picked up from, this metadata needs to be immutably available. In the case of alerts, this requires the source to either ensure that the PromQL query leaves the namespace as a label on the data, or to set an additional label or annotation containing the namespace on each and every alert. Ensuring this across hundreds of alerts and many different teams and people, without maintaining the source namespace info for each alert behind the scenes, is prone to fail.

My suggestion is to simply allow a command to run for each configmap collected by trovilo, which then receives the Kubernetes metadata of the individual configmap as environment variables, e.g. K8S_METADATA_NAMESPACE and K8S_METADATA_NAME. The command could simply be a call to sed or maybe even a jsonpatch that decorates the source data with additional info.

Running this arbitrary command and simply providing some variables to it does not make trovilo any more domain-specific.
But especially in multi-tenant setups, the namespace might just be the most important piece of information one wants to add to / keep on the data that is then handed to Alertmanager or Grafana.
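
A rough Go sketch of the proposed hook, assuming trovilo would invoke the configured command once per written file with the file path appended as the last argument (that calling convention and the function name are assumptions; only the environment variable names come from this proposal):

```go
package main

import (
	"log"
	"os"
	"os/exec"
)

// runDecorator executes a user-configured command for one file written from a
// ConfigMap, exposing the ConfigMap's Kubernetes metadata as environment
// variables so the command can e.g. inject the source namespace as a label.
func runDecorator(command []string, filePath, namespace, name string) error {
	cmd := exec.Command(command[0], append(command[1:], filePath)...)
	cmd.Env = append(os.Environ(),
		"K8S_METADATA_NAMESPACE="+namespace,
		"K8S_METADATA_NAME="+name,
	)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	// Hypothetical usage: any command works; here a shell one-liner that just
	// prints which file it would decorate and for which namespace ($1 is the
	// appended file path, $K8S_METADATA_NAMESPACE comes from the environment).
	decorator := []string{"sh", "-c",
		`echo "decorating $1 for namespace $K8S_METADATA_NAMESPACE"`, "decorator"}
	if err := runDecorator(decorator, "/etc/prometheus/rules/team-a.yaml", "team-a", "alert-rules"); err != nil {
		log.Printf("decorator failed: %v", err)
	}
}
```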

Trovilo crashes without helpful error message / cause

After running for hours without any issue, Trovilo sometimes crashes with a very short error message like:

"{"error":"EOF","level":"fatal","msg":"Kubernetes ConfigMap watcher encountered an error. Exit..","time":"2018-08-30T16:04:31Z"}"

Unfortunately there is no indication of what could have caused this, and after the container is restarted it works again for a long time until it crashes in the same manner again. While I cannot rule out an influence from the consumed configmaps, I am very certain this is not caused by a change in the configmaps. It also happens when Kubernetes resources are not changed at all over a longer period of time.

Maybe some timeout when talking to the API which is not handled well?
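
If the cause is indeed the API server closing long-lived watch connections (which commonly surfaces as an EOF on the watch), one option would be to treat this as a recoverable condition and re-establish the watch with a small backoff instead of exiting fatally. A hedged Go sketch, where watchConfigMaps stands in for trovilo's actual watch loop:

```go
package main

import (
	"errors"
	"log"
	"time"
)

// watchConfigMaps is a stand-in for trovilo's real ConfigMap watch loop; it is
// expected to block until the watch channel is closed or an error occurs.
func watchConfigMaps() error {
	// ... a real implementation would stream watch events here ...
	return errors.New("EOF") // simulate the API server closing the connection
}

func main() {
	// Instead of exiting fatally on the first watch error, retry with a small
	// backoff. API servers commonly time out long-lived watch connections, so
	// an EOF here is an expected condition rather than a fatal one.
	for {
		err := watchConfigMaps()
		if err == nil {
			return // watch ended cleanly, e.g. on shutdown
		}
		log.Printf("ConfigMap watch ended: %v; re-establishing in 5s", err)
		time.Sleep(5 * time.Second)
	}
}
```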
