
bmhatfield / riemann-sumd


Agent for scheduling event generating processes and sending the results to Riemann

License: MIT License

Python 100.00%
riemann python daemon riemann-events nagios-checks monitoring

riemann-sumd's Introduction

Riemann-sumd

Python agent for scheduling event generating processes and sending the results to Riemann

What?

Riemann-sumd is an agent for scheduling 'tasks', such as commands that conform to the Nagios plugin interface, and sending the results to Riemann. There are multiple task interfaces, such as the Nagios plugin interface, a JSON interface over stdout, and a JSON interface from arbitrary URLs.

Why?

While configuring my Riemann install, I noticed that the already-built clients were single-purpose daemons that sent their own events to Riemann. To operationalize them, I'd have to deploy, monitor, and maintain a fleet of little processes, which I was not interested in doing. I'd also have to write more of these little monitoring daemons myself to reproduce this functionality.

Instead, I decided that I'd prefer to have a small daemon that scheduled customizable tasks to run and transformed their output into Riemann events. Additionally, I realized that there's a wealth of monitoring scripts out there that map quite nicely onto the concept of a Riemann event: Nagios checks! If one could run a Nagios check, capture its return code, and send it (along with the check's output and performance data) to Riemann as an event, that could be quite useful!

Configuration

It's a simple daemon with the capability to perform a few different 'types' of tasks on a schedule.

  • nagios: Nagios-style tasks (i.e., return 0 for OK, 1 for WARNING, 2 for CRITICAL, etc.)
  • json: JSON style tasks (Execute a command that returns a JSON list of events over stdout)
  • http_json: JSON retrieved over HTTP (See below for schema)
  • cloudkick (deprecated): renamed to http_json
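To make the json type concrete, here is a hypothetical command that such a task could run. It assumes each event object uses Riemann's own field names (service, state, metric, description); the exact keys sumd reads live in its lib/task.py and should be checked before relying on this sketch. The disk-usage thresholds and the build_events name are likewise made up for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical 'json' task: print a JSON list of events to stdout.

Assumption: event objects use Riemann's field names (service, state,
metric, description); verify against sumd's lib/task.py.
"""
import json
import shutil


def build_events(path="/", warn_pct=80.0, crit_pct=90.0):
    """Return a one-element list describing disk usage of `path`."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total
    if used_pct >= crit_pct:
        state = "critical"
    elif used_pct >= warn_pct:
        state = "warning"
    else:
        state = "ok"
    return [{
        "service": "disk used %s" % path,
        "state": state,
        "metric": round(used_pct, 2),
        "description": "%.1f%% of %s used" % (used_pct, path),
    }]


if __name__ == "__main__":
    # sumd would capture this line from the command's stdout.
    print(json.dumps(build_events()))
```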

The configuration aims to be dead simple: in a /etc/riemann/tasks.d/ directory, create SOMETASK.task with the following YAML-style fields:

# Required
service: "Random State"
arg: 'bash -c "exit $((RANDOM % 4))"'
type: "nagios"

# If omitted, defaults to 60s
ttl: 60

# If omitted, defaults to empty set
tags: ['flapper', 'notify']


# If omitted, defaults to system's hostname
host: "myhost.example.com"

# Set arbitrary attributes, optional
attributes:
  window-size: 3
  contact-email: "[email protected]"

# Assign a specific 'performance data' key to be the 'metric' for the event.
# Must be prepended with "task_", all others will be added as attributes on the event.
# If omitted, the first performance data pair returned by the check is used
metric: task_load5

# Set a grace period on the events sent to Riemann before they expire.
# If omitted, defaults to 5.
ttl_multiplier: 5

# Set a note to be prepended to the description attached to the event. Defaults to "".
# (Also settable per-item in cloudkick.json.)
note: "SOMERANDOMSTRING"
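The defaults listed above can be summarized as a small loader step. This is a sketch, not sumd's actual loader: it assumes the .task file has already been parsed into a dict (for example with PyYAML's yaml.safe_load), and the helper name apply_task_defaults and the cloudkick-to-http_json normalization are illustrative assumptions.

```python
import socket

# Documented defaults; the helper below is a hypothetical sketch.
TASK_DEFAULTS = {
    "ttl": 60,
    "tags": [],
    "ttl_multiplier": 5,
    "note": "",
}
REQUIRED_FIELDS = ("service", "arg", "type")
KNOWN_TYPES = ("nagios", "json", "http_json", "cloudkick")


def apply_task_defaults(raw):
    """Validate a parsed .task mapping and fill in documented defaults."""
    missing = [f for f in REQUIRED_FIELDS if f not in raw]
    if missing:
        raise ValueError("missing required fields: %s" % ", ".join(missing))
    if raw["type"] not in KNOWN_TYPES:
        raise ValueError("unknown task type: %r" % raw["type"])
    task = dict(TASK_DEFAULTS)
    task["host"] = socket.gethostname()  # default: the system's hostname
    task.update(raw)                     # explicit settings win
    task["tags"] = set(task["tags"])     # tags behave as a set
    task.setdefault("attributes", {})
    if task["type"] == "cloudkick":      # assumed: treat the old name as an alias
        task["type"] = "http_json"
    return task
```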

Internal Notes

Internally, the scheduler calculates each task's skew and schedules the task's next run at now + offset - skew. When that deadline is near, the scheduler returns the task, which is started in a subprocess and added to a queue to be examined later. A pool of worker threads pulls the next already-running task off the queue, joins it (waits for it to complete), and sends the results to Riemann.
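A minimal sketch of that scheduling and reaping pattern in Python 3. The interval parameter stands in for the "offset" above, skew is assumed to mean how late the previous run started, and appending to a results list stands in for sending an event to Riemann; next_deadline, worker, and run_batch are hypothetical names, not sumd's.

```python
import queue
import subprocess
import threading


def next_deadline(now, interval, skew):
    """Next start time: now + interval - skew (skew = lateness of last run)."""
    return now + interval - skew


def worker(running, results):
    """Take already-started processes off the queue and wait for each."""
    while True:
        item = running.get()
        try:
            if item is None:  # sentinel: stop this worker
                return
            name, proc = item
            out, _ = proc.communicate()  # blocks until the child exits
            # Stand-in for building a Riemann event and sending it.
            results.append((name, proc.returncode, out.strip()))
        finally:
            running.task_done()


def run_batch(commands, num_workers=2):
    """Start every command immediately, then let a worker pool reap them."""
    running = queue.Queue()
    results = []
    threads = [threading.Thread(target=worker, args=(running, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for name, argv in commands:
        proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                                universal_newlines=True)
        running.put((name, proc))
    for _ in threads:
        running.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return sorted(results)
```

Starting the process before queueing it means a slow check only occupies a worker while it is being waited on, not while it is being launched.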

Dependencies

YAML parser
http://pyyaml.org/wiki/PyYAML
Ubuntu: python-yaml
import yaml

Daemonizing library - implements unix daemon functionality nicely
http://pypi.python.org/pypi/python-daemon/
Ubuntu: python-daemon
import daemon

Riemann client library, depends on 'protobuf'
https://github.com/banjiewen/bernhard
Ubuntu (protobuf): python-protobuf
Ubuntu (bernhard): not packaged
import bernhard

Requests
http://docs.python-requests.org/en/latest/
Ubuntu: python-requests
import requests
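A quick way to confirm these imports resolve before starting the daemon. This is a hypothetical helper, not part of sumd; it also checks that daemon exposes DaemonContext, on the assumption that a different package providing its own daemon module could shadow python-daemon (compare the "Daemon mode doesn't work" issue).

```python
import importlib

# Module names from the list above; a non-None value is an attribute
# the module must provide (assumed check for python-daemon).
DEPENDENCIES = {
    "yaml": None,
    "daemon": "DaemonContext",
    "bernhard": None,
    "requests": None,
}


def check_dependencies(deps=DEPENDENCIES):
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    for name, attr in deps.items():
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            problems.append("%s: not importable (%s)" % (name, exc))
            continue
        if attr and not hasattr(module, attr):
            problems.append("%s: %s has no attribute %r"
                            % (name, getattr(module, "__file__", "?"), attr))
    return problems


if __name__ == "__main__":
    for line in check_dependencies() or ["all dependencies look usable"]:
        print(line)
```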

Nagios Plugin Interface

For documentation about the Nagios Plugin Interface, see the Plugin Interface Documentation and the Performance Data Format.
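A sketch of how a check's exit code and first output line could be mapped to an event, assuming the standard "TEXT | label=value[UOM];warn;crit;min;max" layout. The state strings, the task_ prefix on perfdata keys (taken from the metric option described above), and the helper names are assumptions; multi-line long output is ignored.

```python
import re
import shlex

# Nagios exit codes -> state strings (the strings are an assumption).
STATES = {0: "ok", 1: "warning", 2: "critical", 3: "unknown"}
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_nagios_output(exit_code, stdout, prefix="task_"):
    """Return (state, description, perfdata) from the first output line."""
    first_line = stdout.splitlines()[0] if stdout else ""
    text, _, perf = first_line.partition("|")
    perfdata = {}
    for token in shlex.split(perf):  # shlex strips quotes around 'my label'
        label, sep, rest = token.partition("=")
        if not sep:
            continue
        match = NUMBER.match(rest.split(";")[0])  # drop warn;crit;min;max and UOM
        if match:  # skip 'U' (unknown) and other non-numeric values
            perfdata[prefix + label] = float(match.group())
    return STATES.get(exit_code, "unknown"), text.strip(), perfdata


def choose_metric(perfdata, metric_key=None):
    """Use the configured key if given, else the first perfdata pair."""
    if metric_key is not None:
        return perfdata.get(metric_key)
    return next(iter(perfdata.values()), None)
```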

HTTP JSON Interface

The JSON structure should contain an overall status and state, plus a "metrics" list with one entry (name, state, value, and thresholds) for each event:

{
    "status":"All systems go",
    "state":"ok",
    "enabled":true,
    "metrics":[
        {
            "name":"Some Queue Count",
            "state":"ok",
            "value":0,
            "warn_threshold":4500,
            "error_threshold":9000
        },
        {
            "name":"Other Queue Count",
            "state":"ok",
            "value":38,
            "warn_threshold":4500,
            "error_threshold":9000
        }
    ]
}
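One plausible mapping from this document to events, sketched with the standard json module (the body could come from, e.g., requests.get(url, timeout=10).text). The per-metric event shape, the string-valued threshold attributes, skipping disabled documents, and the per-item note override are all assumptions for illustration; events_from_http_json is a hypothetical name.

```python
import json


def events_from_http_json(body, note=""):
    """Turn an HTTP JSON document (schema above) into event dicts.

    Assumptions: one event per "metrics" entry; leftover keys such as
    thresholds become string attributes; a disabled document yields no
    events; an item-level "note" overrides the task-level note.
    """
    doc = json.loads(body)
    if not doc.get("enabled", True):
        return []
    events = []
    for item in doc.get("metrics", []):
        item_note = item.get("note", note)
        events.append({
            "service": item["name"],
            "state": item.get("state", doc.get("state")),
            "metric": item.get("value"),
            # The note is prepended to the overall status text.
            "description": (item_note + " " + doc.get("status", "")).strip(),
            "attributes": {
                key: str(value) for key, value in item.items()
                if key not in ("name", "state", "value", "note")
            },
        })
    return events
```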

Packaging riemann-sumd

You can package riemann-sumd for debian using bdist_deb:

python setup.py --command-packages=stdeb.command bdist_deb

riemann-sumd's People

Contributors: bmhatfield, goblin, michaeldoyle, samn

riemann-sumd's Issues

Daemon mode doesn't work

When I try to run it as a daemon, it exits immediately. I see this in my logfile:

'module' object has no attribute 'DaemonContext'

(without a trailing \n)

I did some investigation and it seems that the example code at https://pypi.python.org/pypi/python-daemon/ is broken:

~$ ./bin/python2.7 
Python 2.7.3 (default, Aug  1 2012, 05:14:39) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import daemon
>>> print daemon.__file__
/srv/sumd_ve/local/lib/python2.7/site-packages/daemon.pyc
>>> with daemon.DaemonContext():
...   print "foo"
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'DaemonContext'
>>> 

Config tags not being passed through

Hiya,

It looks like there's maybe an issue with task tags not being unioned with config tags.
I'm not a Python expert, but this code snippet doesn't seem to assign the unioned result back to t.tags.

loader.py (line 52)
if additional_tags:
    t.tags.union(set(additional_tags))

Am I correct or have I missed something?
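The reporter's reading is consistent with how Python sets behave: set.union() returns a new set and leaves the original untouched. A minimal illustration, with a hypothetical Task stand-in for sumd's task object and a hypothetical add_config_tags helper, of the discarded result and an in-place alternative:

```python
class Task(object):  # hypothetical minimal stand-in for sumd's task object
    def __init__(self, tags):
        self.tags = set(tags)


def add_config_tags(t, additional_tags):
    if additional_tags:
        # In place; equivalent to t.tags |= set(additional_tags)
        t.tags.update(additional_tags)


t = Task(["flapper"])
t.tags.union({"notify"})  # returns a new set that is discarded
assert t.tags == {"flapper"}
add_config_tags(t, ["notify"])
assert t.tags == {"flapper", "notify"}
```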

Python 2.6.5 compatibility

Hi Brian,

I've found a compatibility issue with the dict comprehension on line 276 of lib/task.py when using Python 2.6.5 on Ubuntu 10.04: I get a syntax error when trying to run sumd.

Traceback (most recent call last):
  File "/opt/sumd/bin/sumd", line 36, in <module>
    import loader
  File "lib/loader.py", line 5, in <module>
    import task
  File "lib/task.py", line 276
    event.attributes.update({self.attrprefix + name: result["attributes"][name] for name in result["attributes"]})
                                                                                ^
SyntaxError: invalid syntax

Would you like sumd to be compatible with 2.6.5?
If so, you could opt for this syntax instead.
event.attributes.update(dict((self.attrprefix + name, result["attributes"][name]) for name in result["attributes"]))

Regards,
Cammy.
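The two spellings produce the same dictionary; only the dict()-over-a-generator form parses on Python 2.6. A small check with hypothetical values for attrprefix and result (the comprehension line itself still needs Python 2.7 or later to run):

```python
attrprefix = "task_"  # hypothetical value for self.attrprefix
result = {"attributes": {"load1": "0.1", "load5": "0.2"}}

# Python 2.7+ / 3: dict comprehension
new_style = {attrprefix + name: result["attributes"][name]
             for name in result["attributes"]}

# Also parses on Python 2.6: dict() over a generator of (key, value) pairs
old_style = dict((attrprefix + name, value)
                 for name, value in result["attributes"].items())

assert new_style == old_style == {"task_load1": "0.1", "task_load5": "0.2"}
```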

Time out slow nagios checks

If a Nagios check is badly configured or broken and just hangs doing nothing, it will cause sumd to lock up after a while.

A quick test using /bin/cat as the arg of a task shows that sumd spawns 31 cat processes and then does nothing more.

It would be nice if sumd had some kind of timeout that killed the offending probe so it could get unstuck.
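A sketch of such a timeout using Python 3's subprocess.run, which kills and reaps the child when the timeout expires. Reporting a timed-out check as Nagios UNKNOWN (exit code 3) and the run_check name are assumptions. Passing stdin=subprocess.DEVNULL also keeps checks like /bin/cat from waiting on input forever. Only the direct child is killed; grandchildren started through a shell may survive.

```python
import subprocess

UNKNOWN = 3  # Nagios exit code for UNKNOWN


def run_check(argv, timeout):
    """Run a check, killing it if it exceeds `timeout` seconds (Python 3.5+)."""
    try:
        proc = subprocess.run(argv,
                              stdin=subprocess.DEVNULL,  # never block on input
                              stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              universal_newlines=True,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        # run() has already killed and waited for the child here.
        return UNKNOWN, "check timed out after %s seconds" % timeout
    return proc.returncode, proc.stdout
```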

Safely Handle Messages "Too Large" for Riemann

In Nagios Task:

2016-01-06 19:15:44,786 - bernhard - ERROR - Exception writing to socket: [Errno 90] Message too long
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/bernhard/__init__.py", line 123, in write
    self.sock.sendto(message, (self.host, self.port))
error: [Errno 90] Message too long
2016-01-06 19:15:44,790 - bernhard - ERROR - Exception writing to socket: [Errno 90] Message too long
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/bernhard/__init__.py", line 123, in write
    self.sock.sendto(message, (self.host, self.port))
error: [Errno 90] Message too long
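Errno 90 is EMSGSIZE on Linux: the encoded event was too large to send as a single UDP datagram. One mitigation, sketched here, is to cap the description's encoded size before building the event; the 16 KiB budget and the truncate_utf8 name are hypothetical. Sending over TCP instead of UDP, where the deployment allows it, avoids the per-datagram limit altogether.

```python
MAX_DESCRIPTION_BYTES = 16 * 1024  # hypothetical budget, tune per deployment


def truncate_utf8(text, max_bytes=MAX_DESCRIPTION_BYTES, marker="..."):
    """Shorten `text` so its UTF-8 encoding fits in `max_bytes` bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    marker_bytes = marker.encode("utf-8")
    if max_bytes < len(marker_bytes):
        # No room for the marker; just cut.
        return encoded[:max_bytes].decode("utf-8", errors="ignore")
    budget = max_bytes - len(marker_bytes)
    # Cutting mid-character leaves a partial sequence; errors="ignore" drops it.
    return encoded[:budget].decode("utf-8", errors="ignore") + marker
```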
