statsite / statsite

C implementation of statsd

Home Page: http://statsite.github.io/statsite/

License: Other

statsite's Introduction

Statsite

Statsite is a metrics aggregation server. It is based heavily on Etsy's StatsD (https://github.com/etsy/statsd) and is wire-compatible.

Features

  • Multiple metric types
    • Key / Value
    • Gauges
    • Counters
    • Timers
    • Sets
  • Efficient summary metrics for timer data:
    • Mean
    • Min/Max
    • Standard deviation
    • Median, Percentile 95, Percentile 99
    • Histograms
  • Dynamic set implementation:
    • Exact counts for small sets
    • HyperLogLog for large sets
  • Included sinks:
    • Graphite
    • InfluxDB
    • Ganglia
    • Librato
    • CloudWatch
    • OpenTSDB
    • HTTP
  • Binary protocol
  • TCP, UDP, and STDIN
  • Fast

Architecture

Statsite is designed to be both highly performant and very flexible. To achieve this, it implements stats collection and aggregation in pure C, using an event loop to be extremely fast. This allows it to handle hundreds of connections and millions of metrics. After each flush interval expires, statsite performs a fork/exec to start a new stream handler invoking a specified application. Statsite then streams the aggregated metrics over stdin to the application, which is free to handle the metrics as it sees fit.

This allows statsite to aggregate metrics and then ship metrics to any number of sinks (Graphite, SQL databases, etc). There is an included Python script that ships metrics to graphite.

Statsite tries to minimize memory usage by not storing all the metrics that are received. Counter values are aggregated as they are received, and timer values are stored and aggregated using the Cormode-Muthukrishnan algorithm from "Effective Computation of Biased Quantiles over Data Streams". This means that the percentile values are not perfectly accurate, and are subject to a specifiable error epsilon. This allows us to store only a fraction of the samples.
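To make the error bound concrete, here is an illustrative sketch (not statsite's implementation) of what the epsilon-approximate quantile guarantee means when timer_eps is 0.01:

```python
# Illustrative only (not statsite's code): the epsilon-approximate quantile
# guarantee. With timer_eps = 0.01, a reported 99th percentile may be any
# sample whose true rank is within 1% of the requested rank.
samples = sorted(range(1000))   # 1000 synthetic timer values: 0..999
n = len(samples)
q_pct, eps_pct = 99, 1          # p99 with a 1% rank error

lo = samples[(q_pct - eps_pct) * n // 100]              # true p98 -> 980
hi = samples[min((q_pct + eps_pct) * n // 100, n - 1)]  # capped at the max -> 999
print(lo, hi)  # any sample in [lo, hi] is an acceptable p99 estimate
```

Because only a sketch of the samples is kept, the estimate can land anywhere in this rank window; shrinking timer_eps narrows the window at the cost of memory.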

Histograms can also be optionally maintained for timer values. The minimum and maximum values along with the bin widths must be specified in advance, and as samples are received the bins are updated. Statsite supports multiple histogram configurations, and uses a longest-prefix match policy.

Handling of sets in statsite depends on the number of entries received. For small cardinalities (currently <64), statsite counts exactly the number of unique items. For larger sets, it switches to a HyperLogLog to estimate cardinalities with high accuracy and low space utilization. This allows statsite to estimate huge set sizes without retaining all the values. The parameters of the HyperLogLog can be tuned to provide greater accuracy at the cost of memory.

The HyperLogLog is based on the Google paper, "HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm".

Install

The following quickstart will probably work. If not, see INSTALL.md for detailed information.

Download and build from source. This requires autoconf, automake, and libtool, which are usually available through a system package manager. Steps:

$ git clone https://github.com/statsite/statsite.git
$ cd statsite
$ ./autogen.sh
$ ./configure
$ make
$ ./statsite

If you get any errors, you may need to check if all dependencies are installed, see INSTALL.md.

Building the test code may generate errors if libcheck is not available. To build the test code successfully, do the following:

$ cd deps/check-0.10.0/
$ ./configure
$ make
# make install
# ldconfig (necessary on some Linux distros)
$ cd ../../
$ make test

At this point, the test code should build successfully.

Docker

You can build your own Docker image using the Dockerfile:

$ git clone https://github.com/statsite/statsite.git
$ cd statsite
$ docker build -t statsite/statsite:latest .
$ docker run statsite/statsite:latest

You can override the configuration via a mount that provides a statsite.conf:

$ docker run -v /config/statsite:/etc/statsite statsite/statsite:latest

Or override the configuration with a different path by passing it in the CMD:

$ docker run -v /config/statsite:/tmp statsite/statsite:latest -f /tmp/statsite.docker.example

See statsite.docker.conf for a starting point.

Usage

Statsite is configured using a simple INI file. Here is an example configuration file:

[statsite]
port = 8125
udp_port = 8125
log_level = INFO
log_facility = local0
flush_interval = 10
timer_eps = 0.01
set_eps = 0.02
stream_cmd = python sinks/graphite.py localhost 2003 statsite

[histogram_api]
prefix=api
min=0
max=100
width=5

[histogram_default]
prefix=
min=0
max=200
width=20

Then run statsite, pointing it to that file:

statsite -f /etc/statsite.conf

A full list of configuration options is below.

Configuration Options

Each statsite configuration option is documented below. Statsite configuration options must exist in the statsite section of the INI file:

  • tcp_port : Integer, sets the TCP port to listen on. Default 8125. 0 to disable.

  • port: Same as above. For compatibility.

  • udp_port : Integer, sets the UDP port. Default 8125. 0 to disable.

  • udp_rcvbuf : Integer, sets the SO_RCVBUF socket buffer in bytes on the UDP port. Defaults to 0 which does not change the OS default setting.

  • bind_address : The address to bind on. Defaults to 0.0.0.0

  • parse_stdin: Enables parsing stdin as an input stream. Defaults to 0.

  • log_level : The logging level that statsite should use. One of: DEBUG, INFO, WARN, ERROR, or CRITICAL. All logs go to syslog, and also stderr when not daemonizing. Default is DEBUG.

  • log_facility : The syslog logging facility that statsite should use. One of: user, daemon, local0, local1, local2, local3, local4, local5, local6, local7. All logs go to syslog.

  • flush_interval : How often the metrics should be flushed to the sink in seconds. Defaults to 10 seconds.

  • timer_eps : The upper bound on error for timer estimates. Defaults to 1%. Decreasing this value causes more memory utilization per timer.

  • set_eps : The upper bound on error for unique set estimates. Defaults to 2%. Decreasing this value causes more memory utilization per set.

  • stream_cmd : This is the command that statsite invokes every flush_interval seconds to handle the metrics. It can be any executable. It should read inputs over stdin and exit with status code 0 on success.

  • aligned_flush : If set, flushes will be aligned on flush_interval boundaries, eg. for a 15 second flush interval the flushes would be aligned to (0,15,30,45) boundaries of every minute. This means the first flush period might be shorter than the flush interval depending on the start time of statsite.

  • input_counter : If set, statsite will count how many commands it received in the flush interval, and the count will be emitted under this name. For example if set to "numStats", then statsite will emit "counter.numStats" with the number of samples it has received.

  • daemonize : Should statsite daemonize. Defaults to 0.

  • pid_file : When daemonizing, where to put the pid file. Defaults to /var/run/statsite.pid

  • binary_stream : Should data be streamed to the stream_cmd in binary form instead of ASCII form. Defaults to 0.

  • use_type_prefix : Should prefixes with message type be added to the messages. Does not affect global_prefix. Defaults to 1.

  • global_prefix : Prefix that will be added to all messages. Defaults to empty string.

  • kv_prefix, gauges_prefix, counts_prefix, sets_prefix, timers_prefix : prefix for each message type. Defaults respectively to: "kv.", "gauges.", "counts.", "sets.", "timers.". Values will be ignored if use_type_prefix is set to 0.

  • extended_counters : If enabled, the counter output will be extended to include the rate. Defaults to false.

  • legacy_extended_counters : If enabled, the meaning of the "count" generated metrics on the counters would be the number of metrics received. If false, it would be the sum of the values. This is done for backwards compatibility. Defaults to true.

  • timers_include : Allows you to configure which timer metrics to include through a comma separated list of values. Supported values include count, mean, stdev, sum, sum_sq, lower, upper, rate, median and sample_rate. If this option is not specified then all values except median will be included by default; median will be included if quantiles include 0.5.

  • prefix_binary_stream : If enabled, the keys streamed to the stream_cmd when using binary_stream mode are also prefixed. By default, this is false, and keys do not get the prefix.

  • quantiles : A comma-separated list of quantiles to calculate for timers. Defaults to 0.5, 0.95, 0.99

In addition to global configurations, statsite supports histograms as well. Histograms are configured one per section, and the INI section must start with the word histogram. These are the recognized options:

  • prefix : This is the key prefix to match on. The longest matching prefix is used. If the prefix is blank, it is the default for all keys.

  • min : Floating value. The minimum bound on the histogram. Values below this go into a special bucket containing everything less than this value.

  • max: Floating value. The maximum bound on the histogram. Values above this go into a special bucket containing everything more than this value.

  • width : Floating value. The width of each bucket between the min and max.

Each histogram section must specify all options to be valid.
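The longest-prefix matching described above can be sketched as follows. This is an illustration in Python, not statsite's C code; the `configs` table mirrors the example configuration earlier in this README:

```python
# Pick the histogram section whose prefix is the longest match for a key.
# Mirrors the [histogram_api] / [histogram_default] example config above.
configs = {
    "api": (0.0, 100.0, 5.0),   # prefix -> (min, max, width)
    "":    (0.0, 200.0, 20.0),  # blank prefix: default for all keys
}

def histogram_for(key):
    # Every key matches the blank prefix, so this list is never empty.
    matching = [p for p in configs if key.startswith(p)]
    return configs[max(matching, key=len)]

print(histogram_for("api.latency"))  # matches the "api" section
print(histogram_for("db.latency"))   # falls back to the default section
```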

Protocol

By default, Statsite will listen for TCP and UDP connections. A message looks like the following (where the flag is optional):

key:value|type[|@flag]

Messages must be terminated by newlines (\n).

Currently supported message types:

  • kv - Simple Key/Value.
  • g - Gauge, similar to kv but only the last value per key is retained
  • ms - Timer.
  • h - Alias for timer
  • c - Counter.
  • s - Unique Set

After the flush interval, counters and timers with the same key are aggregated, and the result is sent to the store.

Gauges also support "delta" updates, expressed by prefixing the value with either a + or a -. This implies you can't explicitly set a gauge to a negative number without first setting it to zero.

Multiple metrics may be batched together in one UDP packet, separated by a newline (\n) character. Care must be taken to keep the UDP payload smaller than the network MTU minus 28 bytes for IP/UDP headers. Statsite supports a maximum UDP data length of 1500 bytes.

Examples:

The following is a simple key/value pair, in this case reporting how many queries we've seen in the last second on MySQL:

mysql.queries:1381|kv

The following is a timer, timing the response speed of an API call:

api.session_created:114|ms

The next example increments the "rewards" counter by 1:

rewards:1|c

Here we initialize a gauge and then modify its value:

inventory:100|g
inventory:-5|g
inventory:+2|g

Sets count the unique items, so if statsite gets:

users:abe|s
users:zoe|s
users:bob|s
users:abe|s

Then it will emit a count of 3 for the number of uniques it has seen.
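As a quick client-side illustration, here is a hedged Python snippet that batches several of the example metrics above into a single newline-separated UDP datagram; the host, port, and metric names are illustrative only:

```python
import socket

# Batch several metrics into one newline-separated UDP datagram.
# 127.0.0.1:8125 matches the default udp_port; adjust for your setup.
metrics = ["rewards:1|c", "api.session_created:114|ms", "users:abe|s"]
payload = ("\n".join(metrics) + "\n").encode("ascii")
assert len(payload) <= 1500  # stay under statsite's maximum UDP data length

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(payload, ("127.0.0.1", 8125))
sock.close()
```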

Writing Statsite Sinks

Statsite ships with graphite, librato, gmetric, and influxdb sinks, but any executable or script can be used as a sink. The sink should read its input from stdin, where each metric is in the form:

key|val|timestamp\n

Each metric is separated by a newline. The process should terminate with an exit code of 0 to indicate success.

Here is an example of the simplest possible Python sink:

#!/usr/bin/env python
import sys

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue  # skip the trailing empty line
    key, value, timestamp = line.split("|")
    print(key, value, timestamp)

Binary Protocol

In addition to the statsd compatible ASCII protocol, statsite includes a lightweight binary protocol. This can be used if you want to make use of special characters such as the colon, pipe character, or newlines. It is also marginally faster to process, and may provide 10-20% more throughput.

Each command is sent to statsite over the same ports with this header:

<Magic Byte><Metric Type><Key Length>

Then depending on the metric type, it is followed by either:

<Value><Key>
<Set Length><Key><Set Key>

The "Magic Byte" is the value 0xaa (170). This switches the internal processing from the ASCII mode to binary. The metric type is one of:

  • 0x1 : Key value / Gauge
  • 0x2 : Counter
  • 0x3 : Timer
  • 0x4 : Set
  • 0x5 : Gauge
  • 0x6 : Gauge Delta update

The key length is a 2 byte unsigned integer giving the length of the key, INCLUDING a NULL terminator. The key must include a NULL terminator, and its length must include it.

If the metric type is K/V, Counter or Timer, then we expect a value and a key. The value is a standard IEEE754 double value, which is 8 bytes in length. The key is provided as a byte stream which is Key Length long, terminated by a NULL (0) byte.

If the metric type is Set, then we expect the length of a set key, provided like the key length. The key should then be followed by an additional Set Key, which is Set Length long, terminated by a NULL (0) byte.

All of these values must be transmitted in Little Endian order.

Here is an example of sending ("Conns", "c", 200) as hex:

0xaa 0x02 0x0600 0x0000000000006940 0x436f6e6e7300

Note: The binary protocol does not include support for "flags" and therefore cannot be used for transmitting sampled counters.
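The hex example above can be reproduced with Python's struct module. This is a sketch for illustration, packing the same ("Conns", "c", 200) counter:

```python
import struct

# Pack the counter ("Conns", "c", 200) in statsite's binary protocol:
# magic byte 0xaa, metric type 0x2 (counter), uint16 key length
# (including the NUL terminator), IEEE754 double value, NUL-terminated key.
# All fields are little-endian.
key = b"Conns\x00"
msg = struct.pack("<BBH", 0xAA, 0x02, len(key)) + struct.pack("<d", 200.0) + key
print(msg.hex())  # -> aa0206000000000000006940436f6e6e7300
```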

Binary Sink Protocol

It is also possible to have the data streamed to the sink represented in a binary format. Again, this is useful if you want to use the reserved characters. It may also be faster.

Each command is sent to the sink in the following manner:

<Timestamp><Metric Type><Value Type><Key Length><Value><Key>[<Count>]

Most of these are the same as the binary protocol. There are a few changes, however. The Timestamp is sent as an 8 byte unsigned integer containing the current Unix timestamp. The Metric type is one of:

  • 0x1 : Key value
  • 0x2 : Counter
  • 0x3 : Timer
  • 0x4 : Set
  • 0x5 : Gauge

The value type is one of:

  • 0x0 : No type (Key/Value)
  • 0x1 : Sum (Also used for Sets)
  • 0x2 : Sum Squared
  • 0x3 : Mean
  • 0x4 : Count
  • 0x5 : Standard deviation
  • 0x6 : Minimum Value
  • 0x7 : Maximum Value
  • 0x8 : Histogram Floor Value
  • 0x9 : Histogram Bin Value
  • 0xa : Histogram Ceiling Value
  • 0xb : Count Rate (Sum / Flush Interval)
  • 0xc : Sample Rate (Count / Flush Interval)
  • 0x80 OR percentile : If the type is OR'd with 128 (0x80), then it is a percentile amount. The percentile is OR'd with 0x80 to produce the type. For example, (0x80 | 0x32) = 0xb2 is the 50th percentile, or median. The 95th percentile is (0x80 | 0x5f) = 0xdf.

The key length is a 2 byte unsigned integer representing the length of the key, which is terminated by a NULL character. The Value is an IEEE754 double. Lastly, the key is a NULL-terminated character stream.

The final <Count> field is only set for histogram values. It is always provided as an unsigned 32 bit integer value. Histograms use the value field to specify the bin, and the count field for the entries in that bin. The special values for histogram floor and ceiling indicate values that were outside the specified histogram range. For example, if the min value was 50 and the max 200, then HISTOGRAM_FLOOR will have value 50, and the count is the number of entries that were below this minimum value. The ceiling works the same way, but vice versa. For bin values, the value is the minimum value of the bin, up to but not including the next bin.

To enable the binary sink protocol, add a configuration variable binary_stream to the configuration file with the value yes. An example sink is provided in sinks/binary_sink.py.
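As a sketch of consuming this stream (assuming the field layout described above; this is not the bundled sinks/binary_sink.py), one record without the histogram <Count> field can be decoded like this:

```python
import io
import struct

# Decode one binary-sink record: uint64 timestamp, metric-type byte,
# value-type byte, uint16 key length (includes the NUL), IEEE754 double,
# then the NUL-terminated key. Little-endian throughout. The optional
# <Count> field for histogram values is omitted here for brevity.
def read_record(stream):
    header = stream.read(20)
    if len(header) < 20:
        return None  # end of stream
    ts, mtype, vtype, klen = struct.unpack("<QBBH", header[:12])
    (value,) = struct.unpack("<d", header[12:20])
    key = stream.read(klen)[:-1].decode("ascii")  # strip the NUL terminator
    return ts, mtype, vtype, key, value

# Example: a counter "hits" whose Sum (value type 0x1) is 42 at t=1000
buf = struct.pack("<QBBH", 1000, 0x2, 0x1, 5) + struct.pack("<d", 42.0) + b"hits\x00"
print(read_record(io.BytesIO(buf)))
```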

statsite's People

Contributors

albertored, armon, bnoordhuis, breml, c4milo, choplin, djmb, dpaneda, filippog, freels, gihad, hanshasselberg, jjneely, johnkeates, kcwong-verseon, kikitux, kuba--, melkor217, mheffner, n0coast, rounoff, sleepybishop, stopdropandrew, talebook, tantra35, taylorchu, tuxmonteiro, ualtinok, veetow, yvlasenko

statsite's Issues

Sinks that exit very quickly show as exiting with code 127

I have created a sink in Go which processes the metrics very fast. I have to add a delay before exiting to keep statsite from thinking the sink failed.

Here is the output when I set the delay to 95 milliseconds, which is right on the edge of triggering the issue for my program:

Aug  2 17:12:43 mac-pro.local statsite[94176] <Info>: Starting statsite.
Aug  2 17:12:43 mac-pro.local statsite[94176] <Info>: stdin is disabled
Aug  2 17:12:43 mac-pro.local statsite[94176] <Info>: Listening on tcp ':::8125'
Aug  2 17:12:43 mac-pro.local statsite[94176] <Info>: Listening on udp ':::8125'.
2014/08/02 17:12:53 nothing
2014/08/02 17:13:03 nothing
2014/08/02 17:13:13 nothing
2014/08/02 17:13:23 nothing
2014/08/02 17:13:33 nothing
2014/08/02 17:13:43 nothing
2014/08/02 17:13:53 nothing
2014/08/02 17:14:03 nothing
2014/08/02 17:14:13 nothing
Aug  2 17:14:13 mac-pro.local statsite[94176] <Warning>: Streaming command exited with status 127
2014/08/02 17:14:23 nothing
2014/08/02 17:14:33 nothing
2014/08/02 17:14:43 nothing
Aug  2 17:14:43 mac-pro.local statsite[94176] <Warning>: Streaming command exited with status 127
2014/08/02 17:14:53 nothing
2014/08/02 17:15:03 nothing
Aug  2 17:15:03 mac-pro.local statsite[94176] <Warning>: Streaming command exited with status 127
2014/08/02 17:15:13 nothing
2014/08/02 17:15:23 nothing
2014/08/02 17:15:33 nothing
2014/08/02 17:15:43 nothing
Aug  2 17:15:43 mac-pro.local statsite[94176] <Warning>: Streaming command exited with status 127
2014/08/02 17:15:53 nothing
Aug  2 17:15:53 mac-pro.local statsite[94176] <Warning>: Streaming command exited with status 127
2014/08/02 17:16:03 nothing
Aug  2 17:16:03 mac-pro.local statsite[94176] <Warning>: Streaming command exited with status 127
2014/08/02 17:16:13 nothing
2014/08/02 17:16:23 nothing
2014/08/02 17:16:33 nothing
Aug  2 17:16:33 mac-pro.local statsite[94176] <Warning>: Streaming command exited with status 127
2014/08/02 17:16:43 nothing
2014/08/02 17:16:53 nothing
2014/08/02 17:17:03 nothing
2014/08/02 17:17:13 nothing
Aug  2 17:17:13 mac-pro.local statsite[94176] <Warning>: Streaming command exited with status 127
2014/08/02 17:17:23 nothing
2014/08/02 17:17:33 nothing
2014/08/02 17:17:43 nothing
Aug  2 17:17:43 mac-pro.local statsite[94176] <Warning>: Streaming command exited with status 127
2014/08/02 17:17:53 nothing
Aug  2 17:17:53 mac-pro.local statsite[94176] <Warning>: Streaming command exited with status 127
2014/08/02 17:18:03 nothing
2014/08/02 17:18:13 nothing
2014/08/02 17:18:23 nothing
2014/08/02 17:18:33 nothing
2014/08/02 17:18:43 nothing
2014/08/02 17:18:53 nothing

Here is the actual sink code:

package main

import (
    "bufio"
    "encoding/binary"
    "io"
    "log"
    "math"
    "os"
    "time"
)

func main() {
    stdin := bufio.NewReader(os.Stdin)
    var buf [20]byte
    for done := false; !done; {
        n, err := stdin.Read(buf[:])
        if err == io.EOF && n == 0 {
            log.Printf("nothing")
            break
        }
        if err != nil {
            log.Panicf("unable to read slice: %s", err)
        } else if n < 20 {
            log.Panicf("unable to read entire header, only %d bytes", n)
        }
        done = parseLine(buf, stdin)
    }
    time.Sleep(95 * time.Millisecond)
}

func parseFloat64(b []byte) float64 {
    return math.Float64frombits(binary.LittleEndian.Uint64(b))
}

func parseLine(header [20]byte, r io.Reader) (done bool) {
    timestamp := binary.LittleEndian.Uint64(header[:8])
    metricType := header[8]
    valueType := header[9]
    keyLength := binary.LittleEndian.Uint16(header[10:12])
    value := parseFloat64(header[12:20])
    key := make([]byte, keyLength)
    n, err := r.Read(key)
    if err != nil {
        log.Panicf("read too short: %s", err)
    } else if n < int(keyLength) {
        log.Panicf("read too short: %d < %d", n, keyLength)
    }
    var count [4]byte
    n, err = r.Read(count[:])
    if err == io.EOF {
        done = true
    }
    log.Printf("%d %c %c %d %f %s %d", timestamp, metricType, valueType, keyLength, value, key[:keyLength-1], binary.LittleEndian.Uint32(count[:]))
    return
}

And here is the statsite config:

[statsite]
tcp_port = 8125
udp_port = 8125
log_level = INFO
flush_interval = 10
timer_eps = 0.01
set_eps = 0.02
global_prefix = "local."
extended_counters = 1
binary_stream = yes
stream_cmd = ./my-sink

Debug port

Hi,
did you think about something like a debug port or health-check endpoint?

Having a debug port would be nice (where you can telnet and see what's going on; how statsite streams metrics, etc.), but currently (on my private branch) I've added a simple health-check.
You can telnet to a defined port and statsite will dump 'quasi-json' with the status and size of aggregated metrics and close the connection.

Let me know what you think,
Thanks.
--kuba.

Statsite crash report

One of my statsite daemons crashed on me today, writing something like this to /var/log/messages:

Apr 10 01:31:42 hostname statsite[31303]: Failed to write() to connection [6]! Bad file descriptor.
Apr 10 01:31:42 hostname statsite[31303]: *** glibc detected *** /path/to/statsite/statsite: corrupted double-linked list: 0x00007ffff071e1b0 ***

Relevant sections of my statsite.conf:

[statsite]
port = 8125
udp_port = 8125
log_level = INFO
flush_interval = 60
timer_eps = 0.001
daemonize = true

The input counter reported 113k for this time period. I don't think I can survive a DEBUG log_level. Is there some more detail I can provide when these types of crashes occur?

Thank you for your time!

Cluster Support

I have a federated graphite cluster that uses consistent hashing setup as follows:

Graphite Cluster

My statsite instances flush every 10 seconds.

I want to be able to load-balance requests to any node in the cluster. The problem is that samples of any given metric will most likely be sent to more than one statsite instance within the 10 second flush interval. When the statsite instances flush, there will be a race condition that will cause colliding metrics to overwrite one another. I toyed with the idea of using aggregators to address this issue. Aggregators would work fine for counts and kv because I could sum the counts and average the kv metrics. Dealing with the timing stats, however, is not so straightforward. I would have to write my own weighted average method. Additionally, I would have to mess with the consistent hashing mechanism to make sure that all the stats derived from a specific timing metric arrive at the same aggregator. Even with all this in place, I am still not sure what to do with the standard deviations and percentile stats. I would probably just have to throw them away. This is obviously not a clean or ideal solution.

Alternatively, I decided it would be much better to do consistent hashing before statsite aggregates the metrics. This way I don't have to worry about merging aggregated results from multiple statsite instances. This could be handled on the client side but I really want to address this issue on the server side.

My plan is to extend your statsite to handle the consistent hashing and relaying of metrics to the appropriate statsite instance.

I would add a couple parameters to the config to enable relaying and to specify the other statsite destinations.

Whenever a metric is received, before processing it I would hash on the metric name and, if necessary, forward it to another statsite instance. If the stat does not need to be forwarded, it would be processed as normal.

Do you accept pull requests?

Also, have you ever run into a similar situation or can you think of any pitfalls to this approach?

newlines shouldn't be required for UDP packets

UDP packets appear to be discarded if they don't include a newline. In checking 2 different Python client implementations (including the one I use, from py-statsd), neither adds a newline, so I doubt adding one is common.

It might be a good idea to automatically add a newline to the end of received UDP packets when writing to the buffer?

sinks/graphite.py - invalid line received from client 127.0.0.1:60773, ignoring

Hello,

I have just installed statsite and I am having problems getting it to work. Statsite starts correctly, but no metrics are stored/sent to graphite; the issue seems to be that the stream cmd "sinks/graphite.py" sends an invalid line.

graphite/storage/log/carbon-cache/carbon-cache-a/listener.log
03/07/2012 11:22:53 :: invalid line received from client 127.0.0.1:60772, ignoring
03/07/2012 11:22:53 :: MetricLineReceiver connection with 127.0.0.1:60772 closed clean
03/07/2012 11:22:58 :: invalid line received from client 127.0.0.1:60773, ignoring
03/07/2012 11:22:58 :: MetricLineReceiver connection with 127.0.0.1:60773 closed cleanly

Do you know what could be causing this issue?

This is my config file:
statsite.conf
[statsite]
port = 8125
udp_port = 8125
log_level = INFO
flush_interval = 10
timer_eps = 0.01
stream_cmd = python sinks/graphite.py localhost 2003 stats

Best regards and thanks in advance for any help,
Sebastian

Why do you expect a \n from a single command?

In the statsd protocol, \n is only used for bulk sending.

Why your documentation says :

Messages must be terminated by newlines (\n).

With the python "statsd" client, it doesn't work; \n is only used with piped commands.

A few documentation additions

Just a few modifications to README.md to save someone some searching:

  1. I needed to add --egg to pip install Scons.
  2. I needed to define SCONS_LIB_DIR in order to get scons to run.

Add install targets into scons

I was trying to build a deb package using debhelper built-in support for scons, and the resulting package was empty.

Looking further, it appears that the scons script is missing install targets for the resulting binary and sink commands. We've worked around this issue by listing the required files straight in the debian/install file, but Debian policy suggests against it, so it would be nice to have install targets added upstream.

Bind to STDIO instead of TCP/UDP

It'd be really nice to be able to run statsite as a subprocess of my application that I could talk to over stdio (i.e. pipes). Additionally, being able to disable the TCP and UDP ports while in this mode would be helpful.

Several cores to statsite

Hi!

We want to use statsite instead of statsd, but I have one question: will statsite use all cores or just one when it needs them (under high load)?

Thanks.

per-second metrics

Statsd gives you the rate/s independent of the flush interval, which would definitely be nice. I actually thought the counters here were per-second and totally misread our graphs, haha. Keeping everything per-second for timers/counters would be really helpful. Snippet from statsd:

current_timer_data["count_ps"] = timer_counters[key] / (flushInterval / 1000);

mention or fix dependency on C++ compiler

Funny problem that took me a while to debug... I had a fail recently compiling statsite on a fresh box where I had only installed gcc.

The build ended in

gcc -o src/statsite.o -c -g -std=c99 -D_GNU_SOURCE -Wall -Werror -Wstrict-aliasing=0 -O3 -pthread -Ideps/inih/ -Ideps/libev/ -Isrc/ src/statsite.c
o deps/murmurhash/MurmurHash3.o -c -fno-exceptions -O3 -Ideps/murmurhash deps/murmurhash/MurmurHash3.cpp
sh: 1: o: not found
ar rc libmurmur.a deps/murmurhash/MurmurHash3.o
ar: deps/murmurhash/MurmurHash3.o: No such file or directory
scons: *** [libmurmur.a] Error 1
scons: building terminated because of errors.

?? "o not found"

it's because I didn't have a C++ compiler installed. lol.

Anyways, I don't know much about scons, but a hard fail saying "hey you need a C++ compiler" would be cool.

It would be 15m of work to turn the .cpp into a legit C file. That I do know how to do, but I'm not sure if you want to go that direction. Let me know, and I'll be happy to produce a patch.

thanks!

statsite not accepting all metrics

Hi,

I have started using statsite to collect metrics from diamond; statsite sends them to graphite at a 30 sec flush interval. I collect and send 3500 metrics to statsite at a 30 sec interval, but it seems statsite doesn't accept all the metrics, and the missing ones do not show up in graphite.

I tried debugging the logs of graphite, diamond and statsite, and observed that diamond is sending the missing metrics to statsite, but statsite is not sending them to graphite.

Could you help me figure out what could be going wrong? Is there a way to scale so that statsite can accept more metrics?

my statsite conf file

[statsite]
port = 8125
udp_port = 8125
log_level = INFO
flush_interval = 30
timer_eps = 0.01
stream_cmd = python sinks/graphite.py graphite-app-hostname 2003

Support opentsdb

It would be great if statsite supported opentsdb. The only solutions I found were to proxy it.

clean up old stats from statsite

Hi, with statsd/graphite, we can telnet to a port and clean up old stats so statsd does not continuously send 0 for older metrics to carbon. It's not obvious to me how we could do that with statsite. I deleted some older metrics from /opt/graphite/.../whisper/..., but they show up pretty quickly again. Here is the stackoverflow link on how to fix it with statsd:

http://stackoverflow.com/questions/15501677/deleted-empty-graphite-whisper-files-automatically-re-generating

Thanks for your help.

Using multiple sinks on different flush_interval's

I was wondering if there is any support (planned or finished) for having multiple sinks with different flush_intervals.

I'm trying to dump stats to 2 places (graphite/sql), and I would like to dump to graphite every 5 seconds, but to sql every 1 minute. I can't seem to find any information on this, but I'm assuming it's not supported.

If there are no plans to support this, I am planning on chaining 2 statsite daemons.
Statsite Daemon#1 will have a flush_interval of 5s and will execute a python script to fork data to graphite and Statsite Daemon#2. Statsite Daemon#2 will have a flush_interval of 1m and write to SQL. Thoughts on this?

Sampling Question

Looking through the code and running a few tests, it would appear that this implementation of statsite does not support the @{sample-rate} sampling flag, even though it is shown in the protocol.

There are no usage examples with the flag, so I am wondering if this functionality was dropped. Are there any plans to support the sampling flag?
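For context, the expected semantics of the flag are simple: `key:5|c|@0.1` says the client only reported 10% of events, so an implementation honoring the flag would scale the count back up at aggregation time. A one-line sketch (the function name is mine):

```python
def scale_sampled_counter(value, sample_rate):
    """'key:5|c|@0.1' -> only sample_rate of events were reported,
    so divide to recover the estimated true count."""
    return value / sample_rate
```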

Compiling errors - CentOS

Hello,

I am trying to build statsite on my server and I am getting compilation errors.

Do you know what could be causing these errors?

cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.8 (Tikanga) (x86_64)

scons

scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
gcc -o src/hashmap.o -c -std=c99 -D_GNU_SOURCE -Wall -Werror -O3 -pthread -Ideps/inih/ -Ideps/libev/ -Isrc/ src/hashmap.c
gcc -o src/heap.o -c -std=c99 -D_GNU_SOURCE -Wall -Werror -O3 -pthread -Ideps/inih/ -Ideps/libev/ -Isrc/ src/heap.c
gcc -o src/cm_quantile.o -c -std=c99 -D_GNU_SOURCE -Wall -Werror -O3 -pthread -Ideps/inih/ -Ideps/libev/ -Isrc/ src/cm_quantile.c
cc1: warnings being treated as errors
src/cm_quantile.c: In function 'cm_insert':
src/cm_quantile.c:251: warning: dereferencing type-punned pointer will break strict-aliasing rules
src/cm_quantile.c:271: warning: dereferencing type-punned pointer will break strict-aliasing rules
src/cm_quantile.c:272: warning: dereferencing type-punned pointer will break strict-aliasing rules
src/cm_quantile.c:290: warning: dereferencing type-punned pointer will break strict-aliasing rules
src/cm_quantile.c:291: warning: dereferencing type-punned pointer will break strict-aliasing rules
scons: *** [src/cm_quantile.o] Error 1
scons: building terminated because of errors.

Best regards,
Sebas

Stop statsite

Hi,

If I use kill to stop statsite, it keeps holding the port, and starting it again fails with a "port already in use" error. Is there a clean way to stop statsite?
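The usual cause of this symptom is the old listener's connections lingering in TIME_WAIT after the kill. Whether statsite sets SO_REUSEADDR on its listeners I haven't verified, but that socket option is what lets a restarted server rebind immediately; a minimal illustration in Python:

```python
import socket

def bind_reusable(port):
    """Bind a TCP listener that can rebind while old sockets sit in TIME_WAIT."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Without SO_REUSEADDR, bind() fails with EADDRINUSE until TIME_WAIT expires.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", port))
    s.listen(1)
    return s
```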

--Sandeep

Sinks can filter percentiles

Sinks can filter out given percentiles if the user is not interested in them. This came up specifically in reference to the Librato backend.

For a user with a single sink, it might be nice to have this be a global statsite config option as well?

/cc @rcrowley

intermittent errors sending to graphite

Hi,

statsite is sending stats to graphite every 10 seconds.
I am getting intermittent errors on statsite.

ERROR:statsite.graphitestore:Error while flushing to graphite. Reattempting...
Traceback (most recent call last):
File "sinks/graphite.py", line 72, in _write_metric
self.sock.sendall(metric)
File "", line 1, in sendall
error: (32, 'Broken pipe')
CRITICAL:statsite.graphitestore:Failed to flush to Graphite! Gave up after 3 attempts.
ERROR:statsite.graphitestore:Error while flushing to graphite. Reattempting...
Traceback (most recent call last):
File "sinks/graphite.py", line 72, in _write_metric
self.sock.sendall(metric)
File "", line 1, in sendall
error: (32, 'Broken pipe')
ERROR:statsite.graphitestore:Error while flushing to graphite. Reattempting...
Traceback (most recent call last):
File "sinks/graphite.py", line 72, in _write_metric
self.sock.sendall(metric)
File "", line 1, in sendall
error: (32, 'Broken pipe')
ERROR:statsite.graphitestore:Error while flushing to graphite. Reattempting...
Traceback (most recent call last):
File "sinks/graphite.py", line 72, in _write_metric
self.sock.sendall(metric)
File "", line 1, in sendall
error: (32, 'Broken pipe')

Only uses LOCAL0 log facility

Just as the log_level is a configurable option in statsite.conf, could the log_facility also be an option so it can be changed from LOCAL0 to whatever is preferred?

I would be willing to contribute a PR for this. Thoughts?

Support for histograms

Any ideas on implementing histograms in statsite? This is the one thing statsd provides (relatively recently, in the last 3-4 months) that this implementation does not provide yet.
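For reference, a histogram in the statsd sense is just per-bucket counts of timer values against configured bucket edges. A minimal sketch of the binning (the edge format is an assumption, not a statsite config syntax):

```python
import bisect

def histogram(values, edges):
    """Count values into len(edges)+1 bins: (-inf, e0], (e0, e1], ..., (eN, inf).

    edges must be sorted ascending.
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return counts
```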

Failed to recv() from connection [7]! Bad address.

When I start statsite and send UDP messages to it (eg. "cn.nginx.latency:45|ms\n"), I see the following message on stdout:

statsite[7686]: Failed to recv() from connection [7]! Bad address.

Digging in a bit more, these start once statsite receives a malformed stat on the udp port. For example:

my.stats.are.bad

With no value, this causes close_client_connection(handle->conn); to be called on the UDP socket in conn_handler.c:308. That causes all further recv() calls on that socket to fail.
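Until a server-side fix lands, clients can guard against triggering this by validating lines before sending. A rough shape check for `key:value|type` with an optional sample-rate suffix (the regex is my approximation of the protocol, not statsite's actual parser):

```python
import re

# key:value|type, optionally followed by |@sample-rate (approximation)
_METRIC = re.compile(r"^[^:|]+:[^:|]+\|(kv|g|c|ms|s)(\|@[\d.]+)?$")

def is_valid_metric(line):
    """Return True if line looks like a well-formed statsd/statsite metric."""
    return _METRIC.match(line.strip()) is not None
```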

How to ignore specific stats

Hi,

I have been using statsite and have liked it so far. I have a question.

Is there any way to ignore specific metrics, or metrics matching patterns, so they are not recorded by statsite?
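I'm not aware of a built-in blacklist, but the stream_cmd hook makes it easy to filter in the sink instead: drop metrics whose keys match any configured pattern before forwarding. A sketch (the patterns and the `key|value|timestamp` line format are assumptions about a stream_cmd wrapper, not a statsite option):

```python
import re

def filter_metrics(lines, ignore_patterns):
    """Drop 'key|value|timestamp' lines whose key matches any ignore pattern."""
    compiled = [re.compile(p) for p in ignore_patterns]
    kept = []
    for line in lines:
        key = line.split("|", 1)[0]
        if not any(p.search(key) for p in compiled):
            kept.append(line)
    return kept
```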

What info to expect from DEBUG log_level?

log_level = DEBUG

I see this and no further debug messages:

./statsite -f /etc/statsite.conf

statsite[633]: Starting statsite.

statsite is accepting data and sinking to graphite just fine, but I need debug output to see incoming messages. Any tips?

add delete idle stats feature

A lot of other statsd implementations have a feature to trim the noise from empty counters and gauges that haven't changed since the last flush.

From statsd-nodejs config

deleteIdleStats:  don't send values to graphite for inactive counters, sets, gauges, or timers
                    as opposed to sending 0.  For gauges, this unsets the gauge (instead of sending
                    the previous value). Can be individually overriden. [default: false]

Documentation: UDP vs TCP

In the readme:

"tcp_port : Integer, sets the tcp port to listen on. Default 8125.

udp_port : Integer, sets the udp port. Currently listened on but otherwise unused. Default 8125."

To me this implies that statsite only supports TCP, but my local testing suggests that actually it only supports UDP. Some correction or clarity in the docs would be appreciated. I think I have a working setup, but it'd be nice to know if I'm doing it right or if things are working by some other coincidence :)
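One way to settle which transport a given build actually accepts is to fire the same metric at both sockets and watch which one reaches graphite. A small sketch (host and port are whatever your statsite binds; the helper names are mine):

```python
import socket

def statsd_line(key, value, mtype):
    """Encode one metric in the wire format, e.g. b'pages.hits:1|c\\n'."""
    return ("%s:%s|%s\n" % (key, value, mtype)).encode()

def send_udp(host, port, key, value, mtype="c"):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.sendto(statsd_line(key, value, mtype), (host, port))
    finally:
        s.close()

def send_tcp(host, port, key, value, mtype="c"):
    s = socket.create_connection((host, port))
    try:
        s.sendall(statsd_line(key, value, mtype))
    finally:
        s.close()
```

Send a distinct key over each transport and check which one shows up after the next flush.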

make error

scons statsite
scons: Reading SConscript files ...
ValueError: zero length field name in format:
File "/media/ephemeral0/deployment/lokesh/statsite/SConstruct", line 3:
envmurmur = Environment(CPATH = ['deps/murmurhash/'], CFLAGS="-std=c99 -O3")
File "/usr/lib/scons-2.3.2/SCons/Environment.py", line 1003:
apply_tools(self, tools, toolpath)
File "/usr/lib/scons-2.3.2/SCons/Environment.py", line 107:
env.Tool(tool)
File "/usr/lib/scons-2.3.2/SCons/Environment.py", line 1787:
tool(self)
File "/usr/lib/scons-2.3.2/SCons/Tool/__init__.py", line 183:
self.generate(env, *args, **kw)
File "/usr/lib/scons-2.3.2/SCons/Tool/default.py", line 41:
SCons.Tool.Tool(t)(env)
File "/usr/lib/scons-2.3.2/SCons/Tool/__init__.py", line 183:
self.generate(env, *args, **kw)
File "/usr/lib/scons-2.3.2/SCons/Tool/dmd.py", line 131:
env['DLIBCOM'] = '$DLIB $_DLIBFLAGS {} $TARGET $SOURCES $_DLIBFLAGS'.format('-c' if env['PLATFORM'] == 'win32' else '')
make: *** [build] Error 2

statsd Gauge compatability

Currently gauges are just an alias for KV. As far as I can see, statsite just passes through every input value, with the prefix and timestamp added when flushed.

But in statsd, gauges are supposed to be a 'sticky' value: only the latest value is sent, and it is kept between flushes. (This last part can be disabled, though.)

For a graphite backend, I guess it will just overwrite the value with the latest, as all of them have the same timestamp. But there is no need to actually store and track all the values?

Defaults to only bind IPv6

This tripped me up for a little while until I ran lsof. Perhaps tweak the docs a bit? I can send a PR if it would help.

On Ubuntu 14.04
Commit 4d99c63

With defaults:

$ lsof -np `pgrep statsite`
COMMAND   PID  USER   FD   TYPE             DEVICE SIZE/OFF    NODE NAME
statsite 7627 isaac  cwd    DIR               0,34     4096 2024014 /home/isaac/programming/statsite
statsite 7627 isaac  rtd    DIR                8,3     4096       2 /
statsite 7627 isaac  txt    REG               0,34   411654 2022370 /home/isaac/programming/statsite/statsite
statsite 7627 isaac  mem    REG                8,3  1845024 1970238 /lib/x86_64-linux-gnu/libc-2.19.so
statsite 7627 isaac  mem    REG                8,3    31792 1970366 /lib/x86_64-linux-gnu/librt-2.19.so
statsite 7627 isaac  mem    REG                8,3   141574 1970358 /lib/x86_64-linux-gnu/libpthread-2.19.so
statsite 7627 isaac  mem    REG                8,3  1071552 1970288 /lib/x86_64-linux-gnu/libm-2.19.so
statsite 7627 isaac  mem    REG                8,3   149120 1970214 /lib/x86_64-linux-gnu/ld-2.19.so
statsite 7627 isaac    0u   CHR             136,15      0t0      18 /dev/pts/15
statsite 7627 isaac    1u   CHR             136,15      0t0      18 /dev/pts/15
statsite 7627 isaac    2u   CHR             136,15      0t0      18 /dev/pts/15
statsite 7627 isaac    3u  unix 0x0000000000000000      0t0 3245582 socket
statsite 7627 isaac    4u  0000                0,9        0    6333 anon_inode
statsite 7627 isaac    5u  0000                0,9        0    6333 anon_inode
statsite 7627 isaac    6u  IPv6            3245583      0t0     TCP *:8125 (LISTEN)
statsite 7627 isaac    7u  IPv6            3245584      0t0     UDP *:8125 

With bind_address = 127.0.0.1

$ lsof -n | grep 8125                                                                                                                                                                                                                                   
statsite  12388            isaac    6u     IPv4            3322117       0t0       TCP 127.0.0.1:8125 (LISTEN)
statsite  12388            isaac    7u     IPv4            3322118       0t0       UDP 127.0.0.1:8125 

build error

I tried to build with SCons, but I get this:

gcc -o src/statsite.o -c -std=c99 -D_GNU_SOURCE -Wall -Werror -O3 -pthread -Ideps/inih/ -Ideps/libev/ -Isrc/ src/statsite.c
o deps/murmurhash/MurmurHash3.o -c -fno-exceptions -O3 -Ideps/murmurhash deps/murmurhash/MurmurHash3.cpp
sh: o: command not found
ar rc libmurmur.a deps/murmurhash/MurmurHash3.o
ar: deps/murmurhash/MurmurHash3.o: No such file or directory
scons: *** [libmurmur.a] Error 1
scons: building terminated because of errors.

If I run it manually:
gcc -o deps/murmurhash/MurmurHash3.o -c -fno-exceptions -O3 -Ideps/murmurhash deps/murmurhash/MurmurHash3.cpp

I get:
gcc: error trying to exec 'cc1plus': execvp: No such file or directory

Gauges are broken and are incompatible with statsd

Gauges in statsite are broken: they are not sent every flush interval, and they do not support deltas (+/- values) as mentioned in the documentation.

I sent a test gauge (test_gauge:123|g) to both statsD and statsite, the following are the results

StatsD

[danny@dannyfallon ~/src/statsd (master)]$ cat config.js 
{
  graphitePort: 2004
, graphiteHost: "127.0.0.1"
, port: 8125
, flushInterval: 10000
, dumpMessages: true
, debug: true
, backends: [ "./backends/graphite" ]
}

[danny@dannyfallon ~/src/statsd (master)]$ node stats.js config.js 
19 Jun 15:05:29 - reading config file: config.js
19 Jun 15:05:29 - server is up
19 Jun 15:05:29 - DEBUG: Loading backend: ./backends/graphite
19 Jun 15:05:39 - DEBUG: numStats: 2
19 Jun 15:05:48 - DEBUG: test_gauge:123|g
19 Jun 15:05:49 - DEBUG: numStats: 4
19 Jun 15:05:59 - DEBUG: numStats: 4
...

At the same time I was using tcpdump to see what was going to Graphite and I see the gauge is sent every flush interval:

[danny@dannyfallon ~/src/statsd (master)]$ sudo tcpdump -l -t -A -s0 dst port 2004 | grep gauges
tcpdump: data link type PKTAP
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on pktap, link-type PKTAP (Packet Tap), capture size 65535 bytes
stats.gauges.test_gauge 123 1403186760
stats.gauges.test_gauge 123 1403186770
stats.gauges.test_gauge 123 1403186780
stats.gauges.test_gauge 123 1403186790
stats.gauges.test_gauge 123 1403186800
stats.gauges.test_gauge 123 1403186810
...

Statsite

[danny@dannyfallon ~/src/statsite (master)]$ cat statsite.conf
[statsite]
port = 8125
udp_port = 8125
flush_interval = 10
timer_eps = 0.01
set_eps = 0.02
stream_cmd = python sinks/graphite.py 127.0.0.1 2004

[danny@dannyfallon ~/src/statsite (master)]$ ./statsite -f statsite.conf
Jun 19 15:18:28 dannyfallon.local statsite[51551] <Info>: Starting statsite.
Jun 19 15:18:28 dannyfallon.local statsite[51551] <Info>: stdin is disabled
Jun 19 15:18:28 dannyfallon.local statsite[51551] <Info>: Listening on tcp ':::8125'
Jun 19 15:18:28 dannyfallon.local statsite[51551] <Info>: Listening on udp ':::8125'.
Jun 19 15:19:55 dannyfallon.local statsite[51551] <Debug>: Accepted client connection: 0.0.0.0 50940 [8]
Jun 19 15:19:55 dannyfallon.local statsite[51551] <Debug>: Closed client connection. [8]
Jun 19 15:19:55 dannyfallon.local statsite[51551] <Debug>: Closed connection. [8]

And the tcpdump output:

[danny@dannyfallon ~/src/statsite (master)]$ sudo tcpdump -l -t -A -s0 dst port 2004 | grep gauges
tcpdump: data link type PKTAP
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on pktap, link-type PKTAP (Packet Tap), capture size 65535 bytes
>....z'.statsite.gauges.test_gauge 123.000000 1403187598
^C171 packets captured
10852 packets received by filter
0 packets dropped by kernel

[danny@dannyfallon ~/src/statsd (master)]$ date +%s
1403187875

As you can see, I left the tcpdump running for almost 300 seconds (or around 30 flush intervals) and the gauge was sent just once, when I initially set it.

This is the same bug mentioned in this issue.

On further investigation, I tried to decrement the gauge by sending test_gauge:-123|g, as in your documentation. I expected the value to be 0, since the gauge was initialised as 123. What I got was -123. This is the result from tcpdump:

tcpdump: data link type PKTAP
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on pktap, link-type PKTAP (Packet Tap), capture size 65535 bytes
>..8.}+.statsite.gauges.test_gauge -123.000000 1403188388

If you want statsd compatibility, then gauges should be kept in memory, and all gauges should be sent with every flush. Keeping the gauge in memory also allows delta operations to be performed.

StatsD offers an optional config variable that lets you skip sending gauges in every flush, but it still keeps them in memory so you can do delta operations from time to time.
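For what it's worth, the statsd semantics described above fit in a few lines; a sketch of the in-memory store with delta handling (the names are mine, not statsite's):

```python
def apply_gauge(gauges, key, raw):
    """statsd gauge semantics: a leading '+' or '-' makes the value a delta
    against the stored gauge; a bare number replaces it outright."""
    if raw.startswith(("+", "-")):
        gauges[key] = gauges.get(key, 0.0) + float(raw)
    else:
        gauges[key] = float(raw)
    return gauges[key]
```

With this store, the sequence from the report above ('123' then '-123') yields 0, and the flusher would emit every key in `gauges` each interval instead of only freshly received values.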
