influxdata / telegraf

The plugin-driven server agent for collecting & reporting metrics.

Home Page: https://influxdata.com/telegraf

License: MIT License

Go 99.42% Shell 0.29% Makefile 0.14% Ragel 0.09% Ruby 0.02% Dockerfile 0.01% Python 0.04%

telegraf monitoring time-series metrics


telegraf's Issues

Disabling measurements in plugins?

It's not clear if this is possible (but it'd be a nice feature if not), but is there a way to only select certain measurements from a plugin?

E.g. if I'm only interested in the mem_free and mem_used measurements, can I avoid storing the 9 other mem_ measurements that I'm not using?
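To make the request concrete, here is a minimal sketch of what such filtering could look like. The `pass` list and the `filterMeasurements` helper are hypothetical names made up for illustration, not existing Telegraf options or functions.

```go
package main

import "fmt"

// filterMeasurements keeps only the measurements whose names appear in pass.
// The "pass" list is a hypothetical option, not an existing Telegraf setting.
func filterMeasurements(points map[string]float64, pass []string) map[string]float64 {
	allowed := make(map[string]bool, len(pass))
	for _, name := range pass {
		allowed[name] = true
	}
	kept := make(map[string]float64)
	for name, value := range points {
		if allowed[name] {
			kept[name] = value
		}
	}
	return kept
}

func main() {
	// Made-up sample values standing in for what a mem plugin gathers.
	gathered := map[string]float64{
		"mem_free":   1024,
		"mem_used":   2048,
		"mem_cached": 512,
	}
	kept := filterMeasurements(gathered, []string{"mem_free", "mem_used"})
	fmt.Println("measurements kept:", len(kept))
}
```

Filtering before the write means the unwanted series are never created in the database.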

Openstack vm support?

libvirt support?

Telegraf should report version number on startup

1000+ series per host causes slow performance in influxdb

I'm running a test using influxdb 9.2 nightly and telegraf 0.1.3 on a single host with SSD. I am sending metrics from 500 hosts to influxdb using telegraf's default plugins and config. Within a week, influxdb begins returning 500 timeout errors to all requests. The database size is 340G with 137,269 unique series.

In https://influxdb.com/docs/v0.9/concepts/schema_and_data_layout.html, it mentions keeping tag cardinality <100k so I dug in to see where all of the tags are coming from. Each VM host sends about 150 series while a physical hypervisor sends over 1000.

Here is the breakdown from one physical server:
disk metrics: 180 (6 metrics * 30 paths)
cpu metrics: 352 (11 data points * 32 cores)
io metrics: 378 (7 metrics * 54 disk partitions)
net metrics: 128 (8 metrics * 16 interfaces)
load/swap metrics: 9

Here is the best case result (with filtering and CPU aggregation):
disk metrics: 48 (6 metrics * 8 paths)
cpu metrics: 11 (11 data points * 1 aggregate cpu value)
io metrics: 112 (7 metrics * 16 disks in iostat)
net metrics: 128 (8 metrics * 16 interfaces)
load/swap metrics: 9

The same update of a VM host reduces the per-host metric count from 146 to 50. The doc says "As a rule of thumb, keep tag cardinality below 100,000." With 1000 series per host, that is at most 100 servers. Even at the low end, 50 metrics per host allows only around 2,000 servers. Is this a limitation that can be addressed in telegraf or influxdb?
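The arithmetic behind these estimates can be checked with a short sketch. It assumes the 100,000 figure is treated as a budget on total series across all hosts, which is how the numbers above use it; `maxHosts` is a helper named here for illustration.

```go
package main

import "fmt"

// maxHosts returns how many hosts fit under a series budget when each host
// writes seriesPerHost series.
func maxHosts(budget, seriesPerHost int) int {
	return budget / seriesPerHost
}

func main() {
	// Per-host counts from the physical-server breakdown above:
	// disk + cpu + io + net + load/swap.
	physical := 6*30 + 11*32 + 7*54 + 8*16 + 9
	fmt.Println("series on one physical server:", physical)

	const budget = 100000 // rule-of-thumb ceiling quoted from the docs
	fmt.Println("hosts at 1000 series each:", maxHosts(budget, 1000))
	fmt.Println("hosts at 50 series each:", maxHosts(budget, 50))
}
```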

I love the simplicity and drop-in nature of the telegraf/influxdb/grafana stack. But the system needs to be able to scale out of the box. Multiple databases would make grafana more complex, but it might be a short-term solution. Is there a better way?

Unable to build telegraf, yet circle ci says the last build of master succeeds?

Last build... https://circleci.com/gh/influxdb/telegraf/47
Not sure if I am missing something?

Using instructions from https://github.com/influxdb/telegraf/wiki/Building-from-source

andrew@andrew-laptop:~/Desktop/projects/github/ajohnstone/telegraf$ gvm install go1.4
Downloading Go source...
Installing go1.4...
 * Compiling...

andrew@andrew-laptop:~/Desktop/projects/github/ajohnstone/telegraf$ gvm use go1.4 --default
Now using version go1.4

andrew@andrew-laptop:~/Desktop/projects/github/ajohnstone/telegraf$ go get -u github.com/influxdb/telegraf/...

# github.com/influxdb/telegraf/plugins/kafka_consumer
/home/andrew/.gvm/pkgsets/go1.4/global/src/github.com/influxdb/telegraf/plugins/kafka_consumer/kafka_consumer.go:76: cannot use k.Consumer.Messages() (type <-chan *"github.com/Shopify/sarama".ConsumerMessage) as type <-chan *"gopkg.in/Shopify/sarama.v1".ConsumerMessage in argument to readFromKafka
/home/andrew/.gvm/pkgsets/go1.4/global/src/github.com/influxdb/telegraf/plugins/kafka_consumer/kafka_consumer.go:76: cannot use k.Consumer.CommitUpto (type func(*"github.com/Shopify/sarama".ConsumerMessage) error) as type ack in argument to readFromKafka

andrew@andrew-laptop:~/Desktop/projects/github/ajohnstone/telegraf$ cd $GOPATH/src/github.com/influxdb/telegraf

andrew@andrew-laptop:~/.gvm/pkgsets/go1.4/global/src/github.com/influxdb/telegraf$ ./release.sh 
Building Telegraf version 0.9.b1
=> darwin-amd64: go build runtime: darwin/amd64 must be bootstrapped using make.bash
du: cannot access ‘pkg/telegraf-darwin-amd64’: No such file or directory
=> linux-amd64: # github.com/influxdb/telegraf/plugins/kafka_consumer
plugins/kafka_consumer/kafka_consumer.go:76: cannot use k.Consumer.Messages() (type <-chan *"github.com/Shopify/sarama".ConsumerMessage) as type <-chan *"gopkg.in/Shopify/sarama.v1".ConsumerMessage in argument to readFromKafka
plugins/kafka_consumer/kafka_consumer.go:76: cannot use k.Consumer.CommitUpto (type func(*"github.com/Shopify/sarama".ConsumerMessage) error) as type ack in argument to readFromKafka
du: cannot access ‘pkg/telegraf-linux-amd64’: No such file or directory
=> linux-386: go build runtime: linux/386 must be bootstrapped using make.bash
du: cannot access ‘pkg/telegraf-linux-386’: No such file or directory
=> linux-arm: go build runtime: linux/arm must be bootstrapped using make.bash
du: cannot access ‘pkg/telegraf-linux-arm’: No such file or directory

Aggregate and Analyze Syslog

Is it possible to do with Telegraf what Fluentd does to aggregate and analyze Syslog with InfluxDB?

http://www.fluentd.org/guides/recipes/syslog-influxdb

not reinventing the wheel

If that is something you are interested in, there is already a nice cross-platform library called sigar that provides most of the system data: cpu, memory, network interfaces, processes, ...
( https://github.com/hyperic/sigar )

I currently use it successfully for a similar project on:

FreeBSD
Linux
OpenBSD
Illumos (not entirely)

(it also supports Mac OS X, my development server)

Automated builds... No circleci

influxdb and telegraf conflicting for ownership of /var/run/influxdb/ directory

Installing both influxdb and telegraf on the same host results in one or the other failing to start correctly from the init.d scripts (on Ubuntu)

The problem seems to be down to the ownership of the /var/run/influxdb directory - If influx is installed first, then the directory will be owned by influxdb:influxdb and the telegraf:telegraf user is not able to write a pid file into there and the daemon fails to start. If the packages are installed in the opposite order, then the telegraf user will own the dir and influx will fail to start.

It's not a hard problem to work around, but it would be nice if the two packages did not conflict with each other.

Redis protocol uses "\r\n" as line ending

The Redis protocol actually needs \r\n as the line ending. Newer versions of Redis seem to handle both, but older ones such as 2.4, the version in the Wheezy repository, hang when given only \n.

See redis protocol documentation
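A minimal sketch of the fix, assuming the plugin sends an inline command such as INFO over a plain connection. `inlineCommand` is a hypothetical helper, not a function from the Telegraf source.

```go
package main

import (
	"fmt"
	"strings"
)

// inlineCommand formats a Redis inline command and terminates it with CRLF,
// which is the line ending the Redis protocol specifies.
func inlineCommand(parts ...string) string {
	return strings.Join(parts, " ") + "\r\n"
}

func main() {
	// %q makes the trailing \r\n visible in the printed output.
	fmt.Printf("%q\n", inlineCommand("INFO"))
}
```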

setting `hostname` in config file generates no metadata in influxdb

From the comments on the README, I would assume that setting

# Configuration for tivan itself
 [agent]
 interval = "10s"
 debug = false
 hostname = "catalyst"

should add a hostname=catalyst tag to all points written, but I do not see that tag key or value anywhere in the data.

remove sample config from repo?

Right now there's a tivan.toml in the root of the repo. It's non-functional. Valid configs must be generated with tivan -sample-config > file.toml.

On the one hand, it's confusing to have a non-working config ship with the repo. On the other hand, it's nice to have an example of the config in the repo so people can see it without downloading the app and generating one.

redis servers is not tagged to multiple ports

in config:
[redis]

servers = ["10.0.0.13:6386", "10.0.0.13:6380", "10.0.0.13:6381", "10.0.0.13:6382", "10.0.0.13:6383", "10.0.0.13:6384", "10.0.0.13:6385"]

in influx 0.9.1:

show measurements
...
redis_total_commands_processed
...

select * from redis_total_commands_processed
....
name: redis_total_commands_processed
tags: host=wpr01
time value

2015-07-22T11:31:32.905606408Z 2387730910
2015-07-22T11:31:37.909249167Z 2387738534
2015-07-22T11:31:42.907242355Z 2387746082
....

Where can I see the port in the tags, e.g. "port: 6380" (6381...6386)?

package.sh swaps to master branch if .git/config contains upstream

If you wish to test whether package.sh works on a different branch, you must make sure that .git/config does not contain a branch definition, such as:

[branch "logrotation-v2"]
    remote = origin
    merge = refs/heads/logrotation-v2

If this definition exists, then package.sh swaps to the master branch and builds that.

Without that entry, the git pull fails (see below), but package.sh carries on with the current branch, allowing the test package build to work.

There is no tracking information for the current branch.
Please specify which branch you want to merge with.
See git-pull(1) for details

    git pull <remote> <branch>

If you wish to set tracking information for this branch you can do so with:

    git branch --set-upstream-to=origin/<branch> logrotation-v2

package github.com/srfraser/telegraf: exit status 1

Create DB automatically if not exist

I think it should be a custom option in config

Update telegraf to use new line protocol rather than JSON

Given the major performance improvements it seems we should have telegraf use the line protocol rather than the JSON protocol, especially since the JSON protocol is likely to eventually be deprecated.

Here's more on the protocol: influxdata/influxdb#2696

how to compile from sources ?

Can you explain how to compile from source?

I have installed Go and GVM, but the next steps are not clear to me. How do I build the .deb?

Thanks !

Working demo

Amazing solution but the documentation doesn't really go all the way through with a working example, which makes it relatively hard for a non-expert to start using it. Can someone fix that?

Sending metrics to sensu agent

Feature request: be able to send output data to a Sensu agent.

Support for libvirt/kvm

Are there plans to support polling metrics from libvirt/kvm?

Fail to run go get?

I'm trying to write my first plugin and want to build it manually. When I run go get in ~/go/src/github.com/influxdb/telegraf/cmd/telegraf, I get:

../../../influxdb/meta/store.go:221: config.Logger undefined (type *raft.Config has no field or method Logger)
../../../influxdb/meta/store.go:386: invalid operation: s.raft.Leader() != "" (mismatched types net.Addr and string)
../../../influxdb/meta/store.go:404: invalid operation: s.raft.Leader() != "" (mismatched types net.Addr and string)
../../../influxdb/meta/store.go:435: cannot use s.raft.Leader() (type net.Addr) as type string in return argument
../../../influxdb/meta/store.go:457: cannot use a (type []string) as type []net.Addr in argument to s.raft.SetPeers
../../../influxdb/meta/store.go:1291: invalid operation: leader == "" (mismatched types net.Addr and string)
../../../influxdb/meta/store.go:1296: cannot use leader (type net.Addr) as type string in argument to net.DialTimeout

Any ideas?

advantages of telegraf over heka

telegraf seems to have good overlap with heka:
https://github.com/mozilla-services/heka

heka seems to have the InfluxDB 0.9.x+ line protocol almost complete.
mozilla-services/heka#1574
mozilla-services/heka#1595

what is the advantage of telegraf over heka?
Perhaps it would be nice to include that information in the readme.

thanks.

telegraf\plugins\system\ps.go:70: cannot use du (type disk.DiskUsageStat) as type *disk.DiskUsageStat in append

Tried to build Telegraf with go version go1.4.2 windows/amd64, got the error in the title,
telegraf\plugins\system\ps.go:70: cannot use du (type disk.DiskUsageStat) as type *disk.DiskUsageStat in append

[feature request] On launch telegraf should create `telegraf` database if it doesn't already

Currently the user has to create a telegraf database manually. The tool should do this for us.

Unable to package telegraf due to missing logrotate -- cp: cannot stat ‘etc/logrotate.d/telegraf’: No such file or directory

$ go get -u github.com/influxdb/telegraf/...
$ cd $GOPATH/src/github.com/influxdb/telegraf
$ ./package.sh 1

Starting package process...

/home/andrew/.gvm/bin/gvm
Now using version go1.4.2
GOPATH (/home/andrew/.gvm/pkgsets/go1.4.2/global) looks sane, using /home/andrew/.gvm/pkgsets/go1.4.2/global for installation.
Git tree is clean.
From https://github.com/influxdb/telegraf
 * branch            master     -> FETCH_HEAD
Already up-to-date.
Git tree updated successfully.
Build completed successfully.
telegraf copied to /tmp/tmp.MgYgIBqRvH//opt/telegraf/versions/1
scripts/init.sh copied to /tmp/tmp.MgYgIBqRvH//opt/telegraf/versions/1/scripts
cp: cannot stat ‘etc/logrotate.d/telegraf’: No such file or directory
Failed to copy etc/logrotate.d/telegraf to packaging directory -- aborting.

CPU Usage Plugin

I want to add support for CPU Usage (percentage). I've generally seen this done by querying /proc/stat, sleeping for a second, querying again, and calculating the percentage from the diff. Does this sound reasonable if added to the current cpu plugin?
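The approach described above can be sketched as follows. The two sample lines are made up and stand in for reads of the aggregate "cpu" line of /proc/stat taken one second apart; `cpuTimes`, `parseCPULine` and `usagePercent` are names invented for this example. This sketch counts iowait as busy time, which is a choice rather than a given.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuTimes holds the total and idle jiffies parsed from one "cpu" line.
type cpuTimes struct {
	total, idle uint64
}

// parseCPULine parses the aggregate "cpu" line of /proc/stat.
// Columns after the label are: user nice system idle iowait irq softirq steal.
// Guest columns that may follow are skipped because guest time is already
// included in user time.
func parseCPULine(line string) (cpuTimes, error) {
	fields := strings.Fields(line)
	if len(fields) < 5 || fields[0] != "cpu" {
		return cpuTimes{}, fmt.Errorf("not an aggregate cpu line: %q", line)
	}
	var t cpuTimes
	for i, f := range fields[1:] {
		if i >= 8 {
			break
		}
		v, err := strconv.ParseUint(f, 10, 64)
		if err != nil {
			return cpuTimes{}, err
		}
		t.total += v
		if i == 3 { // the fourth column is idle time
			t.idle = v
		}
	}
	return t, nil
}

// usagePercent returns the busy share of the interval between two samples.
func usagePercent(prev, cur cpuTimes) float64 {
	if cur.total <= prev.total || cur.idle < prev.idle {
		return 0 // no elapsed time or counters went backwards
	}
	totalDelta := cur.total - prev.total
	idleDelta := cur.idle - prev.idle
	return 100 * float64(totalDelta-idleDelta) / float64(totalDelta)
}

func main() {
	prev, err := parseCPULine("cpu 100 0 50 800 50 0 0 0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	cur, err := parseCPULine("cpu 160 0 80 900 60 0 0 0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("cpu usage: %.1f%%\n", usagePercent(prev, cur))
}
```

In a real plugin the previous sample could be kept between gather intervals, which would avoid the one-second sleep.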

Wrong format for InfluxDB 0.9.1?

I'm getting these in the log

[http] 2015/07/03 10:32:26 127.0.0.1 - - [03/Jul/2015:10:32:26 +0200] POST /write?consistency=&db=telegraf&precision=&rp= HTTP/1.1 400 106 - InfluxDBClient 08d797be-215e-11e5-8005-000000000000 2.541832ms

When running telegraf_0.1.2_amd64 on Ubuntu Server 14.04 LTS

Create a "logstreamer" plugin

Inspired by issue #48, create a plugin for aggregating and pushing data from log files, allowing user-defined regex filters.

This would behave in a similar manner to heka's logstreamer plugin: https://hekad.readthedocs.org/en/v0.9.2/pluginconfig/logstreamer.html#logstreamerplugin

/cc @steverweber

Windows version

Hi,
Is a Windows version planned?
thanks for your feedback

Changelog references Influxdb repo issues/pull requests instead of telegraf repo

https://github.com/influxdb/telegraf/blob/bbc6fa57fa6d594e8095107eb74860e6c239b23c/CHANGELOG.md

- [#35](https://github.com/influxdb/influxdb/pull/35): Add Kafka plugin. Thanks @EmilS!

Should be:

- [#35](https://github.com/influxdb/telegraf/pull/35): Add Kafka plugin. Thanks @EmilS!

Language-agnostic Plugins

Does Telegraf plan to allow users to write plugins in any language? Being able to quickly write something in Bash or Python would probably make a lot of ops people happy. Quite often I only need to collect a measurement once an hour or so, so the extra overhead would not be an issue at all.

Maybe this could be implemented as a plugin that just runs a configured list of commands with some configuration in env vars.

SNMP/ICMP Support?

Are you planning on supporting remote services, like SNMP or ICMP?

Support sending metrics to Riemann

Are you planning on supporting different "output" methods like writing metrics to Riemann?

Add Docker labels as tags.

Telegraf currently supports only the id, name and command tags for containers.
It does not import the actual labels associated with the container.

https://docs.docker.com/userguide/labels-custom-metadata/.

This is quite important with things like ECS.

change name from Tivan to Telegraf

rebranding to commence in 3...2...1...

Error in `docker` plugin: invalid integer.

Error in plugins: unable to parse 'docker_memory_limit,command=/opt/telegraf/telegraf\ -config\ /opt/telegraf/telegraf.toml,host=ebfbff1ca0aa,id=ebfbff1ca0aa8a6b047bdc038ed0d14cfba8495f9f4eca1ce0d588c2cb1bd051,name=/influxdb09telegraf01_telegraf_run_4 value=18446744073709551615': invalid integer

I am not sure whether that value is right. It is too large for an int. It seems telegraf can't send metrics to influxdb, possibly because of this error.

I ran and tested telegraf in a Docker container. Is that unsuitable for gathering metrics?

FROM ubuntu:14.04
RUN apt-get -y install wget
RUN wget http://get.influxdb.org/telegraf/telegraf_0.1.4_amd64.deb 
RUN dpkg -i telegraf_0.1.4_amd64.deb
ADD ./telegraf.toml /opt/telegraf/
WORKDIR /opt/telegraf
CMD ["/opt/telegraf/telegraf","-config","/opt/telegraf/telegraf.toml"]

telegraf:
    build: .
    dockerfile: telegraf.Dockerfile
    volumes: 
        - /sys:/sys:ro
        - /var/run/docker.sock:/var/run/docker.sock

Hostname is not passed as a tag

I installed telegraf using the Debian/Ubuntu package on the README file. It is sending metrics to influxdb 0.9. The hostname of the machine running telegraf is not being passed as a tag with metrics.

debug setting in config doesn't work

In the config file, the debug setting is non-functional:

# Configuration for tivan itself
 [agent]
 interval = "10s"
 debug = false
 hostname = "catalyst"

Whether debug is set to true or false, no debugging output happens. I can use tivan -config tivan.toml -debug to get debug output. I'd be fine with that being the only option.

If removing the debug setting from the config is simple I vote we do that. It seems better as a command-line flag anyway.

Add support for StatsD style aggregator

We should support the StatsD protocol and aggregation. However, unlike StatsD, the metric names should follow the conventions of the key section of the InfluxDB line protocol.

The StatsD values should be output as a single field called value. This should be able to flush to any of the output sinks like what is mentioned in #35.

This means that a single Telegraf instance could serve as a StatsD aggregator that works with the InfluxDB schema design of measurements and tags.
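A rough sketch of the idea, limited to StatsD counters. It assumes the metric name is already written in line-protocol key form (measurement plus tags) and that each flush emits one line per key with a single field called value. All function names are invented for this example.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseCounter splits a StatsD counter line such as "requests,host=a:1|c"
// into its key (measurement plus tags) and its integer value.
func parseCounter(line string) (string, int64, error) {
	sep := strings.LastIndex(line, ":")
	if sep < 0 {
		return "", 0, fmt.Errorf("missing ':' in %q", line)
	}
	parts := strings.Split(line[sep+1:], "|")
	if len(parts) < 2 || parts[1] != "c" {
		return "", 0, fmt.Errorf("not a counter: %q", line)
	}
	value, err := strconv.ParseInt(parts[0], 10, 64)
	if err != nil {
		return "", 0, err
	}
	return line[:sep], value, nil
}

// aggregate sums counter values per key until the next flush.
func aggregate(lines []string) (map[string]int64, error) {
	totals := make(map[string]int64)
	for _, line := range lines {
		key, value, err := parseCounter(line)
		if err != nil {
			return nil, err
		}
		totals[key] += value
	}
	return totals, nil
}

// flush renders each total as one line with a single field named "value".
func flush(totals map[string]int64) []string {
	keys := make([]string, 0, len(totals))
	for key := range totals {
		keys = append(keys, key)
	}
	sort.Strings(keys) // sorted so the output order is deterministic
	out := make([]string, 0, len(keys))
	for _, key := range keys {
		out = append(out, fmt.Sprintf("%s value=%d", key, totals[key]))
	}
	return out
}

func main() {
	totals, err := aggregate([]string{
		"requests,host=a:1|c",
		"requests,host=a:2|c",
		"errors,host=a:1|c",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, line := range flush(totals) {
		fmt.Println(line)
	}
}
```

Gauges, timers and sampling rates are left out to keep the sketch short.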

Installing together telegraf-0.1.1-1.x86_64.rpm and influxdb-0.9.0-1.x86_64.rpm

In a Centos 7 host I have installed the Influxdb package first and then the telegraf package.

The telegraf package overwrites the startup script of influxdb "/opt/influxdb/init.sh".
In the /etc/init.d both links are pointing to the same file:

lrwxrwxrwx. 1 root root 21 jul 1 15:00 influxdb -> /opt/influxdb/init.sh
lrwxrwxrwx. 1 root root 21 jul 1 15:06 telegraf -> /opt/influxdb/init.sh

The consequence is that installing the telegraf package breaks the influxdb startup scripts.

Add logrotate script for telegraf.log to rpm/deb packaging

Something like this in /etc/logrotate.d/telegraf would work for redhat/centos (rotating weekly, keep for a month):

/var/log/telegraf/telegraf.log {
    weekly
    rotate 4
    missingok
    nocreate
    postrotate
        /sbin/service telegraf restart > /dev/null 2>/dev/null || true
    endscript
}

error getting docker info: No such file or directory

Running telegraf 0.1.4 on CentOS 7 with Docker 1.7.1 fails to gather docker metrics. What I get is

error getting docker info: open /sys/fs/cgroup/cpuacct/docker//cpuacct.stat: no such file or directory

I tried running find / -name docker -type d and only found:

/run/docker
/var/lib/docker
/etc/docker
/usr/libexec/docker

Perhaps the cgroup paths are different in CentOS/RHEL 7?

Packager: Influxdb and telegraf share same init.sh path

Since influxdb and telegraf share the same ROOT DIR's, the init.sh file will get overwritten when both are installed via RPM.

ls -lah /etc/init.d/{telegraf,influxdb}
lrwxrwxrwx 1 root root 21 Jun 22 15:53 /etc/init.d/influxdb -> /opt/influxdb/init.sh
lrwxrwxrwx 1 root root 21 Jun 22 16:06 /etc/init.d/telegraf -> /opt/influxdb/init.sh

Suggest changing the following with s/influxdb/telegraf/ so that telegraf is treated independently of influxdb:

INSTALL_ROOT_DIR=/opt/influxdb
TELEGRAF_LOG_DIR=/var/log/influxdb
CONFIG_ROOT_DIR=/etc/opt/influxdb

update integer output to new line protocol format

With 0.9.3, the line protocol has changed for integers: influxdata/influxdb#3526

This change means that current telegraf users who upgrade to 0.9.3 cannot write a number of values, because they are integer fields in the db but the current telegraf sends them without the trailing i, meaning they parse as floats. That causes a type mismatch and a write failure.
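A minimal sketch of the formatting difference, assuming field values arrive as int64 or float64. `formatField` is a hypothetical helper, not the InfluxDB client API.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatField renders a field value for the line protocol.
// Integers get a trailing "i" so the server does not parse them as floats.
func formatField(v interface{}) string {
	switch n := v.(type) {
	case int64:
		return strconv.FormatInt(n, 10) + "i"
	case float64:
		return strconv.FormatFloat(n, 'f', -1, 64)
	default:
		// Other types are out of scope for this sketch.
		return fmt.Sprint(n)
	}
}

func main() {
	fmt.Println("mem_free value=" + formatField(int64(1024)))
	fmt.Println("load1 value=" + formatField(0.75))
}
```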

Telegraf Ctrl-C lag

When ^C'ing Telegraf, there is quite a bit of lag. It should be able to exit immediately.

Reload configuration without downtime

Are you planning on supporting config reloading? nginx -s reload comes to mind.

I'm thinking of a use case where telegraf is monitoring a cluster of servers. Some nodes fail and new ones are added -- the config needs to be updated.

A reload command would happen within the gather interval so metrics would not be dropped.

Running raw `telegraf` command without a config file causes error

Simply running telegraf by itself yields this error

2015/06/23 13:51:16 Error in plugins: Post write?consistency=&db=&precision=&rp=: unsupported protocol scheme ""

If you generate a sample config and run it with the config file, everything works fine.

Tags no longer passed in due to change in Client

I was experimenting with Telegraf, and I am pretty impressed with the ease of getting it set up and the potential for adding plugins. That said, I noticed that tags don't seem to be passed at the point level, not even the default "host" that I would have expected after finding issue #4.

I think I tracked down the cause.

It looks like a recent change in the influxdb client ( influxdata/influxdb@e6c36d5 ) has made it so that passing common parameters at the batch level is no longer allowed.

If we look at the agent.go code, it appears that the "Tags" were passed at the top of the batch level and expected to be inherited down through the "sub" points. This seems to be leveraging the now removed common batch parameters functionality.

    close(points)

    var acc BatchPoints
    acc.Tags = a.Config.Tags
    acc.Time = time.Now()
    acc.Database = a.Config.Database

    for sub := range points {
        acc.Points = append(acc.Points, sub.Points...)
    }

I am not familiar enough with Go to effectively fix and properly add unit testing, but hopefully this is enough for someone to jump in and take it from here.
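One possible direction, sketched with stand-in types: copy the global tags onto every point before the batch is written, so nothing depends on batch-level inheritance. `Point` and `withGlobalTags` are hypothetical and do not match the InfluxDB client's actual types.

```go
package main

import "fmt"

// Point is a simplified stand-in for a client point, defined for this sketch.
type Point struct {
	Measurement string
	Tags        map[string]string
	Value       float64
}

// withGlobalTags returns copies of points with the global tags merged in.
// A point's own tags win when a key appears in both maps.
func withGlobalTags(points []Point, global map[string]string) []Point {
	out := make([]Point, len(points))
	for i, p := range points {
		merged := make(map[string]string, len(global)+len(p.Tags))
		for k, v := range global {
			merged[k] = v
		}
		for k, v := range p.Tags {
			merged[k] = v
		}
		p.Tags = merged
		out[i] = p
	}
	return out
}

func main() {
	points := []Point{
		{Measurement: "cpu_user", Tags: map[string]string{"cpu": "cpu0"}, Value: 1.5},
		{Measurement: "mem_free", Value: 1024},
	}
	for _, p := range withGlobalTags(points, map[string]string{"host": "catalyst"}) {
		fmt.Println(p.Measurement, p.Tags)
	}
}
```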

nothing prevents launching multiple instances of telegraf

Not sure if that's intentional or not, but it could lead to confusion. I can see a use case, where I want to sample different metrics at different intervals.

What's the intention?

Support sending metrics to Kafka

We should be able to send metrics to a variety of places other than InfluxDB. This means that we'll need to have some sort of framework for defining new output sinks. This also means that we should be able to disable sending metrics to any of these output sinks, including InfluxDB.

What I'm thinking is to pull the InfluxDB settings into another area like outputs. Then, like plugins, define a number of different outputs where metrics can be sent to.

For Kafka, we should send metrics using the line protocol. This should be fairly simple using the InfluxDB client to convert the metrics to their line protocol equivalents.

The work to add support for Riemann (#34) will also need this output sink implementation.
