influxdata / telegraf

The plugin-driven server agent for collecting & reporting metrics.

Home Page: https://influxdata.com/telegraf

License: MIT License

Go 99.42% Shell 0.29% Makefile 0.14% Ragel 0.09% Ruby 0.02% Dockerfile 0.01% Python 0.04%
telegraf monitoring time-series metrics

telegraf's People

Contributors

akrantz01, brocaar, danielnelson, davidgs, dependabot[bot], evanphx, glinton, helenosheaa, hipska, ivorybilled, jacobmarble, jipperinbham, myalongmire, phemmer, pierref, powersj, prydin, reimda, russorat, sjwang90, sparrc, srebhan, srfraser, ssoroka, sspaink, telegraf-tiger[bot], titilambert, toddboom, trovalo, zak-pawel

telegraf's Issues

Disabling measurements in plugins?

It's not clear whether this is possible (it would be a nice feature if not): is there a way to select only certain measurements from a plugin?

E.g. if I'm only interested in the mem_free and mem_used measurements, can I avoid storing the 9 other mem_ measurements that I'm not using?
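A minimal sketch of how a per-plugin allow-list could work, assuming a hypothetical `pass` list of glob patterns (matched with Go's `path.Match`) applied to measurement names before they are written. `keepMeasurement` and `filterPoints` are illustrative names, not existing telegraf functions:

```go
package main

import (
	"fmt"
	"path"
)

// keepMeasurement reports whether name matches any glob pattern in pass.
// An empty pass list keeps everything. Patterns use path.Match syntax,
// so "mem_*" matches "mem_free" and "mem_used".
func keepMeasurement(name string, pass []string) bool {
	if len(pass) == 0 {
		return true
	}
	for _, pattern := range pass {
		if ok, err := path.Match(pattern, name); err == nil && ok {
			return true
		}
	}
	return false
}

// filterPoints returns only the measurements selected by pass.
func filterPoints(points map[string]float64, pass []string) map[string]float64 {
	out := make(map[string]float64)
	for name, v := range points {
		if keepMeasurement(name, pass) {
			out[name] = v
		}
	}
	return out
}

func main() {
	// Hypothetical readings from the mem plugin.
	mem := map[string]float64{"mem_free": 1024, "mem_used": 3072, "mem_cached": 512, "mem_buffered": 128}
	kept := filterPoints(mem, []string{"mem_free", "mem_used"})
	fmt.Println(len(kept)) // 2
}
```

Filtering at the agent means the unwanted series never reach the database, which also helps with the cardinality concerns raised below.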

1000+ series per host causes slow performance in influxdb

I'm running a test using InfluxDB 0.9.2 nightly and telegraf 0.1.3 on a single host with an SSD. I am sending metrics from 500 hosts to InfluxDB using telegraf's default plugins and config. Within a week, InfluxDB begins returning 500 timeout errors to all requests. The database size is 340G with 137,269 unique series.

In https://influxdb.com/docs/v0.9/concepts/schema_and_data_layout.html, it mentions keeping tag cardinality <100k so I dug in to see where all of the tags are coming from. Each VM host sends about 150 series while a physical hypervisor sends over 1000.

Here is the breakdown from one physical server:
disk metrics: 180 (6 metrics * 30 paths)
cpu metrics: 352 (11 data points * 32 cores)
io metrics: 378 (7 metrics * 54 disk partitions)
net metrics: 128 (8 metrics * 16 interfaces)
load/swap metrics: 9

Here is the best case result (with filtering and CPU aggregation):
disk metrics: 48 (6 metrics * 8 paths)
cpu metrics: 11 (11 data points * 1 aggregate cpu value)
io metrics: 112 (7 metrics * 16 disks in iostat)
net metrics: 128 (8 metrics * 16 interfaces)
load/swap metrics: 9

Applying the same filtering to a VM host reduces its per-host series count from 146 to 50. The doc says "As a rule of thumb, keep tag cardinality below 100,000." With 1000 series per host, that allows at most 100 servers. Even at the low end, 50 series per host allows only about 2,000 servers. Is this a limitation that can be addressed in telegraf or influxdb?
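The arithmetic above can be checked with a small sketch, assuming every metric/instance combination becomes its own series and using the 100,000 rule-of-thumb budget from the InfluxDB docs. The `component` type and helper names are hypothetical:

```go
package main

import "fmt"

// component is one line of the per-host breakdown: metrics per instance
// multiplied by the number of instances (cores, paths, interfaces, ...).
type component struct {
	name      string
	metrics   int
	instances int
}

// seriesPerHost sums metrics*instances over all components.
func seriesPerHost(cs []component) int {
	total := 0
	for _, c := range cs {
		total += c.metrics * c.instances
	}
	return total
}

// maxHosts is how many identical hosts fit under a series budget.
func maxHosts(budget, perHost int) int {
	return budget / perHost
}

func main() {
	// Unfiltered breakdown of the physical server above.
	physical := []component{
		{"disk", 6, 30},
		{"cpu", 11, 32},
		{"io", 7, 54},
		{"net", 8, 16},
		{"load/swap", 9, 1},
	}
	perHost := seriesPerHost(physical)
	fmt.Println(perHost)                   // 1047
	fmt.Println(maxHosts(100000, perHost)) // 95
	fmt.Println(maxHosts(100000, 50))      // 2000
}
```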

I love the simplicity and drop-in nature of the telegraf/influxdb/grafana stack, but the system needs to be able to scale out of the box. Multiple databases would make grafana more complex, but they might be a short-term solution. Is there a better way?

Unable to build telegraf, yet CircleCI says the last build of master succeeds?

Last build... https://circleci.com/gh/influxdb/telegraf/47
Am I missing something?

Using instructions from https://github.com/influxdb/telegraf/wiki/Building-from-source

andrew@andrew-laptop:~/Desktop/projects/github/ajohnstone/telegraf$ gvm install go1.4
Downloading Go source...
Installing go1.4...
 * Compiling...

andrew@andrew-laptop:~/Desktop/projects/github/ajohnstone/telegraf$ gvm use go1.4 --default
Now using version go1.4

andrew@andrew-laptop:~/Desktop/projects/github/ajohnstone/telegraf$ go get -u github.com/influxdb/telegraf/...

# github.com/influxdb/telegraf/plugins/kafka_consumer
/home/andrew/.gvm/pkgsets/go1.4/global/src/github.com/influxdb/telegraf/plugins/kafka_consumer/kafka_consumer.go:76: cannot use k.Consumer.Messages() (type <-chan *"github.com/Shopify/sarama".ConsumerMessage) as type <-chan *"gopkg.in/Shopify/sarama.v1".ConsumerMessage in argument to readFromKafka
/home/andrew/.gvm/pkgsets/go1.4/global/src/github.com/influxdb/telegraf/plugins/kafka_consumer/kafka_consumer.go:76: cannot use k.Consumer.CommitUpto (type func(*"github.com/Shopify/sarama".ConsumerMessage) error) as type ack in argument to readFromKafka

andrew@andrew-laptop:~/Desktop/projects/github/ajohnstone/telegraf$ cd $GOPATH/src/github.com/influxdb/telegraf

andrew@andrew-laptop:~/.gvm/pkgsets/go1.4/global/src/github.com/influxdb/telegraf$ ./release.sh 
Building Telegraf version 0.9.b1
=> darwin-amd64: go build runtime: darwin/amd64 must be bootstrapped using make.bash
du: cannot access ‘pkg/telegraf-darwin-amd64’: No such file or directory
=> linux-amd64: # github.com/influxdb/telegraf/plugins/kafka_consumer
plugins/kafka_consumer/kafka_consumer.go:76: cannot use k.Consumer.Messages() (type <-chan *"github.com/Shopify/sarama".ConsumerMessage) as type <-chan *"gopkg.in/Shopify/sarama.v1".ConsumerMessage in argument to readFromKafka
plugins/kafka_consumer/kafka_consumer.go:76: cannot use k.Consumer.CommitUpto (type func(*"github.com/Shopify/sarama".ConsumerMessage) error) as type ack in argument to readFromKafka
du: cannot access ‘pkg/telegraf-linux-amd64’: No such file or directory
=> linux-386: go build runtime: linux/386 must be bootstrapped using make.bash
du: cannot access ‘pkg/telegraf-linux-386’: No such file or directory
=> linux-arm: go build runtime: linux/arm must be bootstrapped using make.bash
du: cannot access ‘pkg/telegraf-linux-arm’: No such file or directory

not reinventing the wheel

If you're interested, there is already a nice cross-platform library called sigar that provides most of the system data: CPU, memory, network interfaces, processes, ...
( https://github.com/hyperic/sigar )

I currently use it successfully for a similar project on:

  • FreeBSD
  • Linux
  • OpenBSD
  • Illumos (not entirely)

(it also supports Mac OS X, my development server)

influxdb and telegraf conflicting for ownership of /var/run/influxdb/ directory

Installing both influxdb and telegraf on the same host results in one or the other failing to start correctly from the init.d scripts (on Ubuntu)

The problem seems to be the ownership of the /var/run/influxdb directory. If influxdb is installed first, the directory is owned by influxdb:influxdb, so the telegraf user cannot write a pid file there and the daemon fails to start. If the packages are installed in the opposite order, the telegraf user owns the directory and influxdb fails to start.

It's not a hard problem to work around, but it would be nice if the two packages did not conflict with each other.

setting `hostname` in config file generates no metadata in influxdb

From the comments on the README, I would assume that setting

# Configuration for tivan itself
 [agent]
 interval = "10s"
 debug = false
 hostname = "catalyst"

should add a hostname=catalyst tag to all points written, but I do not see that tag key or value anywhere in the data.

remove sample config from repo?

Right now there's a tivan.toml in the root of the repo. It's non-functional. Valid configs must be generated with tivan -sample-config > file.toml.

On the one hand, it's confusing to have a non-working config ship with the repo. On the other hand, it's nice to have an example of the config in the repo so people can see it without downloading the app and generating one.

redis servers are not tagged with their ports

In the config:
[redis]

servers = ["10.0.0.13:6386", "10.0.0.13:6380", "10.0.0.13:6381", "10.0.0.13:6382", "10.0.0.13:6383", "10.0.0.13:6384", "10.0.0.13:6385"]


In InfluxDB 0.9.1:

show measurements
...
redis_total_commands_processed
...

select * from redis_total_commands_processed
....
name: redis_total_commands_processed
tags: host=wpr01
time value

2015-07-22T11:31:32.905606408Z 2387730910
2015-07-22T11:31:37.909249167Z 2387738534
2015-07-22T11:31:42.907242355Z 2387746082
....

Where do I see the port in the tags, e.g. "port: 6380" (and 6381...6386)?
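A sketch of one way to give each Redis instance its own series, assuming the plugin splits every entry in `servers` into separate tags with Go's `net.SplitHostPort`. The tag names `server` and `port` and the helper `serverTags` are hypothetical:

```go
package main

import (
	"fmt"
	"net"
)

// serverTags splits a "host:port" address from the redis servers list
// into separate tags, so each instance becomes its own series.
func serverTags(addr string) (map[string]string, error) {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return nil, err
	}
	return map[string]string{"server": host, "port": port}, nil
}

func main() {
	servers := []string{"10.0.0.13:6380", "10.0.0.13:6381"}
	for _, s := range servers {
		tags, err := serverTags(s)
		if err != nil {
			panic(err)
		}
		fmt.Println(tags["server"], tags["port"])
	}
}
```

Without such a tag, all seven instances write to the same `host=wpr01` series and overwrite each other's values.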

package.sh swaps to master branch if .git/config contains upstream

If you wish to test whether package.sh works on a different branch, you must make sure that .git/config does not contain a branch definition, such as:

[branch "logrotation-v2"]
    remote = origin
    merge = refs/heads/logrotation-v2

If this definition exists, then package.sh swaps to the master branch and builds that.

Without that entry, the git pull fails (see below), but package.sh carries on with the current branch, allowing the test package build to work.

There is no tracking information for the current branch.
Please specify which branch you want to merge with.
See git-pull(1) for details

    git pull <remote> <branch>

If you wish to set tracking information for this branch you can do so with:

    git branch --set-upstream-to=origin/<branch> logrotation-v2

package github.com/srfraser/telegraf: exit status 1

How to compile from source?

Can you explain how to compile from source?

I have installed Go and GVM, but the next steps aren't clear to me. How do I build a .deb?

Thanks !

Working demo

Amazing solution, but the documentation doesn't walk all the way through a working example, which makes it relatively hard for a non-expert to start using it. Can someone fix that?

Failed to run go get?

I'm trying to write my first plugin and want to build it manually. When I run go get in ~/go/src/github.com/influxdb/telegraf/cmd/telegraf, I get:

../../../influxdb/meta/store.go:221: config.Logger undefined (type *raft.Config has no field or method Logger)
../../../influxdb/meta/store.go:386: invalid operation: s.raft.Leader() != "" (mismatched types net.Addr and string)
../../../influxdb/meta/store.go:404: invalid operation: s.raft.Leader() != "" (mismatched types net.Addr and string)
../../../influxdb/meta/store.go:435: cannot use s.raft.Leader() (type net.Addr) as type string in return argument
../../../influxdb/meta/store.go:457: cannot use a (type []string) as type []net.Addr in argument to s.raft.SetPeers
../../../influxdb/meta/store.go:1291: invalid operation: leader == "" (mismatched types net.Addr and string)
../../../influxdb/meta/store.go:1296: cannot use leader (type net.Addr) as type string in argument to net.DialTimeout

Any ideas?

Unable to package telegraf due to missing logrotate -- cp: cannot stat ‘etc/logrotate.d/telegraf’: No such file or directory

$ go get -u github.com/influxdb/telegraf/...
$ cd $GOPATH/src/github.com/influxdb/telegraf
$ ./package.sh 1

Starting package process...

/home/andrew/.gvm/bin/gvm
Now using version go1.4.2
GOPATH (/home/andrew/.gvm/pkgsets/go1.4.2/global) looks sane, using /home/andrew/.gvm/pkgsets/go1.4.2/global for installation.
Git tree is clean.
From https://github.com/influxdb/telegraf
 * branch            master     -> FETCH_HEAD
Already up-to-date.
Git tree updated successfully.
Build completed successfully.
telegraf copied to /tmp/tmp.MgYgIBqRvH//opt/telegraf/versions/1
scripts/init.sh copied to /tmp/tmp.MgYgIBqRvH//opt/telegraf/versions/1/scripts
cp: cannot stat ‘etc/logrotate.d/telegraf’: No such file or directory
Failed to copy etc/logrotate.d/telegraf to packaging directory -- aborting.

CPU Usage Plugin

I want to add support for CPU Usage (percentage). I've generally seen this done by querying /proc/stat, sleeping for a second, querying again, and calculating the percentage from the diff. Does this sound reasonable if added to the current cpu plugin?
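The diff-based approach can be sketched as below, assuming the standard /proc/stat column order (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice) and counting idle plus iowait as idle time. Only the first eight columns are summed because guest time is already included in user and nice. The helper names are hypothetical, and `main` uses two fixed sample lines instead of reading /proc/stat and sleeping for a second:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuTimes holds idle and total jiffies parsed from a /proc/stat "cpu" line.
type cpuTimes struct {
	idle, total uint64
}

// parseCPULine parses e.g. "cpu  100 0 50 800 50 0 0 0 0 0".
func parseCPULine(line string) (cpuTimes, error) {
	fields := strings.Fields(line)
	if len(fields) < 5 || !strings.HasPrefix(fields[0], "cpu") {
		return cpuTimes{}, fmt.Errorf("not a cpu line: %q", line)
	}
	var t cpuTimes
	for i, f := range fields[1:] {
		if i >= 8 { // skip guest and guest_nice (already in user/nice)
			break
		}
		v, err := strconv.ParseUint(f, 10, 64)
		if err != nil {
			return cpuTimes{}, err
		}
		t.total += v
		if i == 3 || i == 4 { // idle and iowait
			t.idle += v
		}
	}
	return t, nil
}

// usagePercent computes busy time as a percentage of the elapsed time
// between two samples.
func usagePercent(prev, cur cpuTimes) float64 {
	dTotal := cur.total - prev.total
	if dTotal == 0 {
		return 0
	}
	dIdle := cur.idle - prev.idle
	return 100 * float64(dTotal-dIdle) / float64(dTotal)
}

func main() {
	// Two hypothetical samples taken one second apart.
	prev, _ := parseCPULine("cpu  100 0 50 800 50 0 0 0 0 0")
	cur, _ := parseCPULine("cpu  160 0 70 900 70 0 0 0 0 0")
	fmt.Printf("%.1f%%\n", usagePercent(prev, cur)) // 40.0%
}
```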

Wrong format for InfluxDB 0.9.1?

I'm getting entries like this in the log:

[http] 2015/07/03 10:32:26 127.0.0.1 - - [03/Jul/2015:10:32:26 +0200] POST /write?consistency=&db=telegraf&precision=&rp= HTTP/1.1 400 106 - InfluxDBClient 08d797be-215e-11e5-8005-000000000000 2.541832ms

When running telegraf_0.1.2_amd64 on Ubuntu Server 14.04 LTS

Windows version

Hi,
Is a Windows version planned?
Thanks for your feedback.

Language-agnostic Plugins

Does Telegraf plan to allow users to write plugins in any language? Being able to quickly write something in Bash or Python would probably make a lot of ops people happy. Quite often I only need to collect a measurement once an hour or so, so the extra overhead would not be an issue at all.

Maybe this could be implemented as a plugin that just runs a configured list of commands with some configuration in env vars.
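A sketch of that idea, assuming a hypothetical output contract where the script prints one `name value` pair per line and extra settings are passed as environment variables. `runCommand` and `parseOutput` are illustrative names, and the `sh` command in `main` is just an example:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// parseOutput reads "name value" pairs, one per line.
func parseOutput(out string) (map[string]float64, error) {
	metrics := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue // skip blank lines
		}
		if len(fields) != 2 {
			return nil, fmt.Errorf("malformed line: %q", sc.Text())
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return nil, err
		}
		metrics[fields[0]] = v
	}
	return metrics, sc.Err()
}

// runCommand executes a configured command with extra env vars
// and parses its stdout.
func runCommand(name string, args, env []string) (map[string]float64, error) {
	cmd := exec.Command(name, args...)
	cmd.Env = append(os.Environ(), env...)
	out, err := cmd.Output()
	if err != nil {
		return nil, err
	}
	return parseOutput(string(out))
}

func main() {
	m, err := runCommand("sh", []string{"-c", `echo "queue_depth $DEPTH"`}, []string{"DEPTH=42"})
	if err != nil {
		fmt.Println("command failed:", err)
		return
	}
	fmt.Println(m["queue_depth"])
}
```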

Error in `docker` plugin: invalid integer.

Error in plugins: unable to parse 'docker_memory_limit,command=/opt/telegraf/telegraf\ -config\ /opt/telegraf/telegraf.toml,host=ebfbff1ca0aa,id=ebfbff1ca0aa8a6b047bdc038ed0d14cfba8495f9f4eca1ce0d588c2cb1bd051,name=/influxdb09telegraf01_telegraf_run_4 value=18446744073709551615': invalid integer

I'm not sure whether that value is right. 18446744073709551615 is 2^64 - 1, the largest unsigned 64-bit value, so it is too large for a signed 64-bit integer. Telegraf seems unable to send metrics to InfluxDB, possibly because of this error.

I ran and tested Telegraf in a Docker container. Is that not a good way to gather metrics?

FROM ubuntu:14.04
RUN apt-get -y install wget
RUN wget http://get.influxdb.org/telegraf/telegraf_0.1.4_amd64.deb 
RUN dpkg -i telegraf_0.1.4_amd64.deb
ADD ./telegraf.toml /opt/telegraf/
WORKDIR /opt/telegraf
CMD ["/opt/telegraf/telegraf","-config","/opt/telegraf/telegraf.toml"]

telegraf:
    build: .
    dockerfile: telegraf.Dockerfile
    volumes: 
        - /sys:/sys:ro
        - /var/run/docker.sock:/var/run/docker.sock

Hostname is not passed as a tag

I installed telegraf using the Debian/Ubuntu package on the README file. It is sending metrics to influxdb 0.9. The hostname of the machine running telegraf is not being passed as a tag with metrics.

debug setting in config doesn't work

In the config file, the debug setting is non-functional:

# Configuration for tivan itself
 [agent]
 interval = "10s"
 debug = false
 hostname = "catalyst"

Whether debug is set to true or false, no debugging output happens. I can use tivan -config tivan.toml -debug to get debug output. I'd be fine with that being the only option.

If removing the debug setting from the config is simple I vote we do that. It seems better as a command-line flag anyway.

Add support for StatsD style aggregator

We should support the StatsD protocol and aggregation. However, unlike StatsD, the metric names should follow the conventions of the key section of the InfluxDB line protocol.

The StatsD values should be output as a single field called value. This should be able to flush to any of the output sinks, such as those mentioned in #35.

This means that a single Telegraf instance could serve as a StatsD aggregator that works with the InfluxDB schema design of measurements and tags.
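A minimal sketch of such an aggregator, assuming StatsD lines of the form `<key>:<value>|<type>` where the key already follows line-protocol conventions (`measurement,tag=value`). Only counters (`c`) and gauges (`g`) are handled; sample rates, timers, and sets are left out. On flush, counters are reset and each key is emitted with a single `value` field. The `aggregator` type is hypothetical:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// aggregator accumulates counters and gauges between flushes.
type aggregator struct {
	counters map[string]float64
	gauges   map[string]float64
}

func newAggregator() *aggregator {
	return &aggregator{counters: map[string]float64{}, gauges: map[string]float64{}}
}

// handle parses one StatsD line such as "requests,host=web01:1|c".
func (a *aggregator) handle(line string) error {
	pipe := strings.Index(line, "|")
	if pipe < 0 {
		return fmt.Errorf("missing type: %q", line)
	}
	key, typ := line[:pipe], line[pipe+1:]
	colon := strings.LastIndex(key, ":")
	if colon < 0 {
		return fmt.Errorf("missing value: %q", line)
	}
	v, err := strconv.ParseFloat(key[colon+1:], 64)
	if err != nil {
		return err
	}
	name := key[:colon]
	switch typ {
	case "c":
		a.counters[name] += v
	case "g":
		a.gauges[name] = v
	default:
		return fmt.Errorf("unsupported type %q", typ)
	}
	return nil
}

// flush renders line-protocol points with a single "value" field,
// sorted for stable output, and resets counters (gauges persist).
func (a *aggregator) flush() []string {
	var lines []string
	for key, v := range a.counters {
		lines = append(lines, fmt.Sprintf("%s value=%g", key, v))
	}
	for key, v := range a.gauges {
		lines = append(lines, fmt.Sprintf("%s value=%g", key, v))
	}
	sort.Strings(lines)
	a.counters = map[string]float64{}
	return lines
}

func main() {
	a := newAggregator()
	for _, l := range []string{"requests,host=web01:1|c", "requests,host=web01:2|c", "queue_depth,host=web01:7|g"} {
		if err := a.handle(l); err != nil {
			panic(err)
		}
	}
	for _, l := range a.flush() {
		fmt.Println(l)
	}
}
```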

Installing together telegraf-0.1.1-1.x86_64.rpm and influxdb-0.9.0-1.x86_64.rpm

On a CentOS 7 host, I installed the InfluxDB package first and then the telegraf package.

The telegraf package overwrites the startup script of influxdb "/opt/influxdb/init.sh".
In /etc/init.d, both links point to the same file:

lrwxrwxrwx. 1 root root 21 jul 1 15:00 influxdb -> /opt/influxdb/init.sh
lrwxrwxrwx. 1 root root 21 jul 1 15:06 telegraf -> /opt/influxdb/init.sh

As a result, installing the telegraf package breaks the influxdb startup script.

Add logrotate script for telegraf.log to rpm/deb packaging

Something like this in /etc/logrotate.d/telegraf would work for Red Hat/CentOS (rotating weekly, keeping a month of logs):
/var/log/telegraf/telegraf.log {
    weekly
    rotate 4
    missingok
    nocreate
    postrotate
        /sbin/service telegraf restart > /dev/null 2>/dev/null || true
    endscript
}

error getting docker info: No such file or directory

Running telegraf 0.1.4 on CentOS 7 with Docker 1.7.1 fails to gather docker metrics. What I get is

error getting docker info: open /sys/fs/cgroup/cpuacct/docker//cpuacct.stat: no such file or directory

I tried running find / -name docker -type d and only found:

/run/docker
/var/lib/docker
/etc/docker
/usr/libexec/docker

Perhaps the cgroup paths are different in CentOS/RHEL 7?

Packager: Influxdb and telegraf share same init.sh path

Since influxdb and telegraf share the same root directories, the init.sh file gets overwritten when both are installed via RPM.

ls -lah /etc/init.d/{telegraf,influxdb}
lrwxrwxrwx 1 root root 21 Jun 22 15:53 /etc/init.d/influxdb -> /opt/influxdb/init.sh
lrwxrwxrwx 1 root root 21 Jun 22 16:06 /etc/init.d/telegraf -> /opt/influxdb/init.sh

I suggest changing the following (s/influxdb/telegraf/) so that telegraf is treated independently of influxdb:

INSTALL_ROOT_DIR=/opt/influxdb
TELEGRAF_LOG_DIR=/var/log/influxdb
CONFIG_ROOT_DIR=/etc/opt/influxdb

update integer output to new line protocol format

With 0.9.3, the line protocol has changed for integers: influxdata/influxdb#3526

This change means that current telegraf users who upgrade to 0.9.3 cannot write a number of values, because they are integer fields in the db but the current telegraf sends them without the trailing i, meaning they parse as floats. That causes a type mismatch and a write failure.
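A sketch of the required formatting change: integer field values get a trailing `i`, while floats are written without it (so a float like `5.0` renders as `5` and is still parsed as a float). The helper names are hypothetical, string fields are left out of this sketch, and keys are sorted only to make the output deterministic:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// formatField renders a field value for the line protocol:
// integers get a trailing "i", floats and booleans do not.
func formatField(v interface{}) (string, error) {
	switch x := v.(type) {
	case int64:
		return strconv.FormatInt(x, 10) + "i", nil
	case int:
		return strconv.Itoa(x) + "i", nil
	case float64:
		return strconv.FormatFloat(x, 'f', -1, 64), nil
	case bool:
		return strconv.FormatBool(x), nil
	default:
		return "", fmt.Errorf("unsupported field type %T", v)
	}
}

// fieldSet joins fields as "k=v,k=v" in sorted key order.
func fieldSet(fields map[string]interface{}) (string, error) {
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		s, err := formatField(fields[k])
		if err != nil {
			return "", err
		}
		parts = append(parts, k+"="+s)
	}
	return strings.Join(parts, ","), nil
}

func main() {
	s, err := fieldSet(map[string]interface{}{"value": int64(2387730910), "ratio": 0.5})
	if err != nil {
		panic(err)
	}
	fmt.Println("example " + s) // example ratio=0.5,value=2387730910i
}
```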

Telegraf Ctrl-C lag

When ^C'ing Telegraf, there is quite a bit of lag; it should be able to exit immediately.

Reload configuration without downtime

Are you planning on supporting config reloading? nginx -s reload comes to mind.

I'm thinking of a use case where telegraf is monitoring a cluster of servers. Some nodes fail and new ones are added -- the config needs to be updated.

A reload command would happen within the gather interval so metrics would not be dropped.

Tags no longer passed in due to change in Client

I was experimenting with Telegraf, and I am pretty impressed with how easy it is to set up and with the potential for adding plugins. That said, I noticed that tags don't seem to be passed at the point level, not even the default "host" tag I would have expected after finding issue #4.

I think I tracked down the cause.

It looks like a recent change in the influxdb client ( influxdata/influxdb@e6c36d5 ) has made it so that passing common parameters at the batch level is no longer allowed.

If we look at the agent.go code, it appears that the "Tags" were passed at the top of the batch level and expected to be inherited down through the "sub" points. This seems to be leveraging the now removed common batch parameters functionality.

    close(points)

    var acc BatchPoints
    acc.Tags = a.Config.Tags
    acc.Time = time.Now()
    acc.Database = a.Config.Database

    for sub := range points {
        acc.Points = append(acc.Points, sub.Points...)
    }

I am not familiar enough with Go to effectively fix and properly add unit testing, but hopefully this is enough for someone to jump in and take it from here.
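Assuming the client now honours only tags set on each point, one fix is to copy the agent's common tags onto every point before appending it to the batch. The `point` type and `applyCommonTags` below are hypothetical simplifications, not the influxdb client's actual types. Tags already set on a point take precedence, and the caller's maps are not modified:

```go
package main

import "fmt"

// point is a simplified stand-in for a client point.
type point struct {
	Measurement string
	Tags        map[string]string
	Fields      map[string]interface{}
}

// applyCommonTags returns copies of points with the common tags merged
// into each point's own tags; per-point tags win on conflict.
func applyCommonTags(points []point, common map[string]string) []point {
	out := make([]point, len(points))
	for i, p := range points {
		tags := make(map[string]string, len(common)+len(p.Tags))
		for k, v := range common {
			tags[k] = v
		}
		for k, v := range p.Tags {
			tags[k] = v
		}
		p.Tags = tags // p is a copy, so the input slice is untouched
		out[i] = p
	}
	return out
}

func main() {
	points := []point{
		{Measurement: "cpu", Tags: map[string]string{"cpu": "cpu0"}, Fields: map[string]interface{}{"user": 1.5}},
		{Measurement: "mem", Fields: map[string]interface{}{"free": int64(1024)}},
	}
	for _, p := range applyCommonTags(points, map[string]string{"host": "catalyst"}) {
		fmt.Println(p.Measurement, p.Tags["host"])
	}
}
```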

Support sending metrics to Kafka

We should be able to send metrics to a variety of places other than InfluxDB. This means that we'll need to have some sort of framework for defining new output sinks. This also means that we should be able to disable sending metrics to any of these output sinks, including InfluxDB.

What I'm thinking is to pull the InfluxDB settings into another area like outputs. Then, like plugins, define a number of different outputs where metrics can be sent to.

For Kafka, we should send metrics using the line protocol. This should be fairly simple, using the InfluxDB client to convert the metrics to their line protocol equivalents.

The work to add support for Riemann (#34) will also need this output sink implementation.
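A minimal sketch of the proposed output framework, assuming a hypothetical `Output` interface that every sink (InfluxDB, Kafka, Riemann, ...) implements and a per-output enable switch so any sink, including InfluxDB, can be turned off. An in-memory buffer stands in for a real Kafka producer, and the metrics are passed as line-protocol strings:

```go
package main

import (
	"bytes"
	"fmt"
)

// Output is the hypothetical interface each sink would implement.
type Output interface {
	Connect() error
	Write(lines []string) error
	Close() error
}

// bufferOutput is an in-memory sink used in place of a real producer.
type bufferOutput struct{ buf bytes.Buffer }

func (b *bufferOutput) Connect() error { return nil }

func (b *bufferOutput) Write(lines []string) error {
	for _, l := range lines {
		b.buf.WriteString(l)
		b.buf.WriteByte('\n')
	}
	return nil
}

func (b *bufferOutput) Close() error { return nil }

// writeAll sends the batch to every enabled output and collects errors,
// so one failing sink does not block the others.
func writeAll(outputs map[string]Output, enabled map[string]bool, lines []string) []error {
	var errs []error
	for name, o := range outputs {
		if !enabled[name] {
			continue
		}
		if err := o.Write(lines); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", name, err))
		}
	}
	return errs
}

func main() {
	kafka, influx := &bufferOutput{}, &bufferOutput{}
	outputs := map[string]Output{"kafka": kafka, "influxdb": influx}
	for _, o := range outputs {
		if err := o.Connect(); err != nil {
			panic(err)
		}
	}
	enabled := map[string]bool{"kafka": true, "influxdb": false}
	errs := writeAll(outputs, enabled, []string{"cpu,host=server01 value=0.64"})
	fmt.Print(kafka.buf.String())
	fmt.Println(len(errs), influx.buf.Len()) // 0 0
}
```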
