influxdata / telegraf
The plugin-driven server agent for collecting & reporting metrics.
Home Page: https://influxdata.com/telegraf
License: MIT License
It's not clear whether this is possible (it would be a nice feature if not): is there a way to select only certain measurements from a plugin?
E.g. if I'm only interested in the mem_free and mem_used measurements, can I avoid storing the 9 other mem_ measurements that I'm not using?
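Later Telegraf releases grew per-plugin field filtering for exactly this. A sketch assuming the modern config syntax (the 0.1.x config keys differed; field names follow the current mem input):

```toml
[[inputs.mem]]
  # keep only the fields we care about; everything else is dropped
  fieldpass = ["free", "used"]
```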
libvirt support?
I'm running a test using an influxdb 0.9.2 nightly and telegraf 0.1.3 on a single host with an SSD. I am sending metrics from 500 hosts to influxdb using telegraf's default plugins and config. Within a week, influxdb begins returning 500 timeout errors to all requests. The database size is 340G with 137,269 unique series.
In https://influxdb.com/docs/v0.9/concepts/schema_and_data_layout.html, it mentions keeping tag cardinality <100k so I dug in to see where all of the tags are coming from. Each VM host sends about 150 series while a physical hypervisor sends over 1000.
Here is the breakdown from one physical server:
disk metrics: 180 (6 metrics * 30 paths)
cpu metrics: 352 (11 data points * 32 cores)
io metrics: 378 (7 metrics * 54 disk partitions)
net metrics: 128 (8 metrics * 16 interfaces)
load/swap metrics: 9
Here is the best case result (with filtering and CPU aggregation):
disk metrics: 48 (6 metrics * 8 paths)
cpu metrics: 11 (11 data points * 1 aggregate cpu value)
io metrics: 112 (7 metrics * 16 disks in iostat)
net metrics: 128 (8 metrics * 16 interfaces)
load/swap metrics: 9
Applying the same filtering to a VM host reduces its per-host series count from 146 to 50. The doc says "As a rule of thumb, keep tag cardinality below 100,000." At 1000 series per host, that is at most 100 servers; even at the low end of 50 series per host, that only allows around 2,000 servers. Is this a limitation that can be addressed in telegraf or influxdb?
I love the simplicity and drop-in nature of the telegraf/influxdb/grafana stack, but the system needs to be able to scale out of the box. Multiple databases would make grafana more complex, but that might be a short-term solution. Is there a better way?
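The reductions described above (CPU aggregation, path filtering) map onto per-plugin settings in later Telegraf releases; a sketch assuming the modern config syntax:

```toml
[[inputs.cpu]]
  percpu = false   # one aggregate series instead of one per core
  totalcpu = true

[[inputs.disk]]
  # skip pseudo-filesystems that inflate the per-path series count
  ignore_fs = ["tmpfs", "devtmpfs"]
```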
Last build... https://circleci.com/gh/influxdb/telegraf/47
Not sure if I am missing something?
Using instructions from https://github.com/influxdb/telegraf/wiki/Building-from-source
andrew@andrew-laptop:~/Desktop/projects/github/ajohnstone/telegraf$ gvm install go1.4
Downloading Go source...
Installing go1.4...
* Compiling...
andrew@andrew-laptop:~/Desktop/projects/github/ajohnstone/telegraf$ gvm use go1.4 --default
Now using version go1.4
andrew@andrew-laptop:~/Desktop/projects/github/ajohnstone/telegraf$ go get -u github.com/influxdb/telegraf/...
# github.com/influxdb/telegraf/plugins/kafka_consumer
/home/andrew/.gvm/pkgsets/go1.4/global/src/github.com/influxdb/telegraf/plugins/kafka_consumer/kafka_consumer.go:76: cannot use k.Consumer.Messages() (type <-chan *"github.com/Shopify/sarama".ConsumerMessage) as type <-chan *"gopkg.in/Shopify/sarama.v1".ConsumerMessage in argument to readFromKafka
/home/andrew/.gvm/pkgsets/go1.4/global/src/github.com/influxdb/telegraf/plugins/kafka_consumer/kafka_consumer.go:76: cannot use k.Consumer.CommitUpto (type func(*"github.com/Shopify/sarama".ConsumerMessage) error) as type ack in argument to readFromKafka
andrew@andrew-laptop:~/Desktop/projects/github/ajohnstone/telegraf$ cd $GOPATH/src/github.com/influxdb/telegraf
andrew@andrew-laptop:~/.gvm/pkgsets/go1.4/global/src/github.com/influxdb/telegraf$ ./release.sh
Building Telegraf version 0.9.b1
=> darwin-amd64: go build runtime: darwin/amd64 must be bootstrapped using make.bash
du: cannot access ‘pkg/telegraf-darwin-amd64’: No such file or directory
=> linux-amd64: # github.com/influxdb/telegraf/plugins/kafka_consumer
plugins/kafka_consumer/kafka_consumer.go:76: cannot use k.Consumer.Messages() (type <-chan *"github.com/Shopify/sarama".ConsumerMessage) as type <-chan *"gopkg.in/Shopify/sarama.v1".ConsumerMessage in argument to readFromKafka
plugins/kafka_consumer/kafka_consumer.go:76: cannot use k.Consumer.CommitUpto (type func(*"github.com/Shopify/sarama".ConsumerMessage) error) as type ack in argument to readFromKafka
du: cannot access ‘pkg/telegraf-linux-amd64’: No such file or directory
=> linux-386: go build runtime: linux/386 must be bootstrapped using make.bash
du: cannot access ‘pkg/telegraf-linux-386’: No such file or directory
=> linux-arm: go build runtime: linux/arm must be bootstrapped using make.bash
du: cannot access ‘pkg/telegraf-linux-arm’: No such file or directory
Is it possible to do with Telegraf what Fluentd does: aggregate and analyze syslog data with InfluxDB?
If that is something you are interested in, there is already a nice library called sigar which is cross-platform and provides most of the system data: cpu, memory, network interfaces, processes, ...
( https://github.com/hyperic/sigar )
I currently use it successfully for a similar project (it also supports Mac OS X, my development platform).
Automated builds... No circleci
Installing both influxdb and telegraf on the same host results in one or the other failing to start correctly from the init.d scripts (on Ubuntu)
The problem seems to be down to the ownership of the /var/run/influxdb directory - If influx is installed first, then the directory will be owned by influxdb:influxdb and the telegraf:telegraf user is not able to write a pid file into there and the daemon fails to start. If the packages are installed in the opposite order, then the telegraf user will own the dir and influx will fail to start.
It's not a hard problem to work around, but it would be nice if the two packages did not conflict with each other.
The Redis protocol actually requires \r\n as the line ending. Newer versions of Redis seem to handle both, but older ones such as 2.4 (the version in the Debian Wheezy repository) hang when given only \n.
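A minimal sketch of the client-side fix: terminate each inline command with CRLF rather than a bare newline (the function name is hypothetical, not from the telegraf source):

```go
package main

import "fmt"

// buildInlineCommand renders a Redis inline command. Older servers
// such as 2.4 hang on a bare "\n", so always terminate with CRLF.
func buildInlineCommand(cmd string) string {
	return cmd + "\r\n"
}

func main() {
	fmt.Printf("%q\n", buildInlineCommand("PING"))
}
```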
From the comments on the README, I would assume that setting
# Configuration for tivan itself
[agent]
interval = "10s"
debug = false
hostname = "catalyst"
should add a hostname=catalyst
tag to all points written, but I do not see that tag key or value anywhere in the data.
Right now there's a tivan.toml in the root of the repo. It's non-functional. Valid configs must be generated with tivan -sample-config > file.toml.
On the one hand, it's confusing to have a non-working config ship with the repo. On the other hand, it's nice to have an example of the config in the repo so people can see it without downloading the app and generating one.
in config:
[redis]
servers = ["10.0.0.13:6386", "10.0.0.13:6380", "10.0.0.13:6381", "10.0.0.13:6382", "10.0.0.13:6383", "10.0.0.13:6384", "10.0.0.13:6385"]
in influx 0.9.1:
show measurements
...
redis_total_commands_processed
> select * from redis_total_commands_processed
name: redis_total_commands_processed
tags: host=wpr01
time                              value
2015-07-22T11:31:32.905606408Z    2387730910
2015-07-22T11:31:37.909249167Z    2387738534
2015-07-22T11:31:42.907242355Z    2387746082
Where do I see the port (6380, 6381, ..., 6386) in the tags?
If you wish to test whether package.sh works on a different branch, you must make sure that .git/config does not contain a branch definition, such as:
[branch "logrotation-v2"]
remote = origin
merge = refs/heads/logrotation-v2
If this definition exists, then package.sh swaps to the master branch and builds that.
Without that entry, the git pull fails (see below), but package.sh carries on with the current branch, allowing the test package build to work.
There is no tracking information for the current branch.
Please specify which branch you want to merge with.
See git-pull(1) for details
git pull <remote> <branch>
If you wish to set tracking information for this branch you can do so with:
git branch --set-upstream-to=origin/<branch> logrotation-v2
package github.com/srfraser/telegraf: exit status 1
I think this should be a configurable option in the config file.
Given the major performance improvements it seems we should have telegraf use the line protocol rather than the JSON protocol, especially since the JSON protocol is likely to eventually be deprecated.
Here's more on the protocol: influxdata/influxdb#2696
Can you explain how to compile from source? I have installed Go and GVM, but the next steps are not clear to me. How do I build a .deb?
Thanks!
Amazing solution but the documentation doesn't really go all the way through with a working example, which makes it relatively hard for a non-expert to start using it. Can someone fix that?
New feature: be able to send output data to a Sensu agent.
Are there plans to support polling metrics from libvirt/kvm?
I'm trying to write my first plugin and want to build it manually. When I run go get in ~/go/src/github.com/influxdb/telegraf/cmd/telegraf, I get:
../../../influxdb/meta/store.go:221: config.Logger undefined (type *raft.Config has no field or method Logger)
../../../influxdb/meta/store.go:386: invalid operation: s.raft.Leader() != "" (mismatched types net.Addr and string)
../../../influxdb/meta/store.go:404: invalid operation: s.raft.Leader() != "" (mismatched types net.Addr and string)
../../../influxdb/meta/store.go:435: cannot use s.raft.Leader() (type net.Addr) as type string in return argument
../../../influxdb/meta/store.go:457: cannot use a (type []string) as type []net.Addr in argument to s.raft.SetPeers
../../../influxdb/meta/store.go:1291: invalid operation: leader == "" (mismatched types net.Addr and string)
../../../influxdb/meta/store.go:1296: cannot use leader (type net.Addr) as type string in argument to net.DialTimeout
Any ideas?
telegraf seems to have good overlap with heka:
https://github.com/mozilla-services/heka
heka seems to have the InfluxDB 0.9.x+ line protocol almost complete.
mozilla-services/heka#1574
mozilla-services/heka#1595
What is the advantage of Telegraf over Heka? Perhaps it would be nice to include that information in the README.
Thanks.
Tried to build Telegraf with go version go1.4.2 windows/amd64 and got the error in the title:
telegraf\plugins\system\ps.go:70: cannot use du (type disk.DiskUsageStat) as type *disk.DiskUsageStat in append
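The error says append expects a *disk.DiskUsageStat but du is a value; taking the address is the usual fix. A sketch with a stand-in type (not the real gopsutil struct):

```go
package main

import "fmt"

// DiskUsageStat is a stand-in for the real gopsutil type.
type DiskUsageStat struct{ Path string }

// collect appends a pointer to the stat. Appending the value itself
// ("append(usages, du)") produces the compile error from ps.go:70.
func collect(usages []*DiskUsageStat, du DiskUsageStat) []*DiskUsageStat {
	return append(usages, &du)
}

func main() {
	usages := collect(nil, DiskUsageStat{Path: "/"})
	fmt.Println(usages[0].Path)
}
```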
Currently the user has to create the telegraf database manually. The tool should do this for us.
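Until the tool does it automatically, the database can be created by hand with a single InfluxQL statement (the database name telegraf is assumed from the default config):

```
CREATE DATABASE telegraf
```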
$ go get -u github.com/influxdb/telegraf/...
$ cd $GOPATH/src/github.com/influxdb/telegraf
$ ./package.sh 1
Starting package process...
/home/andrew/.gvm/bin/gvm
Now using version go1.4.2
GOPATH (/home/andrew/.gvm/pkgsets/go1.4.2/global) looks sane, using /home/andrew/.gvm/pkgsets/go1.4.2/global for installation.
Git tree is clean.
From https://github.com/influxdb/telegraf
* branch master -> FETCH_HEAD
Already up-to-date.
Git tree updated successfully.
Build completed successfully.
telegraf copied to /tmp/tmp.MgYgIBqRvH//opt/telegraf/versions/1
scripts/init.sh copied to /tmp/tmp.MgYgIBqRvH//opt/telegraf/versions/1/scripts
cp: cannot stat ‘etc/logrotate.d/telegraf’: No such file or directory
Failed to copy etc/logrotate.d/telegraf to packaging directory -- aborting.
I want to add support for CPU Usage (percentage). I've generally seen this done by querying /proc/stat, sleeping for a second, querying again, and calculating the percentage from the diff. Does this sound reasonable if added to the current cpu plugin?
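The diff-of-two-samples approach can be sketched as follows, assuming two cumulative /proc/stat samples with the field layout from proc(5) (user, nice, system, idle, iowait, ...); the numbers are made up:

```go
package main

import "fmt"

// cpuPercent computes the busy-CPU percentage between two cumulative
// /proc/stat samples. Fields follow proc(5): user, nice, system,
// idle, iowait, irq, softirq, ... Both idle and iowait count as idle.
func cpuPercent(prev, cur []uint64) float64 {
	var prevTotal, curTotal uint64
	for _, v := range prev {
		prevTotal += v
	}
	for _, v := range cur {
		curTotal += v
	}
	prevIdle := prev[3] + prev[4]
	curIdle := cur[3] + cur[4]
	total := float64(curTotal - prevTotal)
	idle := float64(curIdle - prevIdle)
	if total == 0 {
		return 0
	}
	return 100 * (total - idle) / total
}

func main() {
	// hypothetical samples taken one second apart
	prev := []uint64{100, 0, 50, 800, 50, 0, 0, 0}
	cur := []uint64{150, 0, 75, 900, 55, 0, 0, 0}
	fmt.Printf("%.1f%%\n", cpuPercent(prev, cur))
}
```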
I'm getting these in the log
[http] 2015/07/03 10:32:26 127.0.0.1 - - [03/Jul/2015:10:32:26 +0200] POST /write?consistency=&db=telegraf&precision=&rp= HTTP/1.1 400 106 - InfluxDBClient 08d797be-215e-11e5-8005-000000000000 2.541832ms
When running telegraf_0.1.2_amd64 on Ubuntu Server 14.04 LTS
Inspired by issue #48, create a plugin for aggregating and pushing data from log files, allowing user-defined regex filters.
This would behave in a similar manner to heka's logstreamer plugin: https://hekad.readthedocs.org/en/v0.9.2/pluginconfig/logstreamer.html#logstreamerplugin
/cc @steverweber
Hi,
Is a Windows version planned?
Thanks for your feedback.
Changelog references Influxdb repo issues/pull requests instead of telegraf repo
https://github.com/influxdb/telegraf/blob/bbc6fa57fa6d594e8095107eb74860e6c239b23c/CHANGELOG.md
- [#35](https://github.com/influxdb/influxdb/pull/35): Add Kafka plugin. Thanks @EmilS!
Should be:
- [#35](https://github.com/influxdb/telegraf/pull/35): Add Kafka plugin. Thanks @EmilS!
Does Telegraf plan to allow users to write plugins in any language? Being able to quickly write something in Bash or Python would probably make a lot of ops people happy. Quite often I only need to collect a measurement once an hour or so, so the extra overhead would not be an issue at all.
Maybe this could be implemented as a plugin that just runs a configured list of commands with some configuration in env vars.
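This idea later landed as an exec-style input that shells out to arbitrary scripts; a sketch assuming the modern config syntax (the script path is hypothetical):

```toml
[[inputs.exec]]
  # any executable works: bash, python, a compiled binary, ...
  commands = ["/usr/local/bin/collect_hourly.sh"]
  # per-plugin interval, so hourly collection adds no overhead
  interval = "1h"
  data_format = "influx"
```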
Are you planning on supporting remote services, like SNMP or ICMP?
Are you planning on supporting different "output" methods like writing metrics to Riemann?
Telegraf only supports tags in the form of id, name, and command; however, it doesn't import the actual labels associated with the container (https://docs.docker.com/userguide/labels-custom-metadata/). This is quite important with things like ECS.
rebranding to commence in 3...2...1...
Error in plugins: unable to parse 'docker_memory_limit,command=/opt/telegraf/telegraf\ -config\ /opt/telegraf/telegraf.toml,host=ebfbff1ca0aa,id=ebfbff1ca0aa8a6b047bdc038ed0d14cfba8495f9f4eca1ce0d588c2cb1bd051,name=/influxdb09telegraf01_telegraf_run_4 value=18446744073709551615': invalid integer
I am not sure whether that value is right; 18446744073709551615 is the maximum uint64 value, which is too large for a signed int. Telegraf seems unable to send metrics to InfluxDB, maybe because of this error. I ran and tested telegraf in a Docker container. Is that not a good setup for gathering metrics?
FROM ubuntu:14.04
RUN apt-get -y install wget
RUN wget http://get.influxdb.org/telegraf/telegraf_0.1.4_amd64.deb
RUN dpkg -i telegraf_0.1.4_amd64.deb
ADD ./telegraf.toml /opt/telegraf/
WORKDIR /opt/telegraf
CMD ["/opt/telegraf/telegraf","-config","/opt/telegraf/telegraf.toml"]
telegraf:
build: .
dockerfile: telegraf.Dockerfile
volumes:
- /sys:/sys:ro
- /var/run/docker.sock:/var/run/docker.sock
I installed telegraf using the Debian/Ubuntu package on the README file. It is sending metrics to influxdb 0.9. The hostname of the machine running telegraf is not being passed as a tag with metrics.
In the config file, the debug setting is non-functional:
# Configuration for tivan itself
[agent]
interval = "10s"
debug = false
hostname = "catalyst"
Whether debug is set to true or false, no debugging output happens. I can use tivan -config tivan.toml -debug to get debug output. I'd be fine with that being the only option.
If removing the debug setting from the config is simple, I vote we do that. It seems better as a command-line flag anyway.
We should support the StatsD protocol and aggregation. However, unlike StatsD, the metric names should follow the conventions of the key section of the InfluxDB line protocol.
The StatsD values should be output as a single field called value. This should be able to flush to any of the output sinks like what is mentioned in #35.
This means that a single Telegraf instance could serve as a StatsD aggregator that works with the InfluxDB schema design of measurements and tags.
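To make the idea concrete, a hypothetical mapping from a StatsD packet to an InfluxDB line-protocol point might look like this (measurement and value are made up):

```
# StatsD input (a counter):
requests.served:1|c

# InfluxDB line-protocol output, with a single field named "value":
requests_served value=1
```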
In a Centos 7 host I have installed the Influxdb package first and then the telegraf package.
The telegraf package overwrites the startup script of influxdb "/opt/influxdb/init.sh".
In the /etc/init.d both links are pointing to the same file:
lrwxrwxrwx. 1 root root 21 jul 1 15:00 influxdb -> /opt/influxdb/init.sh
lrwxrwxrwx. 1 root root 21 jul 1 15:06 telegraf -> /opt/influxdb/init.sh
The consequence is that installing the telegraf package breaks the influxdb startup scripts.
something like this in /etc/logrotate.d/telegraf would work for redhat/centos (rotating weekly, keep for a month):
/var/log/telegraf/telegraf.log {
    weekly
    rotate 4
    missingok
    nocreate
    postrotate
        /sbin/service telegraf restart > /dev/null 2>/dev/null || true
    endscript
}
Running telegraf 0.1.4 on CentOS 7 with Docker 1.7.1 fails to gather docker metrics. What I get is
error getting docker info: open /sys/fs/cgroup/cpuacct/docker//cpuacct.stat: no such file or directory
I tried running find / -name docker -type d and only found:
/run/docker
/var/lib/docker
/etc/docker
/usr/libexec/docker
Perhaps the cgroup paths are different in CentOS/RHEL 7?
Since influxdb and telegraf share the same ROOT DIR's, the init.sh file will get overwritten when both are installed via RPM.
ls -lah /etc/init.d/{telegraf,influxdb}
lrwxrwxrwx 1 root root 21 Jun 22 15:53 /etc/init.d/influxdb -> /opt/influxdb/init.sh
lrwxrwxrwx 1 root root 21 Jun 22 16:06 /etc/init.d/telegraf -> /opt/influxdb/init.sh
Suggest changing the following (s/influxdb/telegraf/) so that telegraf is treated independently from influxdb:
INSTALL_ROOT_DIR=/opt/influxdb
TELEGRAF_LOG_DIR=/var/log/influxdb
CONFIG_ROOT_DIR=/etc/opt/influxdb
With 0.9.3, the line protocol has changed for integers: influxdata/influxdb#3526
This change means that current telegraf users who upgrade to 0.9.3 cannot write a number of values, because they are integer fields in the db but the current telegraf sends them without the trailing i, meaning they parse as floats. That causes a type mismatch and a write failure.
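For illustration, the same integer field before and after the change (measurement and value are made up):

```
# pre-0.9.3: a bare number written to an integer field was accepted
cpu_usage value=42

# 0.9.3+: integers need the trailing "i"; without it the field parses
# as a float and conflicts with the existing integer field
cpu_usage value=42i
```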
When ^C'ing Telegraf, there is quite a bit of lag; it should be able to exit immediately.
Are you planning on supporting config reloading? nginx -s reload comes to mind.
I'm thinking of a use case where telegraf is monitoring a cluster of servers. Some nodes fail and new ones are added -- the config needs to be updated.
A reload command would happen within the gather interval so metrics would not be dropped.
I was experimenting with Telegraf, and I am pretty impressed with the ease of getting it set up and the potential for adding plugins. That said, I noticed that tags don't seem to be passed at the point level, not even the default "host" as I would have expected after finding issue #4.
I think I tracked down the cause.
It looks like a recent change in the influxdb client ( influxdata/influxdb@e6c36d5 ) has made it so that passing common parameters at the batch level is no longer allowed.
If we look at the agent.go code, it appears that the "Tags" were passed at the top of the batch level and expected to be inherited down through the "sub" points. This seems to be leveraging the now removed common batch parameters functionality.
close(points)

// Common batch-level parameters (Tags, Time, Database) are set once
// on the accumulator and expected to be inherited by every sub-point.
var acc BatchPoints
acc.Tags = a.Config.Tags
acc.Time = time.Now()
acc.Database = a.Config.Database
for sub := range points {
    acc.Points = append(acc.Points, sub.Points...)
}
I am not familiar enough with Go to effectively fix and properly add unit testing, but hopefully this is enough for someone to jump in and take it from here.
Not sure if that's intentional or not, but it could lead to confusion. I can see a use case, where I want to sample different metrics at different intervals.
What's the intention?
We should be able to send metrics to a variety of places other than InfluxDB. This means that we'll need to have some sort of framework for defining new output sinks. This also means that we should be able to disable sending metrics to any of these output sinks, including InfluxDB.
What I'm thinking is to pull the InfluxDB settings into another area like outputs. Then, like plugins, define a number of different outputs to which metrics can be sent.
For Kafka, we should send metrics using the line protocol. This should be fairly simply using the InfluxDB client to convert the metrics to their line protocol equivalents.
The work to add support for Riemann (#34) will also need this output sink implementation.
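The outputs section eventually took roughly this shape; a sketch assuming the modern plugin names and options:

```toml
[[outputs.influxdb]]
  urls = ["http://localhost:8086"]
  database = "telegraf"

[[outputs.kafka]]
  brokers = ["localhost:9092"]
  topic = "telegraf"
  # points are serialized with the InfluxDB line protocol
  data_format = "influx"
```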