rossmcdonald / telegraf
Ansible role for installing, configuring, and maintaining Telegraf
https://github.com/influxdata/telegraf/tree/master/plugins/inputs/procstat
The procstat input plugin configuration block can appear more than once in telegraf.conf.
The current template can't do that, so the plugin definitions should be changed to a list. I'll make a pull request eventually.
Can you update the Ansible Galaxy package? The last version is 2 years old and there have been 24 commits since then (including one of my own).
fatal: [proxmox1]: FAILED! => {"changed": false, "msg": "Failed to update apt cache: W:GPG error: https://repos.influxdata.com/debian bullseye InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY D8FF8E1F7DF8B07E, E:The repository 'https://repos.influxdata.com/debian bullseye InRelease' is not signed."}
Based on the following quote from the Telegraf documentation (https://github.com/influxdata/telegraf/blob/master/docs/CONFIGURATION.md), order in mappings should be preserved:
NOTE: Order matters, the [inputs.cpu.tags] table must be at the end of the plugin definition.
Consider the following:
telegraf_plugins:
  - name: net
    options:
      interval: 5s
      fielddrop: ['ip_*','tcp_*','udp_*','icmp*','packets_*','drop_*','err_*']
      fieldpass: ['bytes_*']
      tagdrop:
        interface: [ "imq0", "imq1" ]
It seems that when using the tagdrop or tagpass attributes, the order is not preserved and I end up with an invalid Telegraf configuration that looks like this:
[[inputs.net]]
[inputs.net.tagdrop]
interface = [ "imq0", "imq1" ]
fielddrop = [ "ip_*", "tcp_*", "udp_*", "icmp*", "packets_*", "drop_*", "err_*" ]
interval = "5s"
fieldpass = [ "bytes_*" ]
Expected end result should look like this:
[[inputs.net]]
fielddrop = [ "ip_*", "tcp_*", "udp_*", "icmp*", "packets_*", "drop_*", "err_*" ]
interval = "5s"
fieldpass = [ "bytes_*" ]
[inputs.net.tagdrop]
interface = [ "imq0", "imq1" ]
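One way to get the expected layout is to emit scalar options first and nested tables last. The following is a minimal Python sketch of that ordering, not the role's actual template; render_plugin and quote are hypothetical names, and it assumes Python 3.7+ so that dicts keep insertion order.

```python
def quote(value):
    """Quote a value as a string, or a list as an array of strings."""
    if isinstance(value, (list, tuple)):
        return "[ " + ", ".join('"%s"' % v for v in value) + " ]"
    return '"%s"' % value


def render_plugin(name, options):
    """Render one [[inputs.<name>]] block with sub-tables placed last."""
    # Split the options so nested mappings (tagdrop, tagpass, tags)
    # can be written after every plain key.
    scalars = [(k, v) for k, v in options.items() if not isinstance(v, dict)]
    tables = [(k, v) for k, v in options.items() if isinstance(v, dict)]

    lines = ["[[inputs.%s]]" % name]
    for key, value in scalars:
        lines.append("%s = %s" % (key, quote(value)))
    for key, table in tables:
        lines.append("[inputs.%s.%s]" % (name, key))
        for sub_key, sub_value in table.items():
            lines.append("%s = %s" % (sub_key, quote(sub_value)))
    return "\n".join(lines)


# tagdrop is deliberately listed first to show that it still ends up last.
example = render_plugin("net", {
    "tagdrop": {"interface": ["imq0", "imq1"]},
    "interval": "5s",
    "fieldpass": ["bytes_*"],
})
print(example)
```

The same split could be done in the Jinja2 template with two loops over the options, one for plain values and one for mappings.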
I'm running this playbook to deploy Telegraf to several Ubuntu 16.04.2 LTS servers, using Ansible and Python 3.6 on a MacBook.
I had to put the full path "src: /etc/ansible/roles/rossmcdonald.telegraf/templates/telegraf.conf.j2" in configure.yml to avoid errors, but now I get this problem:
"AnsibleUndefinedVariable: 'dict object' has no attribute 'iteritems'"
And with the -vvvv traces, I get:
(0, b'\r\n\r\n{"stat": {"rgrp": true, "xusr": false, "exists": true, "block_size": 4096, "path": "/etc/telegraf/telegraf.conf", "isuid": false, "readable": true, "pw_name": "root", "wgrp": false, "attr_flags": "e", "rusr": true, "xgrp": false, "isreg": true, "woth": false, "blocks": 176, "uid": 0, "charset": "utf-8", "writeable": true, "mtime": 1499906354.0, "executable": false, "nlink": 1, "ischr": false, "wusr": true, "checksum": "e942793878aabb97aed9c8832c6ae669b112354c", "isblk": false, "version": "1977560557", "isdir": false, "mode": "0644", "roth": true, "issock": false, "atime": 1500324802.128394, "gr_name": "root", "inode": 405322, "attributes": ["extents"], "isgid": false, "xoth": false, "isfifo": false, "size": 86929, "mimetype": "text/plain", "dev": 64512, "device_type": 0, "ctime": 1500324801.868396, "islnk": false, "gid": 0}, "changed": false, "invocation": {"module_args": {"checksum_algorithm": "sha1", "checksum_algo": "sha1", "get_checksum": true, "follow": true, "get_md5": false, "path": "/etc/telegraf/telegraf.conf", "get_attributes": true, "get_mime": true}}}\r\n', b'Shared connection to cdc-sng-001 closed.\r\n')
fatal: [cdc-sng-001]: FAILED! => {
"changed": false,
"failed": true,
"msg": "AnsibleUndefinedVariable: 'dict object' has no attribute 'iteritems'"
}
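This error is consistent with a template calling iteritems() on a dict while Ansible runs under Python 3, where that method no longer exists. A small Python sketch of the difference, assuming that is the cause (the options dict here is only illustrative):

```python
# Illustrative plugin options, not taken from the role's defaults.
options = {"interval": "5s", "fieldpass": ["bytes_*"]}

# Python 3 dicts have no iteritems(); looking it up is what fails.
has_iteritems = hasattr(options, "iteritems")

# items() is available on both Python 2 and Python 3 dicts.
pairs = sorted(options.items())

print(has_iteritems)
print(pairs)
```

If that assumption holds, replacing iteritems() with items() in the template would make it work on both Python versions.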
TASK [rossmcdonald.telegraf : Install any necessary dependencies [Debian/Ubuntu]] ********************************************************************************************************************************
FAILED - RETRYING: Install any necessary dependencies [Debian/Ubuntu] (2 retries left).
FAILED - RETRYING: Install any necessary dependencies [Debian/Ubuntu] (1 retries left).
fatal: [m]: FAILED! => {"attempts": 2, "changed": false, "msg": "No package matching 'python-httplib2' is available"}
I'm testing on Debian 11, where Python 2 is deprecated in the repos, so this task fails. python3-httplib2 is available, though, so it should be an easy fix.
https://github.com/rossmcdonald/telegraf/blob/master/tasks/install-debian.yml#L5
Hi,
Installing on Ubuntu 22.04 is failing because a "jammy" repo is not yet published. But apparently the repo structure is changing anyway and we should be using "stable" instead.
https://community.influxdata.com/t/repo-for-ubuntu-jammy/24657/5
So the repo line should look like this:
https://repos.influxdata.com/ubuntu stable main
Weirder yet, the doc site recommends using /debian stable main for both Ubuntu and Debian, and https://repos.influxdata.com/debian/dists/ seems to contain both Debian and Ubuntu releases. Either should work. Would you like me to open a PR for this?
Second question:
I've been using GitHub Actions to test my Ansible roles against various distros. Would you be interested in a PR enabling such tests, so that issues like this one are detected?
Cheers!
When I set the telegraf_install_url value on the role, Ansible ignores it.
role:
  - role: rossmcdonald.telegraf
    telegraf_install_url: http://some_url.com/telegraf_1.0.0-beta3_amd64.deb
    telegraf_hostname: ""
ansible output:
TASK [rossmcdonald.telegraf : Download Telegraf package via URL [Debian/Ubuntu]] ***
task path: /tmp/kitchen/roles/rossmcdonald.telegraf/tasks/install-debian.yml:34
skipping: [localhost] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}
TASK [rossmcdonald.telegraf : Install downloaded Telegraf package [Debian/Ubuntu]] ***
task path: /tmp/kitchen/roles/rossmcdonald.telegraf/tasks/install-debian.yml:38
skipping: [localhost] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}
I'm sorry to be so blunt, but why is there a neo4j procstat entry in defaults/main.yml? Is this supposed to be your personal Ansible role? I reached it through https://github.com/influxdata/telegraf#ansible-role, so I (perhaps unfairly) assumed it should be as generic as possible.
How would I go about removing it without recreating the whole dictionary? As far as I understand it, I'm supposed to override only telegraf_plugins_extra. Did I misinterpret the intention?
Thanks, and once again, this is just an inquiry, I don't feel like I'm entitled to anything. I'll gladly fork and maintain my own flavor, I'm just looking for context.
There is a StatsD option, percentiles: [50,90,95,99].
It seems like this part of the template
{% if value is sequence and value is not string %}
{{ key }} = [ "{{ value|join('", "') }}" ]
{% else %}
prevents an array of integers from being defined, because every element of the list is wrapped in quotes and rendered as a string.
Or am I missing something?
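A type-aware formatter would avoid this. Below is a minimal Python sketch of the idea, not the role's template; toml_value is a hypothetical helper that quotes only strings and leaves numbers and booleans bare.

```python
def toml_value(value):
    """Format a Python value as a TOML literal, quoting only strings."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(toml_value(v) for v in value) + "]"
    return '"' + str(value) + '"'


percentiles_line = "percentiles = " + toml_value([50, 90, 95, 99])
print(percentiles_line)
```

In Jinja2 the equivalent check would be a test such as "is number" on each element before deciding whether to add quotes.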
InfluxData rotated their signing key due to a security concern: https://www.influxdata.com/blog/linux-package-signing-key-rotation/
The URL of the signing key has changed to https://repos.influxdata.com/influxdata-archive_compat.key
This role depends on the amazon.aws collection, which seems to be included in Ansible 2.11+ but not in earlier versions.
Could you add this to the role's metadata?
This role fails on Amazon Linux because it tries to access https://repos.influxdata.com/amazon instead of https://repos.influxdata.com/centos
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "https://repos.influxdata.com//amazon/latest/x86_64/stable/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - \"The requested URL returned error: 404 Not Found\"\nTrying other mirror.\n\n\n One of the configured repositories failed (InfluxDB Repository - Amazon latest),\n and yum doesn't have enough cached data to continue. At this point the only\n safe thing yum can do is fail. There are a few ways to work \"fix\" this:\n\n 1. Contact the upstream for the repository and get them to fix the problem.\n\n 2. Reconfigure the baseurl/etc. for the repository, to point to a working\n upstream. This is most often useful if you are using a newer\n distribution release than is supported by the repository (and the\n packages for the previous distribution release still work).\n\n 3. Disable the repository, so yum won't use it by default. Yum will then\n just ignore the repository until you permanently enable it again or use\n --enablerepo for temporary usage:\n\n yum-config-manager --disable influxdb\n\n 4. Configure the failing repository to be skipped, if it is unavailable.\n Note that yum will try to contact the repo. when it runs most commands,\n so will have to try and fail each time (and thus. yum will be be much\n slower). If it is a very temporary problem though, this is often a nice\n compromise:\n\n yum-config-manager --save --setopt=influxdb.skip_if_unavailable=true\n\nfailure: repodata/repomd.xml from influxdb: [Errno 256] No more mirrors to try.\n", "rc": 1, "results": []}
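The fix described above amounts to mapping the detected distribution name to the repository directory that actually exists. A hedged Python sketch of that lookup follows; REPO_DIR_OVERRIDES, repo_dir, and repo_base are hypothetical names, and the only mapping taken from this report is amazon to centos.

```python
# Distributions whose packages live under another distribution's directory.
# Only the amazon -> centos entry comes from the report above.
REPO_DIR_OVERRIDES = {"amazon": "centos"}


def repo_dir(distribution):
    """Return the repository directory to use for a distribution name."""
    name = distribution.lower()
    return REPO_DIR_OVERRIDES.get(name, name)


def repo_base(distribution):
    """Build the base repository URL for a distribution name."""
    return "https://repos.influxdata.com/" + repo_dir(distribution)


print(repo_base("Amazon"))
```

In the role itself this would more likely be a variable lookup keyed on the distribution fact rather than Python code.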
Using this role on a RHEL 7 server fails because baseurl goes 404.
When running the role and installing telegraf I get this error:
15:20:22 RUNNING HANDLER [rossmcdonald.telegraf : check status] *************************
15:20:22 fatal: [514.154.116.12]: FAILED! => {"changed": true, "cmd": ["service", "telegraf", "status"], "delta": "0:00:00.011123", "end": "2017-01-21 15:20:22.550148", "failed": true, "rc": 3, "start": "2017-01-21 15:20:22.539025", "stderr": "", "stdout": "telegraf Process is not running [ FAILED ]", "stdout_lines": ["telegraf Process is not running [ FAILED ]"], "warnings": ["Consider using service module rather than running service"]}
15:20:22 ...ignoring
15:20:22 fatal: [54.154.2233.28]: FAILED! => {"changed": true, "cmd": ["service", "telegraf", "status"], "delta": "0:00:00.011904", "end": "2017-01-21 15:20:22.540582", "failed": true, "rc": 3, "start": "2017-01-21 15:20:22.528678", "stderr": "", "stdout": "telegraf Process is not running [ FAILED ]", "stdout_lines": ["telegraf Process is not running [ FAILED ]"], "warnings": ["Consider using service module rather than running service"]}
15:20:22 ...ignoring
15:20:22 fatal: [54.171.52.126]: FAILED! => {"changed": true, "cmd": ["service", "telegraf", "status"], "delta": "0:00:00.025982", "end": "2017-01-21 15:20:22.801210", "failed": true, "rc": 3, "start": "2017-01-21 15:20:22.775228", "stderr": "", "stdout": "telegraf Process is not running [ FAILED ]", "stdout_lines": ["telegraf Process is not running [ FAILED ]"], "warnings": ["Consider using service module rather than running service"]}
15:20:22 ...ignoring
Any idea what could be wrong?
In defaults/main.yml:
- name: procstat
  options:
    pid_file: "/var/lib/neo4j/data/neo4j-service.pid"
    prefix: "neo4j_proc"
I'm guessing this is a carryover from your local deployment, as it doesn't seem to be a standard part of a telegraf deployment.
I'll submit a PR to remove that default, and if this isn't intentional you can use it, otherwise you can drop it :)
Getting errors when running ansible-playbook with -C:
RUNNING HANDLER [rossmcdonald.telegraf : assert running] *****************************************************************************************
fatal: [my.host.name]: FAILED! => {"msg": "The conditional check 'telegraf_service_status.rc == 0' failed. The error was: error while evaluating conditional (telegraf_service_status.rc == 0): 'dict object' has no attribute 'rc'"}
When running with -C, ansible-playbook doesn't execute command tasks; therefore telegraf_service_status has no rc attribute and the check fails. This precludes this role from being run in check mode in an automated fashion, for example via a cron or Jenkins job.
The role should probably be changed to use the service module rather than calling the service command (which doesn't exist on all systems) directly.
I'm new to telegraf and this role, but the role looks great!
Do you have any suggestions for using this role with alternate output plugins (librato, datadog, etc) and without influxdb?
It looks like telegraf.conf.j2 could be modified to make the [[outputs.influxdb]] section optional and add a new section [[outputs.{{ plugin.name }}]] similar to the inputs one.
Also in the same file, I see a section header, service PLUGINS. What, if anything, was intended to go in there?
Finally, do Telegraf administrators typically use /etc/telegraf/telegraf.d/ or not?
On Ubuntu 16.04+ the unit file is located at /lib/systemd/system/telegraf.service. Otherwise, great role!
Firstly - thanks for the excellent ansible role. If I can suggest a relatively minor issue/improvement....
In the scenario where you are running the ansible role as a part of creating a machine image (e.g. AWS AMI, VirtualBox OVF, etc..) then you may not want Telegraf to start automatically because any telegraf outputs from the "machine-image building VM" are junk and unwanted.
Such outputs would be :
Ideally one should be able to install Telegraf with the option to disable the systemd service. The systemd service can then be explicitly Enabled and Started when the machine image (e.g. AWS AMI) is deployed as an EC2 instance.
The change to ansible is small. I can create a fork and supply the code change if you think it is a good idea.
My team uses this Ansible Role as part of our performance test suite and we started failing as soon as Telegraf v1.11.0 was released.
We have started investigating the root cause, but our short-term fix has been to set telegraf_install_url to the previous version.
Is anyone else seeing anything like this:
fatal: [perf-use-cases-chunkedtransferencoding-repose-test01]: FAILED! => {
"changed": true,
"cmd": ["service", "telegraf", "status"],
"start": "2019-06-18 11:33:50.597645",
"end": "2019-06-18 11:33:50.607863",
"delta": "0:00:00.010218",
"failed": true,
"rc": 3,
"stderr_lines": [],
"stdout_lines": [
"● telegraf.service - The plugin-driven server agent for reporting metrics into InfluxDB",
" Loaded: loaded (/lib/systemd/system/telegraf.service; enabled; vendor preset: enabled)",
" Active: inactive (dead) (Result: exit-code) since Tue 2019-06-18 11:33:45 UTC; 5s ago",
" Docs: https://github.com/influxdata/telegraf",
" Process: 22531 ExecStart=/usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d $TELEGRAF_OPTS (code=exited, status=1/FAILURE)",
" Main PID: 22531 (code=exited, status=1/FAILURE)",
"",
"Jun 18 11:33:45 perf-use-cases-chunkedtransferencoding-repose-test01 systemd[1]: telegraf.service: Unit entered failed state.",
"Jun 18 11:33:45 perf-use-cases-chunkedtransferencoding-repose-test01 systemd[1]: telegraf.service: Failed with result 'exit-code'.",
"Jun 18 11:33:45 perf-use-cases-chunkedtransferencoding-repose-test01 systemd[1]: telegraf.service: Service hold-off time over, scheduling restart.",
"Jun 18 11:33:45 perf-use-cases-chunkedtransferencoding-repose-test01 systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.",
"Jun 18 11:33:45 perf-use-cases-chunkedtransferencoding-repose-test01 systemd[1]: telegraf.service: Start request repeated too quickly.",
"Jun 18 11:33:45 perf-use-cases-chunkedtransferencoding-repose-test01 systemd[1]: Failed to start The plugin-driven server agent for reporting metrics into InfluxDB."
]
}
Hello
In this line, which adds the key for installing Telegraf, did you get the following error?
FAILED! => {"changed": false, "cmd": "/usr/bin/apt-key adv --keyserver https://repos.influxdata.com/influxdb.key --recv None", "msg": "Error fetching key None from keyserver: https://repos.influxdata.com/influxdb.key", "rc": 2, "stderr": "Warning: apt-key output should not be parsed (stdout is not a terminal)\ngpg: \"None\" not a key ID: skipping\n", "stderr_lines": ["Warning: apt-key output should not be parsed (stdout is not a terminal)", "gpg: \"None\" not a key ID: skipping"], "stdout": "Executing: /tmp/apt-key-gpghome.yw4v6KFcxF/gpg.1.sh --keyserver https://repos.influxdata.com/influxdb.key --recv None\n", "stdout_lines": ["Executing: /tmp/apt-key-gpghome.yw4v6KFcxF/gpg.1.sh --keyserver https://repos.influxdata.com/influxdb.key --recv None"]}
The current templating for the Telegraf plugins is very complicated and hard to reason about. It should be rewritten using Jinja2 macros.
Is it normal that the role's default vars differ from the default Telegraf config?
I installed Telegraf 0.12.1 manually, and compared the two:
In your defaults, you have the option:
drop:
- cpu_time
but the default install has:
fielddrop:
- time_*
In your defaults, you have the option :
mountpoints:
- "/"
but in the default install mountpoints is commented out and another option is present:
ignore_fs:
- tmpfs
- devtmpfs
In your defaults, you have the option :
skip_serial_number: "true"
but in the default install all options are commented out.
This plugin is not in your conf but is enabled in the default Telegraf conf.
Maybe all of this is a deliberate choice?