rossmcdonald / telegraf

Ansible role for installing, configuring, and maintaining Telegraf

Jinja 100.00%

telegraf's People

undergreen millerjam viasite-ansible ahivert amnay-mo edificeio rbrutas valtaz rlizana mckelvin zhangdavids codecounselor gurayops behid pklkelly ghostpunk neneko-kun mediapeers codesplicer laangarita dmtrsokolov judy-zz jacob-go actioniq-oss nyddle haiderny juju4 damir-manapov castaglia zyun53 morsicus trongnhanbmt nhanatl unlikelyzero tryfan verygood-ops cptcanuck ajanis hydrandt diraol drmagpie evfonarik billklinevt snehabhi maxisme markfsummers nimisha-97 mjuliaq alanbbr debrez gaviscada zklapow florinandone kusold rremizov avs6 davidcallen e1mo cheyunhua mchandler-wowcorp rgevaert filviu carloshorro dylwong roozbehossia mlindes segator neem320

telegraf's Issues

inputs plugin configuration shouldn't be defined as key/value

https://github.com/influxdata/telegraf/tree/master/plugins/inputs/procstat

The procstat input plugin configuration block can appear more than once in telegraf.conf.
That isn't possible with the current template, which defines inputs as key/value pairs. It should be changed to a list. I'll make a pull request eventually.
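A minimal Python sketch (not the role's actual Jinja2 template) of why a list works where a key/value mapping cannot: a mapping keyed by plugin name can hold only one "procstat" entry, while a list can repeat it. The render_inputs helper and the name/options entry shape are assumptions for illustration, and only string option values are handled.

```python
def render_inputs(plugins):
    """Render a list of {"name": ..., "options": {...}} entries as Telegraf TOML.

    Because the input is a list, the same plugin name may appear several times,
    and each entry produces its own [[inputs.<name>]] block.
    """
    lines = []
    for plugin in plugins:
        lines.append(f"[[inputs.{plugin['name']}]]")
        for key, value in plugin.get("options", {}).items():
            # This sketch only handles string option values.
            lines.append(f'  {key} = "{value}"')
    return "\n".join(lines)


# Two procstat instances: impossible with a dict keyed by plugin name.
plugins = [
    {"name": "procstat", "options": {"exe": "nginx"}},
    {"name": "procstat", "options": {"pid_file": "/var/run/telegraf.pid"}},
]
print(render_inputs(plugins))
```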

Updating Ansible Galaxy package

Can you update the Ansible Galaxy package? The last version is 2 years old, and there have been 24 commits since then (including one of my own).

Support Python 3

http://docs.ansible.com/ansible/latest/python_3_support.html

Role not working, missing GPG Key

fatal: [proxmox1]: FAILED! => {"changed": false, "msg": "Failed to update apt cache: W:GPG error: https://repos.influxdata.com/debian bullseye InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY D8FF8E1F7DF8B07E, E:The repository 'https://repos.influxdata.com/debian bullseye InRelease' is not signed."}

order matters when defining plugin settings

Based on the following quote from the Telegraf documentation, the order of keys in mappings should be preserved:

NOTE: Order matters, the [inputs.cpu.tags] table must be at the end of the plugin definition.
in https://github.com/influxdata/telegraf/blob/master/docs/CONFIGURATION.md

Now consider the following:

telegraf_plugins:
  - name: net
    options:
      interval: 5s
      fielddrop: ['ip_*','tcp_*','udp_*','icmp*','packets_*','drop_*','err_*']
      fieldpass: ['bytes_*']
      tagdrop:
        interface: [ "imq0", "imq1" ]

It seems that when the tagdrop or tagpass attributes are used, the order is not preserved, and I end up with an invalid Telegraf configuration that looks like this:

[[inputs.net]]
  [inputs.net.tagdrop]
      interface = [ "imq0", "imq1" ]
    fielddrop = [ "ip_*", "tcp_*", "udp_*", "icmp*", "packets_*", "drop_*", "err_*" ]
    interval = "5s"
    fieldpass = [ "bytes_*" ]

The expected result should look like this:

[[inputs.net]]
  fielddrop = [ "ip_*", "tcp_*", "udp_*", "icmp*", "packets_*", "drop_*", "err_*" ]
  interval = "5s"
  fieldpass = [ "bytes_*" ]
  [inputs.net.tagdrop]
    interface = [ "imq0", "imq1" ]
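The expected output follows from TOML's rules: every key that appears after a [table] header belongs to that table, so plain keys must come before sub-tables such as tagdrop. A minimal Python sketch of that ordering, not the role's template. It assumes options arrive as a mapping in which nested mappings become sub-tables. The render_plugin helper is hypothetical, and json.dumps is used as a shortcut that only produces valid TOML for simple strings, numbers and flat arrays.

```python
import json


def render_plugin(name, options):
    """Render one [[inputs.<name>]] block, writing plain keys before sub-tables.

    In TOML, every key that follows a [table] header belongs to that table,
    so nested mappings such as tagdrop/tagpass have to be emitted last.
    """
    lines = [f"[[inputs.{name}]]"]
    plain = {k: v for k, v in options.items() if not isinstance(v, dict)}
    tables = {k: v for k, v in options.items() if isinstance(v, dict)}
    for key, value in plain.items():
        # json.dumps yields valid TOML for simple strings, numbers and flat arrays.
        lines.append(f"  {key} = {json.dumps(value)}")
    for table, entries in tables.items():
        lines.append(f"  [inputs.{name}.{table}]")
        for key, value in entries.items():
            lines.append(f"    {key} = {json.dumps(value)}")
    return "\n".join(lines)


# tagdrop is listed first on purpose; it still ends up after the plain keys.
options = {
    "tagdrop": {"interface": ["imq0", "imq1"]},
    "interval": "5s",
    "fieldpass": ["bytes_*"],
}
print(render_plugin("net", options))
```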

configure.yml failing with "AnsibleUndefinedVariable: 'dict object' has no attribute 'iteritems'"

I'm running this playbook to deploy Telegraf on several Ubuntu 16.04.2 LTS servers, using Ansible with Python 3.6 on a MacBook.

I had to put the full path "src: /etc/ansible/roles/rossmcdonald.telegraf/templates/telegraf.conf.j2" in configure.yml to avoid errors, but now I get this problem:

"AnsibleUndefinedVariable: 'dict object' has no attribute 'iteritems'"

With -vvvv tracing enabled, I get:

(0, b'\r\n\r\n{"stat": {"rgrp": true, "xusr": false, "exists": true, "block_size": 4096, "path": "/etc/telegraf/telegraf.conf", "isuid": false, "readable": true, "pw_name": "root", "wgrp": false, "attr_flags": "e", "rusr": true, "xgrp": false, "isreg": true, "woth": false, "blocks": 176, "uid": 0, "charset": "utf-8", "writeable": true, "mtime": 1499906354.0, "executable": false, "nlink": 1, "ischr": false, "wusr": true, "checksum": "e942793878aabb97aed9c8832c6ae669b112354c", "isblk": false, "version": "1977560557", "isdir": false, "mode": "0644", "roth": true, "issock": false, "atime": 1500324802.128394, "gr_name": "root", "inode": 405322, "attributes": ["extents"], "isgid": false, "xoth": false, "isfifo": false, "size": 86929, "mimetype": "text/plain", "dev": 64512, "device_type": 0, "ctime": 1500324801.868396, "islnk": false, "gid": 0}, "changed": false, "invocation": {"module_args": {"checksum_algorithm": "sha1", "checksum_algo": "sha1", "get_checksum": true, "follow": true, "get_md5": false, "path": "/etc/telegraf/telegraf.conf", "get_attributes": true, "get_mime": true}}}\r\n', b'Shared connection to cdc-sng-001 closed.\r\n')
fatal: [cdc-sng-001]: FAILED! => {
"changed": false,
"failed": true,
"msg": "AnsibleUndefinedVariable: 'dict object' has no attribute 'iteritems'"
}
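The error message shows the template calling .iteritems() on a dict. That method exists only in Python 2, while .items() exists in both Python 2 and Python 3. A minimal illustration, assuming the template iterates over a plugin's options this way (the options dict below is made up):

```python
options = {"interval": "5s", "fieldpass": ["bytes_*"]}

# Python 3 dictionaries no longer provide iteritems(); this is the
# attribute lookup that fails inside the template.
assert not hasattr(options, "iteritems")

# items() exists on both Python 2 and Python 3, so a loop written as
# {% for key, value in options.items() %} works under either interpreter.
for key, value in options.items():
    print(key, value)
```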

python3 migration causes a failure

TASK [rossmcdonald.telegraf : Install any necessary dependencies [Debian/Ubuntu]] ********************************************************************************************************************************
FAILED - RETRYING: Install any necessary dependencies [Debian/Ubuntu] (2 retries left).
FAILED - RETRYING: Install any necessary dependencies [Debian/Ubuntu] (1 retries left).
fatal: [m]: FAILED! => {"attempts": 2, "changed": false, "msg": "No package matching 'python-httplib2' is available"}

I'm testing on Debian 11. Python 2 is deprecated in the repositories, so this task fails. python3-httplib2 is available, though, so this should be an easy fix.
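A minimal sketch of the selection logic. It assumes the role can see the target host's Python major version (Ansible exposes this as the ansible_python.version.major fact). The httplib2_package helper is hypothetical and not part of the role.

```python
def httplib2_package(python_major):
    """Return the Debian/Ubuntu package that provides httplib2 for this interpreter.

    The report shows python-httplib2 (Python 2) is no longer available on
    Debian 11, so Python 3 hosts need python3-httplib2 instead.
    """
    return "python3-httplib2" if python_major >= 3 else "python-httplib2"


print(httplib2_package(3))  # python3-httplib2
print(httplib2_package(2))  # python-httplib2
```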

https://github.com/rossmcdonald/telegraf/blob/master/tasks/install-debian.yml#L5

Failing for Ubuntu 22.04 (Jammy) and GH Actions proposal

Hi,

Installing on Ubuntu 22.04 fails because a "jammy" repo has not been published yet. Apparently the repo structure is changing anyway, and we should be using "stable" instead.

https://community.influxdata.com/t/repo-for-ubuntu-jammy/24657/5

So the repo line should look like this:

https://repos.influxdata.com/ubuntu stable main

Weirder still, the documentation site recommends using /debian stable main for both Ubuntu and Debian, and https://repos.influxdata.com/debian/dists/ seems to contain both Debian and Ubuntu releases. Either way, both should work. Would you like me to open a PR for this?
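A minimal sketch of the proposed change. It builds the APT source line from the distribution name and a fixed "stable" suite instead of the release codename. The influxdata_apt_source helper and its use_debian_path flag are hypothetical. The flag reflects the documentation's suggestion to use /debian for both distributions.

```python
def influxdata_apt_source(distribution, use_debian_path=False):
    """Build an APT source line for the InfluxData repository.

    The suite is always "stable" rather than a codename such as "jammy",
    which may not be published. With use_debian_path=True, the /debian path
    is used for every distribution.
    """
    path = "debian" if use_debian_path else distribution.lower()
    return f"deb https://repos.influxdata.com/{path} stable main"


print(influxdata_apt_source("Ubuntu"))
# deb https://repos.influxdata.com/ubuntu stable main
print(influxdata_apt_source("Ubuntu", use_debian_path=True))
# deb https://repos.influxdata.com/debian stable main
```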

Second question:

I've been using GitHub Actions to test various distros with my Ansible roles. Would you be interested in a PR enabling such tests, so that issues like this one get detected?

Cheers!

telegraf_install_url is being ignored

When I set the telegraf_install_url variable on the role, Ansible ignores it.

role:

- role: rossmcdonald.telegraf
      telegraf_install_url: http://some_url.com/telegraf_1.0.0-beta3_amd64.deb
      telegraf_hostname: ""

ansible output:

TASK [rossmcdonald.telegraf : Download Telegraf package via URL [Debian/Ubuntu]] ***
       task path: /tmp/kitchen/roles/rossmcdonald.telegraf/tasks/install-debian.yml:34
       skipping: [localhost] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}

       TASK [rossmcdonald.telegraf : Install downloaded Telegraf package [Debian/Ubuntu]] ***
       task path: /tmp/kitchen/roles/rossmcdonald.telegraf/tasks/install-debian.yml:38
       skipping: [localhost] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}

Neo4j procstat plugin defaults

I'm sorry to be so blunt, but why is there a neo4j procstat entry in defaults/main.yml? Is this meant to be your personal Ansible role? I found it through https://github.com/influxdata/telegraf#ansible-role, so I (perhaps unfairly) assumed it was meant to be as generic as possible.

How would I go about removing it without recreating the whole dictionary? As I understood it, I'm supposed to override only telegraf_plugins_extra. Did I misinterpret the intention?

Thanks, and once again, this is just an inquiry; I don't feel entitled to anything. I'll gladly fork and maintain my own flavor. I'm just looking for context.

Don't assume arrays are string values

The StatsD input has an option percentiles: [50,90,95,99].

It seems like

{% if value is sequence and value is not string %}
    {{ key }} = [ "{{ value|join('", "') }}" ]
{% else %}

is what prevents an array of integers from being defined, because every element is wrapped in quotes.

Or am I missing something?
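The quoted template joins every element inside quotes regardless of type. A minimal Python sketch contrasting that behavior with a type-aware alternative that quotes only string elements, so numeric options such as percentiles stay numeric. Both helper names are hypothetical. This is not the role's code.

```python
def format_array_quoted(values):
    """Mirror of the current template logic: every element is wrapped in quotes."""
    return '[ "' + '", "'.join(str(v) for v in values) + '" ]'


def format_array(values):
    """Format a list as a TOML array, quoting strings but leaving numbers bare."""
    def fmt(v):
        if isinstance(v, bool):  # check bool before int, since bool is an int subclass
            return "true" if v else "false"
        if isinstance(v, (int, float)):
            return str(v)
        return '"' + str(v) + '"'
    return "[ " + ", ".join(fmt(v) for v in values) + " ]"


print(format_array_quoted([50, 90, 95, 99]))  # [ "50", "90", "95", "99" ]
print(format_array([50, 90, 95, 99]))         # [ 50, 90, 95, 99 ]
print(format_array(["imq0", "imq1"]))         # [ "imq0", "imq1" ]
```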

New Signing Key for InfluxDB

InfluxData rotated their signing key due to a security concern: https://www.influxdata.com/blog/linux-package-signing-key-rotation/

The URL of the signing key has changed to https://repos.influxdata.com/influxdata-archive_compat.key

Missing amazon.aws dependency

This role depends on the amazon.aws collection which seems to be included in Ansible 2.11+, but not in earlier versions.

Could you add it to the role's metadata?

yum repo does not work on Amazon Linux

This role fails on Amazon Linux because it tries to access https://repos.influxdata.com/amazon instead of https://repos.influxdata.com/centos.

fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "https://repos.influxdata.com//amazon/latest/x86_64/stable/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - \"The requested URL returned error: 404 Not Found\"\nTrying other mirror.\n\n\n One of the configured repositories failed (InfluxDB Repository - Amazon latest),\n and yum doesn't have enough cached data to continue. At this point the only\n safe thing yum can do is fail. There are a few ways to work \"fix\" this:\n\n 1. Contact the upstream for the repository and get them to fix the problem.\n\n 2. Reconfigure the baseurl/etc. for the repository, to point to a working\n upstream. This is most often useful if you are using a newer\n distribution release than is supported by the repository (and the\n packages for the previous distribution release still work).\n\n 3. Disable the repository, so yum won't use it by default. Yum will then\n just ignore the repository until you permanently enable it again or use\n --enablerepo for temporary usage:\n\n yum-config-manager --disable influxdb\n\n 4. Configure the failing repository to be skipped, if it is unavailable.\n Note that yum will try to contact the repo. when it runs most commands,\n so will have to try and fail each time (and thus. yum will be be much\n slower). If it is a very temporary problem though, this is often a nice\n compromise:\n\n yum-config-manager --save --setopt=influxdb.skip_if_unavailable=true\n\nfailure: repodata/repomd.xml from influxdb: [Errno 256] No more mirrors to try.\n", "rc": 1, "results": []}
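A minimal sketch of the mapping the reporter suggests, pointing Amazon Linux at the CentOS tree. The URL layout (distribution/releasever/basearch/stable) is inferred from the failing URL in the error above. The yum_baseurl helper is hypothetical. Which CentOS release number to pair with a given Amazon Linux version is an assumption left to the caller.

```python
def yum_baseurl(distribution, releasever, basearch):
    """Build the InfluxData yum baseurl, mapping Amazon Linux to the CentOS tree.

    Layout inferred from the failing URL in the report:
    https://repos.influxdata.com/<distribution>/<releasever>/<basearch>/stable
    """
    dist = distribution.lower()
    if dist == "amazon":
        # The /amazon tree returns 404; fall back to the CentOS packages.
        dist = "centos"
    return f"https://repos.influxdata.com/{dist}/{releasever}/{basearch}/stable"


print(yum_baseurl("Amazon", "7", "x86_64"))
# https://repos.influxdata.com/centos/7/x86_64/stable
```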

Telegraf doesn't have RedHat repos, but the playbook assumes there is one.

Using this role on a RHEL 7 server fails because the baseurl returns a 404.

Getting "telegraf is not running" on AMI linux

When running the role and installing Telegraf, I get this error:


15:20:22 RUNNING HANDLER [rossmcdonald.telegraf : check status] *************************
15:20:22 fatal: [514.154.116.12]: FAILED! => {"changed": true, "cmd": ["service", "telegraf", "status"], "delta": "0:00:00.011123", "end": "2017-01-21 15:20:22.550148", "failed": true, "rc": 3, "start": "2017-01-21 15:20:22.539025", "stderr": "", "stdout": "telegraf Process is not running [ FAILED ]", "stdout_lines": ["telegraf Process is not running [ FAILED ]"], "warnings": ["Consider using service module rather than running service"]}
15:20:22 ...ignoring
15:20:22 fatal: [54.154.2233.28]: FAILED! => {"changed": true, "cmd": ["service", "telegraf", "status"], "delta": "0:00:00.011904", "end": "2017-01-21 15:20:22.540582", "failed": true, "rc": 3, "start": "2017-01-21 15:20:22.528678", "stderr": "", "stdout": "telegraf Process is not running [ FAILED ]", "stdout_lines": ["telegraf Process is not running [ FAILED ]"], "warnings": ["Consider using service module rather than running service"]}
15:20:22 ...ignoring
15:20:22 fatal: [54.171.52.126]: FAILED! => {"changed": true, "cmd": ["service", "telegraf", "status"], "delta": "0:00:00.025982", "end": "2017-01-21 15:20:22.801210", "failed": true, "rc": 3, "start": "2017-01-21 15:20:22.775228", "stderr": "", "stdout": "telegraf Process is not running [ FAILED ]", "stdout_lines": ["telegraf Process is not running [ FAILED ]"], "warnings": ["Consider using service module rather than running service"]}
15:20:22 ...ignoring

Any idea what could be wrong?

Default includes monitoring for neo4j

In defaults/main.yml:

- name: procstat
  options:
    pid_file: "/var/lib/neo4j/data/neo4j-service.pid"
    prefix: "neo4j_proc"

I'm guessing this is a carryover from your local deployment, as it doesn't seem to be a standard part of a telegraf deployment.

I'll submit a PR to remove that default. If the default isn't intentional, you can merge it; otherwise, you can drop it :)

Module throws errors when running ansible-playbook with -C

Getting errors when running ansible-playbook with -C:

RUNNING HANDLER [rossmcdonald.telegraf : assert running] *****************************************************************************************
fatal: [my.host.name]: FAILED! => {"msg": "The conditional check 'telegraf_service_status.rc == 0' failed. The error was: error while evaluating conditional (telegraf_service_status.rc == 0): 'dict object' has no attribute 'rc'"}

When running with -C, ansible-playbook doesn't execute command tasks, so telegraf_service_status is registered without an rc attribute and the check fails. This prevents the role from being run in an automated fashion, for example via cron or a Jenkins job.

The role should probably be changed to use the service module rather than calling the service command directly (which doesn't exist on all systems).
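The handler fails because it reads telegraf_service_status.rc from a registered result that has no rc key. A minimal Python sketch of a guard that treats a missing rc (the command did not run, as in check mode) as nothing to verify, instead of an error. The service_assertion_passes helper is hypothetical. In the role itself, the equivalent would be a condition on Ansible's ansible_check_mode variable, or a switch to the service module.

```python
def service_assertion_passes(status):
    """Evaluate the 'assert running' check against a registered command result.

    When the command is not executed (e.g. in check mode), the registered
    result has no 'rc' key. Instead of failing on the missing attribute,
    treat that case as skipped and only assert when a return code exists.
    """
    if "rc" not in status:
        return True  # command did not run; nothing to verify
    return status["rc"] == 0


print(service_assertion_passes({"changed": False}))  # True
print(service_assertion_passes({"rc": 0}))           # True
print(service_assertion_passes({"rc": 3}))           # False
```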

Suggestions for using telegraf without influxdb

I'm new to telegraf and this role, but the role looks great!

Do you have any suggestions for using this role with alternative output plugins (Librato, Datadog, etc.) and without InfluxDB?

It looks like telegraf.conf.j2 could be modified to make the [[outputs.influxdb]] section optional and to add a new [[outputs.{{ plugin.name }}]] section, similar to the inputs one.

Also in the same file, I see a section header service PLUGINS. What, if anything, was intended to go in there?

Finally, do telegraf administrators typically use /etc/telegraf/telegraf.d/ or not?

Different location of systemd unit file on ubuntu

On Ubuntu 16.04+, the unit file is located at /lib/systemd/system/telegraf.service. Otherwise, great role!

Add variable to decide if telegraf systemd service is enabled

Firstly, thanks for the excellent Ansible role. If I may, I'd like to suggest a relatively minor improvement.

If you run the Ansible role as part of creating a machine image (e.g. AWS AMI, VirtualBox OVF, etc.), you may not want Telegraf to start automatically, because any Telegraf output from the image-building VM is junk.

Specifically, you do not want:

the Telegraf log file to contain lines from the image-building VM
the output data (stats, logs) to reach your output databases, e.g. InfluxDB

Ideally, one should be able to install Telegraf with the systemd service disabled. The service can then be explicitly enabled and started when the machine image (e.g. AWS AMI) is deployed as an EC2 instance.

The change to the role is small. I can create a fork and supply the code change if you think it's a good idea.

Issues using this role with Telegraf v1.11.0

My team uses this Ansible Role as part of our performance test suite and we started failing as soon as Telegraf v1.11.0 was released.

We have started investigating the root cause, but our short-term fix has been to set telegraf_install_url to the previous version.

Is anyone else seeing anything like this:

fatal: [perf-use-cases-chunkedtransferencoding-repose-test01]: FAILED! => {
    "changed": true,
    "cmd": ["service", "telegraf", "status"],
    "start": "2019-06-18 11:33:50.597645",
    "end": "2019-06-18 11:33:50.607863",
    "delta": "0:00:00.010218",
    "failed": true,
    "rc": 3,
    "stderr_lines": [],
    "stdout_lines": [
        "● telegraf.service - The plugin-driven server agent for reporting metrics into InfluxDB",
        "   Loaded: loaded (/lib/systemd/system/telegraf.service; enabled; vendor preset: enabled)",
        "   Active: inactive (dead) (Result: exit-code) since Tue 2019-06-18 11:33:45 UTC; 5s ago",
        "     Docs: https://github.com/influxdata/telegraf",
        "  Process: 22531 ExecStart=/usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d $TELEGRAF_OPTS (code=exited, status=1/FAILURE)",
        " Main PID: 22531 (code=exited, status=1/FAILURE)",
        "",
        "Jun 18 11:33:45 perf-use-cases-chunkedtransferencoding-repose-test01 systemd[1]: telegraf.service: Unit entered failed state.",
        "Jun 18 11:33:45 perf-use-cases-chunkedtransferencoding-repose-test01 systemd[1]: telegraf.service: Failed with result 'exit-code'.",
        "Jun 18 11:33:45 perf-use-cases-chunkedtransferencoding-repose-test01 systemd[1]: telegraf.service: Service hold-off time over, scheduling restart.",
        "Jun 18 11:33:45 perf-use-cases-chunkedtransferencoding-repose-test01 systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.",
        "Jun 18 11:33:45 perf-use-cases-chunkedtransferencoding-repose-test01 systemd[1]: telegraf.service: Start request repeated too quickly.",
        "Jun 18 11:33:45 perf-use-cases-chunkedtransferencoding-repose-test01 systemd[1]: Failed to start The plugin-driven server agent for reporting metrics into InfluxDB."
    ]
}

apt-key output should not be parsed (stdout is not a terminal)

Hello

In line 19, which adds the key for installing Telegraf, did you get the following error?
FAILED! => {"changed": false, "cmd": "/usr/bin/apt-key adv --keyserver https://repos.influxdata.com/influxdb.key --recv None", "msg": "Error fetching key None from keyserver: https://repos.influxdata.com/influxdb.key", "rc": 2, "stderr": "Warning: apt-key output should not be parsed (stdout is not a terminal)\ngpg: \"None\" not a key ID: skipping\n", "stderr_lines": ["Warning: apt-key output should not be parsed (stdout is not a terminal)", "gpg: \"None\" not a key ID: skipping"], "stdout": "Executing: /tmp/apt-key-gpghome.yw4v6KFcxF/gpg.1.sh --keyserver https://repos.influxdata.com/influxdb.key --recv None\n", "stdout_lines": ["Executing: /tmp/apt-key-gpghome.yw4v6KFcxF/gpg.1.sh --keyserver https://repos.influxdata.com/influxdb.key --recv None"]}

Rewrite Plugin Configuration

The current templating for the Telegraf plugins is very complicated and hard to reason about. It should be rewritten using Jinja2 macros.

Set default vars like the default Telegraf config?

Is it normal not to have the same default vars as the default Telegraf config?

I installed Telegraf 0.12.1 manually, but:

Default plugins

Your defaults don't have the input called io, only diskio. Maybe this is due to a Telegraf update?
I can't find the kernel input in your defaults, but it's enabled in the default Telegraf install.

Default conf plugins

cpu plugin

In your defaults, you have the option:

drop:
      - cpu_time

but the default install has:

fielddrop:
    - time_*

disk plugin

In your defaults, you have the option:

mountpoints:
      - "/"

but in the default install, mountpoints is commented out and another option is present:

ignore_fs:
    - tmpfs
    - devtmpfs

diskio or io plugin

In your defaults, you have the option:

skip_serial_number: "true"

but in the default install, all options are commented out.

kernel plugin

This plugin is not in your config, but it is enabled in the default Telegraf config.

Maybe all of this is intentional?
