ovh / rtm

RTM (OVH Real Time Monitoring)

Home Page: https://docs.ovh.com/fr/dedicated/installer-rtm/

License: BSD 3-Clause "New" or "Revised" License

Makefile 1.54% Shell 1.76% Perl 96.70%
ovh-rtm rtm real-time-monitoring ovh-monitoring

rtm's Introduction

OVH RTM

(OVH Real Time Monitoring probes)

This repository contains the OVH RTM probes and packaging scripts. It depends on the default implementation of the ovh-rtm-metrics-toolkit package and on the additional tools noderig and beamium.

Real Time Monitoring is composed of two packages:

  • ovh-rtm-metrics-toolkit: configures beamium and noderig to push monitoring metrics and probe results to the OVH monitoring platform.

  • ovh-rtm-binaries: copies the OVH monitoring probes into /usr/bin/rtm*

For details about Noderig see the main repository

For details about Beamium see the main repository

Installing the ovh-rtm-metrics-toolkit package gives your server a metrics-based monitoring solution (bare metal and Public Cloud only; Public Cloud users can only retrieve their metrics through Insight). The metrics can be:

  1. displayed in the OVHcloud web manager;

  2. retrieved through the API;

  3. displayed in real time on your own Grafana monitoring dashboard (recommended method).

How to proceed: RTM on Grafana

How to install OVH RTM packages:

Please refer to OVH docs: https://docs.ovh.com/gb/en/dedicated/install-rtm/

Releases:

http://last.public.ovh.rtm.snap.mirrors.ovh.net/

Status

OVH datacenters host many types of servers, each running different operating systems with different components. This monitoring solution tries to be compatible with most of them and is therefore still under development.

Feel free to comment or contribute!

What is collected?

RTM metrics

RTM collects real-time monitoring data (based on noderig's default collectors) on CPU, load, RAM, disk and network.

RTM probes

RTM probes are Perl scripts located in /usr/bin/rtm*. Their results are mainly available through the OVH API. Each probe runs at an interval that depends on which noderig external collector folder it is linked from. You can see these links in the noderig external collector directory:

/opt/noderig/
├── 3600
│   ├── rtmHourly -> /usr/bin/rtmHourly
│   └── rtmRaidCheck -> /usr/bin/rtmRaidCheck
└── 43200
    └── rtmHardware -> /usr/bin/rtmHardware
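As a minimal sketch (not part of the RTM packages), the run interval of each probe can be recovered from this layout by treating each numeric folder name under /opt/noderig as an interval in seconds and reading the symlinks inside it. The function name `probe_schedule` is hypothetical, and the rule that only numeric folder names count is an assumption drawn from the tree above.

```python
import os
from pathlib import Path


def probe_schedule(collector_root="/opt/noderig"):
    """Map each probe name to its interval (seconds) and symlink target.

    Assumption: a probe linked as <root>/3600/rtmHourly runs every 3600 s.
    """
    schedule = {}
    root = Path(collector_root)
    if not root.is_dir():
        return schedule
    for interval_dir in sorted(root.iterdir()):
        # Only numeric folder names are treated as intervals (assumption).
        if not (interval_dir.is_dir() and interval_dir.name.isdigit()):
            continue
        for entry in sorted(interval_dir.iterdir()):
            if entry.is_symlink():
                schedule[entry.name] = {
                    "interval_s": int(interval_dir.name),
                    "target": os.readlink(entry),
                }
    return schedule


if __name__ == "__main__":
    for name, info in probe_schedule().items():
        print(f"{name}: every {info['interval_s']}s -> {info['target']}")
```

On the layout shown above, this would report rtmHourly and rtmRaidCheck at 3600 s and rtmHardware at 43200 s.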

These scripts expose monitoring results in Prometheus format (https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md).

rtmHourly

This probe runs every 3600 s (1 hour). It collects information such as the uptime, load average, memory usage, installed RTM version, top processes, open ports and the number of running processes.

Data example

{"metric":"rtm.info.rtm.version","timestamp":1582207693,"value":"1.0.11"}
{"metric":"rtm.info.uptime","timestamp":1582207693,"value":"11400912"}
{"metric":"rtm.hostname","timestamp":1582207693,"value":"ns10000"}

{"metric":"os.load.processesactive","timestamp":1582207693,"value":"1"}
{"metric":"os.load.processesup","timestamp":1582207693,"value":"650"}

{"metric":"rtm.info.mem.top_mem_1_name","timestamp":1582207693,"value":"/usr/bin/syncthing"}
{"metric":"rtm.info.mem.top_mem_1_size","timestamp":1582207693,"value":"4833764"}
{"metric":"rtm.info.mem.top_mem_2_name","timestamp":1582207693,"value":"/usr/bin/smbd"}
{"metric":"rtm.info.mem.top_mem_2_size","timestamp":1582207693,"value":"1871772"}
{"metric":"rtm.info.mem.top_mem_3_name","timestamp":1582207693,"value":"/usr/bin/smbd"}
{"metric":"rtm.info.mem.top_mem_3_size","timestamp":1582207693,"value":"1510164"}
{"metric":"rtm.info.mem.top_mem_4_name","timestamp":1582207693,"value":"/usr/sbin/named"}
{"metric":"rtm.info.mem.top_mem_4_size","timestamp":1582207693,"value":"1025712"}
{"metric":"rtm.info.mem.top_mem_5_name","timestamp":1582207693,"value":"/usr/sbin/rsyslogd"}
{"metric":"rtm.info.mem.top_mem_5_size","timestamp":1582207693,"value":"361880"}

{"metric":"rtm.info.tcp.listen.ip-0-0-0-0.port-79.uid","timestamp":1582207693,"value":"111"}
{"metric":"rtm.info.tcp.listen.ip-0-0-0-0.port-79.pid","timestamp":1582207693,"value":"4851"}
{"metric":"rtm.info.tcp.listen.ip-0-0-0-0.port-79.username","timestamp":1582207693,"value":"oco"}
{"metric":"rtm.info.tcp.listen.ip-0-0-0-0.port-79.exe","timestamp":1582207693,"value":"/usr/bin/perl"}
{"metric":"rtm.info.tcp.listen.ip-0-0-0-0.port-79.cmdline","timestamp":1582207693,"value":"perl"}
{"metric":"rtm.info.tcp.listen.ip-0-0-0-0.port-79.procname","timestamp":1582207693,"value":"perl"}
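The samples above have one JSON object per line, with `metric`, `timestamp` and `value` keys, and every value is a string. Here is a minimal sketch, assuming exactly that line format, that loads such output into a dictionary and rebuilds the top-memory ranking by pairing each `top_mem_N_name` with its `top_mem_N_size`. The helper names are hypothetical. The probe does not state the size unit, so the number is returned as-is.

```python
import json
import re


def load_metrics(lines):
    """Parse JSON-lines probe output into {metric_name: value} (values stay strings)."""
    metrics = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # samples separate groups with blank lines
        record = json.loads(line)
        metrics[record["metric"]] = record["value"]
    return metrics


TOP_MEM_NAME = re.compile(r"^rtm\.info\.mem\.top_mem_(\d+)_name$")


def top_memory_processes(metrics):
    """Return [(process, size), ...] ordered by rank N of top_mem_N_*."""
    ranked = []
    for name, process in metrics.items():
        match = TOP_MEM_NAME.match(name)
        if match:
            rank = int(match.group(1))
            size = int(metrics.get(f"rtm.info.mem.top_mem_{rank}_size", "0"))
            ranked.append((rank, process, size))
    return [(process, size) for _, process, size in sorted(ranked)]
```

Fed the rtm.info.mem lines above, `top_memory_processes` would start with ("/usr/bin/syncthing", 4833764).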

rtmHardware

This probe runs every 43200 s (12 hours). It collects hardware information such as the motherboard, PCI devices and disk health (S.M.A.R.T. data). It also collects some software information, such as the kernel and BIOS versions.

Data example

{"metric":"rtm.hw.mb.manufacture","timestamp":1582208062,"value":"Supermicro"}
{"metric":"rtm.hw.mb.name","timestamp":1582208062,"value":"X10SRi-F"}
{"metric":"rtm.hw.mb.serial","timestamp":1582208062,"value":"NM175S506822"}

{"metric":"rtm.info.bios_date","timestamp":1582208062,"value":"12/17/2015"}
{"metric":"rtm.info.bios_version","timestamp":1582208062,"value":"2.0"}
{"metric":"rtm.info.bios_vendor","timestamp":1582208062,"value":"American Megatrends Inc."}

{"metric":"rtm.info.release.os","timestamp":1582208062,"value":"Ubuntu 16.04 xenial"}
{"metric":"rtm.info.kernel.version","timestamp":1582208062,"value":"#193-Ubuntu SMP Tue Sep 17 17:42:52 UTC 2019"}
{"metric":"rtm.info.kernel.release","timestamp":1582208062,"value":"4.4.0-165-generic"}

{"metric":"rtm.hw.cpu.name","timestamp":1582208062,"value":"Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz"}
{"metric":"rtm.hw.cpu.number","timestamp":1582208062,"value":"12"}
{"metric":"rtm.hw.cpu.cache","timestamp":1582208062,"value":"15360 KB"}
{"metric":"rtm.hw.cpu.mhz","timestamp":1582208062,"value":"1212.750"}

{"metric":"rtm.info.check.vm","timestamp":1582208062,"value":"False"}
{"metric":"rtm.info.check.oops","timestamp":1582208062,"value":"False"}

{"metric":"rtm.hw.mem.bank-P0-Node0-Channel0-Dimm1-DIMMA2","timestamp":1582208062,"value":"No Module Installed"}
{"metric":"rtm.hw.mem.bank-P0-Node0-Channel1-Dimm1-DIMMB2","timestamp":1582208062,"value":"No Module Installed"}
{"metric":"rtm.hw.mem.bank-P0-Node0-Channel2-Dimm0-DIMMC1","timestamp":1582208062,"value":"16384"}
{"metric":"rtm.hw.mem.bank-P0-Node0-Channel1-Dimm0-DIMMB1","timestamp":1582208062,"value":"16384"}
{"metric":"rtm.hw.mem.bank-P0-Node0-Channel2-Dimm1-DIMMC2","timestamp":1582208062,"value":"No Module Installed"}
{"metric":"rtm.hw.mem.bank-P0-Node0-Channel3-Dimm1-DIMMD2","timestamp":1582208062,"value":"No Module Installed"}
{"metric":"rtm.hw.mem.bank-P0-Node0-Channel3-Dimm0-DIMMD1","timestamp":1582208062,"value":"16384"}
{"metric":"rtm.hw.mem.bank-P0-Node0-Channel0-Dimm0-DIMMA1","timestamp":1582208062,"value":"16384"}

{"metric":"rtm.hw.lspci.pci.ff-15-2","timestamp":1582208062,"value":"8086:6fb6"}
....

{"metric":"rtm.info.hdd.sda.capacity","timestamp":1582208062,"value":"4.00 TB"}
{"metric":"rtm.info.hdd.sda.link_type","timestamp":1582208062,"value":"sata"}
{"metric":"rtm.info.hdd.sda.firmware","timestamp":1582208062,"value":"A5GNT920"}
{"metric":"rtm.info.hdd.sda.dmesg.io.errors","timestamp":1582208062,"value":"0"}
{"metric":"rtm.info.hdd.sda.disk_type","timestamp":1582208062,"value":"hdd"}
{"metric":"rtm.info.hdd.sda.iostat.busy","timestamp":1582208062,"value":"3.34"}
{"metric":"rtm.info.hdd.sda.model","timestamp":1582208062,"value":"HGST HUS726040ALA610"}
{"metric":"rtm.info.hdd.sda.serial","timestamp":1582208062,"value":"K3GDA42B"}
{"metric":"rtm.info.hdd.sda.temperature","timestamp":1582208062,"value":"37"}

{"metric":"rtm.info.hdd.sda.iostat.read.per.sec","timestamp":1582208062,"value":"5.11"}
{"metric":"rtm.info.hdd.sda.iostat.writekb.per.sec","timestamp":1582208062,"value":"108.74"}
{"metric":"rtm.info.hdd.sda.iostat.write.merged.per.sec","timestamp":1582208062,"value":"1.38"}
{"metric":"rtm.info.hdd.sda.iostat.write.avg.wait","timestamp":1582208062,"value":"5.39"}
{"metric":"rtm.info.hdd.sda.iostat.read.avg.wait","timestamp":1582208062,"value":"9.25"}
{"metric":"rtm.info.hdd.sda.iostat.read.merged.per.sec","timestamp":1582208062,"value":"0.01"}
{"metric":"rtm.info.hdd.sda.iostat.readkb.per.sec","timestamp":1582208062,"value":"616.81"}
{"metric":"rtm.info.hdd.sda.iostat.write.per.sec","timestamp":1582208062,"value":"3.19"}

{"metric":"rtm.info.hdd.sda.smart.highest-temperature","timestamp":1582208062,"value":"47"}
{"metric":"rtm.info.hdd.sda.smart.bytes-read","timestamp":1582208062,"value":"21596906627584"}
{"metric":"rtm.info.hdd.sda.smart.udma-crc-error","timestamp":1582208062,"value":"0"}
{"metric":"rtm.info.hdd.sda.smart.link-failures","timestamp":1582208062,"value":"0"}
{"metric":"rtm.info.hdd.sda.smart.temperature","timestamp":1582208062,"value":"37"}
{"metric":"rtm.info.hdd.sda.smart.offline-uncorrectable","timestamp":1582208062,"value":"0"}
{"metric":"rtm.info.hdd.sda.smart.percentage-used","timestamp":1582208062,"value":"0"}
{"metric":"rtm.info.hdd.sda.smart.realocated-event-count","timestamp":1582208062,"value":"0"}
{"metric":"rtm.info.hdd.sda.smart.reallocated-sector-count","timestamp":1582208062,"value":"0"}
{"metric":"rtm.info.hdd.sda.smart.reported-corrected","timestamp":1582208062,"value":"-1"}
{"metric":"rtm.info.hdd.sda.smart.power-cycles","timestamp":1582208062,"value":"23"}
{"metric":"rtm.info.hdd.sda.smart.power-on-hours","timestamp":1582208062,"value":"10521"}
{"metric":"rtm.info.hdd.sda.smart.global-health","timestamp":1582208062,"value":"1"}
{"metric":"rtm.info.hdd.sda.smart.logged-error-count","timestamp":1582208062,"value":"0"}
{"metric":"rtm.info.hdd.sda.smart.current-pending-sector","timestamp":1582208062,"value":"0"}
{"metric":"rtm.info.hdd.sda.smart.reported-uncorrect","timestamp":1582208062,"value":"0"}
{"metric":"rtm.info.hdd.sda.smart.lowest-temperature","timestamp":1582208062,"value":"19"}
{"metric":"rtm.info.hdd.sda.smart.bytes-written","timestamp":1582208062,"value":"10071769343488"}
{"metric":"rtm.info.hdd.sda.smart.time","timestamp":1582208062,"value":"0"}
{"metric":"rtm.info.hdd.sda.smart.command-timeout","timestamp":1582208062,"value":"-1"}
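Here is a minimal sketch of how the S.M.A.R.T. fields above might be screened, assuming the `rtm.info.hdd.<disk>.smart.<counter>` naming shown in the sample. The choice of counters treated as warnings is an assumption of this example, not an RTM rule. A value of `-1` (used in the sample for `reported-corrected` and `command-timeout`) is assumed to mean "not reported", so it is never flagged. The function name is hypothetical.

```python
import json

# Counters this sketch treats as warning signs when > 0 (assumption, not RTM policy).
ERROR_COUNTERS = (
    "reallocated-sector-count",
    "current-pending-sector",
    "offline-uncorrectable",
    "reported-uncorrect",
    "udma-crc-error",
)


def smart_warnings(lines):
    """Return {disk: [counter, ...]} for watched counters with a positive value."""
    warnings = {}
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        parts = record["metric"].split(".")
        # Expected shape: rtm.info.hdd.<disk>.smart.<counter>
        if len(parts) != 6 or parts[:3] != ["rtm", "info", "hdd"] or parts[4] != "smart":
            continue
        disk, counter = parts[3], parts[5]
        # -1 ("not reported") is not > 0, so it is skipped naturally.
        if counter in ERROR_COUNTERS and int(record["value"]) > 0:
            warnings.setdefault(disk, []).append(counter)
    return warnings
```

For the sda sample above, every watched counter is 0, so the result would be an empty dictionary.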

rtmRaidCheck

This probe runs every 3600 s (1 hour). It collects information on RAID health (if available).

Data example

{"metric":"rtm.hw.scsiraid.unit.md3.vol0.capacity","timestamp":1582208405,"value":"24.4 GB"}
{"metric":"rtm.hw.scsiraid.unit.md3.vol0.phys","timestamp":1582208405,"value":"3"}
{"metric":"rtm.hw.scsiraid.unit.md3.vol0.type","timestamp":1582208405,"value":"raid1"}
{"metric":"rtm.hw.scsiraid.unit.md3.vol0.status","timestamp":1582208405,"value":"active"}
{"metric":"rtm.hw.scsiraid.unit.md3.vol0.flags","timestamp":1582208405,"value":"clean"}

{"metric":"rtm.hw.scsiraid.port.md3.vol0.sda3.capacity","timestamp":1582208405,"value":"24.4 GB"}
{"metric":"rtm.hw.scsiraid.port.md3.vol0.sda3.status","timestamp":1582208405,"value":"active"}
{"metric":"rtm.hw.scsiraid.port.md3.vol0.sda3.flags","timestamp":1582208405,"value":"sync"}
{"metric":"rtm.hw.scsiraid.port.md3.vol0.sdb3.capacity","timestamp":1582208405,"value":"24.4 GB"}
{"metric":"rtm.hw.scsiraid.port.md3.vol0.sdb3.status","timestamp":1582208405,"value":"active"}
{"metric":"rtm.hw.scsiraid.port.md3.vol0.sdb3.flags","timestamp":1582208405,"value":"sync"}
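For software (md) RAID, a rough health check can compare each status and flag against the healthy values in this sample: `active` for units and members, and `sync` for members. This is a sketch under that assumption only. Hardware RAID controllers may report other strings, and the function name `degraded_raid` is hypothetical.

```python
import json


def degraded_raid(lines):
    """Return human-readable problems for md units/members that look unhealthy.

    Healthy values ('active', 'sync') are taken from the sample output (assumption).
    """
    problems = []
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        parts = record["metric"].split(".")
        value = record["value"]
        # rtm.hw.scsiraid.unit.<md>.<vol>.<field>
        if parts[:4] == ["rtm", "hw", "scsiraid", "unit"] and len(parts) == 7:
            if parts[6] == "status" and value != "active":
                problems.append(f"{parts[4]}/{parts[5]} status={value}")
        # rtm.hw.scsiraid.port.<md>.<vol>.<member>.<field>
        elif parts[:4] == ["rtm", "hw", "scsiraid", "port"] and len(parts) == 8:
            field = parts[7]
            if (field == "status" and value != "active") or (field == "flags" and value != "sync"):
                problems.append(f"{parts[4]}/{parts[5]}/{parts[6]} {field}={value}")
    return problems
```

Run on the md3 sample above, it would return an empty list.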

nvidia_smi_stats

Only available on servers with NVIDIA cards. Collects information and metrics on the NVIDIA hardware.

*Requires the NVIDIA driver and the nvidia-smi utility to be installed.

Contributing

Instructions on how to contribute to OVH RTM are available on the Contributing page.

rtm's People

Contributors

bigbigbang

rtm's Issues

Remove OVH RTM (beamium, noderig…) properly, how to fully uninstall the Real Time Monitoring ?

Hello,

The OVH Real Time Monitoring is no longer officially compatible with CloudLinux (according to OVHcloud support), and beamium and noderig seem to be discontinued (according to GitHub), so I wanted to remove these packages properly.

I've already done:

yum remove ovh-rtm-binaries ovh-rtm-metrics-toolkit beamium noderig

/usr/bin/rtmHourly
/usr/bin/rtmRaidCheck
/usr/bin/rtmHardware

seem to have been removed properly.

But I wonder whether that's all I need to do. I'm worried that some cron jobs (@reboot or similar) are still on my system.

How do I remove all the OVH repositories added by RTM (/etc/yum.repos.d/ovh-rtm.repo, Metrics, and so on)?

yum repolist

rm -f /etc/yum.repos.d/OVH-metrics.repo ; rm -f /etc/yum.repos.d/ovh-rtm.repo ; rm -f /etc/yum.repos.d/OVH-rtm.repo ;

Is that necessary ?

The official OVH documentation does not explain how to remove everything related to Real Time Monitoring.

Am I missing something?

If anyone passing by wants to comment on this, you're welcome ;-)
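One way to approach the repository part of this question, offered only as a read-only sketch and not as an official procedure, is to list the files in /etc/yum.repos.d that still point at the RTM or metrics mirrors, then review and remove them by hand. The mirror hostnames are the ones quoted in the XCP-ng issue on this page. Installations that used other URLs would not be found, and the function name is hypothetical.

```python
from pathlib import Path

# Hostnames taken from the repo file quoted in the XCP-ng issue (assumption:
# other RTM installs may have used different mirror URLs).
RTM_MIRRORS = (
    "last.public.ovh.rtm.snap.mirrors.ovh.net",
    "last.public.ovh.metrics.snap.mirrors.ovh.net",
)


def find_rtm_repo_files(repo_dir="/etc/yum.repos.d"):
    """Return .repo files that mention an RTM/metrics mirror. Deletes nothing."""
    found = []
    directory = Path(repo_dir)
    if not directory.is_dir():
        return found
    for path in sorted(directory.glob("*.repo")):
        text = path.read_text(errors="replace")
        if any(host in text for host in RTM_MIRRORS):
            found.append(path)
    return found


if __name__ == "__main__":
    for path in find_rtm_repo_files():
        print(path)
```

Printing the paths instead of deleting them leaves the final decision (and any `yum clean all` afterwards) to the administrator.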

Supporting install of RTM on XCP-ng 8 series

I managed to install RTM on XCP-ng 8.2.0, but it required some hacking on the dependencies of the ovh-rtm-binaries package: I removed two dependencies, lsscsi and redhat-lsb.

analysis:

After the successful install, I researched the dependency list a bit and found that:

  • lsscsi is available in the CentOS 7 base repo, and --enablerepo=base can be used to satisfy the dependency
  • the bigger issue was the redhat-lsb package, which recursively depends on packages that would install desktop software on XCP-ng
  • looking at the rtmHardware.pl script, I found that the required files in /etc were already installed on XCP-ng; digging a bit further, I found out that they come from the redhat-lsb-core package

conclusion:

It is easy to support the installation of RTM on XCP-ng 8 series:

  1. in the package ovh-rtm-binaries change the dependency redhat-lsb => redhat-lsb-core

  2. some minor change in the docs:

    1. the /etc/yum.repos.d/ovh-rtm.repo file should not enable the repos by default:
    [rtm]
    name=OVH RTM RHEL/ CentOS $releasever - $basearch
    baseurl=http://last.public.ovh.rtm.snap.mirrors.ovh.net/centos/$releasever/$basearch/Packages/
    enabled=0
    repo_gpgcheck=1
    gpgcheck=0
    gpgkey=http://last.public.ovh.rtm.snap.mirrors.ovh.net/ovh_rtm.pub
    
    [metrics]
    name=OVH METRICS RHEL/ CentOS $releasever - $basearch
    baseurl=http://last.public.ovh.metrics.snap.mirrors.ovh.net/centos/$releasever/$basearch/Packages/
    enabled=0
    repo_gpgcheck=1
    gpgcheck=0
    gpgkey=http://last.public.ovh.metrics.snap.mirrors.ovh.net/pub.key
    
    2. the command to install the RTM packages should be:
    yum --enablerepo=base,metrics,rtm install ovh-rtm-metrics-toolkit
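To check that an edited repo file really leaves both repositories disabled, a small sketch with Python's standard `configparser` can read the `enabled` key of each section. It assumes the file content is saved without the list indentation shown here, because indented lines would be read as continuations. A missing `enabled` key is reported as enabled, matching yum's default. The function name is hypothetical.

```python
import configparser


def repo_enabled_flags(repo_text):
    """Return {repo_id: bool} for the `enabled` key of each repo section.

    A missing key is reported as True, mirroring yum's default of enabled=1.
    """
    # interpolation=None keeps any '%' characters in URLs from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    return {
        section: parser.getboolean(section, "enabled", fallback=True)
        for section in parser.sections()
    }
```

With the proposed file, both `rtm` and `metrics` should map to False, so yum only uses them when they are named in `--enablerepo`.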
    

Update documentation and provide an alternative

Hi, I just followed the tutorial to install RTM on my dedicated server, only to find out it was deprecated.

PLEASE:

  • Update the documentation to explain the deprecation; nobody wants to find out after installing that they did something obsolete
  • Provide an alternative: how can I monitor my CPU/RAM usage on a dedicated server? Honestly I'm surprised it's not integrated by default.

Thanks

RTM Support for Debian 11 / Proxmox 7

I'd like to see RTM support for Debian 11 / Proxmox 7.
I used the automatic install after upgrading Proxmox 6 to 7, and it isn't working.
Thanks
