lausser / check_nwc_health

nwc = network component. This plugin checks lots of aspects of routers, switches, wlan controllers, firewalls,.....

Home Page: http://labs.consol.de/nagios/check_nwc_health

License: GNU General Public License v2.0

PHP 0.24% Shell 1.94% Perl 94.04% Awk 0.13% Makefile 3.22% M4 0.43%

nagios icinga

check_nwc_health's Introduction

My home page says:

I will ignore all of your mails unless you give me an ip address and a community/root account. No joke! No exception!

Folks, I mean this quite seriously. The free beer has run out! And even if I do take note of an email, that still does not mean that I will do your work for free.

Description

The plugin check_nwc_health was developed with the aim of having a single tool for all aspects of monitoring of network components.

Motivation

Instead of installing a variety of plugins for monitoring interfaces, hardware, bandwidth, sessions, pools, etc., and possibly more than one for each brand, with check_nwc_health you need only a single plugin.

Documentation

Command line parameters

--hostname
--community
--mode
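
As a minimal sketch of how these three parameters fit together, the following Python helper assembles the argument list for one plugin call. The helper name `build_check_command`, the community string `public`, and the assumption that the plugin is on the PATH are illustrative; only `--hostname`, `--community`, and `--mode` come from the list above.

```python
import shlex

def build_check_command(hostname, community, mode, plugin="check_nwc_health"):
    """Return the argv list for one plugin call (hypothetical helper)."""
    return [plugin, "--hostname", hostname, "--community", community, "--mode", mode]

# Example values; "public" is a placeholder community string.
argv = build_check_command("10.0.12.114", "public", "hardware-health")
print(shlex.join(argv))
```

The list form can be handed to `subprocess.run` directly, which avoids shell quoting problems with unusual community strings.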

Modes

Mode	Function
hardware-health	Check the status of environmental equipment (fans, temperatures, power)
cpu-load	Check the CPU load of the device
memory-usage	Check the memory usage of the device
interface-usage	Check the utilization of interfaces
interface-errors	Check the error-rate of interfaces
interface-discards	Check the discard-rate of interfaces
interface-status	Check the status of interfaces (oper/admin)
interface-nat-count-sessions	Count the number of nat sessions
interface-nat-rejects	Count the number of nat sessions rejected due to lack of resources
list-interfaces	Show the interfaces of the device and update the name cache
list-interfaces-detail	Show the interfaces of the device and some details
interface-availability	Show the availability (oper != up) of interfaces
link-aggregation-availability	Check the percentage of up interfaces in a link aggregation
list-routes	Show the routes known to the device
route-exists	Check if a route exists. (--name is the dest, --name2 check also the next hop)
count-routes	Count the routes. (--name is the dest, --name2 is the hop)
vpn-status	Check the status of vpns (up/down)
create-shinken-service	Create a Shinken service definition
hsrp-state	Check the state in a HSRP group
hsrp-failover	Check if a HSRP group's nodes have changed their roles
list-hsrp-groups	Show the HSRP groups configured on this device
bgp-peer-status	Check status of BGP peers
list-bgp-peers	Show BGP peers known to this device
ospf-neighbor-status	Check status of OSPF neighbors
list-ospf-neighbors	Show OSPF neighbors
ha-role	Check the role in a ha group
svn-status	Check the status of the svn subsystem
mngmt-status	Check the status of the management subsystem
fw-policy	Check the installed firewall policy
fw-connections	Check the number of firewall policy connections
session-usage	Check the session limits of a load balancer
security-status	Check if there are security-relevant incidents
pool-completeness	Check the members of a load balancer pool
pool-connections	Check the number of connections of a load balancer pool
pool-complections	Check the members and connections of a load balancer pool
list-pools	List load balancer pools
check-licenses	Check the installed licences/keys
count-users	Count the (connected) users/sessions
check-config	Check the status of configs (cisco, unsaved config changes)
check-connections	Check the quality of connections
count-connections	Check the number of connections (-client, -server is possible)
watch-fexes	Check if FEXes appear and disappear (use --lookback)
accesspoint-status	Check the status of access points
count-accesspoints	Check if the number of access points is within a certain range
watch-accesspoints	Check if access points appear and disappear (use --lookback)
list-accesspoints	List access points managed by this device
phone-cm-status	Check if the callmanager is up
phone-status	Check the number of registered/unregistered/rejected phones
list-smart-home-devices	List Fritz!DECT 200 plugs managed by this device
smart-home-device-status	Check if a Fritz!DECT 200 plug is on
smart-home-device-energy	Show the current power consumption of a Fritz!DECT 200 plug
walk	Show snmpwalk command with the oids necessary for a simulation
supportedmibs	Shows the names of the mibs which this device has implemented (only lausser may run this command)

The list is not complete. Some devices that are not listed here can possibly be monitored because they implement the same MIBs as supported models. Just try it. (If a device is not recognized, I can extend the plugin. But not for free.)

Installation

git clone
cd check_nwc_health
git submodule update --init
autoreconf
./configure
make
cp plugins-scripts/check_nwc_health wherever...

Examples

# Hardware checks

$ check_nwc_health --hostname 10.0.12.114 --mode hardware-health --community cisco/asa5510
OK - no alarms

$ check_nwc_health --hostname 10.0.12.114 --mode hardware-health --community cisco/c2900
OK - no alarms, environmental hardware working fine

$ check_nwc_health --hostname 10.0.12.114 --mode hardware-health --community cisco/3500xl
OK - no alarms, environmental hardware working fine

$ check_nwc_health --hostname 10.0.12.114 --mode hardware-health --community cisco/3750e
OK - environmental hardware working fine | 'temp_1006'=27;60;;; 'temp_2006'=26;60;;; 'temp_3006'=26;60;;;
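
The part after the `|` in the output above is performance data in the usual `'label'=value;warn;crit;min;max` layout. As a rough sketch (the function name `parse_perfdata` is made up for illustration, and only quoted labels as shown in these examples are handled), it can be split into fields like this:

```python
import re

# One item looks like 'label'=value;warn;crit;min;max (trailing fields may be empty).
ITEM = re.compile(r"'([^']+)'=(\S+)")

def parse_perfdata(line):
    """Parse the part after '|' into {label: {value, warn, crit, min, max}}."""
    _, _, perf = line.partition("|")
    result = {}
    for label, fields in ITEM.findall(perf):
        parts = (fields.split(";") + [""] * 5)[:5]
        result[label] = dict(zip(("value", "warn", "crit", "min", "max"), parts))
    return result

line = ("OK - environmental hardware working fine | "
        "'temp_1006'=27;60;;; 'temp_2006'=26;60;;; 'temp_3006'=26;60;;;")
data = parse_perfdata(line)
print(data["temp_1006"])
```

Here each temperature sensor carries a warning threshold of 60 and no critical threshold.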

$ check_nwc_health --hostname 10.0.12.114 --mode hardware-health --community cisco/3750e --verbose
I am a Cisco IOS Software, C3750E Software (C3750E-UNIVERSALK9-M), Version 12.2(58)SE2, RELEASE SOFTWARE (fc1) Technical Support: http://www.cisco.com/techsupport Copyright (c) 1986-2011 by Cisco Systems, Inc. Compiled Thu 21-Jul-11 01:23 by prod_rel_team
OK - environmental hardware working fine
checking fans
fan 1059 (Switch#1, Fan#1) is normal
fan 1060 (Switch#1, Fan#2) is normal
fan 2060 (Switch#2, Fan#1) is normal
fan 2061 (Switch#2, Fan#2) is normal
fan 3036 (Switch#3, Fan#1) is normal
fan 3037 (Switch#3, Fan#2) is normal
checking temperatures
temperature 1006 SW#1, Sensor#1, GREEN  is 27 (of 60 max = normal)
temperature 2006 SW#2, Sensor#1, GREEN  is 26 (of 60 max = normal)
temperature 3006 SW#3, Sensor#1, GREEN  is 26 (of 60 max = normal)
checking voltages
checking supplies
powersupply 1058 (Sw1, PS1 Normal, RPS NotExist) is normal
powersupply 1062 (Sw1, PS2 Normal, RPS NotExist) is normal
powersupply 2058 (Sw2, PS1 Normal, RPS NotExist) is normal
powersupply 2059 (Sw2, PS2 Normal, RPS NotExist) is normal
powersupply 3034 (Sw3, PS1 Normal, RPS NotExist) is normal
powersupply 3035 (Sw3, PS2 Normal, RPS NotExist) is normal | 'temp_1006'=27;60;;; 'temp_2006'=26;60;;; 'temp_3006'=26;60;;;


$ check_nwc_health --hostname 10.0.12.114 --mode hardware-health --community cisco/n5000
OK - environmental hardware working fine | 'sens_celsius_100021590'=44;45;57;; 'sens_celsius_101021590'=41;45;57;; 'sens_celsius_21590'=44;50;60;; 'sens_celsius_21591'=46;50;60;; 'sens_celsius_21592'=33;50;60;; 'sens_celsius_21593'=34;50;60;; 'sens_celsius_21594'=34;40;50;; 'sens_celsius_21595'=33;40;50;; 'sens_celsius_21596'=33;50;60;; 'sens_celsius_21597'=31;50;60;; 'sens_celsius_21602'=38;50;60;;

$ check_nwc_health --hostname 10.0.12.114 --mode hardware-health --community cisco/n5000 --verbose
I am a Cisco NX-OS(tm) n5000, Software (n5000-uk9), Version 4.2(1)N1(1), RELEASE SOFTWARE Copyright (c) 2002-2010 by Cisco Systems, Inc. Device Manager Version 5.0(1a),  Compiled 4/29/2010 19:00:00
OK - environmental hardware working fine
checking thresholds
checking entities
checking sensor_entities
checking sensors
celsius sensor 100021590 (Fex-100 Module-1 Outlet-1) is ok
celsius sensor 100021591 (Fex-100 Module-1 Outlet-2) is unknown_10
celsius sensor 100021592 (Fex-100 Module-1 Inlet-1) is unknown_10
celsius sensor 101021590 (Fex-101 Module-1 Outlet-1) is ok
celsius sensor 101021591 (Fex-101 Module-1 Outlet-2) is unknown_10
celsius sensor 101021592 (Fex-101 Module-1 Inlet-1) is unknown_10
celsius sensor 21590 (Module-1, Outlet-1) is ok
celsius sensor 21591 (Module-1, Outlet-2) is ok
celsius sensor 21592 (Module-1, Intake-1) is ok
celsius sensor 21593 (Module-1, Intake-2) is ok
celsius sensor 21594 (Module-1, Intake-3) is ok
celsius sensor 21595 (Module-1, Intake-4) is ok
celsius sensor 21596 (PowerSupply-1 Sensor-1) is ok
celsius sensor 21597 (PowerSupply-2 Sensor-1) is ok
celsius sensor 21602 (Module-2, Outlet-1) is ok
checking fans
fan/tray 100000534 (Fex-100 FanModule-1) status is up
fan/tray 100000539 (Fex-100 PowerSupply-1 Fan-1) status is up
fan/tray 100000540 (Fex-100 PowerSupply-2 Fan-1) status is up
fan/tray 101000534 (Fex-101 FanModule-1) status is up
fan/tray 101000539 (Fex-101 PowerSupply-1 Fan-1) status is up
fan/tray 101000540 (Fex-101 PowerSupply-2 Fan-1) status is up
fan/tray 534 (FanModule-1) status is up
fan/tray 535 (FanModule-2) status is up
fan/tray 536 (PowerSupply-1 Fan-1) status is up
fan/tray 537 (PowerSupply-1 Fan-2) status is up
fan/tray 538 (PowerSupply-2 Fan-1) status is up
fan/tray 539 (PowerSupply-2 Fan-2) status is up
checking entities
checking powersupplygroups
checking supplies
checking entities
checking powersupplies
power supply 100000022 admin status is on, oper status is on
power supply 100000470 admin status is on, oper status is on
power supply 100000471 admin status is on, oper status is on
power supply 101000022 admin status is on, oper status is on
power supply 101000470 admin status is on, oper status is on
power supply 101000471 admin status is on, oper status is on
power supply 22 admin status is on, oper status is on
power supply 23 admin status is on, oper status is on
power supply 470 admin status is on, oper status is on
power supply 471 admin status is on, oper status is on | 'sens_celsius_100021590'=44;45;57;; 'sens_celsius_101021590'=41;45;57;; 'sens_celsius_21590'=44;50;60;; 'sens_celsius_21591'=46;50;60;; 'sens_celsius_21592'=33;50;60;; 'sens_celsius_21593'=34;50;60;; 'sens_celsius_21594'=34;40;50;; 'sens_celsius_21595'=33;40;50;; 'sens_celsius_21596'=33;50;60;; 'sens_celsius_21597'=31;50;60;; 'sens_celsius_21602'=38;50;60;;


$ check_nwc_health --hostname 10.0.12.114 --mode hardware-health --community cisco/ucos
OK - storage 10 (/partB) has 71.07% free space left, storage 11 (/spare) has 99.88% free space left, storage 3 (/) has 70.71% free space left, storage 7 (/common) has 69.07% free space left, storage 9 (/grub) has 95.87% free space left, environmental hardware working fine | '/partB_free_pct'=71.07%;10:;5:;0;100 '/spare_free_pct'=99.88%;10:;5:;0;100 '/_free_pct'=70.71%;10:;5:;0;100 '/common_free_pct'=69.07%;10:;5:;0;100 '/grub_free_pct'=95.87%;10:;5:;0;100

$ check_nwc_health --hostname 10.0.12.114 --mode hardware-health --community cisco/ucos --verbose
I am a Hardware:VMware, 2  Intel(R) Xeon(R) CPU           E5640  @ 2.67GHz, 4096 MB Memory: Software:UCOS 4.0.0.0-45
OK - storage 10 (/partB) has 71.07% free space left, storage 11 (/spare) has 99.88% free space left, storage 3 (/) has 70.71% free space left, storage 7 (/common) has 69.07% free space left, storage 9 (/grub) has 95.87% free space left, environmental hardware working fine
checking storages
storage 10 (/partB) has 71.07% free space left
storage 11 (/spare) has 99.88% free space left
storage 3 (/) has 70.71% free space left
storage 7 (/common) has 69.07% free space left
storage 9 (/grub) has 95.87% free space left | '/partB_free_pct'=71.07%;10:;5:;0;100 '/spare_free_pct'=99.88%;10:;5:;0;100 '/_free_pct'=70.71%;10:;5:;0;100 '/common_free_pct'=69.07%;10:;5:;0;100 '/grub_free_pct'=95.87%;10:;5:;0;100


$ check_nwc_health --hostname 10.0.12.114 --mode hardware-health --community cisco/cisco-3745-switch
OK - environmental hardware working fine | 'temp_1005'=44;65;;;

$ check_nwc_health --hostname 10.0.12.114 --mode hardware-health --community cisco/cisco-3745-switch --verbose
I am a Cisco IOS Software, 3700 Software (C3745-ADVENTERPRISEK9-M), Version 12.4(25b), RELEASE SOFTWARE (fc1)
OK - environmental hardware working fine
checking fans
fan 1004 (Switch#1, Fan#1) is normal
checking temperatures
temperature 1005 SW#1, Sensor#1, GREEN  is 44 (of 65 max = normal)
checking voltages
checking supplies
powersupply 1003 (Sw1, PS1 Normal, RPS NotExist) is normal | 'temp_1005'=44;65;;;


$ check_nwc_health --hostname 10.0.12.114 --mode hardware-health --community cisco/cisco-cat6509
OK - environmental hardware working fine | 'temp_100050'=35;65;;; 'temp_100051'=36;65;;; 'temp_40010'=31;115;;; 'temp_40020'=34;115;;; 'temp_40030'=30;115;;; 'temp_60010'=45;85;;; 'temp_60011'=34;65;;; 'temp_60020'=30;95;;; 'temp_60021'=30;70;;; 'temp_60030'=35;100;;; 'temp_60031'=29;70;;; 'temp_60050'=43;85;;; 'temp_60051'=29;80;;; 'temp_60054'=62;105;;; 'temp_60055'=45;110;;; 'temp_60056'=57;110;;; 'temp_90010'=38;80;;; 'temp_90011'=33;75;;; 'temp_90050'=38;75;;; 'temp_90051'=31;65;;;

$ check_nwc_health --hostname 10.0.12.114 --mode hardware-health --community cisco/cisco-cat6509 --verbose
I am a Cisco IOS Software, s72033_rp Software (s72033_rp-ADVIPSERVICES), Version 12.2(33)SXJ, RELEASE SOFTWARE (fc3) Technical Support: http://www.cisco.com/techsupport Copyright (c) 1986-2011 by Cisco Systems, Inc. Compiled Thu 17-Mar-11 15:10 by pro
OK - environmental hardware working fine
checking fans
fan 1 ( Chassis Fan Tray 1) is normal
fan 2 ( Power Supply 1 Fan) is normal
fan 3 ( Power Supply 2 Fan) is normal
checking temperatures
temperature 100050 module 5 RP outlet temperature is 35 (of 65 max = normal)
temperature 100051 module 5 RP inlet temperature is 36 (of 65 max = normal)
temperature 40010 VTT 1 outlet temperature is 31 (of 115 max = normal)
temperature 40020 VTT 2 outlet temperature is 34 (of 115 max = normal)
temperature 40030 VTT 3 outlet temperature is 30 (of 115 max = normal)
temperature 60010 module 1 outlet temperature is 45 (of 85 max = normal)
temperature 60011 module 1 inlet temperature is 34 (of 65 max = normal)
temperature 60020 module 2 outlet temperature is 30 (of 95 max = normal)
temperature 60021 module 2 inlet temperature is 30 (of 70 max = normal)
temperature 60030 module 3 outlet temperature is 35 (of 100 max = normal)
temperature 60031 module 3 inlet temperature is 29 (of 70 max = normal)
temperature 60050 module 5 outlet temperature is 43 (of 85 max = normal)
temperature 60051 module 5 inlet temperature is 29 (of 80 max = normal)
temperature 60054 module 5 asic-1 temperature is 62 (of 105 max = normal)
temperature 60055 module 5 asic-3 temperature is 45 (of 110 max = normal)
temperature 60056 module 5 asic-4 temperature is 57 (of 110 max = normal)
temperature 90010 module 1 EARL outlet temperature is 38 (of 80 max = normal)
temperature 90011 module 1 EARL inlet temperature is 33 (of 75 max = normal)
temperature 90050 module 5 EARL outlet temperature is 38 (of 75 max = normal)
temperature 90051 module 5 EARL inlet temperature is 31 (of 65 max = normal)
checking voltages
checking supplies
powersupply 1 ( Power Supply 1, WS-CAC-3000W) is normal
powersupply 2 ( Power Supply 2, WS-CAC-3000W) is normal | 'temp_100050'=35;65;;; 'temp_100051'=36;65;;; 'temp_40010'=31;115;;; 'temp_40020'=34;115;;; 'temp_40030'=30;115;;; 'temp_60010'=45;85;;; 'temp_60011'=34;65;;; 'temp_60020'=30;95;;; 'temp_60021'=30;70;;; 'temp_60030'=35;100;;; 'temp_60031'=29;70;;; 'temp_60050'=43;85;;; 'temp_60051'=29;80;;; 'temp_60054'=62;105;;; 'temp_60055'=45;110;;; 'temp_60056'=57;110;;; 'temp_90010'=38;80;;; 'temp_90011'=33;75;;; 'temp_90050'=38;75;;; 'temp_90051'=31;65;;;

$ check_nwc_health --hostname 10.0.12.114 --mode chassis-hardware-health --community cisco/cisco-cat6509 --verbose
I am a Cisco IOS Software, s72033_rp Software (s72033_rp-ADVIPSERVICESK9_WAN-M), Version 12.2(33)SXJ, RELEASE SOFTWARE (fc3) Technical Support: http://www.cisco.com/techsupport Copyright (c) 1986-2011 by Cisco Systems, Inc. Compiled Thu 17-Mar-11 15:10 by pro
WARNING - 4 new module(s) (SAL1536ZWYC, SAL1538HZF9, SAL1521S03A, SAL1528D536), 85 new ports, chassis sys status is ok, power supply 1 status is ok, power supply 2 status is ok, found 4 modules with 85 ports
module 1 (serial SAL1536ZWYC) is ok
module 2 (serial SAL1538HZF9) is ok
module 3 (serial SAL1521S03A) is ok
module 5 (serial SAL1528D536) is ok
chassis sys status is ok
chassis fan status is ok
chassis minor alarm is off
chassis major alarm is off
chassis temperature alarm is off
power supply 1 status is ok
power supply 2 status is ok
found 4 modules with 85 ports

$ check_nwc_health --hostname 10.0.12.114 --mode hardware-health --community foundry/ironware
OK - environmental hardware working fine

$ check_nwc_health --hostname 10.0.12.114 --mode hardware-health --community foundry/ironware --verbose
I am a Foundry Networks, Inc. Router, IronWare Version 10.2.01lTI4 Compiled on Oct 28 2009 at 16:46:07 labeled as WJR10201l
OK - environmental hardware working fine
checking powersupplies
powersupply 1 is normal
checking fans
fan 1 is normal
fan 2 is normal
checking temperatures


$ check_nwc_health --hostname 10.0.12.114 --mode hardware-health --community f5/f5app
CRITICAL - chassis fan 1 is unknown_3 (9642rpm), chassis fan 2 is unknown_3 (10227rpm), chassis fan 3 is unknown_3 (9642rpm) | 'temp_c1'=38;;;; 'fan_c1'=9926;;;; 'fan_1'=9642;;;; 'fan_2'=10227;;;; 'fan_3'=9642;;;; 'temp_1'=25;;;;
$ check_nwc_health --hostname 10.0.12.114 --mode hardware-health --community f5/f5app --verbose
I am a Linux lb02 2.6.1-1.1.el5.1.0.f5app #1 SMP Mon Mar 5 12:40:48 PST 2012 x86_64
CRITICAL - chassis fan 1 is unknown_3 (9642rpm), chassis fan 2 is unknown_3 (10227rpm), chassis fan 3 is unknown_3 (9642rpm)
checking cpus
cpu 1 has 38C (9926rpm)
checking fans
chassis fan 1 is unknown_3 (9642rpm)
chassis fan 2 is unknown_3 (10227rpm)
chassis fan 3 is unknown_3 (9642rpm)
checking temperatures
chassis temperature 1 is 25C
checking powersupplies
chassis powersupply 1 is good
chassis powersupply 2 is notpresent
checking disks | 'temp_c1'=38;;;; 'fan_c1'=9926;;;; 'fan_1'=9642;;;; 'fan_2'=10227;;;; 'fan_3'=9642;;;; 'temp_1'=25;;;;


$ check_nwc_health --hostname 10.0.12.114 --mode hardware-health --community bluecoat/bluecoat-proxy-sg
OK - disk 0 usage is 35.00%, environmental hardware working fine | 'sensor_Motherboard temperature 1'=18.70;;;; 'sensor_+12V bus voltage'=12.13;;;; 'sensor_CPU core voltage'=1.10;;;; 'sensor_CPU +1.8V bus voltage'=1.81;;;; 'sensor_Motherboard temperature 2'=20.50;;;; 'sensor_CPU temperature'=28;;;; 'sensor_System Fan 1 speed'=8280;;;; 'sensor_System Fan 2 speed'=8400;;;; 'sensor_System Fan 3 speed'=9764.80;;;; 'sensor_System Fan 4 speed'=8460;;;; 'sensor_+2.5V bus voltage'=2.51;;;; 'sensor_+5V bus voltage'=5.07;;;; 'disk_0_usage'=35%;60;60;0;100
$ check_nwc_health --hostname 10.0.12.114 --mode hardware-health --community bluecoat/bluecoat-proxy-sg --verbose
I am a Blue Coat ProxySG600
OK - disk 0 usage is 35.00%, environmental hardware working fine
sensor Motherboard temperature 1 (18.7 celsius) is ok
sensor +12V bus voltage (12.13 volts) is ok
sensor CPU core voltage (1.1 volts) is ok
sensor CPU +1.8V bus voltage (1.81 volts) is ok
sensor Motherboard temperature 2 (20.5 celsius) is ok
sensor CPU temperature (28 celsius) is ok
sensor System Fan 1 speed (8280 rpm) is ok
sensor System Fan 2 speed (8400 rpm) is ok
sensor System Fan 3 speed (9764.8 rpm) is ok
sensor System Fan 4 speed (8460 rpm) is ok
sensor +2.5V bus voltage (2.51 volts) is ok
sensor +5V bus voltage (5.07 volts) is ok
checking disks
disk 1 (SEAGATE 0002) is present
disk 2 ( ) is not-present
checking filesystems
disk 0 usage is 35.00% | 'sensor_Motherboard temperature 1'=18.70;;;; 'sensor_+12V bus voltage'=12.13;;;; 'sensor_CPU core voltage'=1.10;;;; 'sensor_CPU +1.8V bus voltage'=1.81;;;; 'sensor_Motherboard temperature 2'=20.50;;;; 'sensor_CPU temperature'=28;;;; 'sensor_System Fan 1 speed'=8280;;;; 'sensor_System Fan 2 speed'=8400;;;; 'sensor_System Fan 3 speed'=9764.80;;;; 'sensor_System Fan 4 speed'=8460;;;; 'sensor_+2.5V bus voltage'=2.51;;;; 'sensor_+5V bus voltage'=5.07;;;; 'disk_0_usage'=35%;60;60;0;100


# CPU checks

$ check_nwc_health --hostname 10.0.12.114 --mode cpu-load --community bluecoat/bluecoat-proxy-sg --verbose
I am a Blue Coat ProxySG600
OK - cpu 1 usage is 18.00%
checking cpus
cpu 1 usage is 18.00% | 'cpu_1_usage'=18%;80;90;0;100

$ check_nwc_health --hostname 10.0.12.114 --mode cpu-load --community f5/f5app --verbose
I am a Linux lb02 2.6.1-1.1.el5.1.0.f5app #1 SMP Mon Mar 5 12:40:48 PST 2012 x86_64
OK - tmm cpu usage is 1.24%
checking cpus
tmm cpu usage is 1.24% | 'cpu_tmm_usage'=1.24%;80;90;0;100

$ check_nwc_health --hostname 10.0.12.114 --mode cpu-load --community foundry/ironware --verbose
I am a Foundry Networks, Inc. Router, IronWare Version 10.2.01lTI4 Compiled on Oct 28 2009 at 16:46:07 labeled as WHEJR64WH
OK - cpu 1 usage is 6.00, cpu 1 usage is 1.90
cpu 1 usage is 6.00
cpu 1 usage is 1.90 | 'cpu_1'=6%;80;90;0;100 'cpu_1'=1.90%;80;90;0;100


# Memory

$ check_nwc_health --hostname 10.0.12.114 --mode memory-usage --community foundry/ironware --verbose
I am a Foundry Networks, Inc. Router, IronWare Version 10.2.01lTI4 Compiled on Oct 28 2009 at 16:46:07 labeled as WHEJR64WH
OK - memory usage is 23.00%
checking memory
memory usage is 23.00% | 'memory_usage'=23%;80;99;0;100

$ check_nwc_health --hostname 10.0.12.114 --mode memory-usage --community bluecoat/bluecoat-proxy-sg --verbose
I am a Blue Coat ProxySG600
OK - memory usage is 17.00%
checking memory
memory usage is 17.00% | 'memory_usage'=17%;75;90;0;100

$ check_nwc_health --hostname 10.0.12.114 --mode memory-usage --community cisco/n5000 --verbose
I am a Cisco NX-OS(tm) n5000, Software (n5000-uk9), Version 4.2(1)N1(1), RELEASE SOFTWARE Copyright (c) 2002-2010 by Cisco Systems, Inc. Device Manager Version 5.0(1a),  Compiled 4/29/2010 19:00:00
OK - memory usage is 53.00%
checking memory
memory usage is 53.00% | 'memory_usage'=53%;80;90;0;100

$ check_nwc_health --hostname 10.0.12.114 --mode memory-usage --community cisco/asa5510 --verbose
I am a Cisco Adaptive Security Appliance Version 9.1(5)
WARNING - mempool MEMPOOL_DMA usage is 80.68%, mempool System memory usage is 29.78%, mempool MEMPOOL_GLOBAL_SHARED usage is 12.73%
checking mems
mempool System memory usage is 29.78%
mempool MEMPOOL_DMA usage is 80.68%
mempool MEMPOOL_GLOBAL_SHARED usage is 12.73% | 'System memory_usage'=29.78%;80;90;0;100 'MEMPOOL_DMA_usage'=80.68%;80;90;0;100 'MEMPOOL_GLOBAL_SHARED_usage'=12.73%;80;90;0;100


# Interfaces

$ check_nwc_health --hostname 10.0.12.114 --mode interface-usage --community checkpoint/fw-1 --verbose
I am a Linux m-nm09 2.6.18-92cpx86_64 #1 SMP Tue Aug 14 06:41:50 IDT 2012 x86_64
OK - interface lo usage is in:0.25% (24796.45Bits/s) out:0.25% (24796.45Bits/s), interface eth0 usage is in:0.00% (849.78Bits/s) out:0.00% (349.95Bits/s), interface eth1 usage is in:0.00% (1103.22Bits/s) out:0.00% (466.08Bits/s), interface eth2 usage is in:0.00% (0.00Bits/s) out:0.00% (0.00Bits/s) (down), interface eth3 usage is in:0.00% (0.00Bits/s) out:0.00% (0.00Bits/s) (down), interface bond1 usage is in:0.02% (1953.00Bits/s) out:0.01% (816.03Bits/s), interface sit0 usage is in:0.00% (0.00Bits/s) out:0.00% (0.00Bits/s) (down)
checking interfaces
interface lo usage is in:0.25% (24796.45Bits/s) out:0.25% (24796.45Bits/s)
interface eth0 usage is in:0.00% (849.78Bits/s) out:0.00% (349.95Bits/s)
interface eth1 usage is in:0.00% (1103.22Bits/s) out:0.00% (466.08Bits/s)
interface eth2 usage is in:0.00% (0.00Bits/s) out:0.00% (0.00Bits/s) (down)
interface eth3 usage is in:0.00% (0.00Bits/s) out:0.00% (0.00Bits/s) (down)
interface bond1 usage is in:0.02% (1953.00Bits/s) out:0.01% (816.03Bits/s)
interface sit0 usage is in:0.00% (0.00Bits/s) out:0.00% (0.00Bits/s) (down) | 'lo_usage_in'=0.25%;80;90;0;100 'lo_usage_out'=0.25%;80;90;0;100 'lo_traffic_in'=24796.45;8000000;9000000;0;10000000 'lo_traffic_out'=24796.45;8000000;9000000;0;10000000 'eth0_usage_in'=0.00%;80;90;0;100 'eth0_usage_out'=0.00%;80;90;0;100 'eth0_traffic_in'=849.78;800000000;900000000;0;1000000000 'eth0_traffic_out'=349.95;800000000;900000000;0;1000000000 'eth1_usage_in'=0.00%;80;90;0;100 'eth1_usage_out'=0.00%;80;90;0;100 'eth1_traffic_in'=1103.22;800000000;900000000;0;1000000000 'eth1_traffic_out'=466.08;800000000;900000000;0;1000000000 'eth2_usage_in'=0%;80;90;0;100 'eth2_usage_out'=0%;80;90;0;100 'eth2_traffic_in'=0;;;0;0 'eth2_traffic_out'=0;;;0;0 'eth3_usage_in'=0%;80;90;0;100 'eth3_usage_out'=0%;80;90;0;100 'eth3_traffic_in'=0;;;0;0 'eth3_traffic_out'=0;;;0;0 'bond1_usage_in'=0.02%;80;90;0;100 'bond1_usage_out'=0.01%;80;90;0;100 'bond1_traffic_in'=1953.00;8000000;9000000;0;10000000 'bond1_traffic_out'=816.03;8000000;9000000;0;10000000 'sit0_usage_in'=0%;80;90;0;100 'sit0_usage_out'=0%;80;90;0;100 'sit0_traffic_in'=0;;;0;0 'sit0_traffic_out'=0;;;0;0
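
The usage percentages in this output are consistent with dividing the measured bit rate by the maximum given in the performance data (for example 10000000 for lo and bond1). The following sketch reproduces that arithmetic; the function name is hypothetical, and the formula is inferred from the numbers shown here, not taken from the plugin source.

```python
def usage_percent(bits_per_second, max_bits_per_second):
    """Utilization as a percentage of the interface's maximum bit rate."""
    return 100.0 * bits_per_second / max_bits_per_second

# Values taken from the lo and bond1 lines above.
lo_in = round(usage_percent(24796.45, 10_000_000), 2)
bond1_in = round(usage_percent(1953.00, 10_000_000), 2)
bond1_out = round(usage_percent(816.03, 10_000_000), 2)
print(lo_in, bond1_in, bond1_out)
```

The traffic thresholds in the same line (8000000 and 9000000) are likewise 80% and 90% of that maximum.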

Homepage

The full documentation can be found here: check_nwc_health @ ConSol Labs


check_nwc_health's Issues

checkpoint hardware-health wrong status with --warningx and --criticalx

E.g. on a checkpoint I have /boot, which has 24% space free.

Now the check check_nwc_health --mode hardware-health --name xxx --community xxx --warningx /boot_free_pct=20 --criticalx /boot_free_pct=10 should be OK, since more than 20% of the space is left. But it is critical!

It seems that the check works like this:
actual value > critical/warning limit => Not OK
but it should be:
actual value < critical/warning limit => Not OK
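
One possible explanation, assuming the plugin interprets thresholds as ranges in the Nagios plugin guidelines format: a bare `20` means "alert when the value is outside 0..20", while `20:` means "alert when the value is below 20". The default thresholds in the free-space performance data shown earlier use the second form (`10:` and `5:`). The sketch below implements that range rule; it is not the plugin's code, and whether `--warningx /boot_free_pct=20:` resolves this report is an assumption.

```python
import math

def parse_range(spec):
    """Parse a Nagios-style range '[@]start:end' into (start, end, inside)."""
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        start_s, end_s = spec.split(":", 1)
    else:
        start_s, end_s = "0", spec  # a bare number N means the range 0..N
    if start_s == "~":
        start = -math.inf
    elif start_s == "":
        start = 0.0
    else:
        start = float(start_s)
    end = math.inf if end_s == "" else float(end_s)
    return start, end, inside

def alerts(value, spec):
    """True if the value violates the threshold range."""
    start, end, inside = parse_range(spec)
    in_range = start <= value <= end
    return in_range if inside else not in_range

print(alerts(24, "20"), alerts(24, "20:"))
```

With 24% free space, the bare thresholds `20` and `10` both alert, while `20:` and `10:` do not.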

Interfaces missing when querying Cisco Catalyst 2960 switches

I am using check_nwc_health (currently version 3.4.3) and there seems to be an interface-related problem. When querying Cisco switches for interface-related data (tested with modes interface-usage, interface-status, list-interfaces, list-interfaces-detail), not all interfaces are shown.

I have seen this issue on different switches, e.g. C2960S and C2960X. One switch is a stacked switch and the other switches are single switches.

When using this command:

check_nwc_health --hostname 192.168.123.45 --mode list-interfaces

I get this output:

000001 Vlan1
000200 Vlan200
000400 Vlan400
005001 Port-channel1
005179 StackPort1
005180 StackSub-St1-1
005181 StackSub-St1-2
005182 StackPort2
005183 StackSub-St2-1
005184 StackSub-St2-2
010101 GigabitEthernet1/0/1
010102 GigabitEthernet1/0/2
010103 GigabitEthernet1/0/3
010104 GigabitEthernet1/0/4
010105 GigabitEthernet1/0/5
010106 GigabitEthernet1/0/6
010107 GigabitEthernet1/0/7
010108 GigabitEthernet1/0/8
010109 GigabitEthernet1/0/9
010110 GigabitEthernet1/0/10
010111 GigabitEthernet1/0/11
010112 GigabitEthernet1/0/12
010113 GigabitEthernet1/0/13
010114 GigabitEthernet1/0/14
010115 GigabitEthernet1/0/15
OK - have fun

But I know there should be more interfaces. The following snmpwalk gives me more of them:

snmpwalk -v2c -c public 192.168.123.45 1.3.6.1.2.1.31.1.1.1.1

IF-MIB::ifName.1 = STRING: Vl1
IF-MIB::ifName.200 = STRING: Vl200
IF-MIB::ifName.400 = STRING: Vl400
IF-MIB::ifName.5001 = STRING: Po1
IF-MIB::ifName.5179 = STRING: StackPort1
IF-MIB::ifName.5180 = STRING: StackSub-St1-1
IF-MIB::ifName.5181 = STRING: StackSub-St1-2
IF-MIB::ifName.5182 = STRING: StackPort2
IF-MIB::ifName.5183 = STRING: StackSub-St2-1
IF-MIB::ifName.5184 = STRING: StackSub-St2-2
IF-MIB::ifName.10101 = STRING: Gi1/0/1
IF-MIB::ifName.10102 = STRING: Gi1/0/2
IF-MIB::ifName.10103 = STRING: Gi1/0/3
IF-MIB::ifName.10104 = STRING: Gi1/0/4
IF-MIB::ifName.10105 = STRING: Gi1/0/5
IF-MIB::ifName.10106 = STRING: Gi1/0/6
IF-MIB::ifName.10107 = STRING: Gi1/0/7
IF-MIB::ifName.10108 = STRING: Gi1/0/8
IF-MIB::ifName.10109 = STRING: Gi1/0/9
IF-MIB::ifName.10110 = STRING: Gi1/0/10
IF-MIB::ifName.10111 = STRING: Gi1/0/11
IF-MIB::ifName.10112 = STRING: Gi1/0/12
IF-MIB::ifName.10113 = STRING: Gi1/0/13
IF-MIB::ifName.10114 = STRING: Gi1/0/14
IF-MIB::ifName.10115 = STRING: Gi1/0/15
IF-MIB::ifName.10116 = STRING: Gi1/0/16
IF-MIB::ifName.10117 = STRING: Gi1/0/17
IF-MIB::ifName.10118 = STRING: Gi1/0/18
IF-MIB::ifName.10119 = STRING: Gi1/0/19
IF-MIB::ifName.10120 = STRING: Gi1/0/20
IF-MIB::ifName.10121 = STRING: Gi1/0/21
IF-MIB::ifName.10122 = STRING: Gi1/0/22
IF-MIB::ifName.10123 = STRING: Gi1/0/23
IF-MIB::ifName.10124 = STRING: Gi1/0/24
IF-MIB::ifName.10125 = STRING: Gi1/0/25
IF-MIB::ifName.10126 = STRING: Gi1/0/26
IF-MIB::ifName.10127 = STRING: Gi1/0/27
IF-MIB::ifName.10128 = STRING: Gi1/0/28
IF-MIB::ifName.10601 = STRING: Gi2/0/1
IF-MIB::ifName.10602 = STRING: Gi2/0/2
IF-MIB::ifName.10603 = STRING: Gi2/0/3
IF-MIB::ifName.10604 = STRING: Gi2/0/4
IF-MIB::ifName.10605 = STRING: Gi2/0/5
IF-MIB::ifName.10606 = STRING: Gi2/0/6
IF-MIB::ifName.10607 = STRING: Gi2/0/7
IF-MIB::ifName.10608 = STRING: Gi2/0/8
IF-MIB::ifName.10609 = STRING: Gi2/0/9
IF-MIB::ifName.10610 = STRING: Gi2/0/10
IF-MIB::ifName.10611 = STRING: Gi2/0/11
IF-MIB::ifName.10612 = STRING: Gi2/0/12
IF-MIB::ifName.10613 = STRING: Gi2/0/13
IF-MIB::ifName.10614 = STRING: Gi2/0/14
IF-MIB::ifName.10615 = STRING: Gi2/0/15
IF-MIB::ifName.10616 = STRING: Gi2/0/16
IF-MIB::ifName.10617 = STRING: Gi2/0/17
IF-MIB::ifName.10618 = STRING: Gi2/0/18
IF-MIB::ifName.10619 = STRING: Gi2/0/19
IF-MIB::ifName.10620 = STRING: Gi2/0/20
IF-MIB::ifName.10621 = STRING: Gi2/0/21
IF-MIB::ifName.10622 = STRING: Gi2/0/22
IF-MIB::ifName.10623 = STRING: Gi2/0/23
IF-MIB::ifName.10624 = STRING: Gi2/0/24
IF-MIB::ifName.10625 = STRING: Gi2/0/25
IF-MIB::ifName.10626 = STRING: Gi2/0/26
IF-MIB::ifName.10627 = STRING: Gi2/0/27
IF-MIB::ifName.10628 = STRING: Gi2/0/28
IF-MIB::ifName.14001 = STRING: Nu0
IF-MIB::ifName.14002 = STRING: Fa0
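
To pin down exactly which interfaces are missing, the two listings can be compared by index. This is a small sketch under the assumption that the six-digit first column of the list-interfaces output is the zero-padded ifIndex (which matches the pairs shown above, e.g. 000200 and ifName.200); the function names and the shortened sample strings are illustrative.

```python
import re

def indexes_from_list_interfaces(text):
    """Collect indexes from lines such as '010115 GigabitEthernet1/0/15'."""
    return {int(m.group(1)) for m in re.finditer(r"^(\d{6}) \S", text, re.MULTILINE)}

def indexes_from_snmpwalk(text):
    """Collect indexes from lines such as 'IF-MIB::ifName.10116 = STRING: Gi1/0/16'."""
    return {int(m.group(1)) for m in re.finditer(r"ifName\.(\d+) = ", text)}

# Shortened samples taken from the two listings above.
plugin_output = """005001 Port-channel1
010115 GigabitEthernet1/0/15
OK - have fun"""
walk_output = """IF-MIB::ifName.5001 = STRING: Po1
IF-MIB::ifName.10115 = STRING: Gi1/0/15
IF-MIB::ifName.10116 = STRING: Gi1/0/16"""

missing = sorted(indexes_from_snmpwalk(walk_output) - indexes_from_list_interfaces(plugin_output))
print(missing)
```

Run against the full listings, the difference would contain every index after 10115.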

When using the ultra-verbose mode of check_nwc_health with this command:

./check_nwc_health --hostname 192.168.123.45 --mode list-interfaces -vvvvvvvvvvv

The output is very long, I pasted it here: http://pastebin.com/CxdqFZUQ

I don't know much about the internals of check_nwc_health but the lines 55 to 69 seem to be interesting.

Pulse Secure Gateways not supported any more

Hello Gerhard,

I've updated to V5.6 and my health check for the Pulse Secure Gateways (formerly Juniper SA gateways) does not work any more:
"Mode hardware-health is not implemented for this type of device"
With the old(!) V3.2.0.1 that I used before, it worked perfectly.
Is it possible to make that work again?

Regards, Rico

Unique names for cisco memory

When querying a Cisco access point for the memory usage I get the following output:

CRITICAL - mempool SRAM usage is 94.59%, mempool I/O usage is 87.05%, mempool Processor usage is 43.45%, mempool I/O usage is 58.15% | 'Processor_usage'=43.45%;80;90 'SRAM_usage'=94.59%;80;90 'I/O_usage'=87.05%;80;90 'I/O_usage'=58.15%;80;90

"mempool I/O usage" occurs 2 times and this produces errors if you are using something like pnp4nagios which uses RRD:

RRDs::update ERROR rrdcached: illegal attempt to update using time 1424953027.000000 when last update time is 1424953027.000000 (minimum one second step)
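
Duplicate labels like this can be spotted mechanically before they reach RRD. The following sketch finds repeated labels in a performance data string and shows one possible way to make them unique by appending a counter; the function names and the suffix scheme are assumptions for illustration, not what the plugin does.

```python
import re
from collections import Counter

def perf_labels(perfdata):
    """Return the quoted labels in the order they appear."""
    return re.findall(r"'([^']+)'=", perfdata)

def duplicate_labels(perfdata):
    """Return the labels that occur more than once."""
    return [label for label, n in Counter(perf_labels(perfdata)).items() if n > 1]

def make_unique(labels):
    """Append _2, _3, ... to repeated labels (hypothetical naming scheme)."""
    seen = Counter()
    unique = []
    for label in labels:
        seen[label] += 1
        unique.append(label if seen[label] == 1 else f"{label}_{seen[label]}")
    return unique

perfdata = ("'Processor_usage'=43.45%;80;90 'SRAM_usage'=94.59%;80;90 "
            "'I/O_usage'=87.05%;80;90 'I/O_usage'=58.15%;80;90")
print(duplicate_labels(perfdata))
print(make_unique(perf_labels(perfdata)))
```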

I think unique names should be displayed for the memory usage. This problem seems similar to the following entry I found in the changelog:

2014-12-06 3.2.2

unique names for cisco cpus pointing to the same physical entity

Before version 3.2.2 I also got the same RRD error for most cpu-usage services on Cisco devices.

Unfortunately I don't know much about the access point; --verbose gives me:

I am a Cisco IOS Software, C3600 Software (AP3G2-K9W7-M), Version 15.2(4)JB4, RELEASE SOFTWARE (fc1)

I have performed a snmp walk with --mode walk which I can send via mail or upload/attach somewhere if someone is interested.
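
One way to keep perfdata labels unique is to suffix repeated names. The sketch below is written in Python rather than the plugin's Perl, and it is only an illustration: the function name unique_labels and the _2, _3 suffix scheme are assumptions, not what the plugin does. A suffix derived from the SNMP index would probably be more stable across runs than a positional counter.

```python
from collections import Counter


def unique_labels(labels):
    """Return the labels with repeats made unique by appending _2, _3, ..."""
    seen = Counter()
    result = []
    for label in labels:
        seen[label] += 1
        # The first occurrence keeps its name; later ones get a numeric suffix.
        # Note: this does not guard against a label that already ends in _2.
        result.append(label if seen[label] == 1 else f"{label}_{seen[label]}")
    return result


labels = ["Processor_usage", "SRAM_usage", "I/O_usage", "I/O_usage"]
print(unique_labels(labels))
```

With the four labels from the output above, the second I/O pool would be reported as I/O_usage_2, so each RRD data source gets exactly one update per check.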

CISCO-ENTITY-ALARM-MIB doesn't work with live data

The SNMP attribute cefcAlarmList contains a hex string ("SYNTAX OCTET STRING (SIZE (0..32))"). With live data, the result is actually returned as packed data, while the simulation returns the hex data as a string. (This is linked to issue #51.)

The result: with simulation data the AlarmSubSystem works correctly and reports failed entities, while the run using live data ends with problems parsing and processing the hex data.

Support for Juniper EX / QFX

Hello,

Is it possible to add support for Juniper EX / QFX?
Ex:
http://kb.juniper.net/InfoCenter/index?page=content&id=KB17526
http://kb.juniper.net/library/CUSTOMERSERVICE/GLOBAL_JTAC/BK26246/EX%20SNMP%20Monitoring%20Guide.pdf

Regards

Change Output in Verbose Mode

I would really like a change in the normal verbose mode. The first line should be the plugin's status line:

OK/CRITICAL/WARNING/... <message>
the rest can be long text/verbose | plus perfdata

Right now the first line is always "I am a Cisco IOS Software...", which makes the verbose mode (most likely) unusable for additional debug information in nagios/naemon/icinga.

This would have the nice effect that the first line is still the real plugin output message which gets shown in the webpage or in the notification and the rest gets moved into the additional long text field
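
The requested layout can be sketched as follows. This is a Python illustration, not the plugin's Perl code; the function name format_plugin_output and the sample strings are hypothetical. It assumes the common plugin convention that the first line carries the status text, optionally followed by a pipe and perfdata, and that all further lines are long text.

```python
def format_plugin_output(status, message, long_text=(), perfdata=()):
    """Build plugin output with the status line first and details after it."""
    first_line = f"{status} - {message}"
    if perfdata:
        # Perfdata follows a pipe character on the first line.
        first_line += " | " + " ".join(perfdata)
    # Everything after the first line is treated as long (multi-line) output.
    return "\n".join([first_line, *long_text])


output = format_plugin_output(
    "WARNING",
    "temperature 1 is warning",
    long_text=["I am a Cisco IOS Software ...", "checking temperatures"],
    perfdata=["'temp_0'=40;80"],
)
print(output)
```

Placing the perfdata on the first line is a design choice made here for simplicity; the issue text suggests appending it after the long text instead.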

ME-3600 reports warning with temperature sensors which are not active

Hi,

I use the check_nwc plugin heavily - it makes life so much easier. However, I encounter a problem:

Running 3.4.2 with hardware-health on a Cisco ME-3600x reports:
WARNING - temperature 1 SW#1, Sensor#3, YELLOW is warning, temperature 2 SW#1, Sensor#5, YELLOW is warning

While the sh env all reports
FAN in PS-1 is OK
FAN in PS-2 is OK
SYSTEM TEMPERATURE is GREEN
SYSTEM Temperature Value: 40.0 Degree Celsius
SYSTEM Temperature State: GREEN
SYSTEM Low Temperature Alert Threshold: 0.0 Degree Celsius
SYSTEM Low Temperature Shutdown Threshold: -20.0 Degree Celsius
SYSTEM High Temperature Alert Threshold: 58.0 Degree Celsius
SYSTEM High Temperature Shutdown Threshold: 80.0 Degree Celsius
POWER SUPPLY 1 Temperature Value: 41.7500 Degree Celsius
POWER SUPPLY 1 Temperature Alert Threshold: 85.0000 Degree Celsius
POWER SUPPLY 1 Temperature Shutdown Threshold: 110.0000 Degree Celsius
POWER SUPPLY 2 Temperature Value: 43.7500 Degree Celsius
POWER SUPPLY 2 Temperature Alert Threshold: 85.0000 Degree Celsius
POWER SUPPLY 2 Temperature Shutdown Threshold: 110.0000 Degree Celsius
POWER SUPPLY 1 is AC OK
POWER SUPPLY 2 is AC OK

ALARM CONTACT 1 is not asserted
ALARM CONTACT 2 is not asserted
ALARM CONTACT 3 is not asserted
ALARM CONTACT 4 is not asserted

The check command with -v shows
I am a Cisco IOS Software, ME360x Software (ME360x-UNIVERSALK9-M), Version 15.4(3)S2, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2015 by Cisco Systems, Inc.
Compiled Wed 28-Jan-15 16:04 by prod_rel_team
WARNING - temperature 1 SW#1, Sensor#3, YELLOW is warning, temperature 2 SW#1, Sensor#5, YELLOW is warning
checking fans
fan 0 (Switch#1, Fan#1) is normal
fan 0 (Switch#1, Fan#2) is normal
checking temperatures
temperature 0 SW#1, Sensor#1, GREEN is 40 (of 80 max = normal)
temperature 1 SW#1, Sensor#3, YELLOW is warning
temperature 2 SW#1, Sensor#5, YELLOW is warning
temperature 3 SW#1, PowerSupply#1, GREEN is 41 (of 80 max = normal)
temperature 4 SW#1, PowerSupply#2, GREEN is 43 (of 80 max = normal)
checking voltages
checking supplies
powersupply 0 (Switch#1, PowerSupply 1) is normal
powersupply 0 (Switch#1, PowerSupply 2) is normal | 'temp_0'=40;80 'temp_3'=41;80 'temp_4'=43;80

I have tried to blacklist these, however to no avail.

Any suggestions ?

Many thanks
Paul

Palo Alto PA-200 firewall

version check_nwc_health-3.4.2.2 under SLES11 SP3

./check_nwc_health -t 30 --mode hardware-health --hostname x.x.x.x --community xxxxxx -v
I am a Palo Alto Networks PA-200 series firewall
Can't use an undefined value as a HASH reference at ./check_nwc_health line 9474.

Can you please fix this?
Thanks Heiko
------------interesting OID value----------------
Name/OID: entPhySensorType.1; Value (Integer): rpm (10) .1.3.6.1.2.1.99.1.1.1.1.1
Name/OID: entPhySensorType.2; Value (Integer): celsius (8) .1.3.6.1.2.1.99.1.1.1.1.2
Name/OID: entPhySensorType.3; Value (Integer): celsius (8) .1.3.6.1.2.1.99.1.1.1.1.3
Name/OID: entPhySensorValue.1; Value (Integer): 3715 .1.3.6.1.2.1.99.1.1.1.4.1
Name/OID: entPhySensorValue.2; Value (Integer): 59 .1.3.6.1.2.1.99.1.1.1.4.2
Name/OID: entPhySensorValue.3; Value (Integer): 37 .1.3.6.1.2.1.99.1.1.1.4.3
Name/OID: entPhySensorOperStatus.1; Value (Integer): ok (1) .1.3.6.1.2.1.99.1.1.1.5.1
Name/OID: entPhySensorOperStatus.2; Value (Integer): ok (1) .1.3.6.1.2.1.99.1.1.1.5.2
Name/OID: entPhySensorOperStatus.3; Value (Integer): ok (1) .1.3.6.1.2.1.99.1.1.1.5.3
Name/OID: entPhySensorUnitsDisplay.1; Value (OctetString): rpm .1.3.6.1.2.1.99.1.1.1.6.1
Name/OID: entPhySensorUnitsDisplay.2; Value (OctetString): (C) .1.3.6.1.2.1.99.1.1.1.6.2
Name/OID: entPhySensorUnitsDisplay.3; Value (OctetString): (C) .1.3.6.1.2.1.99.1.1.1.6.3

--mode=hardware-health, bad psu not detected

Hello,
our network team told me they have a Cisco switch with a PSU failure, but the hardware-health check shows that everything is OK.

Maybe you can check this, thanks!

They gave me the following output from the switch, see below
Switch: WS-C3750X-48P
IOS: 12.2(53)SE2

sh env all
FAN 1 is OK
FAN 2 is OK
FAN PS-1 is NOT INITIALIZED
FAN PS-2 is NOT PRESENT
TEMPERATURE is OK
Temperature Value: 25 Degree Celsius
Temperature State: GREEN
Yellow Threshold : 46 Degree Celsius
Red Threshold : 60 Degree Celsius
SW PID Serial# Status Sys Pwr PoE Pwr Watts

1A No Input Power Bad N/A 235/0
1B Not Present
2A C3KX-PWR-1100WAC ### OK Good Good 1100/0
2B Not Present
3A C3KX-PWR-1100WAC ### OK Good Good 1100/0
3B Not Present

SW Status RPS Name RPS Serial# RPS Port#

1 Not Present <>
2 Not Present <>
3 Not Present <>

Bug in timeticks sub for certain timestring

The sub timeticks can't parse the timestring correctly when the time is returned as
1 day, 13:12:19.27
This is at least the case with a Cisco 3825. Timeticks only matches the plural form "days".
(Patch included at bottom)

Command output after fix:

maarten@monitor:~$ ./check_nwc_health --hostname 172.16.1.1 --community public --mode uptime -v -v -v 
I am a Cisco IOS Software, 3800 Software (C3825-ADVENTERPRISEK9-M), Version 12.4(12), RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2006 by Cisco Systems, Inc.
Compiled Fri 17-Nov-06 15:31 by prod_rel_team
OK - device is up since 1d 13h 12m 19s | 'uptime'=2232.32;15:;5:

Trace:

Thu Apr 10 08:41:02 2014: cache: 1.3.6.1.2.1.1.3.0
Thu Apr 10 08:41:02 2014: cache: 1.3.6.1.2.1.1.3.0
Thu Apr 10 08:41:02 2014: snmp agent answered: 1 day, 13:12:19.27
Thu Apr 10 08:41:02 2014: cache: 1.3.6.1.2.1.1.1.0
Thu Apr 10 08:41:02 2014: cache: 1.3.6.1.2.1.1.3.0
Thu Apr 10 08:41:02 2014: uptime: 133939
Thu Apr 10 08:41:02 2014: up since: Tue Apr  8 19:28:43 2014
Thu Apr 10 08:41:02 2014: whoami: Cisco IOS Software, 3800 Software (C3825-ADVENTERPRISEK9-M), Version 12.4(12), RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2006 by Cisco Systems, Inc.
Compiled Fri 17-Nov-06 15:31 by prod_rel_team
Thu Apr 10 08:41:02 2014: using NWC::Cisco

Patch

diff --git a/plugins-scripts/GLPlugin.pm b/plugins-scripts/GLPlugin.pm
index 40a0f5d..a358359 100755
--- a/plugins-scripts/GLPlugin.pm
+++ b/plugins-scripts/GLPlugin.pm
@@ -1748,7 +1748,7 @@ sub timeticks {
   if ($timestr =~ /\((\d+)\)/) {
     # Timeticks: (20718727) 2 days, 9:33:07.27
     $timestr = $1 / 100;
-  } elsif ($timestr =~ /(\d+)\s*days.*?(\d+):(\d+):(\d+)\.(\d+)/) {
+  } elsif ($timestr =~ /(\d+)\s*day[s]*.*?(\d+):(\d+):(\d+)\.(\d+)/) {
     # Timeticks: 2 days, 9:33:07.27
     $timestr = $1 * 24 * 3600 + $2 * 3600 + $3 * 60 + $4;
   } elsif ($timestr =~ /(\d+):(\d+):(\d+)\.(\d+)/) {
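
The three timestring shapes handled by the sub can be sketched as follows. This is a Python illustration rather than the plugin's Perl; the function name timeticks_to_seconds is hypothetical. Like the Perl original, it drops the hundredths in the textual forms, and it uses days? to accept both "day" and "days".

```python
import re


def timeticks_to_seconds(timestr):
    """Convert an SNMP uptime string to seconds."""
    # Form 1: "Timeticks: (20718727) 2 days, 9:33:07.27" (raw ticks, 1/100 s)
    m = re.search(r"\((\d+)\)", timestr)
    if m:
        return int(m.group(1)) / 100
    # Form 2: "2 days, 9:33:07.27" or "1 day, 13:12:19.27"
    m = re.search(r"(\d+)\s*days?.*?(\d+):(\d+):(\d+)\.(\d+)", timestr)
    if m:
        days, hours, minutes, seconds = (int(x) for x in m.group(1, 2, 3, 4))
        return days * 86400 + hours * 3600 + minutes * 60 + seconds
    # Form 3: "9:33:07.27" (uptime below one day)
    m = re.search(r"(\d+):(\d+):(\d+)\.(\d+)", timestr)
    if m:
        hours, minutes, seconds = (int(x) for x in m.group(1, 2, 3))
        return hours * 3600 + minutes * 60 + seconds
    raise ValueError(f"unrecognized timeticks string: {timestr!r}")


print(timeticks_to_seconds("1 day, 13:12:19.27"))
```

For the string from the trace above, this yields 133939 seconds, which matches the "uptime: 133939" line.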

UNKNOWN instead of CRITICAL for timeouts

Timeouts for some modes like interface-discards, interface-usage, interface-errors are reported as critical:

CRITICAL - could not contact snmp agent, got neither sysUptime nor sysDescr
wrong device

These timeouts should be reported as UNKNOWN instead, to avoid confusion.
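
A minimal sketch of the requested behaviour, in Python rather than the plugin's Perl. The function name agent_status is hypothetical; the numeric codes are the usual plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def agent_status(sys_uptime, sys_descr):
    """Return (exit_code, message) for the initial reachability probe."""
    if sys_uptime is None and sys_descr is None:
        # No answer at all says nothing about the device's health,
        # so report UNKNOWN rather than CRITICAL.
        return UNKNOWN, "UNKNOWN - could not contact snmp agent, got neither sysUptime nor sysDescr"
    return OK, "OK - snmp agent answered"


code, message = agent_status(None, None)
print(code, message)
```

A real plugin would then exit with the returned code; the sketch only computes it.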

Brocade FastIron SuperX-PREM should be recognized as Foundry

Hello,

Some checks don't work for Brocade FastIron because it is not recognized as Foundry. Setting the servertype to "brocade" manually doesn't work either.

Here is a description line from SNMP:

Brocade Communications Systems, Inc. FastIron SuperX-PREM, IronWare Version 07.2.02eT3e3 Compiled on Oct 12 2011 at 15:24:57 labeled as SXR07202e

As a quick fix I changed the following:

@@ -6930,7 +6930,7 @@
     bless $self, 'NWC::FabOS';
     $self->debug('using NWC::FabOS');
     $self->init();
-  } elsif ($self->{productname} =~ /ICX6/i) {
+  } elsif ($self->{productname} =~ /ICX6|FastIron/i) {
     bless $self, 'NWC::Foundry';
     $self->debug('using NWC::Foundry');
     $self->init();
@@ -19762,7 +19762,7 @@
       } elsif ($self->{productname} =~ /EMC\s*DS-24M2/i) {
         bless $self, 'NWC::Brocade';
         $self->debug('using NWC::Brocade');
-      } elsif ($self->{productname} =~ /Brocade.*ICX/i) {
+      } elsif ($self->{productname} =~ /Brocade/i) {
         bless $self, 'NWC::Brocade';
         $self->debug('using NWC::Brocade');
       } elsif ($self->{productname} =~ /Fibre Channel Switch/i) {

What would be the best way to make it work?

Best regards,
Stanislav

Wrong uptime on cisco devices

snmpwalk -v2c -c community IP uptime
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (702398739) 81 days, 7:06:27.39

/usr/lib64/nagios/plugins/check_nwc_health --hostname IP --mode uptime --community community
OK - device is up since 5549d 10h 13m 46s | 'uptime'=7991174;15:;5:;;

snmpwalk -v2c -c community IP sysDescr.0
SNMPv2-MIB::sysDescr.0 = STRING: Cisco NX-OS(tm) n6000, Software (n6000-uk9), Version 7.1(1)N1(1), RELEASE SOFTWARE Copyright (c) 2002-2012 by Cisco Systems, Inc. Device Manager Version 6.0(2)N1(1),Compiled 4/18/2015 10:00:00

snmpwalk -v2c -c community IP uptime
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (801679283) 92 days, 18:53:12.83

/usr/lib64/nagios/plugins/check_nwc_health --hostname IP --mode uptime --community community
OK - device is up since 1086d 23h 49m 5s | 'uptime'=1565269;15:;5:;;

snmpwalk -v2c -c community IP sysDescr.0
SNMPv2-MIB::sysDescr.0 = STRING: Cisco IOS Software, s72033_rp Software (s72033_rp-ADVENTERPRISEK9_WAN-M), Version 12.2(33)SXJ10, RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2015 by Cisco Systems, Inc.
Compiled Fri 14-Aug-15 08:58 by p

It seems to be fine on IOS-XE:

snmpwalk -v2c -c community IP sysDescr.0
SNMPv2-MIB::sysDescr.0 = STRING: Cisco IOS Software, IOS-XE Software, Catalyst 4500 L3 Switch Software (cat4500e-UNIVERSALK9-M), Version 03.04.00.SG RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2012 by Cisco Systems, Inc.
Compiled Wed 0

snmpwalk -v2c -c community IP uptime
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (200980836) 23 days, 6:16:48.36

/usr/lib64/nagios/plugins/check_nwc_health --hostname IP --mode uptime --community community
OK - device is up since 23d 6h 16m 8s | 'uptime'=33496;15:;5:;;

Allow modification of "statefilesdir" during configure

The statefilesdir is hard coded to "/var/tmp" (see GLPlugin.pm line 355 to 359) in case $ENV{OMD_ROOT} is not set.

} elsif (exists $ENV{OMD_ROOT}) {
  $self->override_opt('statefilesdir', $ENV{OMD_ROOT}."/var/tmp/".$Monitoring::GLPlugin::plugin->{name});
} else {
  $self->override_opt('statefilesdir', "/var/tmp/".$Monitoring::GLPlugin::plugin->{name});
}

It would be nice if the path can be changed during configure. The use case of this feature request is utilizing any other directory (e.g. a ram disk) to store all statefile data.
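
The requested lookup order could look like the sketch below. It is written in Python rather than the plugin's Perl, and the function name resolve_statefilesdir, the configured_dir parameter and the precedence (configure-time value first, then OMD_ROOT, then /var/tmp) are all assumptions for illustration.

```python
import os
import posixpath


def resolve_statefilesdir(plugin_name, configured_dir=None, env=None):
    """Pick the directory for state files (hypothetical precedence)."""
    env = os.environ if env is None else env
    if configured_dir:
        # Assumed: a value chosen at configure time wins over everything else.
        return posixpath.join(configured_dir, plugin_name)
    if "OMD_ROOT" in env:
        return posixpath.join(env["OMD_ROOT"], "var/tmp", plugin_name)
    return posixpath.join("/var/tmp", plugin_name)


print(resolve_statefilesdir("check_nwc_health", configured_dir="/dev/shm", env={}))
```

With configured_dir set to a RAM disk such as /dev/shm, the state files would land in /dev/shm/check_nwc_health.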

--warningx and --criticalx not working

Hello.

I have a monitoring host running icinga2.
The service is defined with variables:

# icinga2 object list --type Service --name "if-5" | egrep "(check_command|nwc)"

* check_command = "nwc_health"
    * nwc_health_community = "xxxxxxxx"
    * nwc_health_criticalx = "GigaEthernet0/5_discards_in=100"
    * nwc_health_mode = "interface-health"
    * nwc_health_name = "GigaEthernet0/5"
    * nwc_health_regexp = false
    * nwc_health_units = "MBi"
    * nwc_health_warningx = "GigaEthernet0/5_discards_in=50"

In the Icinga2 web interface I see the critical and warning values set correctly:

The problem is that I still receive warning notifications:

SNMP results with Hex-STRING data is processed wrongly

This is a problem when you try to compare Live SNMP requests with the result of a snmpwalk simulation file.

Net-SNMP returns the Hex-STRING data as "packed" binary stream while the snmpwalk simulation function reads the data as "0x01 00 00 00 00 ...".

Right now I don't know which functions of check_nwc_health rely on hex data, so I don't know which parts are affected.
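
One way to make both inputs comparable is to normalize them to the same byte string before further processing. The sketch below is in Python rather than the plugin's Perl; the function name normalize_octets is hypothetical, and the simulation format is assumed to be exactly the "0x01 00 00 ..." shape described above.

```python
def normalize_octets(value):
    """Return raw bytes for either packed data or a '0x01 00 ...' hex string."""
    if isinstance(value, (bytes, bytearray)):
        # Live data: already packed binary.
        return bytes(value)
    text = value.strip()
    if text.lower().startswith("0x"):
        text = text[2:]
    # bytes.fromhex ignores the spaces between the byte pairs.
    return bytes.fromhex(text)


live = b"\x01\x00\x00\x00\x00"
simulated = "0x01 00 00 00 00"
print(normalize_octets(live) == normalize_octets(simulated))
```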

Dynamic time margin in config check

Currently the time margin between running and saved config is hardcoded to 300s (5min) in plugins-scripts/Classes/Cisco/IOS/Component/ConfigSubsystem.pm:

sub check {
  my $self = shift;
  my $info;
  my $runningChangedMarginAfterReload = 300;
  $self->add_info('checking config');
[...]
    # If running config is reported to have changed within the (5 minute) margin since the last reload
    if (($runningUnchangedDuration + $runningChangedMarginAfterReload) > $self->uptime()) {
      $self->add_ok(sprintf("running config has not changed since reload (using a %d second margin)",
          $runningChangedMarginAfterReload));

It would be nice if $runningChangedMarginAfterReload could be dynamically adjusted with a parameter.
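
The comparison quoted above with the margin passed in as a parameter might look like this. It is a Python sketch, not the plugin's Perl; the function and parameter names are hypothetical and the numbers are made-up sample values in seconds.

```python
def unchanged_since_reload(running_unchanged_duration, uptime, margin=300):
    """Same test as in the quoted check(), with a caller-supplied margin."""
    return (running_unchanged_duration + margin) > uptime


print(unchanged_since_reload(3000, 3200))              # True with the default 300 s margin
print(unchanged_since_reload(3000, 3600))              # False: 3300 is not greater than 3600
print(unchanged_since_reload(3000, 3600, margin=900))  # True with a wider margin
```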

No Warning and Critical Values in Perfdata for mode=hardware-health

Hello,

With the newest version of your plugin, there are no warning and critical values in the perfdata for --mode=hardware-health anymore.

Version 3.0n:
OK - environmental hardware working fine | 'sens_celsius_21590'=28;;;; 'sens_celsius_21598'=48;;;;

Version 2.6.5:
OK - environmental hardware working fine | 'sens_celsius_21590'=28;57;67 'sens_celsius_21598'=48;50;60

As far as I can see, these values came directly from the device itself.

Maybe it's just a simple bug and you can fix it.

Thanks for this great plugin.

Human Timeticks wrong

I have tested this on several Cisco devices and I get a wrong uptime with the latest version of check_nwc_health.

WRONG: With check_nwc_health $Revision: 2.6.4.2

check_nwc_health --hostname 192.168.1.1 --community public --mode uptime
OK - device is up since 1d 13h 31m 23s | 'uptime'=135083.43;15:;5:

RIGHT: With check_nwc_health.bak $Revision: 2.5.1.2

check_nwc_health.bak --hostname 192.168.1.1 --community public --mode uptime
OK - device is up since 135080 minutes | 'uptime'=135080.87;15:;5:

Expected:

check_nwc_health --hostname 192.168.1.1 --community public --mode uptime
OK - device is up since 93d 19h 19m  0s | 'uptime'=135079.70;15:;5:
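
The numbers above are consistent with the perfdata value, which is in minutes, being formatted as if it were seconds. The Python sketch below (not the plugin's Perl; human_uptime is a hypothetical name) shows both cases. Whether this is the actual cause in the plugin is an assumption.

```python
def human_uptime(seconds):
    """Format a duration in seconds as 'Nd Nh Nm Ns'."""
    seconds = round(seconds)
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


# Minutes converted to seconds first: the expected 93 days.
print(human_uptime(135079.70 * 60))
# Minutes passed in unconverted: reproduces the wrong "1d 13h 31m 23s".
print(human_uptime(135083.43))
```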

Checkpoint VSX connections wrong

When Checkpoint is not a FW-1 but a VSX, the "fw-connections" check is wrong. fw-connections checks for the number of connections in the following oid:

 'fwNumConn' => '1.3.6.1.4.1.2620.1.1.25.3.0',

However in Checkpoint VSX, the correct OID to retrieve the current number of connections is the following: 1.3.6.1.4.1.2620.1.16.23.1.1.2

Furthermore, this OID can return multiple values (per virtual system id):

# snmpwalk -v 2c -c monitoring vxsip 1.3.6.1.4.1.2620.1.16.23.1.1.2
iso.3.6.1.4.1.2620.1.16.23.1.1.2.1.0 = Counter32: 58
iso.3.6.1.4.1.2620.1.16.23.1.1.2.2.0 = Counter32: 0
iso.3.6.1.4.1.2620.1.16.23.1.1.2.3.0 = Counter32: 6077
iso.3.6.1.4.1.2620.1.16.23.1.1.2.4.0 = Counter32: 0
iso.3.6.1.4.1.2620.1.16.23.1.1.2.5.0 = Counter32: 10246

There are two changes that need to be done to the plugin:

Auto-detect that this is a VSX, not a classical FW-1 firewall (maybe check if OID 1.3.6.1.4.1.2620.1.16.23.1.1.2 returns something valid)
Get the values of all found entries of the OID 1.3.6.1.4.1.2620.1.16.23.1.1.2 and make a sum for the real number of fw-connections.

All other checks seem to work fine (tested with hardware-health, interface-usage, uptime, cpu-load, memory-usage, ha-role, svn-status).
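
The second change, summing the per-virtual-system counters, can be sketched like this. It is Python rather than the plugin's Perl and works on the snmpwalk text shown above instead of live SNMP; the function name sum_connections is hypothetical.

```python
import re

BASE_OID = "1.3.6.1.4.1.2620.1.16.23.1.1.2"

WALK = """\
iso.3.6.1.4.1.2620.1.16.23.1.1.2.1.0 = Counter32: 58
iso.3.6.1.4.1.2620.1.16.23.1.1.2.2.0 = Counter32: 0
iso.3.6.1.4.1.2620.1.16.23.1.1.2.3.0 = Counter32: 6077
iso.3.6.1.4.1.2620.1.16.23.1.1.2.4.0 = Counter32: 0
iso.3.6.1.4.1.2620.1.16.23.1.1.2.5.0 = Counter32: 10246
"""


def sum_connections(walk_output, base_oid=BASE_OID):
    """Add up all Counter32 values found below base_oid."""
    total = 0
    for line in walk_output.splitlines():
        m = re.match(r"\s*(\S+)\s*=\s*Counter32:\s*(\d+)\s*$", line)
        if not m:
            continue
        oid = m.group(1).lstrip(".")
        # snmpwalk prints the leading "1" as "iso" when no MIB is loaded.
        if oid.startswith("iso."):
            oid = "1." + oid[len("iso."):]
        if oid.startswith(base_oid + "."):
            total += int(m.group(2))
    return total


print(sum_connections(WALK))
```

For the five virtual systems in the walk above the total is 16381 connections.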

--mitigation not working

Hello!

Unfortunately it seems that the switch --mitigation is not working properly (or at all).

./check_nwc_health --community **** --hostname ***.201 --mode hsrp-state -v
I am a Cisco IOS Software, 2801 Software (C2801-ADVIPSERVICESK9-M), Version 12.4(23), RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2008 by Cisco Systems, Inc.
Compiled Sat 08-Nov-08 22:58 by prod_rel_team
CRITICAL - state in group 0 (interface 2) is standby instead of active
checking hsrp groups
hsrp group 0 (interface 2) state is standby (active router is ***.200, standby router is ***.201

Trying to "mitigate" the problem doesn't work...

./check_nwc_health --community **** --hostname ***.201 --mode hsrp-state --mitigation ok -v
I am a Cisco IOS Software, 2801 Software (C2801-ADVIPSERVICESK9-M), Version 12.4(23), RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2008 by Cisco Systems, Inc.
Compiled Sat 08-Nov-08 22:58 by prod_rel_team
CRITICAL - state in group 0 (interface 2) is standby instead of active
checking hsrp groups
hsrp group 0 (interface 2) state is standby (active router is ***.200, standby router is ***.201

Thank you for the great work you have done!

Have a nice day...

--mode=hardware-health, some minor bugs

Hello Gerhard,

I think I found two bugs in the hardware-health mode.

We have a Cisco switch which has two celsius sensors. One of the sensors reported threshold evaluation=true, which is basically OK. The output shows this:
CRITICAL - celsius sensor 21598 threshold evaluation is true | 'sens_celsius_21590'=27;57;67;; 'sens_celsius_21598'=51;50;60;;
As you can see, the actual value (51) only exceeded the first threshold (50), but your plugin shows a "CRITICAL" message. I think this should only be a "WARNING" instead of CRITICAL.

I tried to override the thresholds coming from the device itself with our own values, because our network team told me that the default thresholds are a bit too low. I tried --warning=55 and --critical=60, but the plugin uses the default values instead.
This was the command and the output:
check_nwc_health --hostname=Y.Y.Y.Y --community=xxxxxx --mode=hardware-health --warning=55 --critical=60
CRITICAL - celsius sensor 21598 threshold evaluation is true | 'sens_celsius_21590'=27;57;67;; 'sens_celsius_21598'=51;50;60;;
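
The behaviour expected in both reports can be sketched as follows. This is Python, not the plugin's Perl; the function names are hypothetical, and thresholds are treated as plain numbers that alert when exceeded (range syntax is ignored).

```python
def effective_thresholds(device, user_warning=None, user_critical=None):
    """Let command-line thresholds override the ones read from the device."""
    warning, critical = device
    if user_warning is not None:
        warning = user_warning
    if user_critical is not None:
        critical = user_critical
    return warning, critical


def evaluate(value, warning, critical):
    """Return the state for a value against simple upper thresholds."""
    if value > critical:
        return "CRITICAL"
    if value > warning:
        return "WARNING"
    return "OK"


# Sensor 21598: value 51, device thresholds 50/60.
print(evaluate(51, *effective_thresholds((50, 60))))          # WARNING
print(evaluate(51, *effective_thresholds((50, 60), 55, 60)))  # OK
```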

I'm using the newest version:
check_nwc_health -V
check_nwc_health $Revision: 3.0.3.8 $ [http://labs.consol.de/nagios/check_nwc_health]

Maybe you have some time to look into that.

Thanks! :)

uninitialized value in string when using --mode hardware-health on Nexus 7k with version 3.1

After upgrading to version 3.1 all our Cisco Nexus 7k switches are throwing the following error:

jemurray@nagios:~$ /usr/lib/nagios/plugins/check_nwc_health --community public --hostname ncdc-nx7k-0.example.com --mode hardware-health
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4950.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4982.
.....

I tracked it down to this variable:

$self->{entSensorThresholdEvaluation}

It is undefined when the script runs.

Switch version:

I also tested with the latest release ($Revision: 3.2) from github and the problem still exists.

Let me know what additional information you need from me.

Full output of the script run normally:

jemurray@selleck:~$ /usr/lib/nagios/plugins/check_nwc_health --community public --hostname ncdc-nx7k-0.example.com --mode hardware-health
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4950.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4982.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4950.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4982.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4950.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4982.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4950.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4982.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4950.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4982.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4950.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4982.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4950.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4982.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4950.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4982.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4950.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4982.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4950.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4982.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4950.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4982.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4950.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4982.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4950.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4982.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4950.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4982.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4950.
Use of uninitialized value in string eq at /usr/lib/nagios/plugins/check_nwc_health line 4982.
OK - environmental hardware working fine | 'sens_celsius_21590'=16;42;60;; 'sens_celsius_21591'=32;60;80;; 'sens_celsius_21592'=39;95;105;; 'sens_celsius_21593'=28;95;105;; 'sens_celsius_21594'=35;95;105;; 'sens_celsius_21595'=33;70;85;; 'sens_celsius_21596'=32;70;85;; 'sens_celsius_21597'=34;70;85;; 'sens_celsius_21598'=29;70;85;; 'sens_celsius_21599'=29;70;85;; 'sens_celsius_21600'=27;70;85;; 'sens_celsius_21601'=29;70;85;; 'sens_celsius_21602'=24;70;85;; 'sens_celsius_21719'=27;105;115;; 'sens_celsius_21720'=26;105;115;; 'sens_celsius_21721'=28;105;115;; 'sens_celsius_21722'=35;105;115;; 'sens_celsius_21723'=39;105;115;; 'sens_celsius_21724'=42;105;115;; 'sens_celsius_21725'=23;105;115;; 'sens_celsius_21726'=24;105;115;; 'sens_celsius_21727'=20;105;115;; 'sens_celsius_21728'=34;105;115;; 'sens_celsius_21729'=24;105;115;; 'sens_celsius_21730'=21;105;115;; 'sens_celsius_21731'=61;105;115;; 'sens_celsius_21732'=61;105;115;; 'sens_celsius_21733'=52;105;115;; 'sens_celsius_21734'=52;105;115;; 'sens_celsius_21735'=58;105;115;; 'sens_celsius_21736'=58;105;115;; 'sens_celsius_21737'=41;105;115;; 'sens_celsius_21738'=41;105;115;; 'sens_celsius_21739'=64;105;115;; 'sens_celsius_21740'=64;105;115;; 'sens_celsius_21741'=63;105;115;; 'sens_celsius_21742'=63;105;115;; 'sens_celsius_21743'=55;105;115;; 'sens_celsius_21744'=55;105;115;; 'sens_celsius_21745'=49;105;115;;

Exclude VLANs for performance output

It should be possible to exclude VLANs from the performance output of:

check_nwc_health_interface-usage
check_nwc_health_interface-errors
check_nwc_health_interface-discards

because the number of VLANs changes and this breaks pnp4nagios graphs.
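
A filter of this kind could be sketched as below. It is Python rather than the plugin's Perl; the function name drop_vlan_perfdata, the default pattern and the sample interface names are assumptions, since VLAN interface naming differs between vendors.

```python
import re


def drop_vlan_perfdata(perfdata, pattern=r"^vlan"):
    """Remove (interface_name, perf_string) pairs whose name matches pattern."""
    exclude = re.compile(pattern, re.IGNORECASE)
    return [item for item in perfdata if not exclude.search(item[0])]


perfdata = [
    ("GigabitEthernet0/1", "'GigabitEthernet0/1_usage_in'=1.38%;80;90;0;100"),
    ("Vlan10", "'Vlan10_usage_in'=0.00%;80;90;0;100"),
]
for name, perf in drop_vlan_perfdata(perfdata):
    print(perf)
```

The interface would still be checked; only its perfdata would be left out, so the set of graphed data sources stays stable when VLANs are added or removed.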

mode hardware-health on HH3C mismatches indices of hh3cEntityExtErrorStatus with entity MIB

The mode hardware-health on HH3C platforms like HP5900 switches uses the entPhysicalDescr and hh3cEntityExtErrorStatus MIB variables. The code in sub get_sub_table returns an array in pseudo-random order, because it is built with the Perl built-in functions keys and values.

package Classes::HH3C::Component::EnvironmentalSubsystem;
our @ISA = qw(Classes::HH3C::Component::EntitySubsystem);
use strict;

sub init {
  my $self = shift;

  $self->get_entities('Classes::HH3C::Component::EnvironmentalSubsystem::EntityState');

  # ** $self->entities at this time is in pseudo random order. It is not ordered by
  # ** $self->entities->[0]->flat_indices 

  my $i = 0;

  # ** The result of $self->get_sub_table is in another pseudo random order but
  # ** not the same as before as the code assumes.
  # ** This results in different results every time check_nwc_health is called

  foreach my $h ($self->get_sub_table('HH3C-ENTITY-EXT-MIB', [ 'hh3cEntityExtErrorStatus' ])) {
    foreach (keys %$h) {
      next if $_ =~ /indices/;
      @{$self->{entities}}[$i]->{$_} = $h->{$_};
    }
    $i++;
  }
}

I don't understand the structure of the script enough to give the best solution but I tried the following and it seems to work (for me):

sub init {
  my $self = shift;

  $self->get_entities('Classes::HH3C::Component::EnvironmentalSubsystem::EntityState');

  # build a lookup table: flat_indices => position in $self->{entities}
  my %index;
  my $j = 0;
  foreach my $entity (@{$self->{entities}}) {
    $index{$entity->{flat_indices}} = $j;
    $j++;
  }

  # attach each error status row to the entity with the same index
  foreach my $h ($self->get_sub_table('HH3C-ENTITY-EXT-MIB', [ 'hh3cEntityExtErrorStatus' ])) {
    next unless exists($h->{flat_indices});
    # skip rows that have no matching entity
    next unless exists($index{$h->{flat_indices}});
    foreach (keys %$h) {
      next if $_ =~ /indices/;
      $self->{entities}->[$index{$h->{flat_indices}}]->{$_} = $h->{$_};
    }
  }
}
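
The idea behind the fix, joining the two tables on their index instead of on their position, is shown here in isolation. This is a Python sketch with made-up sample rows, not the plugin's Perl; the function name merge_by_index is hypothetical.

```python
def merge_by_index(entities, rows):
    """Copy the fields of each row into the entity with the same flat_indices."""
    by_index = {entity["flat_indices"]: entity for entity in entities}
    for row in rows:
        target = by_index.get(row.get("flat_indices"))
        if target is None:
            continue  # no entity with this index
        for key, value in row.items():
            if "indices" in key:
                continue
            target[key] = value
    return entities


# The two tables arrive in different orders, as described above.
entities = [
    {"flat_indices": "3", "entPhysicalDescr": "FAN"},
    {"flat_indices": "1", "entPhysicalDescr": "PSU"},
]
rows = [
    {"flat_indices": "1", "hh3cEntityExtErrorStatus": "normal"},
    {"flat_indices": "3", "hh3cEntityExtErrorStatus": "fanError"},
]
for entity in merge_by_index(entities, rows):
    print(entity["entPhysicalDescr"], entity["hh3cEntityExtErrorStatus"])
```

Because the lookup uses the index, the result is the same regardless of the order in which either table is returned.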

HP series 5700 and 5900 don't work

Hi all,

I am trying to monitor HP series 5700 and 5900 switches and I get:

./check_nwc_health check_nwc_health --hostname ... --timeout 240 --community toto --mode cpu-load

Mode cpu-load is not implemented for this type of device

Use of uninitialized value in pattern match

I just upgraded to the newest version 5.1. There seems to be an issue with the mode interface-usage, as I always get output like this:

check_nwc_health --hostname 1.1.1.1 --mode interface-usage --community test --name Bundle-Ether1401 --units MBi

Use of uninitialized value in pattern match (m//) at ./check_nwc_health line 1788.
Use of uninitialized value in pattern match (m//) at ./check_nwc_health line 1788.
OK - interface Bundle-Ether1401 (alias * to bun1500.ZHHARG1Esr002 L2 *) usage is in:1.38% (131.18MBi/s) out:0.46% (43.90MBi/s) | 'Bundle-Ether1401_usage_in'=1.38%;80;90;0;100 'Bundle-Ether1401_usage_out'=0.46%;80;90;0;100 'Bundle-Ether1401_traffic_in'=131.18;7629.3945;8583.0688;0;9536.7432 'Bundle-Ether1401_traffic_out'=43.90;7629.3945;8583.0688;0;9536.7432

The exact same command worked fine with version 3.5.1. I'm not familiar enough with Perl to see what the problem is.

Thanks
Reto

MIB Implementation check gives wrong result in simulation mode

Checking for a supported MIB gives a wrong result when a MIB with a similar OID prefix is available:

Example:

$ ./check_nwc_health -vvv --snmpwalk simulation.txt --mode supportedmibs | grep 1.3.6.1.4.1.9.9.13
implements CISCO-ENTITY-ALARM-MIB 1.3.6.1.4.1.9.9.138.1
implements CISCO-ENVMON-MIB 1.3.6.1.4.1.9.9.13

Simulation Data:

$ grep 1.3.6.1.4.1.9.9.13 PPD88138.snmpwalk.txt | cut -d. -f -11 | uniq -c
761 .1.3.6.1.4.1.9.9.138.1
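
The example suggests that the check compares OIDs as plain strings, so 1.3.6.1.4.1.9.9.13 also matches 1.3.6.1.4.1.9.9.138; this is an inference from the output, not something confirmed in the plugin's code. A component-wise comparison avoids the false match. The sketch is in Python rather than Perl, and the function name oid_in_subtree is hypothetical.

```python
def oid_in_subtree(oid, root):
    """True if oid lies in the subtree rooted at root, compared per component."""
    oid_parts = oid.strip(".").split(".")
    root_parts = root.strip(".").split(".")
    return oid_parts[:len(root_parts)] == root_parts


oid = ".1.3.6.1.4.1.9.9.138.1"
print(oid.strip(".").startswith("1.3.6.1.4.1.9.9.13"))  # string prefix: false match
print(oid_in_subtree(oid, "1.3.6.1.4.1.9.9.13"))        # CISCO-ENVMON-MIB: not present
print(oid_in_subtree(oid, "1.3.6.1.4.1.9.9.138.1"))     # CISCO-ENTITY-ALARM-MIB: present
```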

Use of uninitialized value in hash element error when using list-interfaces

When I'm using the mode list-interfaces I get the following errors:

Use of uninitialized value in hash element at ./check_nwc_health line 21017.
Use of uninitialized value in hash element at ./check_nwc_health line 21018.
Use of uninitialized value in hash element at ./check_nwc_health line 21019.

The switch I want to monitor is a Cisco Nexus.

H3C S3600 FW3.1 use Huawei MIB

Hi,

As described here:
https://h20565.www2.hpe.com/hpsc/doc/public/display?sp4ts.oid=4177547&docId=emr_na-c02918147&docLocale=en_US
the H3C S3600 switches use neither the H3C nor the a3Com MIBs. They use the Huawei MIB, see the bottom of page 4 of the PDF.

This makes the script complain:

-bash-4.1$ ./check_nwc_health --hostname 10.249.249.19 --community my_community --mode hardware-health
Use of uninitialized value in sprintf at ./check_nwc_health line 23111.
Use of uninitialized value in string eq at ./check_nwc_health line 23116.
Use of uninitialized value in string eq at ./check_nwc_health line 23118.
Use of uninitialized value in pattern match (m//) at ./check_nwc_health line 23118.
[…]
Use of uninitialized value in sprintf at ./check_nwc_health line 23111.
Use of uninitialized value in string eq at ./check_nwc_health line 23116.
Use of uninitialized value in string eq at ./check_nwc_health line 23118.
Use of uninitialized value in pattern match (m//) at ./check_nwc_health line 23118.
CRITICAL - Ethernet1/0/19 (port) is , Ethernet1/0/18 (port) is , Ethernet1/0/7 (port) is ,  (module) is , Ethernet1/0/12 (port) is , Ethernet1/0/4 (port) is , Ethernet1/0/2 (port) is , Ethernet1/0/30 (port) is , Ethernet1/0/41 (port) is , Ethernet1/0/13 (port) is , Ethernet1/0/43 (port) is , Ethernet1/0/47 (port) is , Ethernet1/0/6 (port) is , FAN (fan) is , Ethernet1/0/17 (port) is , Ethernet1/0/21 (port) is , CONTAINER LEVEL2 (container) is , Ethernet1/0/34 (port) is , Ethernet1/0/15 (port) is , Ethernet1/0/36 (port) is , Ethernet1/0/25 (port) is , Ethernet1/0/44 (port) is , CONTAINER LEVEL2 (container) is , Ethernet1/0/27 (port) is , Ethernet1/0/1 (port) is , Ethernet1/0/38 (port) is , Ethernet1/0/46 (port) is , Ethernet1/0/42 (port) is , Ethernet1/0/31 (port) is , Ethernet1/0/5 (port) is , Ethernet1/0/48 (port) is , Ethernet1/0/40 (port) is , GigabitEthernet1/1/1 (port) is , GigabitEthernet1/1/4 (port) is , GigabitEthernet1/1/2 (port) is , H3C S3600-52P-PWR-EI Software Version Release 1702P03 (chassis) is , Ethernet1/0/3 (port) is , H3C S3600-EI Software Version Release 1702P03 (stack) is , Ethernet1/0/16 (port) is , Ethernet1/0/11 (port) is , Ethernet1/0/14 (port) is , Ethernet1/0/26 (port) is ,  (module) is , GigabitEthernet1/1/3 (port) is , Ethernet1/0/45 (port) is , Ethernet1/0/35 (port) is , Ethernet1/0/10 (port) is , Ethernet1/0/39 (port) is , Ethernet1/0/28 (port) is , Ethernet1/0/8 (port) is , Ethernet1/0/32 (port) is , Ethernet1/0/9 (port) is ,  (module) is , CONTAINER LEVEL1 (container) is , Ethernet1/0/22 (port) is , CONTAINER LEVEL1 (container) is , Ethernet1/0/37 (port) is , PSU (powerSupply) is , Ethernet1/0/33 (port) is , Ethernet1/0/24 (port) is , PSU (powerSupply) is , CONTAINER LEVEL1 (container) is , Ethernet1/0/20 (port) is , Ethernet1/0/23 (port) is , Ethernet1/0/29 (port) is , CONTAINER LEVEL1 (container) is
-bash-4.1$

If I force the Huawei MIB, the script is not verbose at all :

-bash-4.1$ ./check_nwc_health --hostname 10.249.249.19 --community my_community --mode hardware-health --servertype huawei
OK
-bash-4.1$ ./check_nwc_health --hostname 10.249.249.19 --community my_community --mode cpu-load --servertype huawei
OK
-bash-4.1$ ./check_nwc_health --hostname 10.249.249.19 --community my_community --mode memory-usage --servertype huawei
OK
-bash-4.1$

See MIB snmpwalk attached.
10.249.251.18-walk.txt.zip

Detect failed power supply in Checkpoint

Currently check_nwc_health does not detect when a power supply is down on a Checkpoint system running on Nokia hardware.

# ./check_nwc_health --hostname 1.2.3.4 --community public --mode hardware-health  -v -v
I am a Linux checkpoint1 2.6.18-92cp #1 SMP Wed Apr 8 15:46:38 IDT 2015 i686
[DISKSUBSYSTEM]
diskPercent: 86
info: checking disks

[STORAGE_31]
hrStorageAllocationUnits: 4096
hrStorageDescr: /
hrStorageIndex: 31
hrStorageSize: 8125916
hrStorageType: hrStorageFixedDisk
hrStorageUsed: 1093091
info: storage 31 (/) has 86.55% free space left

[STORAGE_32]
hrStorageAllocationUnits: 1024
hrStorageDescr: /boot
hrStorageIndex: 32
hrStorageSize: 147692
hrStorageType: hrStorageFixedDisk
hrStorageUsed: 87030
info: storage 32 (/boot) has 41.07% free space left

[STORAGE_33]
hrStorageAllocationUnits: 4096
hrStorageDescr: /var/log
hrStorageIndex: 33
hrStorageSize: 38090235
hrStorageType: hrStorageFixedDisk
hrStorageUsed: 5424730
info: storage 33 (/var/log) has 85.76% free space left

[TEMPERATURESUBSYSTEM]

[TEMPERATURE_1.0]
sensorsTemperatureIndex: 1
sensorsTemperatureName: CPU Temp
sensorsTemperatureStatus: normal
sensorsTemperatureType: Temperature
sensorsTemperatureUOM: degrees C
sensorsTemperatureValue: 43
info: temperature CPU Temp is normal (43 degrees C)

[TEMPERATURE_2.0]
sensorsTemperatureIndex: 2
sensorsTemperatureName: System Temp
sensorsTemperatureStatus: normal
sensorsTemperatureType: Temperature
sensorsTemperatureUOM: degrees C
sensorsTemperatureValue: 34
info: temperature System Temp is normal (34 degrees C)

[FANSUBSYSTEM]

[FAN_1.0]
sensorsFanIndex: 1
sensorsFanName: Case Fan 1
sensorsFanStatus: normal
sensorsFanType: Fan
sensorsFanUOM: RPM
sensorsFanValue: 5273
info: fan Case Fan 1 is normal (5273 RPM)

[FAN_2.0]
sensorsFanIndex: 2
sensorsFanName: Case Fan 2
sensorsFanStatus: normal
sensorsFanType: Fan
sensorsFanUOM: RPM
sensorsFanValue: 5625
info: fan Case Fan 2 is normal (5625 RPM)

[FAN_3.0]
sensorsFanIndex: 3
sensorsFanName: Case Fan 3
sensorsFanStatus: normal
sensorsFanType: Fan
sensorsFanUOM: RPM
sensorsFanValue: 5443
info: fan Case Fan 3 is normal (5443 RPM)

[VOLTAGESUBSYSTEM]

[VOLTAGE_1.0]
sensorsVoltageIndex: 1
sensorsVoltageName: 3Vbat
sensorsVoltageStatus: normal
sensorsVoltageType: Voltage
sensorsVoltageUOM: Volts
sensorsVoltageValue: 3.12
info: voltage 3Vbat is normal (3.12 Volts)

[VOLTAGE_2.0]
sensorsVoltageIndex: 2
sensorsVoltageName: 3VSB
sensorsVoltageStatus: normal
sensorsVoltageType: Voltage
sensorsVoltageUOM: Volts
sensorsVoltageValue: 3.36
info: voltage 3VSB is normal (3.36 Volts)

[VOLTAGE_3.0]
sensorsVoltageIndex: 3
sensorsVoltageName: Vtt
sensorsVoltageStatus: normal
sensorsVoltageType: Voltage
sensorsVoltageUOM: Volts
sensorsVoltageValue: 1.10
info: voltage Vtt is normal (1.10 Volts)

[VOLTAGE_4.0]
sensorsVoltageIndex: 4
sensorsVoltageName: 5V
sensorsVoltageStatus: normal
sensorsVoltageType: Voltage
sensorsVoltageUOM: Volts
sensorsVoltageValue: 5.20
info: voltage 5V is normal (5.20 Volts)

[VOLTAGE_5.0]
sensorsVoltageIndex: 5
sensorsVoltageName: 1.5V
sensorsVoltageStatus: normal
sensorsVoltageType: Voltage
sensorsVoltageUOM: Volts
sensorsVoltageValue: 1.52
info: voltage 1.5V is normal (1.52 Volts)

[VOLTAGE_6.0]
sensorsVoltageIndex: 6
sensorsVoltageName: 3.3V
sensorsVoltageStatus: normal
sensorsVoltageType: Voltage
sensorsVoltageUOM: Volts
sensorsVoltageValue: 3.39
info: voltage 3.3V is normal (3.39 Volts)

[VOLTAGE_7.0]
sensorsVoltageIndex: 7
sensorsVoltageName: +12V
sensorsVoltageStatus: normal
sensorsVoltageType: Voltage
sensorsVoltageUOM: Volts
sensorsVoltageValue: 12.20
info: voltage +12V is normal (12.20 Volts)

[VOLTAGE_8.0]
sensorsVoltageIndex: 8
sensorsVoltageName: Vcore
sensorsVoltageStatus: normal
sensorsVoltageType: Voltage
sensorsVoltageUOM: Volts
sensorsVoltageValue: 1.10
info: voltage Vcore is normal (1.10 Volts)

OK - environmental hardware working fine
checking disks
storage 31 (/) has 86.55% free space left
storage 32 (/boot) has 41.07% free space left
storage 33 (/var/log) has 85.76% free space left
temperature CPU Temp is normal (43 degrees C)
temperature System Temp is normal (34 degrees C)
fan Case Fan 1 is normal (5273 RPM)
fan Case Fan 2 is normal (5625 RPM)
fan Case Fan 3 is normal (5443 RPM)
voltage 3Vbat is normal (3.12 Volts)
voltage 3VSB is normal (3.36 Volts)
voltage Vtt is normal (1.10 Volts)
voltage 5V is normal (5.20 Volts)
voltage 1.5V is normal (1.52 Volts)
voltage 3.3V is normal (3.39 Volts)
voltage +12V is normal (12.20 Volts)
voltage Vcore is normal (1.10 Volts) | '/_free_pct'=86.55%;10:;5:;0;100 '/boot_free_pct'=41.07%;10:;5:;0;100 '/var/log_free_pct'=85.76%;10:;5:;0;100 'temperature_CPU Temp'=43;60;70;; 'temperature_System Temp'=34;60;70;; 'fanCase Fan 1_rpm'=5273;60;70;; 'fanCase Fan 2_rpm'=5625;60;70;; 'fanCase Fan 3_rpm'=5443;60;70;; 'voltage3Vbat_rpm'=3.12;60;70;; 'voltage3VSB_rpm'=3.36;60;70;; 'voltageVtt_rpm'=1.10;60;70;; 'voltage5V_rpm'=5.20;60;70;; 'voltage1.5V_rpm'=1.52;60;70;; 'voltage3.3V_rpm'=3.39;60;70;; 'voltage+12V_rpm'=12.20;60;70;; 'voltageVcore_rpm'=1.10;60;70;;

The relevant information can be found in the powerSupplyTable (1.3.6.1.4.1.2620.1.6.7.9.1):

# snmpwalk -v 2c -c public 1.2.3.4 1.3.6.1.4.1.2620.1.6.7.9.1
iso.3.6.1.4.1.2620.1.6.7.9.1.1.1.1.0 = INTEGER: 1
iso.3.6.1.4.1.2620.1.6.7.9.1.1.1.2.0 = INTEGER: 2
iso.3.6.1.4.1.2620.1.6.7.9.1.1.2.1.0 = STRING: "Up"
iso.3.6.1.4.1.2620.1.6.7.9.1.1.2.2.0 = STRING: "Down"
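
To illustrate what such a check could look like, here is a minimal sketch in Python (not the plugin's Perl). It parses the snmpwalk output shown above and flags every power supply whose status string is not "Up". The function names are made up for this example, and treating any value other than "Up" as a failure is an assumption based only on the two values in this walk.

```python
import re

# Sample lines copied from the snmpwalk output in the report above.
WALK = '''\
iso.3.6.1.4.1.2620.1.6.7.9.1.1.1.1.0 = INTEGER: 1
iso.3.6.1.4.1.2620.1.6.7.9.1.1.1.2.0 = INTEGER: 2
iso.3.6.1.4.1.2620.1.6.7.9.1.1.2.1.0 = STRING: "Up"
iso.3.6.1.4.1.2620.1.6.7.9.1.1.2.2.0 = STRING: "Down"
'''

# Column 2 of the table entry carries the status string; the following
# sub-identifier is the power supply index (inferred from the walk above).
STATUS_RE = re.compile(
    r'^iso\.3\.6\.1\.4\.1\.2620\.1\.6\.7\.9\.1\.1\.2\.(\d+)\.0 = STRING: "([^"]*)"$'
)


def power_supply_states(walk_text):
    """Return {index: status} for every status row found in the walk."""
    states = {}
    for line in walk_text.splitlines():
        match = STATUS_RE.match(line.strip())
        if match:
            states[int(match.group(1))] = match.group(2)
    return states


def failed_power_supplies(walk_text):
    """Return the sorted indexes whose status is anything other than 'Up'."""
    states = power_supply_states(walk_text)
    return sorted(index for index, status in states.items() if status != "Up")


if __name__ == "__main__":
    failed = failed_power_supplies(WALK)
    if failed:
        print("CRITICAL - power supply %s is down" % ", ".join(map(str, failed)))
    else:
        print("OK - all power supplies are up")
```

With the walk above, power supply 2 would be reported as down.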

Bug in 5.2.2: Checkpoint Management Appliance cpu and memory checks not working

Since using version 5.2.2, the cpu-load and memory-usage checks stopped working on a Checkpoint Management Appliance (basically a Linux server running some daemons).

$ ./check_nwc_health --version
check_nwc_health $Revision: 5.2.2 $ [http://labs.consol.de/nagios/check_nwc_health]

$ ./check_nwc_health --hostname mgmtserver --protocol 3 --username nagios --authpassword XXX --authprotocol md5 --mode cpu-load -v -v -v
I am a Linux u020sys0 2.6.18-92cp #1 SMP Tue Dec 1 00:01:24 IST 2015 i686
Model Linux u020sys0 2.6.18-92cp #1 SMP Tue Dec 1 00:01:24 IST 2015 i686 is not implemented

I tested it with an older version (4.6.1), and it works there:

$ ./check_nwc_health.4.6.1 --version
check_nwc_health.4.6.1 $Revision: 4.6.1 $ [http://labs.consol.de/nagios/check_nwc_health]

$ ./check_nwc_health.4.6.1 --hostname mgmtserver --protocol 3 --username nagios --authpassword XXX --authprotocol md5 --mode cpu-load -v -v -v
I am a Linux u020sys0 2.6.18-92cp #1 SMP Tue Dec 1 00:01:24 IST 2015 i686
[CPUSUBSYSTEM]
procUsage: 93
info: cpu usage is 93.00%

CRITICAL - cpu usage is 93.00%
checking cpus
cpu usage is 93.00% | 'cpu_usage'=93%;80;90;0;100

This is most likely due to the different Checkpoint detection method in commit 3560626.

Can't see all interfaces when running list-interfaces

Hi,
I am trying to retrieve the list of interfaces from a 48-port HP ProCurve 2810/2510.
I can see only 35 of them instead of 48.

Nortel 4548GT-PWR with failed fan not displayed

Hi,

I have a Nortel 4548GT-PWR on which fan 3 is in a failed state. On the CLI, "show envir" reports it as failed.

In hardware-health mode everything appears OK. The output in debug mode is:

./check_nwc_health --hostname 192.168.170.107 --community public --mode hardware-health -v
I am a Ethernet Routing Switch 4548GT-PWR HW:06 FW:5.3.0.3 SW:v5.3.2.007 BN:07 (c) Nortel Networks
OK - environmental hardware working fine
component 3.10.0/48 ports 10/100/1000 BaseT with PoE and 4 shared SFP ports status is normal (admin enable)
component 3.10.1/ status is removed (admin enable)
component 3.10.2/ status is removed (admin enable)
component 3.10.3/SX GBIC status is normal (admin enable)
component 3.10.4/SX GBIC status is normal (admin enable)
component 3.10.5/ status is normal (admin enable)
component 4.10.0/Primary Power Supply status is normal (admin enable)
component 4.11.0/Redundant Power Supply status is removed (admin enable)
component 5.10.0/Temperature Sensor status is normal (admin enable)
component 6.10.0/Internal Fan 1 status is normal (admin enable)
component 6.11.0/Internal Fan 2 status is normal (admin enable)
component 6.12.0/Internal Fan 3 status is warning (admin enable)
component 6.13.0/Internal Fan 4 status is normal (admin enable)
component 8.1.0/48 ports 10/100/1000 BaseT with PoE and 4 shared SFP ports status is normal (admin enable)

And the details for 6.12 (the failed fan):
[COMP_6.12.0]
s5ChasComAdminState: enable
s5ChasComBaseNumPorts: 0
s5ChasComDescr: Internal Fan 3
s5ChasComGroupMap: 0
s5ChasComGrpIndx: 6
s5ChasComIndx: 12
s5ChasComIpAddress: 0.0.0.0
s5ChasComLocation:
s5ChasComLstChng: 0
s5ChasComMaxSubs: 0
s5ChasComNumSubs: 0
s5ChasComOperState: warning
s5ChasComRelPos: 0
s5ChasComSerNum:
s5ChasComShortDescr: Internal Fan 3
s5ChasComSubIndx: 0
s5ChasComTotalNumPorts: 0
s5ChasComType: 1.3.6.1.4.1.45.1.6.1.3.10.91.1
s5ChasComVer:
info: component 6.12.0/Internal Fan 3 status is warning (admin enable)

Why is the overall status OK when debug mode shows a warning for this fan?
How can I solve this?

Thanks
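
For illustration, here is a minimal sketch in Python (not the plugin's Perl) of the behaviour the reporter expects: the overall result should be the worst status of all components, so a single fan in state "warning" would give WARNING instead of OK. The mapping table is hypothetical and covers only the states visible in this report. It does not describe how check_nwc_health maps s5ChasComOperState internally.

```python
import re

# Severity order used to pick the worst result across all components.
SEVERITY = {"OK": 0, "UNKNOWN": 1, "WARNING": 2, "CRITICAL": 3}

# Hypothetical mapping, limited to the states that appear in this report.
# "removed" is treated as harmless because empty slots are expected here.
STATE_TO_STATUS = {"normal": "OK", "removed": "OK", "warning": "WARNING"}

LINE_RE = re.compile(r"^component (\S+?)/(.*) status is (\w+) \(admin \w+\)$")


def overall_status(lines):
    """Return (status, problems) for a list of 'component ...' lines."""
    worst, problems = "OK", []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        comp_id, descr, state = match.groups()
        # States missing from the table are reported as UNKNOWN.
        status = STATE_TO_STATUS.get(state, "UNKNOWN")
        if status != "OK":
            problems.append("%s/%s is %s" % (comp_id, descr, state))
        if SEVERITY[status] > SEVERITY[worst]:
            worst = status
    return worst, problems


# Fan lines copied from the plugin output above.
SAMPLE = [
    "component 6.10.0/Internal Fan 1 status is normal (admin enable)",
    "component 6.11.0/Internal Fan 2 status is normal (admin enable)",
    "component 6.12.0/Internal Fan 3 status is warning (admin enable)",
    "component 6.13.0/Internal Fan 4 status is normal (admin enable)",
]

if __name__ == "__main__":
    status, problems = overall_status(SAMPLE)
    print("%s - %s" % (status, ", ".join(problems) or "all components normal"))
```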

autoconf not working

I can't get autoconf or the resulting configure script to work:

--- monitoring/check_nwc_health ‹master› » autoconf
configure.ac:5: error: possibly undefined macro: AM_INIT_AUTOMAKE
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.
configure.ac:6: error: possibly undefined macro: AM_MAINTAINER_MODE
configure.ac:58: error: possibly undefined macro: AM_CONDITIONAL
--- monitoring/check_nwc_health ‹master* ⁇› » ./configure
./configure: line 1713: syntax error near unexpected token `1.9'
./configure: line 1713: `AM_INIT_AUTOMAKE(1.9 tar-pax)'

--- monitoring/check_nwc_health ‹master* ⁇› » autoconf --version
autoconf (GNU Autoconf) 2.69
…

By the way, please update README.md to mention that autoconf is required!

How to ignore proxy for FritzBox

I have a system-wide http_proxy set and no exceptions defined at the moment. When I try to poll our FritzBox, the plugin tries to connect via the proxy server. Setting

no_proxy='*'

doesn't help, although it works with lynx.

If there is already a way to make check_nwc_health ignore the proxy, please tell me; otherwise, please implement one. Either a "--no-proxy" option or an environment variable would suffice.
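
To make the request concrete, here is a small sketch in Python (not the plugin's Perl) of the decision such an option or environment variable would have to make. It assumes the common convention that no_proxy is a comma-separated list of host suffixes and that a single "*" disables the proxy for every host. The function name bypass_proxy and the host name fritz.box are only examples. This is not a description of how check_nwc_health or its HTTP library currently behaves.

```python
def bypass_proxy(host, no_proxy):
    """Return True if `host` should be contacted directly, without a proxy.

    `no_proxy` is the raw value of the environment variable, for example
    "*" or "localhost,.example.com".
    """
    host = host.lower()
    for entry in no_proxy.split(","):
        entry = entry.strip().lower()
        if not entry:
            continue
        if entry == "*":
            # Wildcard: never use the proxy.
            return True
        suffix = entry.lstrip(".")
        if host == suffix or host.endswith("." + suffix):
            return True
    return False


if __name__ == "__main__":
    for value in ("*", "", "localhost,.box", "example.com"):
        print("no_proxy=%r -> bypass: %s" % (value, bypass_proxy("fritz.box", value)))
```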

Error in hardware-health mode on Nortel 5632

Hi,

We have tested check_nwc_health with Nortel switches and it works perfectly. We only found a problem with the Nortel 5632 in hardware-health mode.

We have two 5632 switches, and we get the same error from both machines:

[root@server check_nwc_health]# ./check_nwc_health --hostname 192.168.168.11 --community public --mode hardware-health
UNKNOWN - component 4.12.0/Unavailable status is notAvail (admin enable)
[root@server check_nwc_health]# ./check_nwc_health --hostname 192.168.168.12 --community public --mode hardware-health
UNKNOWN - component 4.12.0/Unavailable status is notAvail (admin enable)

We investigated, and the Nortel hardware seems to be OK, but check_nwc_health reports an incorrect status for a PSU that doesn't exist.

I have found this:
http://www.mibdepot.com/cgi-bin/getmib3.cgi?win=mib_a&r=avaya&f=S5-REG-MIB.mib&v=v2&t=tree
s5ChasComERS5632-FDPwr-None OBJ ID 1.3.6.1.4.1.45.1.6.1.3.8.108.3 --> ERROR
s5ChasComERS5632-FDPwr-AC-DC-12V-300W OBJ ID 1.3.6.1.4.1.45.1.6.1.3.8.108.4 --> OK

It seems that 4.12 has the type 1.3.6.1.4.1.45.1.6.1.3.8.108.3, which is power related. This machine has two power supplies, main and redundant, both of type .4. Is the problem that .3 stands for a supply that doesn't exist, yet it is reported as a failure?

check_nwc_health in debug mode shows:
[COMP_4.10.0]
s5ChasComAdminState: enable
s5ChasComBaseNumPorts: 0
s5ChasComDescr: AC-DC-12V-300W
s5ChasComGroupMap: 0
s5ChasComGrpIndx: 4
s5ChasComIndx: 10
s5ChasComIpAddress: 0.0.0.0
s5ChasComLocation:
s5ChasComLstChng: 0
s5ChasComMaxSubs: 0
s5ChasComNumSubs: 0
s5ChasComOperState: normal
s5ChasComRelPos: 0
s5ChasComSerNum:
s5ChasComShortDescr: AC-DC-12V-300W
s5ChasComSubIndx: 0
s5ChasComTotalNumPorts: 0
s5ChasComType: 1.3.6.1.4.1.45.1.6.1.3.8.108.4
s5ChasComVer:
info: component 4.10.0/AC-DC-12V-300W status is normal (admin enable)

[COMP_4.11.0]
s5ChasComAdminState: enable
s5ChasComBaseNumPorts: 0
s5ChasComDescr: AC-DC-12V-300W
s5ChasComGroupMap: 0
s5ChasComGrpIndx: 4
s5ChasComIndx: 11
s5ChasComIpAddress: 0.0.0.0
s5ChasComLocation:
s5ChasComLstChng: 0
s5ChasComMaxSubs: 0
s5ChasComNumSubs: 0
s5ChasComOperState: normal
s5ChasComRelPos: 0
s5ChasComSerNum:
s5ChasComShortDescr: AC-DC-12V-300W
s5ChasComSubIndx: 0
s5ChasComTotalNumPorts: 0
s5ChasComType: 1.3.6.1.4.1.45.1.6.1.3.8.108.4
s5ChasComVer:
info: component 4.11.0/AC-DC-12V-300W status is normal (admin enable)

[COMP_4.12.0]
s5ChasComAdminState: enable
s5ChasComBaseNumPorts: 0
s5ChasComDescr: Unavailable
s5ChasComGroupMap: 0
s5ChasComGrpIndx: 4
s5ChasComIndx: 12
s5ChasComIpAddress: 0.0.0.0
s5ChasComLocation:
s5ChasComLstChng: 0
s5ChasComMaxSubs: 0
s5ChasComNumSubs: 0
s5ChasComOperState: notAvail
s5ChasComRelPos: 0
s5ChasComSerNum:
s5ChasComShortDescr: Unavailable
s5ChasComSubIndx: 0
s5ChasComTotalNumPorts: 0
s5ChasComType: 1.3.6.1.4.1.45.1.6.1.3.8.108.3
s5ChasComVer:
info: component 4.12.0/Unavailable status is notAvail (admin enable)

4.10 and 4.11 are the working power supplies. I think 4.12 is the problem that causes the incorrect hardware health.

How can I add an exclusion, or otherwise solve this problem, so that the correct hardware health is shown for both machines?

Thanks
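
As a sketch of the requested exclusion, the following Python snippet (not the plugin's Perl) drops components whose s5ChasComType is the OID the report identifies as s5ChasComERS5632-FDPwr-None before evaluating the operational state. The data is copied from the debug output above. The function name and the idea of filtering by type OID are assumptions made for illustration, not an existing check_nwc_health option.

```python
# OID that the report above identifies as s5ChasComERS5632-FDPwr-None,
# which is taken here to mean an empty power supply bay (assumption).
EMPTY_BAY_TYPES = {"1.3.6.1.4.1.45.1.6.1.3.8.108.3"}

# The three power components from the debug output above.
COMPONENTS = [
    {"id": "4.10.0", "descr": "AC-DC-12V-300W", "oper_state": "normal",
     "type": "1.3.6.1.4.1.45.1.6.1.3.8.108.4"},
    {"id": "4.11.0", "descr": "AC-DC-12V-300W", "oper_state": "normal",
     "type": "1.3.6.1.4.1.45.1.6.1.3.8.108.4"},
    {"id": "4.12.0", "descr": "Unavailable", "oper_state": "notAvail",
     "type": "1.3.6.1.4.1.45.1.6.1.3.8.108.3"},
]


def problem_components(components, ignored_types=EMPTY_BAY_TYPES):
    """Return the ids of non-normal components, skipping ignored types."""
    return [
        c["id"]
        for c in components
        if c["type"] not in ignored_types and c["oper_state"] != "normal"
    ]


if __name__ == "__main__":
    print("without exclusion:", problem_components(COMPONENTS, ignored_types=set()))
    print("with exclusion:   ", problem_components(COMPONENTS))
```

Without the exclusion, 4.12.0 is the only problem component; with it, the list is empty.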

Error: use of uninitialized value

Hello,
I have a problem with the plugin. I compiled the latest version (4.1). When I run the command:

./check_nwc_health --hostname cn2 --timeout 10 --community public --mode interface-usage --name TenGigabitEthernet1/5/3

these are the results:

Use of uninitialized value in sprintf at ./check_nwc_health line 1026.
Use of uninitialized value in division (/) at ./check_nwc_health line 1027.
Use of uninitialized value in sprintf at ./check_nwc_health line 1026.
Use of uninitialized value in division (/) at ./check_nwc_health line 1027.
Use of uninitialized value in multiplication (*) at ./check_nwc_health line 9798.
Use of uninitialized value in multiplication (*) at ./check_nwc_health line 9800.
Use of uninitialized value in division (/) at ./check_nwc_health line 9823.
Use of uninitialized value in division (/) at ./check_nwc_health line 9824.
OK - interface TenGigabitEthernet1/5/3 usage is in:0.00% (0.00Bits/s) out:0.00% (0.00Bits/s) | 'TenGigabitEthernet1/5/3_usage_in'=0%;80;90;0;100 'TenGigabitEthernet1/5/3_usage_out'=0%;80;90;0;100 'TenGigabitEthernet1/5/3_traffic_in'=0;800000000;900000000;0;1000000000 'TenGigabitEthernet1/5/3_traffic_out'=0;800000000;900000000;0;1000000000

The usage is shown as 0.00%, but I know that there is traffic.

Is there a way to fix that?
Thanks
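
The warnings suggest that a counter or speed value was undefined when the rates were computed, which would explain the 0.00% result. As an illustration, here is a minimal sketch in Python (not the plugin's Perl) of the conventional way to derive interface usage from two octet counter samples. It returns None when a value is missing instead of silently computing 0. All function and parameter names are made up for this example, and this is not the code at the line numbers quoted above.

```python
def rate_bits_per_second(prev_octets, curr_octets, seconds, counter_bits=32):
    """Return the traffic rate in bit/s between two counter samples.

    Returns None if a sample is missing or the interval is not positive,
    so that callers can report UNKNOWN rather than 0.
    """
    if prev_octets is None or curr_octets is None or seconds <= 0:
        return None
    delta = curr_octets - prev_octets
    if delta < 0:
        # The counter wrapped around between the two samples.
        delta += 2 ** counter_bits
    return delta * 8 / seconds


def usage_percent(bits_per_second, if_speed):
    """Return usage as a percentage of the interface speed in bit/s."""
    if bits_per_second is None or not if_speed:
        return None
    return 100.0 * bits_per_second / if_speed


if __name__ == "__main__":
    rate = rate_bits_per_second(1000, 2000, 8)
    print("rate:", rate, "bit/s, usage:", usage_percent(rate, 1_000_000_000), "%")
    print("missing sample:", rate_bits_per_second(None, 2000, 8))
```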

Missing values while using mode interface-usage

I am using check_nwc_health 2.6.5 to check CPU, memory, and interface usage on several Cisco devices, mainly Cisco ASA. The plugin always seemed to return correct values for every host, but today I found a firewall that shows no outgoing traffic on the outside interface. That isn't correct, as my colleague confirmed after taking a closer look at the firewall. The PNP graphs also show that there has never been any outgoing traffic on this interface.

$ ./check_nwc_health --hostname 10.10.10.10 --community public --mode interface-usage --name outside --regexp --verbose
I am a Cisco Adaptive Security Appliance Version 8.4(5)6
OK - interface Adaptive Security Appliance 'outside' interface usage is in:0.15% (1450753.00Bits/s) out:0.00% (0.00Bits/s)
checking interfaces
interface Adaptive Security Appliance 'outside' interface usage is in:0.15% (1450753.00Bits/s) out:0.00% (0.00Bits/s) | 'Adaptive Security Appliance 'outside' interface_usage_in'=0.15%;80;90 'Adaptive Security Appliance 'outside' interface_usage_out'=0%;80;90 'Adaptive Security Appliance 'outside' interface_traffic_in'=1450753 'Adaptive Security Appliance 'outside' interface_traffic_out'=0

I also performed an snmpwalk with check_nwc_health, which I can mail to you if you want.

Cisco devices <4Gb Memory

Hi,

When polling Cisco devices with large memory pools, the standard memory check returns an incorrect memory usage. The memory check subroutine polls the traditional memory MIB; ideally, another check could be added that polls the enhanced memory MIB: http://tools.cisco.com/Support/SNMP/do/BrowseMIB.do?local=en&step=2&mibName=CISCO-ENHANCED-MEMPOOL-MIB

Regards,

Miles
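
A likely explanation is that the traditional memory pool objects are 32-bit gauges, which cannot represent more than 2^32 - 1 bytes, while the enhanced MIB, to my understanding, adds 64-bit variants. The following Python sketch shows the effect with made-up numbers: 6 GiB used and 2 GiB free are assumptions for illustration, not values from a real device.

```python
GAUGE32_MAX = 2 ** 32 - 1  # largest value a 32-bit gauge can hold


def as_gauge32(value):
    """Clamp a byte count the way a 32-bit gauge would report it."""
    return min(value, GAUGE32_MAX)


def used_percent(used, free):
    """Return memory usage in percent from used and free byte counts."""
    return 100.0 * used / (used + free)


# Hypothetical pool: 6 GiB used, 2 GiB free.
TRUE_USED = 6 * 1024 ** 3
TRUE_FREE = 2 * 1024 ** 3

if __name__ == "__main__":
    print("real usage:      %.2f%%" % used_percent(TRUE_USED, TRUE_FREE))
    print("via 32-bit gauge: %.2f%%"
          % used_percent(as_gauge32(TRUE_USED), as_gauge32(TRUE_FREE)))
```

In this example the real usage is 75%, but the clamped values yield roughly 66.67%.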

build issue

Hi,
I am trying to build the plugin under Ubuntu 14.04 (64-bit), and I get the following error during "make".
Any idea how to solve it?
Thanks
Making all in plugins-scripts
make[1]: Entering directory `/root/check_nwc_health/plugins-scripts'
make[1]: *** No rule to make target `../GLPlugin/lib/Monitoring/GLPlugin/SNMP/MibsAndOids/VRRPMIB.pm', needed by `check_nwc_health'. Stop.
make[1]: Leaving directory `/root/check_nwc_health/plugins-scripts'
make: *** [all-recursive] Error 1

Cisco / wrong messages with powersupply Faulty

Hello,

I get a wrong status, "PS1 Faulty", for some Cisco devices. check_nwc_health reports it as critical, but on the IOS CLI everything looks fine.

[x]# ./check_nwc_health --mode hardware-health --hostname 10.x.8 --community XX
CRITICAL - powersupply 4034 (Sw4, PS1 Faulty, RPS NotExist) is notFunctioning | 'temp_1006'=24;60;;; 'temp_2006'=25;60;;; 'temp_3006'=24;60;;; 'temp_4006'=22;60;;;

Debug output of check_nwc_health:
[x]# ./check_nwc_health -vvv --mode hardware-health --hostname 10.XXX.8 --community XXX
I am a Cisco IOS Software, C3750E Software (C3750E-IPBASEK9-M), Version 12.2(55)SE7, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Mon 28-Jan-13 09:55 by prod_rel_team
[FANSUBSYSTEM]
info: checking fans

[FAN_1035]
ciscoEnvMonFanState: normal
ciscoEnvMonFanStatusDescr: Switch#1, Fan#1
ciscoEnvMonFanStatusIndex: 1035
info: fan 1035 (Switch#1, Fan#1) is normal

[FAN_1036]
ciscoEnvMonFanState: normal
ciscoEnvMonFanStatusDescr: Switch#1, Fan#2
ciscoEnvMonFanStatusIndex: 1036
info: fan 1036 (Switch#1, Fan#2) is normal

[FAN_2035]
ciscoEnvMonFanState: normal
ciscoEnvMonFanStatusDescr: Switch#2, Fan#1
ciscoEnvMonFanStatusIndex: 2035
info: fan 2035 (Switch#2, Fan#1) is normal

[FAN_3035]
ciscoEnvMonFanState: normal
ciscoEnvMonFanStatusDescr: Switch#3, Fan#1
ciscoEnvMonFanStatusIndex: 3035
info: fan 3035 (Switch#3, Fan#1) is normal

[FAN_4035]
ciscoEnvMonFanState: normal
ciscoEnvMonFanStatusDescr: Switch#4, Fan#1
ciscoEnvMonFanStatusIndex: 4035
info: fan 4035 (Switch#4, Fan#1) is normal

[TEMPERATURESUBSYSTEM]
info: checking temperatures

[TEMPERATURE_1006]
ciscoEnvMonTemperatureLastShutdown: 0
ciscoEnvMonTemperatureState: normal
ciscoEnvMonTemperatureStatusDescr: SW#1, Sensor#1, GREEN
ciscoEnvMonTemperatureStatusIndex: 1006
ciscoEnvMonTemperatureStatusValue: 24
ciscoEnvMonTemperatureThreshold: 60
info: temperature 1006 SW#1, Sensor#1, GREEN is 24 (of 60 max = normal)

[TEMPERATURE_2006]
ciscoEnvMonTemperatureLastShutdown: 0
ciscoEnvMonTemperatureState: normal
ciscoEnvMonTemperatureStatusDescr: SW#2, Sensor#1, GREEN
ciscoEnvMonTemperatureStatusIndex: 2006
ciscoEnvMonTemperatureStatusValue: 25
ciscoEnvMonTemperatureThreshold: 60
info: temperature 2006 SW#2, Sensor#1, GREEN is 25 (of 60 max = normal)

[TEMPERATURE_3006]
ciscoEnvMonTemperatureLastShutdown: 0
ciscoEnvMonTemperatureState: normal
ciscoEnvMonTemperatureStatusDescr: SW#3, Sensor#1, GREEN
ciscoEnvMonTemperatureStatusIndex: 3006
ciscoEnvMonTemperatureStatusValue: 24
ciscoEnvMonTemperatureThreshold: 60
info: temperature 3006 SW#3, Sensor#1, GREEN is 24 (of 60 max = normal)

[TEMPERATURE_4006]
ciscoEnvMonTemperatureLastShutdown: 0
ciscoEnvMonTemperatureState: normal
ciscoEnvMonTemperatureStatusDescr: SW#4, Sensor#1, GREEN
ciscoEnvMonTemperatureStatusIndex: 4006
ciscoEnvMonTemperatureStatusValue: 22
ciscoEnvMonTemperatureThreshold: 60
info: temperature 4006 SW#4, Sensor#1, GREEN is 22 (of 60 max = normal)

[VOLTAGESUBSYSTEM]
info: checking voltages

[POWERSUPPLYSUBSYSTEM]
info: checking supplies

[POWERSUPPLY_1034]
ciscoEnvMonSupplySource: 2
ciscoEnvMonSupplyState: normal
ciscoEnvMonSupplyStatusDescr: Sw1, PS1 Normal, RPS NotExist
ciscoEnvMonSupplyStatusIndex: 1034
info: powersupply 1034 (Sw1, PS1 Normal, RPS NotExist) is normal

[POWERSUPPLY_2034]
ciscoEnvMonSupplySource: 2
ciscoEnvMonSupplyState: normal
ciscoEnvMonSupplyStatusDescr: Sw2, PS1 Normal, RPS NotExist
ciscoEnvMonSupplyStatusIndex: 2034
info: powersupply 2034 (Sw2, PS1 Normal, RPS NotExist) is normal

[POWERSUPPLY_3034]
ciscoEnvMonSupplySource: 2
ciscoEnvMonSupplyState: normal
ciscoEnvMonSupplyStatusDescr: Sw3, PS1 Normal, RPS NotExist
ciscoEnvMonSupplyStatusIndex: 3034
info: powersupply 3034 (Sw3, PS1 Normal, RPS NotExist) is normal

[POWERSUPPLY_4034]
ciscoEnvMonSupplySource: 2
ciscoEnvMonSupplyState: notFunctioning
ciscoEnvMonSupplyStatusDescr: Sw4, PS1 Faulty, RPS NotExist
ciscoEnvMonSupplyStatusIndex: 4034
info: powersupply 4034 (Sw4, PS1 Faulty, RPS NotExist) is notFunctioning

CRITICAL - powersupply 4034 (Sw4, PS1 Faulty, RPS NotExist) is notFunctioning
checking fans
fan 1035 (Switch#1, Fan#1) is normal
fan 1036 (Switch#1, Fan#2) is normal
fan 2035 (Switch#2, Fan#1) is normal
fan 3035 (Switch#3, Fan#1) is normal
fan 4035 (Switch#4, Fan#1) is normal
checking temperatures
temperature 1006 SW#1, Sensor#1, GREEN is 24 (of 60 max = normal)
temperature 2006 SW#2, Sensor#1, GREEN is 25 (of 60 max = normal)
temperature 3006 SW#3, Sensor#1, GREEN is 24 (of 60 max = normal)
temperature 4006 SW#4, Sensor#1, GREEN is 22 (of 60 max = normal)
checking voltages
checking supplies
powersupply 1034 (Sw1, PS1 Normal, RPS NotExist) is normal
powersupply 2034 (Sw2, PS1 Normal, RPS NotExist) is normal
powersupply 3034 (Sw3, PS1 Normal, RPS NotExist) is normal
powersupply 4034 (Sw4, PS1 Faulty, RPS NotExist) is notFunctioning | 'temp_1006'=24;60;;; 'temp_2006'=25;60;;; 'temp_3006'=24;60;;; 'temp_4006'=22;60;;;

On the switch CLI everything looks fine:

MXXX8#sh env power all
SW PID Serial# Status Sys Pwr PoE Pwr Watts

1A C3KX-PWR-715WAC LITXFD OK Good Good 715/0
1B Not Present
2A C3KX-PWR-715WAC LIXBPG OK Good Good 715/0
2B Not Present
3A C3KX-PWR-715WAC LITXNE OK Good Good 715/0
3B Not Present
4A C3KX-PWR-715WAC LITXQ7 OK Good Good 715/0
4B Not Present

MXXX8#sh env power switch 4
SW PID Serial# Status Sys Pwr PoE Pwr Watts

4A C3KX-PWR-715WAC LITX7 OK Good Good 715/0
4B Not Present

MXXX8#sh env rps
SW Status RPS Name RPS Serial# RPS Port#

1 Not Present <>
2 Not Present <>
3 Not Present <>
4 Not Present <>

MXXX8#sh env all
FAN 1 is OK
FAN 2 is OK
FAN PS-1 is OK
FAN PS-2 is NOT PRESENT
TEMPERATURE is OK
Temperature Value: 24 Degree Celsius
Temperature State: GREEN
Yellow Threshold : 46 Degree Celsius
Red Threshold : 60 Degree Celsius
SW PID Serial# Status Sys Pwr PoE Pwr Watts

1A C3KX-PWR-715WAC LXXD OK Good Good 715/0
1B Not Present
2A C3KX-PWR-715WAC LIXG OK Good Good 715/0
2B Not Present
3A C3KX-PWR-715WAC LIXE OK Good Good 715/0
3B Not Present
4A C3KX-PWR-715WAC LIX7 OK Good Good 715/0
4B Not Present

SW Status RPS Name RPS Serial# RPS Port#

1 Not Present <>
2 Not Present <>
3 Not Present <>
4 Not Present <>

HW: Cisco WS-C3750X-24P
IOS: Version 12.2(55)SE7

Option --selectedthresholds not available

I just checked the documentation and compiled the latest version for check_nwc_health (v5.3) but the --selectedthresholds option is not available for use.

Cisco Stack Switch: Defect of switch #1 not detected

On a stacked Cisco switch (two switches stacked together), switch 1 failed, but the hardware-health check didn't detect it. Instead, it only shows OK for the values of switch 2.

# /usr/lib/nagios/plugins/check_nwc_health --hostname stackswitch --community public --mode hardware-health -v -v -v
I am a Cisco IOS Software, IOS-XE Software, Catalyst L3 Switch Software (CAT3K_CAA-UNIVERSALK9-M), Version 03.03.05SE RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2014 by Cisco Systems, Inc.
Compiled Thu 30-Oct
[FANSUBSYSTEM]
info: checking fans

[FAN_1010]
ciscoEnvMonFanState: normal
ciscoEnvMonFanStatusDescr: , Normal
ciscoEnvMonFanStatusIndex: 1010
info: fan 1010 (, Normal) is normal

[FAN_1011]
ciscoEnvMonFanState: normal
ciscoEnvMonFanStatusDescr: , Normal
ciscoEnvMonFanStatusIndex: 1011
info: fan 1011 (, Normal) is normal

[FAN_1012]
ciscoEnvMonFanState: normal
ciscoEnvMonFanStatusDescr: , Normal
ciscoEnvMonFanStatusIndex: 1012
info: fan 1012 (, Normal) is normal

[FAN_2010]
ciscoEnvMonFanState: normal
ciscoEnvMonFanStatusDescr: Switch 2 - FAN 1, Normal
ciscoEnvMonFanStatusIndex: 2010
info: fan 2010 (Switch 2 - FAN 1, Normal) is normal

[FAN_2011]
ciscoEnvMonFanState: normal
ciscoEnvMonFanStatusDescr: Switch 2 - FAN 2, Normal
ciscoEnvMonFanStatusIndex: 2011
info: fan 2011 (Switch 2 - FAN 2, Normal) is normal

[FAN_2012]
ciscoEnvMonFanState: normal
ciscoEnvMonFanStatusDescr: Switch 2 - FAN 3, Normal
ciscoEnvMonFanStatusIndex: 2012
info: fan 2012 (Switch 2 - FAN 3, Normal) is normal

[TEMPERATURESUBSYSTEM]
info: checking temperatures

[TEMPERATURE_1006]
ciscoEnvMonTemperatureLastShutdown: 0
ciscoEnvMonTemperatureState: normal
ciscoEnvMonTemperatureStatusDescr: , GREEN 
ciscoEnvMonTemperatureStatusIndex: 1006
ciscoEnvMonTemperatureStatusValue: 30
ciscoEnvMonTemperatureThreshold: 56
info: temperature 1006 , GREEN  is 30 (of 56 max = normal)

[TEMPERATURE_1007]
ciscoEnvMonTemperatureLastShutdown: 0
ciscoEnvMonTemperatureState: normal
ciscoEnvMonTemperatureStatusDescr: , GREEN 
ciscoEnvMonTemperatureStatusIndex: 1007
ciscoEnvMonTemperatureStatusValue: 35
ciscoEnvMonTemperatureThreshold: 125
info: temperature 1007 , GREEN  is 35 (of 125 max = normal)

[TEMPERATURE_1008]
ciscoEnvMonTemperatureLastShutdown: 0
ciscoEnvMonTemperatureState: normal
ciscoEnvMonTemperatureStatusDescr: , GREEN 
ciscoEnvMonTemperatureStatusIndex: 1008
ciscoEnvMonTemperatureStatusValue: 39
ciscoEnvMonTemperatureThreshold: 125
info: temperature 1008 , GREEN  is 39 (of 125 max = normal)

[TEMPERATURE_2006]
ciscoEnvMonTemperatureLastShutdown: 0
ciscoEnvMonTemperatureState: normal
ciscoEnvMonTemperatureStatusDescr: Switch 2 - Temp Sensor 0, GREEN 
ciscoEnvMonTemperatureStatusIndex: 2006
ciscoEnvMonTemperatureStatusValue: 30
ciscoEnvMonTemperatureThreshold: 0
info: temperature 2006 Switch 2 - Temp Sensor 0, GREEN  is too high (30 of 0 max = normal)

[TEMPERATURE_2007]
ciscoEnvMonTemperatureLastShutdown: 0
ciscoEnvMonTemperatureState: normal
ciscoEnvMonTemperatureStatusDescr: Switch 2 - Temp Sensor 1, GREEN 
ciscoEnvMonTemperatureStatusIndex: 2007
ciscoEnvMonTemperatureStatusValue: 35
ciscoEnvMonTemperatureThreshold: 0
info: temperature 2007 Switch 2 - Temp Sensor 1, GREEN  is too high (35 of 0 max = normal)

[TEMPERATURE_2008]
ciscoEnvMonTemperatureLastShutdown: 0
ciscoEnvMonTemperatureState: normal
ciscoEnvMonTemperatureStatusDescr: Switch 2 - Temp Sensor 2, GREEN 
ciscoEnvMonTemperatureStatusIndex: 2008
ciscoEnvMonTemperatureStatusValue: 40
ciscoEnvMonTemperatureThreshold: 0
info: temperature 2008 Switch 2 - Temp Sensor 2, GREEN  is too high (40 of 0 max = normal)

[VOLTAGESUBSYSTEM]
info: checking voltages

[POWERSUPPLYSUBSYSTEM]
info: checking supplies

[POWERSUPPLY_1009]
ciscoEnvMonSupplySource: 1
ciscoEnvMonSupplyState: normal
ciscoEnvMonSupplyStatusDescr: , Normal
ciscoEnvMonSupplyStatusIndex: 1009
info: powersupply 1009 (, Normal) is normal

[POWERSUPPLY_2009]
ciscoEnvMonSupplySource: 2
ciscoEnvMonSupplyState: normal
ciscoEnvMonSupplyStatusDescr: Switch 2 - Power Supply A, Normal
ciscoEnvMonSupplyStatusIndex: 2009
info: powersupply 2009 (Switch 2 - Power Supply A, Normal) is normal

OK - environmental hardware working fine
checking fans
fan 1010 (, Normal) is normal
fan 1011 (, Normal) is normal
fan 1012 (, Normal) is normal
fan 2010 (Switch 2 - FAN 1, Normal) is normal
fan 2011 (Switch 2 - FAN 2, Normal) is normal
fan 2012 (Switch 2 - FAN 3, Normal) is normal
checking temperatures
temperature 1006 , GREEN  is 30 (of 56 max = normal)
temperature 1007 , GREEN  is 35 (of 125 max = normal)
temperature 1008 , GREEN  is 39 (of 125 max = normal)
temperature 2006 Switch 2 - Temp Sensor 0, GREEN  is too high (30 of 0 max = normal)
temperature 2007 Switch 2 - Temp Sensor 1, GREEN  is too high (35 of 0 max = normal)
temperature 2008 Switch 2 - Temp Sensor 2, GREEN  is too high (40 of 0 max = normal)
checking voltages
checking supplies
powersupply 1009 (, Normal) is normal
powersupply 2009 (Switch 2 - Power Supply A, Normal) is normal | 'temp_1006'=30;56;;; 'temp_1007'=35;125;;; 'temp_1008'=39;125;;; 'temp_2006'=30;;;; 'temp_2007'=35;;;; 'temp_2008'=40;;;;
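
One thing that stands out in the output above is that every sensor of switch 1 has an empty name in front of the comma, while the sensors of switch 2 are named. As a purely heuristic sketch in Python (not the plugin's Perl), the snippet below groups fan and power supply entries by stack member and lists members whose sensors all lack a name. It assumes that the index divided by 1000 gives the stack member number, which is inferred from this output only. It is not something check_nwc_health does.

```python
from collections import defaultdict

# (index, description) pairs copied from the fan and power supply
# entries in the output above.
SENSORS = [
    (1010, ", Normal"),
    (1011, ", Normal"),
    (1012, ", Normal"),
    (2010, "Switch 2 - FAN 1, Normal"),
    (2011, "Switch 2 - FAN 2, Normal"),
    (2012, "Switch 2 - FAN 3, Normal"),
    (1009, ", Normal"),
    (2009, "Switch 2 - Power Supply A, Normal"),
]


def has_blank_name(descr):
    """Return True if the part before the first comma is empty."""
    return descr.split(",", 1)[0].strip() == ""


def members_without_names(sensors):
    """Return stack member numbers whose sensors all have blank names."""
    by_member = defaultdict(list)
    for index, descr in sensors:
        # Assumption: 1xxx belongs to switch 1, 2xxx to switch 2, and so on.
        by_member[index // 1000].append(has_blank_name(descr))
    return sorted(member for member, flags in by_member.items() if all(flags))


if __name__ == "__main__":
    for member in members_without_names(SENSORS):
        print("WARNING - switch %d reports sensors without names" % member)
```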

Two minor issues since 5.3.2 on SRX and F5

Hi,

Noticed two small issues since version 5.3.2.

5.3.2 on F5:

./check_nwc_health --host 192.168.206.131 --mode interface-health --community public
OK - mgmt is up/up, interface mgmt usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface mgmt errors in:0.00/s out:0.00/s , interface mgmt discards in:0.00/s out:0.00/s , 1.1 is up/up, interface 1.1 usage is in:0.00% (14824.00bit/s) out:0.00% (1928.00bit/s), interface 1.1 errors in:0.00/s out:0.00/s , interface 1.1 discards in:0.00/s out:0.00/s , 1.2 is notPresent/up, 1.3 is notPresent/up, /Common/test is up/up, interface /Common/test usage is in:0.00% (16736.00bit/s) out:0.00% (16736.00bit/s), interface /Common/test errors in:0.00/s out:0.00/s , interface /Common/test discards in:0.00/s out:0.00/s | 'mgmt_usage_in'=0%;80;90;0;100 'mgmt_usage_out'=0%;80;90;0;100 'mgmt_traffic_in'=0;80000000;90000000;0;100000000 'mgmt_traffic_out'=0;80000000;90000000;0;100000000 'mgmt_errors_in'=0;1;10;; 'mgmt_errors_out'=0;1;10;; 'mgmt_discards_in'=0;1;10;; 'mgmt_discards_out'=0;1;10;; '1.1_usage_in'=0%;80;90;0;100 '1.1_usage_out'=0%;80;90;0;100 '1.1_traffic_in'=14824;0;0;0;0 '1.1_traffic_out'=1928;0;0;0;0 '1.1_errors_in'=0;1;10;; '1.1_errors_out'=0;1;10;; '1.1_discards_in'=0;1;10;; '1.1_discards_out'=0;1;10;; '/Common/test_usage_in'=0%;80;90;0;100 '/Common/test_usage_out'=0%;80;90;0;100 '/Common/test_traffic_in'=16736;0;0;0;0 '/Common/test_traffic_out'=16736;0;0;0;0 '/Common/test_errors_in'=0;1;10;; '/Common/test_errors_out'=0;1;10;; '/Common/test_discards_in'=0;1;10;; '/Common/test_discards_out'=0;1;10;;

5.4 on F5:

[root@icinga plugins]# ./check_nwc_health --host 192.168.206.131 --mode interface-health --community public
Mode interface-health is not implemented for this type of device

5.3.2 on SRX:

./check_nwc_health --host 192.168.206.30 --mode interface-health --community public
CRITICAL - fault condition is presumed to exist on vlan, lsi is up/up, interface lsi usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface lsi errors in:0.00/s out:0.00/s , interface lsi discards in:0.00/s out:0.00/s , dsc is up/up, interface dsc usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface dsc errors in:0.00/s out:0.00/s , interface dsc discards in:0.00/s out:0.00/s , lo0 is up/up, interface lo0 usage is in:0.00% (0.01bit/s) out:0.00% (0.01bit/s), interface lo0 errors in:0.00/s out:0.00/s , interface lo0 discards in:0.00/s out:0.00/s , tap is up/up, interface tap usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface tap errors in:0.00/s out:0.00/s , interface tap discards in:0.00/s out:0.00/s , gre is up/up, interface gre usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface gre errors in:0.00/s out:0.00/s , interface gre discards in:0.00/s out:0.00/s , ipip is up/up, interface ipip usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface ipip errors in:0.00/s out:0.00/s , interface ipip discards in:0.00/s out:0.00/s , pime is up/up, interface pime usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface pime errors in:0.00/s out:0.00/s , interface pime discards in:0.00/s out:0.00/s , pimd is up/up, interface pimd usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface pimd errors in:0.00/s out:0.00/s , interface pimd discards in:0.00/s out:0.00/s , mtun is up/up, interface mtun usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface mtun errors in:0.00/s out:0.00/s , interface mtun discards in:0.00/s out:0.00/s , lo0.16384 is up/up, interface lo0.16384 usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface lo0.16384 errors in:0.00/s out:0.00/s , interface lo0.16384 discards in:0.00/s out:0.00/s , lo0.16385 is up/up, interface lo0.16385 usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface lo0.16385 errors in:0.00/s out:0.00/s , interface lo0.16385 discards in:0.00/s out:0.00/s 
, lo0.32768 is up/up, interface lo0.32768 usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface lo0.32768 errors in:0.00/s out:0.00/s , interface lo0.32768 discards in:0.00/s out:0.00/s , pp0 is up/up, interface pp0 usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface pp0 errors in:0.00/s out:0.00/s , interface pp0 discards in:0.00/s out:0.00/s , irb is up/up, interface irb usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface irb errors in:0.00/s out:0.00/s , interface irb discards in:0.00/s out:0.00/s , st0 is up/up, interface st0 usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface st0 errors in:0.00/s out:0.00/s , interface st0 discards in:0.00/s out:0.00/s , ppd0 is up/up, interface ppd0 usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface ppd0 errors in:0.00/s out:0.00/s , interface ppd0 discards in:0.00/s out:0.00/s , ppe0 is up/up, interface ppe0 usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface ppe0 errors in:0.00/s out:0.00/s , interface ppe0 discards in:0.00/s out:0.00/s , vlan is down/up, ge-0/0/0 is up/up, interface ge-0/0/0 usage is in:0.00% (6666.67bit/s) out:0.00% (18380.67bit/s), interface ge-0/0/0 errors in:0.00/s out:0.00/s , interface ge-0/0/0 discards in:0.00/s out:0.00/s , ge-0/0/1 is up/up, interface ge-0/0/1 usage is in:0.00% (498.67bit/s) out:0.00% (56.00bit/s), interface ge-0/0/1 errors in:0.00/s out:0.00/s , interface ge-0/0/1 discards in:0.00/s out:0.00/s , sp-0/0/0 is up/up, interface sp-0/0/0 usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface sp-0/0/0 errors in:0.00/s out:0.00/s , interface sp-0/0/0 discards in:0.00/s out:0.00/s , gr-0/0/0 is up/up, interface gr-0/0/0 usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface gr-0/0/0 errors in:0.00/s out:0.00/s , interface gr-0/0/0 discards in:0.00/s out:0.00/s , ip-0/0/0 is up/up, interface ip-0/0/0 usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface ip-0/0/0 errors in:0.00/s out:0.00/s , 
interface ip-0/0/0 discards in:0.00/s out:0.00/s , lsq-0/0/0 is up/up, interface lsq-0/0/0 usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface lsq-0/0/0 errors in:0.00/s out:0.00/s , interface lsq-0/0/0 discards in:0.00/s out:0.00/s , mt-0/0/0 is up/up, interface mt-0/0/0 usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface mt-0/0/0 errors in:0.00/s out:0.00/s , interface mt-0/0/0 discards in:0.00/s out:0.00/s , lt-0/0/0 is up/up, interface lt-0/0/0 usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface lt-0/0/0 errors in:0.00/s out:0.00/s , interface lt-0/0/0 discards in:0.00/s out:0.00/s , ge-0/0/0.0 is up/up, interface ge-0/0/0.0 usage is in:0.00% (12918.67bit/s) out:0.00% (35375.33bit/s), interface ge-0/0/0.0 errors in:0.00/s out:0.00/s , interface ge-0/0/0.0 discards in:0.00/s out:0.00/s , sp-0/0/0.0 is up/up, interface sp-0/0/0.0 usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface sp-0/0/0.0 errors in:0.00/s out:0.00/s , interface sp-0/0/0.0 discards in:0.00/s out:0.00/s , sp-0/0/0.16383 is up/up, interface sp-0/0/0.16383 usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface sp-0/0/0.16383 errors in:0.00/s out:0.00/s , interface sp-0/0/0.16383 discards in:0.00/s out:0.00/s , ge-0/0/1.0 is up/up, interface ge-0/0/1.0 usage is in:0.00% (509.33bit/s) out:0.00% (106.67bit/s), interface ge-0/0/1.0 errors in:0.00/s out:0.00/s , interface ge-0/0/1.0 discards in:0.00/s out:0.00/s , ge-0/0/2 is up/up, interface ge-0/0/2 usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface ge-0/0/2 errors in:0.00/s out:0.00/s , interface ge-0/0/2 discards in:0.00/s out:0.00/s , ge-0/0/2.0 is up/up, interface ge-0/0/2.0 usage is in:0.00% (50.67bit/s) out:0.00% (50.67bit/s), interface ge-0/0/2.0 errors in:0.00/s out:0.00/s , interface ge-0/0/2.0 discards in:0.00/s out:0.00/s , ip-0/0/0.0 is up/up, interface ip-0/0/0.0 usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s), interface ip-0/0/0.0 errors in:0.00/s out:0.00/s , interface 
ip-0/0/0.0 discards in:0.00/s out:0.00/s | 'lsi_usage_in'=0%;80;90;0;100 'lsi_usage_out'=0%;80;90;0;100 'lsi_traffic_in'=0;0;0;0;0 'lsi_traffic_out'=0;0;0;0;0 'lsi_errors_in'=0;1;10;; 'lsi_errors_out'=0;1;10;; 'lsi_discards_in'=0;1;10;; 'lsi_discards_out'=0;1;10;; 'dsc_usage_in'=0%;80;90;0;100 'dsc_usage_out'=0%;80;90;0;100 'dsc_traffic_in'=0;0;0;0;0 'dsc_traffic_out'=0;0;0;0;0 'dsc_errors_in'=0;1;10;; 'dsc_errors_out'=0;1;10;; 'dsc_discards_in'=0;1;10;; 'dsc_discards_out'=0;1;10;; 'lo0_usage_in'=0%;80;90;0;100 'lo0_usage_out'=0%;80;90;0;100 'lo0_traffic_in'=0.01;0;0;0;0 'lo0_traffic_out'=0.01;0;0;0;0 'lo0_errors_in'=0;1;10;; 'lo0_errors_out'=0;1;10;; 'lo0_discards_in'=0;1;10;; 'lo0_discards_out'=0;1;10;; 'tap_usage_in'=0%;80;90;0;100 'tap_usage_out'=0%;80;90;0;100 'tap_traffic_in'=0;0;0;0;0 'tap_traffic_out'=0;0;0;0;0 'tap_errors_in'=0;1;10;; 'tap_errors_out'=0;1;10;; 'tap_discards_in'=0;1;10;; 'tap_discards_out'=0;1;10;; 'gre_usage_in'=0%;80;90;0;100 'gre_usage_out'=0%;80;90;0;100 'gre_traffic_in'=0;0;0;0;0 'gre_traffic_out'=0;0;0;0;0 'gre_errors_in'=0;1;10;; 'gre_errors_out'=0;1;10;; 'gre_discards_in'=0;1;10;; 'gre_discards_out'=0;1;10;; 'ipip_usage_in'=0%;80;90;0;100 'ipip_usage_out'=0%;80;90;0;100 'ipip_traffic_in'=0;0;0;0;0 'ipip_traffic_out'=0;0;0;0;0 'ipip_errors_in'=0;1;10;; 'ipip_errors_out'=0;1;10;; 'ipip_discards_in'=0;1;10;; 'ipip_discards_out'=0;1;10;; 'pime_usage_in'=0%;80;90;0;100 'pime_usage_out'=0%;80;90;0;100 'pime_traffic_in'=0;0;0;0;0 'pime_traffic_out'=0;0;0;0;0 'pime_errors_in'=0;1;10;; 'pime_errors_out'=0;1;10;; 'pime_discards_in'=0;1;10;; 'pime_discards_out'=0;1;10;; 'pimd_usage_in'=0%;80;90;0;100 'pimd_usage_out'=0%;80;90;0;100 'pimd_traffic_in'=0;0;0;0;0 'pimd_traffic_out'=0;0;0;0;0 'pimd_errors_in'=0;1;10;; 'pimd_errors_out'=0;1;10;; 'pimd_discards_in'=0;1;10;; 'pimd_discards_out'=0;1;10;; 'mtun_usage_in'=0%;80;90;0;100 'mtun_usage_out'=0%;80;90;0;100 'mtun_traffic_in'=0;0;0;0;0 'mtun_traffic_out'=0;0;0;0;0 'mtun_errors_in'=0;1;10;; 
'mtun_errors_out'=0;1;10;; 'mtun_discards_in'=0;1;10;; 'mtun_discards_out'=0;1;10;; 'lo0.16384_usage_in'=0%;80;90;0;100 'lo0.16384_usage_out'=0%;80;90;0;100 'lo0.16384_traffic_in'=0;0;0;0;0 'lo0.16384_traffic_out'=0;0;0;0;0 'lo0.16384_errors_in'=0;1;10;; 'lo0.16384_errors_out'=0;1;10;; 'lo0.16384_discards_in'=0;1;10;; 'lo0.16384_discards_out'=0;1;10;; 'lo0.16385_usage_in'=0%;80;90;0;100 'lo0.16385_usage_out'=0%;80;90;0;100 'lo0.16385_traffic_in'=0;0;0;0;0 'lo0.16385_traffic_out'=0;0;0;0;0 'lo0.16385_errors_in'=0;1;10;; 'lo0.16385_errors_out'=0;1;10;; 'lo0.16385_discards_in'=0;1;10;; 'lo0.16385_discards_out'=0;1;10;; 'lo0.32768_usage_in'=0%;80;90;0;100 'lo0.32768_usage_out'=0%;80;90;0;100 'lo0.32768_traffic_in'=0;0;0;0;0 'lo0.32768_traffic_out'=0;0;0;0;0 'lo0.32768_errors_in'=0;1;10;; 'lo0.32768_errors_out'=0;1;10;; 'lo0.32768_discards_in'=0;1;10;; 'lo0.32768_discards_out'=0;1;10;; 'pp0_usage_in'=0%;80;90;0;100 'pp0_usage_out'=0%;80;90;0;100 'pp0_traffic_in'=0;0;0;0;0 'pp0_traffic_out'=0;0;0;0;0 'pp0_errors_in'=0;1;10;; 'pp0_errors_out'=0;1;10;; 'pp0_discards_in'=0;1;10;; 'pp0_discards_out'=0;1;10;; 'irb_usage_in'=0%;80;90;0;100 'irb_usage_out'=0%;80;90;0;100 'irb_traffic_in'=0;0;0;0;0 'irb_traffic_out'=0;0;0;0;0 'irb_errors_in'=0;1;10;; 'irb_errors_out'=0;1;10;; 'irb_discards_in'=0;1;10;; 'irb_discards_out'=0;1;10;; 'st0_usage_in'=0%;80;90;0;100 'st0_usage_out'=0%;80;90;0;100 'st0_traffic_in'=0;0;0;0;0 'st0_traffic_out'=0;0;0;0;0 'st0_errors_in'=0;1;10;; 'st0_errors_out'=0;1;10;; 'st0_discards_in'=0;1;10;; 'st0_discards_out'=0;1;10;; 'ppd0_usage_in'=0%;80;90;0;100 'ppd0_usage_out'=0%;80;90;0;100 'ppd0_traffic_in'=0;640000000;720000000;0;800000000 'ppd0_traffic_out'=0;640000000;720000000;0;800000000 'ppd0_errors_in'=0;1;10;; 'ppd0_errors_out'=0;1;10;; 'ppd0_discards_in'=0;1;10;; 'ppd0_discards_out'=0;1;10;; 'ppe0_usage_in'=0%;80;90;0;100 'ppe0_usage_out'=0%;80;90;0;100 'ppe0_traffic_in'=0;640000000;720000000;0;800000000 
'ppe0_traffic_out'=0;640000000;720000000;0;800000000 'ppe0_errors_in'=0;1;10;; 'ppe0_errors_out'=0;1;10;; 'ppe0_discards_in'=0;1;10;; 'ppe0_discards_out'=0;1;10;; 'ge-0/0/0_usage_in'=0.00%;80;90;0;100 'ge-0/0/0_usage_out'=0.00%;80;90;0;100 'ge-0/0/0_traffic_in'=6666.67;800000000;900000000;0;1000000000 'ge-0/0/0_traffic_out'=18380.67;800000000;900000000;0;1000000000 'ge-0/0/0_errors_in'=0;1;10;; 'ge-0/0/0_errors_out'=0;1;10;; 'ge-0/0/0_discards_in'=0;1;10;; 'ge-0/0/0_discards_out'=0;1;10;; 'ge-0/0/1_usage_in'=0.00%;80;90;0;100 'ge-0/0/1_usage_out'=0.00%;80;90;0;100 'ge-0/0/1_traffic_in'=498.67;800000000;900000000;0;1000000000 'ge-0/0/1_traffic_out'=56;800000000;900000000;0;1000000000 'ge-0/0/1_errors_in'=0;1;10;; 'ge-0/0/1_errors_out'=0;1;10;; 'ge-0/0/1_discards_in'=0;1;10;; 'ge-0/0/1_discards_out'=0;1;10;; 'sp-0/0/0_usage_in'=0%;80;90;0;100 'sp-0/0/0_usage_out'=0%;80;90;0;100 'sp-0/0/0_traffic_in'=0;640000000;720000000;0;800000000 'sp-0/0/0_traffic_out'=0;640000000;720000000;0;800000000 'sp-0/0/0_errors_in'=0;1;10;; 'sp-0/0/0_errors_out'=0;1;10;; 'sp-0/0/0_discards_in'=0;1;10;; 'sp-0/0/0_discards_out'=0;1;10;; 'gr-0/0/0_usage_in'=0%;80;90;0;100 'gr-0/0/0_usage_out'=0%;80;90;0;100 'gr-0/0/0_traffic_in'=0;640000000;720000000;0;800000000 'gr-0/0/0_traffic_out'=0;640000000;720000000;0;800000000 'gr-0/0/0_errors_in'=0;1;10;; 'gr-0/0/0_errors_out'=0;1;10;; 'gr-0/0/0_discards_in'=0;1;10;; 'gr-0/0/0_discards_out'=0;1;10;; 'ip-0/0/0_usage_in'=0%;80;90;0;100 'ip-0/0/0_usage_out'=0%;80;90;0;100 'ip-0/0/0_traffic_in'=0;640000000;720000000;0;800000000 'ip-0/0/0_traffic_out'=0;640000000;720000000;0;800000000 'ip-0/0/0_errors_in'=0;1;10;; 'ip-0/0/0_errors_out'=0;1;10;; 'ip-0/0/0_discards_in'=0;1;10;; 'ip-0/0/0_discards_out'=0;1;10;; 'lsq-0/0/0_usage_in'=0%;80;90;0;100 'lsq-0/0/0_usage_out'=0%;80;90;0;100 'lsq-0/0/0_traffic_in'=0;497664000;559872000;0;622080000 'lsq-0/0/0_traffic_out'=0;497664000;559872000;0;622080000 'lsq-0/0/0_errors_in'=0;1;10;; 'lsq-0/0/0_errors_out'=0;1;10;; 
'lsq-0/0/0_discards_in'=0;1;10;; 'lsq-0/0/0_discards_out'=0;1;10;; 'mt-0/0/0_usage_in'=0%;80;90;0;100 'mt-0/0/0_usage_out'=0%;80;90;0;100 'mt-0/0/0_traffic_in'=0;640000000;720000000;0;800000000 'mt-0/0/0_traffic_out'=0;640000000;720000000;0;800000000 'mt-0/0/0_errors_in'=0;1;10;; 'mt-0/0/0_errors_out'=0;1;10;; 'mt-0/0/0_discards_in'=0;1;10;; 'mt-0/0/0_discards_out'=0;1;10;; 'lt-0/0/0_usage_in'=0%;80;90;0;100 'lt-0/0/0_usage_out'=0%;80;90;0;100 'lt-0/0/0_traffic_in'=0;640000000;720000000;0;800000000 'lt-0/0/0_traffic_out'=0;640000000;720000000;0;800000000 'lt-0/0/0_errors_in'=0;1;10;; 'lt-0/0/0_errors_out'=0;1;10;; 'lt-0/0/0_discards_in'=0;1;10;; 'lt-0/0/0_discards_out'=0;1;10;; 'ge-0/0/0.0_usage_in'=0.00%;80;90;0;100 'ge-0/0/0.0_usage_out'=0.00%;80;90;0;100 'ge-0/0/0.0_traffic_in'=12918.67;800000000;900000000;0;1000000000 'ge-0/0/0.0_traffic_out'=35375.33;800000000;900000000;0;1000000000 'ge-0/0/0.0_errors_in'=0;1;10;; 'ge-0/0/0.0_errors_out'=0;1;10;; 'ge-0/0/0.0_discards_in'=0;1;10;; 'ge-0/0/0.0_discards_out'=0;1;10;; 'sp-0/0/0.0_usage_in'=0%;80;90;0;100 'sp-0/0/0.0_usage_out'=0%;80;90;0;100 'sp-0/0/0.0_traffic_in'=0;640000000;720000000;0;800000000 'sp-0/0/0.0_traffic_out'=0;640000000;720000000;0;800000000 'sp-0/0/0.0_errors_in'=0;1;10;; 'sp-0/0/0.0_errors_out'=0;1;10;; 'sp-0/0/0.0_discards_in'=0;1;10;; 'sp-0/0/0.0_discards_out'=0;1;10;; 'sp-0/0/0.16383_usage_in'=0%;80;90;0;100 'sp-0/0/0.16383_usage_out'=0%;80;90;0;100 'sp-0/0/0.16383_traffic_in'=0;640000000;720000000;0;800000000 'sp-0/0/0.16383_traffic_out'=0;640000000;720000000;0;800000000 'sp-0/0/0.16383_errors_in'=0;1;10;; 'sp-0/0/0.16383_errors_out'=0;1;10;; 'sp-0/0/0.16383_discards_in'=0;1;10;; 'sp-0/0/0.16383_discards_out'=0;1;10;; 'ge-0/0/1.0_usage_in'=0.00%;80;90;0;100 'ge-0/0/1.0_usage_out'=0.00%;80;90;0;100 'ge-0/0/1.0_traffic_in'=509.33;800000000;900000000;0;1000000000 'ge-0/0/1.0_traffic_out'=106.67;800000000;900000000;0;1000000000 'ge-0/0/1.0_errors_in'=0;1;10;; 'ge-0/0/1.0_errors_out'=0;1;10;; 
'ge-0/0/1.0_discards_in'=0;1;10;; 'ge-0/0/1.0_discards_out'=0;1;10;; 'ge-0/0/2_usage_in'=0%;80;90;0;100 'ge-0/0/2_usage_out'=0%;80;90;0;100 'ge-0/0/2_traffic_in'=0;800000000;900000000;0;1000000000 'ge-0/0/2_traffic_out'=0;800000000;900000000;0;1000000000 'ge-0/0/2_errors_in'=0;1;10;; 'ge-0/0/2_errors_out'=0;1;10;; 'ge-0/0/2_discards_in'=0;1;10;; 'ge-0/0/2_discards_out'=0;1;10;; 'ge-0/0/2.0_usage_in'=0.00%;80;90;0;100 'ge-0/0/2.0_usage_out'=0.00%;80;90;0;100 'ge-0/0/2.0_traffic_in'=50.67;800000000;900000000;0;1000000000 'ge-0/0/2.0_traffic_out'=50.67;800000000;900000000;0;1000000000 'ge-0/0/2.0_errors_in'=0;1;10;; 'ge-0/0/2.0_errors_out'=0;1;10;; 'ge-0/0/2.0_discards_in'=0;1;10;; 'ge-0/0/2.0_discards_out'=0;1;10;; 'ip-0/0/0.0_usage_in'=0%;80;90;0;100 'ip-0/0/0.0_usage_out'=0.00%;80;90;0;100 'ip-0/0/0.0_traffic_in'=0;640000000;720000000;0;800000000 'ip-0/0/0.0_traffic_out'=0.00;640000000;720000000;0;800000000 'ip-0/0/0.0_errors_in'=0;1;10;; 'ip-0/0/0.0_errors_out'=0;1;10;; 'ip-0/0/0.0_discards_in'=0;1;10;; 'ip-0/0/0.0_discards_out'=0;1;10;;

5.4 on SRX:
./check_nwc_health --host 192.168.206.30 --mode interface-health --community public
Illegal division by zero at ./check_nwc_health line 25188.

Everything else is working super. Thanks for a great plugin!
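
The post does not show what line 25188 contains, so the cause of the division by zero is not known. One plausible explanation, and only an assumption, is a usage calculation that divides by an interface speed of 0: several interfaces in the perfdata above report a traffic maximum of 0 (for example 'lsi_traffic_in'=0;0;0;0;0). A minimal sketch of such a guard, written in Python rather than the plugin's Perl and using a hypothetical helper name:

```python
def usage_percent(bits_per_second: float, speed_bps: float) -> float:
    """Return link utilisation in percent.

    Hypothetical helper, not part of check_nwc_health. A speed of 0 (or less)
    is treated as "unknown" and yields 0.0 instead of raising ZeroDivisionError.
    """
    if speed_bps <= 0:
        return 0.0
    return 100.0 * bits_per_second / speed_bps


if __name__ == "__main__":
    # A gigabit interface carrying a few kbit/s, as ge-0/0/0 does above.
    print(f"{usage_percent(6666.67, 1_000_000_000):.2f}%")
    # A pseudo interface that reports no speed at all.
    print(f"{usage_percent(0.0, 0):.2f}%")
```

Reporting 0% for an unknown speed is one design choice; skipping the usage metric for such interfaces would be another.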

Interface alias names in perfdata

Hi,

Is there a way to display interface alias names in perfdata, e.g.

interface GigabitEthernet0/0/1 (alias WAN LINK) instead of just interface GigabitEthernet0/0/1 ?

I looked through the help and the code briefly and couldn't find an option. The only way I can see to do it currently is to check a specific interface with --name and set the service check name to match the alias.

Thanks

Dom
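
The requested output format can be sketched as follows. This is an illustration only: `interface_label` is a hypothetical helper, not an existing plugin option, and it assumes the description and alias have already been read from the device (for example from ifDescr and ifAlias).

```python
def interface_label(descr: str, alias: str = "") -> str:
    """Combine an interface description with its alias for display.

    Hypothetical helper. The alias is appended only when it is non-empty
    and differs from the description itself.
    """
    alias = alias.strip()
    if alias and alias != descr:
        return f"{descr} (alias {alias})"
    return descr


if __name__ == "__main__":
    print("interface " + interface_label("GigabitEthernet0/0/1", "WAN LINK"))
    print("interface " + interface_label("lo0"))
```

Interfaces without a configured alias keep their plain name, so existing output for them would not change.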

CPU Usage of multicore systems

I would like to have a mode that gives a more detailed view of the CPU usage of every single core in a multicore system, instead of one single value (mode: cpu-load).

Is there anybody else with this need?

check-config not working on NXOS

It seems that in the current version the "check-config" mode is not supported on Cisco Nexus switches:

# /usr/lib/nagios/plugins/check_nwc_health --hostname nexusswitch --community public --mode check-config -v -v -v
I am a Cisco NX-OS(tm) n5000, Software (n5000-uk9), Version 5.2(1)N1(7), RELEASE SOFTWARE Copyright (c) 2002-2011 by Cisco Systems, Inc. Device Manager Version 6.1(1),  Compiled 2/4/2014 23:00:00
Mode check-config is not implemented for this type of device

However, when I "fake" the servertype and tell the plugin that the target is a classic Cisco (IOS) device, it works:

# /usr/lib/nagios/plugins/check_nwc_health --hostname nexusswitch --community public --mode check-config -v -v -v --servertype cisco
I am a cisco
[CONFIG]
ccmHistoryRunningLastChanged: 1451401338.09 (Tue Dec 29 16:02:18 2015)
ccmHistoryRunningLastSaved: 1451401455.87 (Tue Dec 29 16:04:15 2015)
ccmHistoryStartupLastChanged: 1416492000 (Thu Nov 20 15:00:00 2014)
CRITICAL - running config is ahead of startup config since 88837 minutes. changes will be lost in case of a reboot
checking config
running config is ahead of startup config since 88837 minutes. changes will be lost in case of a reboot

I modified NXOS.pm and recompiled the plugin, and now it works:

$ ./check_nwc_health --hostname nexusswitch --community public --mode check-config -v -v -v 
I am a Cisco NX-OS(tm) n5000, Software (n5000-uk9), Version 5.2(1)N1(7), RELEASE SOFTWARE Copyright (c) 2002-2011 by Cisco Systems, Inc. Device Manager Version 6.1(1),  Compiled 2/4/2014 23:00:00
[CONFIG]
ccmHistoryRunningLastChanged: 1451401338.09 (Tue Dec 29 16:02:18 2015)
ccmHistoryRunningLastSaved: 1451401455.87 (Tue Dec 29 16:04:15 2015)
ccmHistoryStartupLastChanged: 1416492000 (Thu Nov 20 15:00:00 2014)
CRITICAL - running config is ahead of startup config since 88865 minutes. changes will be lost in case of a reboot
checking config
running config is ahead of startup config since 88865 minutes. changes will be lost in case of a reboot

I will create the pull request shortly.

Support human readable uptime

This is more of a feature request than a bug.

I am checking the uptime of some Cisco devices (ASA, switches, etc.) and I can't find a way to change the plugin's output. This is what I have:

OK - device is up since 50583 minutes

Would it be possible to display the uptime in minutes, hours, days and, if necessary, in years or something like that?
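
A minimal sketch of the requested conversion, assuming the plugin has the uptime available as a whole number of minutes, as in the output above. `format_uptime` is a hypothetical name, and the sketch stops at days rather than years.

```python
def format_uptime(total_minutes: int) -> str:
    """Render an uptime given in minutes as days, hours and minutes.

    Hypothetical helper, not part of check_nwc_health. Leading units that
    are zero are omitted; minutes are always shown.
    """
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    parts = []
    if days:
        parts.append(f"{days}d")
    if days or hours:
        parts.append(f"{hours}h")
    parts.append(f"{minutes}m")
    return " ".join(parts)


if __name__ == "__main__":
    # The value from the output quoted above.
    print(f"OK - device is up since {format_uptime(50583)}")
    # prints: OK - device is up since 35d 3h 3m
```

Keeping the raw minute count in the perfdata while changing only the text would leave existing graphs and thresholds unaffected.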
