Nagios plugin to check the status of a Sun Cluster for Solaris (version 3.2 or greater).
The 3.2 release of Solaris Cluster introduced an all-new Command Line Interface for managing the cluster. The table below maps the previous commands to the new ones used to get the cluster status:
+------------------+-----------------+-------------------------+
|                  | Sun Cluster 3.1 | Sun Cluster 3.2         |
+------------------+-----------------+-------------------------+
| Nodes            | scstat -n       | clnode status           |
| Quorum           | scstat -q       | clquorum status         |
| Transport info   | scstat -W       | clinterconnect status   |
| Resources        | scstat -g       | clresource status       |
| Resource Groups  | scstat -g       | clresourcegroup status  |
+------------------+-----------------+-------------------------+
In your Nagios plugins directory (on one of the nodes of the Sun Cluster) run:
git clone git://github.com/rafacas/nagios-plugin-sun-cluster.git
Edit commands.cfg on your Nagios server and add the following (this example uses NRPE to connect to the nodes):
define command {
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
The plugin uses the command clnode status to get the node status:
$ clnode status
=== Cluster Nodes ===
--- Node Status ---
Node Name        Status
---------        ------
vincent          Online
theo             Online
The following are the possible states of a node:
+-------------+---------------+
| Node status | Nagios status |
+-------------+---------------+
| Online      | OK            |
| Offline     | CRITICAL      |
+-------------+---------------+
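The mapping above can be illustrated with a short Python sketch. It is not the plugin's own code (the plugin is the Perl script check_sun_cluster.pl); the function names, the message wording, and the choice to treat unrecognised states as UNKNOWN are assumptions made for this example. The exit codes 0 to 3 follow the standard Nagios plugin convention.

```python
# Nagios exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Mapping taken from the table above.
NODE_STATES = {"Online": OK, "Offline": CRITICAL}

SAMPLE = """\
=== Cluster Nodes ===

--- Node Status ---

Node Name        Status
---------        ------
vincent          Online
theo             Online
"""


def parse_node_status(output):
    """Return {node: status} from the rows that follow the dashed header rule."""
    nodes = {}
    in_table = False
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if all(set(f) == {"-"} for f in fields):
            in_table = True  # the "--------- ------" rule precedes the data rows
            continue
        if in_table and len(fields) == 2:
            nodes[fields[0]] = fields[1]
    return nodes


def check_nodes(output):
    """Return (exit_code, message) for the given `clnode status` output."""
    nodes = parse_node_status(output)
    if not nodes:
        return UNKNOWN, "UNKNOWN - no node rows found"
    # Assumption of this sketch: an unrecognised state counts as UNKNOWN.
    codes = {n: NODE_STATES.get(s, UNKNOWN) for n, s in nodes.items()}
    worst = max(codes.values())
    bad = [f"{n} is {nodes[n]}" for n, c in codes.items() if c != OK]
    detail = ", ".join(bad) if bad else f"{len(nodes)} nodes online"
    return worst, f"{LABELS[worst]} - {detail}"
```

A wrapper script would print the message and exit with the returned code, which is how Nagios reads a plugin result.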
services.cfg file:
define service {
use generic-service
hostgroup_name Sun Cluster
service_description Sun Cluster Nodes
check_command check_nrpe!check_sun_cluster_nodes
}
nrpe.cfg file:
command[check_sun_cluster_nodes]=/usr/local/nagios/libexec/check_sun_cluster.pl -n
The plugin uses the command clquorum status (short form: clq) to get the quorum device status:
$ clq status
=== Cluster Quorum ===
--- Quorum Votes Summary ---
Needed   Present   Possible
------   -------   --------
2        3         3
--- Quorum Votes by Node ---
Node Name       Present       Possible       Status
---------       -------       --------       ------
vincent         1             1              Online
theo            1             1              Online
The following are the possible quorum states:
+----------------+---------------+
| Quorum status  | Nagios status |
+----------------+---------------+
| Online         | OK            |
| Offline        | CRITICAL      |
+----------------+---------------+
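Because the quorum output contains more than one table, a parser first has to pick the right section. The Python sketch below shows one way to do that, assuming sections are introduced by "--- Title ---" lines as in the sample output. The helper names are hypothetical, and reporting any state other than Online as CRITICAL is an assumption of this example rather than documented plugin behaviour.

```python
import re

OK, CRITICAL, UNKNOWN = 0, 2, 3  # Nagios exit codes used in this sketch

SAMPLE = """\
=== Cluster Quorum ===

--- Quorum Votes Summary ---

            Needed   Present   Possible
            ------   -------   --------
            2        3         3

--- Quorum Votes by Node ---

Node Name       Present       Possible       Status
---------       -------       --------       ------
vincent         1             1              Online
theo            1             1              Online
"""

# Matches section headings such as "--- Quorum Votes by Node ---".
SECTION = re.compile(r"^---\s+(.+?)\s+---$")


def split_sections(output):
    """Group the non-blank lines under their '--- Title ---' heading."""
    sections, current = {}, None
    for line in output.splitlines():
        line = line.strip()
        match = SECTION.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif line and current is not None:
            sections[current].append(line)
    return sections


def check_quorum(output):
    """Return (exit_code, message) based on the 'Quorum Votes by Node' rows."""
    rows = [r.split() for r in split_sections(output).get("Quorum Votes by Node", [])]
    # Data rows have four fields with a numeric vote count; this skips header and rule.
    members = {r[0]: r[3] for r in rows if len(r) == 4 and r[1].isdigit()}
    if not members:
        return UNKNOWN, "UNKNOWN - no quorum rows found"
    offline = [name for name, status in members.items() if status != "Online"]
    if offline:
        return CRITICAL, "CRITICAL - quorum member(s) not online: " + ", ".join(offline)
    return OK, f"OK - {len(members)} quorum members online"
```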
services.cfg file:
define service {
use generic-service
hostgroup_name Sun Cluster
service_description Sun Cluster Quorum
check_command check_nrpe!check_sun_cluster_quorum
}
nrpe.cfg file:
command[check_sun_cluster_quorum]=/usr/local/nagios/libexec/check_sun_cluster.pl -q
The plugin uses the command clintr status to get the status of the interconnect paths:
$ clintr status
=== Cluster Transport Paths ===
Endpoint1          Endpoint2          Status
---------          ---------          ------
vincent:qfe0       theo:qfe0          Path online
vincent:hme0       theo:hme0          Path online
The following are the possible states of a transport path:
+-------------------+---------------+
| Transport status  | Nagios status |
+-------------------+---------------+
| Path online       | OK            |
| waiting           | WARNING       |
| faulted           | CRITICAL      |
+-------------------+---------------+
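With three possible results, the check has to report the worst state found across all paths. The following Python sketch illustrates that idea under a few assumptions: path rows are recognised by the node:adapter form of both endpoints, states are compared case-insensitively, and anything not listed in the table is reported as UNKNOWN. None of the names below come from the plugin itself.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Mapping taken from the table above; keys are lower-cased for comparison.
PATH_STATES = {"path online": OK, "waiting": WARNING, "faulted": CRITICAL}

SAMPLE = """\
=== Cluster Transport Paths ===

Endpoint1          Endpoint2          Status
---------          ---------          ------
vincent:qfe0       theo:qfe0          Path online
vincent:hme0       theo:hme0          Path online
"""


def parse_paths(output):
    """Yield (endpoint1, endpoint2, status) for every transport path row."""
    for line in output.splitlines():
        # The status may contain a space ("Path online"), so split at most twice.
        fields = line.split(None, 2)
        if len(fields) == 3 and ":" in fields[0] and ":" in fields[1]:
            yield fields[0], fields[1], fields[2].strip()


def check_transport(output):
    """Return (exit_code, message), where exit_code is the worst path state."""
    paths = list(parse_paths(output))
    if not paths:
        return UNKNOWN, "UNKNOWN - no transport paths found"
    results = [
        (PATH_STATES.get(status.lower(), UNKNOWN), f"{a} - {b}: {status}")
        for a, b, status in paths
    ]
    worst = max(code for code, _ in results)
    problems = [text for code, text in results if code != OK]
    detail = "; ".join(problems) if problems else f"{len(paths)} paths online"
    return worst, f"{LABELS[worst]} - {detail}"
```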
services.cfg file:
define service {
use generic-service
hostgroup_name Sun Cluster
service_description Sun Cluster Transport Paths
check_command check_nrpe!check_sun_cluster_transport
}
nrpe.cfg file:
command[check_sun_cluster_transport]=/usr/local/nagios/libexec/check_sun_cluster.pl -t
The plugin uses the command clrg status to get the status of the resource groups:
$ clrg status
=== Cluster Resource Groups ===
Group Name       Node Name       Suspended      Status
----------       ---------       ---------      ------
nfs-rg           vincent         No             Offline
                 theo            No             Online
The following are the possible states of a resource group:
+------------------------+---------------+
| Resource Group status  | Nagios status |
+------------------------+---------------+
| Online                 | OK            |
| Unknown                | WARNING       |
| Degraded               | CRITICAL      |
| Faulted                | CRITICAL      |
| Offline                | CRITICAL      |
+------------------------+---------------+
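In the clrg output the group name appears only on the first row of each group, so a parser has to carry it over to the rows that follow. The Python sketch below does this by counting fields (four on a full row, three on a continuation row) and then applies the table above to each node's status. The function names are hypothetical, and the sketch deliberately stops at the per-node mapping: how the plugin combines the nodes of one group into a single result is not described here.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # Nagios exit codes used in this sketch

# Mapping taken from the table above.
GROUP_STATES = {
    "Online": OK,
    "Unknown": WARNING,
    "Degraded": CRITICAL,
    "Faulted": CRITICAL,
    "Offline": CRITICAL,
}

SAMPLE = """\
=== Cluster Resource Groups ===

Group Name       Node Name       Suspended      Status
----------       ---------       ---------      ------
nfs-rg           vincent         No             Offline
                 theo            No             Online
"""


def parse_groups(output):
    """Return {group: {node: status}}, carrying the group name onto continuation rows."""
    groups, current = {}, None
    for line in output.splitlines():
        fields = line.split()
        if not fields or fields[-1] not in GROUP_STATES:
            continue  # titles, the column header and the dashed rule
        if len(fields) == 4:  # full row: group, node, suspended, status
            current, node = fields[0], fields[1]
        elif len(fields) == 3 and current is not None:  # continuation: node, suspended, status
            node = fields[0]
        else:
            continue
        groups.setdefault(current, {})[node] = fields[-1]
    return groups


def nagios_view(groups):
    """Translate every per-node status into its Nagios code."""
    return {
        group: {node: GROUP_STATES[status] for node, status in nodes.items()}
        for group, nodes in groups.items()
    }
```

For the sample output this yields code 2 for vincent and code 0 for theo within nfs-rg.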
services.cfg file:
define service {
use generic-service
hostgroup_name Sun Cluster
service_description Sun Cluster Resource Groups
check_command check_nrpe!check_sun_cluster_groups
}
nrpe.cfg file:
command[check_sun_cluster_groups]=/usr/local/nagios/libexec/check_sun_cluster.pl -g
The plugin uses the command clrs status to get the status of the resources:
$ clrs status
=== Cluster Resources ===
Resource Name     Node Name     State       Status Message
-------------     ---------     -----       --------------
nfs-stor          vincent       Offline     Offline
                  theo          Online      Online
orangecat-nfs     vincent       Offline     Offline
                  theo          Online      Online - LogicalHostname online.
nfs-res           vincent       Offline     Offline
                  theo          Online      Online - Service is online.
The following are the possible states of a resource:
+----------------------+---------------+
| Resource status      | Nagios status |
+----------------------+---------------+
| Online               | OK            |
| Online_not_monitored | WARNING       |
| Starting             | WARNING       |
| Offline              | CRITICAL      |
| Start_failed         | CRITICAL      |
| Stop_failed          | CRITICAL      |
| Monitor_failed       | CRITICAL      |
| Stopping             | CRITICAL      |
| Not_online           | CRITICAL      |
+----------------------+---------------+
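The resource table has a free-text Status Message column, so rows cannot be told apart by field count alone. The Python sketch below assumes the column-aligned layout that clrs prints, in which continuation rows start with whitespace because the Resource Name column is empty. It then tallies the rows per Nagios code using the table above. The helper names are illustrative only.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # Nagios exit codes used in this sketch

# Mapping taken from the table above.
RESOURCE_STATES = {
    "Online": OK,
    "Online_not_monitored": WARNING,
    "Starting": WARNING,
    "Offline": CRITICAL,
    "Start_failed": CRITICAL,
    "Stop_failed": CRITICAL,
    "Monitor_failed": CRITICAL,
    "Stopping": CRITICAL,
    "Not_online": CRITICAL,
}

SAMPLE = """\
=== Cluster Resources ===

Resource Name     Node Name     State       Status Message
-------------     ---------     -----       --------------
nfs-stor          vincent       Offline     Offline
                  theo          Online      Online
orangecat-nfs     vincent       Offline     Offline
                  theo          Online      Online - LogicalHostname online.
"""


def parse_resources(output):
    """Return a list of (resource, node, state, message) tuples."""
    rows, current = [], None
    for line in output.splitlines():
        # Assumption: an indented row continues the previous resource.
        continuation = line[:1].isspace()
        fields = line.split()
        if continuation and current is not None:
            fields = [current] + fields  # re-attach the resource name
        if len(fields) < 3 or fields[2] not in RESOURCE_STATES:
            continue  # titles, the column header and the dashed rule
        current = fields[0]
        rows.append((fields[0], fields[1], fields[2], " ".join(fields[3:])))
    return rows


def count_by_status(rows):
    """Count the rows that fall under each Nagios code."""
    counts = {OK: 0, WARNING: 0, CRITICAL: 0}
    for _, _, state, _ in rows:
        counts[RESOURCE_STATES[state]] += 1
    return counts
```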
services.cfg file:
define service {
use generic-service
hostgroup_name Sun Cluster
service_description Sun Cluster Resources
check_command check_nrpe!check_sun_cluster_resources
}
nrpe.cfg file:
command[check_sun_cluster_resources]=/usr/local/nagios/libexec/check_sun_cluster.pl -r
The option -a runs all the previous checks.
services.cfg file:
define service {
use generic-service
hostgroup_name Sun Cluster
service_description Sun Cluster
check_command check_nrpe!check_sun_cluster
}
nrpe.cfg file:
command[check_sun_cluster]=/usr/local/nagios/libexec/check_sun_cluster.pl -a
If everything is OK, the plugin returns:
OK - [NODES OK] [QUORUM OK] [TRANSPORT OK] [GROUPS OK] [RESOURCES OK]
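A combined line of this shape can be built from the individual check results. The Python sketch below reproduces the all-OK line shown above; the output for non-OK cases and the severity order (CRITICAL over UNKNOWN over WARNING over OK) are assumptions of this example, not documented plugin behaviour.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Assumed severity order, least to most severe.
SEVERITY = [OK, WARNING, UNKNOWN, CRITICAL]


def summarize(results):
    """Combine an ordered list of (check name, Nagios code) pairs into one result."""
    overall = max((code for _, code in results), key=SEVERITY.index)
    parts = " ".join(f"[{name} {LABELS[code]}]" for name, code in results)
    return overall, f"{LABELS[overall]} - {parts}"


if __name__ == "__main__":
    code, message = summarize([
        ("NODES", OK),
        ("QUORUM", OK),
        ("TRANSPORT", OK),
        ("GROUPS", OK),
        ("RESOURCES", OK),
    ])
    print(message)
    raise SystemExit(code)  # Nagios reads the plugin state from the exit code
```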
- Rafael Casado Sánchez
- Johannes Egger
Email your feedback at [email protected]
You can also report bugs or suggest features using the issue tracker on GitHub: https://github.com/rafacas/nagios-plugin-sun-cluster/issues