We had to create a plugin that does the following:
1) Run a typical traceroute from the Nagios box to a destination IP
2) Instead of measuring the time from the Nagios server to the destination host, measure the time between two hops along the way
In other words, a typical traceroute goes:
NagiosServer –> Gateway –> Hop 1 –> Hop 2 –> Hop 3 –> Destination
When configured correctly, this plugin checks the time (in ms) from Hop 1 through Hop 3, measured as the sum of the round-trip times traceroute reports for each of those hops, plots it on a graph, and applies warning and critical thresholds for your alerting.
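The alerting follows the usual Nagios plugin convention: exit code 0 is OK, 1 is WARNING and 2 is CRITICAL (3, UNKNOWN, is reserved for errors such as missing arguments), and a threshold only triggers when the value is strictly greater than it. A minimal sketch of that mapping, using a hypothetical nagios_state helper that is not part of the plugin, with awk doing the comparison because bash arithmetic only handles integers:

```shell
#!/bin/bash
# Hypothetical helper (not part of the plugin): map a measured time in ms
# to a Nagios state. Prints the state name and returns the plugin exit code
# (0 = OK, 1 = WARNING, 2 = CRITICAL). Thresholds trigger when exceeded.
nagios_state() {
    local value=$1 warn=$2 crit=$3
    # awk exits 0 (shell "true") when the comparison holds; decimals work too
    if awk -v v="$value" -v c="$crit" 'BEGIN { exit !(v + 0 > c + 0) }'; then
        echo "CRITICAL"
        return 2
    fi
    if awk -v v="$value" -v w="$warn" 'BEGIN { exit !(v + 0 > w + 0) }'; then
        echo "WARNING"
        return 1
    fi
    echo "OK"
    return 0
}

# Example with warning=10 ms and critical=20 ms
for ms in 5.909 12.5 25; do
    # "&& rc=0 || rc=$?" captures the return code without tripping set -e
    state=$(nagios_state "$ms" 10 20) && rc=0 || rc=$?
    echo "$ms ms -> $state (exit $rc)"
done
```

This prints OK (exit 0), WARNING (exit 1) and CRITICAL (exit 2) for the three sample values.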
Here is the plugin, along with the configuration files you will probably need.
NOTE: You may need to tweak it for operating systems other than Debian, as it was created and tested on Debian.
The plugin
- Place the plugin in your Nagios plugin directory, typically /usr/local/nagios/libexec
- Paste the script below into a file named trace-time (the name commands.cfg refers to)
- Make sure it is owned by the nagios user and is executable, e.g.
- chown nagios:nagios /usr/local/nagios/libexec/trace-time
- chmod +x /usr/local/nagios/libexec/trace-time
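Lowercase +x matters in that last step: with GNU coreutils, capital +X only adds execute permission to directories and to files that already have an execute bit set for someone, so on a freshly pasted file it silently does nothing. The throwaway check below uses a temporary file from mktemp rather than the real plugin path; it should report "not executable" after +X and "executable" after +x.

```shell
#!/bin/bash
# Compare chmod +X and chmod +x on a throwaway file (not the real plugin).
f=$(mktemp)
chmod 644 "$f"            # typical mode of a newly pasted file: no exec bits

chmod +X "$f"             # capital X: no-op on a regular file without exec bits
if [ -x "$f" ]; then echo "after +X: executable"; else echo "after +X: not executable"; fi

chmod +x "$f"             # lowercase x: adds execute permission
if [ -x "$f" ]; then echo "after +x: executable"; else echo "after +x: not executable"; fi

rm -f "$f"
```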
#####START PLUGIN#####
#!/bin/bash
#
# usage:
#   ./trace-time <final-dest> <startip> <endip> <warning> <critical>
# Pass "" as <endip> to end at <final-dest>; every other argument is required.
# Tip: run a traceroute first, then decide which hop to start and end at.
# The result is the sum of the RTTs (ms) traceroute reports for each hop
# from <startip> to <endip>, inclusive.
#
DEST=$1
IP1=$2
IP2=$3
WARNING=$4
CRITICAL=$5
PROG=$(command -v traceroute)

if [[ -z $PROG ]]; then
    echo "UNKNOWN: traceroute not found"
    exit 3
fi
if [[ -z $DEST ]]; then
    echo "UNKNOWN: No destination ip defined"
    exit 3
fi
if [[ -z $IP1 ]]; then
    echo "UNKNOWN: No start ip defined"
    exit 3
fi
if [[ -z $IP2 ]]; then
    IP2=$DEST
fi
if [[ -z $WARNING ]]; then
    echo "UNKNOWN: No warning value defined"
    exit 3
fi
if [[ -z $CRITICAL ]]; then
    echo "UNKNOWN: No critical value defined"
    exit 3
fi
# Numeric comparison via awk ([[ $a > $b ]] would compare strings: "9" > "10")
if awk -v w="$WARNING" -v c="$CRITICAL" 'BEGIN { exit !(w + 0 > c + 0) }'; then
    echo "UNKNOWN: Warning value larger than critical value"
    exit 3
fi
#
# Unique temporary file, removed automatically when the script exits
tempfile=$(mktemp) || { echo "UNKNOWN: Cannot create temporary file"; exit 3; }
trap 'rm -f "$tempfile"' EXIT
#
"$PROG" -n -q 1 "$DEST" > "$tempfile" 2>/dev/null
#
# Hop numbers of the start and end IPs (exact match on the address column)
numberip1=$(awk -v ip="$IP1" '$2 == ip { print $1; exit }' "$tempfile")
numberip2=$(awk -v ip="$IP2" '$2 == ip { print $1; exit }' "$tempfile")
if [[ -z $numberip1 || -z $numberip2 ]]; then
    echo "UNKNOWN: $IP1 or $IP2 not found in traceroute output"
    exit 3
fi
#
# Sum the RTT column of hops numberip1..numberip2 (numeric match on the hop
# column, so hop 1 does not also pick up hops 10, 11, ...)
startcalc=$(awk -v a="$numberip1" -v b="$numberip2" \
    '$1 + 0 >= a + 0 && $1 + 0 <= b + 0 { s += $3 } END { print s + 0 }' "$tempfile")
#
# OUTPUTS (perfdata: 'label'=value;warn;crit;min;max)
#
grapher="$IP1-->$IP2"
perfdata="'$grapher'=$startcalc;$WARNING;$CRITICAL;;"
#
if awk -v v="$startcalc" -v c="$CRITICAL" 'BEGIN { exit !(v + 0 > c + 0) }'; then
    echo "CRITICAL($startcalc): Time exceeds critical value|$perfdata"
    exit 2
fi
if awk -v v="$startcalc" -v w="$WARNING" 'BEGIN { exit !(v + 0 > w + 0) }'; then
    echo "WARNING($startcalc): Time exceeds warning value|$perfdata"
    exit 1
fi
echo "OK($startcalc): Time OK|$perfdata"
exit 0
#####END PLUGIN#####
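The warning/critical sanity check is an easy place to trip over bash: inside [[ ]], the > operator compares strings lexicographically, so "9" counts as larger than "10". Numeric threshold comparisons are therefore best done in awk, which also copes with decimal thresholds such as 7.5 that bash's integer -gt rejects. A minimal demonstration, using hard-coded example values:

```shell
#!/bin/bash
# Inside [[ ]], '>' compares strings, so "9" sorts after "10".
warning=9
critical=10

if [[ $warning > $critical ]]; then
    echo "string compare: warning > critical"
fi

# awk converts both operands to numbers (decimals allowed), giving the real answer.
if awk -v w="$warning" -v c="$critical" 'BEGIN { exit !(w + 0 > c + 0) }'; then
    echo "numeric compare: warning > critical"
else
    echo "numeric compare: warning <= critical"
fi
```

The string comparison wrongly reports warning > critical, while the awk comparison correctly reports warning <= critical.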
Nagios – Host.cfg
define host{
    use         debian5-linuxserver
    host_name   Google WWW server
    alias       For Tracing TimeHop Distances
    address     209.85.175.105
}
Nagios – commands.cfg
define command{
    command_name    check_time_between_hosts
    command_line    $USER1$/trace-time $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$ $ARG4$
}
Nagios – services.cfg
define service{
    use                     debian5-linuxservice
    host_name               Google WWW server
    service_description     Between IP 210.5.40.153 and 113.23.161.66
    action_url              /nagios/pnp/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$
    check_command           check_time_between_hosts!210.5.40.153!113.23.161.66!10!20
}
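When the service runs, Nagios replaces $HOSTADDRESS$ with the host's address and $ARG1$ to $ARG4$ with the !-separated fields after the command name in check_command, so ARG1 ends up as the plugin's second positional argument (the start IP). The hypothetical expand_check function below only mimics that substitution for this one command definition, to show the resulting command line; it is not how Nagios itself implements macro expansion.

```shell
#!/bin/bash
# Hypothetical illustration of how check_command fields line up with the
# plugin arguments: $HOSTADDRESS$ -> $1, $ARG1$..$ARG4$ -> $2..$5.
expand_check() {
    local hostaddress=$1 check_command=$2
    local fields
    # Split "name!arg1!arg2!..." on '!' into an array
    IFS='!' read -r -a fields <<< "$check_command"
    # fields[0] is the command name; fields[1..4] are ARG1..ARG4
    echo "\$USER1\$/trace-time $hostaddress ${fields[1]} ${fields[2]} ${fields[3]} ${fields[4]}"
}

expand_check 209.85.175.105 'check_time_between_hosts!210.5.40.153!113.23.161.66!10!20'
```

This prints $USER1$/trace-time 209.85.175.105 210.5.40.153 113.23.161.66 10 20, i.e. destination, start IP, end IP, warning and critical, in the order the plugin expects.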
Now restart Nagios to load the new configuration.
More info
To find the hops you want to monitor, first run a traceroute:
# traceroute -n -q 1 209.85.175.105
-n = Numeric output (do not resolve hostnames)
-q 1 = Send only one probe per hop
In the example below, I am tracing to one of Google’s servers at 209.85.175.105. The output looks like this (NOTE: the actual IPs have been changed):
1 111.22.42.3 0.554 ms
2 111.22.40.153 0.667 ms
3 111.22.40.125 1.026 ms
4 203.188.233.121 1.218 ms
5 203.188.233.205 1.488 ms
6 113.23.161.66 1.627 ms
7 209.85.242.246 1.542 ms
8 209.85.242.125 2.322 ms
9 66.249.94.158 3.075 ms
10 209.85.175.105 2.801 ms
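The core calculation can be tried against a saved trace without touching Nagios. This sketch feeds the sample output above through awk via a here-document: it looks up the hop numbers of the start and end IPs by exact match on the address column, then adds up the RTT column for every hop in that range, which is the value the plugin compares against the thresholds. Matching whole fields matters: a prefix grep such as ^1 would also match hop 10, and grepping for 111.22.40.15 would also match 111.22.40.153. The helper names hop_of and sum_rtt are hypothetical.

```shell
#!/bin/bash
# Sample "traceroute -n -q 1" output from this post (IPs already changed)
trace=$(cat <<'EOF'
 1  111.22.42.3  0.554 ms
 2  111.22.40.153  0.667 ms
 3  111.22.40.125  1.026 ms
 4  203.188.233.121  1.218 ms
 5  203.188.233.205  1.488 ms
 6  113.23.161.66  1.627 ms
 7  209.85.242.246  1.542 ms
 8  209.85.242.125  2.322 ms
 9  66.249.94.158  3.075 ms
10  209.85.175.105  2.801 ms
EOF
)

# Hop number whose address column ($2) exactly equals the given IP
hop_of() {
    awk -v ip="$1" '$2 == ip { print $1; exit }' <<< "$trace"
}

# Sum of the RTT column ($3) for hops $1..$2 inclusive (numeric hop match)
sum_rtt() {
    awk -v a="$1" -v b="$2" \
        '$1 + 0 >= a + 0 && $1 + 0 <= b + 0 { s += $3 } END { print s + 0 }' <<< "$trace"
}

start=$(hop_of 111.22.40.153)   # hop 2
end=$(hop_of 113.23.161.66)     # hop 6
sum_rtt "$start" "$end"         # prints 6.026
```

On this trace the range from 111.22.40.153 to 113.23.161.66 (hops 2 to 6) adds up to 6.026 ms; the 5.909 in the sample plugin output later in this post presumably comes from a separate traceroute run.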
So let's say you want to measure the time between IP 111.22.40.153 and IP 113.23.161.66. To test, run the plugin from the command line with these values:
# ./trace-time 209.85.175.105 111.22.40.153 113.23.161.66 10 20
The output looks like this:
OK(5.909): Time OK|'111.22.40.153-->113.23.161.66'=5.909;10;20;;
*This is the standard output format Nagios expects; the performance data after the | is what PNP uses for graphing.
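Everything after the | is performance data in the 'label'=value;warn;crit;min;max layout described in the Nagios plugin guidelines. The sketch below splits the sample line back into its parts using only bash parameter expansion and read with IFS; the parse_perfdata name is hypothetical, and it only handles the single-metric output this plugin produces.

```shell
#!/bin/bash
# Hypothetical parser for a single-metric plugin output line:
#   STATUS TEXT|'label'=value;warn;crit;min;max
parse_perfdata() {
    local line=$1
    local text="${line%%|*}"                  # human-readable part before '|'
    local perf="${line#*|}"                   # performance data after '|'
    local label="${perf%%=*}"                 # 'label' up to the first '='
    label="${label#\'}"; label="${label%\'}"  # strip surrounding single quotes
    local value warn crit rest
    IFS=';' read -r value warn crit rest <<< "${perf#*=}"
    echo "text=$text"
    echo "label=$label value=$value warn=$warn crit=$crit"
}

parse_perfdata "OK(5.909): Time OK|'111.22.40.153-->113.23.161.66'=5.909;10;20;;"
```

For the sample line this prints text=OK(5.909): Time OK, followed by label=111.22.40.153-->113.23.161.66 value=5.909 warn=10 crit=20.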
Graphs will look like this