We had to create a plugin that does the following:
1) Do a typical traceroute from the Nagios box to a destination IP
2) Instead of measuring the time between the Nagios server and the destination host, report the time between two hops along the way
In other words, a typical traceroute follows a path like this:
NagiosServer –> Gateway –> Hop 1 –> Hop 2 –> Hop 3 –> Destination
When configured correctly, this plugin checks the time (in ms) from Hop 1 up to Hop 3, plots a graph, and lets you set warning and critical values for alerting.
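As a rough sketch of that calculation (not the plugin itself), the snippet below adds up the per-hop times that `traceroute -n -q 1` reports for a range of hop numbers. The function name `sum_hop_times`, the sample output, and the addresses (taken from documentation ranges) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: sum the time column of `traceroute -n -q 1` output
# for every hop numbered between $1 and $2 (inclusive). Reads stdin.
sum_hop_times() {
    awk -v first="$1" -v last="$2" '
        # Only numbered hop lines of the form: <hop> <ip> <time> ms
        $1 ~ /^[0-9]+$/ && $1 >= first && $1 <= last && $4 == "ms" { total += $3 }
        END { printf "%.3f\n", total }
    '
}

# Made-up sample output; the addresses are not real hops.
sample_trace='traceroute to 198.51.100.9 (198.51.100.9), 30 hops max, 60 byte packets
 1  192.0.2.1  0.554 ms
 2  192.0.2.2  0.667 ms
 3  192.0.2.3  1.026 ms
 4  198.51.100.9  1.218 ms'

# Sum hops 2 through 3 only.
printf '%s\n' "$sample_trace" | sum_hop_times 2 3
```

With this sample, hops 2 and 3 add up to 0.667 + 1.026 ms.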
Here’s the sample plugin and the relevant configuration files you will probably need.
NOTE: This was created and tested on Debian, so you may need to tweak it for other OSes.
The plugin
- The plugin (typically placed in /usr/local/nagios/libexec)
- Paste the script below into a file, say trace-time
- Make sure it belongs to user <nagios> and is executable; e.g.
- chown nagios:nagios /usr/local/nagios/libexec/trace-time
- chmod +x /usr/local/nagios/libexec/trace-time
#####START PLUGIN#####
#!/bin/bash
#
# usage
# ./trace-time <final-dest> <startip> <endip> <warning> <critical>
# Note: <final-dest>, <startip>, <warning> and <critical> are required.
# If <endip> is empty, <final-dest> is used as the end ip.
# tip: do a traceroute first, then determine from which ip to which ip you want to calculate.
#
DEST=$1
IP1=$2
IP2=$3
WARNING=$4
CRITICAL=$5
PROG=`which traceroute`
if [[ $DEST == "" ]]; then
    echo "UNKNOWN: No destination ip defined"
    exit 3
fi
if [[ $IP1 == "" ]]; then
    echo "UNKNOWN: No start ip defined"
    exit 3
fi
if [[ $IP2 == "" ]]; then
    IP2=$DEST
fi
if [[ $WARNING == "" ]]; then
    echo "UNKNOWN: No warning value defined"
    exit 3
fi
if [[ $CRITICAL == "" ]]; then
    echo "UNKNOWN: No critical value defined"
    exit 3
fi
# Compare numerically with awk (handles decimals); [[ a > b ]] compares as strings
if awk -v w="$WARNING" -v c="$CRITICAL" 'BEGIN{exit !(w+0 > c+0)}'
then
    echo "UNKNOWN: Warning value larger than critical value"
    exit 3
fi
#
# Unique temporary file, so checks running in the same second do not collide
tempfile=`mktemp /tmp/trace-time.XXXXXX`
#
$PROG -n -q 1 "$DEST" > "$tempfile"
#
# Hop numbers of the start and end ip (-w -F: match the whole address literally)
numberip1=`grep ms "$tempfile" | grep -w -F "$IP1" | awk '{print $1}' | head -n 1`
numberip2=`grep ms "$tempfile" | grep -w -F "$IP2" | awk '{print $1}' | head -n 1`
#
if [[ $numberip1 == "" || $numberip2 == "" ]]; then
    rm -f "$tempfile"
    echo "UNKNOWN: Start or end ip not found in traceroute output"
    exit 3
fi
#
# Add up the time (column 3) of every hop from numberip1 to numberip2.
# Comparing the hop number as a number keeps hop 1 from also matching hops 10-19.
startcalc=`awk -v a="$numberip1" -v b="$numberip2" '$1 ~ /^[0-9]+$/ && $1+0 >= a+0 && $1+0 <= b+0 {s+=$3} END {print s+0}' "$tempfile"`
#
rm -f "$tempfile"
#
# OUTPUTS
#
grapher="$IP1-->$IP2"
perfdata="'$grapher'=$startcalc;$WARNING;$CRITICAL;;"
#
if awk -v t="$startcalc" -v c="$CRITICAL" 'BEGIN{exit !(t+0 > c+0)}'
then
    echo "CRITICAL($startcalc): Time exceeds critical value|$perfdata"
    exit 2
fi
if awk -v t="$startcalc" -v w="$WARNING" 'BEGIN{exit !(t+0 > w+0)}'
then
    echo "WARNING($startcalc): Time exceeds warning value|$perfdata"
    exit 1
else
    echo "OK($startcalc): Time OK|$perfdata"
    exit 0
fi
#####END PLUGIN#####
Nagios – Host.cfg
define host{
        use                     debian5-linuxserver
        host_name               Google WWW server
        alias                   For Tracing TimeHop Distances
        address
        }
Nagios – commands.cfg
define command{
        command_name    check_time_between_hosts
        command_line    $USER1$/trace-time $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$ $ARG4$
        }
Nagios – services.cfg
define service{
        use                     debian5-linuxservice
        host_name               Google WWW server
        service_description     Between IP to
        action_url              /nagios/pnp/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$
        check_command           check_time_between_hosts!!!10!20
        }
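For reference, Nagios splits the check_command value on "!" and hands the pieces to the command definition as $ARG1$ to $ARG4$. The sketch below imitates that split in plain shell so you can see which value lands where. The function name `show_plugin_call` and the addresses (taken from documentation ranges) are invented for illustration, and /usr/local/nagios/libexec is only assumed to be what $USER1$ points at.

```shell
#!/bin/sh
# Hypothetical illustration: mimic how Nagios maps check_command fields
# onto $HOSTADDRESS$ and $ARG1$..$ARG4$ for the trace-time command.
show_plugin_call() {
    host_address=$1          # stands in for $HOSTADDRESS$
    check_command=$2         # name!arg1!arg2!arg3!arg4
    old_ifs=$IFS
    IFS='!'
    set -f                   # no filename globbing while splitting
    set -- $check_command    # $1 = command name, $2..$5 = ARG1..ARG4
    set +f
    IFS=$old_ifs
    # Assumption: $USER1$ resolves to /usr/local/nagios/libexec
    echo "/usr/local/nagios/libexec/trace-time $host_address $2 $3 $4 $5"
}

# Made-up addresses, not real hops.
show_plugin_call 198.51.100.9 'check_time_between_hosts!192.0.2.2!192.0.2.3!10!20'
```

So the start IP is $ARG1$, the end IP is $ARG2$, and the warning and critical thresholds are $ARG3$ and $ARG4$.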
Now, just restart Nagios to make it work.
More info
In order for you to know the hop you wish to monitor, simply do a traceroute;
# traceroute -n -q 1
-n = Numeric output
-q 1 = Only send a single query per hop
In the example below, I am tracing to one of Google’s servers; the output of the trace looks like this (NOTE: actual IPs have been changed)
1 0.554 ms
2 0.667 ms
3 1.026 ms
4 1.218 ms
5 1.488 ms
6 1.627 ms
7 1.542 ms
8 2.322 ms
9 3.075 ms
10 2.801 ms
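To pick the start and end hop out of output like this, you only need the hop number printed in front of each address. Here is a minimal sketch of that lookup, assuming the usual "hop, address, time, ms" columns of `traceroute -n -q 1`; the function name `hop_for_ip` and the sample addresses are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the hop number whose address equals $1.
# Reads `traceroute -n -q 1` output on stdin.
hop_for_ip() {
    # Compare the whole address field, so 192.0.2.1 does not match 192.0.2.10
    awk -v ip="$1" '$2 == ip && $4 == "ms" { print $1; exit }'
}

# Made-up sample lines; the addresses come from a documentation range.
sample_trace=' 1  192.0.2.1  0.554 ms
 2  192.0.2.2  0.667 ms
 3  192.0.2.3  1.026 ms
10  192.0.2.10  2.801 ms'

printf '%s\n' "$sample_trace" | hop_for_ip 192.0.2.1
printf '%s\n' "$sample_trace" | hop_for_ip 192.0.2.10
```

Matching the full address field rather than a substring matters once the trace has ten or more hops or addresses that share a prefix.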
So let’s say you wish to trace the time between IP and IP 113.23.161.66; simply run the plugin with these values on the CLI (to test):
# ./trace-time 10 20
And the output will look like this;
OK(5.909): Time OK|'>'=5.909;10;20;;
*This is the typical output format expected by Nagios with PNP graphing enabled
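The part after the "|" is the performance data, in the form 'label'=value;warn;crit;min;max. If you want to check by hand what the grapher will receive, a small sketch like the one below pulls the fields apart. The function name `parse_perfdata` and the addresses in the sample line are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: split one line of plugin output into its perfdata fields.
parse_perfdata() {
    perf=${1#*|}             # everything after the first "|"
    label=${perf%%=*}        # text before the first "="
    values=${perf#*=}        # value;warn;crit;min;max
    old_ifs=$IFS
    IFS=';'
    set -f                   # no filename globbing while splitting
    set -- $values           # $1 = value, $2 = warn, $3 = crit
    set +f
    IFS=$old_ifs
    echo "label=$label value=$1 warn=$2 crit=$3"
}

# Sample line in the plugin's OK format, with made-up addresses.
parse_perfdata "OK(5.909): Time OK|'192.0.2.2-->192.0.2.3'=5.909;10;20;;"
```

The min and max fields are left empty by the plugin, which is why the line ends in ";;".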
Graphs will look like this