Route Hops: How to Create a Nagios Plugin to Monitor Them
An important metric in network diagnostics is the number of hops it takes to reach a target host. In this article, we will create a plugin for Nagios using Python to monitor route hops to a target host.
Step 1: Prerequisites
- Python 3: The script is written in Python, so make sure it is installed so you can test the plugin as you develop it. It can be downloaded from python.org.
- A code editor: I used VS Code to create this plugin; it can be downloaded from the Visual Studio Code website.
Step 2: Set Up the Python Script
Create a new Python file named check_route_hops.py and begin with a shebang line and essential import statements.
#!/usr/bin/env python3
import sys
import traceback
import argparse
import subprocess
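def main():
    pass  # Placeholder; the body is filled in during Steps 3 and 5

if __name__ == "__main__":
    main()
Define the Nagios status codes to make the code more readable. Place them near the top of the file, below the imports, so they exist before main() runs.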
OK = 0
WARNING = 1
CRITICAL = 2
UNKNOWN = 3
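A plugin reports its result in two ways: the text it prints and the exit code it returns. The sketch below shows one way to keep the two in sync. The names STATUS_LABELS, format_status, and exit_with are hypothetical helpers for illustration and are not part of the script built in this article.

```python
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical lookup table from exit code to the label shown in the status line
STATUS_LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

def format_status(code, detail):
    """Build the single status line a plugin prints before exiting."""
    return f"{STATUS_LABELS[code]} - {detail}"

def exit_with(code, detail):
    """Print the status line, then exit with the matching status code."""
    print(format_status(code, detail))
    sys.exit(code)

print(format_status(WARNING, "12 hops were counted to 192.0.2.10"))
```

Keeping the label lookup next to the constants means the printed label and the exit code cannot drift apart.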
Now, create a global variable for the command line arguments.
_args = None
Step 3: Parse Command-Line Arguments
The plugin should accept arguments to specify the target host, thresholds, and other options. Use Python’s argparse module to define and handle these arguments. This code should be placed within the main function.
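As a quick illustration of how argparse turns a command line into attributes, here is a reduced sketch with only the host and threshold options. The build_parser and thresholds_consistent functions are hypothetical names, and the consistency check is an optional extra that the plugin in this article does not perform. The full parser for the plugin follows.

```python
import argparse

def build_parser():
    """Reduced version of the plugin's parser: host plus the two thresholds."""
    parser = argparse.ArgumentParser(description="Hop-count check (reduced sketch)")
    parser.add_argument('-H', '--host', required=True, type=str)
    parser.add_argument('-w', '--warning', type=int)
    parser.add_argument('-c', '--critical', type=int)
    return parser

def thresholds_consistent(args):
    """Hypothetical sanity check: warning should not exceed critical when both are set."""
    if args.warning is None or args.critical is None:
        return True
    return args.warning <= args.critical

# Passing an explicit list instead of sys.argv makes the parser easy to try out
args = build_parser().parse_args(['-H', '192.0.2.10', '-w', '10', '-c', '20'])
print(args.host, args.warning, args.critical)
print(thresholds_consistent(args))
```

Note that argparse itself exits with status 2 when parsing fails, which Nagios would read as CRITICAL; whether that is acceptable for usage errors is a design choice left to you.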
    global _args  # Assign to the module-level variable rather than creating a local one
    parser = argparse.ArgumentParser(
        description='This script checks the number of hops to a target host using traceroute. '
                    'You can define warning and critical thresholds for hop counts. '
                    'If the host is unreachable with the default protocol (UDP) other protocols will be tried.')
    # Define the arguments
    parser.add_argument('-H', '--host', required=True, type=str, help='Target host (name or IP address) to check hop count')
    parser.add_argument('-t', '--timeout', default=3, type=int, help='Timeout duration in seconds for the check (default is 3)')
    parser.add_argument('-w', '--warning', required=False, type=int, help='Warning threshold for hop count')
    parser.add_argument('-c', '--critical', required=False, type=int, help='Critical threshold for hop count')
    parser.add_argument('-v', '--verbose', required=False, type=int, default=0, help=(
        'The verbosity level of the output. '
        '(0: Single line summary, '
        '1: Single line with additional information, '
        '2: Multi line with configuration debug output)'
    ))
    parser.add_argument('-p', '--protocol', type=str, default='UDP', help='First protocol to attempt for traceroute (default: UDP)')
    parser.add_argument('--debug', action='store_true', help='Enable debug mode')
    _args = parser.parse_args(sys.argv[1:])
Step 4: Implement the Traceroute Logic
Create a helper function to execute traceroute and calculate the number of hops to the host. Use subprocess.run to call the traceroute command. The stdout result of the command is then used to count the number of hops to the host. Different protocols are used by adding additional arguments to the command.
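To see what the hop counting amounts to without running traceroute, here is a small sketch that applies the same line-counting idea to canned output. SAMPLE_OUTPUT is made-up illustrative text in the style of Linux traceroute, and count_hops is a hypothetical name; the plugin's own helper follows.

```python
# Illustrative output only; the addresses are documentation ranges, not real hosts
SAMPLE_OUTPUT = """traceroute to 192.0.2.10 (192.0.2.10), 30 hops max, 60 byte packets
 1  198.51.100.1 (198.51.100.1)  0.412 ms  0.398 ms  0.377 ms
 2  203.0.113.5 (203.0.113.5)  4.120 ms  4.101 ms  4.087 ms
 3  192.0.2.10 (192.0.2.10)  9.731 ms  9.702 ms  9.688 ms
"""

def count_hops(traceroute_stdout):
    """Count hop lines: every line after the one-line header."""
    lines = traceroute_stdout.splitlines()
    # Empty output would give -1, so clamp the result at zero
    return max(len(lines) - 1, 0)

print(count_hops(SAMPLE_OUTPUT))
```

With this approach every line after the header counts as a hop, including lines that show only asterisks for routers that did not reply.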
def get_route_hops(protocol):
    global _args
    try:
        result = None
        if protocol == "UDP":
            result = subprocess.run(['traceroute', _args.host], stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True, timeout=_args.timeout)
        elif protocol == "ICMP":
            # The -I option is placed before the host so the call does not rely on argument reordering
            result = subprocess.run(['traceroute', '-I', _args.host], stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True, timeout=_args.timeout)
        else:
            # Unsupported protocol name: report no hops so the caller can fall back
            return 0
        # Count the number of hops (lines returned by traceroute, minus the header line)
        hops = len(result.stdout.splitlines()) - 1
        return hops if hops > 0 else 0
    except subprocess.TimeoutExpired:
        # traceroute did not finish within the timeout
        return 0
    except Exception:
        # Any other failure, for example traceroute is not installed
        return 0
Step 5: Combine Logic in the Main Function
The main function should do the following:
- Parse arguments.
- Attempt traceroute on the specified protocol.
- Fall back to alternative protocols if the specified protocol fails.
- Compare the hop counts against the thresholds.
- Build and print a Nagios status message.
- Exit with the proper status code.
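The threshold comparison in the list above can be sketched as a small pure function, which makes it easy to try out on its own. The name evaluate_hops is hypothetical, and treating an unset threshold as "skip this check" is an assumption of this sketch.

```python
OK, WARNING, CRITICAL = 0, 1, 2

def evaluate_hops(hops, warning=None, critical=None):
    """Map a hop count to a Nagios status code; unset thresholds are skipped."""
    status = OK
    if warning is not None and hops >= warning:
        status = WARNING
    # Checked last so that critical overrides warning when both match
    if critical is not None and hops >= critical:
        status = CRITICAL
    return status

for hops in (5, 12, 25):
    print(hops, evaluate_hops(hops, warning=10, critical=20))
```

Checking the critical threshold after the warning threshold means the more severe status wins whenever both are exceeded.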
For testing, you can add debug statements that print additional information when the --debug argument is passed.
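One possible shape for such debug statements is sketched below. The debug function is a hypothetical helper, and sending its output to stderr is an assumption made here so that the status line on stdout stays clean.

```python
import sys

def debug(message, enabled):
    """Write a diagnostic line to stderr when debug mode is on."""
    if enabled:
        print(f"DEBUG: {message}", file=sys.stderr)

# In the plugin, the flag would come from the parsed --debug argument
debug("trying protocol UDP", enabled=True)
debug("this line is suppressed", enabled=False)
```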
    # Find the number of hops to the host using different protocols
    protocols = ['UDP', 'ICMP']
    if _args.protocol in protocols:
        protocols.remove(_args.protocol)
    # The requested protocol always goes first, so protocols[1:] holds the fallbacks
    protocols.insert(0, _args.protocol)
    hops = 0
    used_protocol = _args.protocol
    # Check the requested protocol first
    hops = get_route_hops(_args.protocol)
    # If the requested protocol fails, try the other protocols
    if hops == 0:
        for protocol in protocols[1:]:
            hops = get_route_hops(protocol)
            if hops > 0:
                used_protocol = protocol
                break
    if hops == 0:
        print("CRITICAL - Failed to determine hop count for all protocols")
        sys.exit(CRITICAL)
    # Set the status code based on the number of hops to the host
    status_code = OK
    if _args.warning is not None and hops >= _args.warning:
        status_code = WARNING
    if _args.critical is not None and hops >= _args.critical:
        status_code = CRITICAL
    status_dict = {
        0: "OK",
        1: "WARNING",
        2: "CRITICAL",
    }
    # Build the status message
    message = ''
    if hops == 1:
        message = f"{status_dict[status_code]} - {hops} hop was counted to {_args.host}"
    else:
        message = f"{status_dict[status_code]} - {hops} hops were counted to {_args.host}"
    print(message)
    sys.exit(status_code)
Step 6: Test the Plugin
Run your script manually to verify that the plugin behaves as expected. Try different combinations of arguments, including thresholds and protocols, and check the exit code as well as the printed message.
Here is an example of what you could run in your terminal:
python3 check_route_hops.py -H example.com -w 10 -c 20 -p ICMP --debug
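Because Nagios acts on the exit code as well as the printed text, it can help to check both in one go. The sketch below does this from Python. The run_check function is a hypothetical helper, and the stand-in command only imitates a plugin so the sketch runs anywhere; swap in the real plugin command to test your script.

```python
import subprocess
import sys

def run_check(command):
    """Run a plugin command and return (exit_code, first_line_of_stdout)."""
    result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    lines = result.stdout.splitlines()
    return result.returncode, (lines[0] if lines else "")

# Stand-in that imitates a plugin; for the real thing use something like
# [sys.executable, "check_route_hops.py", "-H", "example.com", "-w", "10", "-c", "20"]
stand_in = [sys.executable, "-c",
            "import sys; print('WARNING - 12 hops were counted'); sys.exit(1)"]
print(run_check(stand_in))
```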
Step 7: Upload your plugin to Nagios XI
Follow the steps here to add your new plugin to Nagios XI!
Conclusion
You’ve successfully created a custom Nagios plugin to monitor route hops. This plugin is highly customizable, allowing you to modify thresholds and protocols as needed. It’s a solid tool for monitoring network paths and troubleshooting connectivity issues.




