Monitor of Monitors: Easily Monitoring Nagios Servers

Shamas Demoret
Technical Content Manager

Day after day, IT admins around the world rely on Nagios solutions to verify the health of their infrastructures and keep things running smoothly, but monitoring Nagios servers is often overlooked. Who ‘watches the watchmen’? If you’re not monitoring your Nagios systems and they run into trouble, you’ll be left in the dark about the status of the critical assets they keep tabs on. In this article, we’ll discuss strategies you can employ for monitoring Nagios application servers, making sure they are available and performing as expected.

Start Monitoring Nagios XI Servers

Monitoring your Nagios XI server is a breeze using the built-in Nagios XI Monitoring Wizard. It provides easy access to a variety of key application metrics that you’ll want to keep an eye on, monitored via XI’s REST API:

  • Nagios XI Web Interface – checks the availability of the web UI
  • Monitoring Daemons – ensures that the monitoring engine and supporting daemons (nagios and npcd) are running
  • Monitoring Jobs – ensures that core monitoring jobs are running
  • Load – 1, 5, and 15 minute CPU load averages
  • I/O Wait – measures how much CPU time is spent waiting on disk reads and writes
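
If you'd like to see roughly what a REST-based check of the monitoring engine looks like, here is a minimal Python sketch. It is not the wizard's own code. The endpoint path (`api/v1/system/status`), the `apikey` parameter, and the `is_currently_running` field are assumptions based on the XI API's system status call, and `xi.example.com` is a placeholder, so confirm the details against the API documentation on your own XI server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def xi_status_url(base_url: str, api_key: str) -> str:
    """Build the system status URL (endpoint path is an assumption)."""
    query = urlencode({"apikey": api_key, "pretty": 1})
    return f"{base_url.rstrip('/')}/api/v1/system/status?{query}"


def engine_is_running(status: dict) -> bool:
    """Treat the engine as up when 'is_currently_running' is 1.

    The field name is an assumption; a missing field counts as down.
    """
    return str(status.get("is_currently_running", "0")) == "1"


def fetch_status(base_url: str, api_key: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON status document from the XI server."""
    with urlopen(xi_status_url(base_url, api_key), timeout=timeout) as resp:
        return json.load(resp)
```

Against a real server, something like `engine_is_running(fetch_status("https://xi.example.com/nagiosxi", "YOUR_API_KEY"))` would tell you whether the engine reports itself as running.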

It’s also recommended to monitor standard items such as memory, drive space, interface bandwidth, and CPU usage, as well as key system services such as mysqld/mariadb, crond, and apache (httpd). In this article we’ll use NCPA (the Nagios Cross Platform Agent) and its built-in monitoring wizard to easily configure monitoring of these items.
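
The wizard builds these checks for you, but it can help to see the kind of request NCPA answers. The sketch below only constructs URLs for NCPA's HTTP API. The port (5693), the `token` and `check` parameters, and the `services` node reflect how NCPA 2 is commonly queried, but treat them as assumptions and verify them against your agent version. The host name `xi01` and the token are placeholders.

```python
from urllib.parse import urlencode

NCPA_PORT = 5693  # NCPA's usual listener port (assumed default)


def ncpa_check_url(host: str, token: str, node: str, **params) -> str:
    """Build an NCPA API URL that asks the agent to run a check on `node`."""
    query = {"token": token, "check": 1, **params}
    return f"https://{host}:{NCPA_PORT}/api/{node.strip('/')}?{urlencode(query)}"


def service_check_url(host: str, token: str, service: str) -> str:
    """Build a URL that checks whether a system service is running."""
    return ncpa_check_url(host, token, "services", service=service, status="running")


# Services named in this article for a Nagios XI server.
XI_SERVICES = ["mysqld", "crond", "httpd"]

if __name__ == "__main__":
    print(ncpa_check_url("xi01", "mytoken", "cpu/percent", warning=80, critical=90))
    for name in XI_SERVICES:
        print(service_check_url("xi01", "mytoken", name))
```

NCPA typically ships with a self-signed certificate, so anything that actually fetches these URLs will need to handle certificate verification accordingly.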

Nagios monitoring of key services on an XI server.

It’s also worth having your production XI server monitor the XI server that watches your other Nagios servers. This best practice ensures that each server is checked for availability by the other.

For Nagios Core servers, you’ll want to monitor the nagios service and httpd, along with the same server performance metrics.

Monitoring Nagios Log Server

In addition to standard performance metrics like CPU, memory, and drive space, you’ll also want to monitor critical system services such as elasticsearch and logstash. For added power, you can create a BPI (Business Process Intelligence) group for your cluster for intelligent monitoring and root cause analysis of issues.

If you use NCPA, you’ll also notice a couple of built-in plugins at the bottom of Step 2 of the wizard that can be used to monitor cluster status and JVM heap data.
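
To give a feel for what a cluster status check involves, here is a small Python sketch that maps Elasticsearch's cluster health color to a Nagios plugin state. This is not the plugin bundled with the wizard, whose implementation isn't covered here. It relies on Elasticsearch's `_cluster/health` API and its `status` and `number_of_nodes` fields. The address `http://localhost:9200` is an assumption about where your Log Server's Elasticsearch instance listens.

```python
import json
from urllib.request import urlopen

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
_LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}
_STATE_BY_COLOR = {"green": OK, "yellow": WARNING, "red": CRITICAL}


def cluster_state(health: dict) -> tuple:
    """Translate a _cluster/health document into (return code, status line)."""
    color = health.get("status")
    code = _STATE_BY_COLOR.get(color, UNKNOWN)
    nodes = health.get("number_of_nodes", "?")
    return code, f"{_LABELS[code]}: cluster status is {color}, {nodes} node(s)"


def fetch_health(base_url: str = "http://localhost:9200", timeout: float = 10.0) -> dict:
    """Fetch cluster health from Elasticsearch (address is an assumption)."""
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)
```

A plugin built on this would print the status line and exit with the return code, so a yellow cluster shows up as a WARNING and a red one as CRITICAL.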

Key services being monitored on Nagios Log Server.

Monitoring Nagios Network Analyzer

In addition to key server performance checks, some critical Nagios Network Analyzer system services are nagiosna, mariadb/mysql, and httpd:

Key services monitored on Nagios Network Analyzer.

Monitoring Nagios Fusion

On your Fusion server, monitoring of mariadb/mysqld and httpd services is important, along with standard system performance metrics:

Key services monitored on Nagios Fusion.

Use the Force: Set Up Custom Alerts and Dashboards

Once you’ve set up monitoring of your Nagios servers, the next step will be to set up alerts so you and your team will know when problems occur.

You can also set up custom Dashboards for a quick visual reference of how your Nagios deployment is running, such as this one:

This Dashboard provides at-a-glance details on the status of the Nagios suite.

If you’re looking for details on setting them up, Mahad’s video on creating Dashboards in Nagios XI walks through the process.

Be Proactive: Capacity Planning, Automated Actions & Real-Time Insights

In addition to monitoring the current health of your Nagios servers, you can take things a step further by leveraging other great features of Nagios XI, such as Capacity Planning, Actions, Event Handlers, and the NCPA Web UI:

  • Capacity Planning empowers you to project future usage based on the historical performance data you collect, and graph the results. Those results can be reviewed as needed, added to Dashboards for quick access to specific projections you want to keep tabs on, and even alerted on using the Capacity Planning Wizard. This powerful feature is part of the Enterprise Edition of XI. Vadim’s video on Capacity Planning offers a quick introduction.
  • Actions enable you to set up clickable icons in the Host and Service Status Detail pages of XI, which can be set to do useful things like execute scripts (for example, to restart a system service) and direct users to URLs.
  • Event Handlers are another option worth exploring, providing you with the ability to automatically execute scripted remediation actions when problems are detected.
  • The NCPA Web UI provides access to real-time performance graphs and top processes data on each system you monitor with this powerful agent:
Live graphs in the NCPA web UI.
Top Processes live data in the NCPA web UI.
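
As an illustration of the Event Handler idea mentioned above, here is a minimal Python sketch of the decision logic a remediation script might use. It assumes the handler command passes the `$SERVICESTATE$`, `$SERVICESTATETYPE$`, and `$SERVICEATTEMPT$` macros as arguments, and it follows the common pattern of acting on the third soft CRITICAL attempt or on a hard CRITICAL state. The restart command and the choice of `httpd` are hypothetical, so adapt them to the service you actually want to remediate.

```python
import subprocess


def should_restart(state: str, state_type: str, attempt: int, soft_attempt: int = 3) -> bool:
    """Decide whether a remediation action is warranted for this event."""
    if state != "CRITICAL":
        return False  # only act on CRITICAL problems
    if state_type == "HARD":
        return True  # retries exhausted, problem confirmed
    # Try one restart just before the check would go HARD.
    return state_type == "SOFT" and attempt == soft_attempt


def handle_event(state: str, state_type: str, attempt: int, service: str = "httpd") -> bool:
    """Restart `service` if the event calls for it; return True if attempted."""
    if not should_restart(state, state_type, attempt):
        return False
    # Hypothetical remediation: restart the service via systemctl.
    subprocess.run(["systemctl", "restart", service], check=False)
    return True
```

A wrapper script would read the three macro values from its command-line arguments and pass them to `handle_event`. Keeping the decision in its own function makes it easy to test without touching a live service.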

Nagios Licensing Considerations

One great option for monitoring your Nagios servers is the free 7 Node (Host) / 100 Total Check (Hosts + Services) license of Nagios XI. To take this approach, set up a fresh install of Nagios XI, then navigate to Admin -> System Config -> License Information and select the Free (self supported) option.
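
To get a rough sense of whether your plan fits those limits, you can tally hosts and services before you start. The sketch below assumes the limits work exactly as stated above (at most 7 hosts, and at most 100 hosts plus services combined). The server names and service counts are made-up examples.

```python
FREE_MAX_HOSTS = 7     # node (host) limit of the free license
FREE_MAX_CHECKS = 100  # hosts + services combined


def total_checks(services_per_host: dict) -> int:
    """Count every host plus every service defined on it."""
    return len(services_per_host) + sum(services_per_host.values())


def fits_free_license(services_per_host: dict) -> bool:
    """Return True if the plan stays within both free-license limits."""
    hosts = len(services_per_host)
    return hosts <= FREE_MAX_HOSTS and total_checks(services_per_host) <= FREE_MAX_CHECKS


# Hypothetical plan: one entry per Nagios server, with its service count.
plan = {"xi-prod": 12, "log-server": 10, "network-analyzer": 9, "fusion": 8}

if __name__ == "__main__":
    print(total_checks(plan), fits_free_license(plan))
```

In this example, 4 hosts and 39 services add up to 43 checks, which leaves plenty of headroom under the free license.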

We also offer smaller licenses, such as the 50 Node license, if your requirements exceed the limits of the free license.

Whatever licensing approach you choose, taking the time to set up monitoring of your Nagios servers is an important best practice that will help make sure your Nagios deployment is ready and running when you need it most, and by extension keep the rest of your environment spinning like a top.
