Shamas Demoret
Nagios XI BPI: Actionable Insights for IT Monitoring and Optimization
Nagios BPI (Business Process Intelligence) lets you combine the Hosts and Services you monitor with Nagios XI into logical groups, then alert intelligently based on both the overall percent health of the group and the state of its ‘essential members’ (which must be in an OK state for the group to function). You can find this tool in the Home > Details section of the Nagios XI web interface.
Business Process Intelligence can be leveraged for a wide variety of group monitoring use cases and is especially useful when monitoring clusters and complex applications, where some members can fail without compromising the application functionality, while others may be essential.
When set up properly, BPI not only lets you alert more intelligently but also makes it easier to troubleshoot issues and perform rapid root cause analysis.
Nagios BPI Group Example
Let’s look at the BPI group for a Nagios Log Server testing and development cluster, running on a VMware vWorker, as an example of the power of BPI. (Note that each Nagios license includes a free test/lab install, so you can test new configurations and upgrades before rolling them into production.) In this example, we have assigned the BPI group a low priority since it isn’t a production setup. We’ve also made the Windows Host on which VMware is running an essential member (indicated by the dotted circle to its left), because if the VM Host stopped working, so would all of the Log Server Instances running as VM guests.
Nesting Groups for Holistic Monitoring
BPI lets you nest groups within groups, so the status of multiple BPI groups feeds into the status of a larger overarching BPI group. This technique lets you use multiple layers of BPI group logic to determine the status of your top-level BPI group. The Nagios Log Server Lab Environment group, as depicted above, consists of three additional groups, each reflecting the state of a Log Server Instance, and a fourth group, which represents the vWorker they operate on.
These smaller groups include a ping check for the Host, key performance checks like CPU and memory usage, and Service status checks for elasticsearch and logstash. We’ve marked the Host ping check and the elasticsearch/logstash Service checks as essential members since their failure would render this Instance non-functional:
Troubleshooting with Nagios BPI
If any of these essential members enters a problem state, the NLSR1-Test-A/B/C group it belongs to will show as Critical.
Once this happens, the larger ‘Nagios Log Server Lab Environment’ group will go into a Warning state based on the group health thresholds we’ve defined:
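The way a failed essential member propagates upward through nested groups can be sketched in a few lines of Python. This is a simplified model for illustration, not the actual BPI implementation: the class names, the `warn_below` and `crit_below` percentages, and the choice to treat the vWorker group as an essential member of the top-level group are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Union

OK, WARNING, CRITICAL = "OK", "WARNING", "CRITICAL"


@dataclass
class Check:
    """A single Host or Service check (hypothetical model)."""
    name: str
    ok: bool = True
    essential: bool = False

    def state(self) -> str:
        return OK if self.ok else CRITICAL


@dataclass
class Group:
    """A BPI-style group; members may be checks or nested groups."""
    name: str
    members: List[Union[Check, "Group"]] = field(default_factory=list)
    warn_below: float = 90.0   # assumed threshold: percent health below this is Warning
    crit_below: float = 50.0   # assumed threshold: percent health below this is Critical
    essential: bool = False

    def health(self) -> float:
        """Percentage of direct members currently in an OK state."""
        if not self.members:
            return 100.0
        healthy = sum(1 for m in self.members if m.state() == OK)
        return 100.0 * healthy / len(self.members)

    def state(self) -> str:
        # Any essential member in a problem state makes the whole group Critical.
        if any(m.essential and m.state() != OK for m in self.members):
            return CRITICAL
        pct = self.health()
        if pct < self.crit_below:
            return CRITICAL
        if pct < self.warn_below:
            return WARNING
        return OK


def instance(name: str, es_ok: bool = True) -> Group:
    """Build one Log Server Instance group like those described above."""
    return Group(name, [
        Check("Ping", essential=True),
        Check("CPU Usage"),
        Check("Memory Usage"),
        Check("elasticsearch", ok=es_ok, essential=True),
        Check("logstash", essential=True),
    ])


lab = Group("Nagios Log Server Lab Environment", [
    instance("NLSR1-Test-A"),
    instance("NLSR1-Test-B"),
    instance("NLSR1-Test-C", es_ok=False),  # elasticsearch stopped on Instance C
    Group("vWorker", [Check("Windows Host", essential=True)], essential=True),
])

print(lab.members[2].state())  # CRITICAL
print(lab.state())             # WARNING
```

With these assumed thresholds, Instance C goes Critical because an essential member failed, while the top-level group only drops to 75% health and lands in Warning, since the failed Instance group is not itself marked essential.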
With a quick look at the group members, we can immediately see the problem is with the elasticsearch Service on the third Instance, and can expand the group to see if there are any other problems:
All is Well Again
Since the stopped elasticsearch Service is the sole problem on Instance C, we SSH into the machine and restart the Service. After doing so, we schedule an immediate check for the Service on its Status Detail page and revisit the BPI group. We see both the instance-specific group and the larger group are once again in an OK state:
Intelligent Alerting with BPI
As we’ve shown above, BPI not only provides handy visual insight into group status but also directly alerts you when a group enters a Warning or Critical state. You can quickly and easily configure monitoring and alerting for the entire group as an individual service using Nagios XI’s built-in BPI Wizard. Take a look at our Log Server Cluster through the lens of a Service that the BPI Wizard has created:
The “Using BPI in Nagios XI” guide is an excellent resource to help you learn how to manage your groups, including details on initial setup, group alert logic, and using the BPI Wizard.
Centralized Monitoring with Nagios Fusion Dashlet
Another great feature related to BPI is the BPI Dashlet in Nagios Fusion, which provides you with centralized visibility of the status of all your BPI groups across multiple Nagios XI servers, and empowers you with single-click access to the BPI page on each:
You can learn more about the role Nagios Fusion and each of the other Nagios solutions play in providing a holistic monitoring perspective here.
An Enterprise Edition Extra
The Enterprise Edition of Nagios XI also automatically enables you to leverage the Host groups and Service groups you define as BPI groups. To learn more about setting up your BPI groups, and about this great Enterprise Feature, the following video will be an excellent resource:
As you can see from the example, Nagios BPI is a powerful feature. Careful cultivation of meaningful BPI groups will empower you to more quickly identify individual issues preventing complex processes from carrying out their critical tasks and get your IT infrastructure back online faster.