NCPA: Supercharge Your Log Server Monitoring

Shamas Demoret
Technical Content Manager
A screenshot of a Nagios XI dashboard showing the health of a Nagios Log Server cluster monitored with NCPA.

In this article, we’ll explore how you can use the Nagios Cross-Platform Agent (NCPA) to ensure that your Nagios Log Server instances and cluster are running smoothly. Using NCPA will enable you to check both standard and context-driven performance and status details on your Log Server deployment.

Getting Started the Easy Way

The great thing about using NCPA to monitor your Log Server instances is that it’s already built in. You can navigate to Admin > Management > Monitor Backend to view the details:

A screenshot of the NCPA menu in Nagios Log Server.
NCPA details in Log Server.

Here you can view and modify your NCPA token, check the status of NCPA on each instance, start/stop/restart NCPA on them, and quickly access the NCPA web interface.

Clicking the Open NCPA Interface button will bring you to the agent Web UI, where you can view live data graphs of CPU, memory, disk, and interface usage, along with top process details. Here you can also review status history and leverage the check API.
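As a rough illustration of the check API mentioned above, here is a minimal Python sketch that builds a check URL for an agent. The host name and token are hypothetical, and the query parameter names (token, check, warning, critical), the cpu/percent endpoint, and port 5693 are written from memory of NCPA's API conventions, so confirm them against the NCPA help reference for your version before relying on them.

```python
from urllib.parse import quote, urlencode


def ncpa_check_url(host, token, endpoint, warning=None, critical=None, port=5693):
    """Build an NCPA API URL that asks the agent to evaluate a check.

    Assumption: the agent listens on port 5693 and accepts the
    token/check/warning/critical query parameters.
    """
    # check=true is assumed to request a Nagios-style result rather than raw data.
    params = {"token": token, "check": "true"}
    if warning is not None:
        params["warning"] = warning
    if critical is not None:
        params["critical"] = critical
    path = quote(endpoint.strip("/"))
    return f"https://{host}:{port}/api/{path}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and token, for illustration only.
    print(ncpa_check_url("logserver01.example.com", "mytoken",
                         "cpu/percent", warning=80, critical=90))
```

Pasting the resulting URL into a browser, or fetching it with any HTTP client, is a quick way to confirm that the token and endpoint work before you configure checks in Nagios XI.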

Special OpenSearch Plugins

The NCPA Monitoring Wizard in Nagios XI will allow you to quickly monitor a variety of key metrics on your Log Server instances. This ensures that key resources, such as CPU and drive space, are not being overused, and it verifies that essential services like OpenSearch and Logstash are running. One metric that can be a little tricky with Log Server is memory usage, and that’s where NCPA shines. At the bottom of the NCPA Wizard, you’ll notice that two special plugins are bundled with the agent for you: check_os_jvm_heap.sh and check_os_cluster_status.sh.

By design, OpenSearch reserves as much memory as it reasonably can on each instance. Whether or not it is actually using that memory, the reservation shows up as memory usage in the standard NCPA memory check. That’s where check_os_jvm_heap.sh comes in. This special plugin shows how much of the reserved memory is actually in use, which gives you a much more useful picture of memory usage. In this screenshot from a Nagios XI dashboard, you can see that although raw memory usage is on the high side, JVM heap usage is within reasonable boundaries:

Closeup of a section from a Nagios XI dashboard showing the difference between JVM heap usage and raw memory usage on two Nagios Log Server instances.
JVM heap usage is fine, despite raw memory showing high.
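To make the difference between reserved and used memory concrete, here is a minimal Python sketch of how a heap percentage can be derived from OpenSearch node statistics. It is not the code behind check_os_jvm_heap.sh, whose internals are not covered in this article. It assumes a response shaped like the OpenSearch _nodes/stats/jvm API, and the node name, byte counts, and thresholds are made-up sample values.

```python
def heap_usage_percent(node_stats):
    """Return {node_name: heap percent} from a _nodes/stats/jvm style response."""
    usage = {}
    for node in node_stats["nodes"].values():
        mem = node["jvm"]["mem"]
        used = mem["heap_used_in_bytes"]
        limit = mem["heap_max_in_bytes"]
        usage[node["name"]] = round(100.0 * used / limit, 1)
    return usage


def nagios_state(percent, warning=80.0, critical=90.0):
    """Map a percentage to a Nagios exit code and label (thresholds are assumptions)."""
    if percent >= critical:
        return 2, "CRITICAL"
    if percent >= warning:
        return 1, "WARNING"
    return 0, "OK"


if __name__ == "__main__":
    # Made-up sample: 2 GiB of a 4 GiB heap in use on one hypothetical node.
    sample = {
        "nodes": {
            "abc123": {
                "name": "ls-node-1",
                "jvm": {"mem": {"heap_used_in_bytes": 2 * 1024**3,
                                "heap_max_in_bytes": 4 * 1024**3}},
            }
        }
    }
    for name, pct in heap_usage_percent(sample).items():
        code, label = nagios_state(pct)
        print(f"{label} - {name} JVM heap at {pct}%")
```

The point of the sketch is that the percentage is measured against the heap OpenSearch has set aside, not against total system memory, which is why it can stay healthy while the raw memory check runs high.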

The check_os_cluster_status.sh plugin also adds context to your monitoring by providing insight into the state of your Log Server cluster as a whole. If all instances are operational, this will show as Green/OK. If one or more instances have failed, but at least one remains operational, it will show as Yellow/Warning: Log Server is still functional, but you no longer have clustered redundancy. This mirrors the status shown in the Cluster Health section of Log Server’s Admin > Cluster Status menu. Note that a single-instance setup will always show as Yellow. To achieve Green status, add at least one additional instance to your setup.
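The mapping from cluster color to check state can be sketched in a few lines of Python. This is only an illustration of the idea, not the bundled plugin's implementation. It assumes a dictionary shaped like the OpenSearch _cluster/health response, which carries a status field. Green/OK and Yellow/Warning come from the description above, while treating red as Critical and anything else as Unknown is an assumption based on common Nagios conventions.

```python
# Nagios exit codes paired with labels. The red and unknown mappings are assumptions.
STATUS_TO_NAGIOS = {
    "green": (0, "OK"),
    "yellow": (1, "WARNING"),
    "red": (2, "CRITICAL"),
}


def cluster_check(health):
    """Translate a _cluster/health style dict into (exit_code, message)."""
    status = str(health.get("status", "")).lower()
    code, label = STATUS_TO_NAGIOS.get(status, (3, "UNKNOWN"))
    return code, f"{label} - cluster status is {status or 'missing'}"


if __name__ == "__main__":
    # A single-instance deployment would be expected to report yellow.
    print(cluster_check({"status": "yellow"}))
```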

Advanced Monitoring

Nagios Business Process Intelligence (BPI) can be used to further enhance your Log Server cluster monitoring. BPI enables you to combine all of the instances into a logical group that will alert you intelligently based not just on individual instance health but also on the status and functionality of the cluster as a whole.

Here’s an article that provides a detailed example of this approach, which involves nesting groups representing the status of each instance under the umbrella of a group showing the overall operational status of the cluster:

Nagios XI BPI: Actionable Insights for IT Monitoring and Optimization
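To give a feel for the nested-group idea, here is a small conceptual sketch in Python. It is not BPI configuration syntax and does not reproduce BPI's rule engine. The rollup rules below (an instance takes its worst check state, and the cluster only goes critical when every instance is critical) are assumptions chosen for illustration, and the instance names are hypothetical. In BPI itself, you set the thresholds for each group.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def instance_state(check_states):
    """An instance group takes the worst state among its member checks."""
    return max(check_states, default=OK)


def cluster_state(instances):
    """Roll instance groups up into one cluster state.

    instances maps an instance name to a list of check states.
    Assumed rule: CRITICAL only when every instance is critical,
    WARNING when any instance is degraded, otherwise OK.
    """
    states = [instance_state(checks) for checks in instances.values()]
    down = sum(1 for s in states if s == CRITICAL)
    if states and down == len(states):
        return CRITICAL
    if down > 0 or WARNING in states:
        return WARNING
    return OK


if __name__ == "__main__":
    # Hypothetical two-instance cluster with one failed instance.
    cluster = {"ls-node-1": [OK, OK, OK], "ls-node-2": [OK, CRITICAL, OK]}
    print(cluster_state(cluster))
```

With rules like these, a single failed instance raises a warning about lost redundancy, while a full outage is reserved for the critical state.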

By harnessing NCPA to monitor your Log Server cluster with Nagios XI, and especially when combined with the power of BPI, you’ll have everything you need to ensure the peak performance of your instances and cluster.

Additional Resources

Nagios Log Server Admin Guide

Nagios XI Admin Guide

NCPA Help Reference
