Four Critical Metrics for Solr Health Check

4 Critical Metrics for Solr Health Check

If you use Apache Solr for your website or application, you know that it is important to monitor the Solr health of your deployment in order to ensure that your system is running smoothly. To optimize Solr server performance, you need to monitor the right metrics for Solr health that will deliver insights into Solr implementation and quickly identify when things are getting out of whack.

Editor’s Note: Managed Solr is now SearchStax Cloud. It is the same great product that lets developers set up and deploy Solr infrastructure in minutes.

Monitoring Solr Health

A Solr monitoring system enables you to gather and analyze statistics, and visualize metrics, events, logs, and traces in real time. The screenshots in this blog post come from the Pulse feature of SearchStax Managed Solr. Pulse allows users to monitor the health of a Solr deployment using real-time and historical statistical graphs of critical Solr server and JVM metrics. For increased flexibility, you can also define the time frame and frequency of the performance graphs.

Let’s dive into the four critical metrics for monitoring the health of your Solr deployment:

  1.     JVM Heap Memory
  2.     System Load Average
  3.     Disk Space
  4.     Errors of all kinds

1. JVM Heap Memory

As a Java application, Solr consumes JVM Heap Memory and consumption is typically observed in a zigzag pattern. Understanding JVM Heap Memory as a standalone metric is important because running out of it will cause Solr to reach an Out-Of-Memory (OOM) state and cause a loss of the Solr service. 

We recommend configuring JVM Heap Memory alerts and taking action if JVM Heap Memory reaches an 80% threshold. Corrective actions can include pausing or throttling the indexing processes from your application or, if Solr is running in a clustered topology, restarting nodes one at a time to rebalance consumption to avoid any system downtime. If high JVM Heap Memory consumption persists, you may need to optimize how the application is making requests to Solr and possibly upgrade the size of your Solr deployment. 

The graph below depicts JVM Heap Memory consumption with the resource threshold being depicted with a dotted line at the top of the x-axis. 

Solr Health Metrics - JVM Heap Memory Usage

2. System Load Average

System Load Average metrics are commonly intertwined with the CPU Usage graph in Pulse. To understand System Load Average, you need to be familiar with the configuration of the Deployment you are analyzing. The number of virtual central processing units (vCPUs) available provides context into how sustainable the System Load Average is in its current state. 

You can view the vCPUs for your deployment from the Solr Admin Interface – JVM > Processors. As shown below, this deployment has 1 vCPU.

Solr - Number of VPCs
System Load Average can spike briefly above your configuration’s designed limit, but if your system sustains System Load Average above the intended threshold then performance impact is likely observable in the performance of your application. When the System Load Average is higher than your system configuration’s intended limit, processes will get backed up and they have a cumulative impact on system performance.


Solr Health Metrics - System Load Average

For our test deployment with 1 vCPU, while the System Load Average does occasionally spike above 1, the overall load is very sustainable as processes remain backed up only briefly. We would conclude that the plan size for this deployment is suitable for the project’s needs at this time.

3. Disk Usage

Disk Usage is another critical metric to track and understand as high Disk Usage (or its inverse, Free Disk Space) can result in corrupted files, high error rates and general system instability. 

SearchStax recommends configuring alerts and taking action if Disk Usage reaches 80% of the threshold. Clearing unneeded index data is encouraged, but SearchStax also provides the option to purchase additional disk space if needed. Before adding more disk space, you need to understand what is consuming space on your disk storage and analyze why additional space is recommended. 

If your project has business requirements for periodic backups, then additional disk space is required. Backups are taken on disk locally before they get exported to blob storage in the specified cloud region. In addition to backups, log expansion and index segmentation require additional disk space. 

As a best practice, SearchStax recommends selecting a deployment plan size with three times the amount of disk space that you expect the Solr indexes to occupy.

The graph below shows a deployment with an observable Solr backup that started at 10am and temporarily consumes disk space until the backup was migrated into a blob storage for longer retention. 

Solr Health Metrics - Disk Usage

4. Errors of All Kinds for Solr Health

SearchStax Pulse Monitoring provides metrics on several types of key errors and these errors should be avoided as they impact application functionality and performance. CPU Usage and System Load Average will rise if errors occur frequently as these processes are more compute intensive than a 200 response. 

We typically configure alerts on:

  • 5xx Errors 
  • Index Error Counts 
  • Search Error Counts 
  • Direct Update Handler Errors 

As best practice, SearchStax recommends setting alerts on all deployment nodes for Index Error Counts to provide you with notice when documents being indexed are greeted with missing field or undefined field index errors. It’s important for all your data to be properly ingested and to avoid using additional compute resources. 

The graph below shows Index Error Counts. We recommend configuring alerts that will identify when 10 error counts occur within a span of 5 minutes as a call to action. 

Solr Health Metrics - Index Error Count
In this System Load Average graph below, you can see how the errors are consuming additional compute power to resolve the erroring requests.

Use Solr Alerting to Look at Exceptions Only

In addition to the real time Solr health metrics, Managed Solr also has an automated Solr Alerting feature that lets users create Solr Heartbeat or Solr Threshold alerts which will send emails when the alerts are triggered. These alerts can also be added to the alerting tool of your choice using Webhooks to access the API.