Solr Service Alerting – SearchStax


Overview

SearchStax® Managed Solr provides two kinds of real-time email alerts:

  • Heartbeat alerts: Notify a list of email recipients when a server starts or stops operating.
  • Threshold alerts: Notify a list of email recipients when a server exceeds a performance threshold.

Either type of alert may optionally invoke a webhook to notify an external bug-tracking system or alerting system.
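
A webhook target can be as small as an HTTP endpoint that accepts a POST request and records it. The sketch below is a minimal receiver built on Python's standard library. The payload format that Managed Solr sends is not described on this page, so the code makes no assumption about its schema: it tries to parse the body as JSON and falls back to raw text. The port number and the names parse_webhook_body, AlertWebhookHandler, and serve are illustrative assumptions, not part of any SearchStax API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_webhook_body(raw: bytes) -> dict:
    """Decode a webhook body without assuming its schema.

    Returns {"json": <parsed value>} when the body is valid JSON,
    otherwise {"text": <decoded string>}.
    """
    text = raw.decode("utf-8", errors="replace")
    try:
        return {"json": json.loads(text)}
    except json.JSONDecodeError:
        return {"text": text}


class AlertWebhookHandler(BaseHTTPRequestHandler):
    """Hypothetical receiver: logs each POST and answers 200 OK."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = parse_webhook_body(self.rfile.read(length))
        # Replace this print with a call to your bug tracker or pager.
        print("alert webhook received:", payload)
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("", port), AlertWebhookHandler).serve_forever()
```

Calling serve() starts the listener. You would then enter its public URL in the alert's webhook fields and inspect the first few logged payloads to learn their actual structure.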

Both types of alerts create an “incident” report that you can inspect in the Managed Solr dashboard.

Alerts send a follow-up email when the condition is resolved.

Best Practice: Heartbeat Alerts on All Servers

We typically add Heartbeat Alerts to all servers — both Solr and Zookeeper.

Best Practice: Avoid False Alarms

Harmless events, such as high CPU activity, can make Pulse miss a beat. We configure alerts to resample for five minutes before triggering.

Premium Alerting

For SearchStax customers with Premium Support Level Agreements (SLAs), we have an internal monitoring system that notifies our on-call support team of any issues.

Popular Alerts

SearchStax clients often implement some or all of the following alerts on a production system:

Alert                                   Node  Trigger  Delay  Max Alerts  Repeat
Heartbeat                               All   -        5 min  1           15 min
Index Error Count *                     Solr  >10      5 min  1           15 min
JVM Heap Memory Used                    Solr  >80%     5 min  1           15 min
Search Timeouts                         Solr  >10%     5 min  1           15 min
Free Disk Space                         Solr  <20%     5 min  1           15 min
Search Average Response Time / Request  Solr  >3000ms  5 min  1           15 min
Index Timeout                           Solr  >10      5 min  1           15 min
Index Average Response Time / Request   Solr  >60000   5 min  1           15 min
CPU Usage                               Solr  >80%     5 min  1           15 min

* Note that index-error alerts often mean that some of your documents have been dropped from the index. See What Causes Indexing Errors?.
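
If you want to check metric samples against the same thresholds outside the dashboard, for example in a capacity report, the table above can be written down as data. The following is a sketch only: the ThresholdRule type, the POPULAR_RULES list, and the breached function are hypothetical names, and the metric keys are simply the labels from the table. Managed Solr evaluates its alerts on its own; this code does not configure anything.

```python
import operator
from typing import Callable, NamedTuple


class ThresholdRule(NamedTuple):
    metric: str                               # label as it appears in the table
    compare: Callable[[float, float], bool]   # operator.gt or operator.lt
    limit: float                              # threshold in the table's units


# Thresholds copied from the table above.
POPULAR_RULES = [
    ThresholdRule("Index Error Count", operator.gt, 10),
    ThresholdRule("JVM Heap Memory Used", operator.gt, 80),   # percent
    ThresholdRule("Search Timeouts", operator.gt, 10),        # percent
    ThresholdRule("Free Disk Space", operator.lt, 20),        # percent
    ThresholdRule("Search Average Response Time / Request", operator.gt, 3000),  # ms
    ThresholdRule("Index Timeout", operator.gt, 10),
    ThresholdRule("Index Average Response Time / Request", operator.gt, 60000),
    ThresholdRule("CPU Usage", operator.gt, 80),              # percent
]


def breached(metrics: dict) -> list:
    """Return the names of rules that the given metric sample violates.

    Metrics missing from the sample are skipped rather than treated as breaches.
    """
    return [
        rule.metric
        for rule in POPULAR_RULES
        if rule.metric in metrics and rule.compare(metrics[rule.metric], rule.limit)
    ]
```

Note that the comparisons are strict, so a heap reading of exactly 80% does not count as a breach in this sketch.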

Heartbeat Alerts

Both Zookeeper and Solr send reports of system metrics to SearchStax once per minute. You can set up a “heartbeat” alert to notify you if these reports are interrupted. The system also notifies you when the updates resume.
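
The logic behind a heartbeat alert can be pictured as a small state machine: remember when the last report arrived, raise "DOWN" once the gap grows too long, and raise "UP" when reports resume. The sketch below illustrates that idea. It is not SearchStax's implementation; the HeartbeatWatch class and its method names are hypothetical, and the five-minute default simply mirrors the delay used in the table of popular alerts.

```python
from datetime import datetime, timedelta
from typing import Optional


class HeartbeatWatch:
    """Track metric reports from one server and detect interruptions."""

    def __init__(self, max_gap: timedelta = timedelta(minutes=5)):
        self.max_gap = max_gap
        self.last_report: Optional[datetime] = None
        self.down = False

    def report(self, at: datetime) -> Optional[str]:
        """Record a metrics report; return "UP" if it ends an outage."""
        self.last_report = at
        if self.down:
            self.down = False
            return "UP"
        return None

    def check(self, now: datetime) -> Optional[str]:
        """Return "DOWN" the first time the gap exceeds max_gap, else None."""
        if self.down or self.last_report is None:
            return None
        if now - self.last_report > self.max_gap:
            self.down = True
            return "DOWN"
        return None
```

Because check returns "DOWN" only on the transition, a caller that polls once per minute gets a single notification per outage rather than one per poll.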

Set up a Heartbeat Alert

To set up a heartbeat alert, open the Managed Solr dashboard and navigate to a specific deployment.

  • Expose the details page for that deployment.
  • Click the Pulse label in the left-side navigation bar.
  • Open the Alerting menu in the top menu bar.
  • Select Heartbeat.

Control                                   Description
Server                                    The Server control offers a list of the servers in this deployment. Select one of them to monitor.
Name                                      Give the alert a name that you will recognize when you see it in email.
Notify if data is missing for more than…  When heartbeat data stops flowing, wait this long before triggering the alert.
Max Notifications                         Alert emails are reissued every two minutes. How many of them do you want to send?
Send alerts to                            Choose from a list of registered SearchStax users.
Send trigger alert to webhook             Invoke this webhook when this alert is triggered.
Send resolve alert to webhook             Invoke this webhook when the alert is resolved.

Heartbeat Email

A heartbeat email notification resembles this one:

Dear SearchStax Customer,

The alert ss123456-5 heartbeat alert for your deployment Films (ss123456) has been triggered.

The following host is unreachable.

Host: ss123456-5

To View Metrics in Dashboard: https://app.searchstax.com/admin/deployment/pulse/deployment/ss123456/alert/incident/update/65737

To Edit this Alert: https://app.searchstax.com/admin/deployment/pulse/deployment/ss123456/alert/heartbeat/update/841/

This alert was triggered at 2020-01-15 20:12:27 UTC.

This alert was raised for account AccountName.

You will receive a similar “UP” notification when the heartbeat is again detected.

Threshold Alerts

A “threshold” alert watches a specific system metric and sends you an email when the metric rises above or falls below a value that you specify.

Managed Solr allows you to monitor the following metrics:

  • TotalRequests
  • CPU Usage
  • JVM Thread Count
  • Disk Space Used
  • Disk Space Free
  • JVM Heap Memory Used
  • 1 Min. 5XX Error Rate
  • Swap Used
  • System Load Average
  • Search – Avg. Requests/s
  • Search – 5 Min. Request Rate
  • Search Timeouts
  • Search Error Count
  • Index – Timeouts
  • Index – Error Count
  • QueryResultCache – evictions
  • QueryResultCache – warmupTime
  • QueryResultCache – hitratio
  • Filtercache – evictions
  • Filtercache – warmupTime
  • Filtercache – hitratio
  • DocumentCache – evictions
  • DocumentCache – hitratio
  • DocumentCache – warmupTime
  • FieldValueCache – evictions
  • FieldValueCache – hitratio
  • FieldValueCache – warmupTime
  • Search – Avg. Response Time/Request (ms)
  • Index – Avg Response Time/Request (ms)
  • JVM Non-Heap Memory Used
  • Physical Memory Used
  • Index – 5 min. Request Rate

Set up a Threshold Alert

To set up a threshold alert, open the Managed Solr dashboard and navigate to a specific deployment.

  • Expose the details page for that deployment.
  • Click the Pulse label in the left-side navigation bar.
  • Open the Alerting menu in the top menu bar.
  • Select Threshold.
  • Click the Create New Alert button.

Control                        Description
Host Machine                   The Host Machine control offers a list of the servers in this deployment. Select one of them to monitor.
Metric Name                    Choose one of many internal metrics monitored by Managed Solr.
Collection                     Some metrics are collection-specific. Others apply to “all collections.”
Alert Name                     Give the alert a name that you will recognize when you see it in email.
Delay of at least              Metric must exceed threshold for this long before triggering the alert.
Max Alerts                     Alert emails are reissued every two minutes. How many of them do you want to send?
Repeat Every                   Time to wait between sending repeat email messages.
Send alerts to                 Choose from a list of registered SearchStax users.
Send trigger alert to webhook  Invoke this webhook when this alert is triggered.
Send resolve alert to webhook  Invoke this webhook when the alert is resolved.
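
The "Delay of at least" control means that a brief spike does not trigger the alert; the metric has to stay beyond the threshold for the whole delay. The sketch below shows one way to express that rule over a series of samples. It is an illustration under stated assumptions, not the service's actual algorithm: the function name first_trigger_time is hypothetical, and the code assumes that any sample back inside the threshold resets the timer.

```python
import operator
from datetime import datetime, timedelta
from typing import Callable, Iterable, Optional, Tuple


def first_trigger_time(
    samples: Iterable[Tuple[datetime, float]],
    compare: Callable[[float, float], bool],
    limit: float,
    delay: timedelta,
) -> Optional[datetime]:
    """Return when a sustained breach would trigger, or None if it never does.

    samples: (timestamp, value) pairs in chronological order.
    compare: operator.gt for "above" alerts, operator.lt for "below" alerts.
    The breach must hold continuously for at least `delay`.
    """
    breach_start = None
    for ts, value in samples:
        if compare(value, limit):
            if breach_start is None:
                breach_start = ts          # first sample of a new breach
            if ts - breach_start >= delay:
                return ts                  # breach has lasted long enough
        else:
            breach_start = None            # back within limits: reset the timer
    return None
```

With one sample per minute, a CPU reading that stays above 80% from minute 4 onward would trigger a five-minute alert at minute 9, while a two-minute spike would not trigger at all.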

Receive a Threshold Alert

A threshold email notification resembles this one:

Dear SearchStax Customer,

The alert "Server 5 below 10% CPU" for your deployment Films (ss123456) has been triggered.

Host:           ss123456-5
Metric:         CPU Usage
Name:           "Server 5 below 10% CPU"
Threshold:      < 10.0%
Current Value:  0.01 %

To View Metrics in Dashboard: https://app.searchstax.com/admin/deployment/pulse/deployment/ss123456/system/

To Edit this Alert: https://app.searchstax.com/admin/deployment/pulse/deployment/ss123456/alert/incident/update/6012

This alert was triggered at 2019-12-20 17:51:42 UTC.

This alert was raised for account AccountName.
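
If alert emails are routed to a shared mailbox, a script can pull out the labeled fields for logging or ticketing. The sketch below parses the "Label: value" lines shown in the sample above. It assumes the email body keeps that layout, which this page does not guarantee, and the names FIELD_RE and parse_alert_email are hypothetical.

```python
import re

# Matches lines such as "Host:           ss123456-5" from the sample email.
FIELD_RE = re.compile(
    r"^(Host|Metric|Name|Threshold|Current Value):[ \t]*(.+?)[ \t]*$",
    re.MULTILINE,
)


def parse_alert_email(body: str) -> dict:
    """Extract the labeled fields from an alert email body.

    Surrounding double quotes (as on the Name line) are removed.
    Labels that do not appear in the body are simply absent from the result.
    """
    return {label: value.strip('"') for label, value in FIELD_RE.findall(body)}
```

The same function also picks up the Host line of a heartbeat email, since that message uses the same label.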

Incidents

To view a list of your heartbeat or threshold incidents, open the Managed Solr dashboard and navigate to the deployment in question. (Alternatively, follow the incident link in the email message you received. The link takes you directly to the details of the incident.)

  • Expose the details page for that deployment.
  • Click the Pulse label in the left-side navigation bar.
  • Open the Alerting menu in the top menu bar.
  • Select Incidents.

Click the incident to view its details. You’ll see a brief description of the incident followed by a timeline of events. Read the timeline from the bottom up.

Questions?

Do not hesitate to contact the SearchStax Support Desk.
