A Heartbeat Alert notifies you by email that the Pulse server has not heard from one of your Solr nodes for some number of minutes. This creates an Incident in the Managed Solr Dashboard. Most Heartbeat incidents are false alarms that resolve themselves within a few minutes. Some of them, however, require remedial action.
High Load, High CPU
Heartbeat alerts can be caused by any situation that prevents the Pulse Agent from sending status messages to the Pulse servers at SearchStax. For instance, sustained high CPU levels can delay the next Pulse report past the Heartbeat threshold. This is often associated with a heavy episode of indexing. If the alerts are associated with indexing events, consider throttling back the flow of /update messages.
These spontaneous service interruptions are so common that we set up Heartbeat alerts with a five-minute delay to avoid false positives.
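The throttling suggestion above can be sketched as a small batching helper. This is only an illustration, not a SearchStax or Solr utility: the function name send_throttled, the batch size, and the pause length are all hypothetical, and send_batch stands in for whatever code your indexer already uses to post documents to /update.

```python
import time
from typing import Callable, List, Sequence


def send_throttled(
    docs: Sequence[dict],
    send_batch: Callable[[List[dict]], None],
    batch_size: int = 500,        # hypothetical default; tune for your cluster
    pause_seconds: float = 1.0,   # hypothetical default; tune for your cluster
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send docs in fixed-size batches, pausing between batches.

    Returns the number of batches sent.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = 0
    for start in range(0, len(docs), batch_size):
        if batches:
            # Pause before every batch except the first, giving the
            # node time to catch up between /update requests.
            sleep(pause_seconds)
        send_batch(list(docs[start:start + batch_size]))
        batches += 1
    return batches
```

Smaller batches or longer pauses reduce the indexing load on the node at the cost of a slower overall indexing run. Suitable values depend on your deployment and are best found by watching CPU levels while indexing.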
Memory Issues
When a Solr Cloud server is about to run out of memory, it sometimes shuts down the Pulse Agent to scavenge resources. It does not automatically restart the Pulse Agent when the memory event resolves itself. You may see a notice in the Managed Solr Dashboard saying “Pulse Agent is Down.”
A system in this situation is likely to have suffered Out-of-Memory crashes. Look for “solr_oom_killer” log files on that node.
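If you have a copy of the node's log files in a local directory, a short script can pick out the relevant ones. The sketch below assumes only that the file names contain the string "solr_oom_killer"; the function name and the directory you pass in are hypothetical, and where the logs live depends on your deployment.

```python
from pathlib import Path
from typing import List


def find_oom_killer_logs(log_dir: str) -> List[str]:
    """Return sorted names of files in log_dir whose name contains 'solr_oom_killer'."""
    root = Path(log_dir)
    if not root.is_dir():
        # A missing directory is treated as "no OOM logs found".
        return []
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file() and "solr_oom_killer" in p.name
    )
```

An empty result means no such files were found in that directory; it does not prove the node never ran out of memory.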
If the Solr Dashboard shows that all nodes and replicas are up, it may be that only the Pulse Agent is down. Please open a support ticket so SearchStax engineers can restart the Pulse Agent for you.
Solr is Down
On the Solr Dashboard, check Cloud > Graph to see if all nodes and replicas are up. If a node is down, restart the node from the Managed Solr Dashboard. If problems persist, open a ticket with the SearchStax Support Desk.
Note that it is normal to see the “Pulse Agent is Down” notice for a few minutes after restarting a node.
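The Cloud > Graph check can also be done programmatically. Solr's Collections API has a CLUSTERSTATUS action that returns the state of every replica as JSON. The sketch below works on an already-parsed response and assumes the usual layout (cluster, then collections, shards, and replicas, plus a live_nodes list). The function name is hypothetical, and fetching the response (endpoint URL, authentication) depends on your deployment and is left out.

```python
from typing import Dict, List, Tuple


def find_inactive_replicas(cluster_status: Dict) -> List[Tuple[str, str, str, str]]:
    """Return (collection, shard, replica, state) for each replica that is
    not 'active' or whose node is missing from live_nodes."""
    cluster = cluster_status.get("cluster", {})
    live_nodes = set(cluster.get("live_nodes", []))
    problems = []
    for coll_name, coll in cluster.get("collections", {}).items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                state = replica.get("state", "unknown")
                node = replica.get("node_name")
                # A replica can still be recorded as "active" after its
                # node has dropped out, so check live_nodes as well.
                if state != "active" or node not in live_nodes:
                    problems.append((coll_name, shard_name, replica_name, state))
    return problems
```

An empty list corresponds to a graph in which every replica is up. Any entries returned point to the collection, shard, and replica to investigate before restarting a node or opening a ticket.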