July 25, 2022
Thomas DiLascio
|
We get questions about how the SearchStax Disaster Recovery (DR) process works with Managed Solr from an operational standpoint and we dive into the details in this blog post. SearchStax offers three disaster recovery options for Solr deployments to meet a range of business requirements – Hot, Warm and Cold. You can learn more about SearchStax DR in our Solr DR options blog post.
Editor’s Note: Managed Solr is now SearchStax Cloud. It is the same great product that lets developers set up and deploy Solr infrastructure in minutes.
Before we get into how Solr Disaster Recovery works in SearchStax, here is a quick refresher on two critical DR terms, RTO and RPO. Check out our blog post on The Important R’s for a Solr Disaster Recovery Plan for more details on these important concepts.
Recovery Time Objective, or RTO, is the amount of time that your Solr service can be unavailable in case of an emergency. RTO defines the maximum acceptable time for your application’s Solr requests to fail before recovery to a healthy state occurs.
Recovery Point Objective, or RPO, defines the amount of time which your disaster recovery data can be comparatively stale versus what is stored in the Production deployment at the time of a DR incident. If your business can tolerate losing four hours of recent documents being added/updated and/or removed from Solr, then a 4 Hour RPO would be considered acceptable.
In this example, we will assume a Solr deployment with a warm DR strategy that includes a 24 Hour RPO and a 10 Minute RTO.
To meet these business objectives, we take a backup once every 24 hours and that backup includes all Solr configuration files, Solr collections, aliases, security files and custom JAR files. After a backup is taken successfully, a restoration process would then be initiated in the DR environment. This backup/restore process continues to run day in and day out to meet the 24 Hour RPO.
We use a couple of notification channels for DR events. First, account users who are assigned to the Heartbeat Alerts will be notified by email when Solr nodes become unreachable. Once both Solr nodes are unreachable for 5 minutes, a DR event will be triggered. At that time, a SearchStax support ticket is created and our team will CC your specified users into that ticket to inform them that a DR event has occurred. A bridge meeting link will be included for users to join and work with our team through the incident.
SearchStax DR offerings rely on a vanity DNS, or CName, to route traffic from the primary cluster to the secondary environment or vice versa. When SearchStax DR is fully configured the primary cluster is assigned what’s referred to as a Vanity URL. The DR mechanism will trigger a DNS entry update behind this Vanity URL when a DR event occurs and all requests to the Vanity URL will route to the secondary environment whilst the primary cluster is unhealthy.
A customer application can always connect to the standard Solr URL endpoint, but the Vanity URL needs to be used in order for DR capabilities to be leveraged. During the onboarding process it’s important to update the application to connect to Solr using the Vanity URL prior to going live or testing DR with your SearchStax onboarding team.
Below is an example of how a standard Solr URL looks compared to a Vanity URL
When we conduct a one-time DR test during onboarding, we try to make the test as realistic as possible. As a starting point, we verify which users need to be notified for a live DR event so our Standard Operating Procedure (SOP) can be updated appropriately.
To keep your team informed of disaster recovery events, you should also ensure that all of your critical users are set up to receive Heartbeat Alerts and that we have a complete list of users who need to be notified for DR failure events.
During a DR test, a simulated disaster occurs during which time the team makes the Primary Cluster unavailable. We run through a checklist on the SearchStax side, but for the client it’s good to verify from the application that RTO and RPO are being met.
We use the following checklist to confirm a successful DR test with customers.
Here are the steps we take during a real disaster recovery incident:
Now that you have a clear explanation of the SearchStax DR options and how Disaster Recovery works, you will be able to decide for yourself which option best fits your business case and needs.
The Stack is delivered bi-monthly with industry trends, insights, products and more
Copyrights © SearchStax Inc.2014-2024. All Rights Reserved.
SearchStax Site Search solution is engineered to give marketers the agility they need to optimize site search outcomes. Get full visibility into search analytics and make real-time changes with one click.
close
SearchStax Managed Search service automates, manages and scales hosted Solr infrastructure in public or private clouds. Free up developers for value-added tasks and reduce costs with fewer incidents.
close
close