Disaster Recovery Options for Solr Deployments

Why Do We Need Disaster Recovery?

Machines and technology fail. Natural or man-made disasters occur. Networks go down. Website and application outages caused by any of these events are inevitable. At the same time, today’s customers expect any website they access to be available at all times, and they have minimal tolerance for inconvenience caused by system outages.

Rather than hoping that disasters will pass them by, most companies meet their business requirement to minimize customer disruption by deploying disaster recovery (DR) options that deliver high levels of uptime. The SearchStax disaster recovery options for Solr-as-a-Service deployments are an insurance policy against the unexpected: they help maintain a high level of customer service while reducing risk and removing uncertainty about how much downtime your business will sustain.

Why are RTO and RPO Important in Disaster Recovery?

As noted in an earlier SearchStax blog post, “The Important Rs for Your Solr Disaster Recovery Plan,” there are two key terms with respect to disaster recovery:

  • Recovery Time Objective (RTO) is the maximum amount of time that your Solr service can be unavailable in an emergency.
  • Recovery Point Objective (RPO) is the maximum amount of data that your business can tolerate losing in an emergency.

RPO determines how frequently you need to back up your data and/or synchronize it across your infrastructure. For example, if your RPO is two hours, you need to make sure that a backup taken within the last two hours is always available.

RTO determines how quickly you can recreate your infrastructure and recover the data from your backup.
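
To make the two-hour example concrete, here is a minimal Python sketch of an RPO check. The function name `meets_rpo` and the timestamps are hypothetical and chosen only for illustration; this is not part of any SearchStax tooling.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup: datetime, now: datetime, rpo: timedelta) -> bool:
    """Return True if the newest backup is recent enough to satisfy the RPO.

    Anything written after `last_backup` would be lost in a disaster,
    so the age of the newest backup must not exceed the RPO.
    """
    return now - last_backup <= rpo


# Illustrative values: an RPO of two hours, checked at noon.
rpo = timedelta(hours=2)
now = datetime(2024, 1, 1, 12, 0)

print(meets_rpo(datetime(2024, 1, 1, 10, 30), now, rpo))  # backup is 1.5 hours old
print(meets_rpo(datetime(2024, 1, 1, 9, 0), now, rpo))    # backup is 3 hours old
```

The first check passes and the second fails, which is why a two-hour RPO implies backing up (or synchronizing) at least every two hours.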

Disaster Recovery Options for Solr Deployments

SearchStax offers three disaster recovery options for Solr deployments to meet a range of business requirements:

  • Hot Disaster Recovery
  • Warm Disaster Recovery
  • Cold Disaster Recovery

The differences are based on how quickly your service can be restored and how much data loss your business is willing to tolerate in the event of a disaster.

Under all Disaster Recovery options, SearchStax takes on the responsibility of restoring your service to full operation following a disaster event. Disaster recovery efforts will be initiated upon customer request or when the managed Solr service is unavailable for five minutes.
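
The trigger rule above (a customer request, or five minutes of unavailability) can be sketched as a small Python function. This is only an illustration of the rule as stated, not how SearchStax monitors deployments: the function name `should_trigger_dr` and the idea of feeding it a list of health-check samples are assumptions.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

# Five-minute threshold taken from the trigger rule described above.
OUTAGE_THRESHOLD = timedelta(minutes=5)


def should_trigger_dr(
    checks: Iterable[Tuple[datetime, bool]],
    customer_requested: bool = False,
    threshold: timedelta = OUTAGE_THRESHOLD,
) -> bool:
    """Decide whether disaster recovery should start.

    `checks` is a chronological series of (timestamp, is_healthy) samples
    from a hypothetical health monitor.
    """
    if customer_requested:
        return True

    outage_start = None  # timestamp of the first failure in the current outage
    latest = None        # timestamp of the most recent sample
    for timestamp, healthy in checks:
        latest = timestamp
        if healthy:
            outage_start = None          # service recovered; reset the clock
        elif outage_start is None:
            outage_start = timestamp     # a new outage begins here

    return outage_start is not None and latest - outage_start >= threshold


# Illustrative usage: one failed check per minute for five minutes.
t0 = datetime(2024, 1, 1, 12, 0)
failing = [(t0 + timedelta(minutes=m), False) for m in range(6)]
print(should_trigger_dr(failing))
```

A single healthy sample resets the clock, so brief blips shorter than the threshold do not start a recovery in this sketch.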

Hot Disaster Recovery for Solr Deployments

For businesses with aggressive RTO and RPO requirements, a duplicate infrastructure that mirrors your primary production environment in real time may be needed. SearchStax offers a Hot Disaster Recovery option that augments your production deployment and provides the highest level of redundancy and resource capacity during a disaster.

To achieve this level of redundancy, we create a “hot” standby or secondary deployment in a different region from your production deployment. This secondary deployment is a full replica of your primary production environment and is kept in sync with your production system at all times. That means that an interruption in service at the primary site can be met by a near-instantaneous switchover to the standby site.

To ensure that no disaster can knock out the primary and backup systems, we architect the system to have the standby system run in a different cloud-provider region.

Fully Scaled Cross Region Failover and Disaster Recovery

The Hot Disaster Recovery option has a service level agreement (SLA) to restore your site within 10 minutes with full functionality, but the real recovery will likely be less than 10 seconds after a failure of the primary environment.

Warm Disaster Recovery for Solr Deployments

The Warm Disaster Recovery option is similar to the Hot Disaster Recovery option, except that the secondary deployment is not as redundant or resilient as the production system. For the Warm standby site, we create a scaled-down or skinny version of the main deployment in the secondary region. This option provides benefits similar to the Hot option, but at a lower cost.

The difference is that the standby site will not have the same redundancy and capacity as the production site, so new nodes must be added on demand to match the capabilities of the production site. While the backup system will be available almost immediately, it runs at reduced capacity until those nodes have been added and full redundancy is restored.

Solr Warm Disaster Recovery

The Warm Disaster Recovery option has a service level agreement (SLA) to restore your site within 10 minutes and will provide full replication functionality in under 4 hours.

Cold Disaster Recovery for Solr Deployments

For some businesses, minimizing downtime is not a critical requirement. Under the Cold Disaster Recovery option, the restoration process starts after the disaster occurs. A new Solr cluster is created, and the data and configurations are then restored from a backup file.

SearchStax Cold Disaster Recovery handles this process for you. While everyone should maintain regular backups of their deployments, the Cold Disaster Recovery option takes this to the next level by storing the backups in a different cloud region from the primary system, which provides a higher level of protection than backups stored in the same region. If the production site goes down, we will start a new deployment in the same region as the backup and restore the backup into it.

Solr Cold Disaster Recovery

The Cold option has a service level agreement (SLA) to restore your site within 8 hours, but service is generally available again within 2 to 4 hours.
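
For readers curious what a backup-and-restore cycle looks like at the Solr level, the sketch below builds request URLs for the BACKUP and RESTORE actions of Solr's Collections API. It only illustrates the stock Solr API; the source does not describe the procedure SearchStax uses internally, and the host, collection name, backup name, and location shown here are placeholders.

```python
from urllib.parse import urlencode


def collections_api_url(base_url: str, action: str, **params: str) -> str:
    """Build a Solr Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


def backup_url(base_url: str, collection: str, backup_name: str, location: str) -> str:
    """URL that asks Solr to back up `collection` into `location`."""
    return collections_api_url(
        base_url, "BACKUP", name=backup_name, collection=collection, location=location
    )


def restore_url(base_url: str, collection: str, backup_name: str, location: str) -> str:
    """URL that asks Solr to restore `backup_name` from `location` into `collection`."""
    return collections_api_url(
        base_url, "RESTORE", name=backup_name, collection=collection, location=location
    )


# Placeholder values, for illustration only.
primary = "https://primary.example.com/solr"
standby = "https://standby.example.com/solr"

print(backup_url(primary, "products", "nightly", "/backups/solr"))
print(restore_url(standby, "products", "nightly", "/backups/solr"))
```

The code only constructs the URLs and does not send any requests. In a cold recovery, the restore call would be issued against the newly created cluster, and the backup location would need to be reachable from the recovery region.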

What are the Details Behind Each of the Disaster Recovery Options?

The details for the disaster recovery options offered through SearchStax are summarized in the table below.

| | Hot Disaster Recovery | Warm Disaster Recovery | Cold Disaster Recovery |
|---|---|---|---|
| Secondary Environment | Active full-size Solr deployment in another region | Scaled-down Solr deployment in another region | Created on-demand |
| DR Trigger | Upon customer request or when Solr service is unavailable for 5 minutes | Upon customer request or when Solr service is unavailable for 5 minutes | Upon customer request or when Solr service is unavailable for 5 minutes (environment created on-demand) |
| DR Recovery Process | An outage will automatically trigger a DNS failover | Trigger a DNS failover; scale up backup system | Deploy new nodes; restore backup; customer validates new environment |
| RPO Service Level (SLA) | As low as instantaneous | As low as 3 hours | As low as 1 hour |
| RTO Service Level (SLA) | Less than 10 minutes (typically 5 to 10 seconds after detection) | 4 hours (typically 1 to 2 hours after detection) | 8 hours (typically 2 to 4 hours) |
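
One way to use the table is to encode its RPO and RTO figures and pick the least expensive option that still satisfies your requirements. The Python sketch below does this under two stated assumptions: the cost ordering Cold < Warm < Hot (the text above only says that Warm costs less than Hot), and treating the "as low as" RPO figures as achievable values rather than guarantees. The names `DrOption` and `cheapest_option` are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class DrOption:
    name: str
    best_rpo: timedelta  # "as low as" RPO figure from the table
    rto_sla: timedelta   # RTO service level from the table


# Assumption: listed from least to most expensive.
OPTIONS = [
    DrOption("Cold", best_rpo=timedelta(hours=1), rto_sla=timedelta(hours=8)),
    DrOption("Warm", best_rpo=timedelta(hours=3), rto_sla=timedelta(hours=4)),
    DrOption("Hot", best_rpo=timedelta(0), rto_sla=timedelta(minutes=10)),
]


def cheapest_option(required_rpo: timedelta, required_rto: timedelta) -> Optional[DrOption]:
    """Return the first (assumed cheapest) option meeting both targets, or None."""
    for option in OPTIONS:
        if option.best_rpo <= required_rpo and option.rto_sla <= required_rto:
            return option
    return None


# Illustrative requirement: tolerate 4 hours of data loss and 4 hours of downtime.
choice = cheapest_option(timedelta(hours=4), timedelta(hours=4))
print(choice.name if choice else "No option meets these targets")
```

With those example targets, Cold is ruled out by its 8-hour RTO, so the sketch selects Warm. Actual RPO depends on how your backups or synchronization are configured, so treat the result as a starting point for a conversation rather than a final answer.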

Disaster Recovery with CDCR

SearchStax also offers a Solr disaster recovery option using Cross Data Center Replication (CDCR), which can provide nearly instantaneous synchronization of your data and reduce RPO to minutes. See the blog post “Disaster Recovery with CDCR Provides Near Real-Time RPO” for additional details on the benefits, use cases and limitations.


Now that you have a clear explanation of the SearchStax options for Disaster Recovery for Solr deployments, you will be able to decide for yourself which option best fits your business case and needs.

