
It’s common for tech teams today to host their Apache Solr clusters and applications in the cloud. After a natural or man-made disaster, businesses need to recover their search capabilities quickly and get their operations live again. This is the first in a series of articles about Disaster Recovery for applications that rely on Apache Solr.

Two Rs guide which type of Disaster Recovery plan best suits your business needs: the Recovery Point Objective (RPO) and the Recovery Time Objective (RTO). Although they sound similar, they describe two different aspects of a Disaster Recovery plan. Together they define the backup strategy your enterprise needs and how often data must be synchronized between geographically separated Solr clusters. They also set the limits on how much downtime and data loss your business will accept when a disaster occurs.

What is RTO?

Recovery Time Objective, or RTO, is the amount of time that your Solr service can be unavailable in case of an emergency. Such an emergency could be caused by a corrupt Solr index or by Solr replicas going into recovery mode. A cloud provider region could also go down because of a natural disaster or a cyber attack. In any of these situations, your business might have to face some downtime. RTO defines the maximum acceptable time for your application to recover when such an event occurs.

What is RPO?

Recovery Point Objective, or RPO, defines the amount of data that your business can tolerate losing in case of an emergency. For example, if your business can tolerate losing four hours of history in order to get back online quickly, then your RPO would be four hours.

How do RPO and RTO work together?

RPO determines how frequently you need to back up your data and/or synchronize it across your infrastructure. Assuming your RPO is two hours, you would need to make sure that a backup taken within the last two hours is always available. RTO then determines how quickly you must be able to recreate your infrastructure and recover the data from that backup.
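The two checks described above can be sketched in a few lines of Python. This is only an illustration: the function names, timestamps, and durations are hypothetical, and real provisioning and restore times would have to be measured in a DR drill.

```python
from datetime import datetime, timedelta, timezone


def meets_rpo(last_backup: datetime, now: datetime, rpo: timedelta) -> bool:
    """Return True if the newest backup is recent enough to satisfy the RPO."""
    return now - last_backup <= rpo


def meets_rto(provision: timedelta, restore: timedelta, rto: timedelta) -> bool:
    """Return True if rebuilding the cluster plus restoring data fits in the RTO."""
    return provision + restore <= rto


# Hypothetical figures for illustration only.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_backup = datetime(2024, 1, 1, 10, 30, tzinfo=timezone.utc)

# The backup is 90 minutes old, which is inside a two-hour RPO.
rpo_ok = meets_rpo(last_backup, now, rpo=timedelta(hours=2))

# 20 minutes of provisioning plus 50 minutes of restore exceeds a one-hour RTO.
rto_ok = meets_rto(
    provision=timedelta(minutes=20),
    restore=timedelta(minutes=50),
    rto=timedelta(hours=1),
)
print(rpo_ok, rto_ok)
```

In this made-up scenario the backup schedule satisfies the RPO, but the recovery procedure is too slow for the RTO, so the plan would need faster provisioning or a warmer standby.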

For some businesses, downtime is not critical. The RTO and RPO can be expressed in hours, and a traditional backup and restore capability is all you need. These businesses are well served by a “cold” DR plan, where one starts a new Solr cluster after a disaster and restores the data and configurations from a backup.
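As a rough sketch of what a cold plan involves, the snippet below builds the request URLs for the BACKUP and RESTORE actions of the Solr Collections API. The host names, collection name, backup name, and backup location are all hypothetical, and the sketch assumes the backup location is storage that both clusters can reach. It only constructs the URLs and does not send any request.

```python
from urllib.parse import urlencode


def collections_api_url(base_url: str, action: str, **params: str) -> str:
    """Build a Solr Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/collections?{query}"


# Hypothetical hosts, collection name, backup name, and location.
backup_url = collections_api_url(
    "http://primary-solr:8983/solr",
    "BACKUP",
    name="nightly",
    collection="products",
    location="/mnt/solr-backups",
)

# After a disaster, the same backup is restored into a freshly started cluster.
restore_url = collections_api_url(
    "http://dr-solr:8983/solr",
    "RESTORE",
    name="nightly",
    collection="products",
    location="/mnt/solr-backups",
)
print(backup_url)
print(restore_url)
```

How often the BACKUP request is scheduled is what bounds the RPO, while the time needed to start the new cluster and complete the RESTORE request is what bounds the RTO.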

Other businesses have aggressive RPO / RTO requirements. These businesses need a duplicate infrastructure that mirrors the primary datacenter in real time. If the primary goes down for any reason, the secondary assumes the load while repairs are made. This is a “hot” DR plan. It can be achieved using Solr’s Cross Data Center Replication (CDCR) feature for Solr 6.6.x and above.
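The failover step of a hot plan can be illustrated with a small routing sketch. The endpoint URLs are hypothetical, and the health check is a stub so that the example runs offline. In a real deployment this decision is often made by a load balancer or DNS layer rather than by application code.

```python
from typing import Callable


def choose_cluster(primary: str, secondary: str, is_healthy: Callable[[str], bool]) -> str:
    """Return the primary URL when it passes the health check, else the secondary."""
    return primary if is_healthy(primary) else secondary


# Hypothetical endpoints for the two mirrored clusters.
PRIMARY = "http://solr-us-east:8983/solr"
SECONDARY = "http://solr-us-west:8983/solr"

# Simulate an outage in the primary region with a stubbed health check.
down = {PRIMARY}
active = choose_cluster(PRIMARY, SECONDARY, lambda url: url not in down)
print(active)
```

Because the secondary is already running and kept in sync, the switch involves no restore step, which is what keeps both RTO and RPO low in a hot plan.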

In general, the more aggressive your RPO and RTO goals, the more cost you will incur to implement them.

RTO and RPO requirements are driven by your business’s needs and define the Disaster Recovery Plan that is best suited for you.

The next installment in the series will go into detail on how to set up Disaster Recovery for Solr, looking more deeply at cold, warm, and hot DR. Stay tuned…