While indexing, several clients have encountered an error like this one:
2023-08-15 04:03:35.596 ERROR (qtp752316209-22) [c:sitecore_master_index_rebuild s:shard3 r:core_node6 x:sitecore_master_index_rebuild_shard3_replica_n4] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: no servers hosting shard: shard2 => org.apache.solr.common.SolrException: no servers hosting shard: shard2
The key is this phrase:
no servers hosting shard: shard#
We have encountered this error when two conditions coincide: Solr is overloaded (forcing replicas into recovery), and a collection has been misconfigured with numShards=n where replicationFactor=n was intended.
For instance, in this deployment, the sitecore_core_index uses one shard (numShards=1) with three replicas (replicationFactor=3). This puts a full copy of the index on each of the three Solr nodes, resulting in High-Availability and Fault-Tolerance (HAFT). This is highly desirable.
However, the sitecore_core_index_rebuild was created using three shards (numShards=3) and one replica (replicationFactor=1). This put one-third of the index on each of the three nodes, with no second copy of any shard. When system stress puts the only replica of one of these shards into recovery, a third of the content goes offline, because no other node can serve that shard. This is when solr.log reports the “no servers hosting shard” error.
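The difference between the two layouts can be sketched in a few lines of Python. This is a minimal illustration, not a SearchStax tool: it assumes you have already fetched and parsed the JSON returned by Solr's Collections API CLUSTERSTATUS action, and the function name, node names, and sample data are hypothetical values modeled on the deployment described above.

```python
def shards_without_active_replica(cluster_status):
    """Return (collection, shard) pairs that have no replica in the 'active' state.

    `cluster_status` is the parsed JSON of a CLUSTERSTATUS response.
    """
    problems = []
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        for shard_name, shard in coll.get("shards", {}).items():
            replicas = shard.get("replicas", {}).values()
            # A shard can answer queries only if at least one replica is active.
            if not any(r.get("state") == "active" for r in replicas):
                problems.append((coll_name, shard_name))
    return problems


# Hypothetical sample: one replica on node 2 is in recovery in each collection.
sample_status = {
    "cluster": {
        "collections": {
            # numShards=1, replicationFactor=3: two active copies remain.
            "sitecore_core_index": {
                "shards": {
                    "shard1": {"replicas": {
                        "core_node1": {"state": "active"},
                        "core_node2": {"state": "recovering"},
                        "core_node3": {"state": "active"},
                    }},
                },
            },
            # numShards=3, replicationFactor=1: shard2 has no other copy.
            "sitecore_core_index_rebuild": {
                "shards": {
                    "shard1": {"replicas": {"core_node1": {"state": "active"}}},
                    "shard2": {"replicas": {"core_node2": {"state": "recovering"}}},
                    "shard3": {"replicas": {"core_node3": {"state": "active"}}},
                },
            },
        }
    }
}

print(shards_without_active_replica(sample_status))
# [('sitecore_core_index_rebuild', 'shard2')]
```

With the same node in recovery, the replicated collection keeps serving requests while the sharded collection loses shard2 entirely, which is the condition behind the error message.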
To remedy this situation, consider these strategies:
- [Required] Recreate the faulty collections using proper replication. See Enable Sitecore’s SwitchOnRebuild with SearchStax Cloud.
- [Optional] Take steps to reduce the stress on the system. See Is 100% CPU a bad thing?
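As a rough illustration of the required step, the sketch below builds a Solr Collections API CREATE request that uses one shard and three replicas. It only constructs the URL and does not send it. The base URL and the config set name are placeholders, not values from this deployment, and the linked SwitchOnRebuild article remains the procedure to follow.

```python
from urllib.parse import urlencode


def build_create_url(solr_base_url, collection, config_name, replicas=3):
    """Build a Collections API CREATE URL with a single, fully replicated shard."""
    params = {
        "action": "CREATE",
        "name": collection,
        "numShards": 1,                # one shard holding the whole index
        "replicationFactor": replicas, # one full copy per Solr node
        "collection.configName": config_name,
    }
    return f"{solr_base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Placeholder endpoint and config set name (assumptions for illustration only).
url = build_create_url(
    "https://example-solr-host/solr",
    "sitecore_core_index_rebuild",
    "sitecore_configset",
)
print(url)
```

The existing misconfigured collection would need to be removed (Collections API DELETE action) before a collection of the same name can be created again.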
Do not hesitate to contact the SearchStax Support Desk.