What is a collection/core/shard/replica?

Welcome to SearchStax. Sooner or later, everyone is confused by the distinctions among Solr index components. Here is a short glossary to help out.

Note: In Solr terminology, there is a sharp distinction between the logical parts of an index (collections, shards) and the physical manifestations of those parts (cores, replicas). Also, the definitions shift slightly between distributed and non-distributed collections, as noted below.


Cluster

A SearchStax production deployment is usually a cluster of three nodes coordinated by a ZooKeeper ensemble. ZooKeeper ensures that changes to config files and to indexes are automatically distributed across the nodes of the cluster.


Node

A single instance of Solr. In SearchStax deployments, one node corresponds to one physical server.


Collection

A single logical index in its entirety, regardless of how many nodes it runs on or how many parts (shards) it has. One Solr node can serve multiple collections.


Shard

A logical subset of the documents in a collection. A non-distributed collection keeps all of its documents in a single default shard. A distributed collection divides the documents among multiple shards, each on its own server.

Best Practice: Use one shard!

Sharding lets you split a huge index across multiple servers, but it comes at a cost. SearchStax cannot back up a sharded index, and sharding complicates replication, which makes high availability and fault tolerance harder to achieve.

If your index can fit comfortably on one server, then use one shard.
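
As a rough illustration of the one-shard recommendation, the sketch below builds a Solr Collections API CREATE request with numShards set to 1. The host name, collection name, and config name are hypothetical placeholders, and the replication factor of 3 simply assumes a three-node cluster with one replica per node. Treat it as a starting point under those assumptions, not as the required procedure.

```python
from urllib.parse import urlencode


def build_create_collection_url(base_url, name, config_name, replication_factor=3):
    """Build a Collections API CREATE URL for a single-shard collection.

    All argument values used below are illustrative assumptions.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": 1,  # one shard, per the best practice above
        "replicationFactor": replication_factor,  # assumed: one replica per node
        "collection.configName": config_name,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical endpoint and names; substitute your own deployment's values.
url = build_create_collection_url(
    "https://example.invalid/solr", "products", "products_config"
)
print(url)
```

Sending a GET request to a URL like this asks Solr to create the collection. The sketch only builds the URL, so it can be inspected before anything is sent.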


Core

A physical index on a node. In a non-distributed collection, a core holds the entire physical index of that collection. In a distributed collection, each core holds only part of the index. Since a node can serve more than one collection, it can have more than one core.


Replica

A physical index containing the documents of one logical shard. A distributed collection distributes these partial indexes across multiple nodes.
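
To make the logical/physical split concrete, here is a small Python model of the terms in this glossary. It is only a sketch: the class names, node names, and core-naming pattern are invented for illustration and do not reflect how Solr stores or names these objects internally.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Physical: one copy of a shard, stored as a core on a node."""
    core_name: str
    node: str


@dataclass
class Shard:
    """Logical: a subset of a collection's documents."""
    name: str
    replicas: list = field(default_factory=list)


@dataclass
class Collection:
    """Logical: the whole index, made of one or more shards."""
    name: str
    shards: list = field(default_factory=list)

    def cores_on(self, node):
        """Return the names of this collection's cores hosted on a node."""
        return [
            r.core_name
            for s in self.shards
            for r in s.replicas
            if r.node == node
        ]


def single_shard_collection(name, nodes):
    """Model a non-distributed collection: one shard, one replica per node."""
    shard = Shard("shard1")
    for i, node in enumerate(nodes, start=1):
        # Core names here are made up for the example.
        shard.replicas.append(Replica(f"{name}_shard1_replica{i}", node))
    return Collection(name, [shard])


nodes = ["node1", "node2", "node3"]  # hypothetical three-node cluster
products = single_shard_collection("products", nodes)
articles = single_shard_collection("articles", nodes)

# One node serves two collections, so it hosts two cores.
cores_on_node1 = products.cores_on("node1") + articles.cores_on("node1")
print(cores_on_node1)
```

In this model each collection has a single shard, and every node holds one full copy of it, which matches the non-distributed case described above.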

