SearchStax Cloud Help Center

Taming Commits

Solr index “commit” events are a necessary evil that often gets out of hand. A commit involves writing a segment file to the disk (which can trigger segment sorting and merging). The new segment is copied across the network to replicas on other nodes, with further disk, network, and merging activity.

It is usually adequate to commit every five minutes, yet we often see Solr struggling to perform multiple commits per second! In that situation, CPU, JVM, and System Load max out and replicas go into recovery.

How does this happen? Solr can be overwhelmed by commit=true and commitWithin=1000 params attached to /update requests. Solr tries to execute all of these individual requests, no matter how many arrive per minute. This makes it difficult for the Solr manager to create an effective commit strategy.

Here’s a passage from Solr documentation:

In most cases, when running in SolrCloud mode, indexing client applications should not send explicit commit requests. Rather, you should configure auto commits with openSearcher=false and autoSoftCommit to make recent updates visible in search requests. This ensures that auto commits occur on a regular schedule in the cluster.

We recommend changing your application to remove commit=true and commitWithin=1000 from your /update messages. You don’t need them.
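
As a rough sketch of what a cleaned-up request can look like, assuming the application posts Solr's XML update format (the field names id and title are hypothetical), the message carries only the documents. The request URL would likewise omit commit=true and commitWithin:

  <!-- No commitWithin attribute on <add> and no separate <commit/> message. -->
  <add>
    <doc>
      <field name="id">doc-1</field>
      <field name="title">Example document</field>
    </doc>
  </add>

When the new documents become durable and searchable is then left to the server-side settings.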

AutoCommit and AutoSoftCommit

Instead, you can rely on the autoCommit and autoSoftCommit settings in your solrconfig.xml file. By default, autoCommit performs a commit every minute, which writes new index entries to disk. AutoSoftCommit builds a new searcher every fifteen seconds, making new index records searchable.

Unfortunately, some updates are very large, and some collections can contain millions of records. We find that the default autoCommit and autoSoftCommit intervals are too short for many production systems. If your system is struggling to keep up, we recommend setting autoCommit to five minutes and autoSoftCommit to two minutes.

Consider this Pulse image from an actual production system. The arrows indicate the point where the client switched from the default settings to our recommended settings. You can see the impact on system performance.

Reducing the commit frequency speeds up data ingestion. Step-by-step instructions follow on this page.

Notes on CommitWithin and AutoSoftCommit

As a matter of interest, note that commitWithin=120000 has about the same effect as an autoSoftCommit interval of 120000 milliseconds (two minutes). It is low values of commitWithin that cause system overload. High values are not harmful.
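
For clients that post Solr's XML update format, the same parameter can be written as an attribute of the add element. A minimal sketch, with a hypothetical document id:

  <!-- A two-minute commitWithin, expressed in milliseconds. -->
  <add commitWithin="120000">
    <doc>
      <field name="id">doc-1</field>
    </doc>
  </add>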

But note this occasional odd behavior (from the Solr documentation):

Using autoSoftCommit or commitWithin requires the client app to embrace the realities of “eventual consistency”. Solr will make documents searchable at roughly the same time across replicas of a collection but there are no hard guarantees. Consequently, in rare cases, it’s possible for a document to show up in one search only for it not to appear in a subsequent search occurring immediately after the first search when the second search is routed to a different replica.

The results come back into agreement quickly, but you might observe the illusion that new documents appear in some searches but not others, or that new documents don’t appear at all. The missing documents will become searchable after the following hard or soft commit.

IgnoreCommitOptimizeUpdateProcessorFactory

There are applications where you have no control over commit and commitWithin params on /update requests. What do you do then?

Fortunately, Solr engineers have provided the IgnoreCommitOptimizeUpdateProcessorFactory to fix this problem. This processor lets Solr ignore commit and optimize demands that are attached to /update requests. The payload of the update is processed normally, but commit frequency conforms to the local autoCommit and autoSoftCommit settings.

In many cases, this dramatically improves indexing performance.

Context of the Modification

One can bring the Solr commit behavior under control by making a few edits to the solrconfig.xml file.

The following instructions show how to reset autoCommit and autoSoftCommit to more appropriate non-default settings. If desired, you can continue by integrating IgnoreCommitOptimizeUpdateProcessorFactory into your /update workflow.

Note that these modifications to solrconfig.xml have no impact on your index. You can try them out for a few minutes, and remove them again a few minutes later without side-effects.

In general, these modifications follow this workflow:

  1. Obtain the deployment’s solrconfig.xml file.
    1. You can find a default file in the \searchstax-client-master\solr-n\configsets\_default\conf directory.
    2. You can download the collection’s current configset using zkcli.
  2. Manually edit the file as described below.
  3. Upload the file to the deployment (as described in the previous link).
  4. Perform a rolling restart of the Solr nodes.

Step-by-Step Edits to Solrconfig.xml

The following images of changes to solrconfig.xml include line numbers at the left to help you find the right area of the file. Your line numbers won’t be quite the same as ours.

AutoCommit and AutoSoftCommit

Begin by increasing the autoCommit setting to 5 minutes.

The actual setting is in milliseconds, so five minutes is 5 x 60 x 1000 = 300000.
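
A sketch of how the edited element might look. Stock Solr configsets usually wrap the value in a property substitution such as ${solr.autoCommit.maxTime:15000}; the literal value below and the openSearcher line are assumptions about your file, so keep whatever surrounding structure you already have:

  <autoCommit>
    <!-- 5 x 60 x 1000 ms = five minutes between hard commits -->
    <maxTime>300000</maxTime>
    <!-- Flush to disk without opening a new searcher -->
    <openSearcher>false</openSearcher>
  </autoCommit>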

Next, scroll down a few lines and set autoSoftCommit to two minutes. A “soft” commit creates a new searcher, but does not write a segment to disk. This keeps your search results fresh while reducing the expense of hard commits.

2 x 60 x 1000 = 120000.
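
The corresponding element, again as a sketch that assumes a literal value rather than a property substitution:

  <autoSoftCommit>
    <!-- 2 x 60 x 1000 ms = two minutes between soft commits -->
    <maxTime>120000</maxTime>
  </autoSoftCommit>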

IgnoreCommitOptimizeUpdateProcessorFactory

If your search application forces commit and commitWithin params into your /update messages, you can try the IgnoreCommitOptimizeUpdateProcessorFactory update stage.

The next step is to add a new stage to the UpdateRequestProcessor workflow. Scroll down some more and insert this line:
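
One common way to route updates through a named chain is Solr's update.chain parameter. The sketch below assumes the chain is called ignore-commit-from-client, matching the definition on this page, and that the parameter is set as a default for the /update handlers through an initParams element. Your file may already have an initParams or requestHandler block where the str line belongs instead:

  <initParams path="/update/**">
    <lst name="defaults">
      <!-- Send every /update request through the named chain -->
      <str name="update.chain">ignore-commit-from-client</str>
    </lst>
  </initParams>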

Now we have to define the new stage. Scroll down to the Update Processors section of the file. Add this entire element:

Here it is again so you can copy/paste:

  <updateRequestProcessorChain name="ignore-commit-from-client" default="true">
    <!-- Ignore commit and optimize requests from clients, replying 200 (OK) -->
    <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
      <int name="statusCode">200</int>
    </processor>
    <!-- Standard stages: log the update, distribute it, then apply it -->
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.DistributedUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>
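
The statusCode value controls what the client sees when its commit is ignored. As a hedged variation, assuming you would rather alert the client than quietly accept the request, the processor also accepts a non-200 status and a responseMessage (the message text here is only an example):

  <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
    <!-- Report the ignored commit to the client as an error -->
    <int name="statusCode">403</int>
    <str name="responseMessage">Explicit commits are disabled on this collection.</str>
  </processor>

This element would take the place of the first processor in the chain; the remaining stages stay the same.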

That’s the final change. Save the file. Upload it to the appropriate configset. Restart the Solr nodes. Watch your Pulse performance graphs.

Questions?

Do not hesitate to contact the SearchStax Support Desk.

