December 14, 2023

Kevin Montgomery

How to Use SearchStax APIs with WordPress

Site search is a core feature for most websites. Search helps your visitors find content throughout your site without having to click through navigation, menus, and links – users can just type in what they’re interested in and select the best result for their needs.

Many content management systems include or support basic site search – but these simple search features can be difficult to scale, lack customization, and aren’t easy to optimize for your site design and visitor needs.

SearchStax Managed Search and SearchStax Site Search both provide the search engine and infrastructure for site search without having to manage servers, scale infrastructure, or deal with software and security upgrades. SearchStax makes it easier to add powerful customizable site search to any website or CMS with our easy-to-use search APIs.

We’ll show you how to set up SearchStax Managed Search or Site Search and add site search, using WordPress as an example. This post is based on the SearchStax WordPress module, which is available in a GitHub repo. You can clone or download the module and add it to your WordPress site to use SearchStax. Many of these API calls and design patterns also apply to other content management systems and backend languages.

Here’s what’s inside:

  • Introduction to SearchStax Managed Search & Site Search
  • Authorization and Connecting to SearchStax APIs
  • Configuring Search Fields and Schema
  • Preparing to Index Content
  • Batch Indexing
  • Updating Items in the Index
  • Removing Items from the Search Index

What are SearchStax Managed Search and Site Search?

SearchStax Managed Search and SearchStax Site Search are managed search solutions powered by Solr. SearchStax manages the infrastructure, scaling, and deployment of the Solr instances and provides APIs and easy-to-use dashboards for managing your search indexes.

Search indexes in both products can be queried and managed with JSON-based APIs, so it’s easy to set up search fields, index documents, get search results, and more.

Connecting to SearchStax APIs

SearchStax provides API endpoints for read-only operations (such as querying search results) and administrative read/write operations such as indexing new content, deleting items from the index, and updating schema. API requests will need to include the authorization token for all read and write/update operations.

You should protect your request tokens and avoid exposing the read/write token publicly. Read-only tokens should also be protected from public exposure so that bad actors don’t abuse or overuse the limited monthly API requests.

In this example we’ll make SearchStax API calls from the WordPress backend. Search requests and index updates then originate only from WordPress itself, in response to actions by site administrators or visitors, which helps keep the API endpoints and tokens from leaking publicly.

Many SearchStax API requests can be made by passing data in URL parameters with a GET request to the API endpoint and including the authorization token in the request header.

				
					curl "https://<api-endpoint>.searchstax.com/<_id_>/<_indexName_>/select?q=*:*&wt=json" \
-H "Authorization: Token <_readToken_>"
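
The same read-only request can be assembled in PHP before it is handed to an HTTP client. This is a minimal sketch, not the plugin’s actual code: the function name, the `example.searchstax.com` host, the `1234/my-index` path, and `READ_TOKEN` are all placeholders made up for illustration.

```php
<?php
// Hypothetical helper: builds the select URL and the Authorization header
// for a read-only query. The endpoint layout mirrors the placeholder URL
// used in this post.
function searchstax_select_request(string $endpoint, string $read_token, array $params): array {
    // Default to JSON output unless the caller overrides wt.
    $params = array_merge(['wt' => 'json'], $params);

    return [
        // http_build_query() URL-encodes each value, so *:* becomes %2A%3A%2A.
        'url'     => rtrim($endpoint, '/') . '/select?' . http_build_query($params, '', '&'),
        'headers' => ['Authorization' => 'Token ' . $read_token],
    ];
}

// Placeholder endpoint and token, for illustration only.
$request = searchstax_select_request(
    'https://example.searchstax.com/1234/my-index/',
    'READ_TOKEN',
    ['q' => '*:*']
);

echo $request['url'], PHP_EOL;
```

In WordPress, the resulting URL and header array could be passed to `wp_remote_get( $request['url'], array( 'headers' => $request['headers'] ) )`.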
				
			

Others may require passing a JSON object in a POST request body to change or update the index.

				
curl -X POST "https://<api-endpoint>.searchstax.com/<_id_>/<_indexName_>/schema" \
-H "Authorization: Token <_readWriteToken_>" \
-H "Content-type: application/json" \
-d '{
  "add-field": {
    "name": "title",
    "type": "string",
    "indexed": true,
    "required": false,
    "stored": true,
    "multiValued": false
  }
}'
				
			

Integration - Account Details and Connecting to Your SearchStax Instance

The first step toward indexing site content is storing the API and token details needed to start making requests. In our WordPress example we’re capturing the API endpoint URLs and access tokens with an administrator options page. Other options include storing URLs and access tokens in environment variables and including them in the deploy process so that your website or CMS can securely access them to make API requests.

Once the admin has provided the relevant credentials, you can make a request to the SELECT endpoint. You should get the following response if your account and Solr index are set up properly.

				
					curl "https://<api-endpoint>.searchstax.com/<_id_>/<_indexName_>/select?q=*:*&wt=json" \
-H "Authorization: Token <_readToken_>"

// response
{
  status: 0,
  QTime: 13
}
				
			

A response that includes `status: 0` means the request was successful.
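
A small helper can turn this check into a boolean when validating the saved credentials. This is only a sketch under one assumption: standard Solr JSON responses nest the status under a `responseHeader` key, while the abbreviated response above shows it at the top level, so the hypothetical function below accepts either shape.

```php
<?php
// Hypothetical helper: returns true when a response body reports status 0.
// Assumption: the status sits either under "responseHeader" (standard Solr
// JSON) or at the top level (as abbreviated in this post).
function searchstax_request_succeeded(string $response_body): bool {
    $data = json_decode($response_body, true);

    if (!is_array($data)) {
        return false; // not valid JSON, or not a JSON object
    }

    $status = $data['responseHeader']['status'] ?? $data['status'] ?? null;

    return $status === 0;
}

var_dump(searchstax_request_succeeded('{"responseHeader":{"status":0,"QTime":13}}'));
```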

Integration - Configuring the Search Index

Now that we’re able to connect to the SearchStax API we can begin setting up the search index for our site and the content that we’ll be indexing.

We’ll use a read/write API request to set up the search fields and schema. We’ll be using the post type, category, and tag fields that are included in WordPress for our facets. We’ll also define a couple of other standard WordPress post fields.

This example adds a tags field and sets the type to ‘string’ and multiValued to ‘true’. We’ll be able to use this field later for faceting so that we can return the relevant tags for each post that’s included in search results.

				
curl -X POST "https://<api-endpoint>.searchstax.com/<_id_>/<_indexName_>/schema" \
-H "Authorization: Token <_readWriteToken_>" \
-H "Content-type: application/json" \
-d '{
  "add-field": {
    "name": "tags",
    "type": "string",
    "indexed": true,
    "required": false,
    "stored": true,
    "multiValued": true
  },
  ...additional fields...
}'
				
			

Ideally you shouldn’t need to change your search schema after it’s been initially set up – but you will need to update your search schema if you’re adding new fields or facets to your site so that they can be indexed properly.
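
When several fields share the same settings, the request body can be generated rather than written by hand. The sketch below is illustrative only: `build_add_field()` is a hypothetical helper, and it relies on Solr’s Schema API accepting an array of field definitions under a single `add-field` key. The field names come from the WordPress fields used in this post.

```php
<?php
// Hypothetical helper that mirrors the add-field object shown above.
function build_add_field(string $name, string $type = 'string', bool $multi_valued = false): array {
    return [
        'name'        => $name,
        'type'        => $type,
        'indexed'     => true,
        'required'    => false,
        'stored'      => true,
        'multiValued' => $multi_valued,
    ];
}

// One request body that defines three fields at once.
$payload = json_encode([
    'add-field' => [
        build_add_field('title'),
        build_add_field('categories', 'string', true),
        build_add_field('tags', 'string', true),
    ],
], JSON_PRETTY_PRINT);

echo $payload, PHP_EOL;
```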

Integration - Preparing Content

Now that we’ve connected to the SearchStax instance and set up fields and their schema we can start preparing site content to push to the search index.

SearchStax search indexes use a unique ID field to track every document in the search index. We’ll need to create a document ID naming convention to ensure indexed content is unique and that it matches the content stored in the CMS or database. In many cases we can reuse the internal unique identifiers that your CMS is already using to track pages.

In our WordPress example we can create a function that will build request JSON for a post with the appropriate fields that match the search schema. The unique document IDs use the post-type prefix followed by the post ID (WordPress tracks pages and posts separately so if we just used post ID by itself we might overwrite pages or posts that have the same ID).
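
As a rough illustration of this naming convention, the two hypothetical helpers below build a document ID from a post type and post ID, and split it back apart. The function names are made up for this sketch and are not part of the plugin.

```php
<?php
// Builds an ID such as "post_123" or "page_123".
function solr_document_id(string $post_type, int $post_id): string {
    return $post_type . '_' . $post_id;
}

// Splits an ID back into its parts. The numeric suffix is matched at the
// end of the string because post type names can contain underscores.
function parse_solr_document_id(string $solr_id): ?array {
    if (!preg_match('/^(.+)_(\d+)$/', $solr_id, $matches)) {
        return null; // does not follow the convention
    }

    return ['post_type' => $matches[1], 'post_id' => (int) $matches[2]];
}

echo solr_document_id('post', 123), PHP_EOL;
echo solr_document_id('page', 123), PHP_EOL;
```

A post and a page that share the numeric ID 123 end up as two distinct documents in the index.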

This function accepts a WordPress post object and the Solr document ID that we’re using to identify the post. We’re going to limit the body field to 100,000 characters for each post and the summary field will use the first 300 characters of the body content if a post summary is not available.

				
public function post_to_solr_json( $post, $solr_id ) {
    $max_doc_size = 100000;      // maximum characters kept in the body field
    $max_summary_length = 300;   // length of the fallback summary

    // Collect category names for the categories field.
    $post_categories = wp_get_post_categories( $post->ID );
    $categories = array();
    foreach ( $post_categories as $this_category ) {
        $category = get_category( $this_category );
        $categories[] = $category->name;
    }

    // Collect tag names for the tags field.
    $post_tags = wp_get_post_tags( $post->ID );
    $tags = array();
    foreach ( $post_tags as $this_tag ) {
        $tags[] = $this_tag->name;
    }

    $solrDoc = array();

    $solrDoc['id'] = $solr_id;
    $solrDoc['title'] = $post->post_title;
    if ( $post->post_excerpt != '' ) {
        $solrDoc['summary'] = $post->post_excerpt;
    }
    else {
        // No excerpt: fall back to the start of the tag-stripped body.
        $solrDoc['summary'] = substr( wp_strip_all_tags( $post->post_content, true ), 0, $max_summary_length );
    }
    $solrDoc['body'] = substr( $post->post_content, 0, $max_doc_size );
    // wp_get_attachment_url() ignores a size argument, so request the
    // thumbnail-sized URL with get_the_post_thumbnail_url() instead.
    $solrDoc['thumbnail'] = get_the_post_thumbnail_url( $post, 'thumbnail' );
    $solrDoc['guid'] = $post->guid;
    $solrDoc['url'] = get_permalink( $post );
    $solrDoc['post_date'] = $post->post_date;
    $solrDoc['post_type'] = $post->post_type;
    $solrDoc['post_author'] = $post->post_author;
    $solrDoc['categories'] = $categories;
    $solrDoc['tags'] = $tags;

    return $solrDoc;
}

				
			
We can call this function when indexing content in batches or when individual pages are added or updated. Having a common handler will ensure that all site content will be indexed with the appropriate fields and can be searched correctly.
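
The summary fallback can also be tried outside WordPress. The sketch below is a stand-alone approximation, not the plugin’s code: `build_summary()` is a hypothetical name, and plain `strip_tags()` stands in for `wp_strip_all_tags()`, which additionally removes script and style contents. It truncates with `mb_substr()` when available, since `substr()` counts bytes and can cut a multibyte character in half.

```php
<?php
// Hypothetical stand-alone version of the summary logic described above.
function build_summary(string $excerpt, string $content, int $max_length = 300): string {
    // Prefer the author-supplied excerpt when there is one.
    if (trim($excerpt) !== '') {
        return $excerpt;
    }

    // Strip markup and collapse runs of whitespace before truncating.
    $text = trim(preg_replace('/\s+/', ' ', strip_tags($content)));

    // Truncate by characters when the mbstring extension is available.
    return function_exists('mb_substr')
        ? mb_substr($text, 0, $max_length)
        : substr($text, 0, $max_length);
}

echo build_summary('', '<p>Hello <b>world</b></p>'), PHP_EOL;
```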

Integration - Indexing Content

Search indexes can be updated in batches or individually as needed. When we’re first adding site search to an existing site we’ll want to index all available content to get started. After that we can make updates to individual documents in the search index as needed when page content changes.

SearchStax Managed Search and Site Search have different request and API limits depending on your subscription – including maximum request size, maximum requests per minute, and maximum total number of documents in the index. You’ll need to configure your batching operations to ensure you don’t push too many documents into your index and that your request sizes are under the limits.

For our WordPress plugin we’ll need to get all posts that have a post_status of ‘publish’. We’ll get the Solr JSON for each post using the function we wrote before and then start submitting these documents to the search index in batches.

In this example we’re submitting 20 posts in each batch and waiting 250 milliseconds between each request with a maximum processing time of 30 seconds. We’re checking the API response code for each batch and keeping track of total request time so we can catch any errors or avoid going over API limitations.

				
public function index_content() {
    $return = array();
    $return['status'] = 'none';
    $return['data'] = array();
    $post_batch = array();

    // Fetch every published post.
    $posts = get_posts([
        'post_status' => 'publish',
        'numberposts' => -1
    ]);

    // Convert each post to a Solr document with a type-prefixed ID.
    foreach ( $posts as $post ) {
        $post_batch[] = $this->post_to_solr_json( $post, $post->post_type . '_' . $post->ID );
    }

    $batch_size = 20;
    $batches = ceil( count( $post_batch ) / $batch_size );
    $timeout = time() + 30; // stop after 30 seconds of processing
    $delay = 250;           // pause between batches, in milliseconds
    $data = array();

    for ( $i = 0; $i < $batches; $i++ ) {
        $status = $this->push_to_solr( array_slice( $post_batch, $i * $batch_size, $batch_size ) );

        if ( $status['status'] == 'success' ) {
            $data = array_merge( $data, $status['data'] );
        }
        else {
            $return['status'] = $status['status'];
            break;
        }

        if ( time() > $timeout ) {
            $return['status'] = 'timeout';
            break;
        }
        set_time_limit( 20 );
        // usleep() expects microseconds, so convert from milliseconds.
        usleep( $delay * 1000 );
    }

    if ( $return['status'] == 'none' ) {
        $return['status'] = 'success';
        $return['data']['posts'] = 'Successfully indexed ' . count( $data ) . ' items';
    }
    return json_encode( $return );
}

				
			
If no errors were returned from the SearchStax API and the index function didn’t time out, then the batch update was a success. We can update our `return` object and include the count of documents that were added during the batch process.
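
The example above batches by a fixed document count. Because subscriptions also cap the request size, one possible refinement is to split batches by encoded size as well. The sketch below is an assumption-laden illustration: `chunk_documents()` is a hypothetical helper and both limits passed to it are placeholder values, not real SearchStax limits.

```php
<?php
// Hypothetical helper: splits documents into batches that respect both a
// document-count limit and an approximate byte limit for the JSON body.
function chunk_documents(array $docs, int $max_docs, int $max_bytes): array {
    $batches = [];
    $current = [];
    $current_bytes = 2; // the enclosing "[" and "]"

    foreach ($docs as $doc) {
        $doc_bytes = strlen(json_encode($doc)) + 1; // +1 for the separating comma
        $is_full = count($current) >= $max_docs
            || $current_bytes + $doc_bytes > $max_bytes;

        // Close the current batch before it would exceed either limit.
        // A single oversized document still gets a batch of its own.
        if ($current !== [] && $is_full) {
            $batches[] = $current;
            $current = [];
            $current_bytes = 2;
        }

        $current[] = $doc;
        $current_bytes += $doc_bytes;
    }

    if ($current !== []) {
        $batches[] = $current;
    }

    return $batches;
}

// Five small sample documents, for illustration only.
$docs = [];
for ($i = 1; $i <= 5; $i++) {
    $docs[] = ['id' => 'post_' . $i, 'body' => str_repeat('x', 100)];
}

echo count(chunk_documents($docs, 2, 100000)), PHP_EOL;
```

Each resulting batch could then be sent in its own request, in place of the fixed-size `array_slice()` call.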

Plugin Configuration - Updating Content

Once we’ve submitted a document to the search index it can easily be updated when the original post has been changed or modified. Just push an update with a single document object including the ID and other updated fields.
				
curl -X POST "https://<api-endpoint>.searchstax.com/<_id_>/<_indexName_>/update" \
-H "Authorization: Token <_readWriteToken_>" \
-H "Content-type: application/json" \
-d '[
  {
    "id": "post_123",
    "title": "Post Title",
    ...
  }
]'
				
			

Removing Content

As sites are updated and pages change we’ll likely need to remove content from the Solr index. This can be done on a per-document basis or we can empty the entire index if needed to start over.

Deleting an Individual Document

We can remove a document by submitting a delete request to the UPDATE API endpoint and including the document ID in the request body.

				
curl -X POST "https://<api-endpoint>.searchstax.com/<_id_>/<_indexName_>/update" \
-H "Authorization: Token <_readWriteToken_>" \
-H "Content-type: application/json" \
-d '{"delete":"post_123"}'
				
			

Deleting Everything in the Search Index

It may be necessary to delete everything in the search index and start over if we’ve made major changes to the site, don’t want to delete individual documents, or have made changes to the schema.

				
curl -X POST "https://<api-endpoint>.searchstax.com/<_id_>/<_indexName_>/update" \
-H "Authorization: Token <_readWriteToken_>" \
-H "Content-type: application/json" \
-d '{"delete": {"query": "*:*"}}'
				
			

This request deletes anything that matches the wildcard query (every indexed document) and is not reversible.
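
Both kinds of delete body can be generated with `json_encode()`. This is a sketch with hypothetical helper names. It relies on Solr’s JSON update format accepting a list of IDs under `delete`, and the `post_type:page` query assumes the `post_type` field from the document structure used in this post.

```php
<?php
// Hypothetical helpers that build the JSON bodies for delete requests.

// Deletes one or more documents by ID.
function build_delete_by_ids(array $ids): string {
    return json_encode(['delete' => array_values($ids)]);
}

// Deletes every document that matches a query.
function build_delete_by_query(string $query): string {
    return json_encode(['delete' => ['query' => $query]]);
}

echo build_delete_by_ids(['post_123', 'page_7']), PHP_EOL;
echo build_delete_by_query('post_type:page'), PHP_EOL;
```

A narrower query such as `post_type:page` removes only the matching documents, which may be preferable to emptying the whole index with `*:*`.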

Querying and Getting Search Results

After content has been successfully indexed we can start querying and building search results. Typically searches require a search term and the field(s) that should be queried for the given term. Some advanced search features include getting search facets, adjusting search relevancy, and sorting results by a given field or sorting method.

Basic Search Request

Search requests can be made with a basic GET request to the SELECT endpoint using the read-only token. The following request will return any documents that contain the keyword search_term.
				
					curl "https://<api-endpoint>.searchstax.com/<_id_>/<_indexName_>/select?q=body:*search_term*&wt=json" \
-H "Authorization: Token <_readToken_>"
				
			
This will return any matching documents from the index. Search responses are paginated and will only return a maximum number of rows (10 by default and up to 100 if we include the `rows` parameter). The response will also include the total number of matching documents so we can make additional requests for the additional rows to get all search results.
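
To fetch every page of results, the offsets for the follow-up requests can be computed from the total. The sketch below assumes the standard Solr conventions, which are not spelled out in this post: the total is reported as `numFound` in the response, and each request selects its page with the `start` parameter alongside `rows`. The function name is hypothetical.

```php
<?php
// Hypothetical helper: returns the start offset for each page of results.
function page_offsets(int $num_found, int $rows): array {
    if ($rows < 1 || $num_found < 1) {
        return []; // nothing to fetch
    }

    // For 120 matches and 50 rows per page this yields 0, 50 and 100.
    return range(0, $num_found - 1, $rows);
}

foreach (page_offsets(120, 50) as $start) {
    echo 'rows=50&start=' . $start, PHP_EOL;
}
```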

Advanced Search with Facets

We can refine search results by including facets, sorting, and other advanced search parameters (line breaks have been added to the URL for readability).
				
					curl "https://<api-endpoint>.searchstax.com/<_id_>/<_indexName_>/select
?q=body:*search_term*
&rows=50
&facet=true
&facet.field=tags
&f.tags.facet.sort=index
&facet.mincount=1
&wt=json
" \
-H "Authorization: Token <_readToken_>"
				
			
If you do plan on using facets or sorting, you’ll need to make sure the search fields are properly defined in your search schema. The example above needs the tags field defined in the schema in order to return the appropriate facets. We’re also sorting the tag facets and only returning tag facets if they’re assigned to documents in the search results.
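
Facet counts need a little reshaping before they can be rendered. By default, Solr’s JSON output lists each facet field as a flat array that alternates term and count. The sketch below converts that into a term-to-count map. The function name and the sample response are made up for illustration.

```php
<?php
// Hypothetical helper: converts ["php", 4, "search", 2] into
// ["php" => 4, "search" => 2].
function facet_pairs_to_map(array $flat): array {
    $map = [];

    // Walk the array two entries at a time: term, then count.
    for ($i = 0; $i + 1 < count($flat); $i += 2) {
        $map[(string) $flat[$i]] = (int) $flat[$i + 1];
    }

    return $map;
}

// Sample response fragment, invented for this example.
$response = json_decode(
    '{"facet_counts":{"facet_fields":{"tags":["php",4,"search",2,"solr",7]}}}',
    true
);

$tags = facet_pairs_to_map($response['facet_counts']['facet_fields']['tags'] ?? []);
print_r($tags);
```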

Next Steps

These examples should be a good starting point to work with the SearchStax APIs and create your own implementation in your content management system or website. SearchStax accounts can be scaled with more indexes, larger request sizes, and document limits so that you can continue providing an excellent site search experience even as your site grows.

Can I use SearchStax with WordPress?

Yes, it’s possible to use SearchStax Managed Search and SearchStax Site Search with WordPress through our REST APIs.

By Kevin Montgomery

Product Marketing Engineer
