Data Ingestion for SearchStax Studio

February 22, 2023

Tom Humbarger

If you’re considering SearchStax Studio or have recently purchased a site search solution, you’re probably thinking through data ingestion. Data ingestion is often a major challenge in getting site search up and running, and developing a sound approach to data import is critical as it directly impacts the quality and relevance of search results.

In the context of site search, data ingestion is the process of importing and loading data from one or more data sources and making it available in a structured format that can be indexed and searched by a search engine. The data may include website content, product information, user behavior data, documents and more. Data ingestion is also an ongoing task: data is pulled in repeatedly, either in real time or in regular batches.

The goal of data ingestion for site search is to ensure that the search engine can quickly and accurately retrieve relevant information for a user’s search query and improve the overall user experience.

Getting Data into SearchStax Studio

With SearchStax Studio, data ingestion means getting data into a Solr-based search index so it can be accessed via a search request on a website or in a custom application. This post looks at the various ways to load data into Studio, identifies the sources and types of data we support and provides recommendations for best practices.

There are three main ways to load data into SearchStax Studio:

  • CMS Connectors for Sitecore and Drupal
  • Ingest APIs
  • SearchStax Crawler

Let’s take a look at each one of these options in more detail. 

CMS Connectors for Sitecore and Drupal

If you use Sitecore or Drupal as your content management system (CMS), SearchStax offers integration modules that automate data indexing and speed up implementation.

The SearchStax Studio Connector for Sitecore is available for Sitecore versions 9 through 10.3. The Connector integrates with the Sitecore Indexing Manager and automatically indexes all Sitecore content items out-of-the-box. Additional information can be found in the Sitecore Connector product documentation.

The SearchStax Studio Connector for Drupal automatically tracks all search results known to the Drupal Search API. Once the Drupal Connector is installed and configured, it automatically indexes any new or updated content in the Drupal environment. The module adds search functionality while requiring virtually no changes to the Drupal website. The Drupal integration was developed by Thomas Seidl (drunken monkey), the creator and maintainer of the Drupal Search API, and follows all Drupal open source code guidelines. Additional information can be found in the Drupal Connector product documentation or from the Drupal Connector module page at Drupal.org.

SearchStax Ingest APIs

The SearchStax Data Ingest API is a service that allows you to index and search structured data in your SearchStax search service. The API enables you to send data to your search service in real time, making it immediately searchable by users, and customers can use it to load documents into their Studio application. The Ingest endpoint is listed on the Settings page as the /update endpoint, and it uses the “Read-Write” Search API credentials.
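
As a rough illustration of what a call to the /update endpoint could look like, the Python sketch below builds (but does not send) an HTTP request. The URL, the token value, the `Authorization: Token ...` header scheme and the document fields are all placeholders assumed for illustration. Copy the actual Ingest endpoint and Read-Write credentials from your Settings page and check the Studio product documentation for the exact authentication format.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the values shown on your Settings page.
INGEST_URL = "https://example.invalid/solr/my-app/update"
READ_WRITE_TOKEN = "YOUR_READ_WRITE_TOKEN"


def build_ingest_request(documents, url=INGEST_URL, token=READ_WRITE_TOKEN):
    """Build a POST request whose body is a JSON array of documents."""
    body = json.dumps(documents).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed header scheme; confirm it against the product documentation.
            "Authorization": f"Token {token}",
        },
    )


request = build_ingest_request(
    [{"id": "page-1", "title": "Hello", "content": "First page body"}]
)
# To actually send the request: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the payload and headers before any data reaches the index.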

The Ingest APIs simplify the data ingestion process by enabling a customer or an implementation partner to create a small piece of code to get data from any source and push it into SearchStax Studio. You can index individual JSON documents, multiple JSON documents or a JSON file with an array of JSON objects. You can also index XML documents by sending one or multiple tags. Additional information on using the Ingest APIs can be found in the Studio product documentation.
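
To make the document formats above more concrete, here is a minimal Python sketch that serializes the same records as a JSON array and as XML. The XML shape follows Solr's standard update format (`<add>`, `<doc>`, `<field name="...">`); that Studio accepts exactly this shape is an assumption here, and the field names `id` and `title` are invented for the example.

```python
import json
import xml.etree.ElementTree as ET


def json_payload(documents):
    """Serialize one document (dict) or many (list of dicts) as a JSON array."""
    if isinstance(documents, dict):
        documents = [documents]
    return json.dumps(documents)


def xml_payload(documents):
    """Serialize documents as <add><doc><field name="...">...</field></doc></add>."""
    add = ET.Element("add")
    for document in documents:
        doc = ET.SubElement(add, "doc")
        for name, value in document.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Illustrative records; real field names depend on your index schema.
docs = [{"id": "1", "title": "About us"}, {"id": "2", "title": "Contact"}]
print(json_payload(docs[0]))  # a single document, wrapped in an array
print(xml_payload(docs))      # multiple <doc> tags inside one <add>
```

Using a library to build the XML, rather than string concatenation, ensures that special characters in field values are escaped correctly.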

SearchStax Crawler

SearchStax also has a web crawler that can crawl the data on any website. The crawler identifies itself to the website it is crawling and then sends a large number of requests to gather the metadata needed for the Solr index. To control the SearchStax Crawler, a number of variables are passed so it knows where to start, the types of pages to crawl and what pages to exclude, if any.
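
The idea of a start point plus include and exclude rules can be sketched in a few lines of Python. This is not the SearchStax Crawler's configuration format, which is not described here. The function name `should_crawl`, its parameters and the glob-style patterns are hypothetical and only show how such rules typically narrow down the set of pages a crawler visits.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def should_crawl(url, start_url, include=("*",), exclude=()):
    """Return True if url is on the start URL's host, matches at least one
    include pattern and matches no exclude pattern (patterns apply to the path)."""
    target, start = urlparse(url), urlparse(start_url)
    if target.netloc != start.netloc:
        return False  # stay on the site where the crawl started
    path = target.path or "/"
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(path, pattern) for pattern in include)


# Hypothetical rules: crawl the blog, but skip drafts.
rules = {"include": ("/blog/*",), "exclude": ("/blog/drafts/*",)}
start = "https://www.example.com/"
print(should_crawl("https://www.example.com/blog/post-1", start, **rules))
print(should_crawl("https://www.example.com/blog/drafts/wip", start, **rules))
```

Checking the exclude rules first means an exclusion always wins over an inclusion, which is usually the safer default.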

A brilliant feature of the SearchStax Crawler is that it can crawl single-page applications. For example, if you build a website using Salesforce (like hub.nashville.gov, which our Crawler crawls for the City of Nashville), then all of the content you see on a page is loaded dynamically when you click on that page. Our Crawler is smart enough to detect these dynamic pages and crawl them.

The SearchStax Crawler is available for a separate one-time setup charge and an ongoing monthly charge. The Crawler has two limitations: it can crawl a maximum of 100,000 pages, and the data cannot be uploaded to the Solr index until the entire crawl has been completed. Contact SearchStax to learn more about the SearchStax Crawler and pricing.

Sources and Types of Data for SearchStax Studio

The primary use cases for SearchStax Studio involve adding search capabilities to popular content management systems such as Sitecore, Drupal, Acquia and Adobe AEM. Additional sources could include WordPress, Azure Cosmos DB, MySQL, custom apps, websites written in PHP and RSS feeds. Most customers working with Sitecore or Drupal will use the SearchStax Connectors for these CMSes. For other content sources, you will have to select another option to load your data into Solr.

As far as content, the following types of data can be brought into Solr and managed by SearchStax Studio: HTML web pages, PDFs, Word documents, Excel spreadsheets, PowerPoint files, text files, rich text format (RTF) files and Visio drawing files (VSD).

SearchStax Studio enables marketers and developers to deliver powerful site search at scale. Schedule a product demo with our search experts to get an evaluation of search on your current website, see how search can be improved and discover how analytics and easy-to-use tools can drive the best practices to quickly optimize the search experience.

Data Ingest for Site Search FAQs

What is Site Search?

Site search refers to the feature on a website that allows users to search for specific content or information within that website. It typically involves a search box and search results page, which may display relevant pages, documents, products, or other content based on the user’s search query. Site search can improve user experience by helping visitors find what they’re looking for quickly and efficiently.

SearchStax Studio is our site search solution that makes powerful search easy with best-in-class experience, actionable search insights, self-service marketing tools and quick implementation to accelerate digital transformation projects.

What is Data Ingest for Site Search?

In the context of site search, data ingestion is the process of importing and loading data from one or more data sources and making it available in a structured format that can be indexed and searched by a search engine. The data may include website content, product information, user behavior data, documents and more.

What is the SearchStax Ingest API?

The SearchStax Data Ingest API is a service that allows you to index and search structured data in your SearchStax search service. The API enables you to send data to your search service in real-time, making it immediately searchable by users.

Using the SearchStax Data Ingest API, you can create, update, and delete documents in your search index. You can also configure custom mappings to define how your data should be indexed and searched. The API supports a variety of data formats, including JSON, XML, and CSV, and you can choose to send data to your search service in batches or individually.
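
A short Python sketch of the batching idea follows. It splits a list of documents into fixed-size batches and builds a delete-by-id body using Solr's JSON update syntax (`{"delete": [...]}`); that the Ingest API passes this syntax through unchanged is an assumption, and the batch size and document fields are arbitrary illustrative values. In Solr, re-sending a document with an existing id replaces the stored copy, which is how updates are typically handled.

```python
import json


def batches(documents, size):
    """Yield successive lists of at most `size` documents."""
    for start in range(0, len(documents), size):
        yield documents[start:start + size]


def delete_payload(ids):
    """JSON body that deletes documents by id (Solr JSON update syntax)."""
    return json.dumps({"delete": list(ids)})


# Illustrative documents; real field names depend on your index schema.
docs = [{"id": f"doc-{n}", "title": f"Page {n}"} for n in range(5)]

for batch in batches(docs, size=2):
    body = json.dumps(batch)  # each body would be one POST to the ingest endpoint
    print(len(batch), "documents in this batch")

print(delete_payload(["doc-0", "doc-3"]))
```

Sending documents in batches reduces the number of HTTP round trips, while sending them individually keeps each failure isolated to a single document.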

By using the SearchStax Data Ingest API, you can ensure that your search service is always up-to-date with the latest data from your application. This can help to improve the relevance of search results and provide a better search experience for your users.

What is the SearchStax Drupal Connector?

The SearchStax Drupal Connector is a module for the Drupal content management system that allows you to integrate your website’s search functionality with the SearchStax search engine. The module allows you to easily configure and customize the search experience for your website users. You can use it to create custom search forms, configure search settings, and manage search results. The module also provides several advanced features such as faceted search, autocomplete, and spelling suggestions.

What is the SearchStax Sitecore Connector?

The SearchStax Sitecore Connector is a Sitecore module that Sitecore developers can install to leverage all the search capabilities offered by SearchStax Studio for customer-facing search pages. The Connector contains a Sitecore index connector which can index your Sitecore items using the out-of-the-box Indexing Manager provided by Sitecore. The SearchStax Sitecore Connector is easy to install and integrate into a Sitecore solution, and provides a user-friendly interface for configuring search options and managing search indexes. It also supports multilingual content for websites that serve a global audience.

What is the SearchStax Crawler?

The SearchStax Crawler is a web crawling tool designed to help developers and website owners index and search the content of their websites or web applications. The crawler scans through the pages of a website, extracts the content and metadata, and makes it searchable using the Solr search engine. The crawler is designed to be flexible and customizable, allowing users to configure the crawl settings to meet their specific needs. It supports a variety of file types, including HTML, PDF, and Microsoft Office documents, and can be used to index content from websites, intranets, and e-commerce sites.

The SearchStax Crawler is available for a separate one-time setup charge and an ongoing monthly charge. The Crawler has two limitations: it can crawl a maximum of 100,000 pages, and the data cannot be uploaded to the Solr index until the entire crawl has been completed. Contact SearchStax to learn more about the SearchStax Crawler and pricing.

By Tom Humbarger

Senior Product Marketing Manager
