Data Ingestion: Unlocking the Power of CMS Connectors

In this episode, Pete Navarra, VP of DXP Solutions at SearchStax, and Karan Jeet Singh, Sr. Solutions Architect, sit down to talk about how Sitecore and Drupal CMS connectors make data ingestion with SearchStax Studio simple.

Welcome back to another episode of the SearchStax podcast. In today’s installment, we delve into the world of CMS connectors, with a special focus on Drupal’s and Sitecore’s integration with SearchStax Studio. Join us as we uncover the significance of these connectors, explore the customization options they offer, and share real-world insights on how they enhance the digital experience. Whether you’re new to data ingest or looking to optimize your CMS integration, this episode has you covered.

Jae  

We are back today to record another episode of the SearchStax podcast. This episode is continuing on the data ingest theme. We’re talking specifically about CMS connectors, and specifically Drupal and Sitecore. So, Pete, and Karan, thanks again for joining us – really excited to have you guys. Let’s just dive right in, let’s first talk about these connectors. What are they? How can I get access to them? What exactly do they do? Why do we need them? Can we just maybe start off at a high level with that – Karan, we’ll start off with you.

Karan  

So connectors are, as the word suggests, there to help you connect to us. SearchStax Studio provides an index and a search service, but you need to send your data to it, because an index can only be created on data. And that’s where the connectors come in. So if you’re using a CMS like Sitecore or Drupal, then you can utilize our connectors to connect to Studio and send data to it. Those connectors also allow you to get data back from Studio and then use it however you want. You can utilize the APIs directly and get the data from there, or you can build your own layer using the connectors and use the data coming in from Studio to surface on your website. So that’s where connectors come in: they help you communicate with SearchStax Studio.
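
To make the two directions Karan describes concrete (sending data in, reading results out), here is a minimal Python sketch. It is not the connector's actual code. The base URL, the field names, and the `/update` and `/select` paths are assumptions that follow the conventional Solr handler names; a real Studio app supplies its own endpoints and tokens.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; a real Studio app provides its own URLs and tokens.
BASE_URL = "https://example.invalid/solr/my-studio-app"


def to_index_doc(item):
    """Flatten a CMS item into a flat document an index can store."""
    return {
        "id": item["id"],
        "title": item["title"],
        "url": item["url"],
        "body": item.get("body", ""),
    }


def build_update_request(items):
    """Return (url, JSON body) for sending a batch of documents to the index."""
    docs = [to_index_doc(item) for item in items]
    return f"{BASE_URL}/update", json.dumps(docs)


def build_search_url(query, rows=10):
    """Return a URL for reading results back out of the index."""
    return f"{BASE_URL}/select?" + urlencode({"q": query, "rows": rows})
```

The sketch only builds the requests; actually sending them (and authenticating) is the part a connector handles for you.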

Jae  

Awesome. That’s very helpful. I think a lot of people who are new to this may be doing data ingest for the first time. And so that’s super helpful context. And maybe the other thing I’m curious about here is – can you comment on – do they just work out of the box? How often is customization needed? When would such customization options actually come into play? Pete, let’s start with you.

Pete  

Yeah, in terms of what works out of the box, obviously, you can download our connectors right off of our website, depending on the platform that you’re using. For instance, Sitecore, we also have some Docker images where you can get access to our modules, specifically for your Docker images as well. But in terms of the actual setup, I mean, out of the box, there’s always going to be some configuration. Most of the time, you’re going to have maybe some different fields that you want to have indexed. Especially on the front end side, if you’re using our accelerator straight as it is, then generally speaking, our modules can work with the accelerator almost out of the box. But there’s always going to be some customizations that you’re going to want to do simply because not everybody’s search requirements are going to be exactly the same. So I would say there’s gonna be some customization directly out of the box.

Jae  

Are there more common ones that are like you mentioned – and there’s always going to be something – but are there kind of the most common things that people run into in the most common use cases that we see? Karan, do you have any input?

Karan  

Yeah. Like, for example, in Sitecore, you might be using a computed field to show something on your website. So you will have to configure that in the connector. And in Drupal, you might have a bunch of different fields from where you are pulling in the data, and then building a webpage. So you will have to configure your connector to index all those fields. So that kind of tiny customization or configuration is required whenever you are installing and using the connector.

Jae  

Are there any examples of in sort of a vertical sense, are there any sort of common insights that you can provide for people who are listening? Just related to, again, maybe in health care or in higher ed – do you see any kind of common configuration types?

Karan  

So what I’ve seen in higher education, is people use computed fields to create facets. So you can create a facet out of what kind of course it is. And then also a facet out of what’s the duration of the course, what’s the location of the course. And you might be pulling in data from different sources to build that facet out. So you can configure all of that so that all those fields do get indexed whenever you are running the connector. And the same goes for Drupal. The same case is in Drupal as well, where you need to select the fields properly to make sure that all the data gets indexed. And you select all the fields that you need to create all those facets, and this stands true for healthcare or higher ed or any industry, where if you want to build industry specific facets, then you need to configure those in the connector. You need to configure the fields that should be picked up. And you need to configure how the fields should be indexed when creating those facets. 
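
As a rough illustration of the computed-field idea, the Python sketch below derives facet values (course type, duration, location) from raw course data and counts them. All field names, the 52-week threshold, and the "Online" default are hypothetical; in a real setup this logic lives in the Sitecore or Drupal connector configuration, not in standalone Python.

```python
from collections import Counter


def compute_facets(course):
    """Derive facet values from raw course data (hypothetical field names)."""
    weeks = course["duration_weeks"]
    return {
        "course_type": course["level"].strip().title(),
        "duration": "Under 1 year" if weeks < 52 else "1 year or more",
        "location": course.get("campus", "Online"),
    }


def facet_counts(courses, facet):
    """Count how many courses fall under each value of the given facet."""
    return Counter(compute_facets(course)[facet] for course in courses)
```

Each computed value has to be indexed as its own field before the search page can offer it as a facet.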

For example, in Drupal, I remember we ran into this issue with one of our customers, where they were pulling in data for what kind of course it is. They were categorizing things as undergraduate, graduate, and post graduate; these were the three categories that they were pulling in. They had selected the correct field, and the data was being pulled in. On the website, whenever they clicked on undergraduate or graduate, it was working fine. But as soon as they clicked on post graduate, everything went away. When we looked into it, we realized that they had selected the incorrect field type. So you might run into those kinds of issues where you need to not only select the proper fields, but also select the proper field types to make sure that they work properly.
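
The episode does not say which field type was wrong. One plausible cause, assumed here purely for illustration, is a facet field indexed as tokenized text instead of as a single string. The Python sketch below simulates both behaviors with hypothetical helper names; it is a simplified model, not Solr or Drupal code.

```python
def index_as_text(value):
    """Tokenized 'text' behavior: lowercase and split on whitespace."""
    return value.lower().split()


def index_as_string(value):
    """'String' behavior: keep the whole value as a single term."""
    return [value]


def filter_by_facet(docs, selected, indexer):
    """Return ids of docs whose indexed terms contain the selected value verbatim."""
    return [doc["id"] for doc in docs if selected in indexer(doc["level"])]


docs = [
    {"id": 1, "level": "undergraduate"},
    {"id": 2, "level": "graduate"},
    {"id": 3, "level": "post graduate"},
]
```

With `index_as_text`, no single term equals "post graduate", so filtering on it returns nothing, while single-word values still work. With `index_as_string`, each value matches only itself.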

Jae  

Awesome. Let’s switch gears a little bit, then, into the actual operational side of things. So how long does it take, typically, to actually set things up? Set up the connector? Maybe can you walk us through what that looks like in some level of detail, so we can get a sense of the key steps that are needed. Let’s start with Sitecore.

Pete  

In terms of setting up the connector for Sitecore, we’re gonna take it from an easy perspective. I’m not gonna go into all the Docker setup, but just from an easy perspective. Step 1: You download the connector right off our website. It is set up as a Sitecore package. Any Sitecore developer out there is going to understand what a Sitecore package is. It’s going to be in the format of a zip file, and you go through Sitecore’s GUI processes for installing the package onto the content management server. This is going to install our module into Sitecore. And what that really means is it’s creating some configuration files, it’s importing some DLL files onto the server. And then, when the configuration is done, it’s actually going to provide you an action button, where you can go to our SearchStax configuration page right inside the sidebar, and actually start to configure your Studio index and application to connect it to Sitecore. We do happen to utilize Sitecore’s native indexing engine – if anybody is familiar with how Sitecore works with Solr, there’s a series of indexes that Sitecore uses, like the master index, the web index, report index, the list can go on. Our particular Sitecore module, or connector, if you will, actually embeds itself as an index that Sitecore is aware of. So you can do all the same functions, you can populate the schema, you can index just like you would any other index. And so what’s special about this is that while it uses a lot of the default properties that Sitecore has for creating an index, it also provides you an area where you can customize and configure computed fields and other aspects that might not necessarily be currently in your Solr configurations. And so once you go through the process of setting up the connection between the Studio app and Sitecore – and we have a graphical user interface for that – then the index will come alive.

From there, as we outline on one of our documentation pages on SearchStax.com, you go through and populate your schema. That’s going to set up all of the fields that Sitecore is aware of, and tell Solr, “Hey, these are all of the fields that we have in our system.” This allows the user to go into their Studio dashboard, go into the settings for the app, and click the reload schema. And that’s going to provide all the fields that are available inside of the Studio dashboard that Sitecore is aware of. But then that gives you the opportunity to set your result configurations to select different fields, or set up your relevance modeling. So any other thing that might need references to fields automatically through the dashboard, they automatically show up once you hit that. Past that, it’s really just a matter of indexing the content, so that first results start to populate into the index. You can do that through Sitecore’s indexing manager – you go through, you select the Studio index. And before you know it, you actually have the content in your Studio index. That’s the backside – that is really all about ingesting content from Sitecore into the Studio app.

The next piece of that really has to do with the front end aspect of getting the search results. So if you’re using our accelerator as an example, we have packaged into our Studio connector an MVC component, or rendering. Any Sitecore developer who understands the backend development of Sitecore understands what an MVC rendering is: it’s basically going to be a display rendering that you can add into your template display properties and bubble up a view that’s going to have our accelerator on it. As long as that component is on the page, and has one of the other styling components that we also package, those two options will automatically provide you with a search result experience on any page that you provide, as long as the width of the container, or the placeholder that you’re putting it in, is accurate. So again, pretty easy setup. If you need a more custom approach, where maybe the accelerator is not the right view for you, then at that point, you’re kind of building your own search interface. And you can use our APIs or the Studio API to actually manage all that.

Jae  

Terrific. Thanks for walking us through that. That was really helpful. Karan, do you have any examples of customers who have provided feedback on the process itself? Like how easy it is, or just general real world feedback or input.

Karan  

Yeah, I have worked with a lot of Sitecore customers. And the process for them has been pretty straightforward. Because any Sitecore developer who takes a look at the module, they instinctively understand what needs to be done. Most of the Sitecore users are aware of all of these things. So when they look at the package, they understand what needs to be done. And once they look at our documentation, then it’s pretty much clear to them. And after that, they can just run with it. They do sometimes come back to us asking us, “okay, well, I’m used to defining my fields here. How do I define my fields now?” So we just point them to our documentation again, or we just tell them, “Hey, instead of doing it in this file, go and make the changes in this file.” And that’s about it. 

After that, they just run with it and do some crazy things. Like we have one healthcare customer who has built an entirely custom search page that just looks fantastic. I don’t know if I can say the name over here or not, but they have built out a page where a single page is built using four or five different data points. And then they build one search page using that. And then we have a couple of higher ed customers who we helped out in their first implementation, but then they added like four or five different search experiences in their website without our input at all. They just run with it. So it’s pretty straightforward.

Jae  

That’s awesome. And Pete, you mentioned earlier, we get a lot of people who are doing Sitecore upgrades because of the need to move to Solr. And so I was wondering – that’s on the Cloud side – but I was wondering, is there anything related to that, that we should mention here? Just because you mentioned earlier that a lot of people who come from this, they’re gonna know Solr, and I didn’t want to make any assumptions.

Pete  

I mean, when you’re using our Sitecore connector, you don’t necessarily need to know Solr in order to use the connector. Sitecore, for the most part, pretty much takes care of all that. That’s why it’s kind of easy to use, because you don’t have to really know too much about the schema, or the default schema – Sitecore sets that all up. But from more of a customization perspective, you’re not really using Solr to customize things, you’re actually using Sitecore configuration, which makes it a little bit easier for software developers. They’re going to be used to modifying Sitecore config files and using Sitecore patch files to make modifications to the Sitecore configuration, in order to make changes to the schema that eventually gets populated in this folder using the property. For the most part, you don’t need to know a lot of Solr to use the connector, at least for Sitecore. I think Drupal is pretty much the same way. Where that differs –  where you might need some Solr – is if you need to customize the query or maybe you need more advanced query parameters that’s actually going back to Solr specifically. At that point, that’s where having some insight into Solr might be a little bit more appropriate.

Jae  

Terrific. Do either of you have any examples where we provided some consultative help on the Solr side of things from a Studio standpoint?

Karan  

Yeah, so the higher ed customer that we work with that I was mentioning, they had a very unique use case where they wanted to search for something like “Bachelor of Nursing.” And whenever someone searched for that, they wanted the exact match to appear on the top. And then after that, other matches should appear. But that wasn’t happening, because there was some other course which was called Bachelor of Nursing Science or something; it had one extra word in it. And that always appeared first, with the “Bachelor of Nursing” result coming next. So whenever there is a use case like this, to us, it sounds very straightforward – just do an exact match, and if someone searches for “Bachelor of Nursing,” then that should be first – you should just do an exact match. But to do that in Sitecore is hard. It’s hard to do both exact match and inexact match in the same query. Because if you do exact match, then nothing else will appear; you will just get one search result. That’s it. So they had this use case which they wanted to solve. And for that, we offered some consultative services where we suggested that they can use a particular type of field in Solr, and that will allow them to do that. As a matter of fact, for them, we had to create that field. There was nothing out of the box in Solr that helped them achieve exactly this, so we created a field for them. And then they re-indexed their data in that field, and that allowed them to achieve this, where they were able to do both inexact and exact match in the same query without doing a lot of changes in their Sitecore.
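
The episode does not describe the custom field that was built, so the following Python sketch only illustrates the general idea of combining exact and inexact matching in one ranking: every title gets a score for overlapping words, and a title that equals the whole query gets an extra boost. The scoring formula and the boost value are assumptions for illustration, not Solr's actual relevance math.

```python
def normalize(text):
    """Lowercase and collapse whitespace so comparisons ignore case and spacing."""
    return " ".join(text.lower().split())


def score(query, title, exact_boost=10.0):
    """Inexact score (shared words) plus a boost when the whole title matches."""
    query_tokens = set(normalize(query).split())
    title_tokens = set(normalize(title).split())
    inexact = len(query_tokens & title_tokens)
    exact = exact_boost if normalize(query) == normalize(title) else 0.0
    return inexact + exact


def rank(query, titles):
    """Return titles ordered from best to worst match."""
    return sorted(titles, key=lambda title: score(query, title), reverse=True)


titles = ["Bachelor of Nursing Science", "Bachelor of Nursing", "Master of Nursing"]
```

Here `rank("bachelor of nursing", titles)` puts "Bachelor of Nursing" first while the partial matches still appear after it, which is the behavior the customer wanted.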

Jae  

Awesome, thanks for that context, Karan, that’s really helpful. Anything specific to Drupal that we should also know? I know we kind of touched on it as you were talking through it. But since we were  starting with Sitecore, is there anything specific to Drupal that we should cover as well here for those people listening?

Karan  

Well, the process of integrating with Drupal is pretty similar to Sitecore. But they are different platforms, so they both come with different requirements. The thing with Drupal is that you can easily split the whole experience into two different buckets. One is data ingestion, and the other is data consumption. So if you want to just solve the data ingestion problem in Drupal, then for that you have some prerequisites. You should be using the Search API in Drupal, of course – you’re implementing search, so you need the Search API. Then you need to use Search API Solr, because you are using search with Solr. And then you also need a SearchStax module that you install in your Drupal that allows you to utilize your Search API with the shared index that we have created. So we have three offerings: SearchStax Cloud Dedicated, SearchStax Cloud Shared, and SearchStax Studio. By installing this module, you can utilize all three offerings. Otherwise, you can just utilize SearchStax Cloud. So once you have installed this module, then you can index your data: you can just select the index that has been created for you, you can select the SearchStax Studio index, and just get your data into Studio. And then while you’re making sure that all the proper fields are selected, you also need to make sure that all the fields have proper field types. And you also select the parsers over there to make sure that if there is any HTML content in one of your fields, you don’t store the HTML content – you want to store the parsed content. So you can select that in Drupal. And then once you’re happy with all the fields that you have selected, and the way those fields should be indexed, then you just hit the index button, and it will upload all of that data. And after that, it’s up to you how you want to move next. If you want to utilize Drupal Search API, then you can do that.

Once you install the Search API SearchStax module, that will give you an option to say “redirect all search API calls through Studio.” So you just check that checkbox, and then you’ll start using Studio behind the scenes. And in that same module, you also have an option to configure your own analytics. So you can just enter the analytics key and the analytics integration will be taken care of, if you’re using the Search API. But then a lot of Drupal customers also go the headless route, where they want to build their own web page, they want to utilize the APIs directly. And you can do that. If you want to use Twig templates in Drupal, then you can just download the code that we have on our website. You can download an example page from Studio, and then just bring it over in your Twig template and you’ll have a search page up and running. That’s what a lot of our Drupal customers do. Or if you want to utilize APIs directly, then we have a whole suite of APIs that you can utilize and build your own search experience. So you do have a lot of options when it comes to using Studio. But when it comes to sending data to Studio, that’s pretty straightforward. You just need to configure a module. And that’s it.
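
The point about storing parsed content rather than raw HTML can be sketched in a few lines of Python using only the standard library. This is an illustration of the idea, not what the Drupal module runs internally; the function name `strip_html` is made up for this example.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()  # convert_charrefs defaults to True, so &amp; becomes &
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    # Join with spaces so adjacent block elements do not run words together.
    return " ".join(" ".join(parser.parts).split())
```

Joining the pieces with a space keeps words from adjacent paragraphs apart, at the cost of occasionally splitting a word that has inline markup in the middle of it.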

Jae  

That’s awesome. Thanks, Karan. Pete, anything to add there, or did Karan pretty much cover that?

Pete 

Yeah, no, I think with regards to Drupal, I think Karan pretty much covered that. 

Jae  

Okay. And as far as use case, then, is there anything that we could – similarly to what we did with Sitecore, is there any kind of general feedback that we hear from Drupal customers about the process? Or just from a use case standpoint, are there specific things that people do that, again, it would help listeners to know?

Karan  

Oh, one thing that I have to make sure that I retrain our customers on, when it comes to using Drupal modules, is relevancy modeling. With SearchStax Studio, you configure all of your relevancy in Studio; you don’t rely on Drupal anymore. But a lot of Drupal users are used to making those kinds of changes inside their Search API modules. So that’s one thing that’s different. That’s one thing that I have to make sure that customers understand – that you don’t do relevance modeling in Drupal anymore, you have to go to Studio. But other than that, it’s a very straightforward thing. A lot of our customers just – once they start using it, it just makes sense. It makes their lives easier, as well, because the kind of controls, the kind of knobs that we have available, are orders of magnitude better than what they are used to. So they’re just generally happy to rely on that part and move on.

Jae  

Awesome. Thanks, Karan. And anything specific we need to be aware of related to Acquia, that just kind of goes above and beyond a vanilla Drupal install? That would be a good thing to cover.

Pete  

So, when it comes to Acquia, for the most part, our Drupal connector works just fine with Acquia. In fact, there’s no difference in the install configurations, or process, when you’re installing software, which is great. It makes it a pretty easy transition, especially for folks that are maybe migrating a Drupal site into Acquia. All those configurations are going to be just fine.

Karan  

Yep. One thing that I noticed is different for Acquia customers is that usually, they don’t have just one site. They usually have multiple websites. Some of them are on Acquia Drupal, some of them are on some – I don’t know, Salesforce website, and then they might have a repository of documents in the Acquia Widen product. So the data that they have is pretty spread out. That’s one extra layer of challenge that we see with Acquia customers, where they can’t rely only on the SearchStax Studio connector that we have. After that, we also worked with them to configure our web crawlers so that we can pull in data from all these different sources, and then create a unified search experience, where they can utilize one search page and search through all these different websites and parts that they have.

Jae  

Yeah, that’s really important. Thanks for calling that out. We do have another episode we’ll be recording on web crawlers, specifically, so looking forward to that session as well. That’s great to hear, though, because that makes sense. It’s typically larger companies who are using Acquia. Are there any general recommendations? Or to sum it up –  it sounds like the process is very straightforward in regards to installation, there’s a lot of flexibility and configuration options that people get through our interface directly. Is there anything else that people need to be aware of, or iron out beforehand that they can just kind of anchor to? Obviously, we have an extensive set of documentation that covers this, but I’d just love to hear your take on that, summarization of that.

Pete  

Yeah, I think from a starting out perspective, if you think about prerequisites – of what you might need before you can use one of these connectors – think, hey, you want to go ahead and make sure your Studio app is created, maybe not configured but at least created, so that we have access to that when we’re setting up the connectors. But really, it’s about having a general sense for your fields and the type of content that you have – sometimes just doing a field inventory: I know that I’m gonna want my title field to show up in my search results, so do I have an appropriate title field that’s actually discrete that we can index on? Other considerations are things like, what about multilanguage? Right? Is there another field that might need to have a different multilanguage version for that field? So you want to think about the different field types. As for the other field types that are generally common, do you have a URL field? Do you have a description field? You know, what about faceting? Do you have the appropriate facet fields, or different taxonomy? You want to make sure you have some of that stuff either known or at least set up. And then from the accelerator side, it doesn’t really matter, since we’re not necessarily crawling the site. But if this was something where you’re using the connector in conjunction with maybe a web crawler, you also want to make sure that your web pages have the appropriate meta tags that are set up and visible. So that’s it for me, at least – Karan, I don’t know if you have anything to add there.
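
The field inventory Pete describes can be approximated with a small script run against a content export before any connector is set up. The Python sketch below assumes a hypothetical export where each item is a dictionary with an `id`; the list of required fields is also an assumption and should be adjusted per site.

```python
# Assumed set of fields a search result needs; adjust per site.
REQUIRED_FIELDS = ("title", "url", "description")


def field_inventory(items, required=REQUIRED_FIELDS):
    """Map each required field to the ids of items that lack it or leave it blank."""
    missing = {field: [] for field in required}
    for item in items:
        for field in required:
            value = item.get(field)
            if value is None or not str(value).strip():
                missing[field].append(item["id"])
    return missing
```

An empty list for every field suggests the content is ready to index; any ids that show up are items to fix, or fields to map differently, before indexing.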

Karan  

No, that’s about it. You should have a SearchStax app, and then you should be able to connect to that app. That’s one thing that we usually run into, where the customer’s own networking firewall rules just don’t allow them to connect to us. So we end up spending like a week or so to just iron that out – so just make sure that you have an app and that you can connect to it. And then just go to our documentation page, and it lists out all the prerequisites that you need to install, and then all the steps that you need to follow. And then as Pete mentioned, you just need to be aware of what fields you will be selecting and indexing for your data ingestion.
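
A quick way to catch the firewall problem early is to test, from the CMS server itself, whether an outbound TCP connection to the app's host succeeds. The Python sketch below uses only the standard library. The function name is made up, and the host you pass in is whatever endpoint your own app uses; port 443 is assumed because the APIs are reached over HTTPS.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

A `False` result from the CMS server, when the same check succeeds from a laptop, points at outbound firewall rules rather than at the connector.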

Pete  

One other consideration I just thought of as Karan was talking was also about the type of login. So you have a login into the Studio dashboard – you have the ability to have a two factor authentication for that login. Generally speaking, our older versions of the – at least for the Sitecore connector, I don’t know about Drupal, specifically – but I know for the Sitecore one, the older versions didn’t support the two factor authentication in order to get access to the app for configuration purposes. I believe version 2.0, which I don’t know when this is gonna get released, but – or sorry, I don’t know when this podcast is gonna get released. But that’s for version 2.0. We do have plans to have the multifactor authentication available within the configuration for the Sitecore connector. So if you run into issues where maybe you can’t get access to the connector, maybe try disabling two factor authentication, if you have that enabled, in order to move forward with that process. But otherwise, that’s the only other thing I can think of.

Jae 

Great, those are all really important. I appreciate you guys covering that. I guess I was gonna ask what else can people read or watch to get smarter on how to use our connectors? We have a series of other data ingest podcast episodes, but other than documentation, we have some blogs, I believe. 

Well, thanks so much, guys. This was really, really helpful. Next time we’ll do part three. So thanks again. And thanks, everyone, for listening. Stay tuned for the next one.

SearchStax helps companies create exceptional search experiences, managing Solr infrastructure on the backend via SearchStax Cloud and site search on the front end with SearchStax Studio.