Federated Search

Pete Navarra, VP of Onboarding (and Sitecore MVP) discusses federated search and the benefits it can bring in refining your site search experience.

SearchStax helps companies create exceptional search experiences, managing Solr infrastructure on the back end with SearchStax Managed Search and site search on the front end with SearchStax Site Search.

 Jae 

Okay, so let’s start, what is the general industry definition of federated search? And how can that help? 

Pete 

It’s an interesting question, whether or not there is an industry-defined definition of federated search. I think that there are a lot of different definitions that people use interchangeably, but I think largely our industry looks at federated search as this concept where I have all this data coming in from different sources – I might have six different systems, I might have two different kinds of management systems, I might have a database, I might have a file system. When we think about federated search in enterprises and large companies, it’s really this idea that there are all these different kinds of discrete areas that contain content and data.

So if I was to say, “Hey, I’m looking for something on our website, but I also need to check our Google Drive” as an example, that would require me to basically do two different searches. I have to do a search on our website, and I have to do a search on Google Drive. And in that sense, I have now performed a query. So if I was to create some  automated search system or service that performed those two queries, and that gave me a single result set, that’s federated search. And so you think about it from  a number of different systems. And, quite frankly, and largely, that’s where Coveo got their start. In fact, I was around when Coveo started, and they had an appliance type of federated search which would go out and crawl local files, hard drives, SharePoint, any connector you could think of. They would be able to crawl and then pull in all these different types of content into an index and/or multiple indexes. And so they would have each index for each federated search. When they would do queries, they would run queries against all of their indexes and  create a unified search. And that was actually Coveo’s start. They targeted the enterprise. We actually had –  where I was previously working – a Coveo Enterprise Application Server in our office, and they had a little Windows desktop widget, where I could search for something that I was looking for, and it would scan all of our file drives. It would scan my email, it would scan the website, and I could find content of what I was looking for based on that.

And so they’ve evolved into obviously now this big search company, but when I think of federated search, that’s what I think of. And it’s really this idea that there’s all these different kinds of bespoke systems that have data, and we want to be able to collect that information.

 Jae 

And in the case that you just described, was that a workplace search, like an internal employee search? Ah, okay. So what’s interesting is that even before you talk about Coveo, you touched on the methods that are used there. Because one was that there are multiple searches going on across different datasets – or data stores, I should say – but then there’s also this concept that you’re unifying the data store. So the question I have is, why did they even call it federated from the beginning, versus something like unified, right? Because ultimately, if you think about federated, there are two approaches, like you just mentioned. You could either do it in terms of the data, or you can do it just from an interface standpoint, where it’s the search itself, right? So do you have any context on that?

 Pete 

I can make some assumptions, right? Again, this was 15 years ago, maybe. I mean, it’s been a while now. And so this is also around the time when things like Active Directory Federation Services started using the word “federation” – hey, I’m taking something at an enterprise level, and I’m able to basically share out this function or data that I’m using into other parts. And so we’re going to start to call that a federation. But when we start thinking about the term federated search, I think that really came from, whether it was a marketing lens, or it was just easy for people to understand, the idea that if we use the term federation, we’re really talking about multiple systems interacting with each other. And I largely think that’s where the term “federated search” comes from.

And that’s why, especially in our industry, it is, for lack of a better word, the household name that we use when we talk about consolidated or unified search in general, especially when you start adding crawlers. But there is a distinct difference. And I think, when we think about unified search versus federated search – and I mean, Sameer laughs every time about this – but there is a difference. And the biggest difference under a federated search is that you’re not really unifying a single index, you’re crafting a query mechanism that is querying multiple data points or multiple sources, right, and then cataloging that into a single response. That’s true federated. Whereas in a unified search result, we’re still looking at multiple systems, but we’re doing it more from an ingestion perspective. So we may have crawlers where we go out and say, “Hey, I’m going to crawl this system, I’m going to crawl that system.” And we call them crawlers, because that’s the household name. I mean, at the end of the day, we’re just looping through some API or scanning, scraping a web page or following links. There’s a number of different things that we can do that we would call a crawler. And we’re taking all that in and creating a unified object of data that is then stored into a single index. So now we’re doing this translation process where the data structures are completely different from the things that we’re reading. But now we’re basically creating an interface – we’re saying, “Okay, we’re going to take this structure, we’re going to mold it into the structure that we want, here’s all of our fields. And now we’re going to store it into a single index.” That is a unified search, where our search queries only have to hit one index, and the content all resides in one catalog.
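
As an illustration of the unified approach Pete describes, the following is a minimal sketch and not SearchStax's implementation. It assumes two hypothetical sources (a CMS and a file store) whose record shapes, field names, and the `Doc` schema are all invented for the example. Each source gets a small translation function that molds its records into one common structure, and everything lands in a single toy index that is queried once.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Common schema that every source is molded into (field names are illustrative)."""
    id: str
    title: str
    body: str
    source: str


def from_cms(page: dict) -> Doc:
    # Hypothetical CMS record shape: {"item_id", "headline", "html_text"}
    return Doc(id=f"cms-{page['item_id']}", title=page["headline"],
               body=page["html_text"], source="cms")


def from_drive(f: dict) -> Doc:
    # Hypothetical file-store record shape: {"path", "name", "text"}
    return Doc(id=f"drive-{f['path']}", title=f["name"],
               body=f["text"], source="drive")


class UnifiedIndex:
    """A toy single index: one catalog, one query path."""

    def __init__(self) -> None:
        self._docs: dict[str, Doc] = {}

    def add(self, doc: Doc) -> None:
        self._docs[doc.id] = doc

    def search(self, term: str) -> list[Doc]:
        term = term.lower()
        # Naive score: occurrences in the title count double those in the body.
        scored = [
            (2 * d.title.lower().count(term) + d.body.lower().count(term), d)
            for d in self._docs.values()
        ]
        hits = [(s, d) for s, d in scored if s > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [d for _, d in hits]


index = UnifiedIndex()
index.add(from_cms({"item_id": 1, "headline": "Pricing overview", "html_text": "Our pricing tiers"}))
index.add(from_drive({"path": "/contracts/a.pdf", "name": "Contract A", "text": "pricing terms and renewal"}))

# One query against one index returns blended results from both sources.
results = index.search("pricing")
```

Because both sources share one schema and one scoring function, the ranking is comparable across sources, which is the property the unified approach relies on.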

 Jae 

Right, I’ve heard the term query federation. So what you’re describing there, this next question was about,  why do we see so many different perspectives out there on federated search? And I think you already answered it, basically, in that people call it different things. They have their own approach. Is this just marketing differences? Are there technical differences, too? And what you’re describing is that in terms of the actual definition – of the accepted definition – your example of authentication was a perfect example, because people needed to do that within a very specific technical framework because of compliance reasons and other requirements. And so, it sounds like federated search grew up with having some of those similar types of, again, the underlying mechanism had dictated whether or not it was actually federated. We’ve now reached the point where some of these use cases, it’s really more about just  unifying the experience, right? It has a lot less to do with the architectural requirements here.

 Pete 

Yeah, and largely there’s two primary user bases that this industry typically targets. We target our technical developers, but we also target our non-technical – or maybe technically not as savvy digital marketers – out there, but we target both. And I think, from a perspective of not having to retrain, I think it’s easier to say “federated search”, because there’s already this like, baked in idea that – okay, I understand federated search, I’m just looking at multiple sites. And so then,  at the end of the day, the effect is the same.

 Jae 

Then what is the real problem that customers are trying to solve? You touched on it, but I’d like to really hone in on that and answer it in a more complete way. Because what I’m trying to get to is – and I think we already said this, too – the question is, is there one size of federated search that fits everything? Or are there very specific sub-requirements that need different search functionality? And platform interoperability – again, speaking about all these different types of data stores – what’s the real problem, first, that they’re really trying to solve? Because my assumption is that the pure form of federated search still has its need out there, right? But in general, when people come and say they need federated, they may actually need something different, like unified.

 Pete

So, to answer the first question you asked there – when it comes to federated search, when it comes to a federation, when it comes to an enterprise, the sky’s the limit, almost, on what you could potentially need from business requirements. I mean, I need to search a ledger for an AS/400 site, and at the same time, I need to be able to pull Wiki pages off Wikipedia. There could be any number of use cases there.

The challenge that most of the marketers are really trying to solve is two-fold. And I think they’re trying to solve it in one action, but they’re trying to solve two problems. And the first one is, I want to make content easy to find and accessible for my users. Oh, and by the way, that content lives in 13 systems. So it’s hard for a marketer to think “How am I going to get all of these pieces of content from different systems into a place where a user can easily use their system and find the content they’re looking for?” That’s, to me, the number one challenge I think people are facing when they start thinking about federated search or unified search.

And I do think that there’s a limit to what unified search can do. It’s largely around the number of systems and just how different your data is. If, as an example, let’s say we have a website that’s a blog post aggregator or a stock exchange. So we’re going to go out, we’re going to have a bunch of crawlers, we can grab the title, we can grab the author, we can grab the content title, and even if those systems represent the data a little bit differently – a blog post is a blog post. We know it’s going to have some headers, we know it’s going to have a heading, we know it’s going to have a byline, it’s going to have content, might even have a picture. So we can craft crawlers, they’re going to go through and set that up. Oh, and by the way, we also want to return stock quotes. And, you know, it’s different data. Google solves this really easily. Google does it, but they scan everything. Google’s the ultimate example of a unified index. And most companies, most people, most marketers that are looking for search stuff for their own website? I don’t think they’re looking to create Google. Now, design is a completely different perspective. A lot of people just want the design to mimic the simple ease of use of Google and just give me the search bar and go, but that’s a whole other topic.

 Jae 

So what I’m hearing is that, like you said, there are a million use cases. But at the core of the problem is really heterogeneous datasets, data stores everywhere. And I really want to just simplify that search experience. But then you touched on something that I would like to drill into there – is that there is still a need for federated search in some cases, right? So is that industry specific? Or, are there specific industries that need federated more than others? And then what about at the use case level? Do we see, for example, certain custom maps that might be federated, or maybe like, from a compliance standpoint, or privacy security standpoint – or is it just more like a fundamental site search requirement?

 Pete 

I think you’ve  asked a couple of interesting questions there. And I do think that – I’m actually going to take the last thing you mentioned, because compliance is the big piece, when we started looking at things like GDPR. Where the data resides has actually been very important. And if we start unifying information and putting data into some other location, that is still held to the GDPR standard. 

And so if we’re putting personal information – which honestly, I don’t think you ever should – into a unified search structure, we need to understand that that is now under the compliance rules of GDPR, maybe potentially the California Consumer Privacy Act – there are all kinds of regulations now. And as time moves on, those regulations are getting even more stringent. So in those cases, the risk that the company might take on to have that unified index might be greater than the need for, or the ease of use of, a unified index. And in that regard, you would want to have some federation where “my query is reaching into this other system that is protected by GDPR, in order to return the results, and no, I’m not storing that information anywhere else.”

So I think compliance does play a role there. Industry-wise, maybe some industries need it more than others. But I don’t think it’s necessarily industry-specific. I would say, more to the point, I would say it’s use case-specific. More importantly, I would say what is the data? What is the content and where does that content live? And that’s going to drive a lot of your business decisions around your use cases, depending on what kind of data you’re working with.

 Jae 

And is it fair to say then that, generally speaking, internal use –  like the workplace search type of use case –  is probably going to need a little bit more around that compliance aspect, because you’re dealing with customer data, PII potentially,  other things like that. Versus if you’re just like looking at a corporate website or like a standard informational digital experience that isn’t rooted in e-commerce. In general, it’s not going to be a big problem, right? I mean, is that fair to say?

 Pete 

Not just compliance, Jae. Governance, too. I may not want to show everybody the search results, I may have privileged people who can only see certain search results in that workplace environment. 

So the governance standards also play a role there. Because if we’re now talking about having to gate site search results based on that, that adds complexity to the use case and complexity into the solution. But, largely, especially when we’re talking about the type of customers that are using our products – where SearchStax  really fits in is data that’s  already publicly available. We’re just crawling public websites – you know, I’m not going to name names, but we have a customer currently right now for whom we just finished implementation. They’re storing every single one of their contracts into a digital asset management tool. And we’re scanning all those contracts and making it available for anybody to search against. 

So again, it’s a great use for unified search because we can distill the different parts of the search query result that we’re looking for from different parts of the contract and still be able to merge that with website content.
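
The result gating Pete mentions for workplace search can be pictured as a filter applied to every result list before it is shown. This is only a sketch under assumed names: the `Hit` type, the `allowed_groups` field, and the group labels are hypothetical, and real systems usually enforce this inside the search engine rather than after the fact.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    title: str
    # Groups allowed to see this result; an empty set means public. (Illustrative model.)
    allowed_groups: set[str] = field(default_factory=set)


def gate_results(hits: list[Hit], user_groups: set[str]) -> list[Hit]:
    """Keep only hits that are public or shared with at least one of the user's groups."""
    return [h for h in hits if not h.allowed_groups or h.allowed_groups & user_groups]


hits = [
    Hit("Holiday schedule"),
    Hit("Salary bands", {"hr"}),
    Hit("Board minutes", {"executives"}),
]

# A user in the "hr" group sees public results plus HR-only ones.
visible = gate_results(hits, {"hr"})
```

For purely public website content, every hit would carry an empty `allowed_groups` set and the filter becomes a no-op, which matches the point that public-facing search avoids this complexity.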

 Jae 

What I’m hearing is that again, especially the markets that we service, the federated requirement is not really an issue most of the time, like you said.  It’s publicly available information. 

Pete

And honestly, if it is a requirement, maybe our tool’s not the best.

Jae

Exactly. Let’s switch context for a second here into skill sets and talk about some of the organizational skill sets that these teams are going to need in order to satisfy this particular requirement. 

Is it going to be different from what people are doing today? In other words, do you need a new set of skills in order to solve this unified search and/or federated? Or is it going to be more like what you were mentioning before – as long as you sort of understand the methodology, then you pretty much have what you need, right?  I’m asking this question  to  tease out whether or not this is something that people should view as a pretty big barrier / challenge, or as something that’s pretty straightforward.

 Pete 

I would say that when it comes down to site search results – and I’m speaking more from my own experience here,  I spent 15 years in the consulting world where we’re having to create solutions, like full solutions, for potential clients for their entire website. So in those cases, the majority of site search becomes almost like –  and I hate to say it – but almost like the redheaded stepchild. Because what happens is that the focus is on the homepage, the focus is on the landing pages, the focus is on the article pages, and then three months into design planning, the developer goes, “what about the site search results page?”

And then a designer quickly stubs out a search results page that looks like Google custom search results. 

At the end of the day, I think it’s time – potentially people just don’t think about site search results as a first class citizen to their website, they think of them as like a second or third class citizen. So because of that, when it comes down to time and budget, you’re suddenly out of both. 

Or when it is thought of, there’s not really a strategy around how you do site search. A lot of developers look at it and approach the search results problem as, “I have to reinvent the wheel.” I mean, Sitecore did this, right? They created a custom search results API that was a wrapper around Solr, basically. They didn’t include any starter kits or GUIs or anything like that. It’s just a .NET API. And developers use that to create results pages. And so they were doing this handcrafted, because that’s largely what you do in Sitecore: you handcraft your views, you handcraft your UI, to use the APIs Sitecore has provided.

So to a Sitecore developer in the Sitecore world, that is the standard. Like you don’t even bat an eye when somebody says, “I’m just going to recreate a search UI.” You just go off and create a search UI – it must have some results, it’s going to be a repeating element, there’s probably going to be some filtering on the left side or maybe on the top or maybe on the right. I don’t know, it depends on what my designer wants to do. But I’m going to go ahead and hand it off. It is furthest from their imagination that, “Oh, there’s a tool out there that I can just drop in.” And maybe “drop” is too watered down – it does require some implementation – but we can drop our accelerator into most applications and have a search experience up in a short time. And then the developer doesn’t have to think about the design, the developer doesn’t have to think about reinventing the wheel every time, and then we get rid of the search API, which, frankly, is not my favorite.

 Jae 

I see your point, though, that in other words, it’s like it took you so far, but really, the bulk of the heavy lift was still very much there. And that’s really arcane work.

Pete 

Oh, yeah. And the reason why is that it’s an education gap. Most developers aren’t aware that there’s another way to do it, because we’re so trained to just do it ourselves. Do it DIY, basically. We either borrow from the project we did last time, or we reinvent it. 

When you start thinking about, like, consulting companies, that’s the name of the game. Whatever wheel you invented in the last project, reuse that wheel, because they don’t ever differ from it. So they just keep reusing that wheel, even though there’s probably a better wheel out there.

 Jae 

Yep. Yeah, of course. Repurpose what you have, right? 

 Pete 

And so educating those people, especially those in the leadership of the devs, and leadership on the tech side, to say, hey, there is a different wheel, we can show you a better wheel, is key.

 Jae 

What advice would you give to somebody who’s just  starting down this path for the first time in terms of having to deal with the problem statement from before, if you have lots of different data everywhere? What pitfalls can they avoid when they’re just thinking about building either federated or unified search, however you want to look at it? When they’re trying to do that search capability to solve that particular problem that we mentioned before in their environment, what advice would you give?

 Pete 

The biggest piece of advice I would give is don’t boil the ocean in the first month. Figure out the minimum viable product, do one or two external websites – maybe even start by just getting your content management system, whether you’re using Sitecore, Drupal, Adobe, whatever it is, maybe just start by getting that into a better search capability, and then add in other sites.

If you try to do all-encompassing – which I mean, you can, with proper planning and stuff like that – it’s really just going to lengthen your adoption time, which is going to add complexity to the overall build. And you might start seeing yourself walk away from the accelerator, which then starts to invite custom development efforts. Which is doable, and we have an API system that provides that. And I can tell you one other thing I forgot to mention about the developer, because it just popped into my head. In those scenarios – where we’re either reusing the wheel or reinventing the wheel, especially around the custom search API from Sitecore – we might still be building custom UIs with SearchStax Site Search, but there’s not a single application out there that a developer has done against Sitecore that provides the analytics, the dashboard usage, the ability for boosting… In fact, a very common ticket for developers is for marketers to send a ticket to development to say, “Please boost a field.” And that’s a code effort. You attach it to the backlog, apply it to a sprint, you’re out two weeks for that development sprint to mature, then you’re out for another two weeks for the release cycle. And so maybe a month later you get it working. That’s a common practice, and that’s an accepted common practice.
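
To make the “please boost a field” ticket concrete: in a Solr-backed search, field boosts are commonly expressed through the eDisMax query parser's `qf` parameter, written as `field^weight`. The sketch below assumes that setup; the field names and weights are hypothetical, and it does not describe how SearchStax Site Search or Sitecore handle boosting internally.

```python
from urllib.parse import urlencode


def build_solr_params(query: str, boosts: dict[str, float]) -> dict[str, str]:
    """Build eDisMax query parameters from a boost table.

    `boosts` maps field name -> weight; the field names used below are hypothetical.
    """
    qf = " ".join(f"{field}^{weight:g}" for field, weight in boosts.items())
    return {"q": query, "defType": "edismax", "qf": qf}


# In a hand-built integration this table often lives in code, so changing it means a release.
# Kept as data (for example, loaded from settings a marketer can edit), it becomes a config change.
boosts = {"title": 5, "summary": 2, "body": 1}
params = build_solr_params("federated search", boosts)
query_string = urlencode(params)
```

The month-long cycle Pete describes comes from the first situation, where the weights are hard-coded and every adjustment rides a sprint and a release.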

Jae 

Right. Like you said earlier, we’ve been trained to accept these timelines. So can you give a couple of examples – and maybe I’ll even cover the previous question too, which was about skillsets again – what I’m trying to get is a sense of a couple of real world examples. You mentioned a customer that we’ve been working with more recently. Can you just paint the picture from a real world standpoint where you comment on both the skillsets needed as well as this advice of making sure you don’t boil the ocean, as you’re going through it? And maybe, without calling out the customer name, if you can just give an example of where that worked well, because they took a more stepwise approach to it. You gave a very conceptual version of that, but I’d love to hear a more concrete example.

Pete 

From a concrete perspective, there are two things we’re looking at here. We’re looking at the developer persona and we’re looking at the marketer persona, and the marketer persona can be technical or non-technical, it doesn’t matter.

So from a marketer’s perspective, the skillset that’s really needed is they just need to understand their business use cases and they need to understand why site search is important. They need to understand why relevance is important, and really have a clear understanding that this is what their users are searching for, and why they’re searching for it. That’s the responsibility of the marketer. 

On the developer side, the competency is going to lie in a couple of different spots. One, know your CMS. Obviously, you’re going to be integrated into a CMS, you’re going to have to know your CMS. But more importantly, API development is becoming more and more of a mainstream thing. We think about “headless” – and I’m using “headless” just in the broad sense, right – it’s all API-driven. And our SearchStax Site Search product is definitely, at the end of the day, an API-first application. So being able to integrate front end technology – whether it’s JavaScript, or React, or Vue or any of these others – is an important concept. We even provide some helper modules. Sitecore, as an example, has SXA, which is the Sitecore Experience Accelerator. We have a drop-in module for SXA that exposes our accelerator and gets a Sitecore experience going even faster. But the dev is still going to need to know Sitecore.

So understand that you need to know your technologies – that while we have an accelerator, you’re still going to have to implement it, there’s still going to be an implementation timeframe on that. And use cases and customizations are only going to complicate that, and I think that’s okay. Because if that’s what you’re signing up for, if you’re like, “Hey, we want the promise of SearchStax Site Search because of what the backend part of it provides, which is the relevance, the boosting, the promotions, all the features in SearchStax Site Search that we actually love – because those are the pieces that nobody really does” –  you can be okay with custom data, you can be okay with some implementation. 

So I think that’s the expectation that just needs to be said – “No, you’re going to have to have somebody that knows your frontend, knows the content management systems, and has patience,” at the end of the day. I mean, that’s with any development effort. A good example of where we had a customer – I would say, not boil the ocean, exactly. But this is a good example where it was like “Take some baby steps, get to know the product a little.” We did about a month or two of proof-of-concept with them where they’re going to have to federate. They have about 50 websites, and they want to be able to show all of the content from all of these different bespoke websites in a single search experience. And we looked at it from an aspect of “Alright, well, why don’t we just take two of those sites? Let’s just take two of those sites, and see how quickly you’re able to implement those two and get a search result experience that you desire.” And that really helped them focus in on the API, focus in on how to actually build the results, and kept the project away from distractions such as, “What are these other 49 different websites that we’re going to have to integrate with?”

That has opened it up, because now that we did two sites, they’re like, “Oh, now we see how we can do the other 50.” And that fear of the size of the Goliath, basically, is gone. And they now understand that this is really simple to do, we just  have to replicate it, so now I think they’re on a pathway to big success.

 Jae 

Oh, that’s awesome. So again, that sounds like you provided a lot of initial guidance on the planning itself. In other words, like you said, not boiling the ocean, making sure that there’s a logical path to progress in place, and how to get to that point.

What was some of that specific guidance that you gave? And this question is really the last one, but it’s more for people who are looking at SearchStax – is there any guidance that you can give them that helps them determine quickly whether or not we will be a good solution for them to explore when it comes to federated search? Maybe if you could expound a little bit on what you just mentioned, with that current customer? What were some of the discussions that were had early on to determine whether there was a good fit? And then maybe you can just also comment generically about that for everybody?

 Pete 

Well, I think what was really interesting, especially for this particular customer, is that they approached us initially with a “boil the ocean” mentality. They had probably 35 different requirements. Everything from AI and natural language processing, to 37 different languages, to all kinds of different requirements. And to be quite honest, for some of that stuff, our product is not there yet. For example, we don’t have any natural language processing in our product, we don’t have a lot of AI built into our product. So, in those cases, we have to be just very transparent and honest and say, “these are the features of our site search.” And so for somebody else wanting to say, “Hey, these are the features that matter to me,” what are the features that are really requirements, and what are your features that are nice to have? If your nice-to-have is natural language processing, well, we’re going to continue to evolve our product. I don’t see a world where our SearchStax Site Search product doesn’t come into natural language processing down the line. There’s going to be product evolution. But if that is your core requirement, and you must have this on day one, we’re not going to be able to help you.

It’s really understanding what are your requirements, and what really are your day one requirements? If your day one requirements are, “I just need three websites” – or  maybe the number of websites doesn’t even count, maybe you just need content for multiple websites to show up under five different languages.

 Jae 

Was there anything specific to the federated aspect, the federated/unified aspect of that conversation that was had with the customer in terms of being a good fit? Because you mentioned some of these other features that, obviously, as you work through it, identified as more nice to have – but on the federated side, were there any  distinctions between must have, ways of doing federated versus the path that we take, which is more unified?

 Pete 

No, in fact, I mean once we described that the commonplace use of the term federation is really unified indexing, they were fine. Because at the end of the day, and I think, largely, this is the important takeaway nugget, right? At the end of the day, it doesn’t matter if it’s federated or unified. At the end of the day, I’m able to execute a search query, and I get results that are fully all-encompassing of multiple sites. If that happens, what happens under the covers, generally the marketer is not going to care about.

 Jae 

Right, except in those very few cases that we mentioned earlier with compliance and governance, those kinds of things, but for the bulk of these public-facing websites, it’s not going to be an issue. That’s great. I really appreciate this discussion. I think, like you said, that’s a key takeaway: understand your requirements well. We covered everything else like skillsets, even, what you need to be focused on and aware of as both a digital marketer, and as a developer. And we also talked about the whole concept of what federated really means, and how people like – I like the term you used, this is just a regular sort of use for federated. And then, we also identified that it’s probably going to be more use case-specific than industry-specific. So it sounds like what we’re really saying is, at the end of the day, take a logical stepwise approach, be realistic about it, we can help you, we have both the product as well as the internal consultative knowledge to help people get started. This was super useful and helpful, hopefully, for a lot of people out there. Any other closing comments?

 Pete 

Don’t talk to strangers. [laughter] No, I’m kidding. No, I mean, this has been a good discussion. And it’s been great to pick apart and dive into some of these areas. We don’t talk enough about the differences between unified and federated, because I do think people largely just understand that they’re going to be able to execute a query and get results. But there are particulars under the covers, and I think just making sure that you really understand your use cases – what are your must-have requirements – and then we can work through that and we can do a proof of concept, we can do a trial, we can show you how SearchStax Site Search really can work with those use cases. And then we’ll also be up-front and say, hey, this might not be a match.

 Jae

Yeah, exactly. This is a great deep dive into some of these topics. Thanks so much again, Pete. 

FAQs for Federated Search

What is Federated Search?

Federated Search involves querying multiple data sources independently and then merging the results into a single result set. In this approach, each data source is searched separately, and the results are then combined and presented to the user. 

Federated search can be useful when searching a diverse set of data sources, each with its own unique search capabilities and user interfaces. On the other hand, Federated Search can result in slower search performance and can be difficult to implement if there are significant differences in how the data is stored or represented in the various data sources.
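
A minimal sketch of the federated approach, under stated assumptions: the two source functions below are hypothetical stand-ins for real systems (a website search and a file store), each returning its own ranked list. The query fans out to every source at request time, and the per-source results are merged into one list. Because scores from different systems are generally not comparable, this sketch interleaves results round-robin instead of re-ranking them.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest
from typing import Callable

# Each source takes a query and returns its own ranked list of titles.
Source = Callable[[str], list[str]]


def search_website(query: str) -> list[str]:
    # Hypothetical website search backed by a fixed list of page titles.
    pages = ["Search pricing", "About us", "Search docs"]
    return [p for p in pages if query.lower() in p.lower()]


def search_drive(query: str) -> list[str]:
    # Hypothetical file-store search backed by a fixed list of file names.
    files = ["search-roadmap.docx", "budget.xlsx"]
    return [f for f in files if query.lower() in f.lower()]


def federated_search(query: str, sources: dict[str, Source]) -> list[tuple[str, str]]:
    """Query every source at request time and merge into one result list."""
    names = list(sources)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # One query per source, issued in parallel; nothing is copied into a shared index.
        per_source = list(pool.map(lambda n: sources[n](query), names))

    # Interleave round-robin: first hit from each source, then second, and so on.
    merged: list[tuple[str, str]] = []
    for row in zip_longest(*per_source):
        for name, hit in zip(names, row):
            if hit is not None:
                merged.append((name, hit))
    return merged


results = federated_search("search", {"website": search_website, "drive": search_drive})
```

The overall response time is bounded by the slowest source, which is one reason federated search tends to be slower than querying a single index.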

What is Unified Search?

Unified Search is a bit different from Federated Search in that it involves issuing a single search query against a consolidated or unified data source. 

The benefits of the Unified Search approach are generally faster search, better relevance and a better search experience. Since Unified Search queries only a single index, performance is better than searching the multiple systems that Federated Search requires. With Unified Search, search relevance can be applied across the entire unified data set. This means that the results are always ranked by relevance and do not depend on the relevance scoring of each data source. Finally, Unified Search produces a better search experience, since search results are blended from all sources, and facets, filtering, sorting and other key search UX features are built on top of the unified index.

What is SearchStax’s Approach to Consolidated Search?

SearchStax typically employs a variation on the Unified Search approach and we have seen great success with customers who initially came to us with the federated search requirement. In this approach, we use what is known as a Merged Solr Index – data from each source is normalized to a common data model and then stored in a single index. This supports the customer’s actual use case better in most of the situations we’ve seen, since we can search from a combined single index, rather than needing to dynamically search each source.