Jul. 18, 2018

Karan Jeet Singh

|

3 min. read

Importing Data from PostgreSQL into Solr is part 2 of our 3-part series about importing data from our Solutions Engineer, Karan Singh. Part 1 covered Importing Data into Apache Solr.  

What You Need To Import Data From PostgreSQL

Importing data from PostgreSQL into Solr is quite straightforward and we can use Data Import Handler to do it for us.

PostgreSQL supports JDBC, so configuring Data Import Handler only requires following:

    1. Define a new request handler in the solrconfig.xml file.
    2. Define data source
    3. Define schema
    4. Place JDBC drivers in the Solr.

Set Up Data Import Handler for PostgreSQL

Here are the steps to set up the Data Import Handler for PostgreSQL:

  1. Update solrconfig.xml to create the “dataimport” request handler, and enable Solr to read custom JARs posted in contrib/dataimporthandler/lib folder.

            Add following tags under the <config> tag so that Solr picks up the Custom JARs

            <lib dir=”${solr.install.dir:../../../..}/contrib/dataimporthandler/lib” regex=”.*\.jar” />
            <lib dir=”${solr.install.dir:../../../..}/dist/” regex=”solr-dataimporthandler-.*\.jar” />

            Define /dataimport request handler

            <!– A request handler for data import handler –>
            <requestHandler name=”/dataimport”
            class=”org.apache.solr.handler.dataimport.DataImportHandler”>
            <lst name=”defaults”>
            <str name=”config”>data-config.xml</str>
            </lst>
            </requestHandler>

2. Create data-config.xml file in the same folder as solrconfig.xml, this file will contain the connection string, the query to get data, and the details about incoming fields.

            <dataConfig>
           <dataSource type=”JdbcDataSource” name=”imdb-title-rating” driver=”org.postgresql.Driver” url=”jdbc:postgresql://127.0.0.1:5432/imdb” user=”postgrestest” password=”test” />
           <document name=”title_rating”>
           <entity name=”rating” query=”SELECT * FROM title_ratings;”>
           <field column=”tconst” name=”tconst” />
           <field column=”averagerating” name=”averageRating” />
           <field column=”numvotes” name=”numVotes” />
           </entity>
           </document>
           </dataConfig> 

3. Edit managed-schema file, or if it doesn’t exist, then create schema.xml file. The incoming fields defined in Step 2 need to be made recognizable by Solr. Every incoming field needs to be map it to the Solr recognizable datatype, so that it can parse the data.

When editing managed-schema file, then add the fields defined above –

<field name=”tconst” type=”string” indexed=”true” stored=”true” />
<field name=”averageRating” type=”tint” indexed=”true” stored=”true” />
<field name=”numVotes” type=”int” indexed=”true” stored=”true” />
<field name=”id” type=”string” indexed=”true” stored=”true” multiValued=”false” />

4.   Download and copy jdbc driver jar for postgres to “contrib/dataimporthandler/lib” folder

          https://jdbc.postgresql.org/download.htm

          https://jdbc.postgresql.org/download/postgresql-42.2.2.jar

Simplify Your Other Solr-Related Tasks

SearchStax engineers have helped our clients import data from different sources into Apache Solr. If you’re interested in learning more about our hosted Solr solution and support, please reach out.

Using DIH requires following important configuration items:

Simplify Your Other Solr-Related Tasks

SearchStax engineers have helped our clients import data from different sources into Apache Solr. If you’re interested in learning more about our hosted Solr solution and support, please reach out.

By Karan Jeet Singh

Solutions Engineer

“…search should not only be for those organizations with massive search budgets.”

Get the Latest Content First