How to implement complex full-text search with Hibernate Search

How to implement complex full-text search with Hibernate Search

This is the second part of the Full-Text Search with Hibernate Search series. In the first part, I showed you how to add Hibernate Search to your project and to perform a very basic full-text query which returned all entities which contained a set of words. This query already returned a much better result than the typical SQL or JPQL query with a WHERE messageLIKE :searchTerm clause. But Hibernate Search can do a lot more.

But you can do a lot more than that with Hibernate Search. It provides you an easy way to use Lucene’s analyzers to process the indexed Strings and also find texts that use different word forms or even synonyms of your search terms.

Let’s have a quick look at the general structure of an analyzer before I show you how to create one with Hibernate Search. It consists of 3 phases, and each of them can perform multiple steps. The CharFilter adds, removes or replaces certain characters. That is often used to normalize special characters like ñ or ß. The Tokenizer splits the text into multiple words. The Filter adds, removes or replaces specific tokens.

The separation in 3 phases and multiple steps allows you to create very complex analyzers based on a set of small, reusable components. I will use it in this post to extend the example from the previous post so that I get the same results when I search for “validate Hibernate”, “Hibernate validation” and “HIBERNATE VALIDATION”.

That requires the search to handle words in upper and lower case in the same way and to recognize that “validate” and “validation” are two different forms of the same word. The first part is simple and you could achieve that in a simple SQL query. But the second one is something you can’t do easily in SQL. It is a common full-text search requirement which you can achieve with a technique called stemming. It reduces the words in the index and in the search query to its basic form.

OK, let’s define an analyzer that ignores the case upper and lower case and that uses stemming.

As you can see in the following code snippet, you can do that with an @AnalyzerDef annotation, and it’s not too complicated.

The analyzer definition is global and you can reference it by its name. So, better make sure to use an expressive name that you can easily remember. I choose the name textanalyzer in this example because I define a generic analyzer for text messages. It’s a good fit for most simple text attributes.

This example doesn’t require any character normalization or any other form of character filtering. The analyzer, therefore, doesn’t need any CharFilter.

Share it:
Share it:

[Social9_Share class=”s9-widget-wrapper”]

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

You Might Be Interested In

Why Healthcare is Behind the Data Curve

6 Sep, 2017

The healthcare sector has been trailing industries such as banking and retail when it comes to adoption of data analytics, …

Read more

Cognition and the future of marketing

5 Oct, 2016

Many of us remember that night in 2011 when Jeopardy! all-stars Brad Rutter and Ken Jennings met their fate against …

Read more

Big data for small biz is leveling the playing field

6 Mar, 2016

The cycles that accompany advances in computing are fairly predictable. Technology starts off in a lab setting understood by only …

Read more

Do You Want to Share Your Story?

Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.

Get the 3 STEPS

To Drive Analytics Adoption
And manage change

3-steps-to-drive-analytics-adoption

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.