Open Source Document Management System | OpenKM - Difference performing a full text search in Lucene 2.4 and Lucene 3.1

Difference performing a full text search in Lucene 2.4 and Lucene 3.1

#54718 by alan_vallejo
Thu May 23, 2024 7:18 am

I know this is more a question for the Lucene developers, but I'm posting it here in case you can guide or help me. I have also posted this message on Stack Overflow (https://stackoverflow.com/questions/785 ... lucene-3-1).

Our current Java application works with Lucene 2.4, and we are trying to migrate the file indexing and storage part to the OpenKM Community version, which works with Lucene 3.1 by default.

I'm facing some issues when performing the same search with two different versions of Lucene (2.4 and 3.1) against the same index. I think the problem lies in how the StandardAnalyzer class, which I'm using in both versions, has evolved.

Text to search: "Company, S.A."

LUCENE 2.4:

ParseQuery result: text:"company sa" N results.

LUCENE 3.1:

ParseQuery result: text:"company s.a" 0 results (expected the same results as in version 2.4)

The odd thing is that the search returns the results I expect with Lucene 2.4, but not with Lucene 3.1.

I've read up on how phrase search works in Lucene, and I've learned that when Lucene indexes a document it stores both the terms that belong to the document and their positions in it. So I can understand that the analyzer has changed in version 3.1 and that terms are extracted differently, but once the terms are extracted, the phrase matching should work the same way!

Another thing I don't understand is that when I perform a similar search (with the dots removed), both versions return the same results.

Text to search: "Company, SA"

LUCENE 2.4:

ParseQuery result: text:"company sa" N results.

LUCENE 3.1:

ParseQuery result: text:"company sa" the same N results as in version 2.4
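
The two cases above can be reproduced with a minimal, self-contained Java sketch of position-based phrase matching. It does not use Lucene at all: the class `PhraseSearchSketch` and its two tokenizers are hypothetical stand-ins that only mimic the two query outputs shown in this post ("sa" versus "s.a"), and it assumes the existing index was written with the dot-dropping behaviour. The sample document text is also made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: NOT Lucene code. Both tokenizers are assumptions that
// imitate the query output reported for each version in this thread.
class PhraseSearchSketch {

    // Hypothetical "old style" tokenizer: "S.A." becomes "sa".
    static List<String> tokenizeDroppingDots(String text) {
        return tokenize(text, true);
    }

    // Hypothetical "new style" tokenizer: "S.A." becomes "s.a".
    static List<String> tokenizeKeepingInnerDots(String text) {
        return tokenize(text, false);
    }

    private static List<String> tokenize(String text, boolean dropInnerDots) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[\\s,]+")) {
            String token = raw.replaceAll("^\\.+|\\.+$", ""); // strip leading/trailing dots
            if (dropInnerDots) {
                token = token.replace(".", "");
            }
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Positional index for one document: term -> positions where it occurs.
    static Map<String, List<Integer>> index(List<String> tokens) {
        Map<String, List<Integer>> postings = new HashMap<>();
        for (int pos = 0; pos < tokens.size(); pos++) {
            postings.computeIfAbsent(tokens.get(pos), k -> new ArrayList<>()).add(pos);
        }
        return postings;
    }

    // A phrase matches when its terms occur at consecutive positions.
    static boolean matchesPhrase(Map<String, List<Integer>> postings, List<String> phrase) {
        if (phrase.isEmpty()) {
            return false;
        }
        List<Integer> starts = postings.get(phrase.get(0));
        if (starts == null) {
            return false;
        }
        for (int start : starts) {
            boolean ok = true;
            for (int i = 1; i < phrase.size() && ok; i++) {
                List<Integer> positions = postings.get(phrase.get(i));
                ok = positions != null && positions.contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Index written once, with the dot-dropping tokenizer (assumed old behaviour).
        Map<String, List<Integer>> idx =
                index(tokenizeDroppingDots("Invoice issued by Company, S.A. in 2010"));

        // Query analyzed the same way as the index: [company, sa]
        System.out.println(matchesPhrase(idx, tokenizeDroppingDots("Company, S.A.")));     // true
        // Query analyzed differently: [company, s.a], and "s.a" is not in the index
        System.out.println(matchesPhrase(idx, tokenizeKeepingInnerDots("Company, S.A."))); // false
        // Without dots both tokenizers agree: [company, sa]
        System.out.println(matchesPhrase(idx, tokenizeKeepingInnerDots("Company, SA")));   // true
    }
}
```

If this toy model reflects what happens, the phrase fails not because positions are stored incorrectly, but because the query term "s.a" never appears in an index whose terms were produced by the older tokenization.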

So when Lucene indexes the term "s.a" (in version 3.1), what the hell is it doing with it, and why is it not positioned after the term "company"?

So, here are my questions for the OpenKM developers. If I set an earlier Lucene version in the configuration file, what am I really changing? The way the analyzer works? The way the index is built? The speed of the search?

Re: Difference performing a full text search in Lucene 2.4 and Lucene 3.1

#54721 by pavila
Fri May 24, 2024 8:31 am

Sorry, we only answer questions about OpenKM here.

Re: Difference performing a full text search in Lucene 2.4 and Lucene 3.1

#54723 by alan_vallejo
Fri May 24, 2024 10:48 am

OK, thanks! No worries.

I've looked through the code and realised that when you change the Lucene version in the configuration, everything works according to that version (QueryParser, IndexWriter...).

Lucene 3.0 works in the same way as Lucene 2.4 does, so I will downgrade to that version until I can make a new version of OpenKM with an earlier Hibernate, Hibernate Search and Lucene version.

Thanks for your time and for all the work you've done!
