• Difference performing a full text search in Lucene 2.4 and Lucene 3.1

  • Here we discuss how to customize and improve the OpenKM source code.
 #54718  by alan_vallejo
 
I know this is more of a question for the Lucene developers, but I'm posting it here in case you can guide or help me. I also posted this message on StackOverflow (https://stackoverflow.com/questions/785 ... lucene-3-1).

Our current Java application works with Lucene 2.4, and we are trying to migrate the file indexing and storage part to the OpenKM Community version, which uses Lucene 3.1 by default.

I'm facing some issues when running the same search in two different versions of Lucene (2.4 and 3.1) against the same index. I think the problem lies in how the StandardAnalyzer class, which I'm using in both versions, has evolved.

Text to search: "Company, S.A."

LUCENE 2.4:

ParseQuery result: text:"company sa" N results.

LUCENE 3.1

ParseQuery result: text:"company s.a" 0 results (expected the same results as in version 2.4)

The funny thing is that searching with Lucene 2.4 returns the results I'm expecting, while searching with Lucene 3.1 doesn't.

I've read up on how phrase search works in Lucene and learned that when Lucene indexes a document it stores the terms that belong to the document together with their positions in it. So I could understand that the analyzer changed in version 3.1 and that terms are extracted differently, but once the terms are extracted, the phrase matching itself should work the same way!
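
As a rough illustration of the positional matching described above, here is a toy sketch in plain Java. It is not Lucene's implementation; the class `PhraseIndexSketch` and its method names are made up for this example, and it indexes a single document only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy positional index for a single document; hypothetical names, not Lucene code.
class PhraseIndexSketch {
    // term -> positions at which the term occurs in the document
    private final Map<String, List<Integer>> positions = new HashMap<>();

    // Index an already-analyzed token stream, recording each token's position.
    PhraseIndexSketch(List<String> tokens) {
        for (int i = 0; i < tokens.size(); i++) {
            positions.computeIfAbsent(tokens.get(i), k -> new ArrayList<>()).add(i);
        }
    }

    // An exact phrase matches when its terms occur at consecutive positions.
    boolean matchesPhrase(List<String> phrase) {
        if (phrase.isEmpty()) {
            return false;
        }
        for (int start : positions.getOrDefault(phrase.get(0), List.of())) {
            boolean ok = true;
            for (int j = 1; j < phrase.size() && ok; j++) {
                ok = positions.getOrDefault(phrase.get(j), List.of()).contains(start + j);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        PhraseIndexSketch index =
                new PhraseIndexSketch(List.of("contract", "with", "company", "sa"));
        System.out.println(index.matchesPhrase(List.of("company", "sa")));   // true
        System.out.println(index.matchesPhrase(List.of("company", "s.a")));  // false
    }
}
```

The point of the sketch is that the matching step only compares term strings and positions. If the query-side analysis produces a term string that was never written to the index, the phrase cannot match, however correct the positions are.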

Another thing I don't understand is that when I perform a similar search without the dots, both versions return the same results.

Text to search: "Company, SA"

LUCENE 2.4:

ParseQuery result: text:"company sa" N results.

LUCENE 3.1

ParseQuery result: text:"company sa" same N results as version 2.4

So when Lucene indexes the term "s.a" (in version 3.1), what the hell is it doing with it, and why is it not positioned after the "company" term?
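
One way to reason about the two query results above is to compare the token streams each analysis style produces. The sketch below is only an approximation under stated assumptions: `analyzeOldStyle` and `analyzeNewStyle` are hypothetical stand-ins written for this example, not the real StandardAnalyzer of either version. They are built to reproduce the parsed queries quoted in this post and nothing more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy stand-ins for two analysis styles; hypothetical, not Lucene's analyzers.
class AnalyzerMismatchSketch {

    // Assumed "2.4-like" behavior: lowercase, split on separators, drop every dot.
    static List<String> analyzeOldStyle(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}.]+")) {
            String token = raw.replace(".", "");
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    // Assumed "3.1-like" behavior: lowercase, split on separators,
    // keep dots inside a token but trim leading and trailing ones.
    static List<String> analyzeNewStyle(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}.]+")) {
            String token = raw.replaceAll("^\\.+|\\.+$", "");
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyzeOldStyle("Company, S.A.")); // [company, sa]
        System.out.println(analyzeNewStyle("Company, S.A.")); // [company, s.a]
        System.out.println(analyzeOldStyle("Company, SA"));   // [company, sa]
        System.out.println(analyzeNewStyle("Company, SA"));   // [company, sa]
    }
}
```

If the terms stored in the existing index look like the old-style output, a query analyzed in the new style asks for a term "s.a" that is not in the index at all, which would be consistent with 0 results for "Company, S.A." and identical results for "Company, SA". Whether that is what actually happens here is an assumption, not something confirmed in this thread.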

So, here are the questions for the OpenKM developers. If I set an earlier Lucene version in the configuration file, what am I really changing? The way the analyzer works? The way the index is built? The speed of the search?
 #54723  by alan_vallejo
 
Ok, thanks! no worries.

I've searched the code and realised that when you change the Lucene version, everything works with that version (QueryParser, IndexWriter...).

Lucene 3.0 works in the same way as Lucene 2.4 does. So I would downgrade to that version till I can make a new version of OpenKM with an earlier Hibernate, Hibernate Search and Lucene version.

Thanks for your time and for all the work you've done!
