
Chinese search problem

Posted: Thu Mar 14, 2013 7:59 am
by taihung
We have OpenKM Community version 6.2.2 build 7815 installed on Windows Server 2003.
I found a search problem.
When I search in advanced mode, typing 圖書館 in the content field returns no results.
If I type 圖 書 館 separated by spaces, many results show up.
But when I search for 圖書館 in the single search box at the top right corner, it works correctly.

I tried the OpenKM online demo with a user account today, and it has the same problem.
Would you please fix this bug?
Thanks.

Re: Chinese search problem

Posted: Sat Mar 16, 2013 8:49 am
by jllort
It's not a bug; you have not configured the Lucene search engine for your locale. You should use a Chinese analyzer. The page http://wiki.openkm.com/index.php/Indexing_configuration explains what you need to change. (You will have to search Google for the name of the Chinese analyzer.)

Re: Chinese search problem

Posted: Mon Mar 18, 2013 1:14 am
by taihung
Thanks jllort,

According to http://lucene.apache.org/core/old_versi ... lyzer.html,
I tried adding this line to OpenKM.cfg:
hibernate.search.analyzer=org.apache.lucene.analysis.cn.ChineseAnalyzer
Then I went to Administration > Utilities > Rebuild indexes,
but the problem still exists.

I also changed hibernate.search.analyzer to org.apache.lucene.analysis.cjk.CJKAnalyzer,
but the result is the same.

Do I need to copy any files into the OpenKM directory?

Thanks.

Re: Chinese search problem

Posted: Wed Mar 20, 2013 9:47 pm
by jllort
Rebuilding the Lucene indexes should be enough.

Run the following database query to make sure the content has been indexed (check the NDC_TEXT column):
SELECT * FROM OKM_NODE_DOCUMENT;

Re: Chinese search problem

Posted: Thu Mar 21, 2013 9:54 am
by pavila
Please, attach a sample file to test in my local installation so I can find the problem and fix it.

Re: Chinese search problem

Posted: Fri Mar 22, 2013 8:14 am
by taihung
Thanks pavila,

I've updated to the latest nightly build (7936),
but the problem still exists.

I could not attach the file here, so I put it on Dropbox.
You can download it at
https://www.dropbox.com/s/0ruug58klqjggf8/library.doc

Please download it as a .doc file.

Thanks again.

Re: Chinese search problem

Posted: Mon Apr 01, 2013 4:23 pm
by pavila
The file is not currently accessible from Dropbox. Please attach it to the forum as a .doc or .zip file.

Re: Chinese search problem

Posted: Sun Apr 07, 2013 1:58 am
by taihung
I tried to upload the .doc file, but it returned the error message: The extension doc is not allowed.
Then I uploaded a .zip file and still got an error: "Sorry, the board attachment quota has been reached."

The file on Dropbox can be downloaded now. Thanks.

Re: Chinese search problem

Posted: Mon Apr 08, 2013 8:18 pm
by pavila
Sorry, I have increased the board attachment quota and also allowed the .doc extension, so you should be able to attach the file now.

Re: Chinese search problem

Posted: Tue Apr 09, 2013 1:07 am
by taihung
I have uploaded the file to the forum.

Re: Chinese search problem

Posted: Tue Apr 09, 2013 5:58 pm
by pavila
According to Lucene, the "圖書館" string is split into several words. I don't know any Chinese, so I need some clarification: is the string "圖書館" a whole word, or is every ideogram ("圖", "書" and "館") a word?

For example, using the org.apache.lucene.analysis.cjk.CJKAnalyzer analyzer the extracted Lucene terms are "圖", "圖書", "書", "書館", "館".
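
To make the term list above easier to picture, here is a minimal Python sketch that emits each character followed by the two-character window starting at it. It is only a toy model that reproduces the list quoted above; it is not Lucene's CJKAnalyzer code, and the function name unigrams_and_bigrams is made up for illustration.

```python
def unigrams_and_bigrams(text):
    """Return each character, followed by the 2-character window starting at it."""
    terms = []
    for i, ch in enumerate(text):
        terms.append(ch)                 # single-character term
        if i + 1 < len(text):
            terms.append(text[i:i + 2])  # overlapping two-character term
    return terms


print(unigrams_and_bigrams("圖書館"))
# ['圖', '圖書', '書', '書館', '館']
```

Note that "圖書館" itself never appears as a term in this list, which matters for any query that looks it up as one unit.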

Re: Chinese search problem

Posted: Tue Apr 09, 2013 10:52 pm
by taihung
Thanks for your reply.

"圖書館" means "library".
"圖書" and "書" mean "books".
"館" means "building" or "institution"

So it looks like this is not an OpenKM problem, right?
But why does it return no results in advanced mode?

Re: Chinese search problem

Posted: Wed Apr 10, 2013 9:16 am
by pavila
Because the simple search really searches for "圖" and "書" and "館", whereas the advanced search looks for "圖書館" as a whole word.
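
The difference can be sketched with a toy inverted index in Python. This is only an illustration under stated assumptions: the index stores one term per character, the two sample documents are invented, and the helper names (char_terms, build_index, search_all_terms) are hypothetical. It is not how OpenKM or Lucene are actually implemented.

```python
from collections import defaultdict


def char_terms(text):
    # Assumption: the index holds one term per non-space character.
    return [ch for ch in text if not ch.isspace()]


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in char_terms(text):
            index[term].add(doc_id)
    return index


def search_all_terms(index, terms):
    """AND query: return ids of documents that contain every term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


docs = {1: "市立圖書館", 2: "圖畫書"}  # invented sample documents
index = build_index(docs)

# Like the advanced search described above: one whole-word term.
print(search_all_terms(index, ["圖書館"]))            # set()

# Like the simple search described above: "圖" and "書" and "館".
print(search_all_terms(index, char_terms("圖 書 館")))  # {1}
```

The whole-word lookup finds nothing because no single indexed term equals "圖書館", while the per-character query succeeds. The per-character AND query is also looser than a phrase match, since it does not require the characters to be adjacent.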

Re: Chinese search problem

Posted: Mon Feb 24, 2014 8:11 am
by Miesto
Where should I add "hibernate.search.analyzer=org.apache.lucene.analysis.cn.ChineseAnalyzer"?
In "Administration > Config" or in "OpenKM\tomcat\OpenKM.cfg"?

Re: Chinese search problem

Posted: Sun Mar 23, 2014 5:47 am
by pavila
"hibernate.search.analyzer=org.apache.lucene.analysis.cn.ChineseAnalyzer" should be added to "$TOMCAT_HOME/OpenKM.cfg". After that, go to Administration > Utilities > Rebuild indexes, because when the analyzer is changed, the repository has to be re-indexed.
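
If you want to double-check that the property really ended up in the file before rebuilding the indexes, a small Python sketch like the following could help. It assumes OpenKM.cfg uses plain key=value lines with # comments; the read_properties helper and the sample text are hypothetical, and in practice you would read the contents of your own $TOMCAT_HOME/OpenKM.cfg instead.

```python
def read_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' sign
            props[key.strip()] = value.strip()
    return props


# Hypothetical excerpt; replace with the real contents of OpenKM.cfg.
sample = """
# search configuration
hibernate.search.analyzer=org.apache.lucene.analysis.cn.ChineseAnalyzer
"""

props = read_properties(sample)
print(props.get("hibernate.search.analyzer"))
# org.apache.lucene.analysis.cn.ChineseAnalyzer
```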