• Chinese search problem

  • We tried to make OpenKM as intuitive as possible, but advice is always welcome.
Forum rules: Before asking a question, please check the documentation wiki or use the forum's search feature. And remember that we don't have a crystal ball and are not mind readers, so if you post about an issue, tell us which OpenKM version you are using, along with your browser and operating system versions. For more info, read How to Report Bugs Effectively.
 #21710  by taihung
 
We have OpenKM Community version 6.2.2 build 7815 installed on Windows Server 2003.
I found a search problem.
When I search in advanced mode, there are no search results after typing 圖書館 in the content field.
If I type 圖 書 館 separated by spaces, many search results show up.
But when I search for 圖書館 via the single search box in the top right corner, it works correctly.

I tried it on the OpenKM online demo with the user account today, and it still has the same problem.
Would you please fix this bug?
Thanks.
 #21736  by jllort
 
It's not a bug; you simply have not configured the Lucene search engine for your locale. You should use a Chinese analyzer. Take a look here, where it is explained what you need to change: http://wiki.openkm.com/index.php/Indexing_configuration (you will need to search Google for the name of the Chinese analyzer).
 #21748  by taihung
 
Thanks jllort,

According to http://lucene.apache.org/core/old_versi ... lyzer.html,
I've tried adding this line to OpenKM.cfg:
hibernate.search.analyzer=org.apache.lucene.analysis.cn.ChineseAnalyzer
Then I went to Administration > Utilities > Rebuild indexes.
But the problem still exists.

I've also changed hibernate.search.analyzer to org.apache.lucene.analysis.cjk.CJKAnalyzer,
but the result is still the same.

Should I copy any files into the OpenKM directory?

Thanks.
 #21773  by jllort
 
Rebuilding the Lucene indexes should be enough.

Run this database query to make sure the content has been indexed (column NDC_TEXT):
SELECT * FROM OKM_NODE_DOCUMENT;
 #21778  by pavila
 
Please, attach a sample file to test in my local installation so I can find the problem and fix it.
 #21921  by pavila
 
The file is not currently accessible from Dropbox. Please attach it to the forum as a .doc or .zip file.
 #22313  by taihung
 
I've tried to upload a .doc file, but it returned the error message: The extension doc is not allowed.
Then I uploaded a .zip file and still got an error: "Sorry, the board attachment quota has been reached."

The file on Dropbox can be downloaded now. Thanks.
 #22349  by pavila
 
Sorry about that. I have increased the board attachment quota and also added the .doc extension to the allowed attachment types.
 #22380  by pavila
 
According to Lucene, the "圖書館" string is split into several words. I don't know any Chinese, so I need some clarification: is the string "圖書館" a whole word, or is each ideogram ("圖", "書" and "館") a word?

For example, using the org.apache.lucene.analysis.cjk.CJKAnalyzer analyzer the extracted Lucene terms are "圖", "圖書", "書", "書館", "館".
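
To make that term list easier to follow, here is a minimal Python sketch that reproduces the five terms quoted above by emitting each character followed by the two-character sequence that starts at it. It is only an illustration of the splitting, not Lucene's actual CJKAnalyzer code, and the function name cjk_terms is a hypothetical helper invented for this example.

```python
def cjk_terms(text):
    """Hypothetical helper: list each character, then the bigram starting at it."""
    terms = []
    for i, ch in enumerate(text):
        terms.append(ch)                 # single-character term
        if i + 1 < len(text):
            terms.append(text[i:i + 2])  # overlapping two-character term
    return terms


print(cjk_terms("圖書館"))  # ['圖', '圖書', '書', '書館', '館']
```

Note that no term equal to the full three-character string is produced, which is why a lookup for "圖書館" as one term cannot match.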
 #22381  by taihung
 
Thanks for your reply.

"圖書館" means "library".
"圖書" and "書" mean "books".
"館" means "building" or "institution"

So it looks like this is not OpenKM's problem, right?
But why does it return no results in advanced mode?
 #22407  by pavila
 
Because simple search is really searching for "圖" and "書" and "館", whereas advanced search looks for "圖書館" as a whole word.
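
The difference can be modelled with a small Python sketch. It assumes, purely for illustration, that the index holds one term per character, that simple search splits the query the same way, and that advanced search only splits the query on whitespace. The function names and the two sample documents are hypothetical, and this is not OpenKM's or Lucene's code.

```python
def char_terms(text):
    # Assumption: one term per non-space character.
    return [ch for ch in text if not ch.isspace()]


def build_index(docs):
    # Map each term to the set of document ids that contain it.
    index = {}
    for doc_id, text in docs.items():
        for term in char_terms(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search_all_terms(index, terms):
    # A document matches only if it contains every query term (AND).
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


docs = {1: "市立圖書館", 2: "書店"}  # made-up sample documents
index = build_index(docs)

simple = search_all_terms(index, char_terms("圖書館"))  # terms: 圖, 書, 館
advanced = search_all_terms(index, "圖書館".split())    # one term: 圖書館
spaced = search_all_terms(index, "圖 書 館".split())    # terms: 圖, 書, 館

print(simple, advanced, spaced)  # {1} set() {1}
```

Under these assumptions the whole-word query finds nothing because "圖書館" never exists as a single term in the index, while typing the characters separated by spaces behaves like the simple search.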
 #27916  by Miesto
 
Where should I add "hibernate.search.analyzer=org.apache.lucene.analysis.cn.ChineseAnalyzer"?
In "Administration>Config" or in "OpenKM\tomcat\OpenKM.cfg"?
 #28169  by pavila
 
"hibernate.search.analyzer=org.apache.lucene.analysis.cn.ChineseAnalyzer" should be added to "$TOMCAT_HOME/OpenKM.cfg". After that, go to Administration > Utilities > Rebuild indexes, because if the analyzer is changed, the repository has to be re-indexed.
