Content Search not working for 6.3.0 for special characters

Posted: Fri Jan 30, 2015 5:10 am
by vaibhavk
Hi OpenKM Support Team,

We have OpenKM Community Edition 6.3.0 installed on our machine. Browsers used: Mozilla Firefox, IE

In the content search, we used the search criteria below.

Searching for the text 422.50 returned a set of documents containing the text 422.50(a)(2); however, when I search for 422.50(a)(2) or 422.50(a), no results are returned.

Is there any configuration we need to change so that content with special characters such as (,;"''{[ is returned in the results, or are special characters not supported in search?

Note: The text extraction job has completed for all the documents that have been uploaded. We are mostly searching PDF and Word documents.

Please let us know whether only 'Search for any word' or 'Search for exact word' is supported.

Also, according to the community site, the latest Community build is 6.3.1, but we could only find a download link for OpenKM 6.3.0. Please let us know whether 6.3.1 is also available for download.

Regards,
Vaibhav

Re: Content Search not working for 6.3.0 for special characters

Posted: Sun Feb 01, 2015 1:17 pm
by jllort
Basically, the Lucene search engine, probably with the default tokenizer, stores the text in the indexes as separate words. The default tokenizer normally works well for almost everyone, but sometimes it is worth building your own or using a different one. These classes take the text and, depending on the tokenizer, split it into words. For example, "some-text" is split into the two words "some" and "text" because the "-" character is treated as a separator.

I think you should use org.apache.lucene.analysis.WhitespaceAnalyzer, which treats only white space as a separator, and then reindex the whole repository for the change to take effect: go to Administration -> Utilities -> Rebuild indexes and choose Lucene indexes. Before doing that, you must change the default analyzer and restart OpenKM.

Take a look here: http://wiki.openkm.com/index.php/Indexing_configuration
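
To make the difference described above concrete, here is a minimal plain-Java sketch. It does not use Lucene. The class and method names are hypothetical, and the first splitter is a deliberately coarse stand-in for a tokenizer that treats punctuation as a separator. Real analyzers apply further rules, so the exact tokens in an OpenKM index may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: these are NOT Lucene classes.
public class TokenizerSketch {

    // Coarse stand-in for a tokenizer that treats every character
    // that is not a letter or digit as a separator.
    static List<String> splitOnNonAlphanumeric(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Stand-in for whitespace-only splitting: punctuation stays in the token.
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "see some-text in 422.50(a)(2)";
        System.out.println(splitOnNonAlphanumeric(text)); // [see, some, text, in, 422, 50, a, 2]
        System.out.println(splitOnWhitespace(text));      // [see, some-text, in, 422.50(a)(2)]
    }
}
```

With whitespace-only splitting, 422.50(a)(2) survives as a single token, so an exact search for it has something to match. The trade-off is that a search for 422.50 alone would then no longer match that token.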

Re: Content Search not working for special characters

Posted: Fri Jul 17, 2015 12:06 pm
by vaibhavk
Hi,

We are using OpenKM 6.2.5.

I am uploading a text document (.txt) with !#$%&'()+,-.0123456789 as the title.

But when I search for the same text, it gives an error. Please find the attached screenshot.

Can you please tell us what the problem might be?
Is there any limitation in OpenKM on special characters such as ' , ( )?


Please reply

Thanks

Re: Content Search not working for 6.3.0 for special characters

Posted: Sun Jul 19, 2015 10:17 am
by jllort
Keep in mind that some characters are reserved and are passed to the Lucene search engine. Regarding special characters, also keep in mind that when text goes into Lucene it passes through an analyzer, which really uses only some of the tokens for indexing and discards the others. The default analyzer can be replaced with another one, or you can write your own. In most cases the default analyzer is the right choice, but not always. It all depends on what you expect to get from the search engine.
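
As an illustration of the reserved characters mentioned above, the following plain-Java sketch escapes the characters that the classic Lucene query syntax treats as special by prefixing each with a backslash. The class name and helper are hypothetical. Lucene ships its own QueryParser.escape for this purpose, and whether the OpenKM search form accepts backslash-escaped input is an assumption that would need to be tested.

```java
// Hypothetical helper; not part of OpenKM or Lucene.
public class QueryEscapeSketch {

    // Characters treated as special by the classic Lucene query syntax
    // (assumed list; newer Lucene versions add more, for example '/').
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    // Prefix every special character with a backslash.
    static String escape(String term) {
        StringBuilder out = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("422.50(a)(2)")); // 422.50\(a\)\(2\)
    }
}
```

Escaping only stops the query parser from interpreting the characters as operators. The escaped term still passes through the analyzer, so it matches only if the index contains the corresponding tokens.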

Re: Content Search not working for 6.3.0 for special characters

Posted: Fri Jul 31, 2015 12:32 pm
by pavila
Please try to reproduce the issue with a recent nightly build from http://integration.openkm.com/6.3/