• Content Search not working for 6.3.0 for special characters

  • OpenKM has many interesting features, but requires some configuration process to show its full potential.
 #31094  by vaibhavk
 
Hi OpenKM Support Team,

We have OpenKM Community Edition 6.3.0 installed on our machine. Browsers used: Mozilla Firefox, IE

In the content search, we used the search criteria below.

Searching for the text 422.50 returned a set of documents containing the text 422.50(a)(2); however, when I search for 422.50(a)(2) or 422.50(a), no results are returned.

Is there any configuration we need to apply so that content with special characters like (,;"''{[ is returned in the results, or are special characters not supported in search?

Note: The text extraction job has completed for all the documents that have been uploaded. We are mostly searching PDF and Word documents.

Please also let us know whether the search supports only 'Search for any word' or also 'Search for exact word'.

Also, according to the community site, the latest Community build is 6.3.1, but we could only find a download link for OpenKM 6.3.0. Please let us know whether 6.3.1 is also available for download.

Regards,
Vaibhav
 #31129  by jllort
 
Basically, the Lucene search engine, probably with the default tokenizer, is storing the text in the indexes as separate words. The default tokenizer works well for most people, but sometimes it is worth building your own or using one other than the default. These classes take the text and, based on the tokenizer, split it into words; for example, "some-text" will be separated into two words, "some" and "text", because the character "-" is treated as a separator.

I think you should use org.apache.lucene.analysis.WhitespaceAnalyzer, which only treats white space as a separator. Then reindex the whole repository for the change to take effect: go to Administration -> Utilities -> Rebuild indexes and choose the Lucene indexes. Before doing this, you must change the default analyzer and restart OpenKM.
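To illustrate the difference, here is a minimal sketch in plain Java. It does not use Lucene at all: the class TokenizerSketch and both methods are hypothetical names, and the first method is only a simplified stand-in for a tokenizer that treats punctuation as a separator (the real default analyzer has more elaborate rules). The second method mimics splitting on white space only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT Lucene's tokenizer code.
class TokenizerSketch {

    // Simplified stand-in for a punctuation-splitting tokenizer:
    // any character that is not a letter or digit ends the current token.
    static List<String> splitOnNonAlphanumeric(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Stand-in for a whitespace-only tokenizer: punctuation stays in the token.
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "some-text in section 422.50(a)(2)";
        // prints [some, text, in, section, 422, 50, a, 2]
        System.out.println(splitOnNonAlphanumeric(text));
        // prints [some-text, in, section, 422.50(a)(2)]
        System.out.println(splitOnWhitespace(text));
    }
}
```

With whitespace-only splitting, 422.50(a)(2) is kept as a single term, so in principle it can be matched as a whole. The trade-off is that a search for 422.50 alone would no longer match that term exactly.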

Consider taking a look here: http://wiki.openkm.com/index.php/Indexing_configuration
 #40116  by vaibhavk
 
Hi,

We are using OpenKM 6.2.5.

I am uploading a text document (.txt) with !#$%&'()+,-.0123456789 as its title.

But when I search for the same text, it gives an error. Please find the attached screenshot.

So can you please tell us what the problem could be?
Is there any limitation in OpenKM on special characters such as ' , ()?


Please reply

Thanks
Attachments
OpenKM search.png
 #40123  by jllort
 
Keep in mind that some characters are reserved and are passed to the Lucene search engine. About special characters, also keep in mind that when text goes into Lucene it passes through an analyzer, which really only uses some of the tokens for indexing; the others are discarded. The default analyzer can be replaced with another one, or you can write your own. In most cases the default analyzer is right, but not always. It all depends on what you expect to get from the search engine.
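As a rough sketch of what "reserved" means here: in Lucene's classic query syntax, characters such as + - & | ! ( ) { } [ ] ^ " ~ * ? : \ have special meaning and need a backslash in front of them to be searched literally. The plain-Java helper below (LuceneQueryEscapeSketch and escapeReserved are hypothetical names) shows the idea without depending on Lucene; Lucene itself offers QueryParser.escape for this purpose. Whether the OpenKM search form accepts backslash-escaped input unchanged is an assumption that would need to be checked on your installation.

```java
// Hypothetical helper; it only illustrates backslash-escaping of
// characters that are reserved in Lucene's classic query syntax.
class LuceneQueryEscapeSketch {

    // Reserved characters (assumed set, based on the classic query syntax).
    private static final String RESERVED = "+-&|!(){}[]^\"~*?:\\";

    // Puts a backslash in front of every reserved character.
    static String escapeReserved(String input) {
        StringBuilder out = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (RESERVED.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints 422.50\(a\)\(2\)
        System.out.println(escapeReserved("422.50(a)(2)"));
    }
}
```

Escaping only prevents the query from being read as syntax. The escaped term still has to exist in the index, which depends on the analyzer used when the documents were indexed.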
