• Full-text search has very poor performance. How to improve?

 #53785  by snowman
 
Hello,

My repository contains only PDFs that were OCR'ed with ABBYY. In Acrobat Reader I can find all kinds of keywords, but the OpenKM search does not find many of them.
When I import the same repository into Nextcloud with Elasticsearch as the search backend, all keywords are found, so good search results on my repository are possible.

How can I improve the search?

P.S.: Language is German.
 #53790  by jllort
 
Select a specific document, then go to Administration > Tools > Check text extraction to evaluate the plugin and the extracted text:
* Check which plugin was used to extract the content
* Check whether all the text has been extracted
 #53842  by snowman
 
I hope I did the right thing. I selected a document, copied its UUID, went to Administration > Utilities > Check text extraction, entered the UUID and clicked Check.

Result is a measured time: Time: 00:00:00.000
and a table with two columns:

application/pdf | com.openkm.extractor.AbbyTextExtractor

Below that there is an empty white field.

I guess no text is extracted.
 #53866  by jllort
 
AbbyTextExtractor should be removed so that only TesseractTextExtractor is enabled. Make sure TesseractTextExtractor is enabled and AbbyTextExtractor is disabled. Then update the table OKM_NODE_DOCUMENT and set the value 'F' in the column OKM_NODE_DOCUMENT; that puts all the documents back in the extraction queue.

In any case, I suggest you check the extraction of a document from Administration again: until extraction works from there, it will not work in the background either.
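
For illustration only, the re-queue step described above could look roughly like the following sketch. It runs against an in-memory SQLite table as a stand-in for the real OpenKM database. The post gives only the table name, so the flag column name NDC_TEXT_EXTRACTED and the DOC_UUID column are assumptions here; check your own schema and take a backup before changing anything in a live system.

```python
import sqlite3

# ASSUMPTION: the extraction flag column is called NDC_TEXT_EXTRACTED.
# The post only names the table, so verify the column in your own schema.
FLAG_COLUMN = "NDC_TEXT_EXTRACTED"


def requeue_all(conn):
    """Set the extraction flag to 'F' for every document row.

    Returns the number of rows touched by the UPDATE.
    """
    cur = conn.execute(f"UPDATE OKM_NODE_DOCUMENT SET {FLAG_COLUMN} = 'F'")
    conn.commit()
    return cur.rowcount


def demo():
    # In-memory stand-in table; DOC_UUID is a hypothetical key column.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE OKM_NODE_DOCUMENT (DOC_UUID TEXT PRIMARY KEY, {FLAG_COLUMN} TEXT)"
    )
    conn.executemany(
        "INSERT INTO OKM_NODE_DOCUMENT VALUES (?, ?)",
        [("doc-1", "T"), ("doc-2", "T"), ("doc-3", "F")],
    )
    changed = requeue_all(conn)
    flags = [row[0] for row in conn.execute(f"SELECT {FLAG_COLUMN} FROM OKM_NODE_DOCUMENT")]
    conn.close()
    return changed, flags


if __name__ == "__main__":
    print(demo())
```

On a real installation the same UPDATE would be run with the SQL client of whatever database backs OpenKM, not with SQLite.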
