
Full-text search has very poor performance. How to improve?

Posted: Fri Aug 12, 2022 1:58 pm
by snowman
Hello,

My repository contains only PDFs OCR'ed with ABBYY. In Acrobat Reader I can find all kinds of keywords. However, the OpenKM search does not find many of them.
When I import the same repository into Nextcloud with Elasticsearch as the search backend, all keywords are found, so good search results on this repository are possible.

How can I improve the search?

P.S.: Language is German.

Re: Full-text search has very poor performance. How to improve?

Posted: Tue Aug 16, 2022 7:14 am
by jllort
You should focus on one specific document: from Administration > Tools > Check text extraction, evaluate the plugin and the extracted text.
* Check which plugin has been used to extract the contents
* Check whether all the text has been extracted

Re: Full-text search has very poor performance. How to improve?

Posted: Sat Sep 17, 2022 1:27 pm
by snowman
I hope I did the right thing. I selected a document, copied its UUID, went to Administration > Utilities > Check text extraction, entered the UUID and pushed Check.

The result is a measured time (Time: 00:00:00.000)
and a table with two columns:

application/pdf | com.openkm.extractor.AbbyTextExtractor

Below that there is only an empty white field.

So I guess no text is extracted.

Re: Full-text search has very poor performance. How to improve?

Posted: Mon Oct 03, 2022 7:26 am
by jllort
AbbyTextExtractor should be removed so that only the TesseractTextExtractor is enabled. Ensure TesseractTextExtractor is enabled and AbbyTextExtractor is disabled. Then update the table OKM_NODE_DOCUMENT and set the value 'F' in its column that flags whether the text has been extracted; that will put all the documents in the extraction queue again.
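
The table update described above could look roughly like the sketch below. It is only an illustration, run against an in-memory SQLite stand-in rather than a real OpenKM database. The column name NDC_TEXT_EXTRACTED is an assumption that should be checked against your own schema, and DOC_ID is a hypothetical placeholder column. Back up the database before running anything similar on a live system.

```python
import sqlite3

# ASSUMPTION: the flag column is named NDC_TEXT_EXTRACTED and holds 'T' or 'F'.
# Check the real column name in your own OKM_NODE_DOCUMENT table first.
FLAG_COLUMN = "NDC_TEXT_EXTRACTED"

# Statement that marks every document as "not yet extracted".
REQUEUE_SQL = f"UPDATE OKM_NODE_DOCUMENT SET {FLAG_COLUMN} = 'F'"


def build_demo_db():
    """Create an in-memory stand-in table (DOC_ID is a hypothetical column)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE OKM_NODE_DOCUMENT (DOC_ID TEXT PRIMARY KEY, {FLAG_COLUMN} TEXT)"
    )
    conn.executemany(
        "INSERT INTO OKM_NODE_DOCUMENT VALUES (?, ?)",
        [("doc-1", "T"), ("doc-2", "T"), ("doc-3", "F")],
    )
    conn.commit()
    return conn


def pending_count(conn):
    """Count the documents currently flagged as not extracted."""
    query = f"SELECT COUNT(*) FROM OKM_NODE_DOCUMENT WHERE {FLAG_COLUMN} = 'F'"
    return conn.execute(query).fetchone()[0]


def requeue_all(conn):
    """Reset the flag on every row and return the number of rows touched."""
    cur = conn.execute(REQUEUE_SQL)
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = build_demo_db()
    print("pending before:", pending_count(conn))
    print("rows updated:", requeue_all(conn))
    print("pending after:", pending_count(conn))
```

On a real installation only the UPDATE statement itself would be relevant, executed with your database's own client.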

Anyway, I suggest you check the extraction of a document from Administration again: until the extraction works from there, it will not work in the background either.