Full-text search has very poor performance. How to improve?
Posted: Fri Aug 12, 2022 1:58 pm
by snowman
Hello,
My repository contains only PDFs that were OCR'ed with ABBYY. In Acrobat Reader I can find all kinds of keywords in them, but the OpenKM search does not find many of them.
When I import the same repository into Nextcloud with Elasticsearch as the search backend, all keywords are found, so good search results are clearly possible with these files.
How can I improve the search?
P.S.: Language is German.
Re: Full-text search has very poor performance. How to improve?
Posted: Tue Aug 16, 2022 7:14 am
by jllort
Pick one specific document, then go to Administration > Tools > Check text extraction and evaluate both the plugin used and the text extracted:
* Check which plugin was used to extract the contents
* Check whether all of the text has been extracted
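For the second check, it can help to paste the extracted text into a small script and compare it against keywords you know are in the PDF. This is only a minimal sketch: the function name `missing_keywords` and the sample strings are made up for illustration and are not part of OpenKM. It uses `str.casefold()` so that German spellings such as "Straße" and "STRASSE" compare as equal.

```python
def missing_keywords(extracted_text, keywords):
    """Return the keywords that do not occur in the extracted text.

    Matching is case-insensitive; casefold() also maps the German
    sharp s to "ss", so "Straße" matches "strasse".
    """
    haystack = extracted_text.casefold()
    return [kw for kw in keywords if kw.casefold() not in haystack]


if __name__ == "__main__":
    # Hypothetical sample: text copied from the "Check text extraction" result.
    text = "Rechnung für die Lieferung in die Hauptstraße"
    print(missing_keywords(text, ["Rechnung", "HAUPTSTRASSE", "Mahnung"]))
```

An empty extraction result makes every keyword show up as missing, which is a quick sign that the extractor produced no text at all.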
Re: Full-text search has very poor performance. How to improve?
Posted: Sat Sep 17, 2022 1:27 pm
by snowman
I hope I did the right thing. I selected a document, copied its UUID, went to Administration > Utilities > Check text extraction, entered the UUID and pressed Check.
The result is a measured time, Time: 00:00:00.000, and a table with two columns:
application/pdf | com.openkm.extractor.AbbyTextExtractor
Below the table there is only an empty white field.
I guess that means no text was extracted.
Re: Full-text search has very poor performance. How to improve?
Posted: Mon Oct 03, 2022 7:26 am
by jllort
AbbyTextExtractor should be removed, leaving only TesseractTextExtractor enabled. Make sure TesseractTextExtractor is enabled and AbbyTextExtractor is disabled. Then update the table OKM_NODE_DOCUMENT and set the value 'F' in its text-extraction flag column; that puts all documents back into the extraction queue.
In any case, I suggest you check the extraction of a single document from the administration again first. Until extraction works from there, it will not work in the background either.
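The re-queue step above boils down to a single UPDATE statement. Below is a minimal sketch that runs it against an in-memory SQLite table so it can be tried safely. Please treat the details as assumptions: the flag column name NDC_TEXT_EXTRACTED and the key column NBS_UUID are my guess at the schema and are not confirmed in this thread, and a real OpenKM installation uses its own database (for example MySQL or PostgreSQL), not SQLite. Check the column names in your schema and take a backup before running anything similar on real data.

```python
import sqlite3

# Assumed names, not confirmed in this thread; verify against your schema.
TABLE = "OKM_NODE_DOCUMENT"
FLAG_COLUMN = "NDC_TEXT_EXTRACTED"


def requeue_all(conn):
    """Set the extraction flag to 'F' for every document row."""
    conn.execute(f"UPDATE {TABLE} SET {FLAG_COLUMN} = ?", ("F",))
    conn.commit()


def demo():
    """Build a stand-in table, reset the flags, and return them."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE {TABLE} (NBS_UUID TEXT PRIMARY KEY, {FLAG_COLUMN} TEXT)"
    )
    conn.executemany(
        f"INSERT INTO {TABLE} VALUES (?, ?)",
        [("doc-a", "T"), ("doc-b", "T"), ("doc-c", "F")],
    )
    requeue_all(conn)
    rows = conn.execute(
        f"SELECT {FLAG_COLUMN} FROM {TABLE} ORDER BY NBS_UUID"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(demo())
```

On the real database the equivalent would be one SQL statement run with your usual client, followed by waiting for the background extractor to pick the documents up again.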