
6.3.3 how to automatically ocr uploaded images

Posted: Wed Apr 19, 2017 10:17 am
by MBeyer
Hi all,
in a fresh installation of 6.3.3 I want to automate several processes, such as performing OCR on new scans.
In Administration:Automation I created a new rule:
- OCR new File
In the rule definition I want to add actions and validations, but the dropdown menus offer no options to choose from, nor can I create a new action via the plus button.

Any advice?

cheers
Mirko

Re: 6.3.3 how to automatically ocr uploaded images

Posted: Fri Apr 21, 2017 6:38 pm
by jllort
All documents go into the text extraction queue and are processed automatically to extract their text content. Just make sure you have installed the OCR engine and configured the system.ocr parameter.
Take a look:
https://docs.openkm.com/kcenter/view/ok ... ngine.html
https://docs.openkm.com/kcenter/view/ok ... eters.html

Re: 6.3.3 how to automatically ocr uploaded images

Posted: Sat Apr 22, 2017 2:30 pm
by MBeyer
Hi jllort,
thanks for the advice.
Ubuntu reports that Tesseract is installed correctly:
Code:
tesseract-ocr is already the newest version (3.04.01-4).
tesseract-ocr-eng is already the newest version (3.04.00-1).
All config settings in OpenKM are set exactly as written in the documentation.

Still, if I upload a PNG nothing seems to happen: clearly readable words in the document cannot be found via search.

Where can I check the OCR result after uploading a picture?
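
One way to rule out the OCR engine itself is to run Tesseract by hand on the same PNG, outside OpenKM. The sketch below is only an illustration under assumptions: the `tesseract` binary is on the PATH, the function names are made up for this example, and it says nothing about how OpenKM invokes the engine through system.ocr.

```python
import shutil
import subprocess
from typing import List, Optional


def build_ocr_command(image_path: str, lang: str = "eng") -> List[str]:
    """Build a Tesseract call that prints the recognised text to stdout."""
    # Using "stdout" as the output base makes Tesseract print the text
    # instead of writing a .txt file next to the image.
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path: str, lang: str = "eng") -> Optional[str]:
    """Return the recognised text, or None if Tesseract is missing or fails."""
    if shutil.which("tesseract") is None:
        return None  # binary is not on the PATH for this user
    result = subprocess.run(
        build_ocr_command(image_path, lang),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return None  # Tesseract reported an error for this file
    return result.stdout
```

Calling `ocr_image("scan.png")` (a hypothetical file name) as the same OS user that runs OpenKM shows whether readable text comes back at all; if it does, the problem is more likely in the OpenKM configuration than in the Tesseract installation.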

Re: 6.3.3 how to automatically ocr uploaded images

Posted: Sat Apr 22, 2017 8:20 pm
by jllort
1- Take a look here https://docs.openkm.com/kcenter/view/ok ... ctionqueue (to see whether the document is still in the queue).
2- Read the information about the main database tables at https://docs.openkm.com/kcenter/view/ok ... ption.html . You should be interested in the OKM_NODE_DOCUMENT table, and especially in two columns: NDC_TEXT_EXTRACTED = 'T' indicates that the process has finished, and NDC_TEXT holds the extracted text.
3- Finally, with the List indexes feature you can see the terms that are indexed in the Lucene search engine: https://docs.openkm.com/kcenter/view/ok ... dexes.html
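
The database check in point 2 could be sketched roughly as follows. This is not an OpenKM tool: it uses an in-memory SQLite table as a stand-in so the example runs anywhere, takes the table name OKM_NODE_DOCUMENT from the UPDATE query given in this thread, and uses only the two columns named above. Against a real installation you would run the same SELECT through whatever client or DB-API driver matches your database.

```python
import sqlite3


def extraction_summary(conn):
    """Per NDC_TEXT_EXTRACTED flag: (number of documents, total extracted characters)."""
    cur = conn.cursor()
    cur.execute(
        "SELECT NDC_TEXT_EXTRACTED, COUNT(*), "
        "COALESCE(SUM(LENGTH(NDC_TEXT)), 0) "
        "FROM OKM_NODE_DOCUMENT GROUP BY NDC_TEXT_EXTRACTED"
    )
    return {flag: (count, chars) for flag, count, chars in cur.fetchall()}


def make_demo_db():
    """Stand-in table holding only the two columns mentioned in the thread."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE OKM_NODE_DOCUMENT (NDC_TEXT_EXTRACTED TEXT, NDC_TEXT TEXT)"
    )
    conn.executemany(
        "INSERT INTO OKM_NODE_DOCUMENT VALUES (?, ?)",
        [
            ("T", "invoice 2017"),  # processed, text was found
            ("T", ""),              # processed, but nothing was extracted
            ("F", None),            # still waiting for extraction
        ],
    )
    conn.commit()
    return conn


summary = extraction_summary(make_demo_db())
print(summary)
```

A row flagged 'T' with an empty NDC_TEXT is the interesting case for this thread: the document was processed, but the extractor returned no text, which points at the OCR step rather than at the queue.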

Re: 6.3.3 how to automatically ocr uploaded images

Posted: Sun Apr 23, 2017 9:38 pm
by MBeyer
Hi,
thanks for the hints.

Concerning the first link: the pending task queue is empty.

Can you be more specific on the database topic, please? I read the docs but did not understand where to find the database nodes or how to use them to get the OCR job running.

The List Indexes view seems to be empty, so it looks like no OCR job has run ...

Re: 6.3.3 how to automatically ocr uploaded images

Posted: Tue Apr 25, 2017 5:51 pm
by jllort
Execute the following query to force all documents back into the text extraction queue:
Code:
UPDATE OKM_NODE_DOCUMENT SET NDC_TEXT_EXTRACTED = 'F';
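
To see what this query does before running it on a live system, here is a small sketch that applies it to an in-memory SQLite stand-in table. The stand-in and the function names are assumptions for illustration only; the UPDATE statement is the one from the post, and a real OpenKM database would be reached through its own client or driver.

```python
import sqlite3


def pending_count(conn):
    """Number of documents currently flagged as not yet extracted."""
    cur = conn.cursor()
    cur.execute(
        "SELECT COUNT(*) FROM OKM_NODE_DOCUMENT WHERE NDC_TEXT_EXTRACTED = 'F'"
    )
    return cur.fetchone()[0]


def requeue_all(conn):
    """Flag every document as not extracted; returns the number of rows touched."""
    cur = conn.cursor()
    cur.execute("UPDATE OKM_NODE_DOCUMENT SET NDC_TEXT_EXTRACTED = 'F'")
    conn.commit()  # without a commit the change is not persisted
    return cur.rowcount


def make_demo_db():
    """Stand-in table with two processed documents and one pending document."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE OKM_NODE_DOCUMENT (NDC_TEXT_EXTRACTED TEXT, NDC_TEXT TEXT)"
    )
    conn.executemany(
        "INSERT INTO OKM_NODE_DOCUMENT VALUES (?, ?)",
        [("T", "invoice 2017"), ("T", ""), ("F", None)],
    )
    conn.commit()
    return conn


conn = make_demo_db()
before = pending_count(conn)
touched = requeue_all(conn)
after = pending_count(conn)
print(before, touched, after)
```

Because the statement has no WHERE clause, it resets the flag on every document, so on a large repository the extraction queue will take a while to drain. Taking a database backup before running it is a sensible precaution.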