6.3.3: How to automatically OCR uploaded images

  • We tried to make OpenKM as intuitive as possible, but advice is always welcome.
 #43637  by MBeyer
 
Hi all,
In a fresh installation of 6.3.3 I want to automate different processes, such as performing OCR on new scans.
In Administration:Automation I created a new rule:
- OCR new File
In the rule definition I want to add actions and validations, but the dropdown menus offer no options to choose from, nor can I create a new action via the plus button.

Any advice?

cheers
Mirko
 #43650  by jllort
 
All documents go into the text extraction queue and are processed automatically to extract their text content. Just make sure you have installed the OCR engine and configured the system.ocr parameter.
Take a look:
https://docs.openkm.com/kcenter/view/ok ... ngine.html
https://docs.openkm.com/kcenter/view/ok ... eters.html
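
Before looking at OpenKM itself, it can help to confirm that the OCR engine reads the image when called directly. The following is a minimal sketch, not part of OpenKM: it assumes the `tesseract` binary is on the PATH and that the English language pack is installed. The function names and the file name `scan.png` are hypothetical.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng", binary="tesseract"):
    """Return the argument list for a one-off OCR run that prints to stdout."""
    # Passing "stdout" as the output base makes tesseract print the
    # recognised text instead of writing a .txt file.
    return [binary, image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run tesseract on image_path and return the recognised text.

    Returns None when no tesseract binary is found on the PATH.
    """
    binary = shutil.which("tesseract")
    if binary is None:
        return None
    result = subprocess.run(
        build_tesseract_command(image_path, lang, binary),
        capture_output=True,
        text=True,
        check=True,  # raise if tesseract exits with an error
    )
    return result.stdout
```

Calling `print(ocr_image("scan.png"))` on one of the uploaded images should print the words that are expected to be searchable. If it prints nothing useful, the problem probably lies with the engine or the image rather than with the OpenKM configuration.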
 #43662  by MBeyer
 
Hi jllort,
thanks for the advice.
Ubuntu reports that tesseract is installed correctly:
tesseract-ocr is already the newest version (3.04.01-4).
tesseract-ocr-eng is already the newest version (3.04.00-1).
All config settings in OpenKM are set exactly as written in the documentation.

Still, if I upload a PNG nothing seems to happen: clearly readable words in the document cannot be found via search.

Where can I check the OCR result after uploading a picture?
 #43666  by jllort
 
1- Take a look here to see whether the document is still in the queue: https://docs.openkm.com/kcenter/view/ok ... ctionqueue
2- Read the information about the main database tables at https://docs.openkm.com/kcenter/view/ok ... ption.html . You will be interested in OKM_DOCUMENT_TABLE, and especially in two columns: NDC_TEXT_EXTRACTED = 'T' indicates that the process has finished, and NDC_TEXT holds the extracted text.
3- Finally, the List indexes feature shows the terms that have been indexed into the Lucene search engine: https://docs.openkm.com/kcenter/view/ok ... dexes.html
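
The database check in point 2 can be sketched roughly as follows. This is an illustration under assumptions, not an OpenKM tool: the table and column names are taken as written in the post above and should be verified against the schema documentation, the function names are hypothetical, and an in-memory SQLite database stands in for whichever database the OpenKM installation actually uses.

```python
import sqlite3


def extraction_status(conn, table="OKM_DOCUMENT_TABLE"):
    """Return (extracted_flag, text_length) for every row in the table.

    The table name is interpolated into the query, so it must come from
    trusted configuration and never from user input.
    """
    cur = conn.cursor()
    cur.execute(f"SELECT NDC_TEXT_EXTRACTED, NDC_TEXT FROM {table}")
    # A NULL text column is reported as length 0.
    return [(flag, len(text or "")) for flag, text in cur.fetchall()]


def demo_connection():
    """Build a stand-in database with one processed and one pending row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE OKM_DOCUMENT_TABLE "
        "(NDC_TEXT_EXTRACTED TEXT, NDC_TEXT TEXT)"
    )
    conn.executemany(
        "INSERT INTO OKM_DOCUMENT_TABLE VALUES (?, ?)",
        [("T", "invoice total 42"), ("F", None)],
    )
    return conn


if __name__ == "__main__":
    for flag, length in extraction_status(demo_connection()):
        print(flag, length)
```

A row flagged 'T' with a text length of 0 would suggest that extraction ran but produced nothing, which points at the OCR engine or its configuration rather than at the queue.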
 #43676  by MBeyer
 
Hi,
thanks for the hints.

Concerning the first link, the pending task queue is empty.

Can you be more specific on the database topic, please? I read the docs but did not understand where to find the database tables or how to use them to get the OCR job running.

The List Indexes view seems to be empty, so it looks like no OCR job has run.
