• OCR Configuration trouble on Trial Pro 6.4.15

  • OpenKM has many interesting features, but requires some configuration process to show its full potential.
 #30881  by b33gopher
 
Hi,

I am currently evaluating this software. I have found conflicting documentation on the proper way to set up the OCR functionality and configure an OCR Template. I am running Ubuntu 14.04 LTS 64-bit and was hoping someone could point me in the right direction. Here are the settings I have:

OpenKM version 6.4.15 (installed path) /opt/openkm-6.4.15
OCR Software installed Tesseract (executable path) /usr/bin/

I have the following configuration settings:

Under Administration:Configuration
registered.text.extractors: Removed com.openkm.extractor.OCRTextExtractor entry and inserted com.openkm.extractor.Tesseract3TextExtractor
system.ocr: I have tried multiple entries here. They are:
/usr/bin/tesseract
/usr/bin/tesseract ${fileIn} ${fileOut}
/usr/bin/tesseract ${fileIn} ${fileOut} -l eng

Note: After each change I restarted the OpenKM service.
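
For readers comparing the three system.ocr values above, here is a minimal sketch of what each would expand to, assuming OpenKM simply substitutes the ${fileIn} and ${fileOut} placeholders and splits the result into arguments. That substitution behaviour is an assumption for illustration, not something confirmed in this thread; the function name and the /tmp paths are hypothetical. The tesseract command line itself takes an input image, an output base name, and an optional -l language flag.

```python
import shlex
from string import Template
from typing import List


def expand_ocr_command(template: str, file_in: str, file_out: str) -> List[str]:
    """Fill the ${fileIn}/${fileOut} placeholders and split into an argv list.

    Assumption: this mimics how a system.ocr value might be expanded;
    it is not OpenKM's actual code.
    """
    # safe_substitute leaves unknown placeholders untouched instead of raising
    filled = Template(template).safe_substitute(fileIn=file_in, fileOut=file_out)
    return shlex.split(filled)


# The three values tried in Administration:Configuration
CANDIDATES = [
    "/usr/bin/tesseract",
    "/usr/bin/tesseract ${fileIn} ${fileOut}",
    "/usr/bin/tesseract ${fileIn} ${fileOut} -l eng",
]

if __name__ == "__main__":
    for value in CANDIDATES:
        # Hypothetical input image and output base name
        print(expand_ocr_command(value, "/tmp/page.jpg", "/tmp/page"))
```

Under that assumption, the first value would run tesseract with no file arguments at all, while the other two would pass the image and output base name. Which form a given extractor class expects is exactly what the documentation needs to settle.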

Here are the links to the documentation I have used:
http://wiki.openkm.com/index.php/Third- ... ation:_OCR
http://wiki.openkm.com/index.php/Applic ... abling_OCR
http://wiki.openkm.com/index.php/Third- ... _Tesseract
https://www.youtube.com/watch?v=pmaPi-0O7Gs (OpenKM - zonal ocr ( english ) demo)


I go to create an OCR Template by performing the following:

Uploaded a JPG file, as I could not get a TIFF or PDF to work (I got error messages when attempting to upload a TIFF or PDF).
Applied existing properties value of: okp:consulting.name (just to test)
Enable "active" checkbox

Define OCR Template Definition:

Name: Client Name Template
Type: String
Property: okp:consulting.name
Pattern: Left Blank
Rotation: 0 (default value)
OCR: Left Blank for now (tried the parameters mentioned above)
Use to Recognise: Enabled Check box
Zone: Identified field I wanted to capture from document


Now when I go to check the document I get an error message:

Class: java.lang.runtimeexception
Message: IO exception executing command:-crop 580x120+15550+4035 /opt/openkm-6.4.15/tomcat/temp/okmXXXX.jpg /opt/openkm-6.4.15/temp/okmXXXX.jpg
Date: XXXXX
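
A note on reading this message: the failing command starts directly at -crop, with no executable in front of it, and -crop with a WxH+X+Y geometry is the form used by ImageMagick's convert tool. The sketch below is only a diagnostic aid under that reading. It checks whether convert and tesseract can be found on the PATH and splits the crop geometry into its parts. The function names are hypothetical, and it does not touch OpenKM itself.

```python
import re
import shutil
from typing import List, Sequence, Tuple


def parse_crop_geometry(geometry: str) -> Tuple[int, int, int, int]:
    """Split a WxH+X+Y geometry string into (width, height, x, y)."""
    match = re.fullmatch(r"(\d+)x(\d+)\+(\d+)\+(\d+)", geometry)
    if match is None:
        raise ValueError("not a WxH+X+Y geometry: %r" % geometry)
    width, height, x, y = (int(part) for part in match.groups())
    return width, height, x, y


def missing_tools(names: Sequence[str] = ("convert", "tesseract")) -> List[str]:
    """Return the tool names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Geometry copied from the error message above
    print(parse_crop_geometry("580x120+15550+4035"))
    print("missing:", missing_tools())
```

If convert shows up as missing, installing ImageMagick is the first thing to try. It may also be worth comparing the X and Y offsets with the pixel dimensions of the uploaded JPG, since a zone that lies outside the image cannot be cropped.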

Questions:
What are the proper configuration settings to use for the Tesseract OCR software?
Where do the configuration settings go: in the OpenKM.cfg file, in the Administration:Configuration section of the web interface, or both? I have tried several combinations of each.
Regarding the error message above, which appears when I check my OCR Template, how do I go about resolving it?

I'm sure I am missing something obvious but I'm really confused as to what documentation is accurate. Any help would be greatly appreciated.
 #30899  by jllort
 
Did you install ImageMagick?
My suggestion for this kind of advanced testing is to contact our sales & marketing team; they can give you access to one of our online demos for a few weeks (everything there is already installed and tested, so you need not rack your brain over it). Also, if you send us some samples of the documents you want to extract data from, we can quickly tell you whether there is a problem with them. The contact URL is http://www.openkm.com/en/contact.html

Keep in mind that explaining in depth how zonal OCR works takes us 1-2 hours, even with partners who have development skills (to take real advantage of zonal OCR you must understand how the plugins work and how to extend them to write your own; in some cases that is needed for better recognition). Naturally, you will not find this information in videos and the like, because it is quite difficult to explain in 10-15 minutes, and a 2-hour video would be a waste of time. At present we handle it in a direct meeting when the partner or customer really wants to take advantage of everything the feature offers.
 #30909  by b33gopher
 
I did not install imagemagick. I'm not familiar with the software but can look into it. I did contact Biel but he referred me to the forum for help. I'll use the link you provided and go from there for further assistance. Thanks for your help.
