
OCR Configuration trouble on Trial Pro 6.4.15

Posted: Thu Jan 08, 2015 9:57 pm
by b33gopher
Hi,

I am currently evaluating this software. I have found conflicting documentation on the proper way to set up the OCR functionality and configure an OCR template. I am running Ubuntu 14.04 LTS 64-bit. Could someone point me in the right direction? Here are my settings:

OpenKM version: 6.4.15 (installed path: /opt/openkm-6.4.15)
OCR software installed: Tesseract (executable path: /usr/bin/)

I have the following configuration settings:

Under Administration:Configuration
registered.text.extractors: Removed the com.openkm.extractor.OCRTextExtractor entry and inserted com.openkm.extractor.Tesseract3TextExtractor
system.ocr: I have tried several entries:
/usr/bin/tesseract
/usr/bin/tesseract ${fileIn} ${fileOut}
/usr/bin/tesseract ${fileIn} ${fileOut} -l eng
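The ${fileIn} and ${fileOut} tokens in these entries look like placeholders that get replaced with real file paths before the command runs. The sketch below assumes that behaviour (it is an illustration, not OpenKM's code) and shows what each entry would expand to. Under that assumption, the bare /usr/bin/tesseract entry would give Tesseract no input or output file at all. Also note that the Tesseract 3 command line treats its second argument as an output base name and appends .txt itself, so the recognised text lands in ${fileOut}.txt. The file paths used are hypothetical.

```python
import shlex
import shutil
from string import Template


def expand_ocr_command(template: str, file_in: str, file_out: str) -> list[str]:
    """Substitute ${fileIn}/${fileOut} and split the result into an argv list.

    Assumption: the placeholders use ${name} syntax, which string.Template
    understands. This mimics the idea of a command template; it is not
    OpenKM's implementation.
    """
    expanded = Template(template).substitute(fileIn=file_in, fileOut=file_out)
    return shlex.split(expanded)


if __name__ == "__main__":
    for entry in (
        "/usr/bin/tesseract",
        "/usr/bin/tesseract ${fileIn} ${fileOut}",
        "/usr/bin/tesseract ${fileIn} ${fileOut} -l eng",
    ):
        # Hypothetical paths; Tesseract 3 would write /tmp/scan.txt.
        print(expand_ocr_command(entry, "/tmp/scan.jpg", "/tmp/scan"))
    # None means the tesseract binary is not on PATH.
    print("tesseract on PATH:", shutil.which("tesseract"))
```

Running the expanded command by hand in a terminal (for example `/usr/bin/tesseract /tmp/scan.jpg /tmp/scan -l eng`) is a quick way to confirm Tesseract and its English language data work before involving OpenKM.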

Note: I restarted the OpenKM service after each change.

Here are the links to the documentation I have used:
http://wiki.openkm.com/index.php/Third- ... ation:_OCR
http://wiki.openkm.com/index.php/Applic ... abling_OCR
http://wiki.openkm.com/index.php/Third- ... _Tesseract
https://www.youtube.com/watch?v=pmaPi-0O7Gs (OpenKM - zonal ocr ( english ) demo)


To create an OCR template, I did the following:

Uploaded a JPG file, because I could not get a TIFF or PDF to work (I got error messages after attempting to upload a TIFF or PDF).
Applied the existing property okp:consulting.name (just to test)
Enabled the "active" checkbox

Defined the OCR template definition:

Name: Client Name Template
Type: String
Property: okp:consulting.name
Pattern: Left blank
Rotation: 0 (default value)
OCR: Left blank for now (I also tried the parameters mentioned above)
Use to Recognise: Checkbox enabled
Zone: Selected the field I wanted to capture from the document


Now, when I check the document, I get this error message:

Class: java.lang.RuntimeException
Message: IO exception executing command:-crop 580x120+15550+4035 /opt/openkm-6.4.15/tomcat/temp/okmXXXX.jpg /opt/openkm-6.4.15/temp/okmXXXX.jpg
Date: XXXXX
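The logged command starts directly with -crop, which suggests the program name that should precede it is empty. The `-crop WIDTHxHEIGHT+X+Y input output` shape matches ImageMagick's geometry syntax, which fits the question about ImageMagick in the reply. The hypothetical helper below parses such a log line, flags a missing program name, and extracts the crop rectangle. It is a diagnostic sketch, not OpenKM code. The assumption is that OpenKM prepends a configured ImageMagick path (probably a setting such as system.imagemagick.convert under Administration:Configuration; verify the exact name in your version).

```python
import re
import shlex
import shutil

# ImageMagick geometry of the form WIDTHxHEIGHT+XOFFSET+YOFFSET
GEOMETRY = re.compile(r"^(\d+)x(\d+)\+(\d+)\+(\d+)$")


def diagnose(command: str) -> dict:
    """Inspect a logged crop command (hypothetical helper, not OpenKM code)."""
    argv = shlex.split(command)
    # If the first token is an option, the executable name was never filled in.
    report = {"missing_program": not argv or argv[0].startswith("-"), "crop": None}
    if "-crop" in argv:
        i = argv.index("-crop")
        if i + 1 < len(argv):
            m = GEOMETRY.match(argv[i + 1])
            if m:
                w, h, x, y = map(int, m.groups())
                report["crop"] = {"width": w, "height": h, "x": x, "y": y}
    return report


if __name__ == "__main__":
    logged = ("-crop 580x120+15550+4035 "
              "/opt/openkm-6.4.15/tomcat/temp/okmXXXX.jpg "
              "/opt/openkm-6.4.15/temp/okmXXXX.jpg")
    print(diagnose(logged))
    # 'convert' is ImageMagick 6's command-line tool; None means it is not on PATH.
    print("convert on PATH:", shutil.which("convert"))
```

On Ubuntu 14.04 the `imagemagick` package (`sudo apt-get install imagemagick`) provides /usr/bin/convert. It may also be worth checking that the crop offsets (here x=15550, y=4035) actually fall inside the uploaded image's dimensions.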

Questions:
What are the proper configuration settings for the Tesseract OCR software?
Where do the configuration settings go: in the OpenKM.cfg file, in the Administration:Configuration section of the web interface, or both? I have tried multiple times using combinations of each.
Regarding the error message above that appears when checking my OCR template, how do I resolve it?

I'm sure I am missing something obvious, but I'm really confused about which documentation is accurate. Any help would be greatly appreciated.

Re: OCR Configuration trouble on Trial Pro 6.4.15

Posted: Sun Jan 11, 2015 12:02 pm
by jllort
Did you install ImageMagick?
For this kind of advanced testing, my suggestion is to contact our sales & marketing team; they can give you access to one of our online demos for a few weeks (everything there is already installed and tested, so you don't have to struggle with the setup). Also, if you send us some samples of the documents you want to extract data from, we can quickly tell you whether there is a problem with them. The contact URL is http://www.openkm.com/en/contact.html

Keep in mind that explaining in depth how zonal OCR works, even to partners with development skills, takes us 1-2 hours. To take real advantage of zonal OCR you must understand how the plugins work and how to extend them to write your own, which in some cases is needed for better recognition. Obviously you will not find this information in videos, because it is quite difficult to explain in 10-15 minutes, and a 2-hour video would be a waste of time. At present we handle it through direct meetings when a partner or customer really wants to take advantage of all the feature's possibilities.

Re: OCR Configuration trouble on Trial Pro 6.4.15

Posted: Mon Jan 12, 2015 3:40 pm
by b33gopher
I did not install ImageMagick. I'm not familiar with the software, but I can look into it. I did contact Biel, but he referred me to the forum for help. I'll use the link you provided and go from there for further assistance. Thanks for your help.