Open Source Document Management System | OpenKM - OCR Configuration trouble on Trial Pro 6.4.15

OCR Configuration trouble on Trial Pro 6.4.15

#30881 by b33gopher
Thu Jan 08, 2015 9:57 pm

Hi,

I am currently evaluating this software. I have found conflicting documentation on the proper way to set up the OCR functionality and configure an OCR template. I am running Ubuntu 14.04 LTS 64-bit. I wanted to see if someone could point me in the right direction. Here is my setup:

OpenKM version: 6.4.15 (installed at /opt/openkm-6.4.15)
OCR software: Tesseract (executable in /usr/bin/)

I have the following configuration settings:

Under Administration > Configuration:
registered.text.extractors: removed the com.openkm.extractor.OCRTextExtractor entry and inserted com.openkm.extractor.Tesseract3TextExtractor
system.ocr: I have tried several values, one at a time:
/usr/bin/tesseract
/usr/bin/tesseract ${fileIn} ${fileOut}
/usr/bin/tesseract ${fileIn} ${fileOut} -l eng

Note: After each change I restarted the OpenKM service.
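
To make the difference between the three system.ocr values concrete, here is a minimal Python sketch of how a command template with ${fileIn} and ${fileOut} placeholders could be expanded into an argument list. This is not OpenKM's actual code. The function name expand_command and the file names scan.jpg and scan are made up for illustration. It only shows that the bare path carries no file arguments, while the templated forms produce the usual Tesseract call (image, output base, optional -l language). It does not settle which form this OpenKM version expects.

```python
import shlex
from string import Template
from typing import List


def expand_command(template: str, file_in: str, file_out: str) -> List[str]:
    """Split a command template into tokens, then fill the placeholders.

    Splitting first keeps a file name that contains spaces as one argument.
    Hypothetical helper for illustration; not part of OpenKM.
    """
    tokens = shlex.split(template)
    return [
        Template(tok).substitute(fileIn=file_in, fileOut=file_out)
        for tok in tokens
    ]


# Hypothetical input and output names.
print(expand_command("/usr/bin/tesseract ${fileIn} ${fileOut} -l eng",
                     "scan.jpg", "scan"))
# ['/usr/bin/tesseract', 'scan.jpg', 'scan', '-l', 'eng']

print(expand_command("/usr/bin/tesseract", "scan.jpg", "scan"))
# ['/usr/bin/tesseract']
```

With the bare path, whatever launches the command would have to append the input and output files itself. With the templated forms, the arguments are already in place.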

Here are the links to the documentation I have used:
http://wiki.openkm.com/index.php/Third- ... ation:_OCR
http://wiki.openkm.com/index.php/Applic ... abling_OCR
http://wiki.openkm.com/index.php/Third- ... _Tesseract
https://www.youtube.com/watch?v=pmaPi-0O7Gs (OpenKM - zonal ocr ( english ) demo)

I then tried to create an OCR template with the following steps:

Uploaded a JPG file, as I could not get a TIFF or PDF to work (I got error messages when attempting to upload either).
Applied an existing property: okp:consulting.name (just to test)
Enabled the "active" checkbox

Defined the OCR template field:

Name: Client Name Template
Type: String
Property: okp:consulting.name
Pattern: left blank
Rotation: 0 (default value)
OCR: left blank for now (I also tried the system.ocr values mentioned above)
Use to Recognise: checkbox enabled
Zone: selected the field I wanted to capture from the document

Now when I go to check the document I get an error message:

Class: java.lang.runtimeexception
Message: IO exception executing command:-crop 580x120+15550+4035 /opt/openkm-6.4.15/tomcat/temp/okmXXXX.jpg /opt/openkm-6.4.15/temp/okmXXXX.jpg
Date: XXXXX
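
For reference, the -crop argument in that message uses the ImageMagick geometry form WIDTHxHEIGHT+X+Y. The argument list in the message also starts directly at -crop with no program name in front. That would be consistent with the path to the image tool being empty or unset, though that reading is an inference and the message alone does not prove it. The Python sketch below only decodes the geometry string. The names CropGeometry, parse_crop_geometry and fits_inside are hypothetical, and the page size used in the bounds check is invented, since the real image dimensions are not given.

```python
import re
from typing import NamedTuple


class CropGeometry(NamedTuple):
    width: int
    height: int
    x: int
    y: int


# WIDTHxHEIGHT followed by signed X and Y offsets, e.g. 580x120+15550+4035
_GEOMETRY = re.compile(r"^(\d+)x(\d+)([+-]\d+)([+-]\d+)$")


def parse_crop_geometry(text: str) -> CropGeometry:
    """Decode a WxH+X+Y crop geometry string into integers."""
    match = _GEOMETRY.match(text)
    if match is None:
        raise ValueError(f"not a WxH+X+Y geometry: {text!r}")
    width, height, x, y = (int(group) for group in match.groups())
    return CropGeometry(width, height, x, y)


def fits_inside(zone: CropGeometry, image_width: int, image_height: int) -> bool:
    """True if the zone lies completely within an image of the given size."""
    return (
        zone.x >= 0
        and zone.y >= 0
        and zone.x + zone.width <= image_width
        and zone.y + zone.height <= image_height
    )


zone = parse_crop_geometry("580x120+15550+4035")
print(zone)
# CropGeometry(width=580, height=120, x=15550, y=4035)

# Assumed page size (A4 at 300 dpi); the real image size is not known.
print(fits_inside(zone, 2480, 3508))
```

A check like fits_inside can help rule out a zone that falls outside the uploaded image, but it needs the real image dimensions to mean anything.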

Questions:
What are the proper configuration settings to use for the OCR Tesseract software?
Where do the configuration settings go: in the OpenKM.cfg file, in the Administration > Configuration section of the web interface, or both? I have tried several combinations of each.
Regarding the error message above when checking my OCR template, how do I go about resolving it?

I'm sure I am missing something obvious but I'm really confused as to what documentation is accurate. Any help would be greatly appreciated.


Re: OCR Configuration trouble on Trial Pro 6.4.15

#30899 by jllort
Sun Jan 11, 2015 12:02 pm

Did you install ImageMagick?
My suggestion for this kind of advanced testing is to contact our sales & marketing team; they can give you access to one of our online demos for a few weeks (everything there is already installed and tested, so you don't have to rack your brain over the setup). Also, if you send us some samples of the documents you want to extract data from, we can quickly tell you whether there is a problem with them. The contact URL is http://www.openkm.com/en/contact.html

Keep in mind that understanding in depth how zonal OCR works takes 1-2 hours, even for our partners with development skills. To take real advantage of zonal OCR you need to understand how the plugins work and how to extend them to write your own, which in some cases is needed for better recognition. You will not find this covered in videos and the like, because it is quite difficult to explain in 10-15 minutes, and a 2-hour video would be a waste of time. At present we handle it with a direct meeting when a partner or customer really wants to take advantage of everything the feature offers.
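
On the ImageMagick question at the top of this reply, a quick way to see whether the external tools are installed is to look them up on the PATH. The Python sketch below does that with the standard library. The helper name find_tools is made up, and the assumption that the zone crop relies on ImageMagick's convert command is drawn from the -crop arguments in the error message, not from anything confirmed in this thread.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


# "convert" is the ImageMagick 6 command name (an assumption for this setup).
for name, path in find_tools(["tesseract", "convert"]).items():
    print(f"{name}: {path if path else 'NOT FOUND'}")
```

On Ubuntu the package is named imagemagick, so a missing convert can usually be fixed with `sudo apt-get install imagemagick`. The check should be run as the user that runs the OpenKM service, since that user's PATH is the one that matters.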


Re: OCR Configuration trouble on Trial Pro 6.4.15

#30909 by b33gopher
Mon Jan 12, 2015 3:40 pm

I did not install ImageMagick. I'm not familiar with the software but can look into it. I did contact Biel, but he referred me to the forum for help. I'll use the link you provided and go from there for further assistance. Thanks for your help.
