Hi,
I am currently evaluating this software and have found conflicting documentation on the proper way to set up the OCR functionality and configure an OCR template. I am running Ubuntu 14.04 LTS 64-bit. Could someone point me in the right direction? Here is my setup:
OpenKM version: 6.4.15 (install path: /opt/openkm-6.4.15)
OCR software: Tesseract (executable path: /usr/bin/)
I have the following configuration settings:
Under Administration:Configuration
registered.text.extractors: Removed com.openkm.extractor.OCRTextExtractor entry and inserted com.openkm.extractor.Tesseract3TextExtractor
system.ocr: I have tried each of the following entries:
/usr/bin/tesseract
/usr/bin/tesseract ${fileIn} ${fileOut}
/usr/bin/tesseract ${fileIn} ${fileOut} -l eng
Note: I restarted the OpenKM service after each change.
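
To make clear what I expect each system.ocr entry to run, here is a minimal Python sketch of how I assume the ${fileIn} and ${fileOut} placeholders are substituted before Tesseract is called. This is only my assumption about the behaviour, not OpenKM's actual code; the function name expand_ocr_command and the /tmp paths are hypothetical.

```python
import shlex
from string import Template


def expand_ocr_command(template, file_in, file_out):
    """Fill in ${fileIn}/${fileOut} and split the result into an argument list.

    Hypothetical helper: it mimics the substitution I assume happens,
    it is not taken from OpenKM.
    """
    filled = Template(template).safe_substitute(fileIn=file_in, fileOut=file_out)
    return shlex.split(filled)


# The three system.ocr values I tried.
entries = [
    "/usr/bin/tesseract",
    "/usr/bin/tesseract ${fileIn} ${fileOut}",
    "/usr/bin/tesseract ${fileIn} ${fileOut} -l eng",
]

for entry in entries:
    # Hypothetical input and output paths, for illustration only.
    print(expand_ocr_command(entry, "/tmp/in.jpg", "/tmp/out"))
```

If the substitution works this way, the first entry would run Tesseract with no input or output file at all, which is why I also tried the other two.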
Here are the links to the documentation I have used:
http://wiki.openkm.com/index.php/Third- ... ation:_OCR
http://wiki.openkm.com/index.php/Applic ... abling_OCR
http://wiki.openkm.com/index.php/Third- ... _Tesseract
https://www.youtube.com/watch?v=pmaPi-0O7Gs (OpenKM - zonal ocr ( english ) demo)
I create an OCR template with the following steps:
Uploaded a JPG file, since I could not get a TIFF or PDF to work (I got error messages when attempting to upload either).
Applied the existing property okp:consulting.name (just to test)
Enabled the "active" checkbox
Define OCR Template Definition:
Name: Client Name Template
Type: String
Property: okp:consulting.name
Pattern: Left Blank
Rotation: 0 (default value)
OCR: Left Blank for now (tried the parameters mentioned above)
Use to Recognise: Checkbox enabled
Zone: Identified field I wanted to capture from document
Now, when I check the document, I get an error message:
Class: java.lang.runtimeexception
Message: IO exception executing command:-crop 580x120+15550+4035 /opt/openkm-6.4.15/tomcat/temp/okmXXXX.jpg /opt/openkm-6.4.15/temp/okmXXXX.jpg
Date: XXXXX
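
Two things stand out to me in that message: the command starts directly with -crop, with no executable in front of it, and the zone offsets look large. Below is a small Python sketch I used to reason about it. It assumes the -crop argument follows the WIDTHxHEIGHT+X+Y geometry convention, and the image size of 2480x3508 pixels is a hypothetical example (not my actual file). All function names are my own.

```python
import re


def parse_crop_geometry(geometry):
    """Split a WIDTHxHEIGHT+X+Y string into four integers."""
    match = re.fullmatch(r"(\d+)x(\d+)\+(\d+)\+(\d+)", geometry)
    if match is None:
        raise ValueError("unexpected geometry: " + geometry)
    width, height, x, y = (int(part) for part in match.groups())
    return width, height, x, y


def zone_fits(geometry, image_width, image_height):
    """Return True if the crop zone lies entirely inside the image."""
    width, height, x, y = parse_crop_geometry(geometry)
    return x + width <= image_width and y + height <= image_height


def first_token_is_option(command):
    """Return True if the command begins with an option instead of a program."""
    tokens = command.split()
    return bool(tokens) and tokens[0].startswith("-")


geometry = "580x120+15550+4035"  # taken from the error message
print(parse_crop_geometry(geometry))
# Hypothetical image size, for illustration only.
print(zone_fits(geometry, 2480, 3508))
print(first_token_is_option("-crop " + geometry + " in.jpg out.jpg"))
```

So I suspect either the path to the program that performs the crop is not configured, or the zone coordinates do not match the image, but I cannot tell which.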
Questions:
What are the proper configuration settings to use for the OCR Tesseract software?
Where do the configuration settings belong: in the OpenKM.cfg file, under Administration:Configuration in the web interface, or both? I have tried several combinations of each.
Regarding the error message above when checking my OCR template, how do I go about resolving it?
I'm sure I am missing something obvious, but I'm really confused as to which documentation is accurate. Any help would be greatly appreciated.
