Open Source Document Management System | OpenKM

Because information matters

OCR not working

#18779 by sorenbronsted
Wed Oct 17, 2012 3:27 pm

I am trying to use Tesseract to extract text from a JPG file. Running it by hand works fine. I have configured:

system.ocr /usr/bin/tesseract ${fileIn} ${fileOut}

I get the following error in catalina.log:

[Text Extractor Worker] WARN  com.openkm.dao.NodeDocumentDAO - There was a problem extracting text from '/okm:root/sb/001.jpg': Too few text extracted

Any thoughts on what the problem is?

Re: OCR not working

#18790 by jllort
Thu Oct 18, 2012 8:21 am

It could be an image resolution problem: if the resolution is too low for this OCR engine, it will extract only a few characters.

Sometimes cuneiform gives better results. OCR installation is not trivial: you should run some tests with several documents to determine which engine works best in your environment. It is also important to check that all the ImageMagick libraries are correctly installed. The tests can be run directly from a terminal, and afterwards you can decide which OCR engine to use.
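
One way to run the terminal tests described above is a small script that expands the configured command for each sample image and counts how many characters come back. This is only a sketch under stated assumptions: the helper names `build_command` and `extracted_chars` are made up for illustration, the `${fileIn}` / `${fileOut}` substitution is a guess at how OpenKM expands the `system.ocr` setting, and the output lookup relies on Tesseract appending `.txt` to the output base name it is given.

```python
import shlex
import subprocess
import sys
import tempfile
from pathlib import Path
from string import Template


def build_command(template: str, file_in: str, file_out: str) -> list[str]:
    """Expand ${fileIn} and ${fileOut} in a system.ocr-style template (assumed behaviour)."""
    expanded = Template(template).substitute(
        fileIn=shlex.quote(file_in),
        fileOut=shlex.quote(file_out),
    )
    return shlex.split(expanded)


def extracted_chars(command: list[str], out_base: Path) -> int:
    """Run one OCR command and count the non-whitespace characters it wrote."""
    subprocess.run(command, check=True, capture_output=True)
    # Tesseract writes to "<out_base>.txt"; fall back to the exact name
    # in case another engine writes to the path it was given.
    for candidate in (out_base.with_name(out_base.name + ".txt"), out_base):
        if candidate.is_file():
            text = candidate.read_text(encoding="utf-8", errors="replace")
            return sum(1 for ch in text if not ch.isspace())
    return 0


if __name__ == "__main__":
    # Same command line as the system.ocr value in the first post.
    template = "/usr/bin/tesseract ${fileIn} ${fileOut}"
    for image in sys.argv[1:]:
        with tempfile.TemporaryDirectory() as tmp:
            out_base = Path(tmp) / "ocr_out"
            command = build_command(template, image, str(out_base))
            print(image, extracted_chars(command, out_base))
```

Saved under a hypothetical name such as `ocr_check.py`, it can be run as `python ocr_check.py 001.jpg scan2.jpg`. A count close to zero for an image that is readable by eye would be consistent with the "Too few text extracted" warning and is worth comparing against another engine by changing `template`.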
