Open Source Document Management System | OpenKM

Posted: **Wed Oct 17, 2012 3:27 pm**

I am trying to use Tesseract to extract text from a JPG file. Running it by hand works fine. I have configured:

Code:

system.ocr /usr/bin/tesseract ${fileIn} ${fileOut}

I get the following error in catalina.log:

Code:

[Text Extractor Worker] WARN  com.openkm.dao.NodeDocumentDAO - There was a problem extracting text from '/okm:root/sb/001.jpg': Too few text extracted

Any thoughts on what the problem is?

Posted: **Thu Oct 18, 2012 8:21 am**

It could be an image resolution problem (the resolution is too low for this OCR engine, so it extracts too few characters).

Sometimes Cuneiform works better. OCR installation is not trivial, so you should run some tests with several documents to determine which engine is best in your environment. It's also important to check that all the ImageMagick libraries are correctly installed. The tests can be run directly from the terminal, and afterwards you can decide which OCR engine to use.
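As a rough way to run that comparison from the terminal, here is a minimal Python sketch. It assumes the `tesseract` and `cuneiform` binaries are on the PATH; the script and its function names (`build_command`, `count_characters`) are made up for this example and are not part of OpenKM. It only counts the non-whitespace characters each engine extracts, so treat the numbers as a hint, not a quality measure.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def build_command(engine, image, out_base):
    """Return (argv, text file the engine is expected to write)."""
    out_file = Path(f"{out_base}.txt")
    if engine == "tesseract":
        # tesseract takes an output *base* name and appends ".txt" itself
        return ["tesseract", str(image), str(out_base)], out_file
    if engine == "cuneiform":
        # cuneiform takes the full output file name via -o
        return ["cuneiform", "-o", str(out_file), str(image)], out_file
    raise ValueError(f"unknown engine: {engine}")


def count_characters(engine, image):
    """Run one engine on one image.

    Returns the number of non-whitespace characters extracted,
    or None if the engine is not installed or the run failed.
    """
    if shutil.which(engine) is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        argv, out_file = build_command(engine, image, Path(tmp) / "out")
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0 or not out_file.exists():
            return None
        text = out_file.read_text(errors="replace")
        return len("".join(text.split()))


if __name__ == "__main__":
    # Usage (hypothetical script name): python compare_ocr.py 001.jpg 002.jpg
    for image in sys.argv[1:]:
        for engine in ("tesseract", "cuneiform"):
            print(image, engine, count_characters(engine, image))
```

If one engine consistently returns far more characters on your sample documents, that is probably the one to try in `system.ocr` first. If both return very little, the image resolution is the more likely culprit.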


OCR not working
