
OCR not working

Posted: Wed Oct 17, 2012 3:27 pm
by sorenbronsted
I am trying to use tesseract to extract text from a jpg file. I have tried it by hand and that works fine. I have configured:
Code:
system.ocr /usr/bin/tesseract ${fileIn} ${fileOut}
I get the following error in catalina.log:
Code:
[Text Extractor Worker] WARN  com.openkm.dao.NodeDocumentDAO - There was a problem extracting text from '/okm:root/sb/001.jpg': Too few text extracted
Any thoughts on what the problem is?

Re: OCR not working

Posted: Thu Oct 18, 2012 8:21 am
by jllort
It could be an image resolution problem (the resolution is too low for this OCR engine, so it extracts only a few characters).

Sometimes cuneiform works better. OCR installation is not trivial; you should run some tests with several documents to determine which engine is best in your environment. It's also important to check that all the ImageMagick libraries are correctly installed. The tests can be run directly from the terminal, and after testing you can decide which OCR engine to use.
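
A rough sketch of such a terminal test, in case it helps. The helper names count_text_chars and try_ocr are made up for illustration and are not part of OpenKM. It assumes the standard command-line forms `tesseract IMAGE OUTPUTBASE` (which writes OUTPUTBASE.txt) and `cuneiform -o OUTPUT IMAGE`, plus ImageMagick's `identify` for the resolution check. It only reports how many non-whitespace characters each engine returned; it does not reproduce whatever threshold OpenKM uses for its "Too few text extracted" warning.

```shell
#!/bin/sh
# Hypothetical helper: count non-whitespace characters in an OCR output file.
count_text_chars() {
    tr -d '[:space:]' < "$1" | wc -c | tr -d ' '
}

# Hypothetical helper: run one OCR engine on an image and report how much
# text came back. Usage: try_ocr ENGINE IMAGE
try_ocr() {
    engine=$1
    image=$2
    # Skip engines that are not installed instead of failing.
    if ! command -v "$engine" >/dev/null 2>&1; then
        echo "$engine: not installed"
        return 0
    fi
    out="$(mktemp -d)/result"
    case "$engine" in
        # tesseract appends .txt to the output base name itself
        tesseract) tesseract "$image" "$out" >/dev/null 2>&1 ;;
        cuneiform) cuneiform -o "$out.txt" "$image" >/dev/null 2>&1 ;;
        *) echo "$engine: unknown engine"; return 1 ;;
    esac
    if [ -f "$out.txt" ]; then
        echo "$engine: $(count_text_chars "$out.txt") characters"
    else
        echo "$engine: no output produced"
    fi
}

# Only run the comparison when an image path is given as the first argument.
if [ -n "$1" ]; then
    # Print pixel size and density so a low-resolution scan is easy to spot.
    if command -v identify >/dev/null 2>&1; then
        identify -format '%w x %h px, density %x x %y\n' "$1"
    fi
    try_ocr tesseract "$1"
    try_ocr cuneiform "$1"
fi
```

Saved as, say, ocr-test.sh (name is arbitrary), it could be run as `sh ocr-test.sh 001.jpg` against several sample documents. If one engine consistently returns more text, that is the one to put in system.ocr.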