• OCR not working

 #18779  by sorenbronsted
 
I am trying to use Tesseract to extract text from a JPG file. I have tried it by hand and that works fine. I have configured:
system.ocr /usr/bin/tesseract ${fileIn} ${fileOut}
I get the following error in catalina.log:
[Text Extractor Worker] WARN  com.openkm.dao.NodeDocumentDAO - There was a problem extracting text from '/okm:root/sb/001.jpg': Too few text extracted
Any thoughts on what the problem is?
 #18790  by jllort
 
It could be an image resolution problem: if the resolution is too low for this OCR engine, it extracts only a few characters.

Sometimes Cuneiform works better. OCR installation is not trivial, so you should run tests with several documents to determine which engine is best in your environment. It is also important to check that all the ImageMagick libraries are correctly installed. The tests can be run directly from the terminal, and after testing you can decide which OCR engine to use.
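
As a rough way to run that kind of test outside OpenKM, something like the following Python sketch might help. It is only an approximation under assumptions: it fills the ${fileIn} and ${fileOut} placeholders of the configured command itself, runs the engine on a few sample images, and counts how many non-whitespace characters come back. The helper names (build_ocr_command, visible_chars, run_ocr, report) are made up for this example, and how OpenKM actually invokes the command or decides that too little text was extracted is not shown in this thread.

```python
import shlex
import subprocess
import tempfile
from pathlib import Path
from string import Template


def build_ocr_command(template, file_in, file_out):
    """Split the template into arguments, then fill in ${fileIn} and ${fileOut}.

    Splitting first keeps a path that contains spaces as a single argument.
    """
    return [
        Template(token).substitute(fileIn=file_in, fileOut=file_out)
        for token in shlex.split(template)
    ]


def visible_chars(text):
    """Count the non-whitespace characters in the extracted text."""
    return sum(1 for ch in text if not ch.isspace())


def run_ocr(template, image):
    """Run the OCR command on one image and return the text it produced."""
    with tempfile.TemporaryDirectory() as tmp:
        out_base = Path(tmp) / "ocr_out"
        cmd = build_ocr_command(template, str(image), str(out_base))
        subprocess.run(cmd, check=True, capture_output=True)
        # tesseract appends ".txt" to the output name; other engines may not,
        # so look for both file names.
        for candidate in (Path(str(out_base) + ".txt"), out_base):
            if candidate.exists():
                return candidate.read_text(encoding="utf-8", errors="replace")
    return ""


def report(template, images):
    """Print how much text the engine extracted from each sample image."""
    for image in images:
        text = run_ocr(template, image)
        print(f"{image}: {visible_chars(text)} non-whitespace characters")
```

For example, report("/usr/bin/tesseract ${fileIn} ${fileOut}", ["001.jpg"]) would run the command from the first post against a local copy of the image. Repeating this with several documents, and with a different command template for another engine, gives a simple basis for comparing them.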
