• How to OCR?

  • OpenKM has many interesting features, but requires some configuration process to show its full potential.
 #29652  by Fohnbit
 
Hello!

I set up OCR according to this: http://wiki.openkm.com/index.php/Applic ... abling_OCR
I use Cuneiform, with these settings:
system.ocr: /usr/bin/cuneiform ${fileIn} -o ${fileOut}
system.ocr.rotate:
system.pdf.force.ocr: Active

But when does OpenKM actually run the OCR?

Do I have to set up an automation rule? In the dropdown lists I have no option for OCR.
The OCR button is also missing from the Admin menu.

Thank you!
 #29653  by Fohnbit
 
I ran some tests now.
With Tesseract I get much better results on the Linux command line.
But every PDF first has to be converted to 300 dpi PNG files, because Tesseract can't process PDF 1.4 files directly.

How can I achieve this?

Thank you!
 #29666  by jllort
 
First, as you observed, Tesseract is better than Cuneiform in almost all cases. For converting PDF to PNG you have ImageMagick (convert), which you should also configure.
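
To make the PDF to PNG to Tesseract chain concrete, here is a minimal sketch of a wrapper script. It is only an illustration under assumptions, not something shipped with OpenKM: the script itself, the function names (convert_cmd, tesseract_cmd, ocr_pdf) and the page-%03d.png naming are hypothetical, and it assumes that the ImageMagick convert and tesseract binaries are on the PATH.

```python
#!/usr/bin/env python3
"""Hypothetical OCR wrapper: PDF -> 300 dpi PNG pages (ImageMagick) -> text (Tesseract)."""
import subprocess
import sys
import tempfile
from pathlib import Path


def convert_cmd(pdf_path, png_path, dpi=300):
    # -density is given before the input file so the PDF is rasterized at that resolution.
    return ["convert", "-density", str(dpi), str(pdf_path), str(png_path)]


def tesseract_cmd(png_path, out_base):
    # Tesseract appends ".txt" to the output base name by itself.
    return ["tesseract", str(png_path), str(out_base)]


def ocr_pdf(pdf_path, out_txt):
    """Rasterize every page of pdf_path, OCR each page, and write the joined text to out_txt."""
    with tempfile.TemporaryDirectory() as tmp:
        # %03d makes convert write one numbered PNG per page (page-000.png, page-001.png, ...).
        pattern = Path(tmp) / "page-%03d.png"
        subprocess.run(convert_cmd(pdf_path, pattern), check=True)
        texts = []
        for png in sorted(Path(tmp).glob("page-*.png")):
            base = png.with_suffix("")
            subprocess.run(tesseract_cmd(png, base), check=True)
            texts.append(Path(str(base) + ".txt").read_text(encoding="utf-8"))
    Path(out_txt).write_text("\n".join(texts), encoding="utf-8")


# Only run when called with exactly two arguments: input PDF and output text file.
if __name__ == "__main__" and len(sys.argv) == 3:
    ocr_pdf(sys.argv[1], sys.argv[2])
```

Run by hand it would be called as "python3 ocr_pdf.py scan.pdf scan.txt". Whether system.ocr can point at such a script with ${fileIn} and ${fileOut}, and which output file name OpenKM then expects, is an assumption that should be checked against the wiki page for your version.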

For this kind of feature I suggest studying Automation and how to build your own task:
http://wiki.openkm.com/index.php/Automation
http://wiki.openkm.com/index.php/Extend_automation
http://wiki.openkm.com/index.php/Developer_Guide

You may also be interested in the conversion methods:
http://doxygen.openkm.com/openkm/d0/ddd ... erter.html
You are probably also interested in the class com.openkm.util.ImageUtils (the doxygen is for 6.2.x and does not include this class yet).
