How to OCR?

Posted: Sun Aug 24, 2014 2:54 pm
by Fohnbit
Hello!

I set up OCR according to this: http://wiki.openkm.com/index.php/Applic ... abling_OCR
I use Cuneiform with these settings:
system.ocr: /usr/bin/cuneiform ${fileIn} -o ${fileOut}
system.ocr.rotate:
system.pdf.force.ocr: Active

But when is the OCR actually performed?

Do I have to set up an automation rule? The dropdown lists offer no option for OCR.
I am also missing the OCR button in the Admin menu.

Thank you!

Re: How to OCR?

Posted: Sun Aug 24, 2014 3:35 pm
by Fohnbit
I have run some tests now.
With tesseract I get much better results on the Linux command line.
But every PDF first has to be converted into 300 dpi PNG files, because tesseract cannot process PDF 1.4 files.

How can I achieve this?

Thank you!

Re: How to OCR?

Posted: Mon Aug 25, 2014 4:51 pm
by jllort
First, as you observed, tesseract is better than cuneiform in most cases. For converting PDF to PNG you have ImageMagick (the convert command), which you should also have configured.
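
As a rough illustration of the convert-then-tesseract idea (not OpenKM code), the two steps could be chained like this. The sketch assumes that the ImageMagick `convert` and `tesseract` binaries are on the PATH; the function names and the page file pattern are made up for the example.

```python
import subprocess
import tempfile
from pathlib import Path


def build_convert_cmd(pdf_path, png_pattern, dpi=300):
    """Command that rasterizes a PDF into one PNG per page at the given DPI."""
    # -density goes before the input file so it applies while the PDF is read.
    return ["convert", "-density", str(dpi), str(pdf_path), str(png_pattern)]


def build_tesseract_cmd(png_path, out_base, lang="eng"):
    """Command that OCRs one image; tesseract appends .txt to out_base itself."""
    return ["tesseract", str(png_path), str(out_base), "-l", lang]


def ocr_pdf(pdf_path, lang="eng", dpi=300):
    """Rasterize the PDF, OCR every page image, and return the combined text."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        # Hypothetical naming scheme: page-0000.png, page-0001.png, ...
        pattern = tmp_dir / "page-%04d.png"
        subprocess.run(build_convert_cmd(pdf_path, pattern, dpi), check=True)

        texts = []
        for png in sorted(tmp_dir.glob("page-*.png")):
            out_base = png.with_suffix("")  # e.g. .../page-0000
            subprocess.run(build_tesseract_cmd(png, out_base, lang), check=True)
            txt_file = out_base.with_suffix(".txt")
            texts.append(txt_file.read_text(encoding="utf-8"))
        return "\n".join(texts)
```

Calling `ocr_pdf("scan.pdf", lang="deu")` would then return the recognized text of all pages, provided the matching tesseract language data is installed. Inside OpenKM the same two commands would have to be wired in through the OCR configuration or an automation task rather than run by hand.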

For this kind of feature I suggest studying Automation and how to build your own task:
http://wiki.openkm.com/index.php/Automation
http://wiki.openkm.com/index.php/Extend_automation
http://wiki.openkm.com/index.php/Developer_Guide

You may also be interested in the conversion methods:
http://doxygen.openkm.com/openkm/d0/ddd ... erter.html
You will probably also be interested in the class com.openkm.util.ImageUtils (the doxygen is for 6.2.x and does not include this class yet).