• No OCRing with Tesseract or Cuneiform

  • OpenKM has many interesting features, but requires some configuration process to show its full potential.
 #14107  by andydld
 
Hi,

After some weeks/months, I have started trying again to get OpenKM to work with OCR.
My first attempt, on Debian with Tesseract, did not work.
See this topic:

http://forum.openkm.com/viewtopic.php?f=4&t=5594

I decided to switch to Ubuntu 10.04 amd64 server.
I enabled the partner repository so that I could install OpenOffice.org, SWFTools, ImageMagick, Tesseract, and Cuneiform from it.
I downloaded and installed OpenKM 5.1.9.
So far, so good.

But it still seems to me that OCR does not work.

Here's my config with Cuneiform:

system.imagemagick.convert = /usr/bin/convert
system.ocr = /usr/bin/cuneiform -l ger ${fileIn} -o ${fileOut}
system.openoffice.path = /usr/lib/openoffice
system.swftools.pdf2swf = /usr/bin/pdf2swf -T 9 ${fileIn} ${fileOut}

The text filters for Cuneiform are configured in repository.xml and workspace.xml.

I tested with the TIFF images that come with the Windows version of Tesseract.

No error appears in server.log when I upload the images to OpenKM.

Any ideas what's wrong?

Are there any other test images available?

Best regards,

Andy
 #14141  by jllort
 
Did you configure this in the Administration tab or in OpenKM.cfg? (OpenKM.cfg is deprecated; configuration is now done in the Administration tab.) I just want to be sure about it.

Try debugging the CuneiformTextExtractor class. See here for how to do it: http://wiki.openkm.com/index.php/Debugging_OpenKM
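
As a quicker first check before attaching a debugger, it may help to run the configured system.ocr command by hand and see whether Cuneiform itself produces text. The Python sketch below is only an illustration: it takes the system.ocr value from the post above, fills in the ${fileIn} and ${fileOut} placeholders, and runs the result. The function names and the /tmp paths are made up for the example, and splitting the command into arguments before substituting is an assumption; OpenKM may build the command differently.

```python
import shlex
import subprocess
from string import Template

# Value copied from the system.ocr setting quoted in the first post.
SYSTEM_OCR = "/usr/bin/cuneiform -l ger ${fileIn} -o ${fileOut}"


def expand_ocr_command(template, file_in, file_out):
    """Split the template into arguments, then fill in the placeholders.

    Splitting first keeps a path that contains spaces as one argument.
    (Hypothetical helper; not part of OpenKM.)
    """
    return [
        Template(token).substitute(fileIn=file_in, fileOut=file_out)
        for token in shlex.split(template)
    ]


def run_ocr(template, file_in, file_out):
    """Run the expanded command and return (exit code, stdout, stderr)."""
    argv = expand_ocr_command(template, file_in, file_out)
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # Example paths only; point these at a real test image.
    code, out, err = run_ocr(SYSTEM_OCR, "/tmp/test.tif", "/tmp/test.txt")
    print("exit code:", code)
    print(out or err)
```

If the command exits with a non-zero code or /tmp/test.txt stays empty, the problem is likely in the OCR tool or its language data rather than in OpenKM; if it works here, the extractor configuration is the next place to look.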
