No OCRing with Tesseract or Cuneiform

Posted: Thu Feb 23, 2012 2:02 pm
by andydld
Hi,

After some weeks/months I have started trying again to get OpenKM working with OCR.
My first attempt, on Debian with Tesseract, did not work.
See this topic:

http://forum.openkm.com/viewtopic.php?f=4&t=5594

I decided to switch to Ubuntu 10.04 amd64 Server.
I activated the partner repository so that I could install OpenOffice.org, SWFTools, ImageMagick, Tesseract and Cuneiform from it.
I downloaded and installed OpenKM 5.1.9.
So far, so good.

But it still seems to me that OCR does not work.

Here's my config with cuneiform:

system.imagemagick.convert = /usr/bin/convert
system.ocr = /usr/bin/cuneiform -l ger ${fileIn} -o ${fileOut}
system.openoffice.path = /usr/lib/openoffice
system.swftools.pdf2swf = /usr/bin/pdf2swf -T 9 ${fileIn} ${fileOut}

The text filters for Cuneiform are configured in repository.xml and workspace.xml.

I tested with the TIF images that come with the Windows version of Tesseract.

No error appears in server.log when I upload the images to OpenKM.

Any ideas what's wrong?

Are there any other test-images available?

Best regards,

Andy

Re: No OCRing with Tesseract or Cuneiform

Posted: Sun Feb 26, 2012 8:14 am
by jllort
Did you configure this in the Administration tab or in OpenKM.cfg? (OpenKM.cfg is deprecated; configuration is now done in the Administration tab.) I just want to be sure about it.

Try debugging the CuneiformTextExtractor class. See here for how to do it: http://wiki.openkm.com/index.php/Debugging_OpenKM
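
It may also help to run the configured OCR command by hand, outside OpenKM, to see whether Cuneiform itself produces text for the test image. Below is a minimal Python sketch of that check. It assumes that the ${fileIn} and ${fileOut} placeholders in system.ocr are filled in by plain text substitution, which is an assumption and not confirmed OpenKM behaviour. The function names expand_ocr_command and run_ocr are hypothetical and exist only for this example.

```python
import os
import shlex
import subprocess
import tempfile


def expand_ocr_command(template, file_in, file_out):
    """Split the system.ocr template into arguments and fill in the placeholders.

    Assumption: plain textual substitution of ${fileIn} and ${fileOut}.
    Splitting before substituting keeps paths with spaces as one argument.
    """
    args = []
    for token in shlex.split(template):
        token = token.replace("${fileIn}", file_in).replace("${fileOut}", file_out)
        args.append(token)
    return args


def run_ocr(template, image_path):
    """Run the expanded command; return (exit_code, stderr, recognized_text)."""
    fd, out_path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        cmd = expand_ocr_command(template, image_path, out_path)
        proc = subprocess.run(cmd, capture_output=True, text=True)
        # Read whatever the OCR tool wrote to the output file, if anything.
        with open(out_path, encoding="utf-8", errors="replace") as fh:
            text = fh.read()
        return proc.returncode, proc.stderr, text
    finally:
        os.remove(out_path)
```

For example, call run_ocr("/usr/bin/cuneiform -l ger ${fileIn} -o ${fileOut}", "/path/to/test.tif") with the path to one of your test images, ideally as the same user that runs OpenKM. If the exit code is non-zero or the returned text is empty, the problem is probably in the Cuneiform call itself rather than in the OpenKM configuration.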