JPEGs and OCR

Posted: Fri Jun 29, 2012 7:42 am
by domi
Hi @all,

I finally got a working OCR integration. I'm using Tesseract at the moment because, for me, it has better support for the German language.
I installed OpenKM 5.1.10 on Ubuntu x64.

But with both extractors I can't get JPEGs OCR'd. All other supported formats get OCR'd, but no matter what I try, I can't get it working with the JPG format. :cry:

When I run them from a shell, OCR works with both Tesseract and Cuneiform.

Is there any way to debug the OCR mechanism?

Thx!

Domi

Re: JPEGs and OCR

Posted: Sun Jul 01, 2012 8:06 am
by pavila
There was a typo in the JPEG MIME type: it was "image/jpg" and should be "image/jpeg". Please try the latest nightly build from http://integration.openkm.com/5.1/.

Re: JPEGs and OCR

Posted: Sun Jul 01, 2012 4:09 pm
by domi
Hi, and thanks for the response, but sorry, it's still not working :( Neither with .jpg nor with .jpeg files...

Re: JPEGs and OCR

Posted: Mon Jul 02, 2012 6:43 pm
by pavila
Have you installed the latest nightly build?

Re: JPEGs and OCR

Posted: Fri Jul 06, 2012 9:46 am
by shaardu
Can you please post the exact steps you took to get your OCR working? I've been trying for the past month and it still doesn't work. Please let us know the exact steps!

Thanks

Re: JPEGs and OCR

Posted: Fri Jul 06, 2012 10:05 am
by domi
Hi, I finally got it working with JPEGs too.

There are just a few steps:
  • apt-get install imagemagick
  • apt-get install cuneiform
  • apt-get install libreoffice
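Before changing anything in the Admin panel, it can help to confirm that these packages actually put their binaries on the PATH. A minimal sketch using Python's `shutil.which`; the binary names `convert`, `cuneiform` and `soffice` are assumptions about what the Ubuntu packages install, so adjust them for your system:

```python
import shutil

# Binary names ASSUMED to be provided by the imagemagick, cuneiform and
# libreoffice packages on Ubuntu; adjust if your distribution differs.
REQUIRED_TOOLS = ("convert", "cuneiform", "soffice")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        # Print the resolved locations so they can be compared with
        # the paths entered in the Admin panel.
        for tool in REQUIRED_TOOLS:
            print(tool, "->", shutil.which(tool))
```

The printed locations are what you would compare against the paths entered in the Admin panel.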
In the Admin panel, set:
  • system.imagemagick.convert = /usr/bin/convert
  • system.ocr = /usr/bin/cuneiform ${fileIn} -o ${fileOut} -l ger
  • system.openoffice.path = /usr/lib/libreoffice
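The `system.ocr` value is a command template with `${fileIn}` and `${fileOut}` placeholders. To debug it outside OpenKM, you can expand the placeholders yourself and run the result, checking the exit code and error output. A minimal sketch using Python's `string.Template`, which happens to use the same `${name}` syntax; `build_ocr_command` and `run_ocr` are hypothetical helpers, the file paths are examples, and how OpenKM itself substitutes and launches the command is not shown here:

```python
import shlex
import string
import subprocess

# The system.ocr value from the Admin panel (Cuneiform, German language).
OCR_TEMPLATE = "/usr/bin/cuneiform ${fileIn} -o ${fileOut} -l ger"


def build_ocr_command(template, file_in, file_out):
    """Hypothetical helper: fill in ${fileIn}/${fileOut} and split into argv.

    Paths are shell-quoted before substitution so names with spaces survive
    the split as single arguments.
    """
    filled = string.Template(template).substitute(
        fileIn=shlex.quote(file_in), fileOut=shlex.quote(file_out))
    return shlex.split(filled)


def run_ocr(template, file_in, file_out):
    """Hypothetical helper: run the expanded command, return (exit code, stderr)."""
    result = subprocess.run(build_ocr_command(template, file_in, file_out),
                            capture_output=True, text=True)
    return result.returncode, result.stderr


if __name__ == "__main__":
    print(build_ocr_command(OCR_TEMPLATE, "/tmp/scan.jpg", "/tmp/scan.txt"))
```

If `run_ocr(OCR_TEMPLATE, "/tmp/scan.jpg", "/tmp/scan.txt")` returns a non-zero exit code, the captured stderr text is the first place to look.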
That's all I did to finally get it working.

I don't know which OS you have installed, but this is how it works (for me) on Ubuntu 12.04.

Good luck!