
Anyone with working OCR?

Posted: Wed Jun 13, 2012 8:08 am
by shaardu
Hi,

For the past 3 days I have been stuck with OCR and cannot get it to work. I tried all variations with tesseract and OpenOCR, and the error messages seem to be the same as what many people here got!

Can anyone with working OCR please explain what you did to get it working? Please post your configuration along with the versions, as it would be very helpful.

Can the OpenKM admins give me just 5 minutes of admin access to the demo? It works perfectly on demo.openkm.com and I just want to see the configuration. It's OK if you post a screenshot instead.

Please help.

Regards.

Re: Anyone with working OCR?

Posted: Wed Jun 13, 2012 6:44 pm
by jllort
Which OpenKM version do you have installed?
Which OS?
What is your administration configuration parameter for system.ocr?
I suggest installing cuneiform, since OpenKM comes prepared for it by default; otherwise you have to make some changes in the XML configuration files.
http://cognitiveforms.ru/products/cuneiform/
http://wiki.openkm.com/index.php/OCR

Re: Anyone with working OCR?

Posted: Thu Jun 14, 2012 4:54 am
by shaardu
jllort wrote:
Which OpenKM version do you have installed?
Which OS?
What is your administration configuration parameter for system.ocr?
I suggest installing cuneiform, since OpenKM comes prepared for it by default; otherwise you have to make some changes in the XML configuration files.
http://cognitiveforms.ru/products/cuneiform/
http://wiki.openkm.com/index.php/OCR

OpenKM 5.1.9
Windows 7 Ultimate
system.ocr=C:/Users/Sharadh/Desktop/Cognitive/CuneiForm/Face.exe ${fileIn} ${fileOut}

I set this after installing Cuneiform itself. OpenKM launches Cuneiform outside OpenKM, and I don't know how to read the OCR output.

It would be very nice if you could just tell us what's in the OpenKM demo, because that works perfectly!

I also tried various configurations, but still no luck.

Re: Anyone with working OCR?

Posted: Fri Jun 15, 2012 7:59 pm
by jllort
The demo uses a Linux configuration. That is an easier environment for configuring open source OCR engines, because they are normally built for Linux.

Try it in your terminal to see whether any error occurs: Face.exe some_file.tif out.txt

Do you have ImageMagick installed? OCR engines do not work with all image types and sometimes a conversion is needed; for that you need convert.exe installed too.
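
To make that terminal check repeatable, something like the following Python sketch could be used. It is not part of OpenKM: it only expands the `${fileIn}` / `${fileOut}` placeholders of a system.ocr-style line, runs the command, and reports whether a non-empty output file appeared. The helper names `expand_ocr_command` and `check_ocr` are made up for illustration, and the whitespace split assumes the path to the OCR executable contains no spaces.

```python
import os
import subprocess


def expand_ocr_command(template, file_in, file_out):
    """Split a system.ocr-style template and fill in its placeholders."""
    # Split first, substitute second, so spaces inside file_in/file_out survive.
    argv = []
    for token in template.split():
        token = token.replace("${fileIn}", file_in).replace("${fileOut}", file_out)
        argv.append(token)
    return argv


def check_ocr(argv, file_out):
    """Run the OCR command; return (exit_code, True if a non-empty output exists)."""
    # Raises FileNotFoundError if the executable itself cannot be found.
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.stderr:
        print(result.stderr)  # show whatever error the OCR engine reported
    produced = os.path.isfile(file_out) and os.path.getsize(file_out) > 0
    return result.returncode, produced
```

For example, `check_ocr(expand_ocr_command("Face.exe ${fileIn} ${fileOut}", "some_file.tif", "out.txt"), "out.txt")` returns the exit code and whether out.txt was written. If the second value is False, the problem is in the OCR command itself rather than in OpenKM.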

Re: Anyone with working OCR?

Posted: Sat Jun 16, 2012 7:21 am
by shaardu
Well, I tried your command, but nothing happens! I did install Cuneiform properly, though. How do I make it work with OpenKM? Where does Cuneiform save the output files that OpenKM searches?

Re: Anyone with working OCR?

Posted: Sat Jun 16, 2012 9:37 am
by jllort
When you execute Face.exe some_file.tif out.txt, is out.txt generated with the OCR text?

Re: Anyone with working OCR?

Posted: Sun Jun 17, 2012 1:09 pm
by shaardu
It didn't generate any text! I just installed Cuneiform normally. When the app runs I can see the text on the screen, but I don't know where it is being saved!

Re: Anyone with working OCR?

Posted: Wed Jun 20, 2012 10:21 am
by jllort
The cuneiform command is just cuneiform fileinput fileoutput, nothing else. Where OpenKM stores the output is not the problem. I only want to be sure that, from your terminal, your OCR runs without problems on the same image file that is in OpenKM. If it executes correctly in the terminal, then we can investigate what happens in the OpenKM scenario, but first make sure it works in the terminal.

Re: Anyone with working OCR?

Posted: Thu Jul 26, 2012 10:49 am
by shaardu
Hey, I tried with tesseract and it's working, in the sense that it is creating the .txt.txt files in tmp with the text extracted from the PDF... but how do I search with OpenKM? Please tell me what exactly we should do. I also added it to workspace.xml, repository.xml and other places. Should I completely remove the cuneiform class? Should I add any other class?
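
The .txt.txt names are probably expected rather than a fault: Tesseract's command line takes an output base name and appends .txt to it itself. A minimal Python sketch of that naming rule follows. The function name and the example paths are hypothetical, and how OpenKM picks the file up afterwards is not covered here.

```python
def tesseract_output_path(output_base):
    """Return the file written by `tesseract <image> <output_base>` in text mode."""
    # Tesseract appends ".txt" to the output base,
    # even when the base already ends in ".txt".
    return output_base + ".txt"


print(tesseract_output_path("C:/tmp/okm_ocr"))      # C:/tmp/okm_ocr.txt
print(tesseract_output_path("C:/tmp/okm_ocr.txt"))  # C:/tmp/okm_ocr.txt.txt
```

So if the output argument passed to tesseract already ends in .txt, a .txt.txt file is what you would expect to see in tmp.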

Re: Anyone with working OCR?

Posted: Thu Jul 26, 2012 3:31 pm
by jllort
You should replace the cuneiform classes with the tesseract ones. Also check that it is present in the Administration configuration too! Then stop and restart JBoss and it should work correctly. Which is your OpenKM version and what are your OCR parameters? (Check them, because the parameters were changed in some upgrade.)

http://wiki.openkm.com/index.php/OCR

Re: Anyone with working OCR?

Posted: Thu Jul 26, 2012 4:06 pm
by shaardu
Thanks a lot, now all TIFFs, PNGs etc. are working... can you please tell me how to make it read scanned PDFs? That's not working.

Re: Anyone with working OCR?

Posted: Fri Jul 27, 2012 9:28 pm
by jllort
Have you installed ImageMagick? (convert is needed to convert jpg format to tif, and then the OCR can work.)
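
The two steps described here (convert to tif, then OCR) can be tried by hand outside OpenKM. The Python sketch below only builds the two command lines; it assumes ImageMagick's `convert` and `tesseract` are both on the PATH, and the function name and working directory are made up for illustration.

```python
import os


def build_convert_then_ocr(image_path, work_dir):
    """Return two commands: ImageMagick conversion to TIFF, then Tesseract OCR."""
    stem = os.path.splitext(os.path.basename(image_path))[0]
    tif_path = os.path.join(work_dir, stem + ".tif")
    out_base = os.path.join(work_dir, stem)  # tesseract appends ".txt" itself
    convert_cmd = ["convert", image_path, tif_path]
    ocr_cmd = ["tesseract", tif_path, out_base]
    return convert_cmd, ocr_cmd
```

Each returned list can be passed to `subprocess.run`. On Windows, check that ImageMagick's convert.exe is the one found first on the PATH, because Windows ships an unrelated convert.exe of its own.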

Re: Anyone with working OCR?

Posted: Sat Jul 28, 2012 4:12 am
by shaardu
Yeah, ImageMagick is installed, I mean it starts as a service... and TIFF preview works perfectly, so ImageMagick is installed correctly, right?

Also, manually created PDFs that contain jpg screenshots are getting recognized in temp but are not searchable! The major problem is that if a completely scanned PDF is uploaded, it is not recognized at all!

Re: Anyone with working OCR?

Posted: Sat Jul 28, 2012 10:56 am
by okmuser
Hi Shaardu / jllort,

I had/have the same issue with scanned PDFs, which are not OCR'd at all.
When I checked the temp txt.txt files I saw lots of garbage (machine codes) rather than text.

I used Tesseract but didn't try cuneiform (I had a bad experience in the past with cuneiform crashing very frequently).

I also tried converting the PDFs to TIFF first. Tesseract is still not the best performer, but it is good enough to index text for an open source solution.

Currently I am using ABBYY FineReader to convert the PDFs to searchable PDFs and then export them to OpenKM.

Jllort,
I have one question for you: when we export PDFs to OpenKM, does OpenKM convert the PDFs to TIFF and then OCR them, or does it OCR the PDFs directly?

Cheers,
OKMUser

Re: Anyone with working OCR?

Posted: Sat Jul 28, 2012 4:24 pm
by shaardu
Thanks for telling me that.

By the way, OpenKM is converting to an image before scanning, because no OCR can convert a PDF directly to searchable text.
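
Whether OpenKM itself rasterizes PDFs this way is not confirmed in this thread, but the idea can be tested by hand. The Python sketch below builds an ImageMagick command that renders each page of a scanned PDF to its own TIFF, which can then be fed to the OCR engine. The function name, output pattern and 300 dpi default are assumptions for illustration, and ImageMagick needs Ghostscript installed to read PDFs.

```python
def build_pdf_to_tiff_command(pdf_path, out_pattern="page-%03d.tif", dpi=300):
    """Return an ImageMagick command that renders PDF pages to numbered TIFFs."""
    # -density goes before the input file so it applies while the PDF is rasterized.
    # The %03d in the output name asks ImageMagick for one numbered file per page.
    return ["convert", "-density", str(dpi), pdf_path, out_pattern]


print(build_pdf_to_tiff_command("scanned.pdf"))
```

If the resulting TIFF pages OCR cleanly from the terminal but the same PDF is not indexed after upload, the conversion step inside OpenKM is the place to look.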