
Scan images with OCR doesn't work

Posted: Mon Sep 23, 2013 4:22 pm
by gvdm
Hi to all,

I installed Tesseract on my OpenKM machine, tested it (it works very well), and configured the OCR settings so that OpenKM uses Tesseract.
Now, when I upload a PDF file, OpenKM converts it with the OCR module and writes the OCR text into the OKM_NODE_DOCUMENT table.

The problem is that when I upload an image (regardless of the format), the conversion doesn't work. Specifically, the "NDC_TEXT" field of the table is empty.
Tesseract, run from the command line, converts the same images very well, so it seems to be an OpenKM problem.
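
For anyone who wants to repeat the command-line check described above, here is a minimal sketch in Python. It assumes the `tesseract` executable is on the PATH and uses the standard `tesseract <image> <outputbase> -l <lang>` call, which writes `<outputbase>.txt`. The function names and the default language `eng` are only illustrative, not part of OpenKM.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_tesseract_command(image_path, output_base, language="eng"):
    """Build the CLI call; Tesseract writes its result to <output_base>.txt."""
    return ["tesseract", str(image_path), str(output_base), "-l", language]


def ocr_image(image_path, language="eng"):
    """Run Tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        output_base = Path(tmp) / "ocr_result"
        # check=True raises CalledProcessError if Tesseract exits with an error.
        subprocess.run(
            build_tesseract_command(image_path, output_base, language),
            check=True,
            capture_output=True,
        )
        result_file = Path(tmp) / "ocr_result.txt"
        return result_file.read_text(encoding="utf-8").strip()
```

Calling `ocr_image("scan.jpg")` on an image that OpenKM failed to index should return non-empty text if Tesseract itself handles the file correctly, which helps to separate a Tesseract problem from an OpenKM configuration problem.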

What could the issue be?

I have attached one of the images to this post.

Thanks
Giulio

Re: Scan images with OCR doesn't work

Posted: Tue Sep 24, 2013 6:05 pm
by jllort
After a document is added, it goes into the indexing queue. Has it already been processed from the queue?
By default OpenKM comes with configuration parameters for Cuneiform. Have you changed them to Tesseract, as explained here http://wiki.openkm.com/index.php/Third- ... ation:_OCR ? (I'm not totally sure whether this change, meaning replacing the Cuneiform class with the Tesseract one, requires restarting the application.)

Re: Scan images with OCR doesn't work

Posted: Wed Sep 25, 2013 8:17 am
by gvdm
jllort wrote: After a document is added, it goes into the indexing queue. Has it already been processed from the queue?
Yes, I waited until the NDC_TEXT_EXTRACTED flag had the value "T".
jllort wrote: [...] Have you changed them to Tesseract, as explained here http://wiki.openkm.com/index.php/Third- ... ation:_OCR ?
No, thanks, I had not changed it yet.
I have now fixed it and retried the conversion of the same image, and got this row in the OKM_NODE_DOCUMENT table:
[Attachment: tuple.png]
Why do I see the JPEG image's properties instead of the extracted text?
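
The check described in this post (wait until NDC_TEXT_EXTRACTED is "T", then look at NDC_TEXT) could be sketched roughly as below. This is only an illustration under several assumptions: the key column `NBS_UUID`, the value "F" for "not yet extracted", and the in-memory SQLite table with made-up rows are stand-ins, not the real OpenKM schema or data. Against a real installation you would use the driver for your own database instead.

```python
import sqlite3


def ocr_status(connection, uuid):
    """Return (extracted_flag, text) for one document row, or None."""
    cursor = connection.execute(
        "SELECT NDC_TEXT_EXTRACTED, NDC_TEXT "
        "FROM OKM_NODE_DOCUMENT WHERE NBS_UUID = ?",  # NBS_UUID is assumed
        (uuid,),
    )
    return cursor.fetchone()


def describe(row):
    """Turn a status row into a short human-readable diagnosis."""
    if row is None:
        return "document not found"
    flag, text = row
    if flag != "T":
        return "still waiting in the indexing queue"
    if not text or not text.strip():
        return "extraction finished but no text was stored"
    return f"extraction finished, {len(text)} characters stored"


def demo_connection():
    """In-memory stand-in table with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE OKM_NODE_DOCUMENT ("
        "NBS_UUID TEXT PRIMARY KEY, NDC_TEXT_EXTRACTED TEXT, NDC_TEXT TEXT)"
    )
    conn.executemany(
        "INSERT INTO OKM_NODE_DOCUMENT VALUES (?, ?, ?)",
        [
            ("doc-pdf", "T", "text from a PDF"),  # extraction worked
            ("doc-image", "T", ""),               # flag set, text empty
            ("doc-pending", "F", None),           # not processed yet
        ],
    )
    return conn
```

The "doc-image" row corresponds to the situation from the first post: the flag says extraction has finished, but no text was stored. The case in this post, where NDC_TEXT holds the image's properties rather than OCR text, would count as "text stored" in this sketch, so the stored content still has to be inspected by eye.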

Re: Scan images with OCR doesn't work

Posted: Wed Sep 25, 2013 8:29 am
by gvdm
Here are my OCR settings:
[Attachment: OCR settings]

Re: Scan images with OCR doesn't work

Posted: Sat Sep 28, 2013 2:53 pm
by jllort
It seems the problems are only with PDF files. Can you test it in our online demo?
Which OpenKM version are you using?