Scan images with OCR doesn't work

gvdm
Fresh Boarder
Posts: 13
Joined: Thu Aug 08, 2013 9:42 am

Scan images with OCR doesn't work

Post by gvdm » Mon Sep 23, 2013 4:22 pm

Hi to all,

I installed Tesseract on my OpenKM machine, tested it (it works well), and configured the OCR settings so that OpenKM uses Tesseract.
Now, when I upload a PDF file, OpenKM processes it with the OCR module and writes the extracted text into the OKM_NODE_DOCUMENT table.

The problem is that when I upload an image (in any format), the conversion doesn't work: the "NDC_TEXT" field of the table stays empty.
Tesseract converts the same images correctly from the command line, so it seems to be an OpenKM problem.
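
For anyone who wants to repeat the command-line check described above, here is a minimal Python sketch. It assumes the `tesseract` binary is on the PATH and uses the basic `tesseract <image> <output base> -l <lang>` invocation; the function names, the `img_ocr` output base, and the `eng` language are illustrative choices, not anything OpenKM itself uses.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image, out_base, lang="eng", binary="tesseract"):
    """Return the argv list for: tesseract <image> <out_base> -l <lang>."""
    return [binary, str(image), str(out_base), "-l", lang]


def ocr_image(image, out_base, lang="eng"):
    """Run Tesseract on one image and return the recognised text.

    Returns None when the tesseract binary cannot be found on the PATH.
    """
    if shutil.which("tesseract") is None:
        return None
    subprocess.run(build_tesseract_command(image, out_base, lang), check=True)
    # Tesseract appends ".txt" to the output base name it is given.
    return Path(f"{out_base}.txt").read_text(encoding="utf-8")
```

Calling `ocr_image("img.jpg", "img_ocr")` on the server should return the same text that the manual command produces. If it does, Tesseract itself is fine and the problem lies in how OpenKM invokes it.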

What could the issue be?

I have attached one of the images to this post.

Thanks
Giulio
Attachments
img.jpg

jllort
Moderator
Posts: 9380
Joined: Fri Dec 21, 2007 11:23 am
Location: Sineu - ( Illes Balears ) - Spain

Re: Scan images with OCR doesn't work

Post by jllort » Tue Sep 24, 2013 6:05 pm

After a document is added, it goes into the indexing queue. Has it finished processing in the queue?
By default, OpenKM ships with configuration parameters for Cuneiform. Have you changed them for Tesseract as explained here: http://wiki.openkm.com/index.php/Third- ... ation:_OCR ? (I'm not entirely sure whether this change requires restarting the application; I mean replacing the Cuneiform class with the Tesseract one.)

Re: Scan images with OCR doesn't work

Post by gvdm » Wed Sep 25, 2013 8:17 am

jllort wrote: After a document is added, it goes into the indexing queue. Has it finished processing in the queue?
Yes, I wait until the NDC_TEXT_EXTRACTED flag gets the value "T".
jllort wrote: [...] Have you changed them for Tesseract as explained here: http://wiki.openkm.com/index.php/Third- ... ation:_OCR ?
No, I hadn't changed it yet, thanks.
I have now fixed it, retried the conversion of the same image, and got this row in the OKM_NODE_DOCUMENT table:
tuple.png
Why do I see the JPEG image's properties instead of the extracted text?
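
To inspect what actually ended up in the table for each document, a check along these lines may help. This is only a sketch: it runs against an in-memory SQLite stand-in so that it is self-contained, the `NBS_UUID` key column is an assumption about the schema, and the sample rows are invented. Only `OKM_NODE_DOCUMENT`, `NDC_TEXT` and `NDC_TEXT_EXTRACTED` come from this thread.

```python
import sqlite3

# NDC_TEXT and NDC_TEXT_EXTRACTED are named in the thread;
# NBS_UUID as the key column is an assumption.
QUERY = """
SELECT NBS_UUID, SUBSTR(COALESCE(NDC_TEXT, ''), 1, 80)
FROM OKM_NODE_DOCUMENT
WHERE NDC_TEXT_EXTRACTED = 'T'
ORDER BY NBS_UUID
"""


def extracted_previews(conn):
    """Return (uuid, first 80 characters of stored text) for processed documents."""
    return [(uuid, preview) for uuid, preview in conn.execute(QUERY)]


def demo_connection():
    """Build an in-memory stand-in table with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE OKM_NODE_DOCUMENT ("
        "NBS_UUID TEXT PRIMARY KEY, NDC_TEXT_EXTRACTED TEXT, NDC_TEXT TEXT)"
    )
    conn.executemany(
        "INSERT INTO OKM_NODE_DOCUMENT VALUES (?, ?, ?)",
        [
            ("doc-pdf", "T", "text from a PDF"),
            ("doc-jpg", "T", ""),          # processed, but nothing stored
            ("doc-queued", "F", None),     # still waiting in the queue
        ],
    )
    return conn
```

Against the real database, the same SELECT could be run with whatever client matches the installation. A preview that shows image metadata rather than recognised words suggests that a different extractor than the OCR one handled the file.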

Re: Scan images with OCR doesn't work

Post by gvdm » Wed Sep 25, 2013 8:29 am

Here are my OCR settings:
ocr settings.pdf
(115.49 KiB)

Re: Scan images with OCR doesn't work

Post by jllort » Sat Sep 28, 2013 2:53 pm

It seems the problems are only with PDF files. Can you test on our online demo?
Which OpenKM version are you using?
