Open Source Document Management System | OpenKM - Scan images with OCR doesn't work

Scan images with OCR doesn't work

Moderator: dedisoft

Forum rules: Before asking a question, please check the wiki documentation or use the forum's search function. And remember that we have neither a crystal ball nor the ability to read minds, so be sure to specify which version of OpenKM you are using, as well as your web browser and operating system versions. For more information, read How to Report Bugs Effectively (English).

5 posts

Scan images with OCR doesn't work

#25573 by gvdm
Mon Sep 23, 2013 4:22 pm

Hi to all,

I installed Tesseract on my OpenKM machine, tested it (it works well), and configured the OCR settings so that OpenKM uses Tesseract.
Now, when I upload a PDF file, OpenKM runs it through the OCR module and writes the extracted text into the OKM_NODE_DOCUMENT table.

The problem is that when I upload an image (no matter the format), the conversion doesn't work. Specifically, the "NDC_TEXT" field of the table is empty.
Tesseract, run from the command line, converts the same images very well, so it seems to be an OpenKM problem.
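
For reference, the command-line check can be scripted roughly as follows. This is only a sketch: it assumes the `tesseract` binary is on the PATH, and the helper names (`build_tesseract_command`, `ocr_image`), the output base name, and the `eng` language choice are illustrative, not anything OpenKM itself uses.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, output_base, lang="eng"):
    """Run Tesseract on one image and return the recognized text.

    Tesseract appends ".txt" to the output base by itself, so the
    result is read back from "<output_base>.txt".
    """
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling `ocr_image("img.jpg", "out")` on the attached image should return the same text that the manual command-line run produces; if it does, Tesseract itself is fine and the problem is on the OpenKM side.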

What could the issue be?

I have attached one of the images to this post.

Thanks
Giulio

Attachments

img.jpg (63.44 KiB) Viewed 5246 times


Re: Scan images with OCR doesn't work

#25583 by jllort
Tue Sep 24, 2013 6:05 pm

After a document is added, it goes into the indexing queue. Has it finished processing in the queue?
By default, OpenKM ships with configuration parameters for Cuneiform. Have you changed them for Tesseract as explained here: http://wiki.openkm.com/index.php/Third- ... ation:_OCR ? (I'm not entirely sure whether this change requires restarting the application; I mean replacing the Cuneiform class with the Tesseract one.)


Re: Scan images with OCR doesn't work

#25588 by gvdm
Wed Sep 25, 2013 8:17 am

jllort wrote: After a document is added, it goes into the indexing queue. Has it finished processing in the queue?

Yes, I wait until the NDC_TEXT_EXTRACTED flag is set to "T".
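
To make that check concrete, here is a rough sketch of what I look at. It runs against an in-memory SQLite stand-in, not the real OpenKM database. The table name and the NDC_TEXT and NDC_TEXT_EXTRACTED columns are the ones mentioned in this thread; the `NBS_UUID` key column, the sample rows, and all function names are assumptions for illustration only.

```python
import sqlite3


def extraction_status(conn, uuid):
    """Return (extracted_flag, text) for one document row, or None if absent."""
    cur = conn.execute(
        "SELECT NDC_TEXT_EXTRACTED, NDC_TEXT "
        "FROM OKM_NODE_DOCUMENT WHERE NBS_UUID = ?",  # key column is assumed
        (uuid,),
    )
    return cur.fetchone()


def ocr_produced_text(conn, uuid):
    """True only if extraction finished ('T') and NDC_TEXT is non-empty."""
    row = extraction_status(conn, uuid)
    if row is None:
        return False
    flag, text = row
    return flag == "T" and bool(text and text.strip())


def make_demo_db():
    """Build an in-memory stand-in table; this is not OpenKM's real schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE OKM_NODE_DOCUMENT ("
        "NBS_UUID TEXT PRIMARY KEY, NDC_TEXT_EXTRACTED TEXT, NDC_TEXT TEXT)"
    )
    conn.executemany(
        "INSERT INTO OKM_NODE_DOCUMENT VALUES (?, ?, ?)",
        [
            ("pdf-1", "T", "text from a PDF"),  # extraction done, text present
            ("img-1", "T", ""),                 # extraction done, text empty
            ("img-2", "F", None),               # still waiting in the queue
        ],
    )
    return conn
```

The "img-1" row is my situation: the flag says extraction has finished, but no text was stored. Against a real database, the connection and the parameter placeholder style would depend on the driver in use.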

jllort wrote: [...] Have you changed them for Tesseract as explained here: http://wiki.openkm.com/index.php/Third- ... ation:_OCR ?

No, thanks, I hadn't changed it yet.
I have now fixed it, retried the conversion of the same image, and got this tuple in the OKM_NODE_DOCUMENT table:

tuple.png (15.13 KiB) Viewed 5237 times

Why do I see the JPEG image's properties instead of the extracted text?


Re: Scan images with OCR doesn't work

#25590 by gvdm
Wed Sep 25, 2013 8:29 am

Here are my OCR settings:

ocr settings.pdf (115.49 KiB) Downloaded 428 times


Re: Scan images with OCR doesn't work

#25655 by jllort
Sat Sep 28, 2013 2:53 pm

It seems the problems are only with PDF files. Can you test in our online demo?
Which OpenKM version are you using?

