• Scanning images with OCR doesn't work

We try to make OpenKM as intuitive as possible, but any feedback is welcome.

Moderator: dedisoft

Forum rules: Before asking a question, please check the wiki documentation or use the forum's search function. And remember that we have neither a crystal ball nor the ability to read minds, so please specify which version of OpenKM you are using, as well as the web browser version and the operating system. For more information, read How to Report Bugs Effectively (in English).
 #25573  by gvdm
 
Hi to all,

I installed Tesseract on my OpenKM machine, tested it (it works very well) and configured the OCR settings so that OpenKM uses Tesseract.
Now, when I upload a PDF file, OpenKM converts it with the OCR module and writes the OCR text into the OKM_NODE_DOCUMENT table.

The problem is that when I upload an image (no matter the format) the conversion doesn't work. In particular, the "NDC_TEXT" field of the table is empty.
Tesseract, run from the command line, converts the same images very well, so it seems to be an OpenKM problem.
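The command-line check mentioned above can be scripted so that it is easy to repeat on several images. The following is only a minimal sketch: it assumes the executable is called tesseract, that it is on the PATH, and that the installed Tesseract build accepts "stdout" as the output base (older builds may only write to a .txt file). The function names are made up for illustration and are not part of OpenKM.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, executable="tesseract"):
    """Return the argument list for OCR-ing one image to standard output."""
    # Passing "stdout" as the output base asks Tesseract to print the
    # recognized text instead of writing <base>.txt (assumed to be supported).
    return [executable, str(image_path), "stdout"]


def ocr_image(image_path, executable="tesseract"):
    """Run Tesseract on image_path and return the recognized text."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable!r} was not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, executable),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    return result.stdout.strip()
```

Calling ocr_image(Path("img.jpg")) on the attached image should return the same text that the manual command-line run produces. If it does, Tesseract itself is fine and the remaining suspect is the OCR configuration inside OpenKM.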

What could the issue be?

I have attached one of the images to this post.

Thanks
Giulio
Attachments
img.jpg (63.44 KiB)
 #25583  by jllort
 
After a document is added, it goes into the indexing queue. Has it finished processing in the queue?
By default, OpenKM comes with configuration parameters for Cuneiform. Have you changed them for Tesseract as explained here http://wiki.openkm.com/index.php/Third- ... ation:_OCR ? (I'm not completely sure whether this change requires restarting the application; I mean replacing the Cuneiform class with the Tesseract one.)
 #25588  by gvdm
 
jllort wrote: After a document is added, it goes into the indexing queue. Has it finished processing in the queue?
Yes, I waited until the NDC_TEXT_EXTRACTED flag had the value "T".
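The check described here (wait for NDC_TEXT_EXTRACTED to become "T", then look at NDC_TEXT) can be written as a single query. The sketch below is only an illustration: the table and the two NDC_ columns are the ones named in this thread, but NBS_UUID as the key column is an assumption, and the in-memory SQLite table is a stand-in for the real OpenKM database, whose schema and engine will differ.

```python
import sqlite3

# Table and NDC_* columns are taken from this thread.
# NBS_UUID as the key column is an assumption made for illustration.
EMPTY_TEXT_QUERY = """
    SELECT NBS_UUID
    FROM OKM_NODE_DOCUMENT
    WHERE NDC_TEXT_EXTRACTED = 'T'
      AND (NDC_TEXT IS NULL OR TRIM(NDC_TEXT) = '')
    ORDER BY NBS_UUID
"""


def find_empty_extractions(conn):
    """Return keys of documents flagged as extracted but holding no text."""
    return [row[0] for row in conn.execute(EMPTY_TEXT_QUERY)]


def make_demo_db():
    """Build an in-memory stand-in table; this is not the real OpenKM schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE OKM_NODE_DOCUMENT ("
        "NBS_UUID TEXT PRIMARY KEY, NDC_TEXT TEXT, NDC_TEXT_EXTRACTED TEXT)"
    )
    conn.executemany(
        "INSERT INTO OKM_NODE_DOCUMENT VALUES (?, ?, ?)",
        [
            ("doc-pdf", "text recognized from a PDF", "T"),
            ("doc-jpg", "", "T"),    # processed, but nothing was stored
            ("doc-new", None, "F"),  # still waiting in the indexing queue
        ],
    )
    return conn
```

With the demo data, find_empty_extractions(make_demo_db()) reports only the image row, which is the symptom from the first post: the flag says the extraction is finished, yet no text was saved.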
jllort wrote: [...] Have you changed them for Tesseract as explained here http://wiki.openkm.com/index.php/Third- ... ation:_OCR ?
No, I hadn't changed it yet, thanks.
I have now fixed it, retried the conversion of the same image, and got this tuple in the OKM_NODE_DOCUMENT table:
tuple.png (15.13 KiB)
Why do I see the JPEG image's properties instead of the extracted text?
 #25655  by jllort
 
It seems the problems are only with PDF files. Can you test this in our online demo?
Which OpenKM version are you using?
