Open Source Document Management System | OpenKM

Posted: **Thu Aug 26, 2021 2:50 pm**

Hi all

I'm running OpenKM in a Docker environment on an Intel NUC.

I have now installed OCR following this guide:
https://docs.openkm.com/kcenter/view/ok ... ngine.html

Code:

system.ocr	String 	/usr/bin/tesseract ${fileIn} ${fileOut}

If I test OCR from the Debian console, it works.
But in OpenKM nothing happens.
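
To compare the console test with what OpenKM would run, it can help to expand the `system.ocr` template by hand. The sketch below substitutes the two placeholders with concrete paths. The function name `expand_ocr_template` and the example paths are made up for illustration, and this only approximates whatever OpenKM does internally with the template.

```shell
# Expand a system.ocr-style template by replacing the ${fileIn} and
# ${fileOut} placeholders. Hypothetical helper, not OpenKM's own code.
# Paths containing "|" or "&" would confuse sed and are not handled here.
expand_ocr_template() {
  # $1 = template, $2 = input file, $3 = output base name
  printf '%s\n' "$1" | sed -e "s|\${fileIn}|$2|g" -e "s|\${fileOut}|$3|g"
}

# Example (prints the command, does not run it):
#   expand_ocr_template '/usr/bin/tesseract ${fileIn} ${fileOut}' /tmp/in.png /tmp/out
```

Running the printed command inside the container, rather than on the host, shows whether the binary and its language data are visible where OpenKM itself runs. Tesseract appends `.txt` to the output base, so the example above would write `/tmp/out.txt`.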

And if I set system.pdf.force.ocr to "true", the regular text extraction stops working as well.
So I set it back to "false". Now the text extraction works again, but there is still no OCR.

Does anyone have an idea what I'm doing wrong? I haven't found a solution on Google.

Thank you very much
Marco

PS: "Check text extraction" shows exactly the same behavior. And if I test an image, it shows me the metadata but no recognized text from inside the picture.

Posted: **Thu Aug 26, 2021 3:17 pm**

I think I found a solution. I was using the wrong Docker container.

https://hub.docker.com/r/openkm/openkm-ce

This helped me.

But I have an additional question:
https://s29843.pcdn.co/blog/wp-content/ ... 24x768.png

From this picture it can only extract "eee FROM AN IMAGE".
This is quite bad. Is there any chance to improve the result?
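
One common thing to try when Tesseract misses most of the text in an image is to set the language and the page segmentation mode explicitly. The sketch below only builds and prints such a command line. `-l` and `--psm` are standard Tesseract 4 options, while the helper name `build_tesseract_cmd` and its defaults are assumptions for illustration. Whether the same flags can simply be appended to the `system.ocr` value has not been verified here.

```shell
# Build a tesseract command line with an explicit language and page
# segmentation mode. Prints the command instead of running it.
# Hypothetical helper; file names with spaces are not quoted in the output.
build_tesseract_cmd() {
  # $1 = input image, $2 = output base name
  # $3 = language (default: eng), $4 = page segmentation mode (default: 3)
  printf '/usr/bin/tesseract %s %s -l %s --psm %s\n' \
    "$1" "$2" "${3:-eng}" "${4:-3}"
}

# Modes worth comparing on a mostly graphical image:
#   3  = fully automatic page segmentation (Tesseract's default)
#   6  = assume a single uniform block of text
#   11 = sparse text, find as much text as possible in no particular order
```

Comparing the output of a few modes on the console against the same test image is a quick way to see whether the poor result comes from the engine settings or from the image itself (low resolution, text on a busy background).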

Posted: **Fri Aug 27, 2021 6:50 pm**

Which tesseract-ocr engine version do you have configured? Version 4.x?

Posted: **Mon Aug 30, 2021 2:03 pm**

I am having a somewhat similar problem using the official Docker image, which already comes with Tesseract 4.00 installed. I have found that for some bizarre reason, OpenKM seems to randomly choose any of the Abby, Cuneiform, Tesseract3, and Barcode TextExtractors, no matter the configuration.

Every time I run

Code:

docker run --rm -p 8080:8080 openkm/openkm-ce

and then go to localhost:8080 and navigate to Administration > Utility > Test text extraction, OpenKM uses a completely different TextExtractor for each new container, and almost never the one I want it to use.
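
One thing worth ruling out with the `docker run` command above: `--rm` deletes the container when it stops, so every run starts again from the image's defaults. Assuming the image keeps its settings inside the container rather than in a mounted volume, anything changed through the administration UI would not carry over. A minimal sketch of keeping one named container around instead follows. The container name `openkm` and the `DOCKER` override variable are assumptions for illustration, and the version check presumes `tesseract` is on the container's PATH.

```shell
# Set DOCKER=echo to preview the commands without running them.
DOCKER=${DOCKER:-docker}
CONTAINER=${CONTAINER:-openkm}   # hypothetical container name

# Create the container once, without --rm, so it survives being stopped.
create_openkm() {
  $DOCKER run -d --name "$CONTAINER" -p 8080:8080 openkm/openkm-ce
}

# Restart the same container instead of creating a fresh one.
restart_openkm() {
  $DOCKER restart "$CONTAINER"
}

# Report which Tesseract version the container actually ships.
tesseract_version() {
  $DOCKER exec "$CONTAINER" tesseract --version
}
```

If the chosen TextExtractor still changes between restarts of the same container, persistence can be ruled out as the cause, which narrows the problem down to how OpenKM selects the extractor.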

What exactly am I missing here? I've also created an issue on GitHub, complete with a demo repository.

Help would be greatly appreciated!

No extraction after installing OCR
