No extraction after installing OCR

 #52769  by Marco_I
 
Hi all

I'm running OpenKM in a Docker environment on an Intel NUC.

I have now installed OCR according to this guide:
https://docs.openkm.com/kcenter/view/ok ... ngine.html
Code:
system.ocr	String 	/usr/bin/tesseract ${fileIn} ${fileOut} 
If I test OCR in the Debian console, it works.
But in OpenKM nothing happens.
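
One thing worth ruling out in a Docker setup is whether the console test ran in the same container that OpenKM runs in. As a minimal sketch, assuming a POSIX shell with sed available: the helper `expand_ocr_cmd` below is hypothetical (it is not part of OpenKM) and only fills in the `${fileIn}` and `${fileOut}` placeholders of the `system.ocr` value, so the resulting command can be run by hand. The file paths are placeholders too.

```shell
# Hypothetical helper (not part of OpenKM): fill in the ${fileIn} and
# ${fileOut} placeholders of a system.ocr-style template and print the
# resulting command line. Paths containing '|' or '&' are not handled.
expand_ocr_cmd() {
  template=$1
  file_in=$2
  file_out=$3
  printf '%s\n' "$template" |
    sed -e 's|[$]{fileIn}|'"$file_in"'|' \
        -e 's|[$]{fileOut}|'"$file_out"'|'
}

# Example: print the command for a sample image (paths are placeholders).
expand_ocr_cmd '/usr/bin/tesseract ${fileIn} ${fileOut}' /tmp/sample.png /tmp/sample
```

Running the printed command inside the OpenKM container (for example through `docker exec`) rather than on the host would show whether Tesseract is reachable from where OpenKM actually runs. Note that Tesseract appends `.txt` to the output base name it is given.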

And if I set system.pdf.force.ocr to "true", then the regular text extraction no longer works either.
So I set it back to "false". Now the text extraction works again, but there is still no OCR.

Does anyone have an idea what I'm doing wrong? I haven't found any solution on Google.

Thank you very much
Marco

PS: "Check text extraction" shows exactly the same. And if I test an image, it shows me the metadata but no recognized text from inside the picture.
 #52787  by kvist
 
I am having a somewhat similar problem using the official Docker image, which already comes with Tesseract 4.00 installed. I have found that for some bizarre reason, OpenKM seems to randomly choose any of the Abby, Cuneiform, Tesseract3, and Barcode TextExtractors, no matter the configuration.

Every time I run
Code:
docker run --rm -p 8080:8080 openkm/openkm-ce
and then go to localhost:8080 and navigate to Administration > Utility > Test text extraction, OpenKM uses a completely different TextExtractor in each new container, but almost never the one I want it to use.

What exactly am I missing here? I've also created an issue for this on GitHub, complete with a demo repository.

Help would be greatly appreciated!
