• OCR HowTo Help

  • OpenKM has many interesting features, but requires some configuration process to show its full potential.
 #16924  by sdhengsoft
 
Hi All,

I have enabled OCR in my test database with:
system.ocr=/usr/local/bin/tesseract ${fileIn} ${fileOut}
But now what? How does the OCR get initiated? Do I have to scan a document, or can I just upload a .tiff or .png file? How do I know whether OCR ever runs? My log file is not showing anything. I've searched the whole manual and can only find information on how to enable OCR, nothing on how to use it. Any help is appreciated. Thanks.

---
Using OpenKM 5.1.9
Last edited by sdhengsoft on Wed Jun 20, 2012 7:41 am, edited 1 time in total.
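
One way to check the configured command outside OpenKM is to fill in the two placeholders by hand and look at the resulting command line. The sketch below is only an illustration of that substitution, not OpenKM's own code: the function name expand_ocr_command and the sample paths are made up for this example.

```python
import shlex
from string import Template


def expand_ocr_command(template, file_in, file_out):
    """Split the command template into arguments, then fill in the placeholders.

    Splitting first keeps a path that contains spaces as a single argument.
    """
    values = {"fileIn": file_in, "fileOut": file_out}
    return [Template(token).substitute(values) for token in shlex.split(template)]


if __name__ == "__main__":
    # Hypothetical paths for a manual check; replace them with a real image file.
    argv = expand_ocr_command(
        "/usr/local/bin/tesseract ${fileIn} ${fileOut}",
        "/tmp/sample.tif",
        "/tmp/sample_out",
    )
    print(argv)
```

The resulting list can be passed to subprocess.run to run tesseract manually on a test image and confirm that the binary path and arguments work before looking at the OpenKM side.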
 #16925  by sdhengsoft
 
Okay, I may have some more info on this. I just noticed that tesseract is running and producing a large number of files like these in /tmp:
okm8312619013357497564.txt.txt
okm8324881110573322876.txt.txt
okm8339233672072019959.txt.txt
okm8536214784698222815.txt.txt
okm8558812468149759325.txt.txt
okm8748329116801568633.txt.txt
okm8797588013538507270.txt.txt
okm8815076161610502866.txt.txt
okm888376810507889038.txt.txt
okm8891306622847062047.txt.txt
okm8956175428643242839.txt.txt
okm9043047215220718787.txt.txt
okm9092926384979839669.txt.txt
okm9165349238391572802.txt.txt
okm918819898082217424.txt.txt
okm9210689225196411602.txt.txt
I have upgraded to OpenKM 5.1.10 and my OCR setting is:
system.ocr=/usr/local/bin/tesseract ${fileIn} ${fileOut}
The OpenKM wiki seems to suggest this is correct. Why are so many *.txt.txt files not being cleaned up? Is the above OCR setting not correct?
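
A possible explanation for the doubled extension: the tesseract command line treats its second argument as an output base name and appends ".txt" to it. If the value passed as ${fileOut} already ends in ".txt" (an assumption based on the file names listed above, not something confirmed from the OpenKM source), the file written would end in ".txt.txt". The sketch below shows that naming and a small helper that lists such leftovers; both function names are made up for this example.

```python
import tempfile
from pathlib import Path


def tesseract_text_output(output_base):
    """Return the file name tesseract writes for a given output base name."""
    return output_base + ".txt"


def find_leftovers(tmp_dir):
    """List files in tmp_dir that match the okm*.txt.txt pattern seen above."""
    return sorted(p.name for p in Path(tmp_dir).glob("okm*.txt.txt"))


if __name__ == "__main__":
    # Assumed: the caller passes an output name that already ends in ".txt".
    print(tesseract_text_output("/tmp/okm8312619013357497564.txt"))

    # Demonstrate the listing helper in a scratch directory, not the real /tmp.
    with tempfile.TemporaryDirectory() as scratch:
        (Path(scratch) / "okm1.txt.txt").write_text("leftover")
        (Path(scratch) / "okm2.txt").write_text("expected name")
        print(find_leftovers(scratch))
```

If that assumption holds, the caller would look for okm....txt while tesseract wrote okm....txt.txt, which would also explain why the files are never picked up or removed.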
 #16968  by jllort
 
The setting is correct, but have you changed the default text extractor classes? If you have not, that is the problem: OpenKM comes configured for Cuneiform by default, and some changes are needed to use Tesseract. Please confirm.

As indicated in the first table (engines) at http://wiki.openkm.com/index.php/OCR, one is defined in Administration / Configuration parameters, and then in the files repository.xml and /repository/workspace/default/workspace.xml (change the latter with JBoss stopped).
