Open Source Document Management System | OpenKM

OCR HowTo Help


#16924 by sdhengsoft
Wed Jun 20, 2012 2:32 am

Hi All,

I have enabled OCR in my test database with:

system.ocr=/usr/local/bin/tesseract ${fileIn} ${fileOut}

But now what? How does OCR get initiated? Do I have to scan a document, or can I just upload a .tiff or .png file? How do I know whether OCR ever runs? My log file is not showing anything. I've searched the whole manual and can only find information on how to enable OCR, nothing on how to use it. Help appreciated. Thanks.

---
Using OpenKM 5.1.9

Last edited by sdhengsoft on Wed Jun 20, 2012 7:41 am, edited 1 time in total.


Re: OCR HowTo Help

#16925 by sdhengsoft
Wed Jun 20, 2012 7:37 am

Okay, I may have some more info on this. I just noticed that tesseract is running and producing a large number of files like these in /tmp:

okm8312619013357497564.txt.txt
okm8324881110573322876.txt.txt
okm8339233672072019959.txt.txt
okm8536214784698222815.txt.txt
okm8558812468149759325.txt.txt
okm8748329116801568633.txt.txt
okm8797588013538507270.txt.txt
okm8815076161610502866.txt.txt
okm888376810507889038.txt.txt
okm8891306622847062047.txt.txt
okm8956175428643242839.txt.txt
okm9043047215220718787.txt.txt
okm9092926384979839669.txt.txt
okm9165349238391572802.txt.txt
okm918819898082217424.txt.txt
okm9210689225196411602.txt.txt

I have upgraded to OpenKM 5.1.10 and my ocr setting is:

system.ocr=/usr/local/bin/tesseract ${fileIn} ${fileOut}

The OpenKM wiki seems to suggest this is correct. Why are so many *.txt.txt files being left behind instead of cleaned up? Is the above OCR setting incorrect?
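
One likely reason for the doubled extension: Tesseract treats its second argument as an output base name and appends ".txt" to it itself. The sketch below only illustrates that naming. How OpenKM really fills in ${fileIn} and ${fileOut} is an assumption here, and the paths are hypothetical.

```python
import shlex
from string import Template


def build_ocr_command(setting, file_in, file_out):
    """Fill the ${fileIn} / ${fileOut} placeholders and split into argv.

    Assumption: this mimics a plain placeholder substitution; it is not
    OpenKM's actual code.
    """
    filled = Template(setting).substitute(fileIn=file_in, fileOut=file_out)
    return shlex.split(filled)


def tesseract_text_path(output_base):
    """Tesseract writes its text result to <output_base>.txt."""
    return output_base + ".txt"


setting = "/usr/local/bin/tesseract ${fileIn} ${fileOut}"
# Hypothetical paths: a temp output name that already ends in ".txt".
argv = build_ocr_command(setting, "/tmp/scan.png", "/tmp/okm123.txt")
print(argv)
print(tesseract_text_path(argv[2]))  # /tmp/okm123.txt.txt
```

If the caller then looks for /tmp/okm123.txt rather than /tmp/okm123.txt.txt, the real output would never be read or deleted, which would match the leftover files listed above.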


Re: OCR HowTo Help

#16968 by jllort
Fri Jun 22, 2012 11:36 am

That setting is correct, but have you changed the default text extractor classes? If you have not, that is the problem: OpenKM comes configured for Cuneiform by default, and some changes are needed to use Tesseract. Please confirm.

As indicated in the first table (engines) at http://wiki.openkm.com/index.php/OCR, the engine is defined first in Administration/Configuration parameters, then in the file repository.xml, and in /repository/workspace/default/workspace.xml (change this last one with JBoss stopped).
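
A minimal sketch for checking which extractor classes those two XML files currently list. It assumes the files follow the usual Jackrabbit layout, where a SearchIndex element carries a param named textFilterClasses. That parameter name and the class names in the sample are assumptions for illustration, so compare against the wiki table for the real values.

```python
import xml.etree.ElementTree as ET


def text_filter_classes(xml_text):
    """Return the class names listed in any SearchIndex textFilterClasses param."""
    root = ET.fromstring(xml_text)
    found = []
    for index in root.iter("SearchIndex"):
        for param in index.findall("param"):
            if param.get("name") == "textFilterClasses":
                value = param.get("value", "")
                # The value is a comma-separated list of class names.
                found.extend(c.strip() for c in value.split(",") if c.strip())
    return found


# Hypothetical workspace.xml fragment; the extractor names are placeholders.
SAMPLE = """<Workspace name="default">
  <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <param name="textFilterClasses"
           value="com.example.FirstExtractor, com.example.SecondExtractor"/>
  </SearchIndex>
</Workspace>"""

for name in text_filter_classes(SAMPLE):
    print(name)
```

To run it against the real files, read each one as bytes and pass the contents in, for example text_filter_classes(open("repository.xml", "rb").read()), then confirm both files list the extractor that matches the engine set in system.ocr.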

