OCR HowTo Help

Posted: Wed Jun 20, 2012 2:32 am
by sdhengsoft
Hi All,

I have enabled OCR in my test database with:
system.ocr=/usr/local/bin/tesseract ${fileIn} ${fileOut}
But now what? How does OCR get initiated? Do I have to scan a document, or can I just upload a .tiff or .png file? How do I know whether OCR ever runs? My log file is not showing anything. I've searched the whole manual and can only find information on how to enable OCR, nothing on how to use it. Help appreciated. Thanks.

---
Using OpenKM 5.1.9
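
One way to check the configured command outside OpenKM is to expand the template by hand and look at the resulting argument list. The following is only a minimal sketch in Python: the function name build_ocr_command and the sample paths are hypothetical, and how OpenKM itself substitutes ${fileIn} and ${fileOut} is not shown in this thread, so the quoting here is an assumption.

```python
import shlex
from string import Template
from typing import List

# The system.ocr value quoted in the post above.
OCR_TEMPLATE = "/usr/local/bin/tesseract ${fileIn} ${fileOut}"


def build_ocr_command(template: str, file_in: str, file_out: str) -> List[str]:
    """Expand ${fileIn} and ${fileOut} and split the result into an argv list."""
    # Quote both paths so that names containing spaces survive the split.
    # This is an assumption of the sketch; OpenKM may expand differently.
    expanded = Template(template).substitute(
        fileIn=shlex.quote(file_in),
        fileOut=shlex.quote(file_out),
    )
    return shlex.split(expanded)
```

The returned list can be passed to subprocess.run on the server to see whether tesseract produces any output for a sample image.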

Re: OCR HowTo Help

Posted: Wed Jun 20, 2012 7:37 am
by sdhengsoft
Okay, I may have some more info on this. I just noticed that tesseract is running and producing a large number of files like these in /tmp:
okm8312619013357497564.txt.txt
okm8324881110573322876.txt.txt
okm8339233672072019959.txt.txt
okm8536214784698222815.txt.txt
okm8558812468149759325.txt.txt
okm8748329116801568633.txt.txt
okm8797588013538507270.txt.txt
okm8815076161610502866.txt.txt
okm888376810507889038.txt.txt
okm8891306622847062047.txt.txt
okm8956175428643242839.txt.txt
okm9043047215220718787.txt.txt
okm9092926384979839669.txt.txt
okm9165349238391572802.txt.txt
okm918819898082217424.txt.txt
okm9210689225196411602.txt.txt
I have upgraded to OpenKM 5.1.10 and my ocr setting is:
system.ocr=/usr/local/bin/tesseract ${fileIn} ${fileOut}
The OpenKM wiki seems to suggest this is correct. Why are so many *.txt.txt files being left behind instead of cleaned up? Is the above OCR setting incorrect?
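
A possible reading of the doubled extension: tesseract treats its second argument as an output base and appends ".txt" itself, so a ${fileOut} that already ends in .txt would come out as .txt.txt. That OpenKM passes such a name is an assumption drawn from the listing above, not something confirmed in this thread. A small Python sketch of the naming and of listing the leftovers (both function names are hypothetical):

```python
import glob
import os
from typing import List


def tesseract_text_output(output_base: str) -> str:
    """Return the file tesseract writes for a given output base."""
    # Tesseract appends ".txt" to whatever output base it is given.
    return output_base + ".txt"


def find_leftovers(tmp_dir: str = "/tmp") -> List[str]:
    """List okm*.txt.txt files in tmp_dir, sorted by name."""
    pattern = os.path.join(tmp_dir, "okm*.txt.txt")
    return sorted(glob.glob(pattern))
```

Calling find_leftovers() before and after uploading an image shows whether a new file appears for each OCR run.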

Re: OCR HowTo Help

Posted: Fri Jun 22, 2012 11:36 am
by jllort
The setting is correct, but have you changed the text extractor classes that come by default? If you have not, that is the problem: OpenKM comes prepared for Cuneiform by default, and some changes are needed to use Tesseract. Please confirm.

As indicated in the first table (engines) at http://wiki.openkm.com/index.php/OCR, the engine is defined in Administration/configuration parameters, and then in the files repository.xml and /repository/workspace/default/workspace.xml (change the latter with JBoss stopped).