
Activating OCR

Posted: Wed Feb 23, 2011 6:00 am
by joako
I've installed OpenKM & tesseract & configured OpenKM.conf:

system.ocr=/opt/local/bin/tesseract

I've also uploaded some .tiff files, however, I don't see any way to activate the OCR system. How would I go about having OCR run against these uploaded files?

Re: Activating OCR

Posted: Wed Feb 23, 2011 8:16 pm
by jllort
Have you restarted JBoss after making the OpenKM.cfg changes?
You can try executing tesseract img.tif img2.tif (be careful: if the image is named .tiff it might not be parsed by the OCR ... I don't know if that could be the problem; if so, it would be a bug). First try it in your terminal.
There's nothing else to configure.
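
The terminal check can also be scripted. This is only a minimal sketch, assuming the usual `tesseract <image> <outputbase>` invocation, where tesseract writes its text to `<outputbase>.txt`. The binary path is the one from the system.ocr setting above. The helper names `build_command` and `run_ocr` are made up for illustration and are not part of OpenKM.

```python
import subprocess
import tempfile
from pathlib import Path

# Path taken from the system.ocr setting quoted above; adjust for your install.
TESSERACT = "/opt/local/bin/tesseract"


def build_command(image, output_base, tesseract=TESSERACT):
    """Return the argument list for `tesseract <image> <output_base>`."""
    return [tesseract, str(image), str(output_base)]


def run_ocr(image, tesseract=TESSERACT):
    """Run tesseract on one image and return the extracted text.

    Raises subprocess.CalledProcessError if tesseract exits with an error,
    which is a quick way to spot files it cannot parse.
    """
    with tempfile.TemporaryDirectory() as tmp:
        output_base = Path(tmp) / "ocr_out"
        subprocess.run(build_command(image, output_base, tesseract), check=True)
        # tesseract appends ".txt" to the output base name itself.
        text_file = Path(tmp) / "ocr_out.txt"
        return text_file.read_text(encoding="utf-8", errors="replace")
```

Calling `run_ocr("scan.TIF")` on one of the uploaded files should return the same text you would see after running tesseract by hand. If it returns an empty string or raises, the problem is in tesseract or the image rather than in OpenKM.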

Re: Activating OCR

Posted: Thu Feb 24, 2011 2:36 am
by joako
Actually the files are named .TIF. I've tried the command and it is working, but I can't see how to determine whether the OCR is working in OpenKM...

Re: Activating OCR

Posted: Thu Feb 24, 2011 8:52 am
by jllort
When you run the OCR from the terminal you see the extracted text. Upload the file to OpenKM and run a search by content using some of those words.
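
To choose words for the content search, pick a few long, distinctive words from the text tesseract printed. This is a rough sketch only. The function name `pick_keywords` and the length threshold are arbitrary choices for illustration and have nothing to do with how OpenKM indexes text.

```python
import re


def pick_keywords(text, limit=5, min_length=5):
    """Return up to `limit` distinct words from OCR text, longest first.

    Short words and digit runs are skipped because OCR often garbles them
    and they make poor search terms.
    """
    seen = []
    for word in re.findall(r"[A-Za-z]+", text):
        word = word.lower()
        if len(word) >= min_length and word not in seen:
            seen.append(word)
    # sorted() is stable, so words of equal length keep their original order.
    return sorted(seen, key=len, reverse=True)[:limit]
```

For example, if the OCR output were "Invoice number 12345 for the Acme Corporation", this would suggest searching for "corporation", "invoice" and "number".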

Re: Activating OCR

Posted: Tue Mar 01, 2011 12:15 am
by joako
I have done that, but no matter what keywords I use I never get any results from .tif files. Is there some special way they need to be imported? Where in the process does the OCR run, and how can I verify whether the OCR is running, whether any error is generated, etc.? Also, is there no way to view the properties/details for a file, see whether the OCR has been run, and view the OCR results? These would be very useful troubleshooting tools.

Re: Activating OCR

Posted: Tue Mar 01, 2011 8:28 pm
by pavila
Put this entry in $JBOSS_HOME/server/default/conf/jboss-log4j.xml:
Code:
<category name="com.openkm.extractor">
    <priority value="DEBUG"/>
</category>
Also, posting a sample TIFF here will help us.

Re: Activating OCR

Posted: Wed Mar 02, 2011 6:57 am
by joako
I only see these messages when I upload a .tiff file:
Code:
01:51:06,388 INFO  [BundleCache] num=1486 mem=8135k max=8192k avg=5605 hits=25358 miss=4642
01:51:06,419 INFO  [LRUNodeIdCache] num=3153/10240 hits=8404 miss=31596
Is OCR run during the upload process?

Re: Activating OCR

Posted: Fri Mar 04, 2011 9:00 pm
by joako
Since I am sure not many people are using MacOS X Server, I decided to run on Linux. On the MacOS X Server I installed VirtualBox, then openSUSE 11.3 Linux, and so on and so forth. I now have OpenKM running, but again there are no search results for TIFF format images.

So I think there is a bug in OpenKM 5.0.2: either the OCR of TIFF images or the search index for them does not work.

Re: Activating OCR

Posted: Sat Mar 05, 2011 2:21 pm
by jllort
You can try the nightly build, which solves some problems: integration.openkm.com. But I think your problem lies elsewhere.

Re: Activating OCR

Posted: Mon Mar 07, 2011 5:35 am
by joako
Is there some platform I can set up on which we know everything will work? And information on how to debug the search index? I already asked this and got no answer...

Re: Activating OCR

Posted: Mon Mar 07, 2011 4:41 pm
by jllort
We suggest Ubuntu Server for installing OpenKM, but any Linux or Windows can be used; there's no major problem as long as you have the necessary skills to configure your server. We develop in a Linux environment and normally use Ubuntu servers for our customers, which is why we suggest it, but there are no hidden issues tied to one OS or another.