• Activating OCR

  • We tried to make OpenKM as intuitive as possible, but advice is always welcome.
Forum rules: Please, before asking something, check the documentation wiki or use the forum's search feature. And remember we don't have a crystal ball and can't read minds, so if you post about an issue, tell us which OpenKM version you are using and also your browser and operating system versions. For more info read How to Report Bugs Effectively.
 #9061  by joako
 
I've installed OpenKM & tesseract & configured OpenKM.conf:

system.ocr=/opt/local/bin/tesseract

I've also uploaded some .tiff files; however, I don't see any way to activate the OCR system. How would I go about running OCR against these uploaded files?
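Before digging further, it can help to confirm that the `system.ocr` path actually points at an executable. The Python sketch below is a minimal, hypothetical helper (not part of OpenKM): it assumes the configuration file uses plain `key=value` lines like the one above, and the names `read_cfg` and `check_ocr_setting` are made up for illustration.

```python
import os


def read_cfg(path):
    """Parse simple key=value lines, skipping blank lines and # comments.

    Assumption: the config uses plain key=value lines like the
    system.ocr entry above; this is not a full Java .properties parser.
    """
    settings = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def check_ocr_setting(settings):
    """Return a list of problems with the system.ocr entry (empty if none found)."""
    ocr = settings.get("system.ocr")
    if not ocr:
        return ["system.ocr is not set"]
    problems = []
    if not os.path.isfile(ocr):
        problems.append(f"{ocr} does not exist")
    elif not os.access(ocr, os.X_OK):
        problems.append(f"{ocr} is not executable")
    return problems
```

For example, `check_ocr_setting(read_cfg(path))`, with `path` set to the config file you edited, returns an empty list when the configured binary exists and is executable. Note that `os.access` checks the permissions of the user running the script, so run it as the same user that runs JBoss.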
 #9088  by jllort
 
Have you restarted JBoss after making the OpenKM.cfg changes?
First, try executing tesseract img.tif img2.tif in your terminal. Be careful: if the image extension is .tiff, it might not be parsed by the OCR. I don't know whether that could be the problem; if so, it would be a bug.
There's nothing else to configure.
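The terminal test suggested above can also be scripted, which makes it easy to repeat for every uploaded file. A minimal sketch, assuming Python 3.7+ and that `tesseract` takes an image and an output base name and writes the recognized text to `<output base>.txt` (so the command above would write `img2.tif.txt`); `run_tesseract` is a hypothetical helper, not an OpenKM API.

```python
import subprocess
from pathlib import Path


def run_tesseract(image, out_base, tesseract="tesseract"):
    """Run `tesseract <image> <out_base>` and return the recognized text.

    Tesseract writes its result to <out_base>.txt. The default executable
    name is an assumption; pass the same path used for system.ocr.
    A missing executable raises FileNotFoundError from subprocess.run.
    """
    result = subprocess.run(
        [tesseract, str(image), str(out_base)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"tesseract failed ({result.returncode}): {result.stderr.strip()}"
        )
    return Path(f"{out_base}.txt").read_text(encoding="utf-8", errors="replace")
```

For example, `run_tesseract("scan.TIF", "scan", tesseract="/opt/local/bin/tesseract")` should return the text written to `scan.txt`. Running it on a `.TIF` and a `.tiff` copy of the same image would test the extension concern directly.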
 #9099  by joako
 
Actually the files are named .TIF. I've tried the command and it works, but I can't see how to determine whether the OCR is working in OpenKM...
 #9110  by jllort
 
When you run the OCR from the terminal, you see the extracted text. Upload the same file to OpenKM and run a search by content for some of those words.
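To choose query words, it helps to pick distinctive words from the terminal OCR output rather than short or garbled ones. A minimal sketch; the length threshold and the number of terms are arbitrary assumptions, and `candidate_search_terms` is a hypothetical helper:

```python
import re
from collections import Counter


def candidate_search_terms(ocr_text, min_length=5, limit=5):
    """Pick frequent, reasonably long words from OCR output.

    Only letter runs are considered (digits and punctuation are dropped),
    words shorter than min_length are skipped, and the most frequent
    remaining words are returned, ties kept in order of first appearance.
    """
    words = re.findall(r"[^\W\d_]+", ocr_text.lower())
    counts = Counter(w for w in words if len(w) >= min_length)
    return [word for word, _ in counts.most_common(limit)]
```

If content searches for each returned word find nothing for the uploaded .TIF, that would suggest the OCR text never reached the search index.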
 #9220  by joako
 
I have done that, but no matter what keywords I use I never get any results from .tif files. Is there some special way they need to be imported? Where in the process does the OCR run, and how can I verify whether the OCR is running, whether any error is generated, etc.? Also, is there really no way to view the properties/details of a file to see whether the OCR has been run and view the OCR results? These would be very useful troubleshooting tools.
 #9235  by pavila
 
Add this entry to $JBOSS_HOME/server/default/conf/jboss-log4j.xml:
<category name="com.openkm.extractor">
    <priority value="DEBUG"/>
</category>
Also, posting a sample TIFF here would help us.
 #9242  by joako
 
I only see these messages when I upload a .tiff file:
01:51:06,388 INFO  [BundleCache] num=1486 mem=8135k max=8192k avg=5605 hits=25358 miss=4642
01:51:06,419 INFO  [LRUNodeIdCache] num=3153/10240 hits=8404 miss=31596
Is the OCR run during the upload process?
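Because the debug category above only adds DEBUG-level lines, a quick way to see whether any extractor output reaches the log is to filter for that level. A minimal sketch, assuming lines look like the excerpt above (`HH:MM:SS,mmm LEVEL [Logger] message`), optionally preceded by a `yyyy-MM-dd` date as a file log might add; other layouts would need a different pattern, and `debug_entries` is a hypothetical helper.

```python
import re

# Assumed layout: optional "yyyy-MM-dd " date, then "HH:MM:SS,mmm LEVEL [Logger] message",
# matching the console excerpt above; adjust the pattern for other log4j layouts.
LOG_LINE = re.compile(
    r"^(?:\d{4}-\d{2}-\d{2} )?"      # optional date
    r"(\d{2}:\d{2}:\d{2},\d{3})\s+"  # time
    r"([A-Z]+)\s+"                   # level
    r"\[([^\]]+)\]\s?"               # logger / category
    r"(.*)$"                         # message
)


def debug_entries(lines, level="DEBUG"):
    """Yield (time, logger, message) for each line logged at `level`."""
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group(2) == level:
            stamp, _, logger, message = match.groups()
            yield stamp, logger, message
```

For example, open the JBoss log file (commonly `$JBOSS_HOME/server/default/log/server.log`) and print every entry from `debug_entries(fh)`. If the console shows only INFO lines, check the file log as well, since each log4j appender can have its own threshold.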
 #9300  by joako
 
Since I'm sure many people aren't using Mac OS X Server, I decided to run it on Linux. On the Mac OS X Server I installed VirtualBox, then openSUSE 11.3 Linux, and so on. I now have OpenKM running, but again there are no search results for TIFF images.

So I think there is a bug in OpenKM 5.0.2: either the OCR of TIFF images or the search index for them does not work.
 #9308  by jllort
 
You can try the nightly build from integration.openkm.com, which solves some problems, but I think your problem is somewhere else.
 #9335  by joako
 
Is there some platform I can set up on which we know everything will work? And is there any information on how to debug the search index? I already asked this and got no answer...
 #9355  by jllort
 
We suggest Ubuntu Server for installing OpenKM, but any Linux distribution or Windows can be used; there's no major problem as long as you have the skills needed to configure your server. We develop in a Linux environment and normally use Ubuntu servers for our customers, which is why we suggest it, but nothing ties OpenKM to one OS or another.
