How to use the tesseract

We tried to make OpenKM as intuitive as possible, but advice is always welcome.
Forum rules: Before asking a question, please check the documentation wiki or use the forum's search feature. Remember that we have no crystal ball and are not mind readers, so if you post about an issue, tell us which OpenKM version you are using, along with your browser and operating system version. For more info, read How to Report Bugs Effectively.
 #1815  by stanley
 
I wonder how to use the tesseract.
Where can I find step-by-step instructions for using the tesseract?
 #1826  by pavila
 
You have to install the tesseract and imagemagick packages. If you use a Debian-based distribution, it is as simple as:

$ aptitude install tesseract imagemagick

After that, you have to edit OpenKM.cfg and set the "Config.SYSTEM_OCR" parameter to the path of the tesseract binary. Restart JBoss and OpenKM will run OCR on uploaded TIFF files.

See OpenKM sources (/src/es/git/openkm/extractor/TiffExtractor.java) for more info.
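
To illustrate what "setting the parameter to the tesseract binary" amounts to, here is a minimal Java sketch of calling the tesseract command line on one image. It is not the OpenKM extractor code: the class and method names (OcrCommandSketch, buildCommand, extractText) are hypothetical, and the sketch skips any ImageMagick preprocessing. It relies only on the standard invocation "tesseract <image> <outputbase>", which writes the recognized text to <outputbase>.txt.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper, not OpenKM code: runs the tesseract CLI on one image. */
final class OcrCommandSketch {

    /** Builds "tesseract <image> <outputBase>"; tesseract appends ".txt" to the output base. */
    static List<String> buildCommand(String ocrBinary, Path image, Path outputBase) {
        return List.of(ocrBinary, image.toString(), outputBase.toString());
    }

    /** Runs the OCR binary on the image and returns the recognized text. */
    static String extractText(String ocrBinary, Path image)
            throws IOException, InterruptedException {
        // Temporary output base; the text ends up in "<outputBase>.txt".
        Path outputBase = Files.createTempFile("ocr", "");
        Path textFile = Path.of(outputBase + ".txt");
        try {
            Process process = new ProcessBuilder(buildCommand(ocrBinary, image, outputBase))
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // ignore console chatter
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("OCR binary exited with code " + exitCode);
            }
            return Files.readString(textFile);
        } finally {
            // Clean up both the placeholder file and the text output.
            Files.deleteIfExists(textFile);
            Files.deleteIfExists(outputBase);
        }
    }
}
```

A call such as OcrCommandSketch.extractText("/usr/bin/tesseract", Path.of("scan.tif")) is a quick way to confirm that the binary works outside OpenKM before pointing the configuration at it; the binary path and file name here are only examples.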
 #1872  by stanley
 
I am a Java developer, and I don't know what to do with the tesseract sources and imagemagick. Can you give me the steps?
Thank you for your patience.
 #1877  by pavila
 
If you use Debian / Ubuntu, you can install these programs easily:

$ aptitude install tesseract imagemagick
 #1883  by stanley
 
I use Windows. I have the tesseract binary, and I have downloaded imagemagick.
How do I make them work together?
 #1915  by pavila
 
I have never tested tesseract OCR integration in Windows, and I'm not sure if it works.
 #3080  by djdifulvio
 
Question: I am new to this project, and so far it's been great. However, I followed your directions and I am still seeing "WARN [TiffTextExtractor] Undefined OCR application" when I try to upload TIFFs...

I am using Ubuntu with a 2.6.28-15 kernel. I have installed everything and set the path for the OCR software, which is "system.ocr=/usr/bin/tesseract".

Do I have to do anything with TiffTextExtractor.java? I did not build this installation myself; I just downloaded the prebuilt version.

Thanks,
djd
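
One way to narrow down an "Undefined OCR application" warning is to check, outside OpenKM, what the configuration file actually defines. The following Java sketch is only a diagnostic aid, not part of OpenKM: the class OcrConfigCheck and its methods are hypothetical, it assumes the configuration file can be read as a standard Java properties file, and it takes the key as a parameter because this thread does not confirm which key name a given OpenKM version expects.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical diagnostic, not OpenKM code: checks an OCR entry in a properties file. */
final class OcrConfigCheck {

    /** Loads a properties-style configuration file from disk. */
    static Properties load(Path configFile) throws IOException {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(configFile)) {
            props.load(reader);
        }
        return props;
    }

    /** Returns "undefined", "missing", "not executable" or "ok" for the given key. */
    static String diagnose(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.isBlank()) {
            return "undefined";          // key absent or empty
        }
        Path binary = Path.of(value.trim());
        if (!Files.isRegularFile(binary)) {
            return "missing";            // path does not point to a file
        }
        if (!Files.isExecutable(binary)) {
            return "not executable";     // file exists but cannot be run
        }
        return "ok";
    }
}
```

For example, OcrConfigCheck.diagnose(OcrConfigCheck.load(Path.of("OpenKM.cfg")), "system.ocr") reports whether that key is present and points to an executable file. An "ok" result for the key used above would suggest looking elsewhere, for instance at whether JBoss was restarted after the edit or whether the installed version reads a different key.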
