• Tesseract OCR version update support for more image types?

  • OpenKM has many interesting features, but requires some configuration process to show its full potential.
Forum rules: Before asking a question, please check the documentation wiki or use the forum's search feature. And remember that we don't have a crystal ball and can't read minds, so if you post about an issue, tell us which OpenKM version you are using, as well as your browser and operating system versions. For more info, read How to Report Bugs Effectively.
 #7221  by bontscho
 
Hi guys,

I did a clean install of OpenKM and I'm very happy so far.

My question is:

I have Tesseract 1.02 running on my server, and Tesseract 2.04 and Tesseract 3 are now available. Does OpenKM support the newer Tesseract versions, so that upgrading would give me support for additional image formats such as JPG/PNG, as stated on the official Tesseract page?

The page also says that version 3 is not compatible with the files from 2.04, so I would really appreciate a clear answer on that topic :-)

Many thanks for any useful information.

Kind regards,
bontscho
 #7300  by bontscho
 
Never mind, I successfully upgraded to Tesseract 2.04, and now multipage TIFFs and compressed TIFFs are extracted correctly.

Tesseract 2.04 also supports languages other than English, so my German documents are now recognized correctly and searchable through Lucene.

Maybe in the future OpenKM will take advantage of Tesseract 3, which would make OCR more flexible (as mentioned, Tesseract 3 supports more image formats).

Kind regards,
bontscho
 #7334  by pavila
 
I was reading about Tesseract 3.0, and it has some interesting improvements. If the command-line parameters have not changed, Tesseract 3.0 should run fine with OpenKM. If you want to pass other parameters to Tesseract, you can set "system.ocr" to a script that wraps the original binary call. The same trick is used with the pdf2swf utility. Search the wiki for more info.
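A minimal sketch of such a wrapper, written in Python rather than a shell script. It assumes OpenKM invokes the configured "system.ocr" binary as `<binary> <input-image> <output-base>` (the classic Tesseract 2.x/3.0 command line); the path in `REAL_TESSERACT` and the extra `-l deu` arguments (German language data) are hypothetical examples to adapt to your own install. Make the script executable and point "system.ocr" at it.

```python
#!/usr/bin/env python3
"""Hypothetical OCR wrapper for OpenKM's "system.ocr" setting.

Assumption: OpenKM calls the configured program as
    <program> <input-image> <output-base>
and the real Tesseract binary lives at REAL_TESSERACT.
"""
import subprocess
import sys

REAL_TESSERACT = "/usr/local/bin/tesseract"  # assumption: adjust to your install
EXTRA_ARGS = ["-l", "deu"]  # hypothetical example: force German language data


def build_command(argv, binary=REAL_TESSERACT, extra=EXTRA_ARGS):
    """Return the real Tesseract command: original arguments plus extra ones."""
    if len(argv) < 2:
        raise ValueError("expected: <input-image> <output-base> [...]")
    # Tesseract 2.x/3.0 accept "-l <lang>" after the two positional arguments.
    return [binary, *argv, *extra]


def main(args):
    try:
        cmd = build_command(args)
    except ValueError as err:
        print(f"ocr-wrapper: {err}", file=sys.stderr)
        return 2
    # Propagate Tesseract's exit status so the caller can detect failures.
    return subprocess.call(cmd)


# Only run when invoked with arguments, as OpenKM would do.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Appending the extra arguments after the positional ones keeps the wrapper transparent: OpenKM still sees the same input/output contract, while Tesseract receives whatever additional options you need.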
