• Searching PDF OCR

  • OpenKM has many interesting features, but requires some configuration process to show its full potential.
 #11172  by pavila
 
Starting with OpenKM 5.1.3 you can see the text that was extracted from a document: go to Administration and open the Repository View. You can also find out which documents had problems during text extraction by running this Hibernate query from Administration / Database Query:
from Activity where action='MISC_TEXT_EXTRACTION_FAILURE'
 #11173  by joako
 
That's good to know.

Anyway, what I've noticed is that the open source OCR isn't that great. Maybe with some pre- and post-processing (e.g. a spell check) it could be better, but I don't have the time to dedicate to OCR development.
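As a rough illustration of the post-processing idea, here is a minimal Python sketch of a dictionary-based spell check applied to OCR output. It is not part of OpenKM or of any OCR engine: the word list `DICTIONARY` and the helpers `correct_token` and `post_process` are hypothetical, and the standard-library `difflib` similarity matching stands in for a real spell checker.

```python
import difflib
import re

# Hypothetical word list; a real setup would load a full dictionary file
# for the document language.
DICTIONARY = {"invoice", "payment", "total", "amount", "customer", "document"}


def correct_token(token: str, dictionary: set, cutoff: float = 0.8) -> str:
    """Return the closest dictionary word for an OCR token, or the token unchanged."""
    lower = token.lower()
    if lower in dictionary:
        return token  # already a known word: keep it as-is
    matches = difflib.get_close_matches(lower, dictionary, n=1, cutoff=cutoff)
    if not matches:
        return token  # nothing similar enough: do not guess
    best = matches[0]
    # Keep a leading capital so sentence starts and names still look right.
    return best.capitalize() if token[0].isupper() else best


def post_process(ocr_text: str, dictionary: set) -> str:
    """Spell-check every alphabetic word in OCR output, leaving other text untouched."""
    return re.sub(r"[A-Za-z]+", lambda m: correct_token(m.group(0), dictionary), ocr_text)


if __name__ == "__main__":
    print(post_process("The paymnet amuont is due", DICTIONARY))
```

With this word list, `post_process("The paymnet amuont is due", DICTIONARY)` returns `"The payment amount is due"`. The `cutoff` of 0.8 is an arbitrary choice: lower values fix more errors but also risk replacing correct words that are simply missing from the dictionary.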

I tested OmniPage 17. It hangs and requires manual intervention.
I then tested ABBYY FineReader Corporate 10, and it works well. If you look around, you can find the boxed SKU for about half the price listed on the ABBYY website, and I mean from a reputable vendor, not a pirated software site.

I still need to work on converting my NFS shares to SMB, because ABBYY runs on Windows...
 #11267  by pavila
 
You can expose the OpenKM document repository via WebDAV and mount it as a shared resource in Windows. See the documentation wiki for more info.

Recent OpenKM releases also let you configure a dictionary to get better OCR results. Of course, a commercial OCR engine may offer better results. ABBYY is a good option; anyway, if you want a good integration, you should contact our sales team at http://www.openkm.com/Contact/.
