Open Source Document Management System | OpenKM

Posted: **Mon May 30, 2011 7:01 pm**

Starting with OpenKM 5.1.3, you can see what text was extracted from a document: go to Administration / Repository View. You can also check which documents had problems during text extraction by running this Hibernate (HQL) query:

```
from Activity where action='MISC_TEXT_EXTRACTION_FAILURE'
```

from Administration / Database Query.
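For readers new to HQL: the query above returns every persisted `Activity` object whose `action` property equals `MISC_TEXT_EXTRACTION_FAILURE`. The plain-Java sketch below applies the same predicate to an in-memory list so the filtering logic is easy to see. The `Activity` record, its `item` field, the sample file names and the `CREATE_DOCUMENT` action are hypothetical stand-ins, not OpenKM's actual entity class or action list. Inside OpenKM you would still run the HQL itself from Administration / Database Query.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ExtractionFailureFilter {
    // Hypothetical stand-in for a persisted activity log entry; the real
    // OpenKM Activity entity has its own fields and Hibernate mapping.
    record Activity(String item, String action) {}

    static final String FAILURE = "MISC_TEXT_EXTRACTION_FAILURE";

    // Keeps only entries whose action equals FAILURE, i.e. the same
    // predicate as the HQL "where action='MISC_TEXT_EXTRACTION_FAILURE'".
    static List<Activity> failures(List<Activity> log) {
        return log.stream()
                .filter(a -> FAILURE.equals(a.action()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample data; "CREATE_DOCUMENT" is a made-up action for illustration.
        List<Activity> log = List.of(
                new Activity("scan-001.pdf", "MISC_TEXT_EXTRACTION_FAILURE"),
                new Activity("report.odt", "CREATE_DOCUMENT"),
                new Activity("scan-002.pdf", "MISC_TEXT_EXTRACTION_FAILURE"));
        // Prints the items whose text extraction failed.
        for (Activity a : failures(log)) {
            System.out.println(a.item());
        }
    }
}
```

Running `main` prints `scan-001.pdf` and `scan-002.pdf`, the two entries whose action matches.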

Posted: **Mon May 30, 2011 8:45 pm**

That's good to know.

Anyway, what I've noticed is that the open source OCR isn't very good. With some pre- and post-processing (e.g. spell checking) it could perhaps be better, but I don't have the time to dedicate to OCR development.
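The spell-check post-processing idea mentioned above can be sketched as a dictionary lookup with a nearest-word fallback. This Java sketch is illustrative only: the word list, the `maxDist` threshold and the class and method names are hypothetical, and it is not how OpenKM or any particular OCR engine implements correction. Tokens already in the dictionary are kept; any other token is replaced by the closest dictionary word by Levenshtein edit distance, provided it is at most `maxDist` edits away.

```java
import java.util.ArrayList;
import java.util.List;

public class OcrSpellFix {
    // Classic two-row dynamic-programming Levenshtein edit distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // last row ends up in prev
        }
        return prev[b.length()];
    }

    // Replaces each token not in the (lowercase) dictionary with the nearest
    // dictionary word if it is at most maxDist edits away; otherwise keeps it.
    // Ties go to the word that appears first in the list.
    static String correct(String text, List<String> dictionary, int maxDist) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            String lower = token.toLowerCase();
            if (token.isEmpty() || dictionary.contains(lower)) {
                out.add(token);
                continue;
            }
            String best = token;
            int bestDist = maxDist + 1;
            for (String word : dictionary) {
                int d = levenshtein(lower, word);
                if (d < bestDist) {
                    best = word;
                    bestDist = d;
                }
            }
            out.add(best);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // Hypothetical word list and OCR output containing two misreads.
        List<String> dict = List.of("invoice", "total", "amount", "due");
        System.out.println(correct("Invoice tota1 amourt due", dict, 1));
    }
}
```

Running `main` prints `Invoice total amount due`. Corrected words come back in the dictionary's lowercase form; a real post-processor would also need to handle punctuation, casing and a far larger word list.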

I tested OmniPage 17. It hangs and requires manual intervention.
I then tested ABBYY FineReader Corporate 10, and it works well. If you look around, you can find the boxed SKU for about half the price listed on the ABBYY website, from a reputable vendor, I mean, not a pirated software site.

I still need to convert my NFS shares to SMB, because ABBYY runs on Windows...

Posted: **Fri Jun 10, 2011 2:16 pm**

You can expose the OpenKM document repository via WebDAV and mount it as a shared resource in Windows. See the documentation wiki for more info.

In recent OpenKM releases, you can also configure a dictionary to improve OCR results. Of course, a commercial OCR engine may give better results. ABBYY is a good option; if you want a good integration, contact our sales team at http://www.openkm.com/Contact/.

Searching PDF OCR