
OpenKM for monolithic document store?

Posted: Thu Mar 05, 2015 2:16 pm
by dsmith
TL;DR first: I work at a government office, and we want to do the probate court upstairs a favor by scanning in and indexing a large collection of paper wills (10,000+). This is a simple use case, and I'm wondering whether OpenKM Community is overkill.

=====

Our requirements are:

Associate the scanned copy of each will with plain-text index data (person's name, date of birth, etc.)
Search indexed data by any field
Open source and installable without a third party (we have existing proprietary software we could use, but the reason I'm looking around is to avoid the taxpayer expense of licensing a separate database instance)
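To make the requirements above concrete, here is a minimal, purely illustrative sketch of the data model they imply: each scanned file linked to a small record of index fields that can be searched by any field. The field names (`scan_path`, `full_name`, `date_of_birth`, `date_of_death`) and the `search` helper are hypothetical, not part of OpenKM or any other product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class WillRecord:
    scan_path: str                          # where the scanned image/PDF lives
    full_name: str                          # hypothetical index fields from here down
    date_of_birth: date
    date_of_death: Optional[date] = None


def search(records, **criteria):
    """Return the records whose fields equal every given criterion (any field can be used)."""
    return [r for r in records
            if all(getattr(r, name) == value for name, value in criteria.items())]


records = [
    WillRecord("scans/0001.pdf", "Jane Doe", date(1920, 4, 1)),
    WillRecord("scans/0002.pdf", "John Roe", date(1931, 12, 5), date(1999, 3, 2)),
]
print([r.scan_path for r in search(records, full_name="Jane Doe")])  # ['scans/0001.pdf']
print([r.full_name for r in search(records, date_of_birth=date(1931, 12, 5))])  # ['John Roe']
```

Exact matching keeps the sketch short; a real system would probably want partial or case-insensitive name matching as well.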

Nice to have, but not necessary:

Ability to scan directly from the application (rather than scan to a folder and import)
OCR
Verification queue (index data is entered, then put in a queue to be proofread by a second person)
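The verification queue is essentially a two-person rule: whoever enters the index data cannot be the one who approves it. A minimal in-memory sketch, assuming hypothetical `Entry` and `VerificationQueue` names (this is not an OpenKM feature or API, and a real setup would persist the queue):

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    doc_id: str
    fields: dict            # index data typed in by the first clerk
    entered_by: str
    verified_by: Optional[str] = None


class VerificationQueue:
    """Entries typed in by one person wait here until a different person proofreads them."""

    def __init__(self):
        self._pending = deque()
        self.verified = []

    def submit(self, entry):
        self._pending.append(entry)

    def next_for(self, reviewer):
        """Remove and return the oldest entry the reviewer did not enter, or None."""
        for i, entry in enumerate(self._pending):
            if entry.entered_by != reviewer:
                del self._pending[i]   # safe: we return before iterating again
                return entry
        return None

    def approve(self, entry, reviewer, corrections=None):
        """Apply the reviewer's corrections (if any) and mark the entry as verified."""
        if reviewer == entry.entered_by:
            raise ValueError("an entry must be verified by a second person")
        if corrections:
            entry.fields.update(corrections)
        entry.verified_by = reviewer
        self.verified.append(entry)


queue = VerificationQueue()
queue.submit(Entry("will-0001", {"full_name": "Jane Doe"}, entered_by="alice"))
print(queue.next_for("alice"))     # None: alice cannot proofread her own entry
entry = queue.next_for("bob")
queue.approve(entry, "bob", corrections={"full_name": "Jane A. Doe"})
print(entry.verified_by, entry.fields)  # bob {'full_name': 'Jane A. Doe'}
```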

That's more to illustrate how simple our use case is than to ask about features (it looks like OpenKM can do all but the verification queue).
We have a spare server well within requirements, so hardware is not an issue.
This is mainly for archival purposes, and the documents, once recorded, will rarely be accessed or modified.
Mostly I'm looking for feedback on whether the setup and overhead of running and maintaining OpenKM is worth it for a system storing a large number of documents of a single type.

Thanks for any advice the community can provide.

Re: OpenKM for monolithic document store?

Posted: Fri Mar 06, 2015 5:57 pm
by jllort
If the papers have the same format, are legible, and are not handwritten, the best option would be to process the images (zonal OCR) to extract the fields and store them as metadata fields. That will let you retrieve information more efficiently than full-text search. Full-text search is tokenized, so you can run into problems: for example, the date '01-04-88' is tokenized by Lucene as three words. (You can also change the tokenizer to split on whitespace only, which solves some of the problems you may have with it.)
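To illustrate the tokenization problem with a rough approximation (not Lucene's actual code): the sketch below mimics a standard-style analyzer by splitting on anything that isn't a letter or digit, and compares it with a whitespace-only split. The function names are hypothetical.

```python
import re


def standard_like_tokens(text):
    # Rough stand-in for a standard analyzer: split on anything that is not a
    # letter or digit, then lowercase. Lucene's real tokenizer follows Unicode
    # UAX #29 word-break rules; this regex only mimics it for simple ASCII input.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def whitespace_tokens(text):
    # Rough stand-in for a whitespace-only tokenizer.
    return text.split()


def matches_all_terms(query, text, tokenize):
    # Boolean AND over the query's terms, ignoring their order (no phrase query).
    return set(tokenize(query)) <= set(tokenize(text))


print(standard_like_tokens("01-04-88"))   # ['01', '04', '88']
print(whitespace_tokens("01-04-88"))      # ['01-04-88']

other_will = "Will of John Roe dated 04-01-88"
print(matches_all_terms("01-04-88", other_will, standard_like_tokens))  # True (false hit)
print(matches_all_terms("01-04-88", other_will, whitespace_tokens))     # False
```

With the standard-style split, a query for '01-04-88' also matches a will dated 04-01-88, because both reduce to the same three terms. The whitespace-only split avoids that, but it still treats '01-04-88' and '01/04/88' as different terms, which is why storing the date in a dedicated metadata field is the more robust option.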

I suggest taking a look at this video: https://www.youtube.com/watch?v=TkYS33I88oU. The bad news is that this feature is not available in the Community version. But if you want to do it, the Community version comes with enough to build something similar: for example, preprocess the images first (document -> document + CSV data) and then import them into OpenKM (you can get some ideas here: http://wiki.openkm.com/index.php/CSV_importer).
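A minimal sketch of that preprocessing step (document -> document + CSV data), assuming the OCR engine reports each recognized word with a bounding box (Tesseract's TSV output, for instance, has `left`, `top`, `width`, `height` and `text` columns). The zone coordinates, field names and CSV layout are hypothetical; the column format OpenKM's CSV importer actually expects is described on the wiki page linked above. It also assumes each zone holds a single line of text.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    # One recognized word and its bounding box in pixels, as reported by the OCR engine.
    text: str
    left: int
    top: int
    width: int
    height: int


# Hypothetical zones for a fixed-layout form: field name -> (x0, y0, x1, y1) in pixels.
ZONES = {
    "full_name": (100, 200, 900, 260),
    "date_of_birth": (100, 300, 500, 360),
}


def extract_fields(words, zones=ZONES):
    """Zonal OCR: for each zone, join (left to right) the words whose centre lies inside it."""
    fields = {}
    for name, (x0, y0, x1, y1) in zones.items():
        inside = [w for w in words
                  if x0 <= w.left + w.width / 2 <= x1
                  and y0 <= w.top + w.height / 2 <= y1]
        inside.sort(key=lambda w: w.left)   # assumes one line of text per zone
        fields[name] = " ".join(w.text for w in inside)
    return fields


def write_import_csv(rows, out, field_names=tuple(ZONES)):
    """Write one CSV row per scanned file: the file name plus its extracted fields."""
    writer = csv.DictWriter(out, fieldnames=["file", *field_names])
    writer.writeheader()
    for file_name, fields in rows:
        writer.writerow({"file": file_name, **fields})


words = [
    Word("Doe", 260, 210, 60, 30),
    Word("Jane", 120, 212, 80, 30),
    Word("01-04-1920", 120, 310, 150, 30),
    Word("PROBATE", 50, 50, 120, 30),        # header text outside every zone
]
fields = extract_fields(words)
print(fields)   # {'full_name': 'Jane Doe', 'date_of_birth': '01-04-1920'}

buf = io.StringIO()
write_import_csv([("will-0001.tif", fields)], buf)
print(buf.getvalue())
```

Using word centres rather than full containment tolerates boxes that slightly overlap a zone's edge; the zones themselves would have to be measured once on a sample scan of the form.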