• OpenKM for monolithic document store?

  • Problems with installing OpenKM? No problemo, the solution is closer than you think.
 #31504  by dsmith
 
tldr first: I work at a government office and we want to do the probate court upstairs a favor and scan in and index a large collection of paper wills (10,000+). This is a simple use case and I'm wondering if OpenKM community is overkill.

=====

Our requirements are:

Associate the scanned copy of each will with plain-text indexed data (person's name, date of birth, etc.)
Search indexed data by any field
Open source and installable without a third party (we have an existing proprietary software we can use, but the reason I'm looking around is to avoid the taxpayer expense of licensing a separate database instance)

Nice to have, but not necessary:

Ability to scan directly from the application (rather than scan to a folder and import)
OCR
Verification queue (index data is entered, then put in a queue to proofread by a second person)

That's more to illustrate how simple our use case is than to ask about features (it looks like OpenKM can do all but the verification queue).
We have a spare server well within requirements, so hardware is not an issue.
This is mainly for archival purposes, and the documents, once recorded, will rarely be accessed or modified.
Mostly looking for feedback on whether the setup and overhead of running and maintaining OpenKM are worth it for a system storing a large number of documents of a single type.

Thanks for any advice the community can provide.
 #31520  by jllort
 
If the papers all share the same format, are clear, and are not handwritten, the best option is to process the images with zone OCR to extract the fields and store them as metadata fields. That lets you retrieve information more efficiently than full-text search. Full-text search is tokenized, which can cause problems: for example, the date '01-04-88' is tokenized by Lucene as three words. You can also change the tokenizer to split on whitespace only, which solves some of those problems.
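
To make the tokenization point concrete, here is a small sketch in plain Python. It does not call Lucene; the two functions are hypothetical stand-ins that only imitate the behaviour described above (splitting on every non-alphanumeric character versus splitting on whitespace only), so treat it as an illustration rather than what OpenKM actually runs.

```python
import re


def word_boundary_tokens(text):
    """Hypothetical stand-in for a word-based tokenizer:
    every run of letters or digits becomes its own lowercase token."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def whitespace_tokens(text):
    """Hypothetical stand-in for a whitespace-only tokenizer."""
    return text.split()


sample = "born 01-04-88"

# The date is broken into three separate tokens, so a search for the
# full date no longer matches a single indexed term.
print(word_boundary_tokens(sample))  # ['born', '01', '04', '88']

# Splitting on whitespace keeps the date together as one token.
print(whitespace_tokens(sample))     # ['born', '01-04-88']
```

This is also why storing the date as a separate metadata field is attractive: the field value is matched as a whole instead of depending on how the full-text index splits it.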

I suggest taking a look at this video, https://www.youtube.com/watch?v=TkYS33I88oU (the bad news is that this feature is not available in the community version). But if you want to do it yourself, the community version comes with enough to build something similar: for example, pre-process the images first (document -> document + CSV data) and then import the result into OpenKM (you can get some ideas here: http://wiki.openkm.com/index.php/CSV_importer).
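
As a rough idea of the "document -> document + CSV data" step, the sketch below writes one CSV row per scanned will. The column names (`file`, `name`, `date_of_birth`) and the sample record are made up for illustration; the layout the CSV importer actually expects is described on the wiki page above and may well differ.

```python
import csv
import io

# Hypothetical column names; check the CSV importer wiki page for the real layout.
FIELDS = ["file", "name", "date_of_birth"]


def write_index_csv(records, out):
    """Write one row of index data per scanned document to the file-like object `out`."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for rec in records:
        writer.writerow({key: rec[key] for key in FIELDS})


# Made-up sample record: in practice this would come from OCR or manual data entry.
records = [
    {"file": "will_0001.pdf", "name": "Jane Doe", "date_of_birth": "1988-04-01"},
]

buf = io.StringIO()
write_index_csv(records, buf)
print(buf.getvalue())
```

To write to disk instead of memory, pass a file opened with `open("index.csv", "w", newline="")` as `out`.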
