
How does one create an Automated Document Import

Posted: Wed Apr 09, 2014 7:32 pm
by rmueller
I'm completely new to OpenKM. We are in the process of evaluating it for purchase and have version 6.2.5 Community loaded for our testing. As of now I have one question.

Assuming I create property groups and the associated metadata, how can one automatically upload a massive number of documents (4 million) into OpenKM and have the property group values loaded for each specific document?

For each document, I am able to generate a record in a file (a document index, or property groups); the record looks like the sample below. The filename is given in the last item in the record. The records can be CSV, XML, or any other structure that lends itself to loading the files automatically.

The record layout below is as follows:

The first two fields identify the taxonomy of the document: Cases/Non-Barcoded Billing Slips

The rest of the fields (all but the last field) are the document's property group, consisting of:

Group Number
Doctor Number
Case Number
Instance Number
Name
Soc Sec Number
Date of Service
Update Code

"Cases","Non-Barcoded Billing Slips","GROUP NUMBER","201","DOCTOR NUMBER","1","CASE NUM","24654","INSTANCE NUMBER","0","PATIENT NAME","Jim, Jones","SOC SEC NUMBER","000000000","DATE OF SERVICE","8/21/2013","UPDATE CODE","","\\prdora64\scannedimages\Cases\000C1A9B.TIF"

The last field is the actual filename on disk to be uploaded.
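
A minimal sketch of how a record like the sample above could be split into its three parts (taxonomy, property name/value pairs, source filename) before any import step. It uses only Python's standard `csv` module; the names `IndexRecord`, `parse_record`, and `SAMPLE` are illustrative assumptions and are not part of OpenKM.

```python
import csv
import io
from typing import Dict, NamedTuple, Tuple


class IndexRecord(NamedTuple):
    """One parsed line of the index file (hypothetical structure)."""
    taxonomy: Tuple[str, ...]      # first two fields, e.g. folder path parts
    properties: Dict[str, str]     # property name -> value
    source_file: str               # last field: file on disk to upload


def parse_record(line: str) -> IndexRecord:
    """Split one quoted CSV record into taxonomy, properties, and filename."""
    fields = next(csv.reader(io.StringIO(line)))
    # Layout: 2 taxonomy fields + N name/value pairs + 1 filename,
    # so the total field count must be odd and at least 3.
    if len(fields) < 3 or len(fields) % 2 == 0:
        raise ValueError(f"unexpected field count: {len(fields)}")
    middle = fields[2:-1]
    properties = dict(zip(middle[0::2], middle[1::2]))
    return IndexRecord(tuple(fields[:2]), properties, fields[-1])


# The sample record from this post (raw string keeps the backslashes intact).
SAMPLE = r'"Cases","Non-Barcoded Billing Slips","GROUP NUMBER","201","DOCTOR NUMBER","1","CASE NUM","24654","INSTANCE NUMBER","0","PATIENT NAME","Jim, Jones","SOC SEC NUMBER","000000000","DATE OF SERVICE","8/21/2013","UPDATE CODE","","\\prdora64\scannedimages\Cases\000C1A9B.TIF"'

if __name__ == "__main__":
    record = parse_record(SAMPLE)
    print("/".join(record.taxonomy))
    for name, value in record.properties.items():
        print(f"  {name} = {value!r}")
    print(record.source_file)
```

The quoted fields matter here: the patient name contains a comma, so splitting on commas by hand would break the record, while the `csv` reader keeps it as one value.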

ron

Re: How does one create an Automated Document Import

Posted: Fri Apr 11, 2014 10:16 am
by jllort
The easiest way is to use a CSV file (we also have an example with XML, but parsing it is more tedious). Useful information can be found here: http://wiki.openkm.com/index.php/Utilities

In this example, http://wiki.openkm.com/index.php/CSV_importer, you set metadata on an existing file.
In this other example, http://wiki.openkm.com/index.php/Crontab_xml_importer, files and metadata are imported at the same time and cataloged based on the metadata.
I think all the Crontab examples will be interesting to you.

If you have 4 million documents, the way you decide to catalog them (folders created in the taxonomy, security, etc.) is very important. I do not know whether you are looking for a container only, or whether users will also access the documents from the UI. Keep in mind that a lot of documents (objects) on the same node need a lot of rendering time. We suggest no more than 1000 objects per node, if possible, to get better UI performance, so the way you catalog the data matters.
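
As an illustration of the 1000-objects-per-node suggestion, here is a small sketch of one possible way to spread documents over a two-level folder tree. It assumes each document gets a sequential number during import; the function name `bucket_path`, the folder naming scheme, and the base path are hypothetical choices, not something OpenKM prescribes.

```python
MAX_PER_NODE = 1000  # suggested upper bound of objects per node


def bucket_path(base: str, seq: int, max_per_node: int = MAX_PER_NODE) -> str:
    """Return a two-level folder path for document number `seq`.

    Each leaf folder receives at most `max_per_node` documents and each
    top-level folder at most `max_per_node` leaf folders.
    """
    if not 0 <= seq < max_per_node ** 3:
        raise ValueError("seq is outside the range this layout can hold")
    leaf = seq // max_per_node             # which leaf folder holds the document
    top, sub = divmod(leaf, max_per_node)  # split leaf index into two levels
    return f"{base}/{top:03d}/{sub:03d}"


if __name__ == "__main__":
    base = "Cases/Non-Barcoded Billing Slips"  # hypothetical base folder
    for seq in (0, 999, 1000, 3_999_999):
        print(seq, "->", bucket_path(base, seq))
```

With 4 million documents this layout produces 4 top-level folders, each holding 1000 leaf folders of 1000 documents. A layout derived from the metadata (for example group number, then doctor number) may suit searching better, as long as no single folder grows far past the limit.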

You may also be interested in http://wiki.openkm.com/index.php/SDK (especially the Java SDK), which allows remote, transparent access to all OpenKM features. Unfortunately, at this moment it is not supported by the Community version, which still does not have REST support; for now you should try it with the Professional trial. We do not suggest the web services, although these are already supported by Community: http://wiki.openkm.com/index.php/Workflow_Guide