• Open KM in Amazon Clustering - Huge Volume

#28382 by Modria
 
Hi Moderators team,

Warm greetings from Modria. We are building a legal application to resolve multi-level disputes, and we are a leader in the dispute resolution industry. Our business requires us to store a huge number of legal documents, which will be generated by our application and searched, listed, and archived by various users. We have selected OpenKM for document management and are planning to purchase a professional licence for a multi-availability OpenKM environment. We host all our services in the Amazon cloud.

Currently our engineering team has installed the OpenKM 6.2.26 Professional trial in our cloud. As part of our migration plan, we are going to import at least several hundred thousand documents (nearly 2-3 terabytes).

Our major requirements go through the Java API (upload, download, search); we would use the OpenKM UI for administrative tasks only. That said, we are focusing on the performance of the APIs, which will in turn be integrated into our product.
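
For context, here is a rough sketch of how our product would drive those three operations. It assumes the OpenKM SDK for Java (sdk4j); the class and method names (OKMWebservicesFactory, createDocumentSimple, getContent, findByContent) come from our reading of the SDK documentation and may differ in 6.2.26, so please correct us if the API surface is different:

Code:
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.List;

import com.openkm.sdk4j.OKMWebservicesFactory;
import com.openkm.sdk4j.bean.QueryResult;
import com.openkm.sdk4j.impl.OKMWebservices;

public class OpenKmApiSketch {
    public static void main(String[] args) throws Exception {
        // Assumed SDK entry point: OpenKM URL plus an application user's credentials.
        OKMWebservices ws = OKMWebservicesFactory.newInstance(
                "http://our-openkm-node:8080/OpenKM", "appUser", "appPassword");

        // Upload: stream a generated legal document into the repository.
        try (InputStream in = new FileInputStream("dispute-12345.pdf")) {
            ws.createDocumentSimple("/okm:root/disputes/dispute-12345.pdf", in);
        }

        // Download: read the stored content back for the requesting user.
        try (InputStream content = ws.getContent("/okm:root/disputes/dispute-12345.pdf")) {
            // ... stream to our application ...
        }

        // Search: full-text query against the index.
        List<QueryResult> hits = ws.findByContent("arbitration clause");
        for (QueryResult hit : hits) {
            System.out.println(hit.getDocument().getPath());
        }
    }
}

Is this the API you would recommend for high-volume use, or should we go through the REST services instead?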

We are also partnering with one of the cloud NAS providers for distributed, highly available storage, and we will use this storage as the repository for OpenKM. We also need OpenKM to run as a clustered environment to handle the load of document requests.

Our plan going forward is:

1. MySQL in Amazon RDS
2. Two EC2 instances with OpenKM Professional installed (clustered, but doing the same job)
3. An NFS share on the NAS for the repository (datastore) - a single shared location
4. Local root storage for caching (high IOPS) - each server will have its own cache
5. Local root storage for the index location (we need your guidance here on the best solution) - each server will have its own index
6. Back up all data into Amazon S3 (see the sketch below)
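
On point 6, we do not assume a native S3 connector exists, so our current fallback idea is a scheduled job in our own stack that copies the NFS datastore to S3 with the AWS SDK for Java (TransferManager); the bucket name and mount path below are placeholders, not anything OpenKM-specific:

Code:
import java.io.File;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.transfer.MultipleFileUpload;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

public class DatastoreBackup {
    public static void main(String[] args) throws InterruptedException {
        // Credentials and region come from the EC2 instance profile.
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        TransferManager tm = TransferManagerBuilder.standard().withS3Client(s3).build();

        // Recursively upload the OpenKM datastore directory (the NFS mount) to S3.
        MultipleFileUpload upload = tm.uploadDirectory(
                "our-openkm-backup-bucket",         // placeholder bucket name
                "datastore/",                       // key prefix inside the bucket
                new File("/mnt/openkm/repository"), // placeholder NFS mount path
                true);                              // include subdirectories
        upload.waitForCompletion();
        tm.shutdownNow();
    }
}

If OpenKM does ship an S3 connector or a recommended backup procedure for clustered installs, we would rather use that.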

We also want to make sure that this volume of documents is supported in the fully licensed version of OpenKM. When we tried to import 30,000+ documents (into a single folder) as a pilot run, the import succeeded (1 hr 40 min), but viewing the documents in the OpenKM UI took an indefinite amount of time to load ("Updating Document List"), although the repository view rendered in about 5 minutes.

We would like to know what the best practice is here, as we will be importing at least 1 million documents totalling around 3 TB into OpenKM.
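
To avoid piling a million documents into a single folder again (which seems to be what hurt the UI in our pilot), we are considering sharding the import into hash-based subfolders. A minimal sketch of the path scheme we have in mind is below; the bucket count and base path are our own assumptions, not OpenKM recommendations:

Code:
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FolderSharding {

    private static final String BASE = "/okm:root/disputes";
    private static final int BUCKETS = 1024; // ~1M docs => roughly 1,000 docs per folder

    /** Map a document id to a stable subfolder, e.g. /okm:root/disputes/1a3/dispute-12345.pdf */
    static String targetPath(String documentId, String fileName) {
        byte[] digest = sha1(documentId);
        // Take two bytes of the hash and reduce them to a bucket in [0, BUCKETS).
        int bucket = ((digest[0] & 0xFF) << 8 | (digest[1] & 0xFF)) % BUCKETS;
        return String.format("%s/%03x/%s", BASE, bucket, fileName);
    }

    private static byte[] sha1(String value) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(targetPath("dispute-12345", "dispute-12345.pdf"));
    }
}

Would a layout like this be reasonable, or does OpenKM have its own guideline for folder sizes?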

The open questions for the OpenKM team are:

1) Can OpenKM support pure stateless clustering as described above?
2) Does OpenKM have direct connectors to Amazon S3 for backup?
3) What is the optimal document count under a folder for best performance? (If that limit is exceeded, what is the way to view the document list for administrative tasks?)

Please also let us know your support email ID for direct future communications

Thanks in Advance.

Ameen Raffic
Engineering , Modria
araffic@modria.com
http://www.modria.com
Attachments
opekm statistics.jpg (114.04 KiB) - our server statistics after the 30,000-document upload
