• Open KM in Amazon Clustering - Huge Volume

#28382 by Modria
 
Hi Moderators team,

Warm greetings from Modria. We are building a legal application to resolve multi-level disputes, and we are a leader in the dispute resolution industry. Our business requires us to store a huge number of legal documents, which will be generated by our application and searched, listed, and archived by various users. We have selected OpenKM for document management and are planning to purchase a professional licence for a multi-availability OpenKM environment. We host all our services in the Amazon cloud.

Currently our engineering team has installed the OpenKM 6.2.26 Professional trial in our cloud. As part of our migration plan, we are going to import at least several hundred thousand documents (nearly 2-3 terabytes).

Our major requirements go through the Java API (upload, download, search); we would use the OpenKM UI for administrative tasks only. That said, we are focusing on the performance of the APIs, which will in turn be integrated into our product.
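
For context, here is a rough sketch of how our product would drive those three operations. It assumes the OpenKM SDK for Java (sdk4j); the class and method names (OKMWebservicesFactory, createDocumentSimple, getContent, findByContent) come from our reading of the SDK documentation and may differ in 6.2.26, so please correct us if the API surface is different:

Code:
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.List;

import com.openkm.sdk4j.OKMWebservicesFactory;
import com.openkm.sdk4j.bean.QueryResult;
import com.openkm.sdk4j.impl.OKMWebservices;

public class OpenKmApiSketch {
    public static void main(String[] args) throws Exception {
        // Assumed SDK entry point: OpenKM URL plus an application user's credentials.
        OKMWebservices ws = OKMWebservicesFactory.newInstance(
                "http://our-openkm-node:8080/OpenKM", "appUser", "appPassword");

        // Upload: stream a generated legal document into the repository.
        try (InputStream in = new FileInputStream("dispute-12345.pdf")) {
            ws.createDocumentSimple("/okm:root/disputes/dispute-12345.pdf", in);
        }

        // Download: read the stored content back for the requesting user.
        try (InputStream content = ws.getContent("/okm:root/disputes/dispute-12345.pdf")) {
            // ... stream to our application ...
        }

        // Search: full-text query against the index.
        List<QueryResult> hits = ws.findByContent("arbitration clause");
        for (QueryResult hit : hits) {
            System.out.println(hit.getDocument().getPath());
        }
    }
}

Is this the API you would recommend for high-volume use, or should we go through the REST services instead?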

We are also partnering with one of the cloud NAS providers for distributed, highly available storage, and we will use this storage as the repository for OpenKM. We also need OpenKM to run as a clustered environment to handle the load of document requests.

Our plan going forward is:

1. MySQL in Amazon RDS
2. Two EC2 instances with OpenKM Professional installed (clustered, but doing the same job)
3. An NFS share on the NAS for the repository (datastore) - a single shared location
4. Local root storage for caching (high IOPS) - each server will have its own cache
5. Local root storage for the index location (we need your guidance here on the best solution) - each server will have its own index
6. Back up all data into Amazon S3 (see the sketch below)
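
On point 6, we do not assume a native S3 connector exists, so our current fallback idea is a scheduled job in our own stack that copies the NFS datastore to S3 with the AWS SDK for Java (TransferManager); the bucket name and mount path below are placeholders, not anything OpenKM-specific:

Code:
import java.io.File;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.transfer.MultipleFileUpload;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

public class DatastoreBackup {
    public static void main(String[] args) throws InterruptedException {
        // Credentials and region come from the EC2 instance profile.
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        TransferManager tm = TransferManagerBuilder.standard().withS3Client(s3).build();

        // Recursively upload the OpenKM datastore directory (the NFS mount) to S3.
        MultipleFileUpload upload = tm.uploadDirectory(
                "our-openkm-backup-bucket",         // placeholder bucket name
                "datastore/",                       // key prefix inside the bucket
                new File("/mnt/openkm/repository"), // placeholder NFS mount path
                true);                              // include subdirectories
        upload.waitForCompletion();
        tm.shutdownNow();
    }
}

If OpenKM does ship an S3 connector or a recommended backup procedure for clustered installs, we would rather use that.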

We also want to make sure that this volume of documents is supported in the fully licensed version of OpenKM. When we tried to import 30,000+ documents (into a single folder) as a pilot run, the import succeeded (1 hr 40 min), but viewing the documents in the OpenKM UI took an indefinite amount of time to load ("Updating Document List"), although the repository view rendered in about 5 minutes.

We would like to know what the best practice is here, as we will be importing at least 1 million documents totalling around 3 TB into OpenKM.
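
To avoid piling a million documents into a single folder again (which seems to be what hurt the UI in our pilot), we are considering sharding the import into hash-based subfolders. A minimal sketch of the path scheme we have in mind is below; the bucket count and base path are our own assumptions, not OpenKM recommendations:

Code:
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FolderSharding {

    private static final String BASE = "/okm:root/disputes";
    private static final int BUCKETS = 1024; // ~1M docs => roughly 1,000 docs per folder

    /** Map a document id to a stable subfolder, e.g. /okm:root/disputes/1a3/dispute-12345.pdf */
    static String targetPath(String documentId, String fileName) {
        byte[] digest = sha1(documentId);
        // Take two bytes of the hash and reduce them to a bucket in [0, BUCKETS).
        int bucket = ((digest[0] & 0xFF) << 8 | (digest[1] & 0xFF)) % BUCKETS;
        return String.format("%s/%03x/%s", BASE, bucket, fileName);
    }

    private static byte[] sha1(String value) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(targetPath("dispute-12345", "dispute-12345.pdf"));
    }
}

Would a layout like this be reasonable, or does OpenKM have its own guideline for folder sizes?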

The open questions for the OpenKM team are:

1) Can OpenKM support pure stateless clustering as described above?
2) Does OpenKM have direct connectors to Amazon S3 for backup?
3) What is the optimal document count under a folder for best performance? (If that limit is exceeded, what is the way to view the document list for administrative tasks?)

Please also let us know your support email ID for direct future communications

Thanks in Advance.

Ameen Raffic
Engineering , Modria
araffic@modria.com
http://www.modria.com
Attachments
opekm statistics.jpg (114.04 KiB) - our server statistics after the 30,000-document upload
