
users and size

Posted: Tue Aug 14, 2012 11:50 am
by NickC
In the version comparison you indicate that the editions differ in their limits on users and repository size. Does the community version use a different codebase, or does it have built-in limits on users and size compared with the paid versions?

I am thinking of deploying it for a growing 100 GB repository on a file system, with 20-30 users, on a Debian server. Please advise.

Re: users and size

Posted: Wed Aug 15, 2012 2:54 pm
by jllort
You're talking about the table at http://www.openkm.com/en/overview/compa ... sions.html with these values:
Size of repository: Small / Small and medium / All
Users: Few / Scalable on demand / Without limit

Obviously it is not exactly the same source code; there are some optimizations that the community version does not have. The community version can store 100 GB and more. The real problem is usually not the repository size but the total number of files: 100 GB is normally around 100K files. With this number of files you should at least change the default DBMS to MySQL or PostgreSQL.
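
For reference, switching the community version's database mostly comes down to pointing its Hibernate settings at the new server. A minimal sketch, assuming an OpenKM.cfg-style properties file and a PostgreSQL database named okmdb; the exact property names and file location are assumptions and may differ between OpenKM versions:

    # Hypothetical OpenKM.cfg fragment: swap the embedded HSQLDB for PostgreSQL.
    # The PostgreSQL JDBC driver jar must be on the application server's classpath.
    hibernate.dialect=org.hibernate.dialect.PostgreSQLDialect
    hibernate.connection.driver_class=org.postgresql.Driver
    hibernate.connection.url=jdbc:postgresql://localhost:5432/okmdb
    hibernate.connection.username=openkm
    hibernate.connection.password=secret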

Re: users and size

Posted: Fri Aug 17, 2012 8:33 am
by NickC
Actually, 100 GB of our repository amounts to 300K files and more, and we plan to grow it further. We obviously cannot trust MySQL or Postgres to handle a 100 GB+ database and would be more comfortable with a file-system-based repository. Will that be a problem, or will we come to work one day to find that everything crashed the night before and we cannot access our mission-critical files? You know that for business users, functionality and features come second (or lower) to robustness. A very simple system that can handle terabyte-sized repositories is far more useful than a feature-full system with limited size capabilities.
Just for discussion, we rate our priorities as follows:
1. Robustness for large repositories (1 TB+)
2. Backup (preferably incremental, e.g. rsync)
3. User control
4. Encryption
.
.
.
20. Everything else, which is of very low priority, if necessary at all

Re: users and size

Posted: Sat Aug 18, 2012 3:03 pm
by NickC
Excuse my ignorance, but is there any way to split the repository, keeping information such as metadata in the DBMS while storing the binary data in the file system to promote robustness (each part can then easily be backed up individually)? If yes, it changes everything for us and we can proceed with implementation, since we can easily use Postgres plus the file system for our heavy deployment. In this scenario, even if the DBMS crashes, the file-system part of the repository should in theory remain intact, albeit without its metadata.

Re: users and size

Posted: Sun Aug 19, 2012 10:01 am
by jllort
OpenKM works as you said: metadata etc. live in the DBMS, and the binary content is stored separately, in the normal configuration directly in a structure on the hard disk (it can be stored in the DBMS, but that is not the default configuration). We use rsync for incremental backup. Keep in mind that you need the metadata too, because on disk the binaries are stored in a non-human-readable file and folder structure; you will not recognise them by name as you see them in OpenKM, since UUIDs are used to store the binary files.
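
As a rough illustration of that scheme, a backup has to capture both the metadata dump and the UUID-named binary store together, since neither half is useful without the other. A minimal sketch assuming PostgreSQL and a repository under /opt/openkm/repository; all paths and the database name are hypothetical:

    #!/bin/sh
    # Hypothetical nightly backup: dump the metadata database, then
    # mirror the binary store incrementally with rsync.
    BACKUP=/backup/openkm
    pg_dump -U openkm -Fc okmdb > "$BACKUP/okmdb-$(date +%F).dump"
    # -a preserves permissions and times; --delete keeps an exact mirror
    rsync -a --delete /opt/openkm/repository/ "$BACKUP/repository/"

For a truly consistent backup you would normally stop the application (or take a filesystem snapshot) first, so that the dump and the binary store match.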

About using the community version for larger repositories, like 1 TB: I do not suggest this scenario, although I know there are users running larger repositories on the community version (200 GB and somewhat more; I have no reports of 1 TB or above). The professional version is optimized for 1 million documents and more, but the community version is not. To the question "can I use community for it?" the answer is yes, but you should be prepared to work on optimization if you hit performance problems: study database query times, etc. The community version is prepared for general-purpose users; our efforts are oriented toward making it usable by the most users and solving general document-management problems, focused on little and small companies with general-purpose repositories (storing company documentation, etc.). We consider 100K documents the maximum scenario for 90% of candidate users.
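
On "study database query times": with PostgreSQL, a common first step is to log every statement that exceeds a duration threshold and inspect what turns up. A minimal sketch of the relevant postgresql.conf settings; the 500 ms threshold is just an example value:

    # postgresql.conf: log statements slower than 500 ms
    log_min_duration_statement = 500
    # prefix each log line with a timestamp and the database name
    log_line_prefix = '%t [%d] '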

Basically, to understand us: we dedicate our time to solving general problems so as to reach the largest number of users. We must decide where to spend our effort, on a new general-purpose feature or on one exclusive to a single company or with fewer candidates, and in this case we chose general purpose. The idea is that for anything that is not general purpose, since you have the source code, you can collaborate in making OpenKM better by contributing your specific needs and optimizations. Some people do that; others decide to contract us to do it, which is the professional version, where we make optimizations or special parametrizations for our customers. Basically that is the idea. Sincerely, consider that 1 million documents is not within the general problem; above 100K you really need to work on optimizations.

Re: users and size

Posted: Mon Aug 20, 2012 8:23 am
by NickC
I fully understand what you are saying and concur on the need for optimisation in certain particular cases. As I noted in my last message, though, size is important, and it would be a welcome step to see some optimisation in your next update, perhaps by bundling Postgres with the basic installation, something I believe other DMSs do successfully. Having said that, you need to feed yourselves and your families too, so charging for the time you spend on optimisation is not only understandable but also your prerogative.
Thanks for your input once again.

Re: users and size

Posted: Mon Aug 20, 2012 4:09 pm
by jllort
The community version is prepared for sizes between 25K and 100K documents and should get good performance there (in some cases it needs some optimization, depending on the number of users, whether you upload a lot of PDFs that need OCR, etc.). From 25K-50K documents it is a good idea to change the Hypersonic (HSQLDB) DBMS to MySQL, PostgreSQL, etc. Once you get a lot of documents, some special tuning of the installation is certainly needed. Decreased performance can have several causes, sometimes incorrect use of the application: a bad taxonomy is the easiest way to get bad performance, and you do not need a lot of documents for that; simply bad organization can cause slow rendering in the browser, among other causes. Although we release all our optimizations, at least minor tuning is always needed.
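
To illustrate the taxonomy point: a flat structure forces the client to list and render one enormous folder, while a deeper split keeps each listing small. A purely hypothetical example:

    # Bad: one folder the browser must list and render at once
    /okm:root/invoices/           (300,000 documents)

    # Better: split by year and month so each listing stays small
    /okm:root/invoices/2012/08/   (a few thousand documents at most)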

Re: users and size

Posted: Thu Aug 23, 2012 3:05 pm
by NickC
Thanks for your analysis. What strikes me is that you are using known and tested technology, along with a repository architecture common to all the major DMS/CMS systems (you know the ones I refer to). Yet they claim to handle huge database workloads without any apparent detriment to performance (easily 1 TB of data and a million docs), whilst you are modest and careful in your assessment. Obviously I trust your opinion and modesty more on this, yet it is a pity that such a well-designed DMS should be left at the mercy of the big boys' manipulation.

Re: users and size

Posted: Fri Aug 24, 2012 4:41 pm
by jllort
When you talk about repository architecture, I do not know if you mean Jackrabbit or another DMS. Obviously we all use common DBMSs and application servers or servlet containers (Tomcat) like other DMSs; that is not DMS-specific architecture but normal web application architecture. Although all of them apparently use the same things, internally they have important differences, and each team decides to solve similar problems in different ways (sometimes with better results, sometimes worse). I do not like to talk about others; they really are good applications. Sincerely, I think there is space for all, and depending on users' needs, one is sometimes better than another. I think nobody will ever be able to implement the "definitive DMS application".

As for us, we finally left Jackrabbit in version 6.0; we use our own repository structure to get 100% control and avoid losing performance in some places. (Even so, our recognition goes to the Jackrabbit project: during the first years it was a good travel companion, and we may continue supporting its API in 6.0, but we have switched to our own.)

Finally, our project has been reverse-engineered by at least one other, and some source code parts, ideas, etc. have been copied exactly (copy and paste; I did some forensic analysis). Obviously all projects take inspiration and ideas from one another, which is normal, but past a certain level it is not so normal. That has made us somewhat cautious.

Re: users and size

Posted: Fri Aug 31, 2012 3:46 pm
by NickC
If I wish to try this on a 100-150 GB repository, and provided I change the database to Postgres and am careful on the taxonomy side as well (many folders with a small number of files each), do you think we would still need to optimise Java etc., considering that you no longer have any Jackrabbit limitations?
On the hardware side, both RAM and processing will be plentiful.

Re: users and size

Posted: Sun Sep 02, 2012 9:06 am
by jllort
The JVM should always have at least a minimal memory configuration (in run.sh, the JAVA_OPTS parameters hold the normal configuration; you can increase them to suit your needs, and if you have some stats application to trace hardware consumption, you will know whether to increase or decrease the JVM memory parameters). Depending on the concurrent users and the operations done in the system (antivirus, previewing, and text extractors are the top CPU consumers, easily hitting 100%), you will need more or fewer CPUs and more or less memory. Your users' feedback and your stats will point you in the right direction.
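
To make that concrete, here is a minimal sketch of the kind of JAVA_OPTS line meant, as it might appear in a Tomcat-era run.sh; the heap and permgen sizes are illustrative only and should be adjusted against your own monitoring:

    # Hypothetical run.sh fragment; values are examples, not recommendations.
    # -Xms/-Xmx set the initial/maximum heap; -XX:MaxPermSize matters on
    # the Java 6/7 JVMs current at the time of this thread.
    JAVA_OPTS="$JAVA_OPTS -Xms512m -Xmx2048m -XX:MaxPermSize=256m"
    export JAVA_OPTS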