
Best way to organize folders and files

Posted: Thu Dec 17, 2015 11:33 am
by openkm_user
Hi,

We have been using OpenKM for some time now, but we are still newbies when it comes to best practices. We expect to have about 15 million documents. I have gone through other threads on this topic and have a basic idea of a hierarchy that should work for us.

We have a list of Main folders, each containing Subfolders that hold the files. A Subfolder may sometimes contain another folder as well. Each Main folder is named with text, and each Subfolder is named with a number. A single Main folder could contain thousands of Subfolders.

The hierarchy will look like this:
/okm:root/Main1/1022593032/Files & a Folder
/okm:root/Main1/1022593033/Files & a Folder
..
..
..
/okm:root/Main1/1022601308/Files & a Folder
/okm:root/Main2/1022593074/Files & a Folder
/okm:root/Main2/1022593075/Files & a Folder
..
..
..
/okm:root/Main2/1025601308/Files & a Folder

Is this the right way to organize folders and files, or is there a better one? I ask because I read in another thread that having thousands of child nodes under one parent causes performance issues, and in our case each Main folder is going to have thousands of folders.

How will Categories, Metadata, Thesaurus, Templates, etc. affect this? Do we need to work with any of these besides uploading folders and files into the Taxonomy?

I am attaching a couple of screenshots to make this clearer.
[Attachment: Option1.JPG]
We will have thousands of accounts under each portfolio, and each account will have 1 to 20 documents. Is it better to have a taxonomy structure like the one above, or
[Attachment: option2.JPG]
Here all files are in the same folder, and each file name is prefixed with the name of the folder it would otherwise belong to.

What matters most to us is good performance when we search or perform other operations through the REST API.

Thanks in advance!

Re: Best way to organize folders and files

Posted: Fri Dec 18, 2015 2:02 pm
by openkm_user
Any help would be appreciated :)

Re: Best way to organize folders and files

Posted: Sat Dec 19, 2015 6:49 pm
by jllort
Good practice is to keep around 100 nodes per parent (500-1000 at most), but keep in mind that as the number of nodes per parent increases, performance decreases (the problem is basically in the UI: large tables need more time to render all the information).

At a quick glance, my suggestion would be to organize the documents based on their name, for example:
1022601308 -> /okm:root/1/0/2/2/6 (the depth depends on the number of files in your repository; 4 or fewer levels will probably be enough for you).
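A minimal Python sketch of the digit-based layout above, under stated assumptions: `digit_path` and `levels_needed` are hypothetical helper names, the depth of 5 simply reproduces the example, and the level estimate assumes names spread evenly across the ten digit folders at each level (whether they actually do depends on how your names are assigned).

```python
def digit_path(name: str, depth: int, root: str = "/okm:root") -> str:
    """Build a path of single-character folders from the first `depth` characters of `name`."""
    if len(name) < depth:
        raise ValueError(f"name {name!r} is shorter than depth {depth}")
    return "/".join([root, *name[:depth]])


def levels_needed(total_items: int, max_per_folder: int, fanout: int = 10) -> int:
    """Fewest `fanout`-way folder levels so that, if items spread evenly,
    no leaf folder holds more than `max_per_folder` items."""
    levels = 0
    capacity = max_per_folder  # items a single leaf can hold with no extra levels
    while capacity < total_items:
        capacity *= fanout     # each extra level multiplies the number of leaves
        levels += 1
    return levels
```

With these definitions, digit_path("1022601308", 5) returns /okm:root/1/0/2/2/6, and levels_needed(15_000_000, 2000) returns 4: four levels keep each leaf at 2000 items or fewer for 15 million evenly spread items, while a 1000-item cap would need 5.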

In this kind of scenario, a good practice is to copy the files to the server and use a crontab task to import and catalog them into OpenKM. The catalog process will create folders when needed and upload each document to its final destination.
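The import-and-catalog flow can be sketched locally in Python. This is only an illustration against a plain filesystem, not OpenKM code: in a real crontab task the folder creation and upload would go through the OpenKM API methods mentioned in this post (createMissingFolders, createSimple) using the systemToken. The file-naming convention (account number followed by an underscore) and the 3-level digit depth are assumptions.

```python
import shutil
from pathlib import Path


def catalog_inbox(inbox: Path, repo_root: Path, depth: int = 3) -> list[Path]:
    """Move each file in `inbox` to repo_root/<d1>/.../<dN>/<account>/<file>.

    Hypothetical convention: file names start with the account number and an
    underscore, e.g. '1022601308_invoice.pdf'.
    """
    moved = []
    for src in sorted(p for p in inbox.iterdir() if p.is_file()):
        account = src.name.split("_", 1)[0]
        if not account.isdigit() or len(account) < depth:
            continue  # leave files we cannot classify in the inbox for manual review
        dest_dir = repo_root.joinpath(*account[:depth], account)
        dest_dir.mkdir(parents=True, exist_ok=True)  # local analogue of "create missing folders"
        dest = dest_dir / src.name
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

Files that do not match the naming convention stay in the inbox, so a later run (or a person) can deal with them instead of having them filed in the wrong place.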

Take a look here:
http://wiki.openkm.com/index.php/Crontab

You may also be interested in:
createMissingFolders method -> http://docs.openkm.com/apidoc/com/okm/6 ... .String%29
createSimple method -> http://docs.openkm.com/apidoc/com/okm/6 ... tStream%29

From a crontab session you must always use the systemToken.

Re: Best way to organize folders and files

Posted: Mon Dec 21, 2015 5:46 pm
by openkm_user
Hi,

Thanks for the reply. You say UI performance will be a problem if we have lots of nodes per parent. We will be making API calls rather than using the OpenKM UI directly, so a slow UI isn't a big concern for us. We might have 10,000 nodes (folders) under each parent folder; will the performance of the system be acceptable in that case?

1022601308 is the name of the folder (which contains 1 to 20 documents) under the parent (portfolio) folder.

Thanks again!

Re: Best way to organize folders and files

Posted: Tue Dec 22, 2015 12:30 pm
by jllort
It is better to take the default UI performance into consideration. If some day your app malfunctions, you will need to look at the repository through the default UI, and then you will hit the problem. Retrieving a list of 10K nodes is always a bad idea; keep in mind that with the (REST) API you spend time marshalling and unmarshalling the data. For a few objects that time is small, but for a lot of files it is not a good idea. Always retrieve only the information you really need, rather than retrieving everything and then filtering. Another good approach is retrieving through the search engine (based on metadata, etc.), which usually performs well and supports pagination.
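Retrieving only what you need, page by page, can be sketched as a generic Python generator. `fetch_page(offset, limit)` is a hypothetical stand-in for a paginated search call; the actual paging parameters of the OpenKM search API are not assumed here. Because results are pulled lazily, a caller that stops early (for example with itertools.islice) never requests the remaining pages.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def paged(fetch_page: Callable[[int, int], Sequence[T]], page_size: int = 50) -> Iterator[T]:
    """Yield results one page at a time instead of pulling everything in one call.

    `fetch_page(offset, limit)` is a hypothetical paginated request that returns
    at most `limit` items; a short page signals the end of the results.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            return
        offset += page_size
```

The design choice is to keep the page size small and let the caller decide how much to consume, rather than marshalling thousands of nodes into a single response.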

I do not like 10K nodes per parent. I suggest you segment based on the folder name to get 100-200 nodes per parent. If you are retrieving through the API, especially through the search engine, you will get the best performance on API calls and on the network by moving less information. Retrieve only what you need (I don't know whether you really need to retrieve 10K nodes in a single call, or why; usually in these scenarios you are really building a report or performing an integration task). With more detailed information (the expected logic, the information flow, etc.), I can try to give you some clues for getting the best performance.
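To check whether a segmentation scheme actually yields 100-200 nodes per parent, one can count how names fall into segment folders. A minimal Python sketch; `leading`, `trailing` and `bucket_sizes` are hypothetical helper names, and the account numbers in the usage note are hypothetical sequential values modeled on the ones in this thread.

```python
from collections import Counter
from typing import Callable, Iterable


def leading(n: int) -> Callable[[str], str]:
    """Key function: segment folder = first n characters of the name."""
    return lambda name: name[:n]


def trailing(n: int) -> Callable[[str], str]:
    """Key function: segment folder = last n characters of the name."""
    return lambda name: name[-n:]


def bucket_sizes(names: Iterable[str], key: Callable[[str], str]) -> Counter:
    """Count how many names would land in each segment folder."""
    return Counter(key(name) for name in names)
```

For example, with 10,000 sequential account numbers starting at 1022593032, bucket_sizes(accounts, trailing(2)) gives 100 folders of exactly 100 accounts each, while leading(1) puts all 10,000 into folder "1". If account numbers are assigned sequentially, segmenting on trailing digits is therefore more likely to land in the 100-200 range than segmenting on leading digits.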

Re: Best way to organize folders and files

Posted: Tue Dec 22, 2015 2:10 pm
by openkm_user
Hi,

Thanks for your input. We might have 10K nodes per parent (though we will consider segmenting them into different folders as you suggested), but each API call is going to search for only one node (or at most a few nodes) in a parent at a time.

The usual use case will look like this:
/okm:root/Portfolio1/1022593032/Documents

The API call will search for 1022593032 (or a few more of these accounts) to retrieve the account and its document tree. Please let me know if you need more information to understand how we use the system.

Re: Best way to organize folders and files

Posted: Wed Dec 23, 2015 6:08 pm
by jllort
If I understood correctly, we are talking about something like "user accounts". Each user account will have some folders (not many) and a few documents, and each one is identified by a number like 1022593032. You will have about 10K user accounts, more or less. Is that right?

Re: Best way to organize folders and files

Posted: Wed Dec 23, 2015 7:29 pm
by openkm_user
Yes, you are right. Portfolios are going to have thousands of consumer accounts, and different companies will own those accounts (this will be handled with permissions). For example, PortfolioA will have Account1 (like 1022593032), Account2, Account3, ..., AccountN. CompanyA owns Account1 and Account2, and CompanyB owns Account3 (assume CompanyA and CompanyB are the users logging into OpenKM through the API).

So the API call will search for Account1 and/or Account2 when logged in as CompanyA, or for Account3 when logged in as CompanyB.
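The expected visibility rule can be modeled in a few lines of Python. In OpenKM itself this would be enforced by the permissions on each account folder, as described above, not by client code; the ownership table and `visible_accounts` are hypothetical in-memory stand-ins using the names from this example.

```python
from typing import Iterable, Mapping

# Hypothetical ownership table taken from the example above.
OWNERS = {"Account1": "CompanyA", "Account2": "CompanyA", "Account3": "CompanyB"}


def visible_accounts(owners: Mapping[str, str], company: str,
                     requested: Iterable[str]) -> list[str]:
    """Keep only the requested accounts owned by `company`, preserving order."""
    return [acc for acc in requested if owners.get(acc) == company]
```

So visible_accounts(OWNERS, "CompanyA", ["Account1", "Account2", "Account3"]) returns ["Account1", "Account2"], matching what a search run as CompanyA should be allowed to see.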

Am I making sense?

Re: Best way to organize folders and files

Posted: Fri Dec 25, 2015 11:24 am
by jllort
Give us some numbers: How many companies? How many portfolios per company? How many accounts in a portfolio? How many files in an account? An estimate is welcome.

Re: Best way to organize folders and files

Posted: Fri Dec 25, 2015 3:04 pm
by openkm_user
There will be approximately 100 companies and approximately 100 portfolios, each portfolio containing thousands of accounts, and each account will contain around 1 to 50 documents.

Companies will own accounts. For example, PortfolioA will have 1000 accounts: Company1 owns 300 of them, Company2 owns 400, and Company3 owns the remaining 300. Company1 might also own accounts from another portfolio.

Re: Best way to organize folders and files

Posted: Sun Dec 27, 2015 10:12 am
by jllort
My suggestion is:

company name / portfolio name / folder 0 to folder 9 / account name / files
An account goes into folder 0 if its name starts with 0, into folder 1 if it starts with 1, and so on.
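A minimal Python sketch of this layout, assuming the digit folders are simply named 0 to 9 and that the digit is the first character of the account name (`account_folder` is a hypothetical helper). As a rough count from the figures earlier in the thread: Company1 owning 300 PortfolioA accounts would give about 30 accounts per digit folder, but only if first digits were evenly spread, which depends on how account numbers are assigned.

```python
def account_folder(company: str, portfolio: str, account: str,
                   root: str = "/okm:root") -> str:
    """Path following company / portfolio / digit folder / account.

    The digit folder is the first character of the account name,
    so '1022593032' goes under folder '1'.
    """
    if not account or not account[0].isdigit():
        raise ValueError(f"account {account!r} must start with a digit")
    return f"{root}/{company}/{portfolio}/{account[0]}/{account}"
```

For example, account_folder("CompanyA", "PortfolioA", "1022593032") returns /okm:root/CompanyA/PortfolioA/1/1022593032.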

About metadata: for fast access, I suggest you create a property group with three metadata fields (company name -> select; portfolio name -> select or input, since I do not have enough information to decide between the two; account name -> input). Set the metadata on the account-name folder. Then, with the search API, you'll be able to quickly retrieve the folder UUID by account name and retrieve the documents inside it with the document API.
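The role of the three property-group fields can be illustrated with an in-memory Python model. This is not the OpenKM search API: `AccountFolder` and `find_folder_uuids` are hypothetical names, the UUIDs are made up, and the filter only mimics what a metadata search by account name would return before the documents are fetched with the document API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccountFolder:
    uuid: str       # folder UUID (values used with this sketch are made up)
    company: str    # metadata field: company name (select)
    portfolio: str  # metadata field: portfolio name (select or input)
    account: str    # metadata field: account name (input)


def find_folder_uuids(folders: list[AccountFolder], account: str,
                      company: Optional[str] = None) -> list[str]:
    """Return the UUIDs of folders whose account metadata matches,
    optionally restricted to one company."""
    return [f.uuid for f in folders
            if f.account == account and (company is None or f.company == company)]
```

Filtering on the company field as well is optional; it simply mirrors the idea that a search can combine several metadata fields.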

A better option would be to set the metadata on all documents and, based on the "set property" automation event, do the cataloging automatically (create missing folders, move the data there, and set the security). Keep in mind that you will probably want to set the security based on company or account.

Re: Best way to organize folders and files

Posted: Mon Dec 28, 2015 7:44 am
by openkm_user
Hi,

Thanks for the suggestion. Can you please explain a little more about setting metadata and automation for our purpose?

Thanks!

Re: Best way to organize folders and files

Posted: Tue Dec 29, 2015 4:01 pm
by jllort
Based on the PropertyGroup event, you can retrieve the metadata values, then create the destination folder and move the document there.
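The planning part of such an event handler can be sketched in plain Python. In OpenKM the values would come from the PropertyGroup getProperties method, and the folder creation and move would use the createMissingFolders and move methods linked below; none of those calls are reproduced here. The property keys (company, portfolio, account) and the digit-folder layout are assumptions carried over from the earlier suggestion.

```python
def plan_catalog(props: dict[str, str], existing: set[str],
                 root: str = "/okm:root") -> tuple[str, list[str]]:
    """Return (destination folder, folders still to create, parents first).

    `props` is a hypothetical dict of property values, e.g.
    {"company": "CompanyA", "portfolio": "PortfolioA", "account": "1022593032"}.
    """
    account = props["account"]
    # company / portfolio / first digit of account / account
    parts = [props["company"], props["portfolio"], account[0], account]
    path, missing = root, []
    for part in parts:
        path = f"{path}/{part}"
        if path not in existing:
            missing.append(path)
    return path, missing
```

Listing the missing folders parent-first makes it easy to see what has to exist before the document is moved into the destination folder.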

You should take a look at:
http://docs.openkm.com/apidoc/com/okm/6 ... Group.html ( getProperties method )
http://docs.openkm.com/apidoc/com/okm/6 ... older.html ( createMissingFolders method )
http://docs.openkm.com/apidoc/com/okm/6 ... ument.html ( move method )

Re: Best way to organize folders and files

Posted: Wed Dec 30, 2015 7:06 am
by openkm_user
Thank you. We will look into all of these and get back to you if we have questions, and I am sure we will have a lot.

Re: Best way to organize folders and files

Posted: Thu Dec 31, 2015 5:41 pm
by jllort
I'm not totally sure about the community development environment, but it should come with automation and crontab samples (http://sourceforge.net/projects/openkmportabledev/). You should start by getting familiar with the Automation task (set property group).