• Best way to organize folders and files

  • We tried to make OpenKM as intuitive as possible, but advice is always welcome.
 #41076  by openkm_user
 
Hi,

We have been using OpenKM for some time now, but we are still new to its best practices. We are going to have about 15 million documents. We have gone through other threads on this topic and have a basic idea of the hierarchy that will help us.

We have a list of Main folders, each containing Subfolders that hold the files. A Subfolder might sometimes contain another folder as well. Each Main folder name will be a text value and each Subfolder name will be a number. Inside each Main folder there could be thousands of Subfolders.

The hierarchy will look like this:
/okm:root/Main1/1022593032/Files & a Folder
/okm:root/Main1/1022593033/Files & a Folder
..
..
..
/okm:root/Main1/1022601308/Files & a Folder
/okm:root/Main2/1022593074/Files & a Folder
/okm:root/Main2/1022593075/Files & a Folder
..
..
..
/okm:root/Main2/1025601308/Files & a Folder

Is this the right way to organize folders and files, or is there a better one? I read in another thread that having thousands of child nodes causes performance issues, and in our case each Main folder is going to have thousands of folders.

How will Categories, Metadata, Thesaurus, Templates, etc. affect this? Do we have to work on any of these, beyond uploading folders and files into Taxonomy?

Attaching a couple of screenshots for better understanding.
[Attachment: Option1.JPG]
We will have thousands of accounts under each portfolio, and each account will have 1 to 20 documents. Is it better to have a taxonomy structure like the one above, or like this:
[Attachment: option2.JPG]
Here we have all the files in the same folder, with the folder name prefixed to each file name.

What matters most to us is good performance when we search or perform other operations through the REST API.

Thanks in advance!
 #41086  by openkm_user
 
Appreciate anybody's help :).
 #41093  by jllort
 
A good practice is to keep around 100 nodes per parent (500-1000 at most), but keep in mind that performance decreases as the number of nodes per parent increases (basically the problem is in the UI: large tables need more time to render all the information).

At a quick glance, my suggestion would be to organize the documents based on name, for example:
1022601308 -> /okm:root/1/0/2/2/6 (the depth depends on the number of files in your repository; 4 levels or fewer would probably be enough for you).

In this kind of scenario, a good practice is to copy the files to the server and use a crontab task to import and catalog them into OpenKM. The catalog process will create folders when needed and upload each document to its final destination.
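
The name-based segmentation above can be sketched in a few lines. This is only an illustration of how the path is derived, not OpenKM code: the function name `sharded_path` and its parameters are hypothetical, and the resulting path would still have to be created in the repository (for example with the createMissingFolders method mentioned below).

```python
def sharded_path(name: str, depth: int = 5, root: str = "/okm:root") -> str:
    """Build a folder path from the first `depth` characters of `name`.

    Hypothetical helper: each leading character becomes one folder level.
    """
    if depth < 0:
        raise ValueError("depth must not be negative")
    # Slicing a string yields its leading characters; unpack them as path parts.
    return "/".join([root, *name[:depth]])


print(sharded_path("1022601308"))           # /okm:root/1/0/2/2/6
print(sharded_path("1022601308", depth=4))  # /okm:root/1/0/2/2
```

A smaller `depth` gives fewer, fuller folders; a larger one gives deeper, emptier ones.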

Take a look here:
http://wiki.openkm.com/index.php/Crontab

You may also be interested in:
createMissingFolders method -> http://docs.openkm.com/apidoc/com/okm/6 ... .String%29
createSimple method -> http://docs.openkm.com/apidoc/com/okm/6 ... tStream%29

From a crontab session you must always use the systemToken.
 #41098  by openkm_user
 
Hi,

Thanks for the reply. You say the performance of the UI will be a problem if we have lots of nodes per parent. We will be making API calls and will not be using the OpenKM UI directly, so a slow UI isn't a big concern for us. We might have 10,000 nodes (folders) under each parent folder; will the performance of the system be acceptable in that case?

1022601308 is the name of the folder (which contains 1 to 20 documents) under the parent (portfolio) folder.

Thanks again!
 #41104  by jllort
 
It is better to take the default UI performance into consideration. If some day your application malfunctions, you will need to look at the data through the default UI, and then you will hit the problem. Retrieving a list of 10K nodes is always a bad idea; keep in mind that with the REST API you also spend time marshalling and unmarshalling the data. For a few objects that time is small, but for a lot of files it is not a good idea. Always retrieve only the information you really need, rather than retrieving everything and then filtering. Another good approach is to retrieve from the search engine (based on metadata, etc.), which usually works well and supports pagination.

I do not like 10K nodes per parent. I suggest you create some segmentation based on folder name to get 100-200 nodes per parent. If you are retrieving through the API, especially from the search engine, you will get the best performance on API calls and move less information over the network. Retrieve only what you need (I do not know whether you really need to retrieve 10K nodes in a single call, or why; usually in such scenarios you are really building a report or performing an integration task). With more detailed information (the expected logic, the information flow, etc.) I can try to give you some clues for getting the best performance.
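
As a rough way to size that segmentation, the arithmetic can be sketched as below. It assumes one folder level per digit (so 10 children per level) and that names spread evenly across digits, which real account numbers may not do (the sample numbers in this thread share a long common prefix). The function name `levels_needed` is hypothetical.

```python
def levels_needed(total_nodes: int, max_per_parent: int, fanout: int = 10) -> int:
    """Count the segmentation levels needed so no parent exceeds max_per_parent.

    Assumes nodes are spread evenly over `fanout` children at every level.
    """
    if total_nodes < 0 or max_per_parent < 1 or fanout < 2:
        raise ValueError("invalid arguments")
    levels = 0
    per_parent = total_nodes
    while per_parent > max_per_parent:
        # Ceiling division: each extra level splits a parent into `fanout` parts.
        per_parent = -(-per_parent // fanout)
        levels += 1
    return levels


print(levels_needed(10_000, 200))  # 2
```

With 10K folders per parent and a target of 100-200, two digit levels leave about 100 folders in each bucket under these assumptions.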
 #41105  by openkm_user
 
Hi,

Thanks for your input. We might have 10K nodes per parent (though we will consider segmenting them into different folders as you suggested), but each API call is going to search for only 1 node (or at the very most a few more nodes) in a parent at a time.

The usual use-case is going to be like,
/okm:root/Portfolio1/1022593032/Documents

The API call will search for 1022593032 (or a few more of these accounts) to retrieve the accounts and their document tree. Please let me know if you need more information to understand our use of the system.
 #41110  by jllort
 
If I understood correctly, we are talking about something like "user accounts". Each user account will have some folders (not many) and a few documents. Each user account is identified by a number such as 1022593032. You will have about 10K user accounts, more or less. Is that right?
 #41111  by openkm_user
 
Yes, you are right. Portfolios are going to have thousands of consumer accounts, different companies are going to own those accounts (will be handled with permissions). For example, PortfolioA will have Account1 (like 1022593032), Account2, Account3,..., Accountn. CompanyA owns Account1 and Account2, and CompanyB owns Account3 (consider CompanyA and CompanyB are the users logging into OpenKM through API).

So API call will search for Account1 and/or Account2 if logged in through CompanyA or Account3 if logged in through CompanyB.

Am I making sense?
 #41117  by jllort
 
Give us some numbers: how many companies? How many portfolios per company? How many accounts in a portfolio? How many files in an account? An estimate is welcome.
 #41120  by openkm_user
 
There will be approximately 100 companies and approximately 100 portfolios, each containing thousands of accounts, and each account contains around 1 to 50 documents.

Companies will own accounts. For example, PortfolioA will have 1000 accounts: Company1 owns 300 accounts, Company2 owns 400 accounts and Company3 owns the remaining 300. Company1 might also own accounts from another portfolio.
 #41123  by jllort
 
My suggestion is:

company name / portfolio name / folder 0 to folder 9 / account name / files
The account goes in folder 0 if the account name starts with that digit, and so on for the other digits.
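
A minimal sketch of that layout, for illustration only. The function name `account_folder` is hypothetical, and it assumes the ten bucket folders are named simply "0" to "9" under /okm:root; adjust to whatever naming you actually choose.

```python
def account_folder(company: str, portfolio: str, account: str,
                   root: str = "/okm:root") -> str:
    """Return company / portfolio / first-digit bucket / account as one path.

    Hypothetical helper: the bucket is the first digit of the account name.
    """
    if not account or account[0] not in "0123456789":
        raise ValueError("account name must start with a digit")
    bucket = account[0]
    return f"{root}/{company}/{portfolio}/{bucket}/{account}"


print(account_folder("CompanyA", "PortfolioA", "1022593032"))
# /okm:root/CompanyA/PortfolioA/1/1022593032
```

If most account names start with the same digit, one bucket will hold nearly everything, so it is worth checking how the real names are distributed before settling on the first digit.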

About metadata: for fast access, I suggest you create a property group with three metadata fields (company name -> select; portfolio name -> select or input, I do not have enough information to decide between the two; and account name -> input field). Set the metadata on the account name folder. Then with the search API you will be able to quickly retrieve the folder UUID by account name, and retrieve the documents inside it with the document API.

A better option would be to set the metadata on all documents and, based on the "set property" automation, do the cataloging automatically (create missing folders, move the data there, and set the security). Keep in mind that you will probably want to set the security based on company or account.
 #41126  by openkm_user
 
Hi,

Thanks for the suggestion. Can you please explain a little more about setting metadata and automation for our purpose?

Thanks!
 #41138  by jllort
 
Based on the PropertyGroup event, you can retrieve the metadata values, then create the new destination folder and move the document there.

You should take a look at
http://docs.openkm.com/apidoc/com/okm/6 ... Group.html ( getProperties method )
http://docs.openkm.com/apidoc/com/okm/6 ... older.html ( createMissingFolders method )
http://docs.openkm.com/apidoc/com/okm/6 ... ument.html ( move method )
 #41143  by openkm_user
 
Thank you, we will look into all these and get back in case of any doubts, and I am sure we will have a lot.
