• Keyword slow down import?

  • We tried to make OpenKM as intuitive as possible, but an advice is always welcome.
 #43777  by openkm_user
 
Hi,

We currently have about 6 million documents, and some queries take a long time to execute, which slows down the import process. One of them involves keywords. Do keywords slow down importing? For each document, a keyword is set based on the document name, and permissions are granted based on certain conditions (again derived from the document name).

I am not sure where I read it, but one of the threads mentioned that metadata is a better option than keywords (or is using keywords a very bad option?).

Thanks!
 #43779  by openkm_user
 
Code:
select COUNT(DISTINCT(NKW_KEYWORD)) from OKM_NODE_KEYWORD
This returns 24. Each document gets exactly one of these 24 keywords, so we are not assigning a unique keyword to each document.
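
To see how the documents are spread across those 24 keywords, the count can be broken down per keyword. The following is only a minimal sketch: it runs against an in-memory SQLite stand-in rather than the real OpenKM database, the `NKW_NODE` column name and the sample rows are assumptions made for illustration, and only `OKM_NODE_KEYWORD` and `NKW_KEYWORD` come from the query above.

```python
import sqlite3


def keyword_usage(conn):
    """Return (keyword, node_count) rows, most used keyword first."""
    sql = (
        "SELECT NKW_KEYWORD, COUNT(*) AS NODES "
        "FROM OKM_NODE_KEYWORD "
        "GROUP BY NKW_KEYWORD "
        "ORDER BY NODES DESC, NKW_KEYWORD"
    )
    return conn.execute(sql).fetchall()


# Stand-in table for the demo; NKW_NODE is an assumed column name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OKM_NODE_KEYWORD (NKW_NODE TEXT, NKW_KEYWORD TEXT)")
conn.executemany(
    "INSERT INTO OKM_NODE_KEYWORD VALUES (?, ?)",
    [("n1", "invoice"), ("n2", "invoice"), ("n3", "contract")],
)

usage = keyword_usage(conn)
for keyword, nodes in usage:
    print(keyword, nodes)
```

On the real database the same SELECT statement could be run directly in the database client, assuming the table layout matches.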
 #43793  by jllort
 
Using keywords in this scenario is a very bad idea. Keywords can work for small companies with 5-10 users who will not create many of them, or with a controlled dictionary (like a thesaurus), but even then I do not recommend them. For several reasons, which I will not explain now to keep this answer short, it is always better to use metadata: with metadata you can do the same as with keywords or categories, and more. For performance reasons we discourage keywords. Keep in mind that the dashboard draws a tag cloud, and with a tag cloud of 25K keywords the browser can stall for several minutes.

Do you have 6 million docs in the OpenKM Community version? Is that your scenario? Which OpenKM version are you using?
 #43799  by openkm_user
 
Yes, we have 6 million documents in OpenKM Community 6.3.3. We periodically purge the dashboard table since it is not required for our purpose; we use OpenKM as a container and access everything through the REST API.

If keywords are definitely a bad idea: I read somewhere that you mentioned keywords can be converted to metadata. Can you please let us know how to do that?
 #43804  by jllort
 
The best option should be a crontab task or scripting task that iterates across the whole repository, gets the keywords, converts them to metadata, and removes the keywords. Consider these URLs as a starting point:
https://docs.openkm.com/kcenter/view/ok ... rsal-.html
https://docs.openkm.com/kcenter/view/ok ... etChildren (lists the documents in a folder; each returned Document includes its associated keywords)
https://docs.openkm.com/kcenter/view/ok ... oveKeyword (removes the keyword)
https://docs.openkm.com/kcenter/view/ok ... tiesSimple (sets the metadata)
https://docs.openkm.com/kcenter/view/ok ... field.html (a single input field is enough for your case).
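
The steps above (traverse the repository, read each document's keywords, write them as metadata, remove the keyword) can be sketched roughly as follows. This is not OpenKM's API: `InMemoryRepo` and its method names are hypothetical stand-ins that only mirror the operations named in the links, and the group and field names `okg:migrated` and `okp:migrated.keyword` are made up for illustration. A real script would replace the stand-in with calls to the actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    path: str
    keywords: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Folder:
    path: str
    folders: list = field(default_factory=list)
    documents: list = field(default_factory=list)


class InMemoryRepo:
    """Hypothetical stand-in for a real API client (assumption, not OpenKM code)."""

    def get_folder_children(self, folder):
        return list(folder.folders)

    def get_document_children(self, folder):
        return list(folder.documents)

    def set_properties_simple(self, doc, group, props):
        doc.properties.update(props)

    def remove_keyword(self, doc, keyword):
        doc.keywords.discard(keyword)


GROUP = "okg:migrated"          # hypothetical metadata group name
FIELD = "okp:migrated.keyword"  # hypothetical single input field


def migrate(repo, root):
    """Walk every folder under root and move keywords into metadata."""
    migrated = 0
    stack = [root]
    while stack:
        folder = stack.pop()
        stack.extend(repo.get_folder_children(folder))
        for doc in repo.get_document_children(folder):
            # sorted() copies the set, so removing keywords while looping is safe.
            for keyword in sorted(doc.keywords):
                # Write the metadata first so a failure never loses the value.
                repo.set_properties_simple(doc, GROUP, {FIELD: keyword})
                repo.remove_keyword(doc, keyword)
                migrated += 1
    return migrated


root = Folder(
    "/okm:root",
    folders=[Folder("/okm:root/a", documents=[Document("/okm:root/a/x.pdf", {"invoice"})])],
    documents=[Document("/okm:root/y.pdf", {"contract"})],
)
count = migrate(InMemoryRepo(), root)
print(count)
```

The sketch relies on each document carrying a single keyword, as described earlier in the thread; with several keywords per document, a single input field would keep only the last one written. An explicit stack is used instead of recursion so that deep folder trees do not hit the recursion limit.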

6 million docs is very big; in this kind of scenario you must consider a lot of things. You certainly need to tune the database once you go above 1 million docs. How do you catalog files, i.e. how do you set the folder?

A good idea is to separate the application server from the database server. Investigate the performance configuration for writing and for the cache.

Are you using security in your repository, or only a single administrator user? (The best scenario is to remove any kind of security in the repository and use only an administrator connection, but obviously that is better if you do it from the beginning.)
 #43836  by openkm_user
 
Thanks, I will take a look at all these.
