OpenKM taking up constant CPU

 #43190  by openkm_user
 
Hi,

Even though no activity is being performed in OpenKM, I see that the OpenKM Java process constantly takes 15% CPU. Can you please explain why?

The server is a Xeon with 16 GB of RAM.

However, two days ago we imported about 100,000 documents into the system. Is there some remaining import process that is using up the CPU?

Thanks!
 #43209  by jllort
 
After a document is created, it goes into the "text extraction pending queue". Periodically, a crontab task named "text extractor worker" starts a thread that extracts text from the queued documents so they can be indexed by content. You can configure how aggressive this process is; by default, if you have not changed the associated parameters, the application uses a single thread (a single core).
 #43498  by openkm_user
 
Thank you for your prompt response. I am sorry for the late reply. We have imported about 500,000 documents into our repository. What you say makes sense.

However, we really don’t need to extract text from imported content, so we have stopped the ‘text extractor’ in the crontab. Can you please let us know how to NOT put the imported content in the text extraction queue? We just want to import it into the database.

We notice that the OpenKM Tomcat process is constantly using about 7% CPU. After some investigation we found that com.openkm.core.updateinfo is what is using this CPU. What exactly does it do? It appears we can set updateinfo to false in the config file so that it will not run, BUT what is the purpose of this updateinfo? Is it the same text extractor that you mentioned, or is it something different?

Thanks again :).
 #43499  by jllort
 
The UpdateInfo class connects to the OpenKM server and delivers notifications from our side to the application, such as "a new version has been released, you can upgrade your installation".

Take a look at this OpenKM documentation section: https://docs.openkm.com/kcenter/view/ok ... 0Nodetypes
You should be interested in OKM_NODE_DOCUMENT, and especially in two fields:
NDC_TEXT_EXTRACTED ( when the value is T, the document has been processed by the queue; when the value is F, it is still in the queue. By changing the value you can add files to or remove files from the queue ).
NDC_TEXT, which contains the text extracted by the text extractor process. You can update this field manually or let the automatic process do it, but keep in mind that a change made directly in the database is not visible to the search engine ( you have not made the change through the API, so from the Lucene search engine's point of view nothing has changed ). If you modify any field directly in the database and want to propagate the change to the search engine, you must reindex the whole repository: https://docs.openkm.com/kcenter/view/ok ... dexes.html ( Lucene indexes option. In your case, with 500K files, it can take some hours to complete )
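
As a rough illustration of removing files from the queue by changing NDC_TEXT_EXTRACTED, here is a minimal sketch. It is only an assumption of how this could look: it runs against an in-memory SQLite table that stands in for the real OpenKM database, the ID column and the two helper functions are hypothetical, and only the table name OKM_NODE_DOCUMENT and the fields NDC_TEXT_EXTRACTED and NDC_TEXT come from the description above. Back up the database before trying anything similar on a real installation.

```python
import sqlite3


def pending_count(conn):
    """Count documents still waiting in the text extraction queue (value F)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM OKM_NODE_DOCUMENT WHERE NDC_TEXT_EXTRACTED = 'F'"
    ).fetchone()
    return row[0]


def drain_extraction_queue(conn):
    """Flag every pending document as processed (F -> T); return rows changed."""
    cur = conn.execute(
        "UPDATE OKM_NODE_DOCUMENT SET NDC_TEXT_EXTRACTED = 'T' "
        "WHERE NDC_TEXT_EXTRACTED = 'F'"
    )
    conn.commit()
    return cur.rowcount


# Stand-in for the real database: a tiny in-memory table.
# The ID column is hypothetical; the real table has more columns and its own key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OKM_NODE_DOCUMENT ("
    "ID TEXT PRIMARY KEY, NDC_TEXT_EXTRACTED TEXT, NDC_TEXT TEXT)"
)
conn.executemany(
    "INSERT INTO OKM_NODE_DOCUMENT VALUES (?, ?, ?)",
    [
        ("doc-1", "F", None),                 # still in the queue
        ("doc-2", "F", None),                 # still in the queue
        ("doc-3", "T", "already extracted"),  # already processed
    ],
)

before = pending_count(conn)
changed = drain_extraction_queue(conn)
after = pending_count(conn)
print(f"pending before={before}, rows changed={changed}, pending after={after}")
```

Documents flagged this way keep an empty NDC_TEXT, so they would not be searchable by content, and as noted above the search engine does not see changes made directly in the database.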
 #43506  by openkm_user
 
Thanks for the detailed explanation. Can you please let me know how to turn off com.openkm.core.updateinfo? Is there any configuration parameter available in Administration?
 #43547  by openkm_user
 
Thanks. First of all, we tried setting update.info= false in the openkm.cfg file, but it did not work: updateinfo continued to run and consume CPU. Then we modified the config file to manually set it to false, and this worked. However, updateinfo() calls com.openkm.core.RepositoryInfo().run(). Internally this run() method calls runAs(), which calls methods such as okmStats.getFoldersByContext(). What is the purpose of these okmStats methods? Are they used anywhere?
 #43553  by jllort
 
I think it might be a crontab task. There is a crontab task named "Repository info" that runs daily and is used to calculate statistics. Try stopping it.
