Turn off indexing temporarily

 #43783  by openkm_user
 
Hello,

We are importing a large number of files. Each batch can be 10,000 documents, totaling 600 MB in size.

We are facing slowness periodically. On further investigation via Resource Monitor in Windows, we see constant heavy I/O activity on the repository/index folder. I believe you might be indexing each file as it is imported. Is there a way to turn off indexing during the import so that we can rebuild the Lucene index later? Any thoughts are appreciated.

[Attachment: index.jpg — Resource Monitor screenshot]

Thanks!
 #43794  by jllort
 
I'm not sure whether the Community version has this feature, but the solution is not disabling the Lucene index. The solution has two parts:
1- make Lucene indexing asynchronous
2- stop the text extractor crontab task (it consumes a lot of resources)

How many cores do you have in your computer?
What database are you using ?
What OS ?
Virtualized or dedicated server ?
 #43795  by jllort
 
I have reviewed the documentation of the Professional version, which includes a parameter hibernate.search.worker.execution (values: sync, async) so that the Lucene search engine does not stall document uploading. Unfortunately, this parameter is still not available in the Community version. It should appear at https://docs.openkm.com/kcenter/view/ok ... eters.html, but it does not.
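For reference, hibernate.search.worker.execution is a standard Hibernate Search property. A hedged sketch of what such a setting could look like (the thread-pool property comes from the Hibernate Search documentation; its availability and the exact configuration file in any given OpenKM build are assumptions to verify against your version):

```
# Run Lucene index work in background threads so document
# uploads are not stalled by indexing (per the post above,
# Professional version only; not available in Community).
hibernate.search.worker.execution=async

# Optional Hibernate Search setting: size of the async
# indexing thread pool.
hibernate.search.worker.thread_pool.size=4
```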
 #43798  by openkm_user
 
jllort wrote: Fri May 12, 2017 6:57 pm I'm not sure whether the Community version has this feature, but the solution is not disabling the Lucene index. The solution has two parts:
1- make Lucene indexing asynchronous
2- stop the text extractor crontab task (it consumes a lot of resources)

How many cores do you have in your computer?
What database are you using ?
What OS ?
Virtualized or dedicated server ?
Text extractor crontab task already disabled.

How many cores do you have in your computer? 16
What database are you using? MySQL 5.7.14
What OS? Windows Server 2012
Virtualized or dedicated server? Virtualized
 #43803  by jllort
 
How often are you uploading these 10K files (every day?)? What total repository size do you think you will have at the end? How do you store documents in OpenKM (what folder structure do you use, and what is the logic)? How many files do you have per folder, and what is the maximum?

My first approach would be to separate the database server from the OpenKM application server, and not to assign more than eight cores to the database. You should also check my.cnf to optimize MySQL for writing. I also suggest Linux rather than Windows if you want performance. With more data I can try to provide more clues; anyway, the main problem is that the Lucene search engine will periodically stall the process (I mean a periodic delay of milliseconds). If you use an application like JProfiler in combination with the source code, you'll see what I'm talking about.
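As a sketch of the kind of my.cnf write tuning meant here, the fragment below uses standard MySQL 5.7 InnoDB options; the values are illustrative assumptions to be sized against the server's RAM, not recommendations from the post:

```
[mysqld]
# Give InnoDB most of the dedicated DB server's RAM (illustrative value)
innodb_buffer_pool_size = 8G

# A larger redo log reduces checkpoint flushing under heavy writes
innodb_log_file_size = 1G

# Trade a little durability for write throughput during bulk imports:
# flush the log to disk once per second instead of on every commit
innodb_flush_log_at_trx_commit = 2

# Avoid double-buffering data through the OS page cache
innodb_flush_method = O_DIRECT
```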
 #43811  by openkm_user
 
How often are you uploading these 10K files (every day?)? What total repository size do you think you will have at the end?

We are uploading more than 10K files daily, and 50K docs a day if possible. Ultimately we will upload a total of 1 TB of data consisting of about 3 million documents. We will not use the OpenKM frontend, so frontend slowness does not matter to us. If we get good performance via REST, that is enough.
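A bulk REST import along these lines could be sketched as below. The host, credentials, and the document/createSimple endpoint are assumptions based on the OpenKM REST documentation and should be verified against the installed version; the path-mapping helper mirrors the folder layout described later in this thread:

```python
import os


OKM_BASE = "http://localhost:8080/OpenKM"   # assumed host and context path
AUTH = ("okmAdmin", "admin")                 # assumed credentials


def target_path(portfolio: str, bucket: int, filename: str) -> str:
    """Map a local file into the layout described in this thread:
    /okm:root/Portfolios/<P>/<P>_1/Folder<m>/<file>, with at most
    200 Folder<m> subfolders per portfolio section."""
    folder = f"Folder{bucket % 200 + 1}"
    return f"/okm:root/Portfolios/{portfolio}/{portfolio}_1/{folder}/{filename}"


def upload(local_file: str, doc_path: str) -> None:
    """Create one document via the REST API. The endpoint name
    (document/createSimple) is an assumption to check against your
    OpenKM version's REST docs."""
    import requests  # third-party; imported lazily so target_path works without it

    with open(local_file, "rb") as fh:
        resp = requests.post(
            f"{OKM_BASE}/services/rest/document/createSimple",
            auth=AUTH,
            data={"docPath": doc_path},
            files={"content": (os.path.basename(local_file), fh)},
        )
    resp.raise_for_status()
```

For a 50K-docs/day target, the upload calls would normally be driven from a small worker pool rather than a single loop, so the server-side indexing delay overlaps with network transfer.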

How do you store documents in OpenKM (what folder structure do you use, and what is the logic)? How many files do you have per folder, and what is the maximum?

We are storing the docs in folders such as Folder1, Folder2, etc. These folders reside under Root-->Portfolios-->PortfolioName-->PortfolioName_1-->Folder1. Each PortfolioName_1, PortfolioName_2, etc. has at most 200 subfolders.

My first approach would be to separate the database server from the OpenKM application server.

I am not sure what you mean by separating the DB server from the OpenKM application server. Are you telling us to run MySQL on one physical server and Tomcat on another? We are already using MySQL for the database instead of the Hypersonic database that comes by default with OpenKM.

And do not assign more than eight cores to the database.

I can do this but why do you suggest this?

I also suggest Linux rather than Windows if you want performance.

We are required by the client to use Windows.

We are not doing ANY text extraction at all, since we don't need it. The only search capability we need is locating files by file name.
 #43821  by jllort
 
First of all, understand that this is not a standard installation and you need a lot of knowledge to succeed with it. It's not easy to explain all the problems you will run into during the process, but I will try to give you some clues.

You need performance -> how?
- Move MySQL to another server, totally separated, because you need to apply tuning parameters to the database. Why not use more than 8 cores? Because I have migrated 2.5 million documents, and with 8 cores I got better performance than with 10 or 12. The reason is complex to explain, but it relates to disk I/O performance, write buffers, the OS, etc. In your case I'm not sure; that's why in these scenarios it is a good idea to have a professionally run database supported by specialized people.
- More records mean more problems for the database -> obviously you cannot decrease the number of documents, but you should aim for as few folders as possible while keeping 1000-2000 documents per folder, not more (you must find the right number).
- You must disable all text extractors and also stop the "text extract" crontab task.
- I do not understand this folder structure "Root-->Portfolios-->PortfolioName-->PortfolioName_1-->Folder1". Why? Depending on the kind of problem, there might be better ways to store and retrieve the data.
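The folder-sizing advice above (few folders, roughly 1000-2000 documents in each) can be sketched as a simple bucketing helper. The names and the 1500-per-folder default are illustrative assumptions, not values from the post:

```python
def plan_folders(doc_names, docs_per_folder=1500):
    """Assign documents to sequential folders, capped at
    docs_per_folder each, so the folder count stays as low as
    possible while no folder exceeds the 1000-2000 range
    suggested above."""
    plan = {}
    for i, name in enumerate(doc_names):
        folder = f"Folder{i // docs_per_folder + 1}"
        plan.setdefault(folder, []).append(name)
    return plan


# 3,200 documents -> 3 folders: 1500 + 1500 + 200
layout = plan_folders([f"doc{i}.pdf" for i in range(3200)])
```

In a real import, the bucket index for each document would be carried through to the repository path used by the REST upload, so the structure is fixed before any document is created.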

It would help to have a short description of the data OpenKM will store and how the third-party system will request it (for example: invoices retrieved by invoice number, or a document UUID stored in the CRM and used to retrieve the document). As you can see, there are two ways for a third-party application to interact with OpenKM; this is one of the most important things to take into consideration, and the decision here will give you better or worse performance.

In this scenario I can only suggest going for paid support (OpenKM or another application). Even with a paid subscription, this kind of project has a lot of problems and can sometimes fail. I don't know your experience with this scenario, but if this is your first approach at these sizes, I suggest for your own safety spending money on a supported version; it will be cheaper than the hours and frustration that might be around the corner.

We can try to guide you from this forum, but these projects need a lot of time, thought, and meetings before taking a decision. For repositories with more than 5 million documents, I suggest going with a commercially supported database (Percona or similar people can help with it; there's a cost, but also a reason for it).
