Turn off indexing temporarily

 #43783  by openkm_user
 
Hello,

We are importing a large number of files. Each batch can be 10,000 documents, totaling 600 MB in size.

We are facing slowness periodically. On further investigation via Resource Monitor in Windows, we see constant heavy I/O activity on the repository/index folder. I believe you might be indexing each file as it is imported. Is there a way to turn off indexing during the import so that we can rebuild the Lucene index later? Any thoughts are appreciated.

[Attachment: index.jpg — Resource Monitor screenshot]

Thanks!
 #43794  by jllort
 
I'm not sure whether the Community version has this feature, but the solution is not disabling the Lucene index. The solution has two parts:
1- make Lucene indexing asynchronous
2- stop the text extractor crontab task (it consumes a lot of resources)

How many cores do you have in your computer?
What database are you using ?
What OS ?
Virtualized or dedicated server ?
 #43795  by jllort
 
I have reviewed the documentation of the Professional version, which includes a parameter hibernate.search.worker.execution (values: sync, async) so that the Lucene search engine does not stall document uploading. Unfortunately, this parameter is still not available in the Community version. It should appear at https://docs.openkm.com/kcenter/view/ok ... eters.html, but it does not.
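For reference, hibernate.search.worker.execution is a standard Hibernate Search property. A hedged sketch of what such a setting could look like (the thread-pool property comes from the Hibernate Search documentation; its availability and the exact configuration file in any given OpenKM build are assumptions to verify against your version):

```
# Run Lucene index work in background threads so document
# uploads are not stalled by indexing (per the post above,
# Professional version only; not available in Community).
hibernate.search.worker.execution=async

# Optional Hibernate Search setting: size of the async
# indexing thread pool.
hibernate.search.worker.thread_pool.size=4
```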
 #43798  by openkm_user
 
jllort wrote: Fri May 12, 2017 6:57 pm I'm not sure whether the Community version has this feature, but the solution is not disabling the Lucene index. The solution has two parts:
1- make Lucene indexing asynchronous
2- stop the text extractor crontab task (it consumes a lot of resources)

How many cores do you have in your computer?
What database are you using ?
What OS ?
Virtualized or dedicated server ?
Text extractor crontab task already disabled.

How many cores do you have in your computer? 16
What database are you using? MySQL 5.7.14
What OS? Windows Server 2012
Virtualized or dedicated server? Virtualized
 #43803  by jllort
 
How often are you uploading these 10K files (every day?)? What total repository size do you think you will have at the end? How do you store documents in OpenKM (what folder structure do you use, and what is the logic)? How many files do you have per folder, and what is the maximum?

My first approach would be to separate the database server from the OpenKM application server, and not to assign more than eight cores to the database. You should also check my.cnf to optimize MySQL for writing. I also suggest Linux rather than Windows if you want performance. With more data I can try to provide more clues; anyway, the main problem is that the Lucene search engine will periodically stall the process (I mean a periodic delay of milliseconds). If you use an application like JProfiler in combination with the source code, you'll see what I'm talking about.
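As a sketch of the kind of my.cnf write tuning meant here, the fragment below uses standard MySQL 5.7 InnoDB options; the values are illustrative assumptions to be sized against the server's RAM, not recommendations from the post:

```
[mysqld]
# Give InnoDB most of the dedicated DB server's RAM (illustrative value)
innodb_buffer_pool_size = 8G

# A larger redo log reduces checkpoint flushing under heavy writes
innodb_log_file_size = 1G

# Trade a little durability for write throughput during bulk imports:
# flush the log to disk once per second instead of on every commit
innodb_flush_log_at_trx_commit = 2

# Avoid double-buffering data through the OS page cache
innodb_flush_method = O_DIRECT
```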
 #43811  by openkm_user
 
How often are you uploading these 10K files (every day?)? What total repository size do you think you will have at the end?

We are uploading more than 10K files daily, and 50K docs a day if possible. Ultimately we will upload a total of 1 TB of data consisting of about 3 million documents. We will not use the OpenKM frontend, so frontend slowness does not matter to us. If we get good performance via REST, that is enough.
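A bulk REST import along these lines could be sketched as below. The host, credentials, and the document/createSimple endpoint are assumptions based on the OpenKM REST documentation and should be verified against the installed version; the path-mapping helper mirrors the folder layout described later in this thread:

```python
import os


OKM_BASE = "http://localhost:8080/OpenKM"   # assumed host and context path
AUTH = ("okmAdmin", "admin")                 # assumed credentials


def target_path(portfolio: str, bucket: int, filename: str) -> str:
    """Map a local file into the layout described in this thread:
    /okm:root/Portfolios/<P>/<P>_1/Folder<m>/<file>, with at most
    200 Folder<m> subfolders per portfolio section."""
    folder = f"Folder{bucket % 200 + 1}"
    return f"/okm:root/Portfolios/{portfolio}/{portfolio}_1/{folder}/{filename}"


def upload(local_file: str, doc_path: str) -> None:
    """Create one document via the REST API. The endpoint name
    (document/createSimple) is an assumption to check against your
    OpenKM version's REST docs."""
    import requests  # third-party; imported lazily so target_path works without it

    with open(local_file, "rb") as fh:
        resp = requests.post(
            f"{OKM_BASE}/services/rest/document/createSimple",
            auth=AUTH,
            data={"docPath": doc_path},
            files={"content": (os.path.basename(local_file), fh)},
        )
    resp.raise_for_status()
```

For a 50K-docs/day target, the upload calls would normally be driven from a small worker pool rather than a single loop, so the server-side indexing delay overlaps with network transfer.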

How do you store documents in OpenKM (what folder structure do you use, and what is the logic)? How many files do you have per folder, and what is the maximum?

We are storing the docs in folders such as Folder1, Folder2, etc. These folders reside under Root-->Portfolios-->PortfolioName-->PortfolioName_1-->Folder1. Each PortfolioName_1, PortfolioName_2, etc. has at most 200 subfolders.

My first approach would be to separate the database server from the OpenKM application server.

I am not sure what you mean by separating the DB server from the OpenKM application server. Are you telling us to run MySQL on one physical server and Tomcat on another? We are already using MySQL for the database instead of the Hypersonic database that comes by default with OpenKM.

And do not assign more than eight cores to the database.

I can do this but why do you suggest this?

I also suggest Linux rather than Windows if you want performance.

We are required by the client to use Windows.

We are not doing ANY text extraction at all, since we don't need it. The only search capability we need is locating files by file name.
 #43821  by jllort
 
First of all, understand that this is not a standard installation and you need a lot of knowledge to succeed with it. It's not easy to explain all the problems you will run into during the process, but I will try to give you some clues.

You need performance -> how?
- Move MySQL to another server, totally separated, because you need to apply tuning parameters to the database. Why not use more than 8 cores? Because I have migrated 2.5 million documents, and with 8 cores I got better performance than with 10 or 12. The reason is complex to explain, but it relates to disk I/O performance, write buffers, the OS, etc. In your case I'm not sure; that's why in these scenarios it is a good idea to have a professionally run database supported by specialized people.
- More records mean more problems for the database -> obviously you cannot decrease the number of documents, but you should aim for as few folders as possible while keeping 1000-2000 documents per folder, not more (you must find the right number).
- You must disable all text extractors and also stop the "text extract" crontab task.
- I do not understand this folder structure "Root-->Portfolios-->PortfolioName-->PortfolioName_1-->Folder1". Why? Depending on the kind of problem, there might be better ways to store and retrieve the data.
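The folder-sizing advice above (few folders, roughly 1000-2000 documents in each) can be sketched as a simple bucketing helper. The names and the 1500-per-folder default are illustrative assumptions, not values from the post:

```python
def plan_folders(doc_names, docs_per_folder=1500):
    """Assign documents to sequential folders, capped at
    docs_per_folder each, so the folder count stays as low as
    possible while no folder exceeds the 1000-2000 range
    suggested above."""
    plan = {}
    for i, name in enumerate(doc_names):
        folder = f"Folder{i // docs_per_folder + 1}"
        plan.setdefault(folder, []).append(name)
    return plan


# 3,200 documents -> 3 folders: 1500 + 1500 + 200
layout = plan_folders([f"doc{i}.pdf" for i in range(3200)])
```

In a real import, the bucket index for each document would be carried through to the repository path used by the REST upload, so the structure is fixed before any document is created.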

It would help to have a short description of the data OpenKM will store and how the third-party system will request it (for example: invoices retrieved by invoice number, or a document UUID stored in the CRM and used to retrieve the document). As you can see, there are two ways for a third-party application to interact with OpenKM; this is one of the most important things to take into consideration, and the decision here will give you better or worse performance.

In this scenario I can only suggest going for paid support (OpenKM or another application). Even with a paid subscription, this kind of project has a lot of problems and can sometimes fail. I don't know your experience with this scenario, but if this is your first approach at these sizes, I suggest for your own safety spending money on a supported version; it will be cheaper than the hours and frustration that might be around the corner.

We can try to guide you from this forum, but these projects need a lot of time, thought, and meetings before taking a decision. For repositories with more than 5 million documents, I suggest going with a commercially supported database (Percona or similar people can help with it; there's a cost, but also a reason for it).
