
CPU 100% load

Posted: Mon Dec 07, 2015 9:00 pm
by stefbort
Hi all.
I installed OpenKM 6.3.1, and after uploading around 200 GB (about 18,000 files), one CPU core is constantly at 100% load. Here is a screenshot of the CPU/process status:
img1.png
I waited five days and nothing changed. Why? What is wrong?

Some more details about my installation:
- I tried the installation three times, and the problem is always the same.
- When I install and upload for the first time, everything is OK. Only after the server reboots do I see a Java thread loading one core to 100%.
- The files I want to upload are a big mix: text, Word, OpenOffice, PDF, JPG, PHP, C, Java, etc. The total is around 3 TB and a few million files.
- To upload the files, I prepared some (big) ZIP files and uploaded them with "Import Documents from ZIP".
- The system is CentOS 7.1, Java "1.8.0_65" OpenJDK 64-Bit, OpenKM 6.3.1 build 8235 bundle, MariaDB backend, with the extra software Tesseract, pdf2swf, ImageMagick, LibreOffice and wkhtmltopdf.
- Another strange thing: no statistics are shown (I tried running the process by hand, but... nothing). Screenshot below:
img2.png
- A second strange behavior: when I upload a ZIP file bigger than around 600 MB, the browser never shows "Process file..." as finished (but according to the Tomcat log it has finished, and I can see all the files and search each new file).
- In the Tomcat log I can't see any specific error. I only see some text extraction errors, like the following examples:
...
2015-12-07 20:50:00,539 [Thread-1193] WARN  com.openkm.dao.NodeDocumentDAO- There was a problem extracting text from '/okm:root/Progetti/Sito fondazionedonorione.org/sito_ORG/core_clone_20110403_150000/images/stories/editoriali/jsn_addiocrocifisso/007_addio_crocifisso.jpg': /opt/openkm/temp/okm1160755861317877225.txt (No such file or directory)
...
2015-12-06 10:20:06,758 [Thread-2010] WARN  org.apache.jackrabbit.extractor.MsWordTextExtractor- Failed to extract Word text content
org.apache.poi.hwpf.OldWordFileFormatException: The document is too old - Word 95 or older. Try HWPFOldDocument instead?
...
Thank you in advance.
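To see which file types the extractors fail on most often, the extraction warnings in the Tomcat log can be tallied by file extension. This Python sketch assumes the log lines follow exactly the `NodeDocumentDAO` warning format shown in the excerpt above; the function name `failures_by_extension` and the regular expression are illustrative and not part of OpenKM.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# Assumed format, copied from the NodeDocumentDAO warning in the log excerpt:
# "... WARN  com.openkm.dao.NodeDocumentDAO- There was a problem extracting text from '<path>': ..."
EXTRACTION_WARNING = re.compile(
    r"WARN\s+com\.openkm\.dao\.NodeDocumentDAO- "
    r"There was a problem extracting text from '(.+?)': "
)


def failures_by_extension(lines):
    """Count extraction warnings per lower-cased file extension."""
    counts = Counter()
    for line in lines:
        match = EXTRACTION_WARNING.search(line)
        if match:
            suffix = PurePosixPath(match.group(1)).suffix.lower()
            counts[suffix or "(no extension)"] += 1
    return counts
```

To use it, open your catalina.log (its location depends on the installation) with `errors="replace"` and pass the file object to `failures_by_extension`; the extensions with the highest counts are the first MIME types worth testing by hand.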

Re: CPU 100% load

Posted: Tue Dec 08, 2015 11:54 am
by jllort
When you upload documents, they go into the pending indexing queue. Every 5 minutes (if you have not changed the default), documents are taken from the queue and their contents (text) are extracted for indexing. There could be several reasons why a process goes to 100% and does not stop (usually a complex file, such as an XLS or similar, is blocking the queue for some reason). You should check whether a file in the queue never finishes (Administration -> Stats -> Pending stat queue).
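As a rough model of why one difficult file can stall everything: if a single worker takes documents from the queue in order, every document behind a slow one has to wait for it. The sketch below assumes one worker and strict FIFO order, which is a simplification and not necessarily how OpenKM schedules extraction; the function name, the per-file durations and the 600-second "suspect" threshold are all hypothetical.

```python
def queue_delays(queue, suspect_after_s=600.0):
    """Model a single FIFO extraction worker (hypothetical simplification).

    queue: list of (document_path, extraction_seconds) in queue order.
    Returns (finish_times, suspects): the time at which each document
    finishes, and the documents whose own extraction exceeds the threshold.
    """
    finish_times = {}
    suspects = []
    clock = 0.0
    for path, seconds in queue:
        clock += seconds  # everything behind this document waits for it
        finish_times[path] = clock
        if seconds > suspect_after_s:
            suspects.append(path)
    return finish_times, suspects
```

With `[("a.pdf", 20), ("big.xls", 3600), ("b.doc", 5)]`, the five-second `b.doc` only finishes after 3625 seconds, and `big.xls` is the single suspect, which matches the "one complex file locks the queue" pattern described in the post.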

I suggest increasing the Tomcat memory to at least 4 GB.
Before a huge upload, it is also a good idea to check each MIME type to be sure everything is configured correctly. Errors in text extractors can have several causes (an older OpenOffice installed, an incompatible Word file, etc.). Doing a couple of tests before starting is good practice if you are not sure everything is well configured.
About importing big files from the desktop: our suggestion is to put all the files on the server and import them from there with the administration import tool. About the process not finishing with files of 600 MB or more, check whether any error appears in catalina.log (are you always working on the intranet, or is it another scenario?).
We suggest Oracle JDK rather than OpenJDK, basically because it is the version we use in development and production (that does not mean OpenKM will not run with OpenJDK, but it is better to use the same one we use).
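One way to know which MIME types need testing before a multi-terabyte import is to take a census of the source folder first. This Python sketch uses the standard `mimetypes` module, which guesses from the file extension only (not from the file contents), so treat the result as an approximation; the function name `mime_census` is hypothetical.

```python
import mimetypes
import os
from collections import Counter


def mime_census(root):
    """Count files under `root` by MIME type guessed from their extension."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            mime, _encoding = mimetypes.guess_type(name)
            counts[mime or "(unknown)"] += 1
    return counts
```

For example, `mime_census("/path/to/files").most_common(20)` lists the twenty most frequent types; picking one small sample file of each and uploading it is a quick way to run the "couple of tests" suggested above.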

Re: CPU 100% load

Posted: Tue Dec 08, 2015 4:45 pm
by stefbort
Hi jllort.
I changed Java from OpenJDK to Oracle JDK.
About the pending queue: I have around 164,892 (!) extractions in the queue! I attach two screenshots. This is probably my problem!
img3.png
img4.png
How can I speed up the extraction process?

About my network configuration: it is a LAN (PC - switch - server). As the client I am using Ubuntu with Firefox and Chromium, but this PC is my test and development computer. I have another OpenKM installation at a different site with around 10,000 files and 1 GB in total. Its client PCs run Windows with Internet Explorer and Firefox, with 10 users, and I haven't had any problems there.
Maybe the problem is my Linux client.

Thank you again.

Re: CPU 100% load

Posted: Thu Dec 10, 2015 12:11 pm
by jllort
You have a lot of files pending processing in the queue (this is done by the text extractor task in the crontab view). You should watch whether the number of documents in the queue decreases over time or not.
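If the queue does decrease, a few readings of the pending count taken some hours apart are enough to estimate how long it will take to empty. This Python sketch fits a least-squares line to (hours, pending) samples; only the 164,892 starting figure comes from this thread, and the function name and the other sample values in the example are hypothetical.

```python
def drain_estimate(samples):
    """Estimate queue drain rate and time to empty.

    samples: list of (hours_since_first_reading, pending_count), time-ordered.
    Returns (docs_per_hour, hours_to_empty); hours_to_empty is None when the
    fitted trend shows the queue is not shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_t = sum(t for t, _ in samples) / n
    mean_q = sum(q for _, q in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("readings need distinct timestamps")
    # Least-squares slope of pending count versus time (negative = draining).
    slope = sum((t - mean_t) * (q - mean_q) for t, q in samples) / var_t
    rate = -slope
    if rate <= 0:
        return rate, None
    return rate, samples[-1][1] / rate
```

For example, readings of 164,892, 160,092 and 155,292 taken 24 hours apart give 200 documents per hour and roughly 776 more hours (about a month) until the queue is empty.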

Re: CPU 100% load

Posted: Sun Dec 13, 2015 5:15 pm
by stefbort
Yes, the number of documents decreases over time.

Re: CPU 100% load

Posted: Tue Dec 15, 2015 8:11 pm
by jllort
When OpenKM processes PDF files (to take one example), it needs a lot of CPU (usually 100%). Try going to Administration > Crontab and disabling the task named "text extractor worker". Wait about 10 minutes and then check the CPU usage.
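On a Linux server such as the CentOS 7 machine in this thread, the before/after CPU comparison can be made from two samples of the aggregate `cpu` line in `/proc/stat`. That line counts time across all cores, so one core pinned at 100% on a four-core machine shows up as roughly 25%. The function names below are hypothetical; the field order (user, nice, system, idle, iowait, irq, softirq, steal, ...) is the standard Linux one.

```python
def _parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # user nice system idle iowait irq softirq steal; guest time is already
    # counted inside user/nice, so guest and guest_nice are left out.
    ticks = [int(x) for x in fields[1:9]]
    idle = ticks[3] + ticks[4]  # idle + iowait
    return idle, sum(ticks)


def cpu_busy_percent(before, after):
    """Percentage of CPU time spent busy between two 'cpu' line samples."""
    idle0, total0 = _parse_cpu_line(before)
    idle1, total1 = _parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        raise ValueError("second sample must be taken after the first")
    return 100.0 * (1.0 - (idle1 - idle0) / elapsed)


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the all-cores 'cpu' line."""
    with open(path) as f:
        return f.readline()
```

Take one sample with `read_cpu_line()`, wait a few seconds, take a second one, and pass both to `cpu_busy_percent`; doing this before disabling the text extractor worker and again after the 10-minute wait gives two directly comparable numbers.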

Re: CPU 100% load

Posted: Wed Dec 16, 2015 4:20 pm
by stefbort
Thanks a lot.
I understand.

Re: CPU 100% load

Posted: Wed Dec 30, 2015 10:44 am
by ericjohnson
First time used. Quite confusing :)