Email - Cron task 'Text Extractor Worker' executed - Error

 #30882  by kvijay
 
I received the following message at the administrator email address.

Subject: Cron task 'Text Extractor Worker' executed - Error
Sourced file: inline evaluation of: ``new com.openkm.extractor.TextExtractorWorker().run();'' : Method Invocation run : at Line: 1 : in file: inline evaluation of: ``new com.openkm.extractor.TextExtractorWorker().run();'' : .run ( ) Target exception: java.lang.OutOfMemoryError: Java heap space
What does it mean, and what action should be taken?

With regards,

K Vijay
 #30902  by jllort
 
It means there is not enough heap memory to execute the process. Go to $TOMCAT_HOME/bin/setenv.sh (or setenv.bat), increase the maxPermSize value (for example, to 512 MB), and then restart OpenKM. This happened because OpenKM runs a thread to index a document, and that process needed more memory than the maximum allowed, which caused the heap error (it could be a huge PDF or similar).
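For reference, the error text is "Java heap space", and the heap ceiling is set with the -Xmx option; -XX:MaxPermSize only sizes the permanent generation on JDK 7 and earlier, and JDK 8 ignores it with a warning. The small class below is a sketch, not part of OpenKM (the class name HeapCheck is hypothetical): it prints the maximum heap of the JVM it runs in, so compiling it and running it with the same flags you give Tomcat (for example `java -Xmx1024m HeapCheck`) shows the ceiling those flags actually produce.

```java
// HeapCheck.java: hypothetical helper, not part of OpenKM.
// Prints the maximum heap the current JVM may use, e.g.:
//   javac HeapCheck.java && java -Xmx1024m HeapCheck
class HeapCheck {

    // Convert a byte count to whole mebibytes (1 MiB = 1024 * 1024 bytes).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() is the heap ceiling derived from -Xmx (or the JVM default).
        // "OutOfMemoryError: Java heap space" is thrown when this heap is exhausted.
        // The value is usually close to, and often slightly below, the -Xmx figure.
        long max = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toMiB(max) + " MiB");
    }
}
```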
 #39693  by gwaitsi
 
Hi Jllort

I just started receiving this message yesterday, even though I hadn't uploaded any new documents.
I increased the memory per your suggestion, but since then I have been getting an email every 5 minutes.
Yesterday, I had only received one email when the memory was set to 512.
 #39700  by jllort
 
Try 1024m, and after changing the parameter you must restart OpenKM. If the problem persists, take a look at your current queue: there is probably one document causing the problem (one that needs a lot of heap to be processed). If increasing the memory does not solve the problem, tell us and I will explain how to detect that file and what to do with it.
 #39708  by gwaitsi
 
I don't know what was wrong, but there were 900 pending tasks and nothing was being executed.
Now there are 16000, but they are being executed and the number is decreasing.
So I guess it will take some hours before everything has been processed.
 #39710  by gwaitsi
 
Hi Jllort,

Java became stuck at 100% CPU, so I increased MaxPerm from 768m to 1024m and restarted the server.
The TextExtractionQueue was stuck at around 12,500 and not processing.
The only way I could get it to process was to rebuild the index, but that reset the queue to 16,000.

Now it is stuck at 10,600, and it seems to be stuck on a specific PDF file.
I can see from the processes in the web view that it is cuneiform that is stuck.

When I look at the process monitor locally, I see cuneiform is still processing.

Is there another way to restart the processing if it gets stuck like that again? Or to clear the webform?
 #39719  by jllort
 
My suggestion: depending on your OS version, consider switching to tesseract.
Get the document UUID and then update the database to indicate it has been processed:
select * from OKM_NODE_BASE WHERE NBS_UUID = 'DOC UUID' 
The field you must update is NBS_TEXT_EXTRACTED; set its value to 'T'.
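The update described above can also be scripted. Below is a minimal JDBC sketch, assuming you have direct access to the OpenKM database and its JDBC driver on the classpath. The table and column names come from the post; the class and method names, the command-line arguments, and the connection details are hypothetical placeholders. Binding the UUID as a parameter avoids quoting mistakes, and the returned row count tells you whether the UUID actually matched a node.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// MarkExtracted.java: hypothetical helper that flags one document as text-extracted.
// Usage: java -cp .:<jdbc-driver.jar> MarkExtracted <jdbcUrl> <user> <password> <docUuid>
class MarkExtracted {

    // Same table/column as the SELECT above; '?' is bound to the document UUID.
    static final String UPDATE_SQL =
        "UPDATE OKM_NODE_BASE SET NBS_TEXT_EXTRACTED = 'T' WHERE NBS_UUID = ?";

    // Returns the number of rows changed: 1 if the UUID exists, 0 otherwise.
    static int markExtracted(Connection conn, String uuid) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(UPDATE_SQL)) {
            ps.setString(1, uuid);
            return ps.executeUpdate();
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length != 4) {
            System.err.println("usage: MarkExtracted <jdbcUrl> <user> <password> <docUuid>");
            System.exit(1);
        }
        // JDBC connections auto-commit by default, so the update is committed on success.
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            int rows = markExtracted(conn, args[3]);
            System.out.println(rows == 1 ? "Document marked as extracted" : "No document with that UUID");
        }
    }
}
```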
 #39726  by gwaitsi
 
I've changed to tesseract as you suggested, and I notice that the output files are now named "okm1234.txt.txt".
Under cuneiform they were named "okm1234.txt", and I think this is a problem, because the tmp files are not being deleted as they were under cuneiform.

Looking at some of the text files under both tesseract and cuneiform, I see no text of any value.

Is it possible to see what text has been associated to a document?
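The "okm1234.txt.txt" naming is consistent with how the tesseract command line names its output: in `tesseract <image> <outputbase>`, the second argument is a base name and tesseract appends `.txt` itself. So if the extractor passes `okm1234.txt` as the output name, tesseract writes `okm1234.txt.txt`, and a cleanup step that deletes `okm1234.txt` would leave the real output behind. This is one plausible explanation, not a confirmed reading of OpenKM's code. The sketch below (class and method names are hypothetical; nothing here calls tesseract or OpenKM) shows the naming rule and how to derive a base name that produces the expected file.

```java
// TesseractNaming.java: illustrates the tesseract CLI output naming rule.
// Hypothetical helper; it does not invoke tesseract or OpenKM.
class TesseractNaming {

    // "tesseract <image> <outputbase>" writes its plain-text result to <outputbase>.txt
    static String fileWrittenFor(String outputBase) {
        return outputBase + ".txt";
    }

    // To obtain a file called expectedName (e.g. okm1234.txt), pass the name without
    // its trailing ".txt" as the output base. Names that do not end in ".txt" cannot
    // be produced exactly, so they are returned unchanged.
    static String baseFor(String expectedName) {
        return expectedName.endsWith(".txt")
            ? expectedName.substring(0, expectedName.length() - ".txt".length())
            : expectedName;
    }

    public static void main(String[] args) {
        System.out.println(fileWrittenFor("okm1234.txt"));          // okm1234.txt.txt
        System.out.println(fileWrittenFor(baseFor("okm1234.txt"))); // okm1234.txt
    }
}
```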
 #39727  by gwaitsi
 
I added com.openkm.extractor.Tesseract3TextExtractor, removed the cuneiform one, and removed the "-l eng" option, and the files now have no extension. Is that correct?

Under tesseract it seems to process much slower than under cuneiform, but with neither do I see any quality in the character recognition.
 #45048  by ulinuha
 
jllort wrote: Sun Jan 11, 2015 12:16 pm It means there is not enough heap memory to execute the process. Go to $TOMCAT_HOME/bin/setenv.sh (or setenv.bat), increase the maxPermSize value (for example, to 512 MB), and then restart OpenKM. This happened because OpenKM runs a thread to index a document, and that process needed more memory than the maximum allowed, which caused the heap error (it could be a huge PDF or similar).
I've tried this and it didn't work.
I solved it this way:

Try updating the service parameters:
1. First, stop the OpenKM service.
2. Run $TOMCAT_HOME/bin/openkmw.exe.
Click the "Java" tab and, in Java Options, insert:
-Xmx2048m (or any value that suits your preference/hardware)
-Xms1024m (or any value that suits your preference/hardware)
3. Start OpenKM service
 #45066  by jllort
 
Xms and Xmx should not be added here; the "Initial memory pool" and "Maximum memory pool" fields are meant for that.

What OpenKM version are you using ? How much memory do you have ? Are you in a 64 bits scenario ?

With JDK 1.8 the heap is assigned dynamically by the JVM without any configuration needed. If you are on JDK 1.7, take some time reading this section on how to register the service: https://docs.openkm.com/kcenter/view/ok ... asaservice
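On "the heap is assigned dynamically": with JDK 8, when no explicit maximum is passed, a 64-bit HotSpot JVM on a server-class machine sets the default maximum heap to roughly one quarter of physical memory (and the initial heap to roughly 1/64). Whenever a maximum is passed, for instance through -Xmx or a value in the Windows service's maximum memory pool field, that explicit value wins. The sketch below is illustrative only (the class name is hypothetical) and relies on the HotSpot/OpenJDK `com.sun.management` extension of `OperatingSystemMXBean`; it estimates that default from the machine's RAM and prints it next to the running JVM's actual maximum. For a 7 GB machine the estimate comes to about 1.75 GB.

```java
import java.lang.management.ManagementFactory;

// DefaultHeapEstimate.java: hypothetical helper comparing the ergonomic default
// max heap (about 1/4 of physical RAM on 64-bit server-class HotSpot, JDK 8)
// with the max heap of the JVM actually running this class.
class DefaultHeapEstimate {

    // Approximate default used when no maximum heap is specified: physical / 4.
    static long defaultMaxHeap(long physicalBytes) {
        return physicalBytes / 4;
    }

    public static void main(String[] args) {
        // HotSpot/OpenJDK-specific extension of the standard OperatingSystemMXBean.
        com.sun.management.OperatingSystemMXBean os =
            (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        long physical = os.getTotalPhysicalMemorySize();
        long mib = 1024L * 1024L;
        System.out.println("Physical RAM:          " + physical / mib + " MiB");
        System.out.println("Estimated default max: " + defaultMaxHeap(physical) / mib + " MiB");
        System.out.println("This JVM's max heap:   " + Runtime.getRuntime().maxMemory() / mib + " MiB");
    }
}
```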
 #45086  by ulinuha
 
I'm using OpenKM 6.3.4. Physical memory is 7 GB, on Windows 10 64-bit with JDK 1.8. I don't understand why setting Xms and Xmx in setenv.bat didn't make any difference to the memory problem (OpenKM is still quite slow, document previews keep failing, the text extractor queue never finishes, etc.), while setting them in openkmw.exe worked. I'm not advanced in Java. Please tell me what impact setting Xms and Xmx in openkmw may have. Thanks.
 #45092  by jllort
 
When Tomcat is started as a service, changes in setenv.bat do not have any effect on the JVM parameters (the .bat scripts are not used when the service starts). About the list of issues you have (preview, indexing), I suggest you open a separate post for each one (it's better if the discussion in each post is focused on a single issue). About performance: Windows is not the best scenario. With the same hardware you will always get better results with Linux than with Windows, basically because Linux I/O is much better than Windows I/O. You will notice it especially when OpenKM is starting: on Windows, uncompressing the WAR file can take 2 or 3 times longer than on Linux, but once Tomcat has started, the performance difference is not as visible as during startup.

If other applications are running on the server, especially antivirus or similar software, I suggest disabling them, at least for the OpenKM folders. And obviously, sharing a server with other applications is not the best scenario for any application, because they all compete for resources and some may be affected by others.
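Regarding setenv.bat versus the service settings: one way to see which memory options actually reached a JVM is to ask it for its startup arguments through `RuntimeMXBean.getInputArguments()`. The sketch below (class and method names are hypothetical) filters those arguments down to the heap and PermGen sizing flags. It only reports on the JVM it runs in; to inspect Tomcat's own JVM, the body of `main` would have to run inside Tomcat. For example, it could be pasted into a BeanShell scripting page, assuming your OpenKM administration offers one (the cron task in the first post already runs through BeanShell). Whether a Windows service's memory pool values show up as ordinary -Xms/-Xmx arguments there is also an assumption; `Runtime.getRuntime().maxMemory()` remains the direct check of the effective ceiling.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// JvmMemoryFlags.java: hypothetical helper listing the memory-related
// options the current JVM was started with.
class JvmMemoryFlags {

    // Keep only heap/PermGen sizing options such as -Xmx2048m, -Xms1024m,
    // -XX:MaxHeapSize=... or -XX:MaxPermSize=512m.
    static List<String> memoryFlags(List<String> jvmArgs) {
        List<String> result = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xmx") || arg.startsWith("-Xms")
                    || arg.startsWith("-XX:MaxHeapSize") || arg.startsWith("-XX:MaxPermSize")) {
                result.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Memory flags: " + memoryFlags(jvmArgs));
    }
}
```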
