Open Source Document Management System | OpenKM - Email - Cron task 'Text Extractor Worker' executed


Email - Cron task 'Text Extractor Worker' executed - Error

#30882 by kvijay
Fri Jan 09, 2015 1:42 am

I received the following message at the administrator email address.

Subject : Cron task 'Text Extractor Worker' executed - Error


Sourced file: inline evaluation of: ``new com.openkm.extractor.TextExtractorWorker().run();'' : Method Invocation run : at Line: 1 : in file: inline evaluation of: ``new com.openkm.extractor.TextExtractorWorker().run();'' : .run ( ) Target exception: java.lang.OutOfMemoryError: Java heap space

What does it mean, and what action should be taken?

With regards,

K Vijay

Username

kvijay

Rank

Fresh Boarder

Posts

9

Joined

Sat Jul 19, 2014 1:54 am

Re: Email - Cron task 'Text Extractor Worker' executed - Error

#30902 by jllort
Sun Jan 11, 2015 12:16 pm

It means there is not enough heap memory to run the process. Go to $TOMCAT_HOME/bin/setenv.sh (or setenv.bat), increase the maxPermSize value (for example to 512MB), and then restart OpenKM. This happened because OpenKM runs a thread to index a document, but that process needed more memory than the maximum allowed, which caused the heap error (it could be a huge PDF or similar).
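For context: "java.lang.OutOfMemoryError: Java heap space" means the object heap reached its ceiling, and that ceiling is set with -Xmx. The -XX:MaxPermSize option sizes the separate PermGen area (Java 7 and earlier; Java 8 ignores it with a warning), so -Xmx is the setting that targets this particular error. A quick way to see what limit a given set of options really produces is a tiny class like the sketch below, run with the same options as Tomcat (for example "java -Xmx1024m HeapReport"). The class name is hypothetical, and it reports on its own JVM, not on the running OpenKM instance.

```java
// Hypothetical diagnostic: prints the heap limits of the JVM it runs in.
// Run it with the same -Xms/-Xmx options you give Tomcat to see their effect.
final class HeapReport {

    private static final long MIB = 1024L * 1024L;

    /** Converts a byte count to whole mebibytes (rounded down). */
    static long toMiB(long bytes) {
        return bytes / MIB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the ceiling the heap may grow to (governed by -Xmx).
        // Long.MAX_VALUE is returned if there is no inherent limit.
        long max = rt.maxMemory();
        // totalMemory(): heap currently reserved by the JVM (starts near -Xms).
        long total = rt.totalMemory();
        // freeMemory(): the unused part of totalMemory().
        long free = rt.freeMemory();
        System.out.println("max heap  : " + toMiB(max) + " MiB");
        System.out.println("committed : " + toMiB(total) + " MiB");
        System.out.println("used      : " + toMiB(total - free) + " MiB");
    }
}
```

The reported maximum may come out slightly below the -Xmx value, because some garbage collectors leave part of the young generation out of that figure.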

Username

jllort

Rank

Moderator

Posts

12128

Joined

Fri Dec 21, 2007 11:23 am

Location

Sineu - ( Illes Balears ) - Spain


Re: Email - Cron task 'Text Extractor Worker' executed - Error

#39693 by gwaitsi
Sat May 30, 2015 6:02 am

Hi jllort,

I just started receiving this message yesterday, even though I hadn't uploaded any new documents.
I increased the memory as you suggested, but since then I have been getting an email every 5 minutes.
Yesterday, when the memory was set to 512, I had only received one email.

Username

gwaitsi

Rank

Senior Boarder

Posts

54

Joined

Wed Sep 03, 2014 1:00 pm

Re: Email - Cron task 'Text Extractor Worker' executed - Error

#39700 by jllort
Sat May 30, 2015 9:33 am

Try with 1024m, and after changing the parameter you must restart OpenKM. If the problem persists, take a look at your current queue: there is probably one document causing the problem (one that needs a lot of heap to be processed). If increasing the memory does not solve the problem, tell us and I will explain how to detect the file and what to do with it.
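When stepping through values such as 512m, 768m and 1024m, it helps to know how the JVM reads them: the k, m and g suffixes are binary multiples (1024, 1024², 1024³) and a bare number means bytes, so 1024m and 1g are the same heap size. The hypothetical helper below converts such strings to bytes, which makes it easy to compare a setting against Runtime.getRuntime().maxMemory(). It is a sketch that only handles the k/m/g suffixes.

```java
// Hypothetical helper: converts a JVM memory size such as "512m", "1024m" or
// "2g" (the format used by -Xms/-Xmx) into bytes. Suffixes k, m and g are
// binary multiples (1024, 1024^2, 1024^3); a bare number is taken as bytes.
final class JvmSize {

    static long parse(String value) {
        String v = value.trim().toLowerCase(java.util.Locale.ROOT);
        if (v.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        long multiplier = 1L;
        switch (v.charAt(v.length() - 1)) {
            case 'k': multiplier = 1024L; break;
            case 'm': multiplier = 1024L * 1024L; break;
            case 'g': multiplier = 1024L * 1024L * 1024L; break;
            default: break; // no suffix: the value is already in bytes
        }
        String digits = (multiplier == 1L) ? v : v.substring(0, v.length() - 1);
        // Long.parseLong throws NumberFormatException for anything non-numeric.
        long number = Long.parseLong(digits);
        if (number < 0) {
            throw new IllegalArgumentException("negative size: " + value);
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(number, multiplier);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"512m", "768m", "1024m", "2g"}) {
            System.out.println(s + " = " + parse(s) + " bytes");
        }
    }
}
```

Running main prints 512m = 536870912 bytes, 768m = 805306368 bytes, 1024m = 1073741824 bytes and 2g = 2147483648 bytes, so going from 768m to 1024m adds 256 MiB.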


Re: Email - Cron task 'Text Extractor Worker' executed - Error

#39705 by gwaitsi
Sun May 31, 2015 1:26 pm

Where do we find the queue, please?


Re: Email - Cron task 'Text Extractor Worker' executed - Error

#39707 by jllort
Sun May 31, 2015 4:31 pm

Go to Administration -> Statistics; at the top right you have the option "Text extraction queue".


Re: Email - Cron task 'Text Extractor Worker' executed - Error

#39708 by gwaitsi
Sun May 31, 2015 8:23 pm

I don't know what was wrong, but there were 900 pending tasks and nothing was being executed.
Now there are 16,000, but they are being executed and the number is decreasing.
So I guess it will take some hours before everything has been processed.


Re: Email - Cron task 'Text Extractor Worker' executed - Error

#39710 by gwaitsi
Mon Jun 01, 2015 9:05 am

Hi jllort,

Java became stuck at 100% CPU, so I increased MaxPerm from 768m to 1024m and restarted the server.
The TextExtractionQueue was stuck at around 12,500 and not processing.
The only way I could get it to process was to rebuild the index, but that reset the queue to 16,000.

Now it is stuck at 10,600, and it seems to be stuck on a specific PDF file.
I can see from the processes in the web view that it is cuneiform that is stuck.

When I look at the process monitor locally, I see cuneiform is still processing.

Is there another way to restart the processing if it gets stuck like that again? Or to clear the web form?


Re: Email - Cron task 'Text Extractor Worker' executed - Error

#39719 by jllort
Tue Jun 02, 2015 4:55 pm

My suggestion: depending on your OS version, consider switching to Tesseract.
Get the document UUID and then update the database to indicate it has already been processed:


select * from OKM_NODE_BASE WHERE NBS_UUID = 'DOC UUID'

The field you must update is named NBS_TEXT_EXTRACTED; set its value to 'T'.
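The query above only selects the row; the change described corresponds to an UPDATE on the same table. Below is a minimal JDBC sketch of that step, assuming you already have a java.sql.Connection to the OpenKM database (driver, URL and credentials are not covered here and are assumptions). The class and method names are hypothetical; only the table name, column names and the 'T' value come from the post.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical sketch: marks one document as already text-extracted so the
// Text Extractor Worker no longer picks it up. How the Connection is obtained
// is left to the caller (assumption).
final class MarkExtracted {

    static final String UPDATE_SQL =
        "UPDATE OKM_NODE_BASE SET NBS_TEXT_EXTRACTED = 'T' WHERE NBS_UUID = ?";

    /** Returns the number of rows updated (1 if the UUID exists, 0 otherwise). */
    static int markExtracted(Connection conn, String docUuid) throws SQLException {
        // A PreparedStatement binds the UUID instead of pasting it into the SQL text.
        try (PreparedStatement ps = conn.prepareStatement(UPDATE_SQL)) {
            ps.setString(1, docUuid);
            return ps.executeUpdate();
        }
    }
}
```

For a single document this is equivalent to running UPDATE OKM_NODE_BASE SET NBS_TEXT_EXTRACTED = 'T' WHERE NBS_UUID = 'DOC UUID' in a SQL client; a result of 0 means no row matched that UUID.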


Re: Email - Cron task 'Text Extractor Worker' executed - Error

#39726 by gwaitsi
Tue Jun 02, 2015 11:02 pm

I've changed to Tesseract as you suggested, and I notice that the output files are now named "okm1234.txt.txt".
Under cuneiform they were named "okm1234.txt", and I think this is a problem, because the tmp files are not being deleted as they were under cuneiform.

Looking at some of the text files under both Tesseract and cuneiform, I see no text of any value.

Is it possible to see what text has been associated with a document?
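On the "okm1234.txt.txt" naming: the tesseract command-line tool takes an output base name and appends ".txt" itself, so if the configured command passes a name that already ends in .txt, the result ends in .txt.txt. The sketch below strips that extension before building an argument list that could be handed to a ProcessBuilder. The class, paths and file names are hypothetical; the actual OpenKM extractor command line is not shown in this thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: tesseract's second argument is an output *base* name,
// and tesseract appends ".txt" to it. Passing "okm1234.txt" therefore yields
// "okm1234.txt.txt"; stripping the extension first avoids the double suffix.
final class TesseractCommand {

    /** Removes a trailing ".txt" (case-insensitive) so tesseract can add it back. */
    static String outputBase(String wantedTxtFile) {
        if (wantedTxtFile.toLowerCase(java.util.Locale.ROOT).endsWith(".txt")) {
            return wantedTxtFile.substring(0, wantedTxtFile.length() - ".txt".length());
        }
        return wantedTxtFile;
    }

    /** Builds: tesseract <image> <outputBase> -l <lang> */
    static List<String> build(String image, String wantedTxtFile, String lang) {
        return Arrays.asList("tesseract", image, outputBase(wantedTxtFile), "-l", lang);
    }

    public static void main(String[] args) {
        List<String> cmd = build("/tmp/okm1234.png", "/tmp/okm1234.txt", "eng");
        // Prints: tesseract /tmp/okm1234.png /tmp/okm1234 -l eng
        System.out.println(String.join(" ", cmd));
    }
}
```

With that command, tesseract writes its result to /tmp/okm1234.txt, which is the name the calling code expected.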


Re: Email - Cron task 'Text Extractor Worker' executed - Error

#39727 by gwaitsi
Wed Jun 03, 2015 4:13 am

I added com.openkm.extractor.Tesseract3TextExtractor, removed the cuneiform one, and removed the "-l eng"; now the files have no extension. Is that correct?

Under Tesseract it seems to process much more slowly than cuneiform, but with neither of them do I see any quality in the character recognition.


Re: Email - Cron task 'Text Extractor Worker' executed - Error

#45048 by ulinuha
Mon Dec 11, 2017 9:32 am

jllort wrote (Sun Jan 11, 2015 12:16 pm): It means there is not enough heap memory to run the process. Go to $TOMCAT_HOME/bin/setenv.sh (or setenv.bat), increase the maxPermSize value (for example to 512MB), and then restart OpenKM. This happened because OpenKM runs a thread to index a document, but that process needed more memory than the maximum allowed, which caused the heap error (it could be a huge PDF or similar).

I tried this and it didn't work.
I solved it this way.

Update the service parameters:
1. First, stop the OpenKM service.
2. Open $TOMCAT_HOME/bin/openkmw.exe.
Click the "Java" tab and, in Java Options, insert:
-Xmx2048m (or any value, depending on your preference/hardware)
-Xms1024m (or any value, depending on your preference/hardware)


3. Start OpenKM service

Username

ulinuha

Rank

Fresh Boarder

Posts

7

Joined

Mon Nov 13, 2017 9:08 am

Re: Email - Cron task 'Text Extractor Worker' executed - Error

#45066 by jllort
Wed Dec 13, 2017 10:43 pm

Xms and Xmx must not be added there. The "Initial memory pool" and "Maximum memory pool" fields are really meant for that.

What OpenKM version are you using? How much memory do you have? Are you on a 64-bit system?

With JDK 1.8 the heap is assigned dynamically by the JVM without any configuration needed. If you are on JDK 1.7, take some time to read this section on how to register OpenKM as a service: https://docs.openkm.com/kcenter/view/ok ... asaservice
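As a rough illustration of "the heap is assigned dynamically" on JDK 1.8: when no -Xmx is given, a 64-bit HotSpot JVM by default sizes the maximum heap at about 1/4 of physical memory and the initial heap at about 1/64. The sketch below only does that arithmetic. It is an approximation (exact ergonomics depend on JVM version and flags), and any explicit maximum, such as one set in the Windows service configuration, takes precedence. The class name is hypothetical.

```java
// Rough estimate (assumption): default JDK 8 ergonomics on a 64-bit JVM with
// no -Xms/-Xmx given use about 1/4 of physical memory as the maximum heap and
// about 1/64 as the initial heap. Real values depend on JVM version and flags.
final class DefaultHeapEstimate {

    static long defaultMaxHeap(long physicalBytes) {
        return physicalBytes / 4;
    }

    static long defaultInitialHeap(long physicalBytes) {
        return physicalBytes / 64;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024L * 1024L;
        long physical = 7 * gib; // e.g. a machine with 7 GB of RAM, treated as 7 GiB
        long mib = 1024L * 1024L;
        System.out.printf("max ~ %d MiB, initial ~ %d MiB%n",
            defaultMaxHeap(physical) / mib,
            defaultInitialHeap(physical) / mib);
    }
}
```

For the 7 GB machine mentioned later in the thread, this prints "max ~ 1792 MiB, initial ~ 112 MiB".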


Re: Email - Cron task 'Text Extractor Worker' executed - Error

#45086 by ulinuha
Sat Dec 16, 2017 12:21 am

I'm using OpenKM 6.3.4, with 7 GB of physical memory on Windows 10 64-bit and JDK 1.8. I don't understand why setting Xms and Xmx in setenv.bat didn't make any difference to the memory problem (OpenKM is still quite slow, previewing documents keeps failing, the text extractor queue never finishes, etc.), while setting them in openkmw.exe worked. I'm not advanced in Java. Please tell me what impact setting Xms and Xmx in openkmw.exe may have. Thanks.


Re: Email - Cron task 'Text Extractor Worker' executed - Error

#45092 by jllort
Sat Dec 16, 2017 5:02 pm

When Tomcat is started as a service, changes to setenv.bat have no effect on the JVM parameters (that file is not used when the service starts). About the list of issues you have (preview, indexing), I suggest you open a separate post for each one (it is better if each discussion focuses on a single issue). About performance: Windows is not the best scenario. With the same hardware you will always get better results with Linux than with Windows, basically because Linux I/O is quite a bit better. You will notice it especially while OpenKM is starting: on Windows, uncompressing the WAR file can take 2 or 3 times longer than on Linux, but once Tomcat has started, the performance difference is not as visible as during startup.

If you have other applications running on the server, especially antivirus or similar software, I suggest disabling them, at least for the OpenKM folders. And obviously, sharing a server with other applications is not the best scenario for any application, because they all compete for resources and some may be affected by others.
