
Re-indexing not starting

PostPosted: Tue Sep 17, 2019 4:33 pm
by openkm_user
Hello,

We have been using the re-indexing feature for a while now. Recently we upgraded to OpenKM version 6.3.8 and tried to run a Lucene re-indexing, but every time it starts and then does not go through.
[attachment: Untitled.png, screenshot of the re-index log]
Only what is shown in the screenshot is logged in the re-index log file, and then there is no further progress.

Re: Re-indexing not starting

PostPosted: Thu Sep 19, 2019 10:25 am
by openkm_user
Do we have to enable tracing in the logs? Are we missing something? Please help!

Re: Re-indexing not starting

PostPosted: Sat Sep 21, 2019 4:03 pm
by jllort
If you want to take a look at the source code (the method luceneIndexesFlushToIndexes is executed from the class RebuildIndexesServlet):
https://github.com/openkm/document-mana ... .java#L175

In this branch I have added extra detail to the rebuild indexes process (compile and update with the code from this branch):
https://github.com/openkm/document-mana ... /issue/200

Also, in setenv.sh or setenv.bat, check whether you have heap dumps enabled (in case of a critical error the JVM creates a dump):
Code: Select all
JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$CATALINA_HOME"
You can check here for JVM parameters:
https://www.oracle.com/technetwork/java ... 40102.html

The Glowroot tool might help you a lot in monitoring the JVM heap in real time:
https://glowroot.org/
It requires an extra parameter:
Code: Select all
JAVA_OPTS="$JAVA_OPTS -javaagent:$CATALINA_HOME/bin/glowroot.jar"
If you are on Linux, can you share your current setenv.sh? (If you are running it as a Windows service, take some snapshots of the openkmw.exe tool to show your current memory configuration.)
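For reference, a minimal setenv.sh memory configuration might look like the sketch below; the -Xms/-Xmx values are placeholders that you should size to your server's RAM:
Code: Select all
# example setenv.sh fragment; heap sizes are placeholders, adjust to your server
JAVA_OPTS="$JAVA_OPTS -Xms1g -Xmx2g"
JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$CATALINA_HOME"
export JAVA_OPTS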

Are you using JDK 1.8?
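You can check with, for example:
Code: Select all
# print the JDK version the server's shell resolves to
java -version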

Re: Re-indexing not starting

PostPosted: Thu Sep 26, 2019 2:34 pm
by openkm_user
Deleting the folders inside the index directory did start the re-indexing, but it stopped after a while, throwing the following exception:
Code: Select all
Exception in thread "Hibernate Search: indexwriter-6" Exception in thread "Hibernate Search: indexwriter-1" java.lang.IllegalArgumentException
	at java.nio.Buffer.position(Unknown Source)
	at java.nio.HeapByteBuffer.put(Unknown Source)
	at org.apache.tomcat.util.net.SocketWrapperBase.transfer(SocketWrapperBase.java:1044)
	at org.apache.tomcat.util.net.SocketWrapperBase.writeBlocking(SocketWrapperBase.java:448)
	at org.apache.tomcat.util.net.SocketWrapperBase.write(SocketWrapperBase.java:388)
	at org.apache.coyote.http11.Http11OutputBuffer$SocketOutputBuffer.doWrite(Http11OutputBuffer.java:644)
	at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:123)
	at org.apache.coyote.http11.Http11OutputBuffer.doWrite(Http11OutputBuffer.java:235)
	at org.apache.coyote.Response.doWrite(Response.java:541)
	at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:351)
	at org.apache.catalina.connector.OutputBuffer.flushByteBuffer(OutputBuffer.java:815)
	at org.apache.catalina.connector.OutputBuffer.realWriteChars(OutputBuffer.java:456)
	at org.apache.catalina.connector.OutputBuffer.flushCharBuffer(OutputBuffer.java:820)
	at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:307)
	at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:284)
	at org.apache.catalina.connector.CoyoteWriter.flush(CoyoteWriter.java:94)
	at org.springframework.security.web.context.OnCommittedResponseWrapper$SaveContextPrintWriter.flush(OnCommittedResponseWrapper.java:231)
	at com.openkm.servlet.admin.RebuildIndexesServlet$ProgressMonitor.documentsAdded(RebuildIndexesServlet.java:463)
	at org.hibernate.search.backend.impl.lucene.works.AddWorkDelegate.logWorkDone(AddWorkDelegate.java:117)
	at org.hibernate.search.backend.impl.batchlucene.DirectoryProviderWorkspace.doWorkInSync(DirectoryProviderWorkspace.java:97)
	at org.hibernate.search.backend.impl.batchlucene.DirectoryProviderWorkspace$AsyncIndexRunnable.run(DirectoryProviderWorkspace.java:144)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
java.nio.BufferOverflowException
	at java.nio.CharBuffer.put(Unknown Source)
	at org.apache.catalina.connector.OutputBuffer.transfer(OutputBuffer.java:860)
	at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:521)
	at org.apache.catalina.connector.CoyoteWriter.write(CoyoteWriter.java:170)
	at org.apache.catalina.connector.CoyoteWriter.write(CoyoteWriter.java:180)
	at org.apache.catalina.connector.CoyoteWriter.print(CoyoteWriter.java:238)
	at org.springframework.security.web.context.OnCommittedResponseWrapper$SaveContextPrintWriter.print(OnCommittedResponseWrapper.java:317)
	at com.openkm.servlet.admin.RebuildIndexesServlet$ProgressMonitor.documentsAdded(RebuildIndexesServlet.java:457)
	at org.hibernate.search.backend.impl.lucene.works.AddWorkDelegate.logWorkDone(AddWorkDelegate.java:117)
	at org.hibernate.search.backend.impl.batchlucene.DirectoryProviderWorkspace.doWorkInSync(DirectoryProviderWorkspace.java:97)
	at org.hibernate.search.backend.impl.batchlucene.DirectoryProviderWorkspace$AsyncIndexRunnable.run(DirectoryProviderWorkspace.java:144)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
What could be going wrong?

Re: Re-indexing not starting

PostPosted: Fri Sep 27, 2019 12:01 pm
by jllort
Did you download and compile the branch I indicated?
The java.nio.BufferOverflowException error does not look good.

Try the following steps:
1- Execute the following query (it will clean the previously extracted text from the database):
Code: Select all
UPDATE OKM_NODE_DOCUMENT SET NDC_TEXT=''
2- Now try to rebuild the indexes.
3- Set all the documents in the queue to have their content extracted again (I suspect you have a big document that is killing the rebuild index process). After the rebuild of the indexes has completed, execute:
Code: Select all
UPDATE OKM_NODE_DOCUMENT SET NDC_TEXT_EXTRACTED='F'
The queue of documents to be indexed by content will be full, and the analysis process will start from the beginning. Until the process has finished (some hours, depending on the number and type of your documents) you will not be able to find documents by content.
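If you want to follow the progress, one simple check (assuming the same OKM_NODE_DOCUMENT table used in the queries above) is to count the rows where NDC_TEXT_EXTRACTED is still 'F'; that number should shrink as the extraction queue drains.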