
Error doing text.extraction

Posted: Tue Nov 23, 2021 11:41 pm
by Greind
I got this error when running text extraction manually or via crontab.
Code:
2021-11-24 07:36:01,750 [Thread-17] [] INFO  c.o.extractor.TextExtractorWorker - processSerial.Working on {docUuid=1726f02e-4750-4c1f-ac26-52f6d00cc0e3, docPath=/okm:trash/system/00036243 - 20210330 (10).pdf, docVerUuid=a520d0d8-3b31-4402-a710-f51029e07c4d, date=Sat Jun 19 04:32:20 MYT 2021}
2021-11-24 07:36:01,751 [Thread-17] [] WARN  c.o.extractor.TextExtractorWorker - /home/ksklsu/tomcat-7.0.61/repository/datastore/a5/20/d0/d8/a520d0d8-3b31-4402-a710-f51029e07c4d (No such file or directory)
java.io.FileNotFoundException: /home/ksklsu/tomcat-7.0.61/repository/datastore/a5/20/d0/d8/a520d0d8-3b31-4402-a710-f51029e07c4d (No such file or directory)
        at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_131]
        at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_131]
        at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[na:1.8.0_131]
        at com.openkm.module.db.stuff.FsDataStore.read(FsDataStore.java:65) ~[FsDataStore.class:na]
        at com.openkm.dao.NodeDocumentDAO.textExtractorHelper(NodeDocumentDAO.java:1499) ~[NodeDocumentDAO.class:na]
        at com.openkm.extractor.TextExtractorWorker.processSerial(TextExtractorWorker.java:161) [TextExtractorWorker.class:na]
        at com.openkm.extractor.TextExtractorWorker.processQueue(TextExtractorWorker.java:146) [TextExtractorWorker.class:na]
        at com.openkm.extractor.TextExtractorWorker.run(TextExtractorWorker.java:97) [TextExtractorWorker.class:na]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_131]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_131]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_131]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_131]
        at bsh.Reflect.invokeMethod(Reflect.java:166) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
        at bsh.Reflect.invokeObjectMethod(Reflect.java:99) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
        at bsh.BSHPrimarySuffix.doName(BSHPrimarySuffix.java:176) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
        at bsh.BSHPrimarySuffix.doSuffix(BSHPrimarySuffix.java:120) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
        at bsh.BSHPrimaryExpression.eval(BSHPrimaryExpression.java:80) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
        at bsh.BSHPrimaryExpression.eval(BSHPrimaryExpression.java:47) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
        at bsh.Interpreter.eval(Interpreter.java:664) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
        at bsh.Interpreter.eval(Interpreter.java:758) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
        at bsh.Interpreter.eval(Interpreter.java:747) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
        at com.openkm.util.ExecutionUtils.runScript(ExecutionUtils.java:106) [ExecutionUtils.class:na]
        at com.openkm.core.Cron$RunnerBsh.run(Cron.java:99) [Cron$RunnerBsh.class:na]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]

Need your help. Thank you.

Re: Error doing text.extraction

Posted: Sat Nov 27, 2021 12:28 pm
by jllort
What is your OpenKM version?
The error happens because the extractor is trying to process a node whose file does not seem to exist in the file system. Older OpenKM versions had a problem when purging trashes that contained documents with several versions, and you may be affected by it; that is why I am asking which OpenKM version you are running.
I suggest upgrading to the latest version if you are not already on it, and then purging the trashes (Administration > Tools > Purge trashes). That should remove all data from the trashes (emptying all of them), and the error will disappear.
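
To check by hand whether the file behind a given version exists, the expected datastore path can be rebuilt from the docVerUuid. The sketch below assumes the layout visible in the log above (the first eight hex characters of the UUID split into four two-character directories); that layout is inferred from this single log line, not taken from OpenKM documentation, and the function name is only illustrative.

```python
from pathlib import Path


def datastore_path(datastore_root: str, version_uuid: str) -> Path:
    """Build the expected datastore path for a document version UUID.

    Assumed layout (inferred from the log in this thread): the first
    eight hex characters of the UUID become four two-character
    directories, and the file itself is named after the full UUID.
    """
    prefix = version_uuid[:8]
    parts = [prefix[i:i + 2] for i in range(0, 8, 2)]
    return Path(datastore_root).joinpath(*parts, version_uuid)


# Values copied from the log in the first post.
root = "/home/ksklsu/tomcat-7.0.61/repository/datastore"
path = datastore_path(root, "a520d0d8-3b31-4402-a710-f51029e07c4d")
print(path)
print("exists:", path.exists())
```

Run it as the same operating system user that runs OpenKM, so the existence check sees what the service sees.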

Re: Error doing text.extraction

Posted: Wed Dec 01, 2021 12:47 am
by Greind
Hi jllort,

I appreciate your response. We are currently using version 6.3.7-DEV (build: 4a2b821). Apologies for not mentioning it in the earlier post.

In the openkm.log file I can see a lot of missing-file errors, and for each one I have to manually create an empty file using the command:

echo $null >> "path"

The missing-file errors also occur for files that are in the taxonomy, not only in the trash.

We currently have more than 200K documents, and I have no clue which files have this error. I would appreciate your help.
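
One way to get a list of the affected files from the log itself is to collect the paths from the FileNotFoundException lines. This is only a rough sketch: the pattern is based on the log format pasted in the first post, the function name is made up for illustration, and it can only find files that the extractor has already tried to read.

```python
import re

# Matches the exception line format shown in the log in the first post.
MISSING = re.compile(
    r"java\.io\.FileNotFoundException: (?P<path>.+?) \(No such file or directory\)"
)


def missing_paths(log_lines):
    """Return the unique paths reported as missing, in first-seen order."""
    seen = {}
    for line in log_lines:
        match = MISSING.search(line)
        if match:
            seen.setdefault(match.group("path"), None)
    return list(seen)


# Sample lines in the same format as the log above (the exception repeats).
target = ("/home/ksklsu/tomcat-7.0.61/repository/datastore/"
          "a5/20/d0/d8/a520d0d8-3b31-4402-a710-f51029e07c4d")
sample = [
    "2021-11-24 07:36:01,751 [Thread-17] [] WARN  c.o.extractor.TextExtractorWorker - "
    + target + " (No such file or directory)",
    "java.io.FileNotFoundException: " + target + " (No such file or directory)",
    "        at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_131]",
    "java.io.FileNotFoundException: " + target + " (No such file or directory)",
]

for path in missing_paths(sample):
    print(path)
```

For a real run, replace the sample list with the lines of the openkm.log file, for example by iterating over the opened file.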

Re: Error doing text.extraction

Posted: Sat Dec 04, 2021 9:26 am
by jllort
Two possibilities:
1- The files are on the server but with the wrong privileges, and the user running the OpenKM service has no permission to access them (that is why a "file does not exist" error is raised although the file exists: without sufficient permissions it is invisible to that user). This would be the best scenario.
2- The bad scenario: the files do not exist. I suggest going to Administration > Tools > Repository checker and running the repository checker with the fast option; that will give you a full list of missing files.
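
To tell the two scenarios apart for a single path, something like the following sketch could be run as the same operating system user that runs the OpenKM service. The function name and the result labels are made up for illustration; the demo uses a temporary directory instead of the real datastore.

```python
import os
import tempfile


def classify(path: str) -> str:
    """Classify a path as seen by the current OS user."""
    try:
        os.stat(path)
    except FileNotFoundError:
        return "missing"      # scenario 2: the file is not there
    except PermissionError:
        return "no-access"    # scenario 1: a parent directory is not accessible
    except OSError:
        return "error"        # anything else, e.g. a broken path
    # The file exists; check whether this user may read it.
    return "ok" if os.access(path, os.R_OK) else "unreadable"


# Small demo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "present")
    open(present, "w").close()
    results = {
        "present": classify(present),
        "absent": classify(os.path.join(tmp, "absent")),
    }
print(results)
```

A result of "no-access" or "unreadable" points to the privileges problem, while "missing" points to the bad scenario.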

If files are missing, you must create empty files in the file system to get past this error.
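
With more than a few missing files, creating the empty files can be scripted. Below is a minimal sketch, assuming you already have the list of missing paths (for example from the repository checker report); the function name is hypothetical and the demo runs in a temporary directory, not in the real datastore. It skips anything that already exists, so real content is never overwritten.

```python
import tempfile
from pathlib import Path


def create_placeholders(paths):
    """Create an empty file for every path that does not exist yet."""
    created = []
    for raw in paths:
        path = Path(raw)
        if path.exists():
            continue  # never touch a file that is already there
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
        created.append(path)
    return created


# Demo: one existing file with content and one missing file.
with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp, "aa", "real-file")
    existing.parent.mkdir(parents=True)
    existing.write_text("real content")
    missing = Path(tmp, "bb", "cc", "missing-file")
    created = create_placeholders([existing, missing])
    sizes = {p.name: p.stat().st_size for p in (existing, missing)}
print(created, sizes)
```

The placeholders only stop the error; the original document content is not recovered. Run it as the OpenKM service user so the new files have usable ownership.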

Re: Error doing text.extraction

Posted: Mon Dec 06, 2021 2:39 am
by Greind
Unfortunately, it's the second scenario.

May I know whether the fast repository checker checks every individual document, or only documents that have been successfully indexed?

Fortunately, I have created empty files for all the missing files, and text extraction is now running smoothly. THANK YOU! :D

Can the repository checker also be used to check for missing files within the trash folder (okm:trash/)?

Re: Error doing text.extraction

Posted: Sat Dec 11, 2021 8:28 am
by jllort
* I suggest upgrading to the latest 6.2.11 version and then trying to clean the trash (by upgrading to the latest version you can at least be sure the trash bug is fixed and you will not get the current behaviour): https://docs.openkm.com/kcenter/view/ok ... guide.html

* About the repository checker: all nodes in the database are checked (indexed or not).