• Error doing text.extraction

  • We tried to make OpenKM as intuitive as possible, but advice is always welcome.
 #53049  by Greind
I got this error when running text extraction manually or via crontab.
Code:
2021-11-24 07:36:01,750 [Thread-17] [] INFO  c.o.extractor.TextExtractorWorker - processSerial.Working on {docUuid=1726f02e-4750-4c1f-ac26-52f6d00cc0e3, docPath=/okm:trash/system/00036243 - 20210330 (10).pdf, docVerUuid=a520d0d8-3b31-4402-a710-f51029e07c4d, date=Sat Jun 19 04:32:20 MYT 2021}
2021-11-24 07:36:01,751 [Thread-17] [] WARN  c.o.extractor.TextExtractorWorker - /home/ksklsu/tomcat-7.0.61/repository/datastore/a5/20/d0/d8/a520d0d8-3b31-4402-a710-f51029e07c4d (No such file or directory)
java.io.FileNotFoundException: /home/ksklsu/tomcat-7.0.61/repository/datastore/a5/20/d0/d8/a520d0d8-3b31-4402-a710-f51029e07c4d (No such file or directory)
        at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_131]
        at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_131]
        at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[na:1.8.0_131]
        at com.openkm.module.db.stuff.FsDataStore.read(FsDataStore.java:65) ~[FsDataStore.class:na]
        at com.openkm.dao.NodeDocumentDAO.textExtractorHelper(NodeDocumentDAO.java:1499) ~[NodeDocumentDAO.class:na]
        at com.openkm.extractor.TextExtractorWorker.processSerial(TextExtractorWorker.java:161) [TextExtractorWorker.class:na]
        at com.openkm.extractor.TextExtractorWorker.processQueue(TextExtractorWorker.java:146) [TextExtractorWorker.class:na]
        at com.openkm.extractor.TextExtractorWorker.run(TextExtractorWorker.java:97) [TextExtractorWorker.class:na]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_131]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_131]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_131]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_131]
        at bsh.Reflect.invokeMethod(Reflect.java:166) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
        at bsh.Reflect.invokeObjectMethod(Reflect.java:99) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
        at bsh.BSHPrimarySuffix.doName(BSHPrimarySuffix.java:176) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
        at bsh.BSHPrimarySuffix.doSuffix(BSHPrimarySuffix.java:120) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
        at bsh.BSHPrimaryExpression.eval(BSHPrimaryExpression.java:80) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
        at bsh.BSHPrimaryExpression.eval(BSHPrimaryExpression.java:47) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
        at bsh.Interpreter.eval(Interpreter.java:664) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
        at bsh.Interpreter.eval(Interpreter.java:758) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
        at bsh.Interpreter.eval(Interpreter.java:747) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
        at com.openkm.util.ExecutionUtils.runScript(ExecutionUtils.java:106) [ExecutionUtils.class:na]
        at com.openkm.core.Cron$RunnerBsh.run(Cron.java:99) [Cron$RunnerBsh.class:na]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]

Need your help. Thank you.
 #53059  by jllort
What is your OpenKM version?
The error happens because the extractor is trying to process a node whose file does not seem to exist in the file system. An old OpenKM version had a problem when purging trashes that contained documents with several versions, and you may be affected by it; that is why I am asking which OpenKM version you are running.
I suggest upgrading to the latest version if you are not on it already and then purging the trashes ( Administration > Tools > Purge trashes ). That should remove all the data from there and the error will disappear ( it also cleans all trashes ).
 #53063  by Greind
Hi jllort,

Appreciate your response. We're currently using Version 6.3.7-DEV (build: 4a2b821). Apologies for not mentioning it in the earlier post.

In the openkm.log file I can see a lot of missing-file errors, and for each one I have to manually create an empty file using the command:

echo $null >> "path"

The missing-file errors also occur for files in the taxonomy, not only in the trash.

We currently have more than 200K documents and no clue which files have this error. Appreciate your help.
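
One rough way to get a list of affected files from the log itself is to collect the paths from the `java.io.FileNotFoundException` lines. The sketch below assumes the exception line has exactly the shape shown in the log excerpt earlier in this thread; `missing_paths` and `missing_paths_in_file` are hypothetical helper names, not part of OpenKM. It only finds files the extractor has already tripped over, so it is not a full repository check.

```python
import re

# Matches the exception line as it appears in the openkm.log excerpt above.
MISSING_RE = re.compile(
    r"^java\.io\.FileNotFoundException: (?P<path>.+) "
    r"\(No such file or directory\)\s*$"
)


def missing_paths(log_lines):
    """Return the unique paths reported as missing, in first-seen order."""
    seen = {}
    for line in log_lines:
        match = MISSING_RE.match(line)
        if match:
            # dict keys keep insertion order and drop duplicates
            seen.setdefault(match.group("path"), None)
    return list(seen)


def missing_paths_in_file(log_file):
    """Convenience wrapper: read a log file and return the missing paths."""
    with open(log_file, encoding="utf-8", errors="replace") as handle:
        return missing_paths(handle)
```

Calling `missing_paths_in_file` with the location of openkm.log (which depends on the installation) returns the distinct datastore paths, ready to be checked or recreated one by one.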
 #53083  by jllort
Two possibilities:
1- The files are on the server but with the wrong privileges, so the user running the OpenKM service has no permission to access them ( that is why the error says the file does not exist although it does: without sufficient permissions the file is invisible to that user ) -> this would be the best scenario
2- Bad scenario -> the files do not exist -> I suggest going to Administration > Tools > Repository checker and running the repository checker with the fast option -> that will provide a full list of missing files
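
To tell the two scenarios apart for a single document version, a small check like the following may help. It is a sketch under two assumptions: that the datastore layout is the one visible in the stack trace (the first four two-character pairs of docVerUuid become nested directories), and that the script is run as the same OS user that runs the OpenKM service. `datastore_path` and `classify` are hypothetical names, not OpenKM APIs.

```python
import os
from pathlib import Path


def datastore_path(datastore_root, doc_ver_uuid):
    """Build the on-disk path using the layout seen in the log:
    .../datastore/a5/20/d0/d8/a520d0d8-..."""
    pairs = [doc_ver_uuid[i:i + 2] for i in range(0, 8, 2)]
    return Path(datastore_root, *pairs, doc_ver_uuid)


def classify(path):
    """Return 'missing', 'unreadable' or 'ok' for the current OS user."""
    try:
        os.stat(path)
    except (FileNotFoundError, NotADirectoryError):
        return "missing"        # scenario 2: nothing at that path
    except PermissionError:
        return "unreadable"     # scenario 1: a parent directory blocks access
    # The file exists; check that this user may actually read it.
    return "ok" if os.access(path, os.R_OK) else "unreadable"
```

For the document in the log, `datastore_path("/home/ksklsu/tomcat-7.0.61/repository/datastore", "a520d0d8-3b31-4402-a710-f51029e07c4d")` reproduces the path from the exception, and `classify` on that path shows which scenario applies.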

If files are missing, you must create empty files in the file system to get past this error.
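
With many missing files, creating the placeholders by hand gets tedious. A minimal sketch for doing it in bulk, given a list of missing paths (for example the one produced by the repository checker), could look like this. `create_placeholders` is a hypothetical helper, not an OpenKM tool. It defaults to a dry run and never touches a file that already exists. Note that a placeholder only silences the error; the original document content is still lost.

```python
from pathlib import Path


def create_placeholders(paths, dry_run=True):
    """Create a zero-byte file for every path that does not exist yet.

    Existing files are skipped. Returns the paths that were created,
    or that would be created when dry_run is True.
    """
    created = []
    for raw in paths:
        path = Path(raw)
        if path.exists():
            continue
        if not dry_run:
            # Recreate missing parent directories, then the empty file.
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch(exist_ok=False)
        created.append(path)
    return created
```

Running it first with `dry_run=True` shows what would be created; running it again with `dry_run=False` as the user that owns the repository actually writes the files, so that the OpenKM service can read them afterwards.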
 #53087  by Greind
Unfortunately it's the 2nd scenario.

May I know whether the repository checker with the fast option checks every individual document, or only documents that have been successfully indexed?

Fortunately, I have created an empty file for every missing file, and text extraction is now running smoothly. THANK YOU! :D

Can the repository checker also be used to check for missing files within the trash folder? (okm:trash/)
 #53101  by jllort
* I suggest upgrading to the latest 6.2.11 version and then trying to clean the trash ( upgrading to the latest version at least ensures that the trash bug is fixed and you will not get the current behaviour again ) https://docs.openkm.com/kcenter/view/ok ... guide.html

* About repository checker -> all nodes in the database are checked ( indexed or not indexed )
