• OpenKM 6.3.12 cannot extract text content from uploaded files

  • We tried to make OpenKM as intuitive as possible, but an advice is always welcome.
 #54485  by ikun1289
 
In my previous thread I reported that I cannot search by file content in version 6.3.12. After checking the OpenKM database again, I found that many files do not have their text content extracted and saved to OKM_NODE_DOCUMENT, which is why I cannot search by file content.
I uploaded the same files to version 6.3.11, and after a while I could search their content, and the content was saved to OKM_NODE_DOCUMENT.
Attachments
query result in version 6.3.12
screenshot_1698821441.png (21.6 KiB)
 #54486  by ikun1289
 
When I viewed the log, I found this error, raised when the TextExtractorWorker crontab task runs:
2023-11-01 14:10:00,031 [Thread-33390] [] INFO c.o.extractor.TextExtractorWorker - processSerial.Working on {docUuid=226d8b88-fa96-4204-8868-c13508df47c8, docPath=/okm:root/qas/company/52/docs/Test/NGSC_DanhsachRuiro_V1.0 (2).xlsx, docVerUuid=0f1b6d69-f6ca-427b-b4ab-5eb4dcd24b67, date=Wed Nov 01 14:06:28 ICT 2023}
2023-11-01 14:10:00,034 [Thread-33390] [] INFO com.openkm.util.DocConverter - Cmd: /usr/bin/soffice --headless -env:UserInstallation=file:///opt/tomcat/temp/okm5474015497586253312 --convert-to txt --outdir /opt/tomcat/temp/okm5474015497586253312 /opt/tomcat/temp/okm1708528915643306341.doc
2023-11-01 14:10:03,399 [Thread-33390] [] WARN com.openkm.extractor.OOTextExtractor - Failed to extract text
com.openkm.core.ConversionException: IO exception executing command: /usr/bin/soffice --headless -env:UserInstallation=file:///opt/tomcat/temp/okm5474015497586253312 --convert-to txt --outdir /opt/tomcat/temp/okm5474015497586253312 /opt/tomcat/temp/okm1708528915643306341.doc
    at com.openkm.util.DocConverter.convert(DocConverter.java:843) ~[classes/:6.3.12]
    at com.openkm.extractor.OOTextExtractor.extractText(OOTextExtractor.java:93) ~[classes/:6.3.12]
    at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:139) [classes/:6.3.12]
    at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:97) [classes/:6.3.12]
    at com.openkm.dao.NodeDocumentDAO.textExtractorHelper(NodeDocumentDAO.java:1443) [classes/:6.3.12]
    at com.openkm.extractor.TextExtractorWorker.processSerial(TextExtractorWorker.java:161) [classes/:6.3.12]
    at com.openkm.extractor.TextExtractorWorker.processQueue(TextExtractorWorker.java:146) [classes/:6.3.12]
    at com.openkm.extractor.TextExtractorWorker.run(TextExtractorWorker.java:97) [classes/:6.3.12]
    at sun.reflect.GeneratedMethodAccessor159.invoke(Unknown Source) ~[na:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_342]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_342]
    at bsh.Reflect.invokeMethod(Reflect.java:166) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
    at bsh.Reflect.invokeObjectMethod(Reflect.java:99) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
    at bsh.BSHPrimarySuffix.doName(BSHPrimarySuffix.java:176) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
    at bsh.BSHPrimarySuffix.doSuffix(BSHPrimarySuffix.java:120) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
    at bsh.BSHPrimaryExpression.eval(BSHPrimaryExpression.java:80) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
    at bsh.BSHPrimaryExpression.eval(BSHPrimaryExpression.java:47) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
    at bsh.Interpreter.eval(Interpreter.java:664) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
    at bsh.Interpreter.eval(Interpreter.java:758) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
    at bsh.Interpreter.eval(Interpreter.java:747) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
    at com.openkm.util.ExecutionUtils.runScript(ExecutionUtils.java:106) [classes/:6.3.12]
    at com.openkm.core.Cron$RunnerBsh.run(Cron.java:99) [classes/:6.3.12]
    at java.lang.Thread.run(Thread.java:750) [na:1.8.0_342]
Caused by: java.io.FileNotFoundException: /opt/tomcat/temp/okm5474015497586253312/okm1708528915643306341.pdf (No such file or directory)
    at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_342]
    at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_342]
    at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[na:1.8.0_342]
    at com.google.common.io.Files$FileByteSource.openStream(Files.java:120) ~[guava-20.0.jar:na]
    at com.google.common.io.Files$FileByteSource.openStream(Files.java:110) ~[guava-20.0.jar:na]
    at com.google.common.io.ByteSource.copyTo(ByteSource.java:267) ~[guava-20.0.jar:na]
    at com.google.common.io.Files.copy(Files.java:304) ~[guava-20.0.jar:na]
    at com.google.common.io.Files.move(Files.java:480) ~[guava-20.0.jar:na]
    at com.openkm.util.DocConverter.convert(DocConverter.java:837) ~[classes/:6.3.12]
    ... 22 common frames omitted
2023-11-01 14:10:03,399 [Thread-33390] [] WARN com.openkm.dao.NodeDocumentDAO - There was a problem extracting text from '/okm:root/qas/company/52/docs/Test/NGSC_DanhsachRuiro_V1.0 (2).xlsx': Too few text extracted
The file NGSC_DanhsachRuiro_V1.0 (2).xlsx has many sheets and a lot of text.
Attachments
screenshot_1698823080.png (174.97 KiB)
 #54603  by jllort
 
I think the problem is related to LibreOffice. Try executing the command that raises the error in the log directly from the terminal:

/usr/bin/soffice --headless -env:UserInstallation=file:///opt/tomcat/temp/okm5474015497586253312 --convert-to txt --outdir /opt/tomcat/temp/okm5474015497586253312 /opt/tomcat/temp/okm1708528915643306341.doc


Replace the paths with yours, then check whether the terminal shows any error. Usually the problem comes from the X server: sometimes adding "export DISPLAY=:1" at the end of setenv.sh solves it; other times it is necessary to purge and reinstall LibreOffice. Running the application from the terminal will give you some clue about what is happening.
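
The manual check described above could be scripted roughly as follows. This is only a sketch, assuming a POSIX shell with `mktemp` available. The helper names `expected_txt` and `try_convert` are hypothetical and not part of OpenKM or LibreOffice. The soffice options are the ones shown in the log, and the sketch assumes soffice names its output after the input file with the extension replaced by `.txt`.

```shell
#!/bin/sh
# Hypothetical helpers for reproducing the conversion by hand.
# Nothing here is part of OpenKM; names and temp paths are illustrative.

# Print the path where the converted text is expected to appear:
# <outdir>/<input file name with its extension replaced by .txt>
expected_txt() {
    outdir=$1
    input=$2
    base=$(basename "$input")
    printf '%s/%s.txt\n' "$outdir" "${base%.*}"
}

# Run the conversion with the same options as in the OpenKM log,
# using a fresh temporary directory as both profile and output directory,
# then report whether a non-empty text file was produced.
try_convert() {
    soffice_bin=$1
    input=$2
    outdir=$(mktemp -d) || return 2
    "$soffice_bin" --headless "-env:UserInstallation=file://$outdir" \
        --convert-to txt --outdir "$outdir" "$input"
    status=$?
    out=$(expected_txt "$outdir" "$input")
    if [ -s "$out" ]; then
        echo "OK: $out"
    else
        echo "FAILED (soffice exit code $status): no text output at $out"
        return 1
    fi
}
```

For example, `try_convert /usr/bin/soffice /path/to/sample.xlsx` (with a real file in place of the placeholder path) prints either the location of the extracted text or a failure line, and any message soffice itself writes to the terminal remains visible. It is probably most informative when run as the same operating-system user that runs Tomcat, since that is an assumption about where the environment differs.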
