
Openkm ver 6.3.12 cannot extract text content from uploaded file

Posted: Wed Nov 01, 2023 6:52 am
by ikun1289
In a previous thread I reported that I cannot search by file content in version 6.3.12. After checking the OpenKM database again, I found that many files do not have their text content extracted and saved to OKM_NODE_DOCUMENT, which is why I cannot search by file content.
I uploaded the same files to version 6.3.11 and, after a while, I could search their content, and the content was saved to OKM_NODE_DOCUMENT.

Re: Openkm ver 6.3.12 cannot extract text content from uploaded file

Posted: Wed Nov 01, 2023 7:21 am
by ikun1289
When I checked the log, I found this error, which occurs when the TextExtractorWorker crontab task runs:
2023-11-01 14:10:00,031 [Thread-33390] [] INFO c.o.extractor.TextExtractorWorker - processSerial.Working on {docUuid=226d8b88-fa96-4204-8868-c13508df47c8, docPath=/okm:root/qas/company/52/docs/Test/NGSC_DanhsachRuiro_V1.0 (2).xlsx, docVerUuid=0f1b6d69-f6ca-427b-b4ab-5eb4dcd24b67, date=Wed Nov 01 14:06:28 ICT 2023}
2023-11-01 14:10:00,034 [Thread-33390] [] INFO com.openkm.util.DocConverter - Cmd: /usr/bin/soffice --headless -env:UserInstallation=file:///opt/tomcat/temp/okm5474015497586253312 --convert-to txt --outdir /opt/tomcat/temp/okm5474015497586253312 /opt/tomcat/temp/okm1708528915643306341.doc
2023-11-01 14:10:03,399 [Thread-33390] [] WARN com.openkm.extractor.OOTextExtractor - Failed to extract text
com.openkm.core.ConversionException: IO exception executing command: /usr/bin/soffice --headless -env:UserInstallation=file:///opt/tomcat/temp/okm5474015497586253312 --convert-to txt --outdir /opt/tomcat/temp/okm5474015497586253312 /opt/tomcat/temp/okm1708528915643306341.doc
	at com.openkm.util.DocConverter.convert(DocConverter.java:843) ~[classes/:6.3.12]
	at com.openkm.extractor.OOTextExtractor.extractText(OOTextExtractor.java:93) ~[classes/:6.3.12]
	at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:139) [classes/:6.3.12]
	at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:97) [classes/:6.3.12]
	at com.openkm.dao.NodeDocumentDAO.textExtractorHelper(NodeDocumentDAO.java:1443) [classes/:6.3.12]
	at com.openkm.extractor.TextExtractorWorker.processSerial(TextExtractorWorker.java:161) [classes/:6.3.12]
	at com.openkm.extractor.TextExtractorWorker.processQueue(TextExtractorWorker.java:146) [classes/:6.3.12]
	at com.openkm.extractor.TextExtractorWorker.run(TextExtractorWorker.java:97) [classes/:6.3.12]
	at sun.reflect.GeneratedMethodAccessor159.invoke(Unknown Source) ~[na:na]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_342]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_342]
	at bsh.Reflect.invokeMethod(Reflect.java:166) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
	at bsh.Reflect.invokeObjectMethod(Reflect.java:99) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
	at bsh.BSHPrimarySuffix.doName(BSHPrimarySuffix.java:176) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
	at bsh.BSHPrimarySuffix.doSuffix(BSHPrimarySuffix.java:120) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
	at bsh.BSHPrimaryExpression.eval(BSHPrimaryExpression.java:80) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
	at bsh.BSHPrimaryExpression.eval(BSHPrimaryExpression.java:47) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
	at bsh.Interpreter.eval(Interpreter.java:664) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
	at bsh.Interpreter.eval(Interpreter.java:758) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
	at bsh.Interpreter.eval(Interpreter.java:747) [beanshell2-2.1.8.jar:2.1.8 2014-02-20 03:56:17]
	at com.openkm.util.ExecutionUtils.runScript(ExecutionUtils.java:106) [classes/:6.3.12]
	at com.openkm.core.Cron$RunnerBsh.run(Cron.java:99) [classes/:6.3.12]
	at java.lang.Thread.run(Thread.java:750) [na:1.8.0_342]
Caused by: java.io.FileNotFoundException: /opt/tomcat/temp/okm5474015497586253312/okm1708528915643306341.pdf (No such file or directory)
	at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_342]
	at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_342]
	at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[na:1.8.0_342]
	at com.google.common.io.Files$FileByteSource.openStream(Files.java:120) ~[guava-20.0.jar:na]
	at com.google.common.io.Files$FileByteSource.openStream(Files.java:110) ~[guava-20.0.jar:na]
	at com.google.common.io.ByteSource.copyTo(ByteSource.java:267) ~[guava-20.0.jar:na]
	at com.google.common.io.Files.copy(Files.java:304) ~[guava-20.0.jar:na]
	at com.google.common.io.Files.move(Files.java:480) ~[guava-20.0.jar:na]
	at com.openkm.util.DocConverter.convert(DocConverter.java:837) ~[classes/:6.3.12]
	... 22 common frames omitted
2023-11-01 14:10:03,399 [Thread-33390] [] WARN com.openkm.dao.NodeDocumentDAO - There was a problem extracting text from '/okm:root/qas/company/52/docs/Test/NGSC_DanhsachRuiro_V1.0 (2).xlsx': Too few text extracted
The file NGSC_DanhsachRuiro_V1.0 (2).xlsx has many sheets and a lot of text.

Re: Openkm ver 6.3.12 cannot extract text content from uploaded file

Posted: Sat Nov 18, 2023 8:15 am
by jllort
First, check that LibreOffice is installed and that the executable exists at this path: /usr/bin/soffice
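
If it helps, the check above can be scripted. This is only a minimal sketch in Python, not something shipped with OpenKM: the function name check_soffice is made up for illustration, and the path is the one from the log. Ideally run it as the same user that runs Tomcat, since permissions can differ between users.

```python
import os
import subprocess

SOFFICE = "/usr/bin/soffice"  # path taken from the log in this thread


def check_soffice(path):
    """Return a list of problems found with the soffice binary (empty if none).

    check_soffice is a hypothetical helper name, used here for illustration.
    """
    problems = []
    if not os.path.isfile(path):
        problems.append(f"{path} does not exist or is not a regular file")
    elif not os.access(path, os.X_OK):
        problems.append(f"{path} exists but is not executable by this user")
    return problems


if __name__ == "__main__":
    found = check_soffice(SOFFICE)
    for problem in found:
        print("PROBLEM:", problem)
    if not found:
        # Ask the binary for its version to confirm that it actually starts.
        result = subprocess.run(
            [SOFFICE, "--version"], capture_output=True, text=True, timeout=60
        )
        print(result.stdout.strip() or result.stderr.strip())
```

If the script prints a problem line, LibreOffice is missing or not usable by that user; if it prints a version string, the binary at least starts.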

Re: Openkm ver 6.3.12 cannot extract text content from uploaded file

Posted: Tue Jan 30, 2024 8:04 am
by jllort
I think the problem is related to LibreOffice. Try executing the command that raises the error in the log directly from the terminal:

/usr/bin/soffice --headless -env:UserInstallation=file:///opt/tomcat/temp/okm5474015497586253312 --convert-to txt --outdir /opt/tomcat/temp/okm5474015497586253312 /opt/tomcat/temp/okm1708528915643306341.doc


Replace the paths with your own, then check whether the terminal shows any error. These problems usually come from the X server: sometimes adding "export DISPLAY=:1" at the end of setenv.sh solves it, and other times you need to purge and reinstall LibreOffice. Running the application from the terminal will give you a clue about what is happening.
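
As a rough sketch of that manual test, the Python script below rebuilds the same command line as in the log, runs it, and lists what ended up in the output directory. The function names (build_command, expected_output, run_conversion) and the script arguments are made up for illustration and are not part of OpenKM. Listing the directory may be useful because the log shows the command requesting txt while the FileNotFoundException refers to a .pdf file in that directory; whether that mismatch matters is something the listing could help confirm, it is not established here.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_command(soffice, work_dir, source, target="txt"):
    """Build the same soffice command line that appears in the log."""
    return [
        soffice,
        "--headless",
        f"-env:UserInstallation=file://{work_dir}",
        "--convert-to", target,
        "--outdir", str(work_dir),
        str(source),
    ]


def expected_output(work_dir, source, target="txt"):
    """Path where the converted file should appear (input name, new extension)."""
    return Path(work_dir) / (Path(source).stem + "." + target)


def run_conversion(soffice, work_dir, source, target="txt", display=None):
    """Run the conversion; return the process result and the files in work_dir."""
    env = dict(os.environ)
    if display is not None:
        env["DISPLAY"] = display  # e.g. ":1", as suggested for setenv.sh
    result = subprocess.run(
        build_command(soffice, work_dir, source, target),
        capture_output=True, text=True, env=env, timeout=300,
    )
    produced = sorted(p.name for p in Path(work_dir).iterdir())
    return result, produced


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("usage: script.py <absolute-work-dir> <absolute-source-file> [display]")
    else:
        work_dir, source = sys.argv[1], sys.argv[2]
        display = sys.argv[3] if len(sys.argv) > 3 else None
        result, produced = run_conversion("/usr/bin/soffice", work_dir, source,
                                          display=display)
        print("exit code:", result.returncode)
        print("stderr:", result.stderr.strip())
        print("files in outdir:", produced)
        print("expected:", expected_output(work_dir, source))
```

Use absolute paths for both arguments, and pass ":1" as the optional third argument to try the DISPLAY setting mentioned above. Any message on stderr, or a missing expected file, is the clue to look at.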