PDF text extraction error
Posted: Wed Oct 15, 2014 7:25 am
I get the error below every time I upload a new document. I have actually been getting it ever since installation, but I can still view the PDF in the viewer, so I am not sure what impact this error has.
2014-10-15 09:20:00,815 [Thread-18] WARN org.apache.pdfbox.pdfparser.XrefTrailerResolver- Did not found XRef object at specified startxref position 583141
2014-10-15 09:20:02,401 [Thread-18] INFO com.openkm.extractor.TextExtractorWorker- processSerial.Working on {docUuid=5d0b7703-a6a7-4b0a-aa16-8ff5069cc050, docPath=/okm:root/document.pdf, docVerUuid=d81b28f1-c6d3-4c95-b049-407400a02fd2, date=Wed Oct 15 09:16:26 CEST 2014}
2014-10-15 09:20:02,417 [Thread-18] WARN com.openkm.extractor.PdfTextExtractor- Failed to extract PDF text content
java.io.IOException: Error: Expected an integer type, actual='>>'
at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1384)
at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:517)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
at com.openkm.extractor.PdfTextExtractor.extractText(PdfTextExtractor.java:64)
at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:214)
at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:173)
at com.openkm.dao.NodeDocumentDAO.textExtractorHelper(NodeDocumentDAO.java:1343)
at com.openkm.extractor.TextExtractorWorker.processSerial(TextExtractorWorker.java:164)
at com.openkm.extractor.TextExtractorWorker.processQueue(TextExtractorWorker.java:149)
at com.openkm.extractor.TextExtractorWorker.run(TextExtractorWorker.java:100)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at bsh.Reflect.invokeMethod(Reflect.java:134)
at bsh.Reflect.invokeObjectMethod(Reflect.java:80)
at bsh.BSHPrimarySuffix.doName(BSHPrimarySuffix.java:176)
at bsh.BSHPrimarySuffix.doSuffix(BSHPrimarySuffix.java:120)
at bsh.BSHPrimaryExpression.eval(BSHPrimaryExpression.java:80)
at bsh.BSHPrimaryExpression.eval(BSHPrimaryExpression.java:47)
at bsh.Interpreter.eval(Interpreter.java:645)
at bsh.Interpreter.eval(Interpreter.java:739)
at bsh.Interpreter.eval(Interpreter.java:728)
at com.openkm.util.ExecutionUtils.runScript(ExecutionUtils.java:112)
at com.openkm.core.Cron$RunnerBsh.run(Cron.java:103)
at java.lang.Thread.run(Thread.java:745)
2014-10-15 09:20:02,426 [Thread-18] WARN com.openkm.dao.NodeDocumentDAO- There was a problem extracting text from '/okm:root/document.pdf': Too few text extracted
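
In case it helps anyone reproduce this outside OpenKM: the first warning suggests that the startxref offset recorded near the end of the file does not point at a cross-reference section. Below is a minimal sketch, using only the Java standard library, that checks this on a copy of the file. The class and method names (StartXrefCheck, declaredStartXref, pointsAtXref) are hypothetical and not part of OpenKM or PDFBox. It looks only at the last startxref keyword, so it is a rough diagnostic rather than a real PDF parser, and a mismatch may or may not be the cause of the exception above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of OpenKM or PDFBox.
class StartXrefCheck {

    // Returns the offset written after the last "startxref" keyword, or -1 if none is found.
    static long declaredStartXref(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so string indices equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int pos = text.lastIndexOf("startxref");
        if (pos < 0) {
            return -1;
        }
        int i = pos + "startxref".length();
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        if (i == start) {
            return -1; // keyword present but no number after it
        }
        try {
            return Long.parseLong(text.substring(start, i));
        } catch (NumberFormatException e) {
            return -1; // digit run too long to be a valid offset
        }
    }

    // True if the declared offset lands on "xref" (classic table)
    // or on an "N G obj" header (cross-reference stream).
    static boolean pointsAtXref(byte[] pdf) {
        long offset = declaredStartXref(pdf);
        if (offset < 0 || offset >= pdf.length) {
            return false;
        }
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int from = (int) offset;
        String head = text.substring(from, Math.min(text.length(), from + 32));
        return head.startsWith("xref") || head.matches("(?s)\\d+\\s+\\d+\\s+obj.*");
    }

    public static void main(String[] args) throws IOException {
        // Usage: java StartXrefCheck /path/to/copy-of-document.pdf
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("declared startxref offset: " + declaredStartXref(pdf));
        System.out.println("offset points at an xref section: " + pointsAtXref(pdf));
    }
}
```

If this prints false for the uploaded file, re-saving the PDF with another tool before uploading might be worth a try, although I have not confirmed that this would fix the extraction.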