pdftextextraction error

Posted: Wed Oct 15, 2014 7:25 am
by gwaitsi
I get the error below every time I upload a new document. Actually, I have been getting it since installation, but I can still view the PDF in the viewer, so I am not sure what impact this error has.
Code:
2014-10-15 09:20:00,815 [Thread-18] WARN  org.apache.pdfbox.pdfparser.XrefTrailerResolver- Did not found XRef object at specified startxref position 583141
2014-10-15 09:20:02,401 [Thread-18] INFO  com.openkm.extractor.TextExtractorWorker- processSerial.Working on {docUuid=5d0b7703-a6a7-4b0a-aa16-8ff5069cc050, docPath=/okm:root/document.pdf, docVerUuid=d81b28f1-c6d3-4c95-b049-407400a02fd2, date=Wed Oct 15 09:16:26 CEST 2014}
2014-10-15 09:20:02,417 [Thread-18] WARN  com.openkm.extractor.PdfTextExtractor- Failed to extract PDF text content
java.io.IOException: Error: Expected an integer type, actual='>>'
	at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1384)
	at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:517)
	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
	at com.openkm.extractor.PdfTextExtractor.extractText(PdfTextExtractor.java:64)
	at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:214)
	at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:173)
	at com.openkm.dao.NodeDocumentDAO.textExtractorHelper(NodeDocumentDAO.java:1343)
	at com.openkm.extractor.TextExtractorWorker.processSerial(TextExtractorWorker.java:164)
	at com.openkm.extractor.TextExtractorWorker.processQueue(TextExtractorWorker.java:149)
	at com.openkm.extractor.TextExtractorWorker.run(TextExtractorWorker.java:100)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at bsh.Reflect.invokeMethod(Reflect.java:134)
	at bsh.Reflect.invokeObjectMethod(Reflect.java:80)
	at bsh.BSHPrimarySuffix.doName(BSHPrimarySuffix.java:176)
	at bsh.BSHPrimarySuffix.doSuffix(BSHPrimarySuffix.java:120)
	at bsh.BSHPrimaryExpression.eval(BSHPrimaryExpression.java:80)
	at bsh.BSHPrimaryExpression.eval(BSHPrimaryExpression.java:47)
	at bsh.Interpreter.eval(Interpreter.java:645)
	at bsh.Interpreter.eval(Interpreter.java:739)
	at bsh.Interpreter.eval(Interpreter.java:728)
	at com.openkm.util.ExecutionUtils.runScript(ExecutionUtils.java:112)
	at com.openkm.core.Cron$RunnerBsh.run(Cron.java:103)
	at java.lang.Thread.run(Thread.java:745)
2014-10-15 09:20:02,426 [Thread-18] WARN  com.openkm.dao.NodeDocumentDAO- There was a problem extracting text from '/okm:root/document.pdf': Too few text extracted

Re: pdftextextraction error

Posted: Thu Oct 16, 2014 8:15 am
by jllort
How did you create this PDF document?
Could you share some of these documents (by posting them here, for example)?
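
The first warning in the log says that no XRef object was found at the startxref position, which may point to a problem in the file itself rather than in OpenKM. As a rough way to check that before posting a sample, the sketch below reads the last `startxref` offset from a PDF and reports what is stored there. It is only an illustration: `check_startxref` is a made-up helper name, not part of OpenKM or PDFBox, and it assumes the whole file fits in memory. An "ok" result does not prove the PDF is valid, since the IOException above is raised later, while parsing an object.

```python
import re
import sys


def check_startxref(data: bytes) -> str:
    """Report what sits at the offset named by the last `startxref` in a PDF.

    Hypothetical helper for illustration only.
    """
    # The offset is stored near the end of the file: startxref <digits> %%EOF
    matches = re.findall(rb"startxref\s+(\d+)", data)
    if not matches:
        return "no startxref keyword found"
    offset = int(matches[-1])
    if offset >= len(data):
        return f"startxref offset {offset} is past end of file ({len(data)} bytes)"
    chunk = data[offset:offset + 32]
    # A classic cross-reference table starts with the keyword `xref`.
    if chunk.startswith(b"xref"):
        return f"ok: xref table at {offset}"
    # A cross-reference stream starts with an object header such as `12 0 obj`.
    if re.match(rb"\d+\s+\d+\s+obj", chunk):
        return f"ok: object (possibly an xref stream) at {offset}"
    return f"mismatch: unexpected bytes at {offset}: {chunk!r}"


# Usage: python check_startxref.py document.pdf
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        print(check_startxref(fh.read()))
```

If this reports a mismatch, the tool that produced the PDF probably wrote a wrong offset, which would also explain why a more tolerant viewer can still display the file.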