• pdftextextraction error

  • Problems with installing OpenKM? No problemo, the solution is closer than you think.
 #30226  by gwaitsi
 
I get the error below every time I upload a new document. In fact, I have been getting it since I installed OpenKM, but I can still view the PDF in the viewer, so I am not sure what impact this error has.
2014-10-15 09:20:00,815 [Thread-18] WARN  org.apache.pdfbox.pdfparser.XrefTrailerResolver- Did not found XRef object at specified startxref position 583141
2014-10-15 09:20:02,401 [Thread-18] INFO  com.openkm.extractor.TextExtractorWorker- processSerial.Working on {docUuid=5d0b7703-a6a7-4b0a-aa16-8ff5069cc050, docPath=/okm:root/document.pdf, docVerUuid=d81b28f1-c6d3-4c95-b049-407400a02fd2, date=Wed Oct 15 09:16:26 CEST 2014}
2014-10-15 09:20:02,417 [Thread-18] WARN  com.openkm.extractor.PdfTextExtractor- Failed to extract PDF text content
java.io.IOException: Error: Expected an integer type, actual='>>'
	at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1384)
	at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:517)
	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
	at com.openkm.extractor.PdfTextExtractor.extractText(PdfTextExtractor.java:64)
	at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:214)
	at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:173)
	at com.openkm.dao.NodeDocumentDAO.textExtractorHelper(NodeDocumentDAO.java:1343)
	at com.openkm.extractor.TextExtractorWorker.processSerial(TextExtractorWorker.java:164)
	at com.openkm.extractor.TextExtractorWorker.processQueue(TextExtractorWorker.java:149)
	at com.openkm.extractor.TextExtractorWorker.run(TextExtractorWorker.java:100)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at bsh.Reflect.invokeMethod(Reflect.java:134)
	at bsh.Reflect.invokeObjectMethod(Reflect.java:80)
	at bsh.BSHPrimarySuffix.doName(BSHPrimarySuffix.java:176)
	at bsh.BSHPrimarySuffix.doSuffix(BSHPrimarySuffix.java:120)
	at bsh.BSHPrimaryExpression.eval(BSHPrimaryExpression.java:80)
	at bsh.BSHPrimaryExpression.eval(BSHPrimaryExpression.java:47)
	at bsh.Interpreter.eval(Interpreter.java:645)
	at bsh.Interpreter.eval(Interpreter.java:739)
	at bsh.Interpreter.eval(Interpreter.java:728)
	at com.openkm.util.ExecutionUtils.runScript(ExecutionUtils.java:112)
	at com.openkm.core.Cron$RunnerBsh.run(Cron.java:103)
	at java.lang.Thread.run(Thread.java:745)
2014-10-15 09:20:02,426 [Thread-18] WARN  com.openkm.dao.NodeDocumentDAO- There was a problem extracting text from '/okm:root/document.pdf': Too few text extracted
 #30243  by jllort
 
How did you create this PDF document?
Could you share some of these documents (by posting them here, for example)?
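
While gathering that information, it may help to check whether the file's `startxref` offset really points at a cross-reference section, since the first warning in the log complains about exactly that. The following is a minimal sketch in Python, not what PDFBox does internally; the function name `check_startxref` and the status strings are made up for illustration. It only looks at the last `startxref` entry and does not attempt to explain the later `Expected an integer type` error.

```python
import re
import sys


def check_startxref(data: bytes) -> dict:
    """Check whether the last startxref offset points at an xref section.

    Hypothetical helper for illustration only; real PDF parsers apply
    more rules (and more repair heuristics) than this.
    """
    marker = data.rfind(b"startxref")
    if marker == -1:
        return {"offset": None, "status": "missing: no startxref keyword"}

    # The keyword is followed by whitespace and a decimal byte offset.
    match = re.match(rb"startxref\s+(\d+)", data[marker:])
    if match is None:
        return {"offset": None, "status": "missing: no offset after startxref"}

    offset = int(match.group(1))
    if offset >= len(data):
        return {"offset": offset, "status": "mismatch: offset past end of file"}

    target = data[offset:offset + 32]
    if target.startswith(b"xref"):
        status = "ok: classic xref table at offset"
    elif re.match(rb"\d+\s+\d+\s+obj", target):
        status = "ok: object at offset (possibly an xref stream)"
    else:
        status = "mismatch: no xref table or object at offset"
    return {"offset": offset, "status": status}


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_startxref.py document.pdf
    with open(sys.argv[1], "rb") as handle:
        print(check_startxref(handle.read()))
```

A `mismatch` result would be consistent with the `Did not found XRef object at specified startxref position` warning. Some PDF viewers rebuild a damaged cross-reference table on the fly, which could explain why the document still opens in the viewer, but that is an assumption until the file itself has been inspected.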
