It seems as if the pdf-export from abbyy does something special to these pdf-files ...
Converting input file with ghostscript to all different pdf-Levels and openkm is able to index documents.
What is used to index pdf-documents in openkm?
error-log when uploading now working pdf:
Code: Select all[PdfTextExtractor] Failed to extract PDF text content
java.lang.NullPointerException
at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:100)
at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:75)
at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:243)
at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442)
at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
at com.openkm.extractor.PdfTextExtractor.extractText(PdfTextExtractor.java:70)
at org.apache.jackrabbit.extractor.CompositeTextExtractor.extractText(CompositeTextExtractor.java:90)
at org.apache.jackrabbit.core.query.lucene.JackrabbitTextExtractor.extractText(JackrabbitTextExtractor.java:195)
at org.apache.jackrabbit.core.query.lucene.TextExtractorJob$1.call(TextExtractorJob.java:93)
at EDU.oswego.cs.dl.util.concurrent.FutureResult$1.run(Unknown Source)
at org.apache.jackrabbit.core.query.lucene.TextExtractorJob.run(TextExtractorJob.java:172)
at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)