Hi!
I'm playing with OpenKM 6.2.3 on Debian Linux. Everything works fine except that certain PDF files are not indexed.
Those PDFs definitely contain a text layer. The only difference from the ones that work is that they were created with ABBYY FineReader 11.
The files index all right in Adobe; they just don't work in OKM.
The logfile has the following entries:
2013-05-21 16:03:33,857 [http-bio-0.0.0.0-8080-exec-18] INFO com.openkm.servlet.frontend.FileUploadServlet - Filename: 'fff.pdf'
2013-05-21 16:03:33,857 [http-bio-0.0.0.0-8080-exec-18] INFO com.openkm.servlet.frontend.FileUploadServlet - Upload file 'fff.pdf' into '/okm:root (137.7 KB)'
2013-05-21 16:03:33,857 [http-bio-0.0.0.0-8080-exec-18] INFO com.openkm.servlet.frontend.FileUploadServlet - Wizard: {path=, showWizardCategories=false, showWizardKeywords=false, groupsList=[], workflowList=[], hasAutomation=false, error=, digitalSignature=false}
2013-05-21 16:03:33,896 [http-bio-0.0.0.0-8080-exec-18] INFO com.openkm.servlet.frontend.FileUploadServlet - Wizard: {path=%2Fokm%3Aroot%2Ffff.pdf, showWizardCategories=false, showWizardKeywords=false, groupsList=[], workflowList=[], hasAutomation=false, error=, digitalSignature=false}
2013-05-21 16:03:33,897 [http-bio-0.0.0.0-8080-exec-18] INFO com.openkm.servlet.frontend.FileUploadServlet - Action: 0, JSON Response: {"hasAutomation":false,"path":"%2Fokm%3Aroot%2Ffff.pdf","groupsList":[],"workflowList":[],"showWizardCategories":false,"showWizardKeywords":false,"digitalSignature":false,"error":""}
2013-05-21 16:05:00,014 [Thread-4072] INFO com.openkm.extractor.TextExtractorWorker - processSerial.Working on {docUuid=08375552-ff32-4a0c-8f88-234fa0c1986a, docPath=/okm:root/fff.pdf, docVerUuid=883f7a97-0a19-449b-a23a-9cc81dbd5b54, date=Tue May 21 16:03:33 CEST 2013}
2013-05-21 16:05:00,021 [Thread-4072] WARN com.openkm.extractor.PdfTextExtractor - PDF does not contains text layer
2013-05-21 16:05:00,733 [Thread-4072] INFO com.openkm.extractor.Tesseract3TextExtractor - TEXT:
2013-05-21 16:05:00,734 [Thread-4072] WARN com.openkm.dao.NodeDocumentDAO - There was a problem extracting text from '/okm:root/fff.pdf': Too few text extracted
Has anyone had similar problems and can perhaps suggest a solution?
Thanks
Alex
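
P.S. For anyone comparing their own logs: the relevant part of the log above is the sequence where PdfTextExtractor reports no text layer, the Tesseract3TextExtractor output is empty, and NodeDocumentDAO then warns that too little text was extracted. Below is a minimal sketch for pulling just the WARN entries out of a log in that format. The regular expression and the function name `extraction_warnings` are assumptions made for this sketch, based only on the lines shown above; they are not part of OpenKM.

```python
import re

# Assumed layout, taken from the log lines in this post:
#   <date> <time>,<ms> [<thread>] <LEVEL> <logger> - <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>[^\]]+)\] "
    r"(?P<level>[A-Z]+) +"
    r"(?P<logger>\S+) - "
    r"(?P<message>.*)$"
)


def extraction_warnings(lines):
    """Return (timestamp, logger, message) for every WARN-level entry."""
    found = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        # Lines that do not fit the assumed layout are silently skipped.
        if m and m.group("level") == "WARN":
            found.append((m.group("ts"), m.group("logger"), m.group("message")))
    return found


# Three lines copied from the log in this post.
sample = [
    "2013-05-21 16:05:00,021 [Thread-4072] WARN com.openkm.extractor.PdfTextExtractor - PDF does not contains text layer",
    "2013-05-21 16:05:00,733 [Thread-4072] INFO com.openkm.extractor.Tesseract3TextExtractor - TEXT:",
    "2013-05-21 16:05:00,734 [Thread-4072] WARN com.openkm.dao.NodeDocumentDAO - There was a problem extracting text from '/okm:root/fff.pdf': Too few text extracted",
]

for ts, logger, message in extraction_warnings(sample):
    # Print only the short class name of the logger for readability.
    print(ts, logger.rsplit(".", 1)[-1], message)
```

To run it against a real log, pass an open file instead of `sample`, e.g. `extraction_warnings(open(path))`, where `path` is wherever your installation writes its log.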
