tesseract-ocr not adding *.txt to output causing errors
Posted: Fri Jun 05, 2015 11:11 am
I switched from Cuneiform to Tesseract on jllort's recommendation and, after tweaking the memory values, have so far cleared the Java heap errors I was getting. I also got the office dictionary installed.
However, I noticed that Tesseract is not adding the default .txt extension to the files in the temp directory, whereas Cuneiform was.
This seems to be producing the following error:
2015-06-05 12:35:01,702 [Thread-14570] WARN com.openkm.util.ExecutionUtils- CommandLine: [/usr/bin/tesseract, -l, eng+deu+ces, /usr/local/openkm/temp/okm6898227341957316764-001.pbm, /usr/local/openkm/temp/okm1885491060987496539]
2015-06-05 12:35:01,702 [Thread-14570] WARN com.openkm.util.ExecutionUtils- STDERR: Error in findFileFormatStream: truncated file
Error in pixReadStream: Unknown format: no pix returned
Error in pixRead: pix not read
2015-06-05 12:35:01,702 [Thread-14570] WARN com.openkm.extractor.Tesseract3TextExtractor- IO exception executing command: /usr/bin/tesseract -l eng+deu+ces /usr/local/openkm/temp/okm6898227341957316764-001.pbm /usr/local/openkm/temp/okm1885491060987496539
java.io.FileNotFoundException: /usr/local/openkm/temp/okm1885491060987496539.txt (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at java.io.FileInputStream.<init>(FileInputStream.java:101)
at com.openkm.extractor.Tesseract3TextExtractor.doOcr(Tesseract3TextExtractor.java:152)
at com.openkm.extractor.Tesseract3TextExtractor.doOcr(Tesseract3TextExtractor.java:127)
at com.openkm.extractor.PdfTextExtractor.doOcr(PdfTextExtractor.java:175)
at com.openkm.extractor.PdfTextExtractor.extractText(PdfTextExtractor.java:96)
at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:214)
at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:173)
at com.openkm.dao.NodeDocumentDAO.textExtractorHelper(NodeDocumentDAO.java:1344)
at com.openkm.extractor.TextExtractorWorker.processSerial(TextExtractorWorker.java:164)
at com.openkm.extractor.TextExtractorWorker.processQueue(TextExtractorWorker.java:149)
at com.openkm.extractor.TextExtractorWorker.run(TextExtractorWorker.java:100)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at bsh.Reflect.invokeMethod(Reflect.java:166)
at bsh.Reflect.invokeObjectMethod(Reflect.java:99)
at bsh.BSHPrimarySuffix.doName(BSHPrimarySuffix.java:176)
at bsh.BSHPrimarySuffix.doSuffix(BSHPrimarySuffix.java:120)
at bsh.BSHPrimaryExpression.eval(BSHPrimaryExpression.java:80)
at bsh.BSHPrimaryExpression.eval(BSHPrimaryExpression.java:47)
at bsh.Interpreter.eval(Interpreter.java:664)
at bsh.Interpreter.eval(Interpreter.java:758)
at bsh.Interpreter.eval(Interpreter.java:747)
at com.openkm.util.ExecutionUtils.runScript(ExecutionUtils.java:112)
at com.openkm.core.Cron$RunnerBsh.run(Cron.java:103)
at java.lang.Thread.run(Thread.java:745)
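
For reference, the STDERR lines above ("truncated file", "pix not read") suggest Tesseract may not have been able to read the .pbm input at all, in which case it would write no output file rather than one without the extension. Below is a minimal sketch of how a caller could guard against that before opening the .txt file. It is not OpenKM's actual code: the class name OcrOutputCheck and its methods are hypothetical, and treating a non-zero exit code as failure is an assumption. The command line mirrors the one in the log.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of OpenKM.
public class OcrOutputCheck {

    // The extractor in the log looks for "<output base>.txt".
    static Path expectedTextFile(Path outputBase) {
        return Paths.get(outputBase.toString() + ".txt");
    }

    // Runs tesseract with the same argument order as in the log above.
    static String ocr(Path image, Path outputBase, String languages)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder("/usr/bin/tesseract", "-l", languages,
                image.toString(), outputBase.toString())
                .inheritIO() // let tesseract's STDERR show up in our own output
                .start();
        int exitCode = p.waitFor();
        return readResult(outputBase, exitCode);
    }

    // Returns the OCR text, or an empty string when tesseract failed
    // (assumed: non-zero exit) or the .txt file was never written.
    static String readResult(Path outputBase, int exitCode) throws IOException {
        Path txt = expectedTextFile(outputBase);
        if (exitCode != 0 || !Files.isRegularFile(txt)) {
            return "";
        }
        return new String(Files.readAllBytes(txt), StandardCharsets.UTF_8);
    }
}
```

With a check like this, an unreadable image would yield empty text instead of a FileNotFoundException, which would make it easier to tell whether the problem is the missing extension or the truncated .pbm input.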