
tesseract processing error in log

Posted: Thu Jan 21, 2021 9:27 pm
by yedkm
Hi,

I am using OpenKM 6.3.9 CE.

I am seeing multiple tesseract errors in the log:
Code:
2021-01-21 13:23:34,850 [Thread-16] [] WARN  com.openkm.util.ExecutionUtils - Abnormal program termination: 1
2021-01-21 13:23:34,850 [Thread-16] [] WARN  com.openkm.util.ExecutionUtils - CommandLine: [C:\Program, Files\Tesseract-OCR\tesseract.exe, C:\tomcat-8.5.34\temp\okm2629366693785316603-0009.pbm, C:\tomcat-8.5.34\temp\okm7056446360890699995]
2021-01-21 13:23:34,850 [Thread-16] [] WARN  com.openkm.util.ExecutionUtils - STDERR: Tesseract Open Source OCR Engine v5.0.0-alpha.20200328 with Leptonica
Error in findFileFormatStream: truncated file
Error in fopenReadStream: file not found
Error in pixRead: image file not found: P4
Image file P4 cannot be read!
Error during processing.
When I look at the .pbm files under temp, they are only 1 KB in size, and if I open them with Notepad the content is:
P4
1 1

Any suggestions / ideas?

Thanks,
Felipe.

Re: tesseract processing error in log

Posted: Fri Jan 22, 2021 7:11 pm
by jllort
I suggest using a stable version rather than an alpha (Tesseract version 4.x); maybe it is a bug.

I also suggest running tesseract directly from a terminal; that way you will usually get more information about what is happening.
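
As a rough illustration of running tesseract outside OpenKM, here is a minimal Python sketch that calls the executable with the same two arguments shown in the log (input image, output base name) and prints the exit code and stderr. The function names `build_tesseract_command` and `run_tesseract` are made up for this example, and it assumes `tesseract` is reachable on the PATH or that you pass the full path to `tesseract.exe`.

```python
import subprocess
import sys


def build_tesseract_command(image_path, output_base, tesseract_exe="tesseract"):
    """Build the argument list: executable, input image, output base name."""
    # Each argument is its own list element, so paths containing spaces
    # (for example under "C:\\Program Files") need no extra quoting.
    return [str(tesseract_exe), str(image_path), str(output_base)]


def run_tesseract(image_path, output_base, tesseract_exe="tesseract"):
    """Run tesseract once and return (exit_code, stderr_text)."""
    cmd = build_tesseract_command(image_path, output_base, tesseract_exe)
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python run_tesseract.py <image> <output_base>")
        sys.exit(2)
    code, err = run_tesseract(sys.argv[1], sys.argv[2])
    print("exit code:", code)
    print(err)
```

Pointing it at one of the .pbm files left in the temp directory should show whether the failure reproduces without OpenKM involved.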

Re: tesseract processing error in log

Posted: Fri Jan 22, 2021 9:20 pm
by yedkm
Thanks,

So you don't think it is the process that generates the .pbm file?
Is that file supposed to be a text file or a graphics file? I thought pbm stands for portable bitmap.

Felipe.

Re: tesseract processing error in log

Posted: Sat Jan 30, 2021 7:27 am
by jllort
The first step should be to find out which document is raising this issue. Then, using Administration > Tools > Check text extraction, you can run the process manually to debug what happens. In the end, the objective is to reproduce the problem from the command line.

If it is a PDF file, maybe the images inside it have some issue, or a local library has a bug that raises this exception with these images.
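
To see whether the intermediate .pbm files are real images or just degenerate ones, a small check of the header can help. "P4" is the magic number of the binary PBM format: a short ASCII header (magic, width, height) followed by the pixels packed eight per byte. The sketch below is only an illustration; the name `inspect_p4` is hypothetical, and it does not handle `#` comments in the header, which the format allows.

```python
import re


def inspect_p4(data: bytes):
    """Parse a binary PBM (P4) header.

    Returns (width, height, expected_bytes, actual_bytes), where
    expected_bytes is the raster size implied by the header and
    actual_bytes is what really follows the header in the data.
    """
    # Header: "P4", whitespace, width, whitespace, height, one whitespace byte.
    match = re.match(rb"P4\s+(\d+)\s+(\d+)\s", data)
    if match is None:
        raise ValueError("not a P4 (binary PBM) header")
    width = int(match.group(1))
    height = int(match.group(2))
    # Each row is padded to a whole number of bytes.
    expected = height * ((width + 7) // 8)
    actual = len(data) - match.end()
    return width, height, expected, actual


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        print(inspect_p4(handle.read()))
```

A result of width 1 and height 1, as in the files described in the first post, would mean the file holds a single pixel, so the next thing to look at would be the step that produces those images from the source document.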