Tesseract processing error in log

 #52073  by yedkm
 
Hi,

I am using OpenKM 6.3.9 CE.

I am seeing multiple Tesseract errors in the log:
2021-01-21 13:23:34,850 [Thread-16] [] WARN  com.openkm.util.ExecutionUtils - Abnormal program termination: 1
2021-01-21 13:23:34,850 [Thread-16] [] WARN  com.openkm.util.ExecutionUtils - CommandLine: [C:\Program, Files\Tesseract-OCR\tesseract.exe, C:\tomcat-8.5.34\temp\okm2629366693785316603-0009.pbm, C:\tomcat-8.5.34\temp\okm7056446360890699995]
2021-01-21 13:23:34,850 [Thread-16] [] WARN  com.openkm.util.ExecutionUtils - STDERR: Tesseract Open Source OCR Engine v5.0.0-alpha.20200328 with Leptonica
Error in findFileFormatStream: truncated file
Error in fopenReadStream: file not found
Error in pixRead: image file not found: P4
Image file P4 cannot be read!
Error during processing.
When I look at the .pbm files under temp, they are only 1 KB in size, and if I open them in Notepad the content is:
P4
1 1

Any suggestions / ideas?

Thanks,
Felipe.
 #52083  by jllort
 
I suggest using a stable version rather than an alpha (Tesseract 4.x); this may be a bug in the alpha release.

I also suggest running Tesseract directly from a terminal; that way you will usually get more information about what is happening.
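
As a rough illustration of that manual check, here is a minimal Python sketch that calls Tesseract with the same `tesseract <image> <output base>` form seen in the log and returns the exit code and STDERR. The helper names `build_tesseract_command` and `run_tesseract` are made up for this example, and the paths in the usage comment are taken from the log or invented (`manual-test`); adjust them to your installation.

```python
import subprocess


def build_tesseract_command(exe, image, output_base):
    """Build the argument list for: tesseract <image> <output base>."""
    # Each argument stays a separate list element, so a path containing
    # spaces (such as "C:\Program Files\...") is passed as one argument.
    return [exe, image, output_base]


def run_tesseract(exe, image, output_base):
    """Run Tesseract once and return (exit_code, stderr_text)."""
    result = subprocess.run(
        build_tesseract_command(exe, image, output_base),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr


# Example call (hypothetical output base "manual-test"):
# code, err = run_tesseract(
#     r"C:\Program Files\Tesseract-OCR\tesseract.exe",
#     r"C:\tomcat-8.5.34\temp\okm2629366693785316603-0009.pbm",
#     r"C:\tomcat-8.5.34\temp\manual-test",
# )
# print(code, err)
```

The temporary .pbm named in the log may already have been deleted, so it can be easier to test with a copy of one of the small .pbm files still present in the temp folder.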
 #52088  by yedkm
 
Thanks,

So you don't think the problem is in the process that generates the .pbm file?
Is that file supposed to be a text file or a graphics file? I thought PBM stands for Portable Bitmap.

Felipe.
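
For reference, PBM is indeed the Portable Bitmap graphics format. `P4` is the magic number of its binary variant, and it is followed by the width and height as ASCII text and then the packed pixel rows, so `P4` / `1 1` describes a 1x1 pixel image. The sketch below is a simplified check (it ignores `#` comment lines, which the format allows in the header) that reads such a header and reports whether the pixel data that should follow is actually present. The function name `inspect_pbm` is hypothetical.

```python
import re


def inspect_pbm(data: bytes):
    """Parse a binary PBM (P4) header; return None if it is not recognised."""
    # Header: magic "P4", whitespace, width, whitespace, height,
    # then exactly one whitespace byte before the pixel data.
    match = re.match(rb"P4\s+(\d+)\s+(\d+)\s", data)
    if match is None:
        return None
    width, height = int(match.group(1)), int(match.group(2))
    # Rows are packed 8 pixels per byte and padded to a whole byte.
    expected = ((width + 7) // 8) * height
    actual = len(data) - match.end()
    return {
        "width": width,
        "height": height,
        "expected_bytes": expected,
        "actual_bytes": actual,
        "truncated": actual < expected,
    }


# Usage: inspect_pbm(open(path, "rb").read())
```

If `truncated` comes back true for the files in temp, that would be consistent with the "truncated file" message from Leptonica, and it would point at the step that produces the .pbm rather than at Tesseract itself.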
 #52112  by jllort
 
The first step should be to find out which document is raising this issue. Then, using Administration > Tools > Check text extraction, you can execute the process manually to debug what happens. In the end, the objective is to reproduce the problem from the command line.

If it is a PDF file, maybe the images inside it have some issue, or a local library has a bug that raises this exception with these images.
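
To help narrow down which inputs fail, the following Python sketch collects the .pbm paths from the `CommandLine: [...]` warnings in the format shown in the first post. It assumes the log keeps that exact format, and the name `failing_inputs` is made up for this example. It does not identify the OpenKM document by itself; it only lists the temporary files worth inspecting or re-running by hand.

```python
import re

# Matches the "CommandLine: [ ... ]" warnings logged by ExecutionUtils.
COMMAND_LINE = re.compile(r"ExecutionUtils - CommandLine: \[(.*)\]")


def failing_inputs(log_lines):
    """Return the .pbm paths found in ExecutionUtils CommandLine warnings."""
    paths = []
    for line in log_lines:
        match = COMMAND_LINE.search(line)
        if match is None:
            continue
        # Arguments are printed separated by ", " inside the brackets.
        for part in match.group(1).split(", "):
            if part.lower().endswith(".pbm"):
                paths.append(part)
    return paths


# Usage: failing_inputs(open("path/to/your.log", encoding="utf-8"))
```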
