Hello,
After a Mozilla Firefox update, I lost the preview functionality.
I was running OpenKM 6.3.0.
I have now installed 6.3.11 and am testing it before going live.
Text recognition is not working.
System :
- Ubuntu 22.04.1 LTS (GNU/Linux 5.15.0-48-generic x86_64)
- OpenKM 6.3.12 (build: a3587ce) With Community Extension
- Tesseract 4.1.1 leptonica-1.82.0
An uploaded document goes to the queue and then to extraction in progress.
When the doc is a PDF I get this error in catalina.out :
2022-09-22 19:10:57,139 [Thread-6295] WARN com.openkm.dao.NodeDocumentDAO - There was a problem extracting text from '/okm:root/Licences/2012_01_16 VMware Fusion 4.pdf': Too few text extracted
2022-09-22 19:10:57,143 [Thread-6295] INFO c.o.extractor.TextExtractorWorker - processSerial.Working on {docUuid=a31e356d-c9a5-4eab-9f42-afcaa25bec32, docPath=/okm:root/Licences/2014_06_16 AssistiveWare activation.pdf, docVerUuid=ea6a1cb2-8ac3-4b6f-beaa-6ced3b6a6b31, date=Thu Sep 22 15:04:44 CEST 2022}
2022-09-22 19:10:57,278 [Thread-6295] WARN com.openkm.util.ExecutionUtils - Abnormal program termination: 1
2022-09-22 19:10:57,278 [Thread-6295] WARN com.openkm.util.ExecutionUtils - CommandLine: [/usr/bin/tesseract, /home/openkm/tomcat/temp/okm5678788871933443952.pdf, /home/openkm/tomcat/temp/okm7604132903759119539.txt]
2022-09-22 19:10:57,278 [Thread-6295] WARN com.openkm.util.ExecutionUtils - STDERR: Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Error in pixReadStream: Pdf reading is not supported
Error in pixRead: pix not read
Error during processing.
Trying to convert from the terminal also fails :
openkm@okm-vm:~/Polet/Manuels$ tesseract '2018 Manual FM voiture.pdf' test -l fra
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Error in pixReadStream: Pdf reading is not supported
Error in pixRead: pix not read
Error during processing.
When the doc is a JPG it also fails. In catalina.out :
2022-09-23 09:55:00,055 [Thread-6964] INFO c.o.extractor.TextExtractorWorker - processSerial.Working on {docUuid=7977b55b-a787-474d-a314-d40415d72776, docPath=/okm:root/Test/test Bpost TVA import.jpg, docVerUuid=481b4a0b-2efa-4615-b6ec-835e453e2601, date=Fri Sep 23 09:50:54 CEST 2022}
2022-09-23 09:55:15,447 [Thread-6964] WARN com.openkm.dao.NodeDocumentDAO - There was a problem extracting text from '/okm:root/Test/test Bpost TVA import.jpg': Too few text extracted
But from the terminal I do get a successful text recognition output :
openkm@okm-vm:~$ tesseract bpostTest.jpg bpost.txt -l fra
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 625
System settings :
system.ocr String /usr/bin/tesseract ${fileIn} ${fileOut}
system.ocr.rotate String
system.pdf.force.ocr Boolean Inactive
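To make the comparison with my terminal tests easier, here is a minimal sketch of how I assume the system.ocr template gets expanded into the CommandLine shown in catalina.out. The helper name expand_ocr_command, the substitution logic and the example file paths are my own assumptions for illustration, not OpenKM's actual code. The only point is that the configured command carries no -l fra option, unlike the commands I ran by hand.

```python
import shlex
from string import Template


def expand_ocr_command(template: str, file_in: str, file_out: str) -> list[str]:
    """Hypothetical helper: split the configured command on whitespace,
    then substitute ${fileIn} and ${fileOut} in each argument.

    Substituting after splitting keeps a path containing spaces
    as a single argument.
    """
    args = shlex.split(template)
    return [Template(arg).substitute(fileIn=file_in, fileOut=file_out) for arg in args]


# Value of system.ocr as configured above.
template = "/usr/bin/tesseract ${fileIn} ${fileOut}"

# Made-up temporary file names, standing in for the okm*.pdf / okm*.txt files in the log.
cmd = expand_ocr_command(
    template,
    "/home/openkm/tomcat/temp/in.pdf",
    "/home/openkm/tomcat/temp/out.txt",
)
print(cmd)
print("language option present:", "-l" in cmd)
```

If this assumption is right, the command OpenKM runs differs from my manual test in two ways: no language is passed, and the output argument already ends in .txt.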
Plug-in settings :
com.openkm.extractor.TextExtractor :
AbbyTextExtractor com.openkm.extractor.AbbyTextExtractor Active
AudioTextExtractor com.openkm.extractor.AudioTextExtractor Active
BarcodeTextExtractor com.openkm.extractor.BarcodeTextExtractor Active
CuneiformTextExtractor com.openkm.extractor.CuneiformTextExtractor Active
ExifTextExtractor com.openkm.extractor.ExifTextExtractor Active
HTMLTextExtractor com.openkm.extractor.HTMLTextExtractor Active
MsExcelTextExtractor com.openkm.extractor.MsExcelTextExtractor Active
MsOffice2007TextExtractor com.openkm.extractor.MsOffice2007TextExtractor Active
MsOutlookTextExtractor com.openkm.extractor.MsOutlookTextExtractor Active
MsPowerPointTextExtractor com.openkm.extractor.MsPowerPointTextExtractor Active
MsWordTextExtractor com.openkm.extractor.MsWordTextExtractor Active
NativeMsExcelTextExtractor com.openkm.extractor.NativeMsExcelTextExtractor Active
OOTextExtractor com.openkm.extractor.OOTextExtractor Active
OpenOfficeTextExtractor com.openkm.extractor.OpenOfficeTextExtractor Active
PdfTextExtractor com.openkm.extractor.PdfTextExtractor Active
PlainTextExtractor com.openkm.extractor.PlainTextExtractor Active
RTFTextExtractor com.openkm.extractor.RTFTextExtractor Active
SourceCodeTextExtractor com.openkm.extractor.SourceCodeTextExtractor Active
Tesseract2TextExtractor com.openkm.extractor.Tesseract2TextExtractor Active
Tesseract3TextExtractor com.openkm.extractor.Tesseract3TextExtractor Active
XMLTextExtractor com.openkm.extractor.XMLTextExtractor Active
Any advice, please?
Thank you,
Harold