Open Source Document Management System | OpenKM

Posted: Mon Nov 07, 2016 4:09 pm

Hi,

I have spent several days configuring OpenKM. I would like to use it to manage my documents at home. OCR is critical for me because I want the contents of every uploaded document to be searchable. That is all I need.

I've installed OpenKM Community 6.3.2 on Debian Stretch, 4.7.8-1 (2016-10-19) x86_64 GNU/Linux.
I've installed Tesseract 3.04.01.
I've installed all the required Java components.

Below is the configuration I set in the OpenKM Administration tab:

Code:

registered.text.extractors= com.openkm.extractor.Tesseract3TextExtractor -l eng
system.ocr=/usr/bin/tesseract
system.ocr.rotate= 90;180;270; 
system.pdf.force.ocr=TRUE

The OCR feature does not seem to work. When I run Tesseract from the command line, I do get results.

In the log file I see the following message:

Code:

WARN  com.openkm.extractor.RegisteredExtractors- Text extraction failure: Full text indexing of 'image/png' is not supported

Posted: Tue Nov 08, 2016 12:55 pm

This is wrong:

Code:

registered.text.extractors= com.openkm.extractor.Tesseract3TextExtractor -l eng

It should be:

Code:

registered.text.extractors= com.openkm.extractor.Tesseract3TextExtractor
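One plausible reading of the "image/png is not supported" warning is that no OCR extractor was registered at all, because the registered.text.extractors entry is expected to be just the extractor's class name, while Tesseract options such as -l eng belong in system.ocr. The standalone check below is only an illustration, not OpenKM code: it uses the JDK's javax.lang.model.SourceVersion.isName to show that the value from the original post is not a valid Java class name, while the bare class name is. The class ExtractorEntryCheck and the helper looksLikeClassName are hypothetical names chosen for this sketch.

```java
import java.util.List;
import javax.lang.model.SourceVersion;

public class ExtractorEntryCheck {

    // Hypothetical helper: true if the entry is a syntactically valid, fully
    // qualified Java class name. Trailing command-line options ("-l eng")
    // make the last segment an invalid identifier, so the check fails.
    static boolean looksLikeClassName(String entry) {
        return SourceVersion.isName(entry.trim());
    }

    public static void main(String[] args) {
        List<String> entries = List.of(
            "com.openkm.extractor.Tesseract3TextExtractor -l eng", // value from the original post
            "com.openkm.extractor.Tesseract3TextExtractor");        // class name only
        for (String e : entries) {
            System.out.println(e + " -> " + looksLikeClassName(e));
        }
    }
}
```

Running it prints false for the first entry and true for the second.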

As for this setting:

Code:

system.ocr=/usr/bin/tesseract

It should be (as explained here: http://wiki.openkm.com/index.php/Third- ... ation:_OCR ):

Code:

system.ocr=/usr/bin/tesseract ${fileIn} ${fileOut} -l eng

In fact, if you have only installed English language support for Tesseract, you do not need to specify -l at all.
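To show what the corrected system.ocr value expresses, here is a minimal sketch that assumes OpenKM replaces ${fileIn} with the path of the image to recognize and ${fileOut} with the output path before running the command. The class OcrCommandSketch and its expand helper are hypothetical illustrations, not OpenKM's implementation. The second template omits -l, which works because Tesseract falls back to English when no language is given.

```java
import java.util.ArrayList;
import java.util.List;

public class OcrCommandSketch {

    // Hypothetical helper: expands a system.ocr-style template into an
    // argument list. The template is split on whitespace first, so an
    // input or output path containing spaces stays a single argument.
    static List<String> expand(String template, String fileIn, String fileOut) {
        List<String> args = new ArrayList<>();
        for (String token : template.trim().split("\\s+")) {
            args.add(token.replace("${fileIn}", fileIn).replace("${fileOut}", fileOut));
        }
        return args;
    }

    public static void main(String[] args) {
        String withLang = "/usr/bin/tesseract ${fileIn} ${fileOut} -l eng";
        String defaultLang = "/usr/bin/tesseract ${fileIn} ${fileOut}"; // English is Tesseract's default
        System.out.println(expand(withLang, "/tmp/scan.png", "/tmp/scan"));
        System.out.println(expand(defaultLang, "/tmp/scan.png", "/tmp/scan"));
        // Either list could then be run with new ProcessBuilder(list).start().
    }
}
```

The first call yields [/usr/bin/tesseract, /tmp/scan.png, /tmp/scan, -l, eng] and the second the same list without the last two elements.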

OCR feature not working in community