Open Source Document Management System | OpenKM - OCR feature not working in community

#42541 by Tazbir
Mon Nov 07, 2016 4:09 pm

Hi,

I have spent several days configuring OpenKM. I would like to use it to manage my documents at home. The OCR feature is critical, because I want the contents of all uploaded documents to be searchable. That is all I need.

I've installed OpenKM Community 6.3.2 under Debian Stretch 4.7.8-1 (2016-10-19) x86_64 GNU/Linux
I've installed tesseract 3.04.01
I've installed all the required Java stuff.

Below is the configuration that I set in the administration tab in OpenKM:

registered.text.extractors= com.openkm.extractor.Tesseract3TextExtractor -l eng
system.ocr=/usr/bin/tesseract
system.ocr.rotate= 90;180;270; 
system.pdf.force.ocr=TRUE

The OCR feature does not seem to be working. When I run Tesseract from the command line, I do get results.

In the log file I see the following message:

WARN  com.openkm.extractor.RegisteredExtractors- Text extraction failure: Full text indexing of 'image/png' is not supported

Re: OCR feature not working in community

#42549 by jllort
Tue Nov 08, 2016 12:55 pm

This is wrong:

registered.text.extractors= com.openkm.extractor.Tesseract3TextExtractor -l eng

It should be:

registered.text.extractors=com.openkm.extractor.Tesseract3TextExtractor

As for this setting:

system.ocr=/usr/bin/tesseract

It should be (as explained here: http://wiki.openkm.com/index.php/Third- ... ation:_OCR ):

system.ocr=/usr/bin/tesseract ${fileIn} ${fileOut} -l eng

In fact, if you have only installed English language support for Tesseract, there is no need to specify -l.
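
To check what a system.ocr value like the one above turns into before relying on it, the placeholders can be expanded by hand and the resulting command run in a terminal. Below is a minimal Python sketch of that expansion. The function name expand_ocr_command and the sample paths are hypothetical, and this is only an illustration of placeholder substitution, not OpenKM's actual implementation.

```python
import shlex
from string import Template
from typing import List


def expand_ocr_command(template: str, file_in: str, file_out: str) -> List[str]:
    """Expand ${fileIn} / ${fileOut} in a command template into an argument list.

    Hypothetical helper for manual testing; OpenKM's own substitution logic
    may differ.
    """
    # Split on whitespace first, then substitute token by token, so that a
    # path containing spaces still ends up as a single argument.
    return [
        Template(token).substitute(fileIn=file_in, fileOut=file_out)
        for token in shlex.split(template)
    ]


if __name__ == "__main__":
    # Sample paths are assumptions, chosen only for illustration.
    command = expand_ocr_command(
        "/usr/bin/tesseract ${fileIn} ${fileOut} -l eng",
        "/tmp/scan.png",
        "/tmp/scan",
    )
    print(command)
```

The printed list can be passed to subprocess.run to confirm that Tesseract produces text for a sample image with exactly those arguments.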

- All times are UTC -