Open Source Document Management System | OpenKM - where is OCR menu in opensource version

where is OCR menu in opensource version

#26087 by vincentk222
Tue Oct 29, 2013 11:59 am

Where is the OCR menu in the open source version?

Here are my settings:

system.ocr String /usr/bin/tesseract ${fileIn} ${fileOut}
system.ocr.rotate String 90;180;270;
system.pdf.force.ocr Boolean Active

Username

vincentk222

Rank

Junior Boarder

Posts

Joined

Fri Sep 20, 2013 12:27 pm

Re: where is OCR menu in opensource version

#26133 by jllort
Wed Oct 30, 2013 12:07 pm

What do you mean by "OCR menu"? Are you talking about Zone OCR?

Username

jllort

Rank

Moderator

Posts

12185

Joined

Fri Dec 21, 2007 11:23 am

Location

Sineu - ( Illes Balears ) - Spain

Contact

Re: where is OCR menu in opensource version

#26141 by vincentk222
Thu Oct 31, 2013 2:40 pm

In the general features list, I can see that OCR is marked in green.
I have also configured:
system.ocr String /usr/bin/tesseract ${fileIn} ${fileOut}
system.ocr.rotate String 90;180;270;
system.pdf.force.ocr Boolean Active

But how does it work? If I have an image-only PDF, can I make it searchable?
What does this OCR function actually do?

Re: where is OCR menu in opensource version

#26151 by jllort
Fri Nov 01, 2013 10:51 am

Keep in mind that there is a document content indexing queue (administration -> stats -> queue). Until a document has been processed, you cannot search its content.

I suggest taking a look at administration -> database query.
Use JDBC and run a query against OKM_NODE_DOCUMENT (one column indicates whether the text has been extracted, with value T, and another column shows the extracted text).
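
As a rough illustration of that check, the sketch below runs the same kind of query against an in-memory SQLite table standing in for OKM_NODE_DOCUMENT. The column names (NBS_UUID, NDC_TEXT_EXTRACTED, NDC_TEXT) and the sample rows are assumptions made for the example, not something confirmed in this thread; running `SELECT * FROM OKM_NODE_DOCUMENT` in administration -> database query will show the real columns.

```python
import sqlite3

# Stand-in for the OpenKM database. Column names are ASSUMED for illustration;
# check the real OKM_NODE_DOCUMENT table before relying on them.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OKM_NODE_DOCUMENT ("
    " NBS_UUID TEXT PRIMARY KEY,"
    " NDC_TEXT_EXTRACTED TEXT,"  # 'T' once text extraction has run
    " NDC_TEXT TEXT)"            # the extracted text itself
)
conn.executemany(
    "INSERT INTO OKM_NODE_DOCUMENT VALUES (?, ?, ?)",
    [
        ("doc-1", "T", "Invoice 2013-10 total 120.00"),  # made-up sample data
        ("doc-2", "T", ""),    # processed, but extraction produced nothing
        ("doc-3", "F", None),  # still waiting in the queue
    ],
)


def extraction_report(connection):
    """Return (uuid, status) pairs where status is 'pending', 'empty' or 'ok'."""
    rows = connection.execute(
        "SELECT NBS_UUID, NDC_TEXT_EXTRACTED, NDC_TEXT "
        "FROM OKM_NODE_DOCUMENT ORDER BY NBS_UUID"
    ).fetchall()
    report = []
    for uuid, extracted, text in rows:
        if extracted != "T":
            status = "pending"
        elif not (text or "").strip():
            status = "empty"
        else:
            status = "ok"
        report.append((uuid, status))
    return report


print(extraction_report(conn))
```

A document reported as "pending" is still in the queue, while "empty" means extraction ran but returned no text, which points at the OCR step rather than at the indexer.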

A final consideration: depending on the resolution of the images in the PDF and similar factors, one OCR engine will do better than another. In last year's tests, Tesseract seemed to give better results than Cuneiform, comparing the latest released versions.

Re: where is OCR menu in opensource version

#26191 by vincentk222
Mon Nov 11, 2013 1:09 pm

Extraction was done.
My mistake: I believed that OCR adds a text layer to the PDF, but this is not the case.

If the document is a PDF (scanned image), the extracted text is empty, so I think no OCR is done.
If the file is a TIF, OCR is processed, but the result is only minus signs: --------------------------------- ------------------------ ----------------------

Re: where is OCR menu in opensource version

#26230 by jllort
Wed Nov 13, 2013 10:17 am

Open source OCR engines cannot work with low-resolution images. I suggest extracting the image from the PDF and running the OCR application from a terminal to see the results. For example, ABBYY OCR capture gets good results with 100 dpi images. Keep in mind that an open source solution will not always perform as well as a commercial one; otherwise nobody would buy the commercial one.
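
To try this by hand, a minimal sketch along the following lines may help. It expands the `system.ocr` value quoted earlier in the thread into a command line, runs it on an image you have already extracted from the PDF (for example with poppler's `pdfimages`), and flags output that is mostly dashes or other noise. How OpenKM itself expands `${fileIn}` and `${fileOut}` is an assumption here, and the helper names and the 0.3 threshold are made up for the example.

```python
import shlex
import shutil
import subprocess

# Value of system.ocr as quoted earlier in this thread.
OCR_TEMPLATE = "/usr/bin/tesseract ${fileIn} ${fileOut}"


def build_ocr_command(template, file_in, file_out):
    """Expand the two placeholders and return an argv list.

    ASSUMPTION: plain string substitution; OpenKM may expand the
    template differently.
    """
    args = shlex.split(template)
    return [
        a.replace("${fileIn}", file_in).replace("${fileOut}", file_out)
        for a in args
    ]


def looks_like_failed_ocr(text, min_ratio=0.3):
    """True when fewer than min_ratio of non-space characters are letters or digits."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return True
    useful = sum(c.isalnum() for c in chars)
    return useful / len(chars) < min_ratio


def run_ocr(image_path, out_base):
    """Run the configured OCR command and return the recognized text."""
    cmd = build_ocr_command(OCR_TEMPLATE, image_path, out_base)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"OCR binary not found: {cmd[0]}")
    subprocess.run(cmd, check=True)
    # tesseract appends ".txt" to the output base name it is given
    with open(out_base + ".txt", encoding="utf-8", errors="replace") as fh:
        return fh.read()
```

For instance, `text = run_ocr("page.tif", "page")` followed by `looks_like_failed_ocr(text)` gives a quick yes/no on whether the scan resolution is good enough before the file goes anywhere near OpenKM.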
