Open Source Document Management System | OpenKM

Posted: **Tue Oct 29, 2013 11:59 am**

Where is the OCR menu in the open source version?

Here are my settings:

system.ocr String /usr/bin/tesseract ${fileIn} ${fileOut}
system.ocr.rotate String 90;180;270;
system.pdf.force.ocr Boolean Active

Posted: **Wed Oct 30, 2013 12:07 pm**

What do you mean by OCR menu? Are you talking about Zone OCR?

Posted: **Thu Oct 31, 2013 2:40 pm**

In the general features list, I can see that OCR is marked in green.
I have also configured:
system.ocr String /usr/bin/tesseract ${fileIn} ${fileOut}
system.ocr.rotate String 90;180;270;
system.pdf.force.ocr Boolean Active

But how does it work? If I have an image-only PDF, can I make it searchable?
What does this OCR function do?

Posted: **Fri Nov 01, 2013 10:51 am**

Keep in mind that there is a document content index queue (Administration -> Stats -> Queue). Until a document has been processed, you cannot search its content.

I suggest taking a look at Administration -> Database query.
Use JDBC and run a query against OKM_NODE_DOCUMENT (one column indicates whether the text has been extracted, = T, and another column shows the extracted text).
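The check described above can be sketched roughly as follows. This is only an illustration, not OpenKM's own tooling: the column names `NBS_UUID`, `NDC_TEXT_EXTRACTED` and `NDC_TEXT` are assumptions about the schema, and an in-memory SQLite table with made-up rows stands in for the real database. In practice the same SELECT would be pasted into the database query screen.

```python
import sqlite3

# Assumed column names (not confirmed in this thread); adjust to your schema.
QUERY = """
SELECT NBS_UUID, NDC_TEXT_EXTRACTED, NDC_TEXT
FROM OKM_NODE_DOCUMENT
ORDER BY NBS_UUID
"""


def extraction_status(conn):
    """Return (uuid, text_extracted, text_preview) for every document row."""
    rows = conn.execute(QUERY).fetchall()
    # 'T' marks a document whose text has been extracted; NULL text becomes "".
    return [(uuid, flag == "T", (text or "")[:40]) for uuid, flag, text in rows]


def demo_connection():
    """Build a stand-in table with hypothetical rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE OKM_NODE_DOCUMENT "
        "(NBS_UUID TEXT, NDC_TEXT_EXTRACTED TEXT, NDC_TEXT TEXT)"
    )
    conn.executemany(
        "INSERT INTO OKM_NODE_DOCUMENT VALUES (?, ?, ?)",
        [
            ("doc-1", "T", "Invoice 2013-10 total 120.00"),
            ("doc-2", "F", None),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in extraction_status(demo_connection()):
        print(row)
```

A row flagged as extracted but with an empty preview would suggest the extractor ran and found no text, which is a different situation from a document still waiting in the queue.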

A final consideration: depending on the resolution of the images in the PDF and so on, some OCR engines will do better than others. In last year's tests, Tesseract seemed to give better results than Cuneiform, using the latest released versions.

Posted: **Mon Nov 11, 2013 1:09 pm**

The extraction was done.
My mistake was that I believed OCR adds a text layer to the PDF, but this is not the case.

If the document is a PDF (scanned image), the extracted text is empty, so I think no OCR is done.
If the file is a TIF, OCR is processed, but the result is only minus signs: --------------------------------- ------------------------ ----------------------

Posted: **Wed Nov 13, 2013 10:17 am**

Open source OCR engines cannot work with low-resolution images. I suggest extracting the image from the PDF and running the OCR application from a terminal to see the results. For example, ABBYY OCR capture gets good results with 100 ppi images. Keep in mind that an open source solution will not always perform as well as a commercial one; otherwise nobody would buy the commercial one.
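As a rough sketch of that manual test, the script below fills in the same command template used in the `system.ocr` setting quoted earlier and runs it on one image. The helper names (`build_ocr_command`, `run_ocr`, `looks_empty`) and the file names are hypothetical, and this is not a description of how OpenKM itself invokes the engine. It relies on Tesseract writing its result to the output base name plus `.txt`.

```python
import shlex
import shutil
import subprocess
from string import Template

# Same template as the system.ocr value quoted in this thread.
OCR_TEMPLATE = "/usr/bin/tesseract ${fileIn} ${fileOut}"


def build_ocr_command(template, file_in, file_out):
    """Substitute ${fileIn} and ${fileOut}, then split into an argument list."""
    filled = Template(template).substitute(
        fileIn=shlex.quote(file_in), fileOut=shlex.quote(file_out)
    )
    return shlex.split(filled)


def looks_empty(text):
    """True when the OCR output has no letters or digits (e.g. only dashes)."""
    return not any(ch.isalnum() for ch in text)


def run_ocr(template, file_in, file_out):
    """Run the OCR command; return the recognized text, or None if no binary."""
    cmd = build_ocr_command(template, file_in, file_out)
    if shutil.which(cmd[0]) is None:
        return None  # OCR binary not found at the configured path
    subprocess.run(cmd, check=True)
    # Tesseract appends ".txt" to the output base name.
    with open(file_out + ".txt", encoding="utf-8") as fh:
        return fh.read()


if __name__ == "__main__":
    # "page.tif" is a hypothetical image extracted from the scanned PDF.
    text = run_ocr(OCR_TEMPLATE, "page.tif", "page_out")
    if text is None:
        print("OCR binary not found")
    elif looks_empty(text):
        print("OCR ran but recognized nothing useful")
    else:
        print(text[:200])
```

The page image has to be pulled out of the PDF first, for instance with a tool such as poppler's pdfimages. If the output is only dashes or whitespace, as in the TIF case reported above, the image resolution is the first thing to check.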

where is OCR menu in opensource version