where is OCR menu in opensource version
Posted: Tue Oct 29, 2013 11:59 am
by vincentk222
Where is the OCR menu in the open source version?
Here are my settings:
system.ocr String /usr/bin/tesseract ${fileIn} ${fileOut}
system.ocr.rotate String 90;180;270;
system.pdf.force.ocr Boolean Active
Re: where is OCR menu in opensource version
Posted: Wed Oct 30, 2013 12:07 pm
by jllort
What do you mean by OCR menu? Are you talking about Zone OCR?
Re: where is OCR menu in opensource version
Posted: Thu Oct 31, 2013 2:40 pm
by vincentk222
In the general feature list, I can see that OCR is marked green.
I also configured:
system.ocr String /usr/bin/tesseract ${fileIn} ${fileOut}
system.ocr.rotate String 90;180;270;
system.pdf.force.ocr Boolean Active
But how does it work? If I have a PDF that contains only a scanned image, can I make it searchable?
What does this OCR function do?
Re: where is OCR menu in opensource version
Posted: Fri Nov 01, 2013 10:51 am
by jllort
Keep in mind that there is a document content indexing queue (administration -> stats -> queue). If a document has not been processed yet, you will not be able to search its content.
I suggest taking a look at administration -> database query.
Use JDBC and run a query against OKM_NODE_DOCUMENT (one column indicates whether the text has been extracted, with value T, and another column holds the extracted text).
A final consideration: depending on the resolution of the images in the PDF, etc., some OCR engines will do better than others. In tests last year with the latest released versions, Tesseract seemed to give better results than Cuneiform.
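The check described above can be sketched as follows. This is only an illustration, not OpenKM code: the SQL text in QUERY is the sort of statement one might paste into administration -> database query, and the column names NBS_UUID, NDC_TEXT_EXTRACTED and NDC_TEXT are assumptions that should be verified against your own schema. So that the sketch runs anywhere, it queries an in-memory SQLite table that merely imitates OKM_NODE_DOCUMENT.

```python
import sqlite3

# ASSUMPTION: column names are a guess at the OpenKM schema; verify them
# in your own database before relying on this query.
QUERY = """
SELECT NBS_UUID, NDC_TEXT_EXTRACTED, NDC_TEXT
FROM OKM_NODE_DOCUMENT
"""


def demo_connection():
    """Build an in-memory stand-in for OKM_NODE_DOCUMENT with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE OKM_NODE_DOCUMENT ("
        "NBS_UUID TEXT PRIMARY KEY, NDC_TEXT_EXTRACTED TEXT, NDC_TEXT TEXT)"
    )
    conn.executemany(
        "INSERT INTO OKM_NODE_DOCUMENT VALUES (?, ?, ?)",
        [
            ("doc-1", "T", "Invoice number 42"),  # extracted, has text
            ("doc-2", "T", ""),                   # extracted, but nothing found
            ("doc-3", "F", None),                 # still waiting in the queue
        ],
    )
    return conn


def extraction_report(conn):
    """Map each document UUID to 'pending', 'extracted, empty' or 'extracted'."""
    report = {}
    for uuid, extracted, text in conn.execute(QUERY):
        if extracted != "T":
            report[uuid] = "pending"
        elif not (text or "").strip():
            report[uuid] = "extracted, empty"
        else:
            report[uuid] = "extracted"
    return report


if __name__ == "__main__":
    for uuid, status in extraction_report(demo_connection()).items():
        print(uuid, status)
```

A document reported as "extracted, empty" went through the queue but produced no text, which points at the OCR step rather than at the indexer.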
Re: where is OCR menu in opensource version
Posted: Mon Nov 11, 2013 1:09 pm
by vincentk222
The extraction was done.
My mistake was believing that OCR adds a text layer to the PDF, but this is not the case.
If the document is a PDF (scanned image), the extracted text is empty, so I think no OCR is done.
If the file is a TIF, OCR is run, but the result contains only hyphens: --------------------------------- ------------------------ ----------------------
Re: where is OCR menu in opensource version
Posted: Wed Nov 13, 2013 10:17 am
by jllort
Open source OCR engines cannot work with low-resolution images. I suggest extracting the image from the PDF and running the OCR application from a terminal to see the results. For example, ABBYY OCR capture gets good results with 100 dpi images. Keep in mind that an open source solution will not always match the performance of a commercial one; otherwise nobody would buy the commercial one.
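A minimal sketch of that manual test, for anyone who wants to script it. It assumes poppler's pdfimages tool is installed and that ${fileIn} and ${fileOut} in system.ocr are simply replaced by file paths; how OpenKM actually expands them is not confirmed in this thread. The helper names are made up for illustration. Tesseract itself appends .txt to the output base name.

```python
import glob
import os
import shlex
import subprocess
import sys
import tempfile
from string import Template

# Same command template as the system.ocr setting quoted in this thread.
OCR_TEMPLATE = "/usr/bin/tesseract ${fileIn} ${fileOut}"


def pdfimages_command(pdf_path, prefix):
    """Command that dumps the images embedded in a PDF as TIFF files."""
    return ["pdfimages", "-tiff", pdf_path, prefix]


def ocr_command(image_path, out_base, template=OCR_TEMPLATE):
    """Fill the template with quoted paths and split it into an argv list."""
    filled = Template(template).substitute(
        fileIn=shlex.quote(image_path), fileOut=shlex.quote(out_base)
    )
    return shlex.split(filled)


def only_dashes(text):
    """True when the OCR output holds nothing but hyphens and whitespace."""
    compact = "".join(text.split())
    return bool(compact) and set(compact) == {"-"}


# Runs only when called as: python this_script.py scanned.pdf
if __name__ == "__main__" and len(sys.argv) == 2:
    workdir = tempfile.mkdtemp()
    prefix = os.path.join(workdir, "page")
    subprocess.run(pdfimages_command(sys.argv[1], prefix), check=True)
    for image in sorted(glob.glob(prefix + "-*.tif")):
        base = os.path.splitext(image)[0]
        subprocess.run(ocr_command(image, base), check=True)
        with open(base + ".txt", encoding="utf-8", errors="replace") as fh:
            text = fh.read()
        verdict = "only hyphens" if only_dashes(text) else f"{len(text)} characters"
        print(image, verdict)
```

If every page comes back as "only hyphens", the image resolution is the first thing to check, as noted above.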