Just want to ask for clarification about the scanner tool

Posted: Sat Oct 13, 2018 8:59 am
by GerryMacJr
Hello, I just installed the latest Community Edition. I love it so far, and the instructions were easy enough to follow. I would like to ask the community for clarification regarding the scanner tool, which I have also just installed without any trouble.

My questions are: what image format and DPI should I use for OCR purposes? If I convert the scan to PDF, will it still go through OCR when it is uploaded to OpenKM? And will the conversion to PDF affect the text inside the document?

Thank you

Re: Just want to ask for clarification about the scanner tool

Posted: Sat Oct 13, 2018 10:05 am
by jllort
For a good OCR result I usually suggest 300-600 dpi. In any case you should run some tests, because the outcome depends on the font used in the document, the character size and spacing, how clean or dirty the image is, and so on; there are a lot of factors.
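
To get a feel for what the resolution choice costs in size, here is a rough sketch of the arithmetic. It assumes an A4 page and uncompressed raster data, neither of which is stated in the thread, and the function names are only illustrative.

```python
# Rough size estimate for a scanned page at a given resolution.
# Assumptions (not from the thread): A4 paper, uncompressed raster data.
MM_PER_INCH = 25.4


def page_pixels(dpi, width_mm=210.0, height_mm=297.0):
    """Return (width_px, height_px) for a page scanned at `dpi`."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


def raw_bytes(dpi, bits_per_pixel=8):
    """Uncompressed size in bytes: 8 = grayscale, 1 = bilevel, 24 = color."""
    width_px, height_px = page_pixels(dpi)
    return width_px * height_px * bits_per_pixel // 8


for dpi in (300, 600):
    w, h = page_pixels(dpi)
    print(f"{dpi} dpi: {w} x {h} px, ~{raw_bytes(dpi) / 1e6:.1f} MB grayscale")
```

Doubling the resolution roughly quadruples the raw pixel count, which is why it is worth testing whether 300 dpi is already enough for your documents before moving up.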

300 dpi is a good starting point (small file size), but you can consider scanning at a higher resolution and, once the OCR engine has finished, applying some compression on the OpenKM side. This is something we do for customers with a small customization that converts PDF to PDF with Group4 compression, which is quite easy to do with the ImageMagick tool.
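
As an illustration of the PDF to PDF Group4 conversion mentioned above, here is a minimal sketch that calls ImageMagick from Python. It is not the actual OpenKM customization. The file names, the function names and the 300 dpi default are assumptions, the `convert` binary is the ImageMagick 6 name (ImageMagick 7 installs use `magick`), and ImageMagick needs Ghostscript installed to read PDF input.

```python
import shutil
import subprocess


def build_group4_command(src_pdf, dst_pdf, density=300, binary="convert"):
    """Build an ImageMagick command that re-encodes a PDF as bilevel Group4."""
    return [
        binary,
        "-density", str(density),  # rasterization resolution, set before the input
        src_pdf,
        "-monochrome",             # Group4 only handles black-and-white images
        "-compress", "Group4",     # CCITT Group 4 fax compression
        dst_pdf,
    ]


def compress_pdf(src_pdf, dst_pdf, density=300, binary="convert"):
    """Run the conversion; raises if ImageMagick is missing or the command fails."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found on PATH")
    subprocess.run(build_group4_command(src_pdf, dst_pdf, density, binary), check=True)


if __name__ == "__main__":
    # Hypothetical file names, shown only to print the resulting command line.
    print(" ".join(build_group4_command("scan.pdf", "scan-g4.pdf")))
```

Keep in mind that this rasterizes every page to a black-and-white image, so grayscale or color detail is lost and any text layer embedded in the PDF does not survive. That fits with running it only after the OCR step has finished.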

Re: Just want to ask for clarification about the scanner tool

Posted: Sat Oct 13, 2018 10:32 am
by GerryMacJr
Yes, I read about testing the OCR on a scanned document to find the settings that give the best result. My next question is: should I upload it to OpenKM as an image file or as a PDF for Lucene indexing?

Re: Just want to ask for clarification about the scanner tool

Posted: Sun Oct 14, 2018 2:21 pm
by jllort
You should get similar results either way, since the PDF also contains the image.

Re: Just want to ask for clarification about the scanner tool

Posted: Sun Oct 14, 2018 10:36 pm
by GerryMacJr
Thank you, cheers.