Just want to ask for clarification about the scanner tool

GerryMacJr
Fresh Boarder
Posts: 3
Joined: Sat Oct 13, 2018 8:37 am

Post by GerryMacJr » Sat Oct 13, 2018 8:59 am

Hello, I just installed the latest Community Edition. I love it so far, and the instructions were easy enough to follow. I would like to ask the community for clarification regarding the scanner tool. I just installed it and the setup went smoothly. My questions are: what image format and DPI should I use for OCR? If I convert the scan to PDF, will it still be processed by OCR when uploaded to OpenKM? And will the conversion to PDF affect the text inside the document?

Thank you

jllort
Moderator
Posts: 10314
Joined: Fri Dec 21, 2007 11:23 am
Location: Sineu - ( Illes Balears ) - Spain

Re: Just want to ask for clarification about the scanner tool

Post by jllort » Sat Oct 13, 2018 10:05 am

For good OCR results, I usually suggest 300 to 600 dpi. Anyway, you should run some tests, because the results depend on the font used in the document, the size of the characters and the spacing between them, whether the image is clean or dirty... well, there are a lot of factors.
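To put the 300 to 600 dpi range into numbers, here is a small Python sketch that computes the pixel dimensions and raw (uncompressed) size of a scanned page. The A4 page size and the helper names `page_pixels` and `uncompressed_bytes` are assumptions for illustration only; the arithmetic says nothing about OCR quality, which still has to be tested on your own documents.

```python
# Hypothetical helpers: estimate the raster size of a scanned page at a given dpi.
MM_PER_INCH = 25.4


def page_pixels(width_mm: float, height_mm: float, dpi: int) -> tuple[int, int]:
    """Return (width_px, height_px) for a page of the given size scanned at `dpi`."""
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


def uncompressed_bytes(pixels: tuple[int, int], bits_per_pixel: int) -> int:
    """Raw size before compression (1 bit = black/white, 8 bits = grayscale)."""
    width, height = pixels
    return width * height * bits_per_pixel // 8


if __name__ == "__main__":
    a4_mm = (210, 297)  # assumption: A4 paper
    for dpi in (300, 600):
        px = page_pixels(*a4_mm, dpi)
        gray_mb = uncompressed_bytes(px, 8) / 1_000_000
        bw_mb = uncompressed_bytes(px, 1) / 1_000_000
        print(f"{dpi} dpi: {px[0]}x{px[1]} px, "
              f"~{gray_mb:.1f} MB grayscale, ~{bw_mb:.1f} MB black/white")
```

Doubling the resolution from 300 to 600 dpi roughly quadruples the pixel count (2480x3508 vs 4961x7016 for an A4 page), which is why scanning high for OCR and compressing afterwards can be worth considering.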

300 dpi is a good starting point (small file size), but you could consider scanning at a higher resolution and, once the OCR engine has finished, applying some compression on the OpenKM side. This is something we do for customers with a small customization that converts PDF to PDF with Group 4 compression, which is quite easy to do with the ImageMagick tool.
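The thread does not show the actual customization, so the following is only a hypothetical Python sketch of the ImageMagick step: it rasterizes the PDF at a chosen density, reduces it to black and white, and writes it back as a Group 4 compressed PDF. The function names are made up, and it assumes ImageMagick (`magick` in version 7, `convert` in version 6) plus Ghostscript are installed. Because ImageMagick rasterizes the PDF, any embedded text layer is lost, which fits doing this only after OCR has finished.

```python
import shutil
import subprocess


def group4_command(src: str, dst: str, dpi: int = 300, binary: str = "magick") -> list[str]:
    """Build an ImageMagick command that re-encodes a PDF as a Group 4 PDF.

    -density is placed before the input so the PDF is rasterized at `dpi`;
    -monochrome reduces it to 1-bit black and white, since Group 4 only
    applies to bilevel images.
    """
    return [binary, "-density", str(dpi), src,
            "-monochrome", "-compress", "Group4", dst]


def compress_pdf(src: str, dst: str, dpi: int = 300) -> None:
    """Hypothetical helper: run the command above with whichever binary is on PATH."""
    # ImageMagick 7 ships `magick`; ImageMagick 6 uses `convert`.
    binary = shutil.which("magick") or shutil.which("convert")
    if binary is None:
        raise RuntimeError("ImageMagick not found on PATH")
    subprocess.run(group4_command(src, dst, dpi, binary), check=True)
```

A call would look like `compress_pdf("scan.pdf", "scan-g4.pdf", dpi=600)`. On some Linux distributions ImageMagick's `policy.xml` blocks PDF processing by default, so that may need adjusting before this works.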

Re: Just want to ask for clarification about the scanner tool

Post by GerryMacJr » Sat Oct 13, 2018 10:32 am

Yes, I read about testing OCR on a scanned document to find the settings that give better results. My next question is: should I upload it to OpenKM as an image file or as a PDF for Lucene indexing?

Re: Just want to ask for clarification about the scanner tool

Post by jllort » Sun Oct 14, 2018 2:21 pm

You should get similar results either way, because the PDF also contains the image.

Re: Just want to ask for clarification about the scanner tool

Post by GerryMacJr » Sun Oct 14, 2018 10:36 pm

Thank you, cheers!
