• Just want to ask for clarification about the scanner tool

  • We tried to make OpenKM as intuitive as possible, but advice is always welcome.
 #46919  by GerryMacJr
 
Hello, I just installed the latest Community Edition. I love it so far, and the instructions were easy enough to follow. I would like to ask the community for clarification about the scanner tool. I just installed it and that went smoothly. My questions are: what image format and DPI should I use for OCR purposes? If I convert the scan to PDF, will it still go through OCR when uploaded to OpenKM? And will the conversion to PDF affect the text inside the document?

Thank you
 #46923  by jllort
 
For a good OCR result I usually suggest 300 to 600 dpi. In any case you should run some tests, because the result depends on the font type used in the document, the character size and the spacing between characters, whether the image is clean or dirty, and so on. There are a lot of factors.
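To get a feel for what the 300 to 600 dpi range means for image size, here is a minimal Python sketch. The US Letter page size (8.5 x 11 inches), the 8-bit grayscale default and the function names are assumptions chosen for illustration; they are not anything OpenKM or the scanner tool prescribes.

```python
def page_pixels(width_in, height_in, dpi):
    """Return (width_px, height_px) for a page scanned at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_bytes(width_in, height_in, dpi, bits_per_pixel=8):
    """Uncompressed image size in bytes (8 bits per pixel = grayscale)."""
    width_px, height_px = page_pixels(width_in, height_in, dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    # Assumed page size: US Letter, 8.5 x 11 inches.
    for dpi in (300, 600):
        width_px, height_px = page_pixels(8.5, 11, dpi)
        size = raw_bytes(8.5, 11, dpi)
        print(f"{dpi} dpi: {width_px} x {height_px} px, {size} bytes uncompressed")
```

Doubling the resolution quadruples the number of pixels, which is why it is worth testing whether 300 dpi is already enough for your documents before moving to 600 dpi.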

300 dpi is a good starting point (small file size), but you can consider working with a higher resolution and, once the OCR engine has finished, applying some compression on the OpenKM side. This is something we do for customers with a small customization that converts PDF to PDF with Group4 compression, which is quite easy to do with the ImageMagick tool.
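As a rough idea of what such a Group4 conversion could look like outside OpenKM, here is a minimal Python sketch that calls ImageMagick. It is not the customization mentioned above: the function names and file names are hypothetical, and it assumes ImageMagick (with Ghostscript for reading PDFs) is installed and on the PATH. The options used (-density, -monochrome, -compress Group4) are standard ImageMagick options.

```python
import shutil
import subprocess


def build_group4_command(src_pdf, dst_pdf, dpi=300, tool="convert"):
    """Build an ImageMagick command that rewrites a PDF as 1-bit Group4 pages."""
    return [
        tool,
        "-density", str(dpi),    # resolution used when rasterizing the input PDF
        src_pdf,
        "-monochrome",           # Group4 is a bilevel (black and white) codec
        "-compress", "Group4",
        dst_pdf,
    ]


def compress_pdf(src_pdf, dst_pdf, dpi=300):
    """Run the conversion, preferring the ImageMagick 7 'magick' binary."""
    tool = shutil.which("magick") or shutil.which("convert")
    if tool is None:
        raise RuntimeError("ImageMagick not found on PATH")
    subprocess.run(build_group4_command(src_pdf, dst_pdf, dpi, tool), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_group4_command("scan.pdf", "scan-g4.pdf")))
```

Keep in mind that ImageMagick rasterizes the pages, so any text layer in the source PDF will not survive. That is why the compression should only happen after the OCR engine has already extracted the text.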
 #46925  by GerryMacJr
 
jllort wrote: Sat Oct 13, 2018 10:05 am For a good OCR result I usually suggest 300 to 600 dpi. In any case you should run some tests, because the result depends on the font type used in the document, the character size and the spacing between characters, whether the image is clean or dirty, and so on. There are a lot of factors.

300 dpi is a good starting point (small file size), but you can consider working with a higher resolution and, once the OCR engine has finished, applying some compression on the OpenKM side. This is something we do for customers with a small customization that converts PDF to PDF with Group4 compression, which is quite easy to do with the ImageMagick tool.
Yes, I read about testing OCR on a scanned document to check which settings give better results. My next question is: should I upload it to OpenKM as an image file or as a PDF for Lucene indexing?
