Can't get OCR to work

Problems with installing OpenKM? No problemo, the solution is closer than you think.
Forum rules: Please check the documentation wiki or use the forum's search feature before asking something. And remember, we don't have a crystal ball or mind readers, so if you post about an issue, tell us which OpenKM version you are using, as well as your browser and operating system versions. For more info, read How to Report Bugs Effectively.
 #42215  by redhot
 
Hi,

I want OpenKM to do a simple thing: watch a directory, process any PDF or image that lands in it, and then remove the processed files from that directory while keeping them in OpenKM's database.

I also want every document added to OpenKM to be processed with OCR, so that I can search the text inside each document and find it easily.

This is for home use.

I installed OpenKM Community Edition on an Ubuntu server. I can upload documents, but I can't OCR them. I set `system.ocr` in the config file and installed Tesseract, but the OCR option doesn't appear anywhere in the application.

What am I missing?
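The watch-folder part of this request boils down to a small loop: find PDFs and images in a directory, hand each one to OpenKM, and delete the local copy only once the upload has succeeded. Below is a minimal Python sketch of that loop. It assumes a plain polling design and a hypothetical `upload` callback that returns `True` when OpenKM has the file; the actual call into OpenKM (for example through its web services) is not shown here and would have to be supplied. The accepted extensions are also an assumption.

```python
import time
from pathlib import Path
from typing import Callable, List

# Extensions treated as "PDF or image" (an assumption; adjust to taste).
ACCEPTED = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def pending_files(folder: Path) -> List[Path]:
    """Return the PDFs and images sitting directly in `folder`, sorted by name."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in ACCEPTED
    )


def process_once(folder: Path, upload: Callable[[Path], bool]) -> List[Path]:
    """Hand each pending file to `upload`; delete it only if the upload succeeded."""
    removed = []
    for path in pending_files(folder):
        # `upload` is a hypothetical callback: True once OpenKM has stored the file.
        if upload(path):
            path.unlink()  # the repository has it, so drop the local copy
            removed.append(path)
    return removed


def watch(folder: Path, upload: Callable[[Path], bool], interval: float = 10.0) -> None:
    """Poll `folder` forever, processing it every `interval` seconds."""
    while True:
        process_once(folder, upload)
        time.sleep(interval)
```

Polling keeps the sketch portable; a real setup would probably also skip files that a scanner is still writing (for example by waiting until a file's size stops changing between two polls).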
 #42219  by jllort
 
Which OpenKM version did you install?
What is the value of `system.ocr`?

Are there documents in Administration > Stats > pending text extractor queue? (Keep in mind that documents are not processed the moment they enter the queue.)
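One way to rule out the Tesseract side is to check that the binary is on the PATH and can OCR an image by itself, outside OpenKM. The Python sketch below assumes Tesseract's standard command-line form `tesseract <image> <outputbase> -l <lang>` and that the `eng` language data is installed. It does not read or validate OpenKM's own `system.ocr` setting, whose expected format may depend on the OpenKM version. If this works from the shell but OpenKM still extracts nothing, the `system.ocr` value or the text extractor queue is the more likely suspect.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Optional


def find_tesseract(name: str = "tesseract") -> Optional[str]:
    """Return the full path of the Tesseract binary, or None if it is not on PATH."""
    return shutil.which(name)


def build_ocr_command(binary: str, image: Path, output_base: Path, lang: str = "eng") -> List[str]:
    """Tesseract CLI form: tesseract <image> <outputbase> -l <lang>.

    Tesseract writes the recognised text to <outputbase>.txt.
    """
    return [binary, str(image), str(output_base), "-l", lang]


def ocr_image(image: Path, lang: str = "eng") -> str:
    """Run Tesseract on one image and return the recognised text."""
    binary = find_tesseract()
    if binary is None:
        raise FileNotFoundError("tesseract not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        output_base = Path(tmp) / "result"
        subprocess.run(build_ocr_command(binary, image, output_base, lang),
                       check=True, capture_output=True)
        return output_base.with_suffix(".txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # Usage: python check_ocr.py scan.png
    print(find_tesseract() or "tesseract is NOT on PATH")
    if len(sys.argv) > 1:
        print(ocr_image(Path(sys.argv[1])))
```

The path that `find_tesseract()` prints is the binary OpenKM would also need to be able to run.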
 #42243  by gwaitsi
 
Redhot,

I set up the same scenario some years ago. You can go through my exchanges with jllort at the time in case anything there helps you.

https://forum.openkm.com/viewtopic.php? ... 824#p39824
https://forum.openkm.com/viewtopic.php? ... 816#p39816
https://forum.openkm.com/viewtopic.php? ... 738#p39738

I have since moved the server to an OMV (OpenMediaVault) NAS box and access it from a Linux Mint client.

The only thing of note when I moved:
- under Ubuntu I had LibreOffice 3.3.x
- under OMV I have LibreOffice 4.2.x headless

When I moved to OMV I couldn't get it to work with LibreOffice 5.x headless, and I read somewhere that it only worked with 3.x, but for me 4.2.x is fine.

Best of luck.
 #46132  by mold21
 
Hi,

Here are some general tips to keep in mind when using OCR:
1. Running OCR on scanned images makes them searchable, and it also lets you extract text from image-only PDFs.
2. Before you scan a large batch of documents to OCR later, scan one page at different settings and run OCR on each to compare the results. Use the settings that give you the fewest OCR suspects.
3. To get the best results from OCR, use ClearScan. It generates smaller files and looks better at a given DPI.
4. When scanning documents, customize the options to improve the quality of the scan and hence the quality of the OCR.
5. Make the scan as high-quality as possible. Expect good OCR results if the image is 300 dpi or better; 600 dpi is good enough for most common purposes.
6. Acrobat cannot OCR a page that is more than 45 inches in any one direction.
7. You can add files other than PDF documents by selecting ‘In Multiple Files’.
8. You can run OCR on an entire folder by selecting ‘In Multiple Files’ and then ‘Add Folders’.
9. You can disable OCR when scanning files. You may want to do so when scanning many files on a machine with limited computing power. In the Configure Presets dialog for scanning, deselect ‘Make Searchable (Run OCR)’.
10. If you have a choice, do not use text over bright or dark graphics in the source to be scanned. Such text is not recognized properly during OCR, as the contrast between the text and the background is not high enough.
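To put tips 5 and 6 into numbers: the pixel count grows with the square of the DPI, so going from 300 to 600 dpi quadruples the image data that OCR has to chew through. The Python sketch below assumes a US Letter page (8.5 x 11 in) and uncompressed 8-bit grayscale (one byte per pixel). Both are illustrative choices, and real scan files are compressed and much smaller. The 45-inch limit is the one quoted in tip 6.

```python
from typing import Tuple

ACROBAT_MAX_OCR_INCHES = 45  # per-side limit quoted in tip 6


def scan_pixels(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions of a page scanned at `dpi` dots per inch."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_gray_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Uncompressed size of an 8-bit grayscale scan (1 byte per pixel)."""
    w, h = scan_pixels(width_in, height_in, dpi)
    return w * h


def acrobat_can_ocr(width_in: float, height_in: float) -> bool:
    """Tip 6: OCR fails if any side is longer than 45 inches."""
    return max(width_in, height_in) <= ACROBAT_MAX_OCR_INCHES


for dpi in (300, 600):
    w, h = scan_pixels(8.5, 11, dpi)
    print(f"{dpi} dpi: {w} x {h} px, {raw_gray_bytes(8.5, 11, dpi) / 1e6:.1f} MB raw")
```

For a Letter page this gives 2550 x 3300 px (about 8.4 MB raw) at 300 dpi and 5100 x 6600 px (about 33.7 MB raw) at 600 dpi. This is why 600 dpi is usually enough, and why disabling OCR (tip 9) can help on slow machines.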
 #46567  by aleifuu
 
Hi oieceve27,

You can download the Community version from SourceForge, or you can use the installer available on OpenKM's official website.

Once everything is set up, point your browser to your OpenKM instance, and happy uploading!
 #46584  by jllort
 
We have an installer that will do all the work for you: https://www.openkm.com/en/download.html (in that section of the website you will find the installer and a video guide for it).

Please try not to mix different topics in the same thread, because future readers might lose track.

About the ClearScan tool mentioned by mold21: it seems to be an Adobe tool that comes with Adobe Acrobat DC Pro and Adobe Acrobat XI Pro. I do not know whether it can be run from the command line (if so, it could be integrated with OpenKM in a Windows setup).
http://blogs.adobe.com/acrolaw/2009/05/ ... n_is_smal/
https://forums.adobe.com/thread/1810210
