Can't get OCR to work

redhot
Fresh Boarder
Posts: 1
Joined: Sun Aug 28, 2016 9:03 pm

Post by redhot » Sun Aug 28, 2016 9:06 pm

Hi,

I want OpenKM to do a simple thing: watch a directory and process any PDF or image in that directory, and then remove the processed images from there but keep them in OpenKM's database.
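A minimal sketch of the watch-folder behaviour described above, written as a standalone Python polling script. It is not an OpenKM feature or API: the `upload` callback is a hypothetical hook standing in for whatever actually sends a file to OpenKM, and the list of extensions is an assumption.

```python
import time
from pathlib import Path
from typing import Callable, List

# Extensions treated as "PDF or image"; this set is an assumption.
WATCHED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def pending_files(inbox: Path) -> List[Path]:
    """Return the PDF/image files currently in the inbox, sorted by name."""
    return sorted(
        p for p in inbox.iterdir()
        if p.is_file() and p.suffix.lower() in WATCHED_SUFFIXES
    )


def process_once(inbox: Path, upload: Callable[[Path], bool]) -> List[Path]:
    """Pass each pending file to `upload`; delete only the ones it accepted."""
    done = []
    for path in pending_files(inbox):
        if upload(path):   # hypothetical hook: send the file to OpenKM
            path.unlink()  # remove it from the watched directory
            done.append(path)
    return done


def watch(inbox: Path, upload: Callable[[Path], bool], interval_s: float = 30.0) -> None:
    """Poll the inbox forever, sleeping `interval_s` seconds between passes."""
    while True:
        process_once(inbox, upload)
        time.sleep(interval_s)
```

A file is deleted only after `upload` returns `True`, so a failed import leaves the original in place for the next pass. A scanner that is still writing a file when the poll runs could hand over a partial document; writing to a temporary name and renaming afterwards is one way around that.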

And I want any document added to OpenKM to be processed with OCR so I can search for the contents within the document and find them easily.

This is for home use.

I installed OpenKM Community on an Ubuntu server. I can upload documents, but I can't OCR them. I set `system.ocr` in the config file and installed Tesseract, but I don't see an OCR option anywhere in the application.

What am I missing?
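One thing worth ruling out first is whether the Tesseract binary is visible from the server's environment. The sketch below is a standalone Python check, not part of OpenKM. It assumes that `system.ocr` is meant to hold the path to the Tesseract executable, which should be confirmed against the OpenKM documentation for the installed version. The function names are made up for illustration.

```python
import shutil
import subprocess
from typing import Optional


def find_binary(name: str = "tesseract") -> Optional[str]:
    """Return the absolute path of `name` if it is on PATH, else None."""
    return shutil.which(name)


def binary_version(path: str) -> str:
    """Run `<path> --version` and return the first line it prints."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some tools print their version banner to stderr rather than stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    path = find_binary()
    if path is None:
        print("tesseract not found on PATH")
    else:
        # Candidate value to compare against the configured system.ocr setting.
        print(path, "->", binary_version(path))
```

Run it as the same user that runs the OpenKM server, since `PATH` can differ between accounts.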

jllort
Moderator
Posts: 10347
Joined: Fri Dec 21, 2007 11:23 am
Location: Sineu - ( Illes Balears ) - Spain

Re: Can't get OCR to work

Post by jllort » Tue Aug 30, 2016 9:19 am

Which OpenKM version did you install?
What is the `system.ocr` value?

Do you have documents in Administration > Stats > pending text extractor queue? (Keep in mind that documents are not processed the moment they enter the queue.)

gwaitsi
Senior Boarder
Posts: 54
Joined: Wed Sep 03, 2014 1:00 pm

Re: Can't get OCR to work

Post by gwaitsi » Sun Sep 04, 2016 6:06 am

Redhot,

I set up the same scenario some years ago. You can go through my exchanges with jllort from that time in case anything there helps you.

https://forum.openkm.com/viewtopic.php? ... 824#p39824
https://forum.openkm.com/viewtopic.php? ... 816#p39816
https://forum.openkm.com/viewtopic.php? ... 738#p39738

I have since moved the server to an OMV NAS box and access it from a Linux Mint client.

The only thing of note when I moved:
- under Ubuntu I had LibreOffice 3.3x
- under OMV I have LibreOffice 4.2x headless

I couldn't get it to work with LibreOffice 5.x headless when I moved to OMV, and I read somewhere that it only worked with 3.x, but for me 4.2x is fine.

Best of luck.

mold21
Fresh Boarder
Posts: 1
Joined: Sat Jun 16, 2018 12:04 pm

Re: Can't get OCR to work

Post by mold21 » Sat Jun 16, 2018 12:21 pm

Hi,

Here are some general tips to keep in mind when using OCR:
1. First tip has to be about the wonder called OCR! You can make your scanned images searchable by running OCR on them. Also, you can extract text from ‘image PDFs’ by doing so.
2. Before you scan a whole lot of documents to OCR later, scan one page at different settings and run OCR to see how the results compare. Use the settings that give you the fewest OCR suspects.
3. To get the best results from OCR, use ClearScan. It generates smaller file sizes and looks better at a given DPI. For an in-depth description.
4. When scanning documents customize the options to improve the quality of the scan and hence the quality of OCR.
5. Create as high-quality a scan as possible. Expect good OCR results if the image is 300 dpi or better. 600 dpi is good enough for most common purposes.
6. Acrobat cannot OCR a page that is more than 45 inches in any one direction.
7. You can add files other than PDF documents by selecting ‘In Multiple Files’.
8. You can run OCR on an entire folder by selecting ‘In Multiple Files’ and then ‘Add Folders’.
9. You can disable OCR when scanning files. You may want to do so when scanning many files with limited computing power. In the Configure Presets dialog for scanning, deselect ‘Make Searchable (Run OCR)’.
10. If you have a choice, do not use text over bright or dark graphics in the source to be scanned. Such text is not recognized properly during OCR, as the contrast between the text and the background is not high enough.
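
The 300 dpi guideline in tip 5 can be checked with simple arithmetic when the pixel size of a scan and the physical size of the page are known. The following is a minimal Python sketch under that assumption. The function names are hypothetical, and the threshold is the figure quoted in the tip, not a limit enforced by any OCR engine.

```python
def effective_dpi(pixels: int, inches: float) -> float:
    """Dots per inch along one axis: pixel count divided by physical length."""
    if inches <= 0:
        raise ValueError("inches must be positive")
    return pixels / inches


def meets_ocr_minimum(
    width_px: int,
    height_px: int,
    width_in: float,
    height_in: float,
    min_dpi: float = 300.0,
) -> bool:
    """True if the lower of the two axis resolutions reaches `min_dpi`."""
    lowest = min(
        effective_dpi(width_px, width_in),
        effective_dpi(height_px, height_in),
    )
    return lowest >= min_dpi
```

For example, a US Letter page (8.5 x 11 inches) scanned to 2550 x 3300 pixels works out to 300 dpi on both axes, while the same page at 1275 x 1650 pixels is only 150 dpi.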

oicive27
Fresh Boarder
Posts: 1
Joined: Tue Aug 14, 2018 8:57 pm

Re: Can't get OCR to work

Post by oicive27 » Tue Aug 14, 2018 9:06 pm

How do I install OpenKM Community on an Ubuntu server?
How can I upload documents?

aleifuu
Fresh Boarder
Posts: 11
Joined: Wed Aug 08, 2018 3:39 am

Re: Can't get OCR to work

Post by aleifuu » Wed Aug 15, 2018 6:04 am

Hi oicive27,

You can download the Community version from SourceForge, or use the installer available on OpenKM's official website.

Once you have everything sorted out, point your browser to OpenKM and happy uploading!

jllort

Re: Can't get OCR to work

Post by jllort » Thu Aug 16, 2018 2:06 pm

We have an installer that will do all the work for you: https://www.openkm.com/en/download.html (in that section of the website you will find the installer and a video guide for it).

Please try not to merge distinct topics in the same post, because future readers might lose the point.

About the ClearScan tools mentioned by mold21: they seem to be Adobe tools that come with Adobe DC Pro and Adobe XI Pro. I do not know whether they can be executed from the command line (if so, they could be integrated with OpenKM in a Windows scenario).
http://blogs.adobe.com/acrolaw/2009/05/ ... n_is_smal/
https://forums.adobe.com/thread/1810210
