• OCR for existing PDF files

  • We tried to make OpenKM as intuitive as possible, but advice is always welcome.
 #53201  by polandia12

I have searched a lot and cannot find any information about this. Is it possible to install a plugin that allows OCR of existing PDF files? At the moment I only see the possibility to scan image files, but we have a fairly large body of documentation, and it would be nice to have an option to run OCR on all the old documents and then search their content.
 #53212  by jllort
I need some clarification on your scenario. New PDF documents are processed by the OCR engine and you are able to find their contents, but you are not able to find old ones that were not processed in the past. Basically, you want to run OCR on the old ones. Is that your scenario?
 #53216  by polandia12
The scenario is very simple: I would like to upload a document that has already been scanned, let's say by a scanner, and converted to a PDF file. I want to upload this file to OpenKM and run OCR on it there. Maybe the system already has a feature like this? If not, do you know of any plugins for it?

My second question is about previews of Word, Excel, and similar files: do you know of any plugins I can use? At the moment I can only see the PDF preview. I have OpenKM 6.3 CE.
 #53227  by jllort
First, check the current text extractors with a document that is already uploaded. To do so:
* Go to administration > tools > check extraction
* Paste the document UUID (get one from the UI beforehand)
* Click on execute

Share a screenshot of the result.
 #54724  by MarcoOliveira
application/pdf | com.openkm.extractor.AbbyTextExtractor
 #54741  by jllort
You should disable this extractor, because you do not have the Abby OCR engine installed.
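
If the check extraction result covers many MIME types, a small script can help spot which of them are mapped to an extractor whose engine is not installed. The following is only a minimal sketch: it assumes the result can be copied as plain text in the `mime | class` row shape quoted above, and the names `parse_extractor_rows`, `mime_types_using`, and `MISSING_ENGINE` are hypothetical helpers for illustration, not part of OpenKM.

```python
# Sample text shaped like the "check extraction" result quoted in this thread.
REPORT = """\
application/pdf | com.openkm.extractor.AbbyTextExtractor
----------------------------- ----------------------------- |
"""


def parse_extractor_rows(report):
    """Return {mime_type: extractor_class} for every 'mime | class' row."""
    mapping = {}
    for line in report.splitlines():
        parts = [part.strip() for part in line.split("|")]
        # Keep only rows with two non-empty columns; this skips separator rows.
        if len(parts) != 2 or not parts[0] or not parts[1]:
            continue
        mapping[parts[0]] = parts[1]
    return mapping


def mime_types_using(mapping, extractor_classes):
    """Return the sorted MIME types handled by any of the given classes."""
    return sorted(mime for mime, cls in mapping.items() if cls in extractor_classes)


# Assumption: the Abby extractor is the one without an installed engine here.
MISSING_ENGINE = {"com.openkm.extractor.AbbyTextExtractor"}

mapping = parse_extractor_rows(REPORT)
print(mime_types_using(mapping, MISSING_ENGINE))  # prints ['application/pdf']
```

The script only reports the affected MIME types. Disabling the extractor itself still has to be done in the OpenKM administration.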
