OCR for existing PDF files

Posted: Wed Jan 19, 2022 9:11 am
by polandia12
Hello,

I have searched a lot and cannot find any information about this. Is it possible to install a plugin that allows running OCR on existing PDF files? At the moment I only see the option to OCR image files, but we have a fairly large body of documentation, and it would be good to be able to run OCR on all the old documents and then search their content.

Re: OCR for existing PDF files

Posted: Sat Jan 22, 2022 10:29 am
by jllort
I need some clarification on your scenario. New PDF documents are processed by the OCR engine and you can find their contents, but you cannot find older ones that were not processed in the past. Basically, you want to run OCR on the old ones. Is that your scenario?

Re: OCR for existing PDF files

Posted: Mon Jan 24, 2022 8:42 am
by polandia12
The scenario is very simple: I would like to upload a document that has already been scanned, let's say by a scanner, and converted to a PDF file. I want to upload this file to OpenKM and run OCR on it there. Maybe the system already has a feature like this? If not, do you know of any plugins for it?

My second question is about previews for Word, Excel, etc. Do you know of any plugins I can use? At the moment I can only see the PDF preview. I have OpenKM 6.3 CE.

Re: OCR for existing PDF files

Posted: Sat Jan 29, 2022 10:33 am
by jllort
First, check the current text extractors against a document that is already uploaded. To do so:
* Go to administration > tools > check extraction
* Paste the document UUID (get one from the UI beforehand)
* Click on execute

Then share a screenshot of the result.
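
As a rough aid for reading that result, here is a minimal Python sketch. It is not part of OpenKM and calls no OpenKM API. Both helper names and the 20-character threshold are assumptions made for illustration. The first helper checks that the value you are about to paste parses as a UUID. The second applies a simple heuristic to the extracted text copied from the tool: if almost no letters or digits came back, the PDF is probably image-only and would need OCR.

```python
import uuid


def is_valid_uuid(value: str) -> bool:
    """Return True if the value parses as a UUID (hypothetical helper)."""
    try:
        uuid.UUID(value.strip())
        return True
    except ValueError:
        return False


def looks_like_scanned(extracted_text: str, min_chars: int = 20) -> bool:
    """Heuristic (assumption): very little extracted text suggests the PDF
    has no text layer, i.e. it is a scanned image that needs OCR."""
    # Count only letters and digits so whitespace and stray symbols are ignored.
    meaningful = "".join(ch for ch in extracted_text if ch.isalnum())
    return len(meaningful) < min_chars
```

If `looks_like_scanned` returns True for the text shown by check extraction, the extractor most likely found no text layer in that document. The threshold is arbitrary and may need adjusting for very short documents.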