OCR writes text to PDF file

Posted: Mon May 07, 2012 3:33 am
by Alexires
I've noticed that the text extracted by OCR is kept only inside OpenKM. Once the file is downloaded, the text is no longer searchable.

Is it possible to use something like http://blog.konradvoelkel.de/2010/01/li ... em-solved/ to write the extracted OCR text into the PDF during upload, so that the file is still searchable after it has been downloaded?

Re: OCR writes text to PDF file

Posted: Wed May 09, 2012 10:14 am
by pavila
It seems like an interesting feature. I will add it to the OpenKM wishlist :)

Re: OCR writes text to PDF file

Posted: Fri May 11, 2012 7:42 am
by Alexires
You have no idea how hard I am wishing. ;)

Otherwise, there is a Linux program called pdfocr which looks like it takes PDFs, OCRs them, and then writes the text back into the PDF. I don't know what the status of the project is, though: http://ubuntuforums.org/showthread.php?t=1456756
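
For anyone who wants to try that kind of workflow by hand, here is a rough sketch of the same idea (rasterize each page, OCR it, write the text back into a PDF). It is not how pdfocr or OpenKM actually do it. It assumes the poppler tools pdftoppm and pdfunite are installed, and a Tesseract build recent enough to support the "pdf" output config. The function names and file paths are made up for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def rasterize_cmd(src_pdf, prefix, dpi=300):
    # pdftoppm writes one PNG per page, named <prefix>-<page number>.png
    return ["pdftoppm", "-r", str(dpi), "-png", str(src_pdf), str(prefix)]


def ocr_cmd(image, out_base, lang="eng"):
    # The trailing "pdf" config asks tesseract to write <out_base>.pdf:
    # the page image with an invisible, searchable text layer.
    # Assumption: the installed Tesseract supports this config.
    return ["tesseract", str(image), str(out_base), "-l", lang, "pdf"]


def merge_cmd(page_pdfs, out_pdf):
    # pdfunite concatenates the per-page PDFs in the order given
    return ["pdfunite", *[str(p) for p in page_pdfs], str(out_pdf)]


def make_searchable(src_pdf, out_pdf, lang="eng"):
    """Hypothetical helper: produce a searchable copy of a scanned PDF."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        subprocess.run(rasterize_cmd(src_pdf, tmp / "page"), check=True)
        # pdftoppm zero-pads page numbers, so a plain sort keeps page order
        images = sorted(tmp.glob("page-*.png"))
        if not images:
            raise RuntimeError("pdftoppm produced no page images")
        page_pdfs = []
        for image in images:
            base = image.with_suffix("")
            subprocess.run(ocr_cmd(image, base, lang), check=True)
            page_pdfs.append(base.with_suffix(".pdf"))
        if len(page_pdfs) == 1:
            shutil.copy(page_pdfs[0], out_pdf)
        else:
            subprocess.run(merge_cmd(page_pdfs, out_pdf), check=True)


if __name__ == "__main__":
    # Hypothetical file names
    make_searchable("scan.pdf", "scan-searchable.pdf")
```

The result is only as good as the OCR, and the pages are re-rendered as images, so file size and quality depend on the chosen DPI.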

Re: OCR writes text to PDF file

Posted: Tue Jun 05, 2012 8:06 am
by pavila
I have tried this program, but it does not work very well :(

Re: OCR writes text to PDF file

Posted: Fri Jun 08, 2012 12:13 am
by Alexires
Yes, I found that too :(. I eventually got sick of trying to OCR a file on Linux and used the Adobe OCR program instead. Still, it looks like it worked in the past, so perhaps it can be made to work well in the future; I'll have to look into it. I'll let you know what I find...