
Execute Preprocess of PDF

Posted: Sat Sep 29, 2012 11:40 pm
by Netvoid
If users are adding PDF documents to OpenKM, is there a good, simple way to run an application against each document first?

I already have a nice, effective program that converts an image-based PDF to a searchable PDF.

http://www.scantopdf.com/en/product/fil ... _line.aspx

I noticed the great feature already in OpenKM that detects when a PDF has no text layer and, in that case, converts the PDF and runs OCR against it. Could it run this program instead?

It is a bit better (although not free), because the resulting PDF is not only indexed in OpenKM searches: when you use the OpenKM embedded file preview, you can also search for text within the document and the matches are highlighted.

I'm guessing the best way would be to build another extractor class like "com.openkm.extractor.PDFToSearchablePDF" and then just change the class configured in the system to use this one. But I was curious whether it is already possible to simply attach a conversion process to the upload of files of a given type.

In the meantime I suppose I'll start investigating building the new extractor class...
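
As a starting point, the part of such a class that calls the external converter could look roughly like the sketch below. It is only a guess at the shape: PdfPreprocessor, buildCommand and run are made-up names, not OpenKM classes or methods, and the "executable input output" argument order is an assumption, since the real converter's command-line options would have to be taken from its own documentation. How this would be hooked into the OpenKM extractor is not shown.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, NOT part of OpenKM: wraps a call to an external
// command-line tool that turns an image-based PDF into a searchable one.
class PdfPreprocessor {

    // Builds the argument list for the converter.
    // ASSUMPTION: the tool is invoked as "<executable> <input> <output>";
    // the real tool may need different flags.
    static List<String> buildCommand(String executable, Path input, Path output) {
        return Arrays.asList(executable, input.toString(), output.toString());
    }

    // Runs the command, waits up to timeoutSeconds, and returns the exit code.
    static int run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Let the child write to our stdout/stderr so it cannot block on a full pipe.
        builder.inheritIO();
        Process process = builder.start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            // Kill a converter that hangs so an upload is not blocked forever.
            process.destroyForcibly();
            throw new IOException("Conversion timed out after " + timeoutSeconds + " s");
        }
        return process.exitValue();
    }
}
```

The idea would be to call buildCommand with the converter path, the uploaded file and a temporary output file, pass the result to run, and fall back to the built-in OCR if the exit code is not 0.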

Thanks for any input.

Re: Execute Preprocess of PDF

Posted: Sun Sep 30, 2012 10:29 am
by jllort
With OpenKM 6 this will be easier. We have already started packaging the source code and I think it will be released in a few weeks; we have decided to bring the release forward from the initial December date to October.