Execute Preprocess of PDF

 #18573  by Netvoid
 
If users are adding PDF documents to OpenKM, is there a good, simple way to run an application against each document first?

I already have a nice, effective program that converts an image-based PDF into a searchable PDF.

http://www.scantopdf.com/en/product/fil ... _line.aspx

I notice the great feature already in OpenKM that detects that a PDF doesn't have a text layer and then performs conversion and OCR on it. Could it run this program instead?

It is a bit better (although not free), because the PDF is then not only indexed for OpenKM searches: when you use the OpenKM embedded file preview, you can also search for text within the document and the matches are highlighted.

I'm guessing the best way would be to build another extractor class, something like "com.openkm.extractor.PDFToSearchablePDF", and then change the configured class in the system to use it. But I was curious whether it is already possible to simply hook a conversion process into the upload of files of a given type.

In the meantime I suppose I'll start investigating building the new extractor class...
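
To make the idea concrete, here is a rough sketch of the part that would shell out to the converter. It deliberately does not touch any OpenKM API, since I have not checked what the extractor interface looks like. The class name, the "{in}"/"{out}" placeholders and the command line are all my own assumptions, not anything taken from OpenKM or from the converter's documentation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs an external converter over an uploaded PDF. */
public class ExternalPdfPreprocessor {
    private final List<String> commandTemplate;
    private final long timeoutSeconds;

    public ExternalPdfPreprocessor(List<String> commandTemplate, long timeoutSeconds) {
        this.commandTemplate = new ArrayList<>(commandTemplate);
        this.timeoutSeconds = timeoutSeconds;
    }

    /** Replaces the assumed {in} and {out} placeholders with real paths. */
    public static List<String> buildCommand(List<String> template, Path in, Path out) {
        List<String> cmd = new ArrayList<>();
        for (String arg : template) {
            cmd.add(arg.replace("{in}", in.toString()).replace("{out}", out.toString()));
        }
        return cmd;
    }

    /** Copies the upload to a temp file, runs the converter, returns the output path. */
    public Path convert(InputStream upload) throws IOException, InterruptedException {
        Path in = Files.createTempFile("okm-in-", ".pdf");
        // The output file already exists (empty); the converter must be willing to overwrite it.
        Path out = Files.createTempFile("okm-out-", ".pdf");
        try {
            Files.copy(upload, in, StandardCopyOption.REPLACE_EXISTING);
            Process p = new ProcessBuilder(buildCommand(commandTemplate, in, out))
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // Java 9+
                    .start();
            if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                p.destroyForcibly();
                throw new IOException("converter timed out");
            }
            if (p.exitValue() != 0) {
                throw new IOException("converter failed with exit code " + p.exitValue());
            }
            return out;
        } catch (IOException | InterruptedException e) {
            Files.deleteIfExists(out); // do not leave a half-written result behind
            throw e;
        } finally {
            Files.deleteIfExists(in);
        }
    }
}
```

It would be constructed with something like new ExternalPdfPreprocessor(List.of("pdf-ocr-tool", "{in}", "{out}"), 120), where "pdf-ocr-tool" is a made-up name standing in for the real converter. The caller would then store or index the returned file and delete it afterwards. Passing the arguments as a list rather than one string avoids quoting problems with temp paths that contain spaces.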

Thanks for any input.
 #18578  by jllort
 
With OpenKM 6 this will be easier. We have just started packaging the source code and I think it will be released in a few weeks; we have decided to bring the release forward from the initial December date to October.
