• PDF Search

  • OpenKM has many interesting features, but requires some configuration process to show its full potential.
 #52900  by farkinid2
 
Long-time OpenKM user here. I've been running OpenKM Community in my own office for a while now and have never encountered this problem.

I've deployed an OpenKM Community server on a VM as a test bed, currently running OpenKM 6.3.11. The system works as intended except for PDF files.

I've uploaded a couple of PDF files, but none of them has been indexed successfully. The files are a mix of scanned documents, print-to-PDF documents, and scanned documents that I manually converted to real text (fonts) with Tesseract. Search works for all docx, xlsx, and txt files, but no text has been extracted from any PDF.

All files have already been processed by the text extractor (nothing is left in the queue). I've attached a screenshot of the list of words extracted for a test file, as well as a sample PDF file.

On a side note, if we were to subscribe to OpenKM online with very large scanned PDF files to process, what sort of limitations would we be facing? For example, some users' files are scanned documents of approximately 800 MB per file. There are approximately 50,000 files of varying sizes.
Attachments
Test pdf file (43.05 KiB)
pdf_no_text.JPG (53.07 KiB): No words extracted
 #52912  by farkinid2
 
Just a quick note: I've managed to resolve the situation.

Went to Utilities -> Plugins -> Text Extractor and disabled the Cuneiform text extractor.
 #52936  by saleem55
 
I have the same problem: I cannot extract text from PDF files.
Attachments
(36.73 KiB)
pdf.PNG (39.32 KiB)
 #52937  by saleem55
 
Resolved. I disabled "force PDF OCR"; the solution was in one of the resolved topics.
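
For anyone landing here with the same symptom, it can help to check outside OpenKM whether a PDF carries any text layer at all before changing extractor settings. The sketch below is only a rough heuristic and not part of OpenKM: the function name `likely_has_text_layer` is made up for illustration, and it simply looks for a `/Font` entry in the raw file and in any Flate-compressed streams. An image-only scan usually has none, while print-to-PDF files and Tesseract output usually do. It can give wrong answers for encrypted or unusual files.

```python
import re
import sys
import zlib

# Matches the raw payload between the "stream" and "endstream" keywords.
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def likely_has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic (hypothetical helper): True if a /Font entry is found."""
    # Fonts referenced in uncompressed dictionaries show up directly.
    if b"/Font" in pdf_bytes:
        return True
    # Newer PDFs may pack dictionaries into Flate-compressed object streams,
    # so try to inflate every stream and look again.
    for match in _STREAM_RE.finditer(pdf_bytes):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example a JPEG image), skip it
        if b"/Font" in inflated:
            return True
    return False


if __name__ == "__main__":
    # Usage: python check_pdf.py file1.pdf file2.pdf
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, likely_has_text_layer(fh.read()))
```

If this reports True for a file that OpenKM still indexes with no words, the file probably does contain text and the extractor configuration is the more likely culprit, as in the two fixes above.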
