OpenKM has many interesting features, but requires some configuration process to show its full potential.
 #52900  by farkinid2
Long-time OpenKM user here. I've been running OpenKM Community in my own office for a while now and have never encountered this problem.

I've deployed an OpenKM Community server in a VM as a test bed, currently running OpenKM 6.3.11. The system is operating as intended except with regard to PDF files.

I've uploaded a couple of PDF files, but none of them has been successfully indexed. The files are a mix of scanned documents, print-to-PDF documents, and scanned documents that I manually ran through Tesseract to turn the images into text. At this point the search function works for all docx, xlsx, and txt files, but no text has been extracted from any PDF.
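
For anyone hitting the same symptom, one way to narrow it down is to check outside OpenKM which of the PDFs actually carry a text layer. The sketch below assumes Poppler's `pdftotext` command is installed on the machine. The function names and the 20-character threshold are illustrative choices, not anything from OpenKM.

```python
import shutil
import subprocess


def has_text_layer(extracted: str, min_chars: int = 20) -> bool:
    """Return True if the text has at least min_chars non-whitespace characters.

    The threshold is an arbitrary assumption; tune it for your documents.
    """
    return sum(1 for ch in extracted if not ch.isspace()) >= min_chars


def extract_pdf_text(path: str) -> str:
    """Run Poppler's pdftotext on a file and return what it writes to stdout."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (Poppler) not found on PATH")
    result = subprocess.run(
        ["pdftotext", path, "-"],  # "-" sends the extracted text to stdout
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `has_text_layer(extract_pdf_text("sample.pdf"))` on a print-to-PDF file should return True, while a plain scan with no OCR applied would normally return False. If files that pass this check still come up empty in OpenKM, the problem is more likely in the extractor configuration than in the files themselves.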

All files have already been processed by the text extractor (no files in queue). I've attached a screenshot of the list of words extracted for a test file, as well as a sample PDF file.

On a side note, if we were to subscribe to OpenKM online but have very large scanned PDF files to process, what sort of limitations would we be facing? For example, some users' files are scanned documents of approximately 800 MB per file. There are approximately 50,000 files of varying sizes.
 #52912  by farkinid2
Just a quick note. I've managed to resolve the situation.

I went to Utilities -> Plugins -> Text Extractor and disabled the Cuneiform text extractor.
 #52936  by saleem55
I have the same problem.
I cannot extract text from PDF files.
 #52937  by saleem55
Resolved.
Disabling force PDF OCR fixed it.
The solution was in one of the resolved topics.

