Long-time OpenKM user here. I've been running OpenKM Community in my own office for a while now and have never encountered this problem.
Using a VM, I've deployed an OpenKM Community server as a test bed, currently running OpenKM 6.3.11. The system is operating as intended except with regard to PDF files.
I've uploaded a couple of PDF files into the system, but none of them has been successfully indexed. The files are a mix of scanned documents, documents created via print-to-PDF, and scanned documents to which I manually added a text layer with Tesseract. At this point the search function works for all docx, xlsx and txt files. For PDF files, no text has been extracted at all.
All files have already been processed in the text extractor (no files in queue). I've attached a screenshot of the list of words extracted for a test file as well as a sample pdf file.
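To rule out the files themselves, a check outside OpenKM along the following lines should show whether a PDF carries any extractable text at all. This is only a sketch: it assumes the `pdftotext` tool from poppler-utils is installed and on the PATH, and the helper names are my own, not anything from OpenKM.

```python
import re
import subprocess
import sys


def count_words(text: str) -> int:
    """Count word-like tokens in already extracted text."""
    return len(re.findall(r"\w+", text))


def extract_pdf_text(path: str) -> str:
    """Run poppler's pdftotext and return the text it writes to stdout.

    Assumption: pdftotext is installed; "-" as the output file name
    makes it write to stdout instead of a .txt file.
    """
    result = subprocess.run(
        ["pdftotext", path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Usage: python check_pdf_text.py file1.pdf file2.pdf ...
    for pdf_path in sys.argv[1:]:
        words = count_words(extract_pdf_text(pdf_path))
        print(f"{pdf_path}: {words} words extracted")
```

A count of zero for a purely scanned file would be expected, since it has no text layer. A non-zero count for the print-to-PDF and Tesseract-processed files would suggest the problem is on the OpenKM side rather than in the files.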
On a side note, if we were to subscribe to OpenKM online, what sort of limitations would we be facing when processing very large scanned PDF files? For example, some users' files are scanned documents of approximately 800 MB per file. There are approximately 50,000 files of varying sizes.
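To give a rough sense of scale, here is a worst-case estimate of the total volume. It is deliberately pessimistic: it assumes every one of the 50,000 files is as large as the biggest ones (800 MB), which is not the case since the sizes vary, so the real total should be well below this figure.

```python
FILE_COUNT = 50_000   # approximate number of files
MAX_FILE_MB = 800     # approximate size of the largest files, in MB


def worst_case_total_tb(file_count: int, max_file_mb: int) -> float:
    """Upper bound on total storage in decimal TB (1 TB = 1,000,000 MB)."""
    return file_count * max_file_mb / 1_000_000


print(worst_case_total_tb(FILE_COUNT, MAX_FILE_MB))  # 40.0
```

So the upper bound is around 40 TB, with the actual figure depending on how the file sizes are distributed.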
Attachments
Test pdf file
(43.05 KiB) Downloaded 150 times
No words extracted
pdf_no_text.JPG (53.07 KiB) Viewed 1626 times