Fulltext search for PDF

OpenKM has many interesting features, but requires some configuration process to show its full potential.
 #53645  by Hunv
 
Hi,
I just installed a new OpenKM Community 6.3 instance (the current version).
For testing I imported some of my PDFs (invoices that are purely digital, not scanned).
I am now trying to find a word that appears in one or more of the imported PDFs, but I get no results. Even searching for part of a filename returns nothing.
I already ran the following command under Administration => Database Query:
Code:
select * from OKM_NODE_DOCUMENT WHERE NBS_UUID='2eafb84f-7073-4bd9-8e2b-beeffab674ee';
The result shows that the "NDC_TEXT" column is empty and "NDC_TEXT_EXTRACTED" is T.
I also rebuilt all indexes using the utilities.
How can I make OpenKM find the text in my PDFs?
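
The single-document check above can be widened into a query that lists every document flagged as extracted but with no stored text. This is only a sketch: it reuses the table and column names from the query in this post (OKM_NODE_DOCUMENT, NBS_UUID, NDC_TEXT, NDC_TEXT_EXTRACTED) and assumes the flag is stored as the character 'T', as the result above suggests. The empty-string comparison may need adapting depending on the database behind the instance.

```sql
-- Sketch: list documents marked as extracted whose extracted text is empty.
-- Assumption: NDC_TEXT_EXTRACTED holds 'T' for "extracted", as in the result above.
SELECT NBS_UUID, NDC_TEXT_EXTRACTED
FROM OKM_NODE_DOCUMENT
WHERE NDC_TEXT_EXTRACTED = 'T'
  AND (NDC_TEXT IS NULL OR NDC_TEXT = '');
```

If most or all of the imported PDFs show up in this list, that would suggest the extraction step produced no text, rather than a problem with the search index itself.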
 #53662  by jllort
 
When you have the PDF open in Adobe Reader, can you copy text to the clipboard?
 #53702  by jllort
 
I have attached a screenshot of the Text Extractor checker. Copy the UUID of the document and check it from there. Please share a screenshot of the result.
Selección_102.png (118.17 KiB)
 #53714  by Hunv
 
Hi,

It seems there is nothing.
Just to prove that there is text in the PDF, I have also attached a screenshot of the file with some text selected.

Regards,
Kristian
Attachments
2022-07-12 10_53_19-Monthly Invoice - Vivaldi.png (66.36 KiB)
2022-07-12 10_51_55-OpenKM Administration - Vivaldi.png (50.3 KiB)
 #53751  by scrumi
 
My problem was that I had system.pdf.force.ocr set to true. I'm not sure I completely understand it, but it is working now.
 #53765  by jllort
 
It also matters which text extractor plugins are enabled. I suggest checking the text extraction procedure on a file, as I suggested before, to identify which plugin is executed.

You can get a list of enabled plugins at Administration > Tools > Plugins.
Selección_032.png (131.6 KiB)
