
Fulltext search for PDF

Posted: Wed Jun 29, 2022 8:36 pm
by Hunv
Hi,
I just installed a new OpenKM Community 6.3 instance (the current version).
For testing, I imported some PDFs I have (invoices that are purely digital, not scanned).
I am now trying to find a word that appears in one or more of the imported PDFs, but I get no results. I get no results even when I search for part of a filename.
I have already run the following query at Administration => Database Query:
select * from OKM_NODE_DOCUMENT WHERE NBS_UUID='2eafb84f-7073-4bd9-8e2b-beeffab674ee';
The result is that the "NDC_TEXT" column is empty while "NDC_TEXT_EXTRACTED" is T.
I also rebuilt all indexes using the utilities.
How can I make OpenKM find the text in my PDFs?
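To check whether other imported documents show the same symptom (text marked as extracted but stored empty), the single-UUID query can be widened to scan the whole table. This is a sketch only, assuming the same OKM_NODE_DOCUMENT columns used above, that NDC_TEXT_EXTRACTED holds the 'T'/'F' values seen in this thread, and that an empty NDC_TEXT is stored either as NULL or as a zero-length value. How LENGTH() treats a large text column can vary with the database behind your OpenKM instance, so the condition may need adjusting.

```sql
-- List documents flagged as "text extracted" whose stored text is empty.
-- Assumption: NDC_TEXT_EXTRACTED uses 'T' for true, as in the query above.
SELECT NBS_UUID, NDC_TEXT_EXTRACTED
FROM OKM_NODE_DOCUMENT
WHERE NDC_TEXT_EXTRACTED = 'T'
  -- Treat both NULL and zero-length text as "nothing was extracted".
  AND (NDC_TEXT IS NULL OR LENGTH(NDC_TEXT) = 0);
```

If every imported PDF shows up here, the problem is more likely in the extraction configuration than in the individual files.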

Re: Fulltext search for PDF

Posted: Mon Jul 04, 2022 9:15 am
by jllort
When you have the PDF open in Adobe Reader, can you copy text to the clipboard?

Re: Fulltext search for PDF

Posted: Mon Jul 04, 2022 8:16 pm
by Hunv
Hi,
yes I can.

Re: Fulltext search for PDF

Posted: Sun Jul 10, 2022 5:00 pm
by jllort
I have attached a screenshot of the Text Extractor checker. Copy the UUID of the document and check it from there, then please share a screenshot of the result.
Selección_102.png

Re: Fulltext search for PDF

Posted: Tue Jul 12, 2022 8:56 am
by Hunv
Hi,

It seems there is nothing.
To prove that there is text in the PDF, I have also attached a screenshot of the file with the text selected.

Regards,
Kristian

Re: Fulltext search for PDF

Posted: Thu Jul 28, 2022 11:27 pm
by scrumi
I'm also running into this. Did you find a solution?

Re: Fulltext search for PDF

Posted: Fri Jul 29, 2022 8:07 pm
by scrumi
My problem was that I had system.pdf.force.ocr set to true. I'm not sure I completely understand why, but it's working now.

Re: Fulltext search for PDF

Posted: Tue Aug 02, 2022 7:35 am
by jllort
Which text extractor plugins are enabled is also relevant. I suggest checking the text extraction procedure on a file, as I suggested before, to identify which plugin is executed.

You can get a list of enabled plugins at Administration > Tools > Plugins.
Selección_032.png