Hi,
I have PDFs that contain scanned images, so I configured OpenKM to use Tesseract 3 for OCR. I also enabled system.pdf.force.ocr, because without this option no text is extracted from scanned PDF files.
I am using the German dictionary: dict-de_de-frami_2013-12-06.oxt
After running the text extractor cron job, the text was extracted.
DEBUG com.openkm.extractor.Tesseract3TextExtractor- TEXT: <my text goes here>
DEBUG com.openkm.extractor.PdfTextExtractor- OCR Extracted: <my text goes here>
But when I search for any text from <my text goes here>, nothing is found in the full-text search window. Why is that? I also rebuilt the indexes under Admin -> Utils -> Rebuild indexes, both Text extractor and Lucene.
Can you give me a hint why this does not work?
Thanks!