Tesseract integration does not search scanned PDFs
Posted: Tue Jul 01, 2014 6:56 pm
Hey,
I am seeing an error in the catalina log with the Tesseract integration. I am running OpenKM 6.3 Community Edition on Ubuntu 10.04, and I have configured the system.ocr property and the dictionary for Tesseract 3.00. With Admin --> Utilities --> Check Extraction I can extract the text from any file format with Tesseract, but searching does not return the PDFs that contain the search word.
I have been stuck on this for the past 4 days.
Any help with this will be really appreciated.
Code:
org.apache.jackrabbit.extractor.PlainTextExtractor
org.apache.jackrabbit.extractor.MsWordTextExtractor
org.apache.jackrabbit.extractor.MsExcelTextExtractor
org.apache.jackrabbit.extractor.PdfTextExtractor
org.apache.jackrabbit.extractor.MsPowerPointTextExtractor
org.apache.jackrabbit.extractor.OpenOfficeTextExtractor
org.apache.jackrabbit.extractor.RTFTextExtractor
org.apache.jackrabbit.extractor.HTMLTextExtractor
org.apache.jackrabbit.extractor.XMLTextExtractor
org.apache.jackrabbit.extractor.PngTextExtractor
org.apache.jackrabbit.extractor.MsOutlookTextExtractor
com.openkm.extractor.PdfTextExtractor
com.openkm.extractor.AudioTextExtractor
com.openkm.extractor.ExifTextExtractor
com.openkm.extractor.SourceCodeTextExtractor
com.openkm.extractor.MsOffice2007TextExtractor
com.openkm.extractor.Tesseract3TextExtractor
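
The list above contains two classes with the same simple name, PdfTextExtractor (one from org.apache.jackrabbit.extractor, one from com.openkm.extractor). Whether that has anything to do with the search problem is only a guess, and the post does not say where this list comes from. Below is a minimal sketch, assuming the list is the set of registered text extractors, that flags such name collisions so they can be reviewed. The class name ExtractorListCheck and the method duplicateSimpleNames are made up for illustration and are not part of OpenKM or Jackrabbit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of OpenKM or Jackrabbit.
public class ExtractorListCheck {

    // Groups fully qualified class names by their simple name and keeps
    // only the simple names that appear more than once.
    static Map<String, List<String>> duplicateSimpleNames(List<String> classNames) {
        Map<String, List<String>> bySimpleName = new LinkedHashMap<>();
        for (String name : classNames) {
            String simple = name.substring(name.lastIndexOf('.') + 1);
            bySimpleName.computeIfAbsent(simple, k -> new ArrayList<>()).add(name);
        }
        // Drop every simple name that is used by a single class only.
        bySimpleName.values().removeIf(group -> group.size() < 2);
        return bySimpleName;
    }

    public static void main(String[] args) {
        // The extractor list exactly as posted above.
        List<String> extractors = Arrays.asList(
            "org.apache.jackrabbit.extractor.PlainTextExtractor",
            "org.apache.jackrabbit.extractor.MsWordTextExtractor",
            "org.apache.jackrabbit.extractor.MsExcelTextExtractor",
            "org.apache.jackrabbit.extractor.PdfTextExtractor",
            "org.apache.jackrabbit.extractor.MsPowerPointTextExtractor",
            "org.apache.jackrabbit.extractor.OpenOfficeTextExtractor",
            "org.apache.jackrabbit.extractor.RTFTextExtractor",
            "org.apache.jackrabbit.extractor.HTMLTextExtractor",
            "org.apache.jackrabbit.extractor.XMLTextExtractor",
            "org.apache.jackrabbit.extractor.PngTextExtractor",
            "org.apache.jackrabbit.extractor.MsOutlookTextExtractor",
            "com.openkm.extractor.PdfTextExtractor",
            "com.openkm.extractor.AudioTextExtractor",
            "com.openkm.extractor.ExifTextExtractor",
            "com.openkm.extractor.SourceCodeTextExtractor",
            "com.openkm.extractor.MsOffice2007TextExtractor",
            "com.openkm.extractor.Tesseract3TextExtractor");

        duplicateSimpleNames(extractors).forEach((simple, names) ->
            System.out.println(simple + " is provided by: " + names));
    }
}
```

For the list as posted, the only collision reported is PdfTextExtractor. This only inspects names; it does not show which extractor actually handles a given PDF.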