Hi,
In OpenKM, I can extract text from non-rotated images without any issue.
However, I have an image that is rotated by +90 degrees.
On the Ubuntu command line, when I use the "-rotate -90" option in ImageMagick and then run tesseract, the text is extracted properly from this +90 rotated image.
But when I set the OpenKM property "system.ocr.rotate" to "-90" and upload the same image (which is a PDF file), the extracted text is nothing but gibberish.
I tried:
i) system.ocr.rotate String 90;180;270;
ii) system.ocr.rotate String -90;
iii) restarting OpenKM
but none of these worked.
Do you have any suggestions?
Here are my current configuration settings in OpenKM:
*********************************************
registered.text.extractors List ->
org.apache.jackrabbit.extractor.PlainTextExtractor
org.apache.jackrabbit.extractor.MsWordTextExtractor
org.apache.jackrabbit.extractor.MsExcelTextExtractor
org.apache.jackrabbit.extractor.MsPowerPointTextExtractor
org.apache.jackrabbit.extractor.OpenOfficeTextExtractor
org.apache.jackrabbit.extractor.RTFTextExtractor
org.apache.jackrabbit.extractor.HTMLTextExtractor
org.apache.jackrabbit.extractor.XMLTextExtractor
org.apache.jackrabbit.extractor.PngTextExtractor
org.apache.jackrabbit.extractor.MsOutlookTextExtractor
com.openkm.extractor.PdfTextExtractor
com.openkm.extractor.AudioTextExtractor
com.openkm.extractor.ExifTextExtractor
com.openkm.extractor.Tesseract3TextExtractor
com.openkm.extractor.SourceCodeTextExtractor
com.openkm.extractor.MsOffice2007TextExtractor
system.imagemagick.convert String /usr/bin/convert -density 300 ${fileIn} -depth 8 ${fileOut}
system.ocr String /usr/local/bin/tesseract ${fileIn} ${fileOut}
system.ocr.rotate String -90;
system.pdf.force.ocr Boolean Inactive
*********************************************
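For reference, the manual test that works on the command line can be sketched roughly as below. This is only a minimal sketch: it assumes the rotation is added to the same convert command that is configured above, and the file names (scan.pdf, page.tif, page) and the helper build_commands are placeholders for illustration, not anything that OpenKM itself uses.

```python
import shlex


def build_commands(pdf_in, image_out="page.tif", text_base="page", angle=-90):
    """Build the two command lines for the manual rotate-then-OCR test.

    File names are placeholders; the binary paths are the ones from the
    OpenKM configuration listed above.
    """
    # ImageMagick: render the PDF at 300 dpi, rotate it, write an 8-bit image.
    convert_cmd = [
        "/usr/bin/convert", "-density", "300", pdf_in,
        "-rotate", str(angle), "-depth", "8", image_out,
    ]
    # Tesseract: read the rotated image and write the text to <text_base>.txt.
    ocr_cmd = ["/usr/local/bin/tesseract", image_out, text_base]
    return convert_cmd, ocr_cmd


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be checked first.
    for cmd in build_commands("scan.pdf"):
        print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) to actually run the step. With angle=-90 this yields readable text for my file, which is why I expected the same value in "system.ocr.rotate" to work.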
