• Image rotation not working

 #29004  by khalid2040
 
Hi,

I can extract text from non-rotated images in OpenKM without any issue.
However, I have an image that is rotated by +90 degrees.
On the Ubuntu command line, when I apply ImageMagick's "-rotate -90" option and then run tesseract, the text of this +90-rotated image is extracted properly.
But when I set the OpenKM property "system.ocr.rotate" to "-90" and upload the same image (which is a PDF file), the text extraction shows only gibberish.
I tried:
i) system.ocr.rotate String 90;180;270;
ii) system.ocr.rotate String -90;
iii) restarting OpenKM
but none of them worked.

Do you have any suggestions?
Here are my current configuration settings in OpenKM:

*********************************************
registered.text.extractors List ->
org.apache.jackrabbit.extractor.PlainTextExtractor
org.apache.jackrabbit.extractor.MsWordTextExtractor
org.apache.jackrabbit.extractor.MsExcelTextExtractor
org.apache.jackrabbit.extractor.MsPowerPointTextExtractor
org.apache.jackrabbit.extractor.OpenOfficeTextExtractor
org.apache.jackrabbit.extractor.RTFTextExtractor
org.apache.jackrabbit.extractor.HTMLTextExtractor
org.apache.jackrabbit.extractor.XMLTextExtractor
org.apache.jackrabbit.extractor.PngTextExtractor
org.apache.jackrabbit.extractor.MsOutlookTextExtractor
com.openkm.extractor.PdfTextExtractor
com.openkm.extractor.AudioTextExtractor
com.openkm.extractor.ExifTextExtractor
com.openkm.extractor.Tesseract3TextExtractor
com.openkm.extractor.SourceCodeTextExtractor
com.openkm.extractor.MsOffice2007TextExtractor

system.imagemagick.convert String /usr/bin/convert -density 300 ${fileIn} -depth 8 ${fileOut}
system.ocr String /usr/local/bin/tesseract ${fileIn} ${fileOut}
system.ocr.rotate String -90;
system.pdf.force.ocr Boolean Inactive
*********************************************
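
For reference, the command-line sequence described above (rotate with ImageMagick, then run tesseract) can be sketched in Python as follows. This is only a minimal sketch: the binary paths and the -density/-depth options are copied from the configuration listed above, while the helper names, the intermediate file name and the "scan.pdf" input are hypothetical. It does not show how OpenKM itself applies "system.ocr.rotate".

```python
import subprocess


def build_commands(pdf_path, angle, out_base="page"):
    """Return the (convert, tesseract) argument lists for one rotation angle.

    The intermediate TIFF name and out_base are illustrative assumptions.
    """
    image = out_base + ".tif"
    convert_cmd = [
        "/usr/bin/convert", "-density", "300", pdf_path,
        "-rotate", str(angle), "-depth", "8", image,
    ]
    # tesseract writes its result to <out_base>.txt
    ocr_cmd = ["/usr/local/bin/tesseract", image, out_base]
    return convert_cmd, ocr_cmd


def run_ocr(pdf_path, angle, out_base="page"):
    """Run both steps and return the extracted text (needs both tools installed)."""
    for cmd in build_commands(pdf_path, angle, out_base):
        subprocess.run(cmd, check=True)
    with open(out_base + ".txt", encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    # "scan.pdf" is a placeholder for the rotated document.
    print(run_ocr("scan.pdf", -90))
```

Comparing the text this produces with what OpenKM stores for the same file may help show whether the rotation is actually being applied during extraction.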
 #29012  by jllort
 
Can you take control of this kind of document? Can they always be stored in a specific folder (before OCR is run), or be identified by name, user, or metadata? If we can identify them in some way, we can process them separately and perform whatever tasks are needed for the OCR to work correctly. Are you able to identify them somehow, always put them in the same folder, or have the user set some metadata that identifies this kind of document?
 #29014  by khalid2040
 
The +90 degree documents appear at random, so I cannot distinguish them from non-rotated documents.
Actually, I have been extracting text from a mix of such documents on Windows without any issue, so I was expecting the same on Linux.
But it looks like, being freeware, tesseract has its own limitations... :)
I will download the ABBYY CLI for Linux and see how it performs at extracting text from such a mix.

Thanks for looking into this !!
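
When rotated and non-rotated documents are mixed at random, one possible workaround outside OpenKM is to run OCR at several candidate angles and keep the output that looks most like real text. The sketch below is an assumption-laden illustration, not something OpenKM or tesseract provides: `text_score` is a crude heuristic that only suits Latin-alphabet text, and `ocr_at_angle` is a hypothetical callable that returns the OCR output for a given rotation.

```python
import re


def text_score(text):
    """Fraction of whitespace-separated tokens that look like plain words."""
    tokens = text.split()
    if not tokens:
        return 0.0
    wordlike = [t for t in tokens if re.fullmatch(r"[A-Za-z]{2,}[.,;:!?]?", t)]
    return len(wordlike) / len(tokens)


def best_rotation(ocr_at_angle, angles=(0, 90, 180, 270)):
    """Try each angle and return (angle, text) for the highest-scoring output.

    ocr_at_angle is a hypothetical callable: angle in degrees -> OCR text.
    """
    results = {angle: ocr_at_angle(angle) for angle in angles}
    best = max(results, key=lambda angle: text_score(results[angle]))
    return best, results[best]


if __name__ == "__main__":
    # Stand-in OCR outputs, used only to demonstrate the selection logic.
    fake_outputs = {
        0: "x#~ 1l| ;;q",
        90: "Invoice number and date",
        180: "",
        270: "a1 b2",
    }
    print(best_rotation(fake_outputs.get))
```

The trade-off is that each document is OCRed once per candidate angle, so processing takes several times longer.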
