Open Source Document Management System | OpenKM

Image rotation not working

Forum rules: Please check the documentation wiki or use the forum search feature before asking. Remember that we don't have a crystal ball and can't read minds, so if you post about an issue, tell us which OpenKM version you are using, as well as your browser and operating system versions. For more info, read How to Report Bugs Effectively.

3 posts

Image rotation not working

#29004 by khalid2040
Sat Jun 21, 2014 5:02 pm

Hi,

I can extract text from non-rotated images in OpenKM without any issue.
But I have an image that is rotated by +90 degrees.
On the Ubuntu command line, when I use the "-rotate -90" option in ImageMagick and then run tesseract, the text is extracted properly from this +90-rotated image.
However, when I set the OpenKM property "system.ocr.rotate" to "-90" and upload the same image (which is a PDF file), the extracted text contains only gibberish words.
I tried:
i) system.ocr.rotate String 90;180;270;
ii) system.ocr.rotate String -90;
iii) restarting OpenKM
but none of them worked.

Do you have any suggestions?
Here are my current configuration settings in OpenKM:

*********************************************
registered.text.extractors List ->
org.apache.jackrabbit.extractor.PlainTextExtractor
org.apache.jackrabbit.extractor.MsWordTextExtractor
org.apache.jackrabbit.extractor.MsExcelTextExtractor
org.apache.jackrabbit.extractor.MsPowerPointTextExtractor
org.apache.jackrabbit.extractor.OpenOfficeTextExtractor
org.apache.jackrabbit.extractor.RTFTextExtractor
org.apache.jackrabbit.extractor.HTMLTextExtractor
org.apache.jackrabbit.extractor.XMLTextExtractor
org.apache.jackrabbit.extractor.PngTextExtractor
org.apache.jackrabbit.extractor.MsOutlookTextExtractor
com.openkm.extractor.PdfTextExtractor
com.openkm.extractor.AudioTextExtractor
com.openkm.extractor.ExifTextExtractor
com.openkm.extractor.Tesseract3TextExtractor
com.openkm.extractor.SourceCodeTextExtractor
com.openkm.extractor.MsOffice2007TextExtractor

system.imagemagick.convert String /usr/bin/convert -density 300 ${fileIn} -depth 8 ${fileOut}
system.ocr String /usr/local/bin/tesseract ${fileIn} ${fileOut}
system.ocr.rotate String -90;
system.pdf.force.ocr Boolean Inactive
*********************************************
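A minimal Python sketch of the command-line workaround described above, built from the two command templates in this configuration. It assumes the `${fileIn}`/`${fileOut}` placeholders are plain text substitutions (Python's `string.Template` happens to use the same `${name}` syntax; this is not OpenKM's own code), and the file names are hypothetical. It only builds the argument lists; each could then be run with `subprocess.run(argv, check=True)`.

```python
import shlex
from string import Template

# Command templates copied from the configuration dump above.
CONVERT_TEMPLATE = "/usr/bin/convert -density 300 ${fileIn} -depth 8 ${fileOut}"
OCR_TEMPLATE = "/usr/local/bin/tesseract ${fileIn} ${fileOut}"


def expand(template: str, file_in: str, file_out: str) -> list[str]:
    """Fill ${fileIn}/${fileOut} and split the result into an argv list.

    Assumption: the placeholders are simple text substitutions. Paths are
    shell-quoted first so names containing spaces survive the split.
    """
    cmd = Template(template).substitute(
        fileIn=shlex.quote(file_in), fileOut=shlex.quote(file_out)
    )
    return shlex.split(cmd)


def convert_argv(file_in: str, file_out: str, degrees: int = 0) -> list[str]:
    """Build the convert command, adding '-rotate <degrees>' when needed.

    ImageMagick applies an operator placed after the input file to the image
    already read, as in 'convert in.pdf -rotate -90 out.tif'.
    """
    argv = expand(CONVERT_TEMPLATE, file_in, file_out)
    if degrees % 360:  # skip no-op rotations such as 0 or 360
        pos = argv.index(file_in) + 1  # position right after the input file
        argv[pos:pos] = ["-rotate", str(degrees)]
    return argv


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(convert_argv("scan.pdf", "scan.tif", -90))
    print(expand(OCR_TEMPLATE, "scan.tif", "scan"))
```

Tesseract 3 writes its result to `scan.txt` in this example, because the output argument is a base name.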

Username: khalid2040
Rank: Fresh Boarder
Joined: Sat Jun 21, 2014 4:29 pm

Re: Image rotation not working

#29012 by jllort
Sun Jun 22, 2014 10:47 am

Can you take control of this kind of document? For example, can they always be stored in a particular folder (before OCR is done), or identified by name, user, or metadata? If we can identify them in some way, we can process them separately and do all the tasks needed to OCR them correctly. Are you able to identify them somehow, always put them in the same folder, or have the user set some metadata that identifies this kind of document?
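A small Python sketch of the routing idea suggested here: decide the rotation to apply before OCR from a metadata value or the folder a document sits in. The folder path, the metadata key, and the `Document` type are all hypothetical conventions for illustration, not OpenKM defaults or OpenKM API.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a stored document."""
    path: str
    metadata: dict[str, str] = field(default_factory=dict)


# Hypothetical conventions; neither is an OpenKM default.
ROTATED_FOLDER = "/okm:root/scans/rotated/"
ROTATION_KEY = "ocr:rotation"


def rotation_for(doc: Document) -> int:
    """Return the rotation in degrees to apply before OCR (0 means none).

    An explicit metadata value set by the user wins; otherwise documents
    filed under the dedicated folder are assumed to need -90 degrees.
    """
    if ROTATION_KEY in doc.metadata:
        return int(doc.metadata[ROTATION_KEY])
    if doc.path.startswith(ROTATED_FOLDER):
        return -90
    return 0


if __name__ == "__main__":
    print(rotation_for(Document("/okm:root/scans/rotated/a.pdf")))
    print(rotation_for(Document("/okm:root/inbox/b.pdf", {ROTATION_KEY: "180"})))
```

This only works if the rotated documents can be told apart when they are stored, which is exactly the question asked above.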

Username: jllort
Rank: Moderator
Posts: 12160
Joined: Fri Dec 21, 2007 11:23 am
Location: Sineu (Illes Balears), Spain

Re: Image rotation not working

#29014 by khalid2040
Sun Jun 22, 2014 4:27 pm

The +90-degree documents arrive at random, so I cannot distinguish them from the non-rotated ones.
Actually, I have been extracting text from a mix of such documents on Windows without any issue, so I expected the same on Linux.
But it looks like tesseract, being freeware, has its own limitations...
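When rotated pages arrive at random, one common approach is to OCR each page at several candidate angles (like the `90;180;270` list tried above) and keep the result that looks most like real words. Whether OpenKM's `system.ocr.rotate` scores candidates this way is not stated in this thread, so the Python sketch below is an independent illustration. The OCR calls themselves are left out: the `ocr_by_angle` dict stands in for tesseract output at each angle, and the word list is a hypothetical dictionary.

```python
import re

WORD_RE = re.compile(r"[A-Za-z]+")


def dictionary_score(text: str, dictionary: set[str]) -> float:
    """Fraction of alphabetic tokens in `text` that appear in `dictionary`."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        return 0.0
    return sum(w in dictionary for w in words) / len(words)


def best_rotation(ocr_by_angle: dict[int, str], dictionary: set[str]) -> tuple[int, float]:
    """Return (angle, score) for the OCR output that scores highest.

    Angles are visited in order of their normalized value (0, 90, 180, 270)
    and only a strictly better score replaces the current best, so on a tie
    an upright reading at 0 degrees is kept.
    """
    best_angle, best_score = 0, -1.0
    for angle in sorted(ocr_by_angle, key=lambda a: a % 360):
        score = dictionary_score(ocr_by_angle[angle], dictionary)
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle, best_score


if __name__ == "__main__":
    words = {"invoice", "total", "amount", "due", "date"}
    candidates = {
        0: "lnvo1ce ~~ |:| tota! aM0unt",       # gibberish from a sideways page
        -90: "Invoice date total amount due",   # same page after rotating -90
    }
    print(best_rotation(candidates, words))  # (-90, 1.0)
```

The cost is one OCR run per candidate angle. Newer tesseract releases also offer orientation and script detection (page segmentation mode 0, which needs the `osd` traineddata); whether the installed version supports it is worth checking before relying on it.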