
OCR not working in OpenKM configured to MySQL

Posted: Thu Jan 24, 2013 1:46 pm
by Muhammad Imran
Hi,
I have installed OpenKM 6.2.0 on Windows 7. It worked well with the embedded HSQLDB database. I then reconfigured it to use MySQL instead of HSQLDB. OpenKM now works except for OCR (full-text) search.

In the console window I can see that image.JPG is extracted successfully with Tesseract 3.0.
After that there is a problem:
Caused by: java.sql.SQLException: Incorrect string value: '\xEF\xAC\x81\xEF\xAC\x81...' for column 'NDC_TEXT' at row 1
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1075)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3562)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3494)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1960)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2114)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2696)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2105)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2398)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2316)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2301)
        at org.apache.tomcat.dbcp.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
        at org.apache.tomcat.dbcp.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
        at org.hibernate.persister.entity.AbstractEntityPersister.update(AbstractEntityPersister.java:2595)
I don't know what the problem is.
Can anyone tell me how to get rid of it?
Please give me a hint or point me to a wiki page that would help me resolve it.
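
The escaped bytes quoted in the error message can be decoded to see which character MySQL rejected. The following is a minimal sketch in Python, assuming the bytes are UTF-8 (the helper names `describe` and `fits_charset` are illustrative, not part of OpenKM or MySQL). It shows that the value is the "fi" typographic ligature, a character that single-byte Western encodings cannot store, which is one common reason for an "Incorrect string value" error. Python's `latin-1` and `cp1252` codecs are used here only as stand-ins for a single-byte column character set; the actual character set of `NDC_TEXT` is not stated in this thread.

```python
import unicodedata

# Bytes copied from the MySQL error message: '\xEF\xAC\x81\xEF\xAC\x81...'
rejected = b"\xEF\xAC\x81\xEF\xAC\x81"


def describe(raw):
    """Decode raw as UTF-8 and return one 'U+XXXX NAME' string per character."""
    text = raw.decode("utf-8")
    return ["U+%04X %s" % (ord(ch), unicodedata.name(ch)) for ch in text]


def fits_charset(raw, python_codec):
    """Return True if the UTF-8 decoded text can be re-encoded with python_codec."""
    try:
        raw.decode("utf-8").encode(python_codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for line in describe(rejected):
        print(line)
    for codec in ("latin-1", "cp1252", "utf-8"):
        print(codec, fits_charset(rejected, codec))
```

If the column turns out to use a single-byte character set, the usual remedy is to make the database, the table, and the JDBC connection all use UTF-8; whether that applies to this installation would need to be checked against the actual schema.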

Re: OCR not working in OpenKM configured to MySQL

Posted: Fri Jan 25, 2013 5:49 pm
by jllort
There is a bug in the text extraction feature of version 6.2.0 (it is triggered when indexing some UTF-16 files: Chinese, Russian, etc.). I think it is already fixed in 6.2.2, so try upgrading. See the Migration Guide for how to do it: http://wiki.openkm.com/index.php/Migration_Guide

Re: OCR not working in OpenKM configured to MySQL

Posted: Tue Jan 29, 2013 10:35 am
by Muhammad Imran
Thanks, jllort, for your reply!
I have migrated to OpenKM 6.2.2 successfully. Now OCR (full-text search) works only for "image.png" and "document.docx".
In the console window I can now see the following error:
Caused by: java.sql.SQLException: Incorrect string value: '\xEF\xAC\x81\xEF\xAC\x81...' for column 'NDC_TEXT' at row 1
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1075)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3562)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3494)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1960)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2114)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2696)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2105)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2398)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2316)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2301)
        at org.apache.tomcat.dbcp.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
        at org.apache.tomcat.dbcp.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
        at org.hibernate.persister.entity.AbstractEntityPersister.update(AbstractEntityPersister.java:2595)
What should I do to get full-text search working for .pdf, .JPG, .txt, etc.?

Re: OCR not working in OpenKM configured to MySQL

Posted: Wed Jan 30, 2013 10:51 pm
by jllort
Try installing the nightly build from http://integration.openkm.com, because this looks like an old bug that is fixed at least in the current source code. Tell us if that does not solve your problem.

Re: OCR not working in OpenKM configured to MySQL

Posted: Thu Jan 31, 2013 9:05 am
by Muhammad Imran
Hi jllort,
Thanks for replying.

I have installed the nightly build from integration.openkm.com, but there is still a problem with PDF text extraction.
In the console window I can now see the following warnings:
2013-01-31 13:55:01,973 [Thread-15] WARN  com.openkm.extractor.PdfTextExtractor - PDF does not contains text layer
2013-01-31 13:55:01,974 [Thread-15] WARN  com.openkm.dao.NodeDocumentDAO - There  was a problem extracting text from '/okm:root/Testing/DatabaseBasics.pdf': Too few text extracted
How can I fix this?

Re: OCR not working in OpenKM configured to MySQL

Posted: Fri Feb 01, 2013 11:58 pm
by bgrr
I have the same problem with version 6.2.2 build 7815 on Ubuntu 12.04.1 LTS.

JPG works fine, and text PDFs (PDFs with selectable text) work fine.

But a scanned PDF (a PDF containing a raster image) gives me the same error, both when I run Tesseract through OpenKM and from the command line.

Installed: Tesseract 3.02 and ImageMagick 6.6.9-7 2012-08-17 Q16.

Re: OCR not working in OpenKM configured to MySQL

Posted: Sun Feb 03, 2013 10:58 am
by jllort
The scan resolution could be too low. Can you try a higher resolution? If you also get the problem on the command line, concentrate on solving it there first.
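
To test the scanned-PDF case outside OpenKM, the two command-line steps (rasterize with ImageMagick, then OCR with Tesseract) can be scripted. The following is a rough sketch in Python, not OpenKM's own configuration: the function names, file names, and the 300 DPI default are assumptions chosen for illustration. It relies on ImageMagick's `convert -density` option, the `file.pdf[0]` page selector, and Tesseract's `tesseract <image> <output base>` invocation, which writes `<output base>.txt`. ImageMagick typically needs Ghostscript installed to read PDFs.

```python
import subprocess


def build_ocr_commands(pdf_path, image_path, text_base, dpi=300):
    """Return two argv lists: rasterize page 1 of the PDF, then OCR the image."""
    # -density must come before the input file so it sets the rasterization DPI.
    convert_cmd = ["convert", "-density", str(dpi), pdf_path + "[0]", image_path]
    # Tesseract appends ".txt" to the output base name on its own.
    tesseract_cmd = ["tesseract", image_path, text_base]
    return convert_cmd, tesseract_cmd


def run_ocr(pdf_path, image_path="page.png", text_base="page", dpi=300):
    """Run both commands and return the recognized text (hypothetical helper)."""
    for cmd in build_ocr_commands(pdf_path, image_path, text_base, dpi):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    with open(text_base + ".txt", encoding="utf-8") as fh:
        return fh.read()


if __name__ == "__main__":
    # Print the commands so they can also be pasted into a shell.
    for cmd in build_ocr_commands("scan.pdf", "page.png", "page"):
        print(" ".join(cmd))
```

Trying a few `dpi` values (for example 150, 300, 400) and comparing the amount of text returned is a quick way to see whether resolution is the limiting factor.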