Open Source Document Management System | OpenKM - OCR not working in OpenKM configured to MySQL

OCR not working in OpenKM configured to MySQL

Forum rules: Please, before asking a question, check the documentation wiki or use the forum's search function. Remember that we have neither a crystal ball nor mental powers, so to report an error you need to tell us which version of OpenKM you are using, as well as your browser and operating system. For more information, see How to Report Bugs Effectively.

7 posts


#21051 by Muhammad Imran
Thu Jan 24, 2013 1:46 pm

Hi,
I have installed OpenKM 6.2.0 on Windows 7. It was working well with the embedded HSQLDB database. Then I configured it to use MySQL instead of HSQLDB. Now OpenKM works, except for OCR (full-text) search.

In the console window I can see that image.JPG is extracted successfully with Tesseract 3.0.
After that, the following error appears:

Code:

Caused by: java.sql.SQLException: Incorrect string value: '\xEF\xAC\x81\xEF\xAC\x81...' for column 'NDC_TEXT' at row 1
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1075)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3562)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3494)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1960)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2114)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2696)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2105)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2398)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2316)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2301)
        at org.apache.tomcat.dbcp.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
        at org.apache.tomcat.dbcp.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
        at org.hibernate.persister.entity.AbstractEntityPersister.update(AbstractEntityPersister.java:2595)

I don't know what the problem is.
Can anyone tell me how to get rid of it?
Please give me a hint or point me to a wiki page that would help me resolve it.
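
As a hint for reading the error message: the bytes quoted in it can be decoded to see which characters MySQL rejected. The snippet below is only a small sketch; it decodes the visible part of the value as UTF-8 (an assumption, since the message truncates the value with "...") and prints the Unicode names of the characters.

```python
import unicodedata

# Byte sequence copied from the visible part of the MySQL error message.
# The message truncates the value with "...", so this is only its beginning.
raw = b"\xEF\xAC\x81\xEF\xAC\x81"

# Assumption: the OCR output was sent to MySQL as UTF-8.
text = raw.decode("utf-8")

# Look up the Unicode name of each decoded character.
names = [unicodedata.name(ch) for ch in text]
print(names)
```

The rejected value starts with the typographic "fi" ligature (U+FB01), which OCR engines can emit for printed text. That suggests the problem may lie in the character set of the MySQL column or connection rather than in Tesseract itself.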

Last edited by Muhammad Imran on Wed Jan 30, 2013 8:03 am, edited 2 times in total.

Username

Muhammad Imran

Rank

Junior Boarder

Posts

Joined

Wed Jan 02, 2013 11:00 am

Re: OCR not working in OpenKM configured to MySQL

#21093 by jllort
Fri Jan 25, 2013 5:49 pm

There is a bug in the text extraction feature of version 6.2.0 (the problem appears when indexing some UTF-16 files: Chinese, Russian, etc.). If you upgrade to 6.2.2, I think it is already solved there. See the migration guide for how to do it: http://wiki.openkm.com/index.php/Migration_Guide

Username

jllort

Rank

Moderator

Posts

12129

Joined

Fri Dec 21, 2007 11:23 am

Location

Sineu - ( Illes Balears ) - Spain

Contact

Re: OCR not working in OpenKM configured to MySQL

#21138 by Muhammad Imran
Tue Jan 29, 2013 10:35 am

Thanks jllort for your reply!
I have migrated to OpenKM 6.2.2 successfully. Now OCR (full-text search) works only for "image.png" and "document.docx".
In the console window I can still see the following error:

Code:

Caused by: java.sql.SQLException: Incorrect string value: '\xEF\xAC\x81\xEF\xAC\x81...' for column 'NDC_TEXT' at row 1
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1075)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3562)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3494)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1960)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2114)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2696)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2105)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2398)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2316)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2301)
        at org.apache.tomcat.dbcp.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
        at org.apache.tomcat.dbcp.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
        at org.hibernate.persister.entity.AbstractEntityPersister.update(AbstractEntityPersister.java:2595)

What should I do to get full-text search working for .pdf, .JPG, .txt, etc.?
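
One possible explanation for why some files index and others fail is that only some extracted texts contain characters the target column cannot store. The sketch below is a diagnostic under that assumption: it lists the characters in a text that do not fit a given single-byte character set. The function name `unencodable_chars` and the choice of `cp1252` (roughly what MySQL calls latin1) are illustrative assumptions; the thread does not say which character set the NDC_TEXT column actually uses.

```python
import unicodedata


def unencodable_chars(text, codec="cp1252"):
    """Map each character of `text` that `codec` cannot encode to its Unicode name.

    Hypothetical helper for diagnosis only; `codec` is an assumed stand-in
    for the character set of the database column.
    """
    found = {}
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            found[ch] = unicodedata.name(ch, "UNKNOWN")
    return found


# Sample text containing the "fi" ligature seen in the error message.
sample = "of\ufb01ce \ufb01le"
print(unencodable_chars(sample))
```

If the extracted text of a failing document contains such characters while the working ones do not, checking the character set of the MySQL table and of the JDBC connection would be a reasonable next step.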

Re: OCR not working in OpenKM configured to MySQL

#21193 by jllort
Wed Jan 30, 2013 10:51 pm

Try installing the nightly build from http://integration.openkm.com, because this looks like an old bug that is at least solved in the current source code. Tell us if it does not solve your problem.

Re: OCR not working in OpenKM configured to MySQL

#21203 by Muhammad Imran
Thu Jan 31, 2013 9:05 am

Hi jllort,
Thanks for replying.

I have installed the nightly build from integration.openkm.com, but there is still a problem with PDF text extraction.
Now in the console window I can see the following warnings:

Code:

2013-01-31 13:55:01,973 [Thread-15] WARN  com.openkm.extractor.PdfTextExtractor - PDF does not contains text layer
2013-01-31 13:55:01,974 [Thread-15] WARN  com.openkm.dao.NodeDocumentDAO - There  was a problem extracting text from '/okm:root/Testing/DatabaseBasics.pdf': Too few text extracted

How can I fix this?

Re: OCR not working in OpenKM configured to MySQL

#21224 by bgrr
Fri Feb 01, 2013 11:58 pm

I have the same problem with version 6.2.2 build 7815 on Ubuntu 12.04.1 LTS.

JPG works fine, and text PDFs (PDFs with selectable text) work fine.

But a scanned PDF (a PDF containing a raster image) gives me the same error, both when I run Tesseract through OpenKM and from the command line.

Installed: Tesseract 3.02 and ImageMagick 6.6.9-7 2012-08-17 Q16.

Username

bgrr

Rank

Fresh Boarder

Posts

Joined

Fri Feb 01, 2013 11:42 pm

Re: OCR not working in OpenKM configured to MySQL

#21234 by jllort
Sun Feb 03, 2013 10:58 am

It could be that the image was scanned at low resolution. Can you try with a higher resolution? If you have the problem on the command line as well, concentrate there first.
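
To test this outside OpenKM, the two command-line steps (rasterize a PDF page with ImageMagick at a chosen resolution, then run Tesseract on the result) can be scripted. The following is a rough sketch, not OpenKM's own pipeline: the function names, the file names, and the 300 dpi default are illustrative assumptions, and ImageMagick typically needs Ghostscript installed to read PDFs.

```python
import subprocess


def build_ocr_commands(pdf_path, dpi=300, page=0, out_base="page"):
    """Return two argv lists: rasterize one PDF page, then OCR the image.

    Hypothetical helper; file names and defaults are assumptions.
    """
    image = f"{out_base}.tif"
    # ImageMagick: -density sets the rasterization resolution in dpi,
    # and "file.pdf[0]" selects the first page of the PDF.
    rasterize = ["convert", "-density", str(dpi),
                 f"{pdf_path}[{page}]", "-depth", "8", image]
    # Tesseract: "tesseract <image> <outputbase>" writes <outputbase>.txt.
    ocr = ["tesseract", image, out_base]
    return rasterize, ocr


def run_ocr(pdf_path, dpi=300, out_base="page"):
    """Run both commands and return the recognized text (needs both tools installed)."""
    for cmd in build_ocr_commands(pdf_path, dpi=dpi, out_base=out_base):
        subprocess.run(cmd, check=True)
    with open(f"{out_base}.txt", encoding="utf-8") as fh:
        return fh.read()


# Show the commands for a higher-resolution attempt without executing them.
for cmd in build_ocr_commands("scan.pdf", dpi=400):
    print(" ".join(cmd))
```

Comparing the recognized text at, say, 150, 300, and 400 dpi should show whether resolution is the limiting factor before looking at OpenKM's configuration.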
