Open Source Document Management System

OCR API


#25489 by gvdm
Wed Sep 18, 2013 7:16 am

Hi to all

For my system I must use only APIs to access OpenKM's functionality, and I need the OCR ones.
In the reference there isn't any API for OCR conversion (I've already included the OCR module into my OpenKM).

How can I accomplish this goal?

I'm using OpenKM 6.2 Community Version over GNU/Linux Debian Wheezy.

Thank you
Giulio

Re: OCR API

#25513 by jllort
Thu Sep 19, 2013 11:09 am

OCR is executed automatically by the text extractor registered for the file's MIME type. There's nothing more to it. What do you have in mind?

Re: OCR API

#25520 by gvdm
Thu Sep 19, 2013 1:38 pm

What I need is this: any time the user uploads an image through the APIs, OpenKM should extract the content using the OCR module and respond with the "OCR Document" containing the converted text.
Is it possible?

Re: OCR API

#25527 by pavila
Fri Sep 20, 2013 10:09 am

This feature can be implemented, but I recommend becoming a customer if you need it soon.

Re: OCR API

#25532 by jllort
Fri Sep 20, 2013 11:49 am

That can be done in two ways: force the OCR text extraction after upload, or before (when the document arrives in the index queue). An easy option could be to send a daily mail with all text extraction results; I think that would be better than pushing info each time. As you can see, there are several possibilities... it's only a question of knowing what will work best for you.

Re: OCR API

#25567 by gvdm
Mon Sep 23, 2013 8:04 am

I need the OCR conversion to be part of the workflow, so the workflow should be notified as soon as the OCR conversion is done.

Where can I get more info about the architecture of OpenKM? I'd like to see where I need to dig in to reach the OCR functionality.
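
Nothing in this thread shows a ready-made "OCR finished" notification, so one generic client-side workaround is polling: ask for the extracted text at intervals until it is no longer empty. The sketch below is not OpenKM code. `OcrTextPoller`, `waitForText`, and the `Supplier` that fetches the text are hypothetical names, and the supplier would have to be backed by whatever lookup you end up using.

```java
import java.util.Optional;
import java.util.function.Supplier;

class OcrTextPoller {
    /**
     * Calls fetchText until it returns non-empty text or maxAttempts is reached.
     * fetchText is a hypothetical hook: plug in your own lookup of the extracted text.
     */
    static Optional<String> waitForText(Supplier<String> fetchText, int maxAttempts, long delayMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String text = fetchText.get();
            if (text != null && !text.isEmpty()) {
                return Optional.of(text);
            }
            // Sleep only between attempts, not after the last one.
            if (attempt < maxAttempts) {
                Thread.sleep(delayMillis);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a real lookup: yields text only on the third call.
        int[] calls = {0};
        Supplier<String> fake = () -> ++calls[0] < 3 ? null : "scanned text";
        System.out.println(waitForText(fake, 5, 10).orElse("(no text yet)"));
    }
}
```

Polling trades latency for simplicity: the workflow learns about the result at most one delay interval after it becomes available, and no change inside OpenKM is needed.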

Re: OCR API

#25571 by gvdm
Mon Sep 23, 2013 2:56 pm

Ok, searching is the way!
Reading this thread, I found out that I can access the OCR text by running this query in the "Administration" -> "Database query" window:

Code:

select * from OKM_NODE_DOCUMENT order by NDC_LAST_MODIFIED

(the "order by" is just a convenience).
Now I can see all the automatically extracted OCR text.

The next step is to understand how I can access this table from outside OpenKM. Is there any address on which OpenKM's database listens for queries (like, for example, MySQL's port 3306)?
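
If the OpenKM installation is configured to use an external database that accepts network connections (MySQL is only an example here), the same query could in principle be run from a separate program over JDBC. The following is a minimal sketch under that assumption. The JDBC URL, user, and password are supplied by the caller, a matching JDBC driver must be on the classpath, and the column name `NDC_TEXT` is an assumption that should be checked against the output of the query above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class OcrTextQuery {
    // Table and ordering column are taken from the query in the post above.
    static final String TABLE = "OKM_NODE_DOCUMENT";
    static final String ORDER_COLUMN = "NDC_LAST_MODIFIED";

    /** Builds the same query that was run in the "Database query" window. */
    static String buildQuery() {
        return "select * from " + TABLE + " order by " + ORDER_COLUMN;
    }

    /** Runs the query and collects one column of every row as strings. */
    static List<String> fetchColumn(Connection conn, String column) throws SQLException {
        List<String> values = new ArrayList<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(buildQuery())) {
            while (rs.next()) {
                values.add(rs.getString(column));
            }
        }
        return values;
    }

    public static void main(String[] args) throws SQLException {
        // Usage: java OcrTextQuery <jdbc-url> <user> <password> [column]
        if (args.length < 3) {
            System.out.println(buildQuery());
            return;
        }
        // ASSUMPTION: "NDC_TEXT" as the extracted-text column; verify it first.
        String column = args.length > 3 ? args[3] : "NDC_TEXT";
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            for (String text : fetchColumn(conn, column)) {
                System.out.println(text);
            }
        }
    }
}
```

Reading the table directly bypasses OpenKM's own access control, so a read-only database account would be a sensible precaution.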

Re: OCR API

#25582 by jllort
Tue Sep 24, 2013 6:02 pm

You should force text extraction for that document, something like:

Code:

// Describe the document whose text should be extracted
TextExtractorWork tew = new TextExtractorWork();
tew.setDocUuid(uuid);
tew.setDocPath(docPath);
tew.setDocVerUuid(docVerUuid);

// Execute extractor
NodeDocumentDAO.getInstance().textExtractorHelper(tew);

Then you should read the extracted text:

Code:

// Get extracted text
NodeDocument docNode = NodeDocumentDAO.getInstance().findByPk(uuid);
String text = docNode.getText();

Here you can take a look at the documentation for all OpenKM classes:
http://doxygen.openkm.com/openkm/
Normally you should use the API ( http://doxygen.openkm.com/openkm/d9/d6d ... _1api.html ), especially if you change repository object values. Additionally, you can use other methods from the DAO classes, etc. In general, with a few exceptions, you should always use the API (this case is one of those exceptions, where you need to go to a lower level).
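
Since this is a case where code has to reach below the API, it may help to keep the two low-level steps above (force extraction, then read the text) behind a small interface of your own, so the rest of the integration does not depend on the DAO classes directly. The sketch below illustrates that idea only. `ForcedOcr`, `ExtractionBackend`, and `extractAndRead` are hypothetical names, not part of OpenKM, and a real backend would wrap the `TextExtractorWork` and `NodeDocumentDAO` calls shown above.

```java
import java.util.HashMap;
import java.util.Map;

class ForcedOcr {
    /** Hypothetical seam over the two low-level steps described above. */
    interface ExtractionBackend {
        void extract(String uuid, String docPath, String docVerUuid) throws Exception;
        String readText(String uuid) throws Exception;
    }

    /** Forces extraction for one document, then returns its text ("" if none). */
    static String extractAndRead(ExtractionBackend backend, String uuid,
                                 String docPath, String docVerUuid) throws Exception {
        backend.extract(uuid, docPath, docVerUuid);
        String text = backend.readText(uuid);
        return text == null ? "" : text;
    }

    public static void main(String[] args) throws Exception {
        // In-memory stand-in so the sketch runs without OpenKM on the classpath.
        Map<String, String> store = new HashMap<>();
        ExtractionBackend fake = new ExtractionBackend() {
            public void extract(String uuid, String docPath, String docVerUuid) {
                store.put(uuid, "text of " + docPath);
            }
            public String readText(String uuid) {
                return store.get(uuid);
            }
        };
        // The UUIDs and path below are made-up sample values.
        System.out.println(extractAndRead(fake, "uuid-1", "/okm:root/scan.png", "ver-1"));
    }
}
```

With the interface in place, the integration code can be tested against a fake like the one in `main`, and only the real backend has to change if OpenKM's internal classes do.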

Re: OCR API

#25589 by gvdm
Wed Sep 25, 2013 8:22 am

So I'll have to edit the OpenKM code by adding a custom class and rebuilding OpenKM, right?

Re: OCR API

#25654 by jllort
Sat Sep 28, 2013 2:53 pm

It can be done using Automation ( http://wiki.openkm.com/index.php/Automation ), by handling events related to documents.
