Open Source Document Management System | OpenKM

PostPosted:**Wed Sep 18, 2013 7:16 am**

Hi to all

For my system I must use only the APIs to access OpenKM's functionality, and I need the OCR features.
The API reference doesn't list any call for OCR conversion (I have already added the OCR module to my OpenKM installation).

How can I accomplish this goal?

I'm using OpenKM 6.2 Community Version over GNU/Linux Debian Wheezy.

Thank you
Giulio

PostPosted:**Thu Sep 19, 2013 11:09 am**

OCR is executed automatically by the text extractor associated with the file's MIME type. There's nothing else to it. What do you have in mind?

PostPosted:**Thu Sep 19, 2013 1:38 pm**

What I need is this: whenever a user uploads an image through the APIs, OpenKM should extract its content with the OCR module and respond with the "OCR Document" containing the converted text.
Is that possible?

PostPosted:**Fri Sep 20, 2013 10:09 am**

This feature can be implemented, but I recommend becoming a customer if you need it soon.

PostPosted:**Fri Sep 20, 2013 11:49 am**

That can be done in two ways: force the OCR text extraction right after the upload, or before (when the document arrives in the index queue). An easy alternative could be to send a daily mail with all the text extraction results; I think that would be better than pushing the information each time. As you can see, there are several possibilities; it's only a question of which one is best for you.

PostPosted:**Mon Sep 23, 2013 8:04 am**

I need the OCR conversion to be part of the workflow so it should be notified as soon as the OCR conversion is done.

Where can I get more info about the architecture of OpenKM? Just to see where I should put my hands to reach the OCR functionalities.

PostPosted:**Mon Sep 23, 2013 2:56 pm**

Ok, searching is the way!
Reading this thread, I found out that I can access the OCR text by running this query in the "administration" -> "Database query" window:

Code:

select * from OKM_NODE_DOCUMENT order by NDC_LAST_MODIFIED

(the "order by" is just for convenience).
Now I can see all the auto-scanned OCR text.

The next step is to understand how I can access this table from outside OpenKM. Is there a URL or port on which OpenKM's database listens for queries (like MySQL's port 3306, for example)?

PostPosted:**Tue Sep 24, 2013 6:02 pm**

You should force text extraction for that document, with something like:

Code:

// Document extractor: describe the document whose text should be extracted
TextExtractorWork tew = new TextExtractorWork();
tew.setDocUuid(uuid);
tew.setDocPath(docPath);
tew.setDocVerUuid(docVerUuid);

// Execute extractor (uncomment to actually run the extraction)
//NodeDocumentDAO.getInstance().textExtractorHelper(tew);

Then you should read the extracted text:

Code:

// Get extracted text
NodeDocument docNode = NodeDocumentDAO.getInstance().findByPk(uuid);
String text = docNode.getText();

Here you can browse the documentation for all the OpenKM classes:
http://doxygen.openkm.com/openkm/
Normally you should use http://doxygen.openkm.com/openkm/d9/d6d ... _1api.html, especially if you change repository object values. Additionally, you can use other methods from the DAO classes and so on, but in general you should always use the API, with a few exceptions (this case is one of them, because you need to go to a lower level).
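
Since the extraction may not have finished by the time the text is read, one option is to poll until the text shows up and only then notify the workflow. The following is only a minimal sketch under assumptions: `OcrTextPoller` and `waitForText` are hypothetical names, not part of OpenKM, and the `Supplier<String>` stands in for whatever lookup you use (for example a lambda wrapping the `findByPk(uuid)` / `getText()` calls above, with its exceptions handled inside the lambda).

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper, not part of OpenKM: polls until extracted text shows up. */
final class OcrTextPoller {

    private OcrTextPoller() {
    }

    /**
     * Calls textSource up to maxAttempts times, sleeping sleepMillis between
     * attempts, and returns the first non-blank text it sees.
     * Returns an empty Optional if no text appears or the thread is interrupted.
     */
    static Optional<String> waitForText(Supplier<String> textSource, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String text = textSource.get();
            if (text != null && !text.trim().isEmpty()) {
                return Optional.of(text);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and give up early.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for the real lookup: returns nothing twice, then some text.
        AtomicInteger calls = new AtomicInteger();
        Supplier<String> fakeSource = () -> calls.incrementAndGet() < 3 ? null : "text recognised by OCR";

        Optional<String> text = waitForText(fakeSource, 5, 10L);
        System.out.println(text.orElse("(no text extracted yet)"));
    }
}
```

Polling is the simplest thing that works from outside the extraction code; if you can hook into the document events on the server side, a callback there would avoid the repeated lookups.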

PostPosted:**Wed Sep 25, 2013 8:22 am**

So I'll have to edit the OpenKM code by adding a custom class and rebuilding OpenKM, right?

PostPosted:**Sat Sep 28, 2013 2:53 pm**

This can be done using Automation ( http://wiki.openkm.com/index.php/Automation ), by handling the events related to documents.

OCR API
