OCR API
Posted: Wed Sep 18, 2013 7:16 am
by gvdm
Hi to all
For my system I must use only APIs to access OpenKM's functionality, and I need the OCR ones.
In the reference there isn't any API for OCR conversion (I've already included the OCR module into my OpenKM).
How can I accomplish this goal?
I'm using OpenKM 6.2 Community Version over GNU/Linux Debian Wheezy.
Thank you
Giulio
Re: OCR API
Posted: Thu Sep 19, 2013 11:09 am
by jllort
OCR is executed automatically, depending on the text extractor registered for the file's MIME type. There's nothing else to it. What do you have in mind?
Re: OCR API
Posted: Thu Sep 19, 2013 1:38 pm
by gvdm
What I need is this: any time a user uploads an image through the APIs, OpenKM should extract its content using the OCR module and respond with the "OCR Document" containing the converted text.
Is it possible?
Re: OCR API
Posted: Fri Sep 20, 2013 10:09 am
by pavila
This feature can be implemented, but I recommend becoming a customer if you need it soon.
Re: OCR API
Posted: Fri Sep 20, 2013 11:49 am
by jllort
That can be done in two ways: force the OCR text extraction after the upload, or do it earlier (when the document arrives in the index queue). An easy alternative could be to send a daily mail with all the text extraction results; I think that would be better than pushing the information each time. As you can see, there are several possibilities... it's only a question of knowing which is best for you.
Re: OCR API
Posted: Mon Sep 23, 2013 8:04 am
by gvdm
I need the OCR conversion to be part of the workflow, so the workflow should be notified as soon as the OCR conversion is done.
Where can I get more info about the architecture of OpenKM? I just want to see where I should hook in to reach the OCR functionality.
Re: OCR API
Posted: Mon Sep 23, 2013 2:56 pm
by gvdm
Ok, searching is the way!
Reading this thread, I found out that I can access the OCR text by executing this query in the "administration" -> "Database query" window:
Code:
select * from OKM_NODE_DOCUMENT order by NDC_LAST_MODIFIED
(the "order by" is just a convenience).
Now I can see all the auto-scanned OCR text.
The next step is to understand how I can access this table from outside OpenKM. Is there any URL or port on which OpenKM's database listens for queries (like, for example, MySQL's port 3306)?
Re: OCR API
Posted: Tue Sep 24, 2013 6:02 pm
by jllort
You should force text extraction for that document, with something like:
Code:
// Document extractor
TextExtractorWork tew = new TextExtractorWork();
tew.setDocUuid(uuid);
tew.setDocPath(docPath);
tew.setDocVerUuid(docVerUuid);
// Execute extractor
//NodeDocumentDAO.getInstance().textExtractorHelper(tew);
Then you should read the extracted text:
Code:
// Get extracted text
NodeDocument docNode = NodeDocumentDAO.getInstance().findByPk(uuid);
String text = docNode.getText();
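To make the order of the two steps explicit, here is a minimal, self-contained sketch of the same "force extraction, then read the text back" flow. It does not use OpenKM's classes: `Node`, `NodeStore` and the `extractor` function are hypothetical stand-ins (assumptions for illustration only) for NodeDocument, NodeDocumentDAO and the text extractor call, so the example compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class OcrFlowSketch {
    /** Hypothetical stand-in for a stored document node (not OpenKM's NodeDocument). */
    static final class Node {
        final String uuid;
        String text = ""; // stays empty until extraction has run
        Node(String uuid) { this.uuid = uuid; }
    }

    /** Hypothetical in-memory store standing in for the DAO lookup by primary key. */
    static final class NodeStore {
        private final Map<String, Node> nodes = new HashMap<>();

        void add(String uuid) { nodes.put(uuid, new Node(uuid)); }

        Node findByPk(String uuid) {
            Node node = nodes.get(uuid);
            if (node == null) {
                throw new IllegalArgumentException("Unknown uuid: " + uuid);
            }
            return node;
        }
    }

    /** Step 1: force extraction and store the result. Step 2: read the stored text back. */
    static String extractAndRead(NodeStore store, String uuid,
                                 Function<String, String> extractor) {
        Node node = store.findByPk(uuid);
        node.text = extractor.apply(uuid);   // stands in for running the extractor
        return store.findByPk(uuid).text;    // stands in for reading the node's text
    }

    public static void main(String[] args) {
        NodeStore store = new NodeStore();
        store.add("doc-1");
        // The lambda is a fake extractor; a real one would run OCR on the file.
        String text = extractAndRead(store, "doc-1", id -> "recognized text for " + id);
        System.out.println(text);
    }
}
```

In a real setup the stand-ins would be replaced by the calls shown in the two snippets above, and the caller (for example a workflow step) would receive the returned text directly instead of querying the database afterwards.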
Here you can browse the documentation for all the OpenKM classes:
http://doxygen.openkm.com/openkm/
Normally you should use http://doxygen.openkm.com/openkm/d9/d6d ... _1api.html, especially if you change repository object values. Additionally, you can use other methods from the DAO classes, etc., but in general, with a few exceptions, you should always use the API (this case is one of those exceptions, where you need to go to a lower level).
Re: OCR API
Posted: Wed Sep 25, 2013 8:22 am
by gvdm
So I'll have to edit the OpenKM code by adding a custom class and rebuilding OpenKM, right?
Re: OCR API
Posted: Sat Sep 28, 2013 2:53 pm
by jllort
It can be done using Automation ( http://wiki.openkm.com/index.php/Automation ), controlling events related to documents.