• OCR API

  • Do you want to create a native client or integrate with third party applications: webservices are the solution.
 #25489  by gvdm
 
Hi to all

For my system I must use only the API to access OpenKM's functionality, and I need the OCR features.
The API reference doesn't list any call for OCR conversion (I've already installed the OCR module in my OpenKM).

How can I accomplish this goal?

I'm using OpenKM 6.2 Community Version on GNU/Linux Debian Wheezy.

Thank you
Giulio
 #25513  by jllort
 
OCR runs automatically, based on the text extractor registered for the file's MIME type. There is nothing else to it. What do you have in mind?
 #25520  by gvdm
 
What I need is this: any time a user uploads an image through the API, OpenKM should extract its content with the OCR module and respond with the "OCR Document" containing the converted text.
Is it possible?
 #25527  by pavila
 
This feature can be implemented, but I recommend becoming a customer if you need it soon.
 #25532  by jllort
 
That can be done in two ways: force the OCR text extraction after upload, or before (when the document arrives in the index queue). An easy alternative would be to send a daily mail with all the text extraction results, which I think would be better than pushing the info each time. As you can see there are several possibilities; it's only a question of which one suits you best.
 #25567  by gvdm
 
I need the OCR conversion to be part of the workflow, so the workflow should be notified as soon as the OCR conversion is done.

Where can I find more info about the architecture of OpenKM? I just want to see where in the code I would have to work to reach the OCR functionality.
 #25571  by gvdm
 
Ok, searching is the way!
Reading this thread I found out that I can access the OCR text by running this query in the "administration" -> "Database query" window:
Code:
select * from OKM_NODE_DOCUMENT order by NDC_LAST_MODIFIED
(the "order by" is only there for convenience).
Now I can see all the auto-scanned OCR text.

The next step is to understand how I can access this table from outside OpenKM. Is there any URL or port on which OpenKM's database listens for queries (like, for example, MySQL's port 3306)?
 #25582  by jllort
 
You should force text extraction for that document, something like:
Code:
// Document extractor
TextExtractorWork tew = new TextExtractorWork();
tew.setDocUuid(uuid);
tew.setDocPath(docPath);
tew.setDocVerUuid(docVerUuid);
  
// Execute extractor
NodeDocumentDAO.getInstance().textExtractorHelper(tew);
Then you should read the extracted text:
Code:
// Get extracted text
NodeDocument docNode = NodeDocumentDAO.getInstance().findByPk(uuid);
String text = docNode.getText();
Here you can browse the documentation for all the OpenKM classes:
http://doxygen.openkm.com/openkm/
Normally you should use http://doxygen.openkm.com/openkm/d9/d6d ... _1api.html, especially if you change repository object values. Additionally, you can use other methods from the DAO classes, etc., but in general, with a few exceptions, you should always use the API (this case is one of those exceptions, where you need to go to a lower level).
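
To make the order of the two steps above explicit, here is a minimal sketch that wraps "force extraction" and "read the extracted text" in one helper. It is only an illustration: `TextExtractor`, `TextStore` and `OcrAfterUpload` are hypothetical names, not OpenKM classes. They stand in for the `TextExtractorWork` / `NodeDocumentDAO` calls quoted above, so the sketch compiles on its own without the OpenKM sources.
Code:
```java
// Hypothetical stand-in for the "force extraction" step
// (the TextExtractorWork + textExtractorHelper calls in the post above).
interface TextExtractor {
    void extract(String uuid, String docPath, String docVerUuid);
}

// Hypothetical stand-in for the "read extracted text" step
// (findByPk(uuid) followed by getText() in the post above).
interface TextStore {
    String findText(String uuid);
}

// Hypothetical helper: not part of OpenKM, only a sketch of the flow.
final class OcrAfterUpload {
    private final TextExtractor extractor;
    private final TextStore store;

    OcrAfterUpload(TextExtractor extractor, TextStore store) {
        this.extractor = extractor;
        this.store = store;
    }

    // Forces extraction for one document, then reads the stored text back.
    // Returns an empty string when no text was stored for that document.
    String extractAndRead(String uuid, String docPath, String docVerUuid) {
        extractor.extract(uuid, docPath, docVerUuid);
        String text = store.findText(uuid);
        return text == null ? "" : text;
    }
}
```
Inside an OpenKM build, the two interfaces would presumably be implemented with the calls shown earlier, and the returned string is what the upload call could hand back to the caller.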
 #25589  by gvdm
 
So I'll have to edit the OpenKM code by adding a custom class and rebuilding OpenKM, right?
