What is indexed by OpenKM.

Posted: Tue Oct 18, 2011 3:50 pm
by sasoa
I am using:
Ubuntu 10.04
OpenKM 5.1.7
OpenOffice 3.2
Tesseract

My problem is OCR indexing. My default language is Macedonian (Cyrillic), and many of my documents are a mixture of Macedonian and English. English OCR works very well, but Macedonian does not.
I read the "Experimental features" document at http://wiki.openkm.com/index.php/Experimental_features, but I cannot see a property called "TEXT" in Administration > Repository view.
Can you please give me more precise information on how to enable this property, so that I can see what was OCR'ed from my documents?
Thanks!

Re: What is indexed by OpenKM.

Posted: Thu Oct 20, 2011 9:44 pm
by jllort
The property is on the Administration tab, second icon from the left. Those are the configuration parameters, and it must be enabled there. It might not be enabled in a production environment.

Re: What is indexed by OpenKM.

Posted: Sat Oct 22, 2011 2:03 pm
by sasoa
I checked that the property "experimental.text.extraction" is set to TRUE (it is checked by default).
But I cannot see a property called text under Administration > Repository view.
Attached is a picture of my repository view, where I cannot find the TEXT property with the indexed text (I cannot see any place with the OCR'ed text).

Re: What is indexed by OpenKM.

Posted: Sun Oct 23, 2011 10:30 am
by jllort
If you read http://wiki.openkm.com/index.php/Experimental_features with care, you'll see that this is stored in the DBMS, in the OKM_ACTIVITY table, and that you must execute this select:

SELECT * FROM OKM_ACTIVITY WHERE ACT_ACTION='MISC_TEXT_EXTRACTION_FAILURE'

Re: What is indexed by OpenKM.

Posted: Sun Oct 23, 2011 9:20 pm
by sasoa
Yes, I know that, but how can I look at the OCR'ed text (extraction without error)?
I cannot find what the wiki describes: "From Administration > Repository view you would be able to see a property called text where the document extracted text is stored."
Thank you.

Re: What is indexed by OpenKM.

Posted: Mon Oct 24, 2011 8:20 am
by jllort
It really does not work the way you think. Text is extracted to be used by Lucene (the indexer), but you only need to store the blob, not the extracted contents (storing both would use double the space for each file, and that's not a good idea). I hope that makes sense. The text is not stored in the repository node; it is extracted and passed to the indexer, nothing else. Only the binary information is stored in the repository node (that's the idea).
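
To illustrate the "indexed but not stored" idea, here is a minimal Python sketch of a toy inverted index. It is not OpenKM or Lucene code; the class name TinyIndex and its methods are hypothetical and exist only for this example. The point is that after indexing you can find which documents contain a word, but the extracted text itself is gone, so there is nothing to display.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index (hypothetical, not the OpenKM or Lucene API)."""

    def __init__(self):
        # token -> set of ids of the documents that contain that token
        self.postings = defaultdict(set)

    def add(self, doc_id, extracted_text):
        # Record only token -> doc_id postings. The extracted text is not
        # kept anywhere once this method returns.
        for token in re.findall(r"\w+", extracted_text.lower()):
            self.postings[token].add(doc_id)

    def search(self, word):
        # Return the ids of the documents whose text contained the word.
        return sorted(self.postings.get(word.lower(), set()))


index = TinyIndex()
index.add("doc1", "Invoice for office supplies")
index.add("doc2", "Фактура за канцелариски материјали")  # Cyrillic text

print(index.search("invoice"))  # ['doc1']
print(index.search("фактура"))  # ['doc2']
```

If the OCR output for a document is wrong, the wrong tokens go into the index and a search for the correct word simply returns nothing. In this sketch, searching for words you know are in the document is the only way to judge the extraction.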

Re: What is indexed by OpenKM.

Posted: Mon Oct 24, 2011 9:28 am
by sasoa
OK, I understand. Thank you.