Open Source Document Management System | OpenKM

Posted: **Tue Oct 18, 2011 3:50 pm**

I am using:
Ubuntu 10.04
OpenKM 5.1.7
OpenOffice 3.2
Tesseract

My problem is OCR indexing. My default language is Macedonian (Cyrillic). Many of my documents are a mixture of Macedonian and English. English OCR works very well, but Macedonian does not.
I was reading the "Experimental features" document at http://wiki.openkm.com/index.php/Experimental_features, but I cannot see a property called "TEXT" in Administration > Repository view.
Can you please give me more precise information on how to enable this property, and how I can see what was OCR'ed from my text?
Thanks!

Posted: **Thu Oct 20, 2011 9:44 pm**

The property is on the Administration tab, second icon from the left. Those are the configuration parameters, and it must be enabled there. It might not be enabled in a production environment.

Posted: **Sat Oct 22, 2011 2:03 pm**

I checked that the property "experimental.text.extraction" is set to TRUE. (It is checked by default.)
But I cannot see a property called text under Administration > Repository view.
Attached is a picture of my repository view, where I cannot find the TEXT property with the indexed text. (I cannot see where the OCR'ed text is.)

Posted: **Sun Oct 23, 2011 10:30 am**

If you read http://wiki.openkm.com/index.php/Experimental_features with care, you'll see that text extraction failures are recorded in the DBMS, in the OKM_ACTIVITY table, and that you must execute this select:

SELECT * FROM OKM_ACTIVITY WHERE ACT_ACTION='MISC_TEXT_EXTRACTION_FAILURE'
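
For readers who want to try the query without touching a live OpenKM database, here is a minimal sketch that runs the same select against an in-memory SQLite stand-in. The table is reduced to two columns for illustration: ACT_ACTION comes from the query above, while ACT_ITEM and the sample rows are assumptions, not the real OpenKM schema.

```python
import sqlite3

# In-memory stand-in for the OpenKM database (assumption: the real DBMS
# and the full OKM_ACTIVITY schema will differ from this toy table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OKM_ACTIVITY (ACT_ACTION TEXT, ACT_ITEM TEXT)")

# Hypothetical sample rows: one failed extraction, one unrelated activity.
conn.executemany(
    "INSERT INTO OKM_ACTIVITY VALUES (?, ?)",
    [
        ("MISC_TEXT_EXTRACTION_FAILURE", "scan-mk.pdf"),
        ("CREATE_DOCUMENT", "report-en.pdf"),
    ],
)

# Same filter as the select quoted in the post.
failures = conn.execute(
    "SELECT * FROM OKM_ACTIVITY WHERE ACT_ACTION = ?",
    ("MISC_TEXT_EXTRACTION_FAILURE",),
).fetchall()

for row in failures:
    print(row)
```

Note that this select only lists extractions that failed; it does not return the text of extractions that succeeded.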

Posted: **Sun Oct 23, 2011 9:20 pm**

Yes, I know that, but how can I look at the OCR'ed text (an extraction without errors)?
I cannot find what the wiki describes: "From Administration > Repository view you would be able to see a property called text where the document extracted text is stored."
Thank you.

Posted: **Mon Oct 24, 2011 8:20 am**

It really is not how you think. The text is extracted to be used by Lucene (the indexer), but only the blob needs to be stored, not the extracted contents (storing both would use double the space for each file, and that's not a good idea). I hope that makes sense. The text is not stored on the repository node; it is extracted and passed to the indexer, nothing else. Only the binary information is stored on the repository node (that's the idea).
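
To illustrate the "indexed but not stored" idea, here is a minimal toy sketch. It is not OpenKM or Lucene code; the class name TinyIndex, its methods, and the sample documents are all hypothetical. The extracted text is broken into terms for searching and then discarded, while only the binary content is kept.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: terms are indexed, extracted text is not stored."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.blobs = {}                   # document id -> original binary

    def add(self, doc_id, blob, extracted_text):
        # Keep only the binary; the extracted text is tokenized and dropped.
        self.blobs[doc_id] = blob
        for term in re.findall(r"\w+", extracted_text.lower()):
            self.postings[term].add(doc_id)

    def search(self, term):
        # Return the ids of documents whose extracted text contained the term.
        return sorted(self.postings.get(term.lower(), set()))


index = TinyIndex()
index.add("doc1", b"binary-content-1", "Invoice for Скопје office")
index.add("doc2", b"binary-content-2", "Meeting notes")

print(index.search("invoice"))
print(index.search("скопје"))
```

A search can find the document by any term that was extracted, but there is nowhere to read the full extracted text back from, which is why no such property shows up on the repository node.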

Posted: **Mon Oct 24, 2011 9:28 am**

OK, I understand. Thank you.

What is indexed by OpenKM.