Open Source Document Management System | OpenKM

What is indexed by OpenKM.

Forum rules: Please check the documentation wiki or use the forum's search feature before asking. And remember, we don't have a crystal ball and we aren't mind readers, so if you post about an issue, tell us which OpenKM version you are using, along with your browser and operating system versions. For more info, read How to Report Bugs Effectively.

7 posts


#12624 by sasoa
Tue Oct 18, 2011 3:50 pm

I am using:
Ubuntu 10.04
OpenKM 5.1.7
OpenOffice 3.2
Tesseract

My problem is OCR indexing. My default language is Macedonian (Cyrillic), and many of my documents mix Macedonian and English. English OCR works very well, but Macedonian does not.
I read the "Experimental features" document at http://wiki.openkm.com/index.php/Experimental_features, but I cannot see a property called "TEXT" in Administration > Repository view.
Can you please give me more precise information on how to enable this property, so that I can see what has been OCR'ed from my text?
Thanks!

Username: sasoa
Rank: Fresh Boarder
Joined: Tue Oct 18, 2011 3:42 pm

Re: What is indexed by OpenKM.

#12653 by jllort
Thu Oct 20, 2011 9:44 pm

The property is in the Administration tab: the second icon from the left opens the configuration parameters, and it must be enabled there. It may not be enabled in a production environment.

Username: jllort
Rank: Moderator
Posts: 12179
Joined: Fri Dec 21, 2007 11:23 am
Location: Sineu (Illes Balears), Spain

Re: What is indexed by OpenKM.

#12667 by sasoa
Sat Oct 22, 2011 2:03 pm

I checked that the property "experimental.text.extraction" is set to TRUE (it is enabled by default).
But I cannot see a property called text under Administration > Repository view.
Attached is a screenshot of my repository view, where I cannot find the TEXT property with the indexed text (I cannot find anywhere that shows the OCR'ed text).

Attachments

MyRepositoryView.jpg (98.09 KiB)

Re: What is indexed by OpenKM.

#12683 by jllort
Sun Oct 23, 2011 10:30 am

If you read http://wiki.openkm.com/index.php/Experimental_features carefully, you'll see that text extraction failures are stored in the database, in the OKM_ACTIVITY table, and you can list them with this query:

SELECT * FROM OKM_ACTIVITY WHERE ACT_ACTION='MISC_TEXT_EXTRACTION_FAILURE'

Re: What is indexed by OpenKM.

#12688 by sasoa
Sun Oct 23, 2011 9:20 pm

Yes, I know that, but how can I look at the OCR'ed text when extraction succeeds (no error)?
I cannot find what the wiki describes: "From Administration > Repository view you would be able to see a property called text where the document extracted text is stored."
Thank you.

Re: What is indexed by OpenKM.

#12691 by jllort
Mon Oct 24, 2011 8:20 am

It doesn't really work the way you think. The text is extracted to be used by Lucene (the indexer), but only the binary blob needs to be stored, not the extracted contents (storing both would use roughly double the space per file, which is not a good idea). I hope that makes sense. The text is not stored on the repository node: it is extracted and passed to the indexer, nothing else. Only the binary content is stored on the repository node (that's the design).
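To illustrate the design described above, here is a toy Python sketch, not OpenKM's actual code. All names (RepositoryNode, ToyIndex, extract_text, store_document) are hypothetical, a dictionary-based inverted index stands in for Lucene, and decoding UTF-8 bytes stands in for real text extraction or OCR. It shows the point being made: extracted text is handed to the indexer, while the repository node keeps only the binary content.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class RepositoryNode:
    # Hypothetical stand-in for a repository document node:
    # it holds only the binary content, never the extracted text.
    uuid: str
    content: bytes


class ToyIndex:
    """Minimal inverted index standing in for Lucene (illustrative only)."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, uuid: str, text: str) -> None:
        # Record which documents contain each lower-cased term.
        for term in text.lower().split():
            self._postings[term].add(uuid)

    def search(self, term: str) -> List[str]:
        # .get() does not trigger the defaultdict factory, so lookups
        # for unknown terms do not add empty entries.
        return sorted(self._postings.get(term.lower(), set()))


def extract_text(content: bytes) -> str:
    # Hypothetical extractor: a real system would run format-specific
    # extractors or OCR (e.g. Tesseract). Here we just decode UTF-8.
    return content.decode("utf-8", errors="ignore")


def store_document(repo: Dict[str, RepositoryNode], index: ToyIndex,
                   uuid: str, content: bytes) -> RepositoryNode:
    node = RepositoryNode(uuid, content)
    repo[uuid] = node                        # only the binary is kept on the node
    index.add(uuid, extract_text(content))   # text goes to the indexer, then is dropped
    return node
```

For example, after storing a document containing "Здраво OpenKM", searching the index for "здраво" finds it, yet the node itself has no text attribute to display, which is why there is nothing to look at in the repository view.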

Re: What is indexed by OpenKM.

#12693 by sasoa
Mon Oct 24, 2011 9:28 am

OK, I understand. Thank you.
