• What is indexed by OpenKM.

  • We tried to make OpenKM as intuitive as possible, but advice is always welcome.
Forum rules: Before asking, please check the documentation wiki or use the forum's search feature. Remember that we don't have a crystal ball and are not mind readers, so if you post about an issue, tell us which OpenKM version you are using, along with your browser and operating system versions. For more information, read How to Report Bugs Effectively.
 #12624  by sasoa
 
I am using:
Ubuntu 10.04
OpenKM 5.1.7
OpenOffice 3.2
Tesseract

My problem is OCR indexing. My default language is Macedonian (Cyrillic), and many of my documents are a mixture of Macedonian and English. English OCR works very well, but Macedonian does not.
I was reading the "Experimental features" document at http://wiki.openkm.com/index.php/Experimental_features but I cannot see a property called "TEXT" in Administration > Repository view.
Can you please give me more precise information on how to enable this property, and how I can see what was OCR'ed from my text?
Thanks!
 #12653  by jllort
 
The property is under the Administration tab, second icon from the left. That opens the configuration parameters, and the property must be enabled there. It might not be enabled in a production environment.
 #12667  by sasoa
 
I have checked that the property "experimental.text.extraction" is set to TRUE. (It is checked by default.)
But I cannot see a property called text under Administration > Repository view.
Attached is a picture of my repository view, where I cannot find the TEXT property with the indexed text. (I cannot see anywhere that shows the OCR'ed text.)
Attachments
MyRepositoryView.jpg (98.09 KiB)
 #12688  by sasoa
 
Yes, I know that, but how can I take a look at the OCR'ed text (to check that the extraction has no errors)?
I cannot find what the wiki describes: "From Administration > Repository view you would be able to see a property called text where the document extracted text is stored."
Thank you.
 #12691  by jllort
 
It does not really work the way you think. The text is extracted so that Lucene (the indexer) can use it, but only the blob needs to be stored, not the extracted contents (storing both would take double the space for each file, and that is not a good idea). I hope that makes sense. The text is not stored on the repository node; it is extracted and passed to the indexer, nothing else. Only the binary information is stored on the repository node (that's the idea).
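
To illustrate the "indexed but not stored" idea, here is a minimal Python sketch. It is not OpenKM or Lucene code: the `TinyIndex` class, its method names, and the sample document are all hypothetical, and the tokenization is far simpler than what a real indexer does. It only shows why a search can find a document by its extracted text even though that text cannot be read back afterwards.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index (hypothetical): text is indexed, never stored."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of document ids
        self._stored = {}                  # doc id -> stored metadata only

    def add(self, doc_id, name, extracted_text):
        # Keep only lightweight metadata. The extracted text is thrown away
        # after tokenization, so it cannot be displayed later.
        self._stored[doc_id] = {"name": name}
        for term in re.findall(r"\w+", extracted_text.lower()):
            self._postings[term].add(doc_id)

    def search(self, term):
        # Look up a single term, case-insensitively.
        return sorted(self._postings.get(term.lower(), set()))

    def stored_fields(self, doc_id):
        # Return a copy of what was actually kept for this document.
        return dict(self._stored[doc_id])


index = TinyIndex()
# Sample mixed Macedonian/English text, made up for this example.
index.add("doc-1", "invoice.pdf", "Фактура број 42 Invoice number 42")

print(index.search("фактура"))       # the document is found by its text
print(index.stored_fields("doc-1"))  # but only the name was kept
```

With this kind of design, the practical way to check the OCR quality is to search for words you expect to be in the document and see whether it comes back, since the extracted text itself is not kept anywhere you can browse.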
