Overview of OCR-Capabilities
Posted: Fri Apr 01, 2016 8:36 am
Hello,
I want an open-source CMS with OCR and stumbled over OpenKM.
I'm an absolute beginner, so sorry for silly questions...
The User Guide at http://wiki.openkm.com/index.php/User_Guide
shows an OCR/OMR menu button, but it is missing in my GUI.
I've searched the forum and the wiki, but it is not clear to me which configuration is best for OCR in general.
First I saw in http://wiki.openkm.com/index.php/OCR that I have to set
system.ocr=...
but where?
And where is this one:
"You need to modify the registered.text.extractors configuration property to match the OCR engine you have configured using system.ocr. By default only Cuneiform text extractor is enabled. If you want to configure Tesseract remove the Cuneiform extractor and add the Tesseract extractor."
Then I read:
"You can enable any of these text extractors adding it in the textFilterClasses param of the SearchIndex section in your repository.xml"
Where is the repository.xml?
How is content extracted from non-scanned files, e.g. PDF, FrameMaker and so on (Apache Tika?), and where can I find that?
In the plugin search I found some "pdf to text" plugins. Is there a recommended solution?
Thanks for every hint or pointer to a user guide for beginners.
Steffen