There's no OCR data capture option in the menu

We tried to make OpenKM as intuitive as possible, but advice is always welcome.
Forum rules: Before asking something, please check the documentation wiki or use the forum's search feature. And remember that we don't have a crystal ball and are not mind readers, so if you post about an issue, tell us which OpenKM version you are using, along with the browser and operating system version. For more information, read How to Report Bugs Effectively.
 #28149  by IVSModern
 
I cannot see the option mentioned in the subject:
[Attachment: 1.JPG (36.95 KiB)]
I'm using OpenKM 6.2.6 build 8125 (the upload manager does not work in the stable version) on Windows XP. The configuration settings are:
system.ocr C:\Program Files\Tesseract-OCR\tesseract -l rus+eng ${fileIn} ${fileOut}
system.ocr.rotate 90;180;270;
system.pdf.force.ocr True
 #28194  by jllort
 
OK, now I understand. This option is only part of the Professional version; it is not present in the Community version.
 #28356  by jllort
 
I think you're confusing OMR with Zone OCR in the table. In the table, Zone OCR is clearly not included in the Community version, and it probably never will be.
 #28415  by IVSModern
 
OK. Thanks to your explanation and this thread: http://forum.openkm.com/viewtopic.php?f ... orm#p26151 it's clear now. I think I now understand how it should work, but it doesn't. Here is the log:
2014-04-25 11:45:00,109 [Thread-58] INFO  com.openkm.extractor.TextExtractorWorker - processSerial.Working on {docUuid=29af1d2d-a4be-41a4-8edc-642c66d9a507, docPath=/okm:root/phototest.tif, docVerUuid=311e4074-cabc-45bf-9d07-1405c246e305, date=Fri Apr 25 11:41:31 YEKT 2014}
2014-04-25 11:45:06,203 [Thread-58] INFO  com.openkm.util.DocumentUtils - Using OpenOffice dictionary: C:\Program Files\OpenOffice 4\share\extensions\install\dict_ru_RU-0.3.6.oxt
2014-04-25 11:45:06,328 [Thread-58] WARN  com.openkm.extractor.Tesseract3TextExtractor - Failed to extract OCR text
java.lang.IllegalStateException: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "KOI8-R"
	at org.dts.spell.dictionary.OpenOfficeSpellDictionary.waitToLoad(OpenOfficeSpellDictionary.java:289)
	at org.dts.spell.dictionary.OpenOfficeSpellDictionary.getSuggestions(OpenOfficeSpellDictionary.java:264)
	at com.openkm.util.DocumentUtils.spellChecker(DocumentUtils.java:59)
	at com.openkm.extractor.Tesseract3TextExtractor.doOcr(Tesseract3TextExtractor.java:143)
	at com.openkm.extractor.Tesseract3TextExtractor.extractText(Tesseract3TextExtractor.java:82)
	at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:211)
	at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:172)
	at com.openkm.dao.NodeDocumentDAO.textExtractorHelper(NodeDocumentDAO.java:1300)
	at com.openkm.extractor.TextExtractorWorker.processSerial(TextExtractorWorker.java:138)
	at com.openkm.extractor.TextExtractorWorker.processQueue(TextExtractorWorker.java:125)
	at com.openkm.extractor.TextExtractorWorker.run(TextExtractorWorker.java:80)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at bsh.Reflect.invokeOnMethod(Unknown Source)
	at bsh.Reflect.invokeObjectMethod(Unknown Source)
	at bsh.BSHPrimarySuffix.doName(Unknown Source)
	at bsh.BSHPrimarySuffix.doSuffix(Unknown Source)
	at bsh.BSHPrimaryExpression.eval(Unknown Source)
	at bsh.BSHPrimaryExpression.eval(Unknown Source)
	at bsh.Interpreter.eval(Unknown Source)
	at bsh.Interpreter.eval(Unknown Source)
	at bsh.Interpreter.eval(Unknown Source)
	at com.openkm.util.ExecutionUtils.runScript(ExecutionUtils.java:112)
	at com.openkm.core.Cron$RunnerBsh.run(Cron.java:103)
	at java.lang.Thread.run(Thread.java:724)
Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "KOI8-R"
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:188)
	at org.dts.spell.dictionary.OpenOfficeSpellDictionary.waitToLoad(OpenOfficeSpellDictionary.java:283)
	... 26 more
Caused by: java.lang.NumberFormatException: For input string: "KOI8-R"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:492)
	at java.lang.Integer.parseInt(Integer.java:527)
	at org.dts.spell.dictionary.myspell.MySpell.load_tables(MySpell.java:398)
	at org.dts.spell.dictionary.myspell.MySpell.initFromStreams(MySpell.java:177)
	at org.dts.spell.dictionary.myspell.MySpell.<init>(MySpell.java:69)
	at org.dts.spell.dictionary.OpenOfficeSpellDictionary.initFromZipFile(OpenOfficeSpellDictionary.java:198)
	at org.dts.spell.dictionary.OpenOfficeSpellDictionary.access$100(OpenOfficeSpellDictionary.java:31)
	at org.dts.spell.dictionary.OpenOfficeSpellDictionary$2.call(OpenOfficeSpellDictionary.java:88)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	... 1 more
The OCR settings are:
registered.text.extractors: com.openkm.extractor.Tesseract3TextExtractor
system.ocr: C:\Program Files\Tesseract-OCR\tesseract ${fileIn} ${fileOut} -l rus+eng
system.ocr.rotate: 90;180;270;
The test file is from the standard Tesseract package (attached), and extracting it from the console works fine. Any idea what's wrong?
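For reference, here is a small sketch of how a command template with ${fileIn} and ${fileOut} placeholders, like the system.ocr value in the settings, could be expanded into a concrete command line. It is only an illustration and not OpenKM's actual code: the class name `OcrCommandTemplate`, the method `expand`, and the sample file paths are all hypothetical.

```java
public class OcrCommandTemplate {
    // Replace the two placeholders literally (String.replace does not use regex).
    // This is an assumed expansion scheme, not taken from the OpenKM sources.
    static String expand(String template, String fileIn, String fileOut) {
        return template
                .replace("${fileIn}", fileIn)
                .replace("${fileOut}", fileOut);
    }

    public static void main(String[] args) {
        // Template copied from the system.ocr setting in this post.
        String template =
                "C:\\Program Files\\Tesseract-OCR\\tesseract ${fileIn} ${fileOut} -l rus+eng";

        // Hypothetical temporary file names, chosen only for the example.
        String command = expand(template, "C:\\temp\\phototest.tif", "C:\\temp\\phototest");

        System.out.println(command);
    }
}
```

Printing the expanded command this way makes it easy to paste it into a console and compare it with the command that was run by hand.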
[Attachment: phototest.tif (37.76 KiB)]
 #28420  by jllort
 
I don't know exactly what you are doing, but the message
java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "KOI8-R"
indicates that something is trying to convert a text string to a number; that's why the error is raised. Could there be a problem with the dictionary?
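To illustrate the conversion problem, here is a minimal Java sketch. The stack trace in the log shows `Integer.parseInt` being called from `MySpell.load_tables`; the sketch only reproduces that call with the same input. The class and method names (`ParseDemo`, `tryParse`) are made up for the example and are not part of OpenKM or the spell-checking library.

```java
public class ParseDemo {
    // Returns the parsed number as text, or the exception message
    // when the input cannot be read as a decimal integer.
    static String tryParse(String s) {
        try {
            return String.valueOf(Integer.parseInt(s));
        } catch (NumberFormatException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A numeric line parses without problems.
        System.out.println(tryParse("12345"));

        // An encoding name is not a number, so parseInt throws
        // NumberFormatException and the message names the offending input.
        System.out.println(tryParse("KOI8-R"));
    }
}
```

If the dictionary loader reads a line it expects to be numeric and finds an encoding name such as KOI8-R instead, this is the kind of exception it would raise. Whether that is what actually happens with dict_ru_RU-0.3.6.oxt is an assumption; the log only confirms that the `parseInt` call received that string.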
