
There's no ocr data capture option in menu

Posted: Fri Mar 21, 2014 10:48 am
by IVSModern
I cannot see the option named in the subject line:
[Attached screenshot: 1.JPG (36.95 KiB)]
I'm using OpenKM 6.2.6 build 8125 on Windows XP (the upload manager does not work in the stable version). My configuration settings are:
system.ocr C:\Program Files\Tesseract-OCR\tesseract -l rus+eng ${fileIn} ${fileOut}
system.ocr.rotate 90;180;270;
system.pdf.force.ocr True
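In the system.ocr value, ${fileIn} and ${fileOut} are placeholders that get filled in with the input image and the output file before Tesseract is called. A minimal sketch of that kind of placeholder expansion follows; OcrCommandTemplate and expand are hypothetical names, and this is not OpenKM's actual code. Matcher.quoteReplacement matters here because Windows paths contain backslashes, which appendReplacement would otherwise treat as escape characters.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: expands ${name} placeholders in a command template
// such as the system.ocr value. Not OpenKM's implementation.
class OcrCommandTemplate {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    static String expand(String template, Map<String, String> vars) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            // Unknown placeholders are left untouched so typos stay visible.
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement keeps backslashes in Windows paths literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "tesseract ${fileIn} ${fileOut} -l rus+eng";
        // prints: tesseract phototest.tif phototest -l rus+eng
        System.out.println(expand(template,
                Map.of("fileIn", "phototest.tif", "fileOut", "phototest")));
    }
}
```

For reference, the Tesseract 3 usage line is tesseract imagename outputbase [-l lang], with the language option after the two file arguments; that is the order used in the later configuration in this thread.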

Re: There's no ocr data capture option in menu

Posted: Sun Mar 23, 2014 5:48 am
by pavila
Where have you seen this option?

Re: There's no ocr data capture option in menu

Posted: Mon Mar 24, 2014 7:11 am
by IVSModern

Re: There's no ocr data capture option in menu

Posted: Wed Mar 26, 2014 10:52 am
by jllort
OK, now I understand. This option is only part of the Professional version; it is not present in the Community version.

Re: There's no ocr data capture option in menu

Posted: Tue Apr 15, 2014 9:57 am
by IVSModern
According to http://www.openkm.com/en/overview/compa ... sions.html, this option is included in the Community version too (see the "General features" block); only Zonal OCR is excluded (see the "Modules" block).

Re: There's no ocr data capture option in menu

Posted: Thu Apr 17, 2014 7:10 am
by jllort
I think you are confusing OMR and Zone OCR in the table. The table clearly shows that Zone OCR is not included in the Community version, and it probably never will be.

Re: There's no ocr data capture option in menu

Posted: Fri Apr 25, 2014 8:13 am
by IVSModern
OK. Thanks to your explanation and this thread: http://forum.openkm.com/viewtopic.php?f ... orm#p26151 it's clear now. I think I understand how it should work, but it doesn't. Here is the log:
Code:
2014-04-25 11:45:00,109 [Thread-58] INFO  com.openkm.extractor.TextExtractorWorker - processSerial.Working on {docUuid=29af1d2d-a4be-41a4-8edc-642c66d9a507, docPath=/okm:root/phototest.tif, docVerUuid=311e4074-cabc-45bf-9d07-1405c246e305, date=Fri Apr 25 11:41:31 YEKT 2014}
2014-04-25 11:45:06,203 [Thread-58] INFO  com.openkm.util.DocumentUtils - Using OpenOffice dictionary: C:\Program Files\OpenOffice 4\share\extensions\install\dict_ru_RU-0.3.6.oxt
2014-04-25 11:45:06,328 [Thread-58] WARN  com.openkm.extractor.Tesseract3TextExtractor - Failed to extract OCR text
java.lang.IllegalStateException: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "KOI8-R"
	at org.dts.spell.dictionary.OpenOfficeSpellDictionary.waitToLoad(OpenOfficeSpellDictionary.java:289)
	at org.dts.spell.dictionary.OpenOfficeSpellDictionary.getSuggestions(OpenOfficeSpellDictionary.java:264)
	at com.openkm.util.DocumentUtils.spellChecker(DocumentUtils.java:59)
	at com.openkm.extractor.Tesseract3TextExtractor.doOcr(Tesseract3TextExtractor.java:143)
	at com.openkm.extractor.Tesseract3TextExtractor.extractText(Tesseract3TextExtractor.java:82)
	at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:211)
	at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:172)
	at com.openkm.dao.NodeDocumentDAO.textExtractorHelper(NodeDocumentDAO.java:1300)
	at com.openkm.extractor.TextExtractorWorker.processSerial(TextExtractorWorker.java:138)
	at com.openkm.extractor.TextExtractorWorker.processQueue(TextExtractorWorker.java:125)
	at com.openkm.extractor.TextExtractorWorker.run(TextExtractorWorker.java:80)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at bsh.Reflect.invokeOnMethod(Unknown Source)
	at bsh.Reflect.invokeObjectMethod(Unknown Source)
	at bsh.BSHPrimarySuffix.doName(Unknown Source)
	at bsh.BSHPrimarySuffix.doSuffix(Unknown Source)
	at bsh.BSHPrimaryExpression.eval(Unknown Source)
	at bsh.BSHPrimaryExpression.eval(Unknown Source)
	at bsh.Interpreter.eval(Unknown Source)
	at bsh.Interpreter.eval(Unknown Source)
	at bsh.Interpreter.eval(Unknown Source)
	at com.openkm.util.ExecutionUtils.runScript(ExecutionUtils.java:112)
	at com.openkm.core.Cron$RunnerBsh.run(Cron.java:103)
	at java.lang.Thread.run(Thread.java:724)
Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "KOI8-R"
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:188)
	at org.dts.spell.dictionary.OpenOfficeSpellDictionary.waitToLoad(OpenOfficeSpellDictionary.java:283)
	... 26 more
Caused by: java.lang.NumberFormatException: For input string: "KOI8-R"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:492)
	at java.lang.Integer.parseInt(Integer.java:527)
	at org.dts.spell.dictionary.myspell.MySpell.load_tables(MySpell.java:398)
	at org.dts.spell.dictionary.myspell.MySpell.initFromStreams(MySpell.java:177)
	at org.dts.spell.dictionary.myspell.MySpell.<init>(MySpell.java:69)
	at org.dts.spell.dictionary.OpenOfficeSpellDictionary.initFromZipFile(OpenOfficeSpellDictionary.java:198)
	at org.dts.spell.dictionary.OpenOfficeSpellDictionary.access$100(OpenOfficeSpellDictionary.java:31)
	at org.dts.spell.dictionary.OpenOfficeSpellDictionary$2.call(OpenOfficeSpellDictionary.java:88)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	... 1 more
The OCR settings are:
registered.text.extractors: com.openkm.extractor.Tesseract3TextExtractor
system.ocr: C:\Program Files\Tesseract-OCR\tesseract ${fileIn} ${fileOut} -l rus+eng
system.ocr.rotate: 90;180;270;
The test file is from the standard Tesseract package (attached), and extracting it from the console works fine. Any idea what's wrong?

Re: There's no ocr data capture option in menu

Posted: Sat Apr 26, 2014 10:49 am
by jllort
I do not know exactly what you are doing, but the message
Code:
java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "KOI8-R"
indicates that something is trying to convert a text string to a number, and that is why the error is raised. Could it be a problem with the dictionary?
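The bottom of the stack trace shows Integer.parseInt being called on "KOI8-R" from MySpell.load_tables. In the MySpell/Hunspell dictionary format the first line of the .dic file is a word count, while KOI8-R is the kind of value an .aff file declares in its SET line. One plausible reading (an assumption, not confirmed in this thread) is that the loader read an encoding name where it expected a number. The minimal Java sketch below only demonstrates the parsing behaviour itself; ParseDemo and tryParse are hypothetical names.

```java
// Minimal sketch: Integer.parseInt accepts only numeric text, so an
// encoding name such as "KOI8-R" produces the NumberFormatException seen above.
class ParseDemo {
    // Returns the parsed number, or null if the text is not an integer.
    static Integer tryParse(String text) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("146000")); // prints 146000 (a made-up word count)
        System.out.println(tryParse("KOI8-R")); // prints null
        try {
            Integer.parseInt("KOI8-R");
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage()); // prints: For input string: "KOI8-R"
        }
    }
}
```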

Re: There's no ocr data capture option in menu

Posted: Wed Apr 30, 2014 5:06 am
by IVSModern
You were right. I tried another dictionary and it works fine.
By the way, Russian text extraction gets better without any dictionary.
Thanks a lot.
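Before swapping dictionaries, it may help to look inside the package. An .oxt extension is a ZIP archive, and in the MySpell/Hunspell format the first line of a .dic file is the word count, while the .aff file declares its character set with a SET line (for example SET KOI8-R). The sketch below is a hypothetical diagnostic, not part of OpenKM: it prints the first line of every .dic and .aff entry and flags a .dic whose header is not a plain number. It assumes a non-numeric header is one possible cause of the error in this thread; which file was actually at fault was never confirmed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical diagnostic: list the header line of each dictionary file in an .oxt.
class OxtInspector {
    // Returns entry name -> first line, for every .dic and .aff file in the archive.
    static Map<String, String> firstLines(String oxtPath) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(oxtPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                if (!name.endsWith(".dic") && !name.endsWith(".aff")) {
                    continue;
                }
                // ISO-8859-1 decodes any byte sequence, so 8-bit charsets such as
                // KOI8-R cannot make the read fail; ASCII headers stay legible.
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                        zip.getInputStream(entry), StandardCharsets.ISO_8859_1))) {
                    String line = reader.readLine();
                    line = (line == null) ? "" : line;
                    // Drop a UTF-8 byte-order mark (EF BB BF) if present.
                    if (line.startsWith("\u00EF\u00BB\u00BF")) {
                        line = line.substring(3);
                    }
                    result.put(name, line.trim());
                }
            }
        }
        return result;
    }

    // A .dic header is expected to be a plain number (the word count).
    static boolean dicHeaderIsNumeric(String firstLine) {
        return firstLine.matches("\\d+");
    }

    public static void main(String[] args) throws IOException {
        for (Map.Entry<String, String> e : firstLines(args[0]).entrySet()) {
            boolean suspicious = e.getKey().endsWith(".dic") && !dicHeaderIsNumeric(e.getValue());
            System.out.println(e.getKey() + ": " + e.getValue()
                    + (suspicious ? "   <-- .dic header is not a number" : ""));
        }
    }
}
```

Usage would be along the lines of java OxtInspector "C:\Program Files\OpenOffice 4\share\extensions\install\dict_ru_RU-0.3.6.oxt", pointing it at the dictionary named in the log.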