OK. Thanks to your explanation and this thread:
http://forum.openkm.com/viewtopic.php?f ... orm#p26151 it is clear now. I think I understand how it should work, but it doesn't. Here is the log:
2014-04-25 11:45:00,109 [Thread-58] INFO com.openkm.extractor.TextExtractorWorker - processSerial.Working on {docUuid=29af1d2d-a4be-41a4-8edc-642c66d9a507, docPath=/okm:root/phototest.tif, docVerUuid=311e4074-cabc-45bf-9d07-1405c246e305, date=Fri Apr 25 11:41:31 YEKT 2014}
2014-04-25 11:45:06,203 [Thread-58] INFO com.openkm.util.DocumentUtils - Using OpenOffice dictionary: C:\Program Files\OpenOffice 4\share\extensions\install\dict_ru_RU-0.3.6.oxt
2014-04-25 11:45:06,328 [Thread-58] WARN com.openkm.extractor.Tesseract3TextExtractor - Failed to extract OCR text
java.lang.IllegalStateException: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "KOI8-R"
at org.dts.spell.dictionary.OpenOfficeSpellDictionary.waitToLoad(OpenOfficeSpellDictionary.java:289)
at org.dts.spell.dictionary.OpenOfficeSpellDictionary.getSuggestions(OpenOfficeSpellDictionary.java:264)
at com.openkm.util.DocumentUtils.spellChecker(DocumentUtils.java:59)
at com.openkm.extractor.Tesseract3TextExtractor.doOcr(Tesseract3TextExtractor.java:143)
at com.openkm.extractor.Tesseract3TextExtractor.extractText(Tesseract3TextExtractor.java:82)
at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:211)
at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:172)
at com.openkm.dao.NodeDocumentDAO.textExtractorHelper(NodeDocumentDAO.java:1300)
at com.openkm.extractor.TextExtractorWorker.processSerial(TextExtractorWorker.java:138)
at com.openkm.extractor.TextExtractorWorker.processQueue(TextExtractorWorker.java:125)
at com.openkm.extractor.TextExtractorWorker.run(TextExtractorWorker.java:80)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at bsh.Reflect.invokeOnMethod(Unknown Source)
at bsh.Reflect.invokeObjectMethod(Unknown Source)
at bsh.BSHPrimarySuffix.doName(Unknown Source)
at bsh.BSHPrimarySuffix.doSuffix(Unknown Source)
at bsh.BSHPrimaryExpression.eval(Unknown Source)
at bsh.BSHPrimaryExpression.eval(Unknown Source)
at bsh.Interpreter.eval(Unknown Source)
at bsh.Interpreter.eval(Unknown Source)
at bsh.Interpreter.eval(Unknown Source)
at com.openkm.util.ExecutionUtils.runScript(ExecutionUtils.java:112)
at com.openkm.core.Cron$RunnerBsh.run(Cron.java:103)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "KOI8-R"
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at org.dts.spell.dictionary.OpenOfficeSpellDictionary.waitToLoad(OpenOfficeSpellDictionary.java:283)
... 26 more
Caused by: java.lang.NumberFormatException: For input string: "KOI8-R"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.parseInt(Integer.java:527)
at org.dts.spell.dictionary.myspell.MySpell.load_tables(MySpell.java:398)
at org.dts.spell.dictionary.myspell.MySpell.initFromStreams(MySpell.java:177)
at org.dts.spell.dictionary.myspell.MySpell.<init>(MySpell.java:69)
at org.dts.spell.dictionary.OpenOfficeSpellDictionary.initFromZipFile(OpenOfficeSpellDictionary.java:198)
at org.dts.spell.dictionary.OpenOfficeSpellDictionary.access$100(OpenOfficeSpellDictionary.java:31)
at org.dts.spell.dictionary.OpenOfficeSpellDictionary$2.call(OpenOfficeSpellDictionary.java:88)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
... 1 more
The OCR settings are:
registered.text.extractors: com.openkm.extractor.Tesseract3TextExtractor
system.ocr: C:\Program Files\Tesseract-OCR\tesseract ${fileIn} ${fileOut} -l rus+eng
system.ocr.rotate: 90;180;270;
The test file is from the standard Tesseract package (attached), and extracting it from the console works fine. Any idea what's wrong?
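
Going by the trace, the NumberFormatException comes from Integer.parseInt inside MySpell.load_tables, so the dictionary loader seems to be reading "KOI8-R" at a point where it expects a number. To see where that string sits in the dictionary, here is a minimal sketch that prints the first line of every .dic and .aff file inside the .oxt. It assumes the .oxt is an ordinary zip archive (the trace does go through initFromZipFile). The class and method names (OxtFirstLines, firstLines, isInteger) are made up for this check and are not part of OpenKM or the spell library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical diagnostic helper, not part of OpenKM or the spell library.
public class OxtFirstLines {

    // Returns the first line of each .dic/.aff entry in the archive, keyed by entry name.
    static Map<String, String> firstLines(String oxtPath) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(oxtPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                boolean wanted = name.endsWith(".dic") || name.endsWith(".aff");
                if (entry.isDirectory() || !wanted) {
                    continue;
                }
                // ISO-8859-1 maps every byte to a character, so decoding cannot fail
                // even if the file is really KOI8-R; only the ASCII first line matters here.
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(zip.getInputStream(entry), StandardCharsets.ISO_8859_1))) {
                    String line = reader.readLine();
                    result.put(name, line == null ? "" : line.trim());
                }
            }
        }
        return result;
    }

    // True if the string is something Integer.parseInt accepts.
    static boolean isInteger(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java OxtFirstLines <path-to-oxt>");
            return;
        }
        for (Map.Entry<String, String> e : firstLines(args[0]).entrySet()) {
            System.out.println(e.getKey() + " -> \"" + e.getValue()
                    + "\" (integer: " + isInteger(e.getValue()) + ")");
        }
    }
}
```

It would be run with the dictionary path from the log as the only argument. If "KOI8-R" shows up on the first line of one of these files, that would at least show which file the loader trips over. Whether the library can cope with that layout is something I can't tell from the trace alone.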