Open Source Document Management System | OpenKM - There's no ocr data capture option in menu



9 posts


There's no ocr data capture option in menu

#28149 by IVSModern
Fri Mar 21, 2014 10:48 am

I cannot see the option from the subject line (OCR data capture) in the menu:

1.JPG (36.95 KiB)

I'm using OpenKM 6.2.6 build 8125 (the upload manager does not work in the stable version) on Windows XP. My configuration settings are:
system.ocr C:\Program Files\Tesseract-OCR\tesseract -l rus+eng ${fileIn} ${fileOut}
system.ocr.rotate 90;180;270;
system.pdf.force.ocr True
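In the system.ocr value, ${fileIn} and ${fileOut} are placeholders for the image to read and the output base name. Tesseract 3.x documents its command line as `tesseract imagename outputbase [-l lang] ...`, so the argument order in the template matters. The sketch below is a hypothetical illustration of how such a template could be expanded into a process argument list; it is not OpenKM's actual code, and `buildCommand` is an invented helper. It keeps the executable path as a single argument so the space in "Program Files" cannot split it.

```java
import java.util.ArrayList;
import java.util.List;

public class OcrCommandSketch {
    // Hypothetical helper: expands ${fileIn}/${fileOut} in a whitespace-separated
    // argument template. The executable is passed separately so that a path like
    // "C:\Program Files\Tesseract-OCR\tesseract" stays one argument.
    static List<String> buildCommand(String executable, String argTemplate,
                                     String fileIn, String fileOut) {
        List<String> cmd = new ArrayList<>();
        cmd.add(executable);
        for (String token : argTemplate.trim().split("\\s+")) {
            // Literal (non-regex) replacement of each placeholder
            cmd.add(token.replace("${fileIn}", fileIn).replace("${fileOut}", fileOut));
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand(
                "C:\\Program Files\\Tesseract-OCR\\tesseract",
                "${fileIn} ${fileOut} -l rus+eng",
                "C:\\temp\\phototest.tif", "C:\\temp\\phototest");
        System.out.println(cmd);
        // The list could then be started with new ProcessBuilder(cmd).start()
    }
}
```

Passing a list to ProcessBuilder avoids shell quoting problems, which is the usual pitfall with paths containing spaces on Windows.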


Re: There's no ocr data capture option in menu

#28170 by pavila
Sun Mar 23, 2014 5:48 am

Where have you seen this option?


Re: There's no ocr data capture option in menu

#28179 by IVSModern
Mon Mar 24, 2014 7:11 am

In the user's manual:
http://wiki.openkm.com/index.php/OCR_data_capture


Re: There's no ocr data capture option in menu

#28194 by jllort
Wed Mar 26, 2014 10:52 am

OK, now I understand. This option is only part of the Professional version; it is not present in the Community version.


Re: There's no ocr data capture option in menu

#28346 by IVSModern
Tue Apr 15, 2014 9:57 am

According to http://www.openkm.com/en/overview/compa ... sions.html (the "General features" block), this option is included in the Community version too; only Zonal OCR is excluded (the "Modules" block).


Re: There's no ocr data capture option in menu

#28356 by jllort
Thu Apr 17, 2014 7:10 am

I think you are confusing OMR and Zone OCR in the table. The table clearly shows that Zone OCR is not included in the Community version, and it probably never will be.


Re: There's no ocr data capture option in menu

#28415 by IVSModern
Fri Apr 25, 2014 8:13 am

OK. Thanks to your explanation and this thread: http://forum.openkm.com/viewtopic.php?f ... orm#p26151, it's clear now. I think I understand how it should work, but it doesn't. Here is the log:


2014-04-25 11:45:00,109 [Thread-58] INFO  com.openkm.extractor.TextExtractorWorker - processSerial.Working on {docUuid=29af1d2d-a4be-41a4-8edc-642c66d9a507, docPath=/okm:root/phototest.tif, docVerUuid=311e4074-cabc-45bf-9d07-1405c246e305, date=Fri Apr 25 11:41:31 YEKT 2014}
2014-04-25 11:45:06,203 [Thread-58] INFO  com.openkm.util.DocumentUtils - Using OpenOffice dictionary: C:\Program Files\OpenOffice 4\share\extensions\install\dict_ru_RU-0.3.6.oxt
2014-04-25 11:45:06,328 [Thread-58] WARN  com.openkm.extractor.Tesseract3TextExtractor - Failed to extract OCR text
java.lang.IllegalStateException: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "KOI8-R"
	at org.dts.spell.dictionary.OpenOfficeSpellDictionary.waitToLoad(OpenOfficeSpellDictionary.java:289)
	at org.dts.spell.dictionary.OpenOfficeSpellDictionary.getSuggestions(OpenOfficeSpellDictionary.java:264)
	at com.openkm.util.DocumentUtils.spellChecker(DocumentUtils.java:59)
	at com.openkm.extractor.Tesseract3TextExtractor.doOcr(Tesseract3TextExtractor.java:143)
	at com.openkm.extractor.Tesseract3TextExtractor.extractText(Tesseract3TextExtractor.java:82)
	at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:211)
	at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:172)
	at com.openkm.dao.NodeDocumentDAO.textExtractorHelper(NodeDocumentDAO.java:1300)
	at com.openkm.extractor.TextExtractorWorker.processSerial(TextExtractorWorker.java:138)
	at com.openkm.extractor.TextExtractorWorker.processQueue(TextExtractorWorker.java:125)
	at com.openkm.extractor.TextExtractorWorker.run(TextExtractorWorker.java:80)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at bsh.Reflect.invokeOnMethod(Unknown Source)
	at bsh.Reflect.invokeObjectMethod(Unknown Source)
	at bsh.BSHPrimarySuffix.doName(Unknown Source)
	at bsh.BSHPrimarySuffix.doSuffix(Unknown Source)
	at bsh.BSHPrimaryExpression.eval(Unknown Source)
	at bsh.BSHPrimaryExpression.eval(Unknown Source)
	at bsh.Interpreter.eval(Unknown Source)
	at bsh.Interpreter.eval(Unknown Source)
	at bsh.Interpreter.eval(Unknown Source)
	at com.openkm.util.ExecutionUtils.runScript(ExecutionUtils.java:112)
	at com.openkm.core.Cron$RunnerBsh.run(Cron.java:103)
	at java.lang.Thread.run(Thread.java:724)
Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "KOI8-R"
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:188)
	at org.dts.spell.dictionary.OpenOfficeSpellDictionary.waitToLoad(OpenOfficeSpellDictionary.java:283)
	... 26 more
Caused by: java.lang.NumberFormatException: For input string: "KOI8-R"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:492)
	at java.lang.Integer.parseInt(Integer.java:527)
	at org.dts.spell.dictionary.myspell.MySpell.load_tables(MySpell.java:398)
	at org.dts.spell.dictionary.myspell.MySpell.initFromStreams(MySpell.java:177)
	at org.dts.spell.dictionary.myspell.MySpell.<init>(MySpell.java:69)
	at org.dts.spell.dictionary.OpenOfficeSpellDictionary.initFromZipFile(OpenOfficeSpellDictionary.java:198)
	at org.dts.spell.dictionary.OpenOfficeSpellDictionary.access$100(OpenOfficeSpellDictionary.java:31)
	at org.dts.spell.dictionary.OpenOfficeSpellDictionary$2.call(OpenOfficeSpellDictionary.java:88)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	... 1 more

The OCR settings are:
registered.text.extractors: com.openkm.extractor.Tesseract3TextExtractor
system.ocr: C:\Program Files\Tesseract-OCR\tesseract ${fileIn} ${fileOut} -l rus+eng
system.ocr.rotate: 90;180;270;
The test file is from the standard Tesseract package (attached), and extracting it from the console works fine. Any idea what's wrong?
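The system.ocr.rotate value is a semicolon-separated list, presumably rotation angles in degrees; how OpenKM uses those angles is not shown in this thread. As a hypothetical sketch (not OpenKM's parser), a value like "90;180;270;" with a trailing separator can be read safely as follows.

```java
import java.util.ArrayList;
import java.util.List;

public class RotateSettingSketch {
    // Hypothetical parser for a value such as "90;180;270;".
    // String.split drops the trailing empty field left by the final ';',
    // and blank tokens are skipped in case of stray separators.
    static List<Integer> parseAngles(String value) {
        List<Integer> angles = new ArrayList<>();
        for (String part : value.split(";")) {
            String t = part.trim();
            if (!t.isEmpty()) {
                angles.add(Integer.parseInt(t));
            }
        }
        return angles;
    }

    public static void main(String[] args) {
        System.out.println(parseAngles("90;180;270;"));
    }
}
```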

Attachments

phototest.tif (37.76 KiB)


Re: There's no ocr data capture option in menu

#28420 by jllort
Sat Apr 26, 2014 10:49 am

I don't know exactly what you are doing, but the message


java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "KOI8-R"

indicates that something is trying to convert a text string to a number, and that is why the error is raised. Could it be a problem with the dictionary?
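The innermost "Caused by" frames in the log show Integer.parseInt being called from MySpell.load_tables while the OpenOffice dictionary loads in a background task; FutureTask.get then wraps that failure in an ExecutionException, and OpenOfficeSpellDictionary.waitToLoad wraps it again in an IllegalStateException. The sketch below reproduces that chain with hypothetical stand-ins (`loadTables`, `waitToLoad`); it is not the org.dts.spell code, only an illustration of how a non-numeric field such as "KOI8-R" produces the exact message seen in the log.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DictionaryLoadSketch {
    // Hypothetical stand-in for a loader that expects a numeric field
    static int loadTables(String field) {
        return Integer.parseInt(field.trim()); // NumberFormatException for "KOI8-R"
    }

    // Loads in the background and waits, wrapping failures like the log shows
    static int waitToLoad(String field) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> f = pool.submit(() -> loadTables(field));
            return f.get(); // a task failure arrives as ExecutionException
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(waitToLoad("42"));
        try {
            waitToLoad("KOI8-R");
        } catch (IllegalStateException e) {
            System.out.println(e); // same text as the first line of the log's exception
        }
    }
}
```

Because the root cause is in parsing the dictionary data rather than in Tesseract, switching the dictionary (or not using one) is the natural thing to try.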


Re: There's no ocr data capture option in menu

#28450 by IVSModern
Wed Apr 30, 2014 5:06 am

You were right. I tried another dictionary and it works fine.
By the way, Russian text extraction gets better without any dictionary at all.
Thanks a lot.

