Tesseract output file not found / not picked up by OpenKM
Posted: Sat May 04, 2013 1:49 pm
by anyonebutnoone
Hello,
I have the following in my logfile
Code:
2013-05-04 13:40:00,492 [Thread-601] WARN com.openkm.extractor.Tesseract3TextExtractor - IO exception executing command: /usr/local/bin/tesseract /home/kmadmin/openkm/tomcat/temp/okm8891882979629288432.gif /home/kmadmin/openkm/tomcat/temp/okm7728462093100462371 -l deu
java.io.FileNotFoundException: /home/kmadmin/openkm/tomcat/temp/okm7728462093100462371.txt (No such file or directory)
I am running Tesseract 3, which works fine on the command line, also if I set the output file to OPENKMPATH/tomcat/temp/xxxxx.
OpenKM version: 6.2.3 (build: 7945) Community
Can someone give me a hint on how to fix this?
Edit: Or a hint on where else to look for the problem!
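One way to narrow this down is to run the exact command from the log outside OpenKM and check whether the .txt file shows up. Below is a minimal sketch in Python, assuming the command has the form `tesseract <image> <output_base> [options]` and relying on Tesseract 3 appending `.txt` to the output base. The function names (`output_base`, `expected_text_file`, `check_ocr`) are made up for illustration and are not part of OpenKM.

```python
import os
import shlex
import subprocess


def output_base(command_line):
    """Return the output base name from a tesseract command line.

    Assumes the form: tesseract <image> <output_base> [options]
    """
    argv = shlex.split(command_line)
    return argv[2]


def expected_text_file(base):
    """Tesseract 3 writes its result to <output_base>.txt."""
    return base + ".txt"


def check_ocr(command_line):
    """Run the command and report whether the expected .txt file exists.

    Hypothetical helper: it needs tesseract installed and a readable image.
    """
    argv = shlex.split(command_line)
    proc = subprocess.run(argv, capture_output=True, text=True)
    txt_path = expected_text_file(output_base(command_line))
    return {
        "exit_code": proc.returncode,
        "stderr": proc.stderr.strip(),
        "output_exists": os.path.exists(txt_path),
    }
```

If you try this, it is probably worth running it as the same user that runs Tomcat, since that user's environment and permissions on the temp directory may differ from your own shell. A non-zero exit code or a message on stderr would point at Tesseract rather than OpenKM.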
Re: Tesseract output file not found / not picked up by OpenK
Posted: Sun May 05, 2013 5:45 pm
by anyonebutnoone
Setting logging to DEBUG for the Tesseract call gave me more info, and it was a Tesseract problem!
Re: Tesseract output file not found / not picked up by OpenK
Posted: Mon May 06, 2013 11:22 am
by pavila
This is because the system.ocr configuration property does not match the text extractor registered at registered.text.extractors.
Re: Tesseract output file not found / not picked up by OpenK
Posted: Tue May 07, 2013 10:56 am
by anyonebutnoone
Hmm, I am getting the error again. Tesseract works fine on the command line.
My settings are:
Code:
system.ocr String /usr/local/bin/tesseract ${fileIn} ${fileOut} -l deu
Code:
registered.text.extractors List
org.apache.jackrabbit.extractor.PlainTextExtractor
org.apache.jackrabbit.extractor.MsWordTextExtractor
org.apache.jackrabbit.extractor.MsExcelTextExtractor
org.apache.jackrabbit.extractor.MsPowerPointTextExtractor
org.apache.jackrabbit.extractor.OpenOfficeTextExtractor
org.apache.jackrabbit.extractor.RTFTextExtractor
org.apache.jackrabbit.extractor.HTMLTextExtractor
org.apache.jackrabbit.extractor.XMLTextExtractor
org.apache.jackrabbit.extractor.MsOutlookTextExtractor
com.openkm.extractor.PdfTextExtractor
com.openkm.extractor.AudioTextExtractor
com.openkm.extractor.ExifTextExtractor
com.openkm.extractor.MsOffice2007TextExtractor
com.openkm.extractor.Tesseract3TextExtractor
Am I missing something?
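For anyone comparing their own setup, the two settings above can be sanity-checked with a few lines of Python. This is only a sketch under assumptions: it treats `${fileIn}` and `${fileOut}` as plain string placeholders (how OpenKM actually expands them is not shown in this thread), and `fill_template` and `check_settings` are hypothetical names, not OpenKM functions.

```python
TESSERACT3_EXTRACTOR = "com.openkm.extractor.Tesseract3TextExtractor"


def fill_template(system_ocr, file_in, file_out):
    """Substitute the two placeholders (assumed to be plain text replacement)."""
    return system_ocr.replace("${fileIn}", file_in).replace("${fileOut}", file_out)


def check_settings(system_ocr, extractors):
    """Return a list of problems found; an empty list means nothing obvious."""
    problems = []
    for placeholder in ("${fileIn}", "${fileOut}"):
        if placeholder not in system_ocr:
            problems.append("system.ocr is missing " + placeholder)
    if TESSERACT3_EXTRACTOR not in extractors:
        problems.append(TESSERACT3_EXTRACTOR + " is not registered")
    return problems


# Values copied from the settings quoted in this post (extractor list shortened).
system_ocr = "/usr/local/bin/tesseract ${fileIn} ${fileOut} -l deu"
extractors = [
    "com.openkm.extractor.PdfTextExtractor",
    "com.openkm.extractor.Tesseract3TextExtractor",
]
problems = check_settings(system_ocr, extractors)
```

A check like this only confirms that the two settings look consistent with each other; it says nothing about whether the running application has picked them up.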
Re: Tesseract output file not found / not picked up by OpenK
Posted: Wed May 08, 2013 8:02 pm
by jllort
After changing to com.openkm.extractor.Tesseract3TextExtractor, did you restart the application? otherside restart
Re: Tesseract output file not found / not picked up by OpenK
Posted: Sun May 12, 2013 6:46 pm
by anyonebutnoone
I had everyone log off several times, and stopped and restarted Catalina.
What do you mean by "otherside restart"?
Re: Tesseract output file not found / not picked up by OpenK
Posted: Mon May 13, 2013 9:35 am
by pavila
Can you reproduce the problem with a recent nightly build?