
[solved] OCR with Tesseract on Ubuntu 14.04 - Error: Undefined OCR application

Posted: Fri Oct 23, 2015 3:55 am
by ds2k15
Ubuntu Server 14.04 64bit
OpenKM 6.3.1
Postgres 9.3
Tesseract 3.03



/opt/openkm-6.3.1-community/tomcat/OpenKM.cfg

system.ocr=/usr/bin/tesseract


Log in as Administrator

-> Administration -> Utilities -> Check text extraction

Upload a file or enter a document UUID

Error: Undefined OCR application



If I run Tesseract from the system console:

tesseract -l eng image001.jpg test

I get the text from the image in the file test.txt.


How can I fix this?

Thanks

Re: OCR with Tesseract on Ubuntu 14.04 - Error: Undefined OCR application

Posted: Fri Oct 23, 2015 8:35 am
by ds2k15
I set it up again. I don't know what I changed (maybe it was recreating the DB), but the error is different now.

I now get these error messages in catalina.out:
Code:
2015-10-23 10:31:11,460 [http-bio-0.0.0.0-8080-exec-1] WARN  com.openkm.util.ExecutionUtils- Abnormal program termination: 1
2015-10-23 10:31:11,461 [http-bio-0.0.0.0-8080-exec-1] WARN  com.openkm.util.ExecutionUtils- CommandLine: [/usr/bin/tesseract]
2015-10-23 10:31:11,461 [http-bio-0.0.0.0-8080-exec-1] WARN  com.openkm.util.ExecutionUtils- STDERR: Usage:
  /usr/bin/tesseract imagename|stdin outputbase|stdout [options...] [configfile...]

OCR options:
Code:
  --tessdata-dir /path  specify location of tessdata path
  -l lang[+lang]        specify language(s) used for OCR
  -c configvar=value    set value for control parameter.
                        Multiple -c arguments are allowed.
  -psm pagesegmode      specify page segmentation mode.
These options must occur before any configfile.
 

pagesegmode values are:
  0 = Orientation and script detection (OSD) only.
  1 = Automatic page segmentation with OSD.
  2 = Automatic page segmentation, but no OSD, or OCR
  3 = Fully automatic page segmentation, but no OSD. (Default)
  4 = Assume a single column of text of variable sizes.
  5 = Assume a single uniform block of vertically aligned text.
  6 = Assume a single uniform block of text.
  7 = Treat the image as a single text line.
  8 = Treat the image as a single word.
  9 = Treat the image as a single word in a circle.
  10 = Treat the image as a single character.


Single options:
  -v --version: version info
  --list-langs: list available languages for tesseract engine. Can be used with --tessdata-dir.
  --print-parameters: print tesseract parameters to the stdout.
Code:
2015-10-23 10:31:11,462 [http-bio-0.0.0.0-8080-exec-1] WARN  com.openkm.extractor.Tesseract3TextExtractor- IO exception executing command: /usr/bin/tesseract

java.io.FileNotFoundException: /opt/openkm-6.3.1-community/tomcat/temp/okm3445709910616376817.txt (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:146)
        at java.io.FileInputStream.<init>(FileInputStream.java:101)
        at com.openkm.extractor.Tesseract3TextExtractor.doOcr(Tesseract3TextExtractor.java:152)
        at com.openkm.extractor.Tesseract3TextExtractor.extractText(Tesseract3TextExtractor.java:99)
        at com.openkm.extractor.Tesseract3TextExtractor.extractText(Tesseract3TextExtractor.java:82)
        at com.openkm.extractor.Tesseract3TextExtractor.extractText(Tesseract3TextExtractor.java:59)
        at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:214)
        at com.openkm.servlet.admin.CheckTextExtractionServlet.doPost(CheckTextExtractionServlet.java:139)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:650)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:311)
        at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:116)
        at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:83)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
        at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
        at org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:101)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
        at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
        at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
        at org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:45)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
        at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:182)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
        at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:87)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
        at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:173)
        at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:346)
        at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:259)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:423)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1079)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:620)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:745)
In the web GUI I see this:
Code:
/opt/openkm-6.3.1-community/tomcat/temp/okm3445709910616376817.txt (No such file or directory)
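
The log above records the command as `CommandLine: [/usr/bin/tesseract]`, that is, with no input or output arguments. That would explain why Tesseract prints its usage text, exits with status 1, and never writes the .txt file that the extractor then fails to open. The following is a minimal Python sketch for spotting such entries in catalina.out. The function name is hypothetical, and the assumption that multiple arguments would be logged as a comma-separated list inside the brackets is a guess, not something the thread confirms.

```python
import re

# Matches the "CommandLine: [...]" part of an ExecutionUtils log line.
COMMAND_RE = re.compile(r"CommandLine: \[(.*)\]")

def commands_without_arguments(log_lines):
    """Return the logged commands that were run with no arguments at all."""
    found = []
    for line in log_lines:
        match = COMMAND_RE.search(line)
        if not match:
            continue
        # Assumption: arguments, if any, are separated by commas.
        parts = [p.strip() for p in match.group(1).split(",")]
        if len(parts) == 1:
            found.append(parts[0])
    return found

sample = [
    "2015-10-23 10:31:11,461 [http-bio-0.0.0.0-8080-exec-1] WARN  "
    "com.openkm.util.ExecutionUtils- CommandLine: [/usr/bin/tesseract]",
]
print(commands_without_arguments(sample))  # ['/usr/bin/tesseract']
```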

Re: OCR with Tesseract on Ubuntu 14.04 - Error: Undefined OCR application

Posted: Sat Oct 24, 2015 6:35 pm
by jllort
Take a look here :
http://wiki.openkm.com/index.php/Third- ... ation:_OCR

Under Administration > Configuration parameters, did you set this value?
Code:
system.ocr=/usr/bin/tesseract ${fileIn} ${fileOut}
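
To illustrate why the placeholders matter, here is a small Python sketch of how a command template with `${fileIn}` and `${fileOut}` could be expanded into an argument list. It is not OpenKM's code: the function name and the temporary paths are hypothetical, and the substitution mechanism is only assumed to work roughly like this.

```python
import shlex
from string import Template

def build_ocr_command(template, file_in, file_out):
    """Split a command template into arguments, then fill in the placeholders."""
    tokens = shlex.split(template)
    return [
        Template(token).safe_substitute(fileIn=file_in, fileOut=file_out)
        for token in tokens
    ]

# Hypothetical paths, for illustration only.
image = "/tmp/okm_in.jpg"
out_base = "/tmp/okm_out"

# Without placeholders there is nothing to fill in: tesseract gets no arguments.
print(build_ocr_command("/usr/bin/tesseract", image, out_base))

# With placeholders the input image and output base are passed along.
print(build_ocr_command("/usr/bin/tesseract ${fileIn} ${fileOut}", image, out_base))
```

As the console test in the first post shows, Tesseract appends .txt to the output base name, so an output base of `test` produces `test.txt`.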

Re: OCR with Tesseract on Ubuntu 14.04 - Error: Undefined OCR application

Posted: Mon Oct 26, 2015 5:18 am
by ds2k15
Thanks, but I get the same error with either of these lines in OpenKM.cfg:
Code:
system.ocr=/usr/bin/tesseract ${fileIn} ${fileOut} -l eng
system.ocr=/usr/bin/tesseract ${fileIn} ${fileOut}

Then I set hibernate.hbm2ddl=create in OpenKM.cfg, and then it works!

I also reimported the language SQL script.

Now it works. Is the configuration stored in the DB?

Re: [solved] OCR with Tesseract on Ubuntu 14.04 - Error: Undefined OCR application

Posted: Mon Oct 26, 2015 10:12 pm
by jllort
Yes, the configuration is stored in the DB. Almost all parameters in OpenKM.cfg only take effect when the database is created; after that they must be changed from Administration > Configuration parameters.
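
The behaviour described above can be pictured with a toy Python model. It is only a sketch under a stated assumption: values from the file seed the database once, at creation, and are ignored afterwards. The function and variable names are hypothetical and do not come from OpenKM.

```python
def load_config(file_values, db_values, recreate=False):
    """Return the effective settings under the assumed 'seed once' model."""
    if recreate or not db_values:
        # Fresh or recreated database: copy the values from the file.
        db_values.clear()
        db_values.update(file_values)
    # Otherwise the file is ignored and the stored values win.
    return dict(db_values)

db = {}  # stands in for the configuration stored in the database

# First start: the file value is copied into the database.
load_config({"system.ocr": "/usr/bin/tesseract"}, db)

# Editing the file later has no effect while the database already exists.
edited = {"system.ocr": "/usr/bin/tesseract ${fileIn} ${fileOut}"}
print(load_config(edited, db)["system.ocr"])

# Recreating the database (as with hibernate.hbm2ddl=create) picks up the edit.
print(load_config(edited, db, recreate=True)["system.ocr"])
```

Recreating the schema would normally discard existing data as well, so changing the value under Administration > Configuration parameters is likely the less drastic route.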