• [solved] OCR with Tesseract on Ubuntu 14.04 - Error: Undefined OCR application

  • OpenKM has many interesting features, but requires some configuration process to show its full potential.
 #40755  by ds2k15
 
Ubuntu Server 14.04 64bit
OpenKM 6.3.1
Postgres 9.3
Tesseract 3.03



/opt/openkm-6.3.1-community/tomcat/OpenKM.cfg

system.ocr=/usr/bin/tesseract


Log in as Administrator

-> Administration -> Utilities -> Check text extraction

Upload a file or enter a document UUID

Error: Undefined OCR application



If I run Tesseract from the system console:

tesseract -l eng image001.jpg test

the text from the image is written to the file test.txt.


How can I fix this?

Thanks
Last edited by ds2k15 on Mon Oct 26, 2015 5:40 am, edited 1 time in total.
 #40758  by ds2k15
 
I set it up again. I don't know exactly what I changed (maybe it was recreating the DB).

Now I get these error messages in catalina.out:
2015-10-23 10:31:11,460 [http-bio-0.0.0.0-8080-exec-1] WARN  com.openkm.util.ExecutionUtils- Abnormal program termination: 1
2015-10-23 10:31:11,461 [http-bio-0.0.0.0-8080-exec-1] WARN  com.openkm.util.ExecutionUtils- CommandLine: [/usr/bin/tesseract]
2015-10-23 10:31:11,461 [http-bio-0.0.0.0-8080-exec-1] WARN  com.openkm.util.ExecutionUtils- STDERR: Usage:
  /usr/bin/tesseract imagename|stdin outputbase|stdout [options...] [configfile...]

OCR options:
  --tessdata-dir /path  specify location of tessdata path
  -l lang[+lang]        specify language(s) used for OCR
  -c configvar=value    set value for control parameter.
                        Multiple -c arguments are allowed.
  -psm pagesegmode      specify page segmentation mode.
These options must occur before any configfile.
 

pagesegmode values are:
  0 = Orientation and script detection (OSD) only.
  1 = Automatic page segmentation with OSD.
  2 = Automatic page segmentation, but no OSD, or OCR
  3 = Fully automatic page segmentation, but no OSD. (Default)
  4 = Assume a single column of text of variable sizes.
  5 = Assume a single uniform block of vertically aligned text.
  6 = Assume a single uniform block of text.
  7 = Treat the image as a single text line.
  8 = Treat the image as a single word.
  9 = Treat the image as a single word in a circle.
  10 = Treat the image as a single character.


Single options:
  -v --version: version info
  --list-langs: list available languages for tesseract engine. Can be used with --tessdata-dir.
  --print-parameters: print tesseract parameters to the stdout.
2015-10-23 10:31:11,462 [http-bio-0.0.0.0-8080-exec-1] WARN  com.openkm.extractor.Tesseract3TextExtractor- IO exception executing command: /usr/bin/tesseract

java.io.FileNotFoundException: /opt/openkm-6.3.1-community/tomcat/temp/okm3445709910616376817.txt (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:146)
        at java.io.FileInputStream.<init>(FileInputStream.java:101)
        at com.openkm.extractor.Tesseract3TextExtractor.doOcr(Tesseract3TextExtractor.java:152)
        at com.openkm.extractor.Tesseract3TextExtractor.extractText(Tesseract3TextExtractor.java:99)
        at com.openkm.extractor.Tesseract3TextExtractor.extractText(Tesseract3TextExtractor.java:82)
        at com.openkm.extractor.Tesseract3TextExtractor.extractText(Tesseract3TextExtractor.java:59)
        at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:214)
        at com.openkm.servlet.admin.CheckTextExtractionServlet.doPost(CheckTextExtractionServlet.java:139)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:650)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:311)
        at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:116)
        at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:83)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
        at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
        at org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:101)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
        at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
        at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
        at org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:45)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
        at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:182)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
        at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:87)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
        at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:173)
        at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:346)
        at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:259)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:423)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1079)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:620)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:745)
In the web GUI I see this:
/opt/openkm-6.3.1-community/tomcat/temp/okm3445709910616376817.txt (No such file or directory)
 #40773  by ds2k15
 
Thanks, but I got the same error with either of these settings in OpenKM.cfg:
system.ocr=/usr/bin/tesseract ${fileIn} ${fileOut} -l eng
system.ocr=/usr/bin/tesseract ${fileIn} ${fileOut}

Then I set hibernate.hbm2ddl=create in OpenKM.cfg and reimported the language SQL script.

Now it works. Is the configuration stored in the DB?
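
The CommandLine: [/usr/bin/tesseract] entry in the catalina.out log shows Tesseract being started without an input image or output base, which is why it prints its usage text and exits with status 1. As a rough illustration only (this is not OpenKM's actual code), the sketch below assumes the ${fileIn} and ${fileOut} placeholders are filled in by plain substitution. The function name expand_ocr_command and the sample paths are hypothetical.

```python
import shlex
from string import Template
from typing import List


def expand_ocr_command(template: str, file_in: str, file_out: str) -> List[str]:
    """Turn a system.ocr-style template into an argument list.

    Assumption: placeholders are named ${fileIn} and ${fileOut}, as in the
    settings quoted in this thread. The template is split into tokens first
    and substituted afterwards, so a path containing spaces stays a single
    argument.
    """
    values = {"fileIn": file_in, "fileOut": file_out}
    tokens = shlex.split(template)
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return [Template(token).safe_substitute(values) for token in tokens]


if __name__ == "__main__":
    # A bare path has nothing to substitute, so no file arguments are passed.
    print(expand_ocr_command("/usr/bin/tesseract", "/tmp/scan.jpg", "/tmp/okm123"))
    # With placeholders, the input image and output base reach the command.
    print(expand_ocr_command(
        "/usr/bin/tesseract ${fileIn} ${fileOut} -l eng",
        "/tmp/scan.jpg",
        "/tmp/okm123",
    ))
```

Tesseract appends .txt to the output base (the console test in the first post wrote test.txt), so a run that never receives an output base leaves no .txt file behind. That would be consistent with the FileNotFoundException in the stack trace.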
