PDF text extraction

Posted: Tue Jan 14, 2020 3:08 pm
by gcosta
Good afternoon. We are using Community version 6.3.8 and have found that it does not extract text from PDF files.

When we run the text extractor test, we get the following error:
Code:
org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectForm cannot be cast to org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage
What do we still need to configure?

Thanks.

Re: PDF text extraction

Posted: Sat Jan 18, 2020 10:47 am
by jllort
Can you share a PDF file that fails so we can run a test on our side?
And, if possible, the full trace of the error (the catalina.log file).

Re: PDF text extraction

Posted: Thu Jan 23, 2020 4:00 pm
by gcosta
Good afternoon, and thanks for the reply. Below is the log output from running the text extractor test.

As for the file, if you have a private place where I can upload it, please let me know.

Thanks.
Code:
StdErr: 
2020-01-23 16:54:28,866 [http-nio-0.0.0.0-8020-exec-8] [] WARN  com.openkm.util.ReportUtils - Report '7' has no params.xml file
2020-01-23 16:54:57,812 [http-nio-0.0.0.0-8020-exec-2] [] WARN  c.openkm.extractor.PdfTextExtractor - PDF does not contains text layer
2020-01-23 16:54:57,814 [http-nio-0.0.0.0-8020-exec-2] [] WARN  c.openkm.extractor.PdfTextExtractor - Failed to extract PDF text content
java.lang.ClassCastException: org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectForm cannot be cast to org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage
	at com.openkm.extractor.PdfTextExtractor.extractText(PdfTextExtractor.java:145) ~[classes/:6.3.8]
	at com.openkm.extractor.RegisteredExtractors.getText(RegisteredExtractors.java:164) [classes/:6.3.8]
	at com.openkm.servlet.admin.CheckTextExtractionServlet.doPost(CheckTextExtractionServlet.java:133) [classes/:6.3.8]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:661) [servlet-api.jar:na]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:742) [servlet-api.jar:na]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231) [catalina.jar:8.5.24]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:8.5.24]
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52) [tomcat-websocket.jar:8.5.24]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:8.5.24]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:8.5.24]
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:103) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:154) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:45) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:199) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter.doFilterInternal(WebAsyncManagerIntegrationFilter.java:50) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:106) [spring-web-3.2.18.RELEASE.jar:3.2.18.RELEASE]
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:87) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:192) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:160) [spring-security-web-3.2.10.RELEASE.jar:na]
	at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:343) [spring-web-3.2.18.RELEASE.jar:3.2.18.RELEASE]
	at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:260) [spring-web-3.2.18.RELEASE.jar:3.2.18.RELEASE]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:8.5.24]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:8.5.24]
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198) [catalina.jar:8.5.24]
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96) [catalina.jar:8.5.24]
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:504) [catalina.jar:8.5.24]
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140) [catalina.jar:8.5.24]
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81) [catalina.jar:8.5.24]
	at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:650) [catalina.jar:8.5.24]
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87) [catalina.jar:8.5.24]
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342) [catalina.jar:8.5.24]
	at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:803) [tomcat-coyote.jar:8.5.24]
	at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66) [tomcat-coyote.jar:8.5.24]
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:790) [tomcat-coyote.jar:8.5.24]
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1459) [tomcat-coyote.jar:8.5.24]
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) [tomcat-coyote.jar:8.5.24]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_71]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_71]
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-util.jar:8.5.24]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_71]
2020-01-23 16:55:00,060 [Thread-11759] [] WARN  com.openkm.core.Cron - Crontab task mail address is empty: Return: null
<hr/>
StdOut: 
<hr/>
StdErr: 
2020-01-23 16:55:00,079 [Thread-11760] [] WARN  com.openkm.core.Cron - Crontab task mail address is empty: Return: null
<hr/>
StdOut: 
<hr/>
StdErr: 
2020-01-23 16:55:03,083 [Thread-11758] [] WARN  com.openkm.core.Cron - Crontab task mail address is empty: Return: null
<hr/>
StdOut: 
<hr/>
StdErr: 
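
Editor's note: the trace above shows a `ClassCastException` raised at `PdfTextExtractor.java:145`, where a `PDXObjectForm` was cast to `PDXObjectImage`. The warning just before it suggests the extractor found no text layer and then went on to process the objects embedded in the page. In PDFBox 1.8.x both classes are sibling subclasses of `PDXObject`, so a resource map can legitimately contain either kind. The following is a minimal sketch of the usual defensive pattern (check with `instanceof` before casting), not OpenKM's actual code. The classes `XObject`, `FormXObject`, `ImageXObject` and `XObjectFilterSketch` are hypothetical stand-ins so the example runs without PDFBox on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for PDXObject, PDXObjectForm and PDXObjectImage.
abstract class XObject {}

class FormXObject extends XObject {}

class ImageXObject extends XObject {
    final String name;

    ImageXObject(String name) {
        this.name = name;
    }
}

class XObjectFilterSketch {
    // Collects only the image objects; anything else (for example a form)
    // is skipped instead of being cast blindly.
    static List<ImageXObject> imagesOnly(Map<String, XObject> xobjects) {
        List<ImageXObject> images = new ArrayList<>();
        for (XObject candidate : xobjects.values()) {
            if (candidate instanceof ImageXObject) {
                images.add((ImageXObject) candidate);
            }
        }
        return images;
    }

    public static void main(String[] args) {
        // A page whose resources mix a form and an image, as in the failing PDF.
        Map<String, XObject> resources = new LinkedHashMap<>();
        resources.put("Fm0", new FormXObject());
        resources.put("Im0", new ImageXObject("Im0"));

        for (ImageXObject image : imagesOnly(resources)) {
            System.out.println("image: " + image.name);
        }
    }
}
```

With an unguarded cast, the `Fm0` entry would fail in the same way as the trace; with the guard it is simply ignored. Whether skipping forms is the right behaviour for OpenKM (a form can itself contain images) is a design decision this sketch does not settle.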

Re: PDF text extraction

Posted: Sat Jan 25, 2020 11:10 am
by jllort
Contact us through the contact form and include the URL of this forum thread (but without the leading http:, or the form will not let you submit the request). We will then get in touch with you.
https://www.openkm.com/es/contacto.html

Re: PDF text extraction

Posted: Mon Jan 27, 2020 9:37 am
by gcosta
OK, sent.

Thanks.

Re: PDF text extraction

Posted: Thu Jan 30, 2020 6:53 pm
by jllort
If what we sent you directly by email does not get it working, let me know which operating system you are using.

Re: PDF text extraction

Posted: Fri Jan 31, 2020 3:13 pm
by jllort
I suggest updating to the latest version (which will be released next week). Also, which JDK version are you using?

Re: PDF text extraction

Posted: Fri Jan 31, 2020 3:55 pm
by gcosta
OK, I will update next week.

As for the Java version, it is 1.8.0_71. It was updated not long ago.

Thanks.

Re: PDF text extraction

Posted: Sat Feb 01, 2020 10:25 am
by jllort
Well, that version is ancient :) it must be more than 1-2 years old for sure. I recommend installing OpenJDK (on Linux). After Oracle changed the JDK licensing, we moved to OpenJDK in all our environments (in fact, anticipating this change, we started the migration more than a year ago).
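
Editor's note: to confirm which Java runtime is actually in use before and after switching to OpenJDK, the standard system properties `java.version`, `java.vendor` and `java.vm.name` can be printed. This is a minimal sketch; the class name `JavaRuntimeInfo` is hypothetical, and the output describes only the JVM that runs this snippet, so it should be launched with the same `JAVA_HOME` that Tomcat uses.

```java
class JavaRuntimeInfo {
    // Builds a one-line description of the running JVM from standard system properties.
    static String describe() {
        return System.getProperty("java.vendor")
                + " " + System.getProperty("java.version")
                + " (" + System.getProperty("java.vm.name") + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The version part is what would read `1.8.0_71` on the runtime reported in this thread.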