
.xlsx text extractor is not working correctly (CE-6.3.9)

Posted: Fri Jun 26, 2020 2:18 pm
by aamgad@planet.com.eg
Hello,
I hope you are doing well.
I was checking text extraction for .xlsx files and ran into a problem: only the strings in the file are extracted and the numbers are ignored. This does not happen with .xls files. I have checked the source code on GitHub.
In the file document-management-system/src/main/java/com/openkm/extractor/SpreadsheetMLContentHandler.java, the method

public String getFilePattern() {
    return "xl/sharedStrings.xml";
}

makes the extractor read only xl/sharedStrings.xml from the Excel ZIP package. That part contains only strings, so numbers are ignored.
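For reference, here is a minimal sketch of what an .xlsx extractor has to do to keep the numbers. It uses only the JDK (no Apache POI) and is not OpenKM code. In an .xlsx package, sharedStrings.xml only holds the text that string cells point to by index (cells with t="s"). Numeric values, formula results and inline strings are stored directly in the worksheet parts (xl/worksheets/sheetN.xml) as <v> or <is><t> elements. The class name XlsxTextSketch and the output layout (tab between cells, one row per line) are hypothetical choices for illustration.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch, not OpenKM code: .xlsx text extraction that keeps numeric cells. */
final class XlsxTextSketch {

    /** Returns cell values as text: tab between cells, newline after each row. */
    static String extract(byte[] xlsx) throws Exception {
        List<String> shared = new ArrayList<>();
        List<byte[]> sheets = new ArrayList<>();
        // ZIP entry order is not guaranteed, so worksheets are buffered
        // until the shared-string table has been read.
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(xlsx))) {
            ZipEntry e;
            while ((e = zip.getNextEntry()) != null) {
                String name = e.getName();
                if (name.equals("xl/sharedStrings.xml")) {
                    parse(zip.readAllBytes(), new SharedStrings(shared));
                } else if (name.startsWith("xl/worksheets/") && name.endsWith(".xml")) {
                    sheets.add(zip.readAllBytes());
                }
            }
        }
        StringBuilder out = new StringBuilder();
        for (byte[] sheet : sheets) {
            parse(sheet, new Cells(shared, out));
        }
        return out.toString();
    }

    private static void parse(byte[] xml, DefaultHandler handler) throws Exception {
        SAXParserFactory f = SAXParserFactory.newInstance();
        f.setNamespaceAware(true); // so localName is "si", "c", "v", ...
        // Uploaded documents are untrusted: refuse DOCTYPEs (blocks XXE).
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        f.newSAXParser().parse(new ByteArrayInputStream(xml), handler);
    }

    /** Collects each <si> entry of xl/sharedStrings.xml, joining all of its <t> runs. */
    private static final class SharedStrings extends DefaultHandler {
        private final List<String> table;
        private final StringBuilder text = new StringBuilder();
        private boolean inT;
        private int phonetic; // <rPh> runs are reading hints, not cell text

        SharedStrings(List<String> table) { this.table = table; }

        @Override public void startElement(String uri, String local, String q, Attributes a) {
            switch (local) {
                case "si": text.setLength(0); break;
                case "rPh": phonetic++; break;
                case "t": inT = phonetic == 0; break;
                default: break;
            }
        }

        @Override public void endElement(String uri, String local, String q) {
            switch (local) {
                case "si": table.add(text.toString()); break;
                case "rPh": phonetic--; break;
                case "t": inT = false; break;
                default: break;
            }
        }

        @Override public void characters(char[] ch, int start, int len) {
            if (inT) text.append(ch, start, len);
        }
    }

    /** Walks the worksheet: t="s" cells hold a shared-string index, others hold the value itself. */
    private static final class Cells extends DefaultHandler {
        private final List<String> shared;
        private final StringBuilder out;
        private final StringBuilder value = new StringBuilder();
        private String type;
        private boolean capture;
        private boolean firstInRow;

        Cells(List<String> shared, StringBuilder out) { this.shared = shared; this.out = out; }

        @Override public void startElement(String uri, String local, String q, Attributes a) {
            switch (local) {
                case "row": firstInRow = true; break;
                case "c": type = a.getValue("t"); value.setLength(0); break;
                case "v": case "t": capture = true; break; // <t> only occurs inside inline <is>
                default: break; // e.g. <f> (formula text) is deliberately not captured
            }
        }

        @Override public void endElement(String uri, String local, String q) {
            if (local.equals("v") || local.equals("t")) {
                capture = false;
            } else if (local.equals("c")) {
                String text = value.toString();
                if (text.isEmpty()) return; // styled but empty cell
                if ("s".equals(type)) text = shared.get(Integer.parseInt(text.trim()));
                // Numbers, booleans (0/1), formula results and inline strings are emitted as stored.
                if (!firstInRow) out.append('\t');
                out.append(text);
                firstInRow = false;
            } else if (local.equals("row")) {
                out.append('\n');
            }
        }

        @Override public void characters(char[] ch, int start, int len) {
            if (capture) value.append(ch, start, len);
        }
    }
}
```

Numbers come out exactly as stored in <v>, so no number format is applied and dates appear as serial numbers. Rendering them as displayed in Excel would also require reading xl/styles.xml.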

Re: .xlsx text extractor is not working correctly (CE-6.3.9)

Posted: Sat Jun 27, 2020 8:56 am
by jllort
Try configuring the parameter system.catdoc.xls2csv ( https://docs.openkm.com/kcenter/view/ok ... eters.html ), which is used by NativeMsExcelTextExtractor. If you are on Windows, take a look at the bin folder, where the exe file might already be available. If not, search Google for "catdoc xls2csv windows".

Re: .xlsx text extractor is not working correctly (CE-6.3.9)

Posted: Sat Jun 27, 2020 2:37 pm
by aamgad@planet.com.eg
Hello,
Thank you for your reply.
NativeMsExcelTextExtractor is used for .xls files (the format used before MS Office 2007), and it is working fine.
The problem is with .xlsx files, which use a different text extractor: com.openkm.extractor.MsOffice2007TextExtractor.
This extractor only extracts strings and ignores numbers.
Thank you in advance.

Re: .xlsx text extractor is not working correctly (CE-6.3.9)

Posted: Mon Jun 29, 2020 11:34 am
by jllort
Add the issue at https://github.com/openkm/document-mana ... tem/issues . If possible, share a sample .xlsx file, and add a link to this post in the body of the issue.