.xlsx TextExtractor is not working correctly (CE-6.3.9)

  • We tried to make OpenKM as intuitive as possible, but an advice is always welcome.
 #50146  by aamgad@planet.com.eg
 
Hello,
I hope you are doing well.
I was checking text extraction for .xlsx files and ran into a problem: the extractor only picks up strings from the file and ignores numbers, which is not the case for .xls files. I have checked the source code on GitHub.
In the file document-management-system/src/main/java/com/openkm/extractor/SpreadsheetMLContentHandler.java, the method
public String getFilePattern() {
    return "xl/sharedStrings.xml";
}
reads only the sharedStrings.xml entry of the Excel zip archive. That entry contains only strings, so numbers are ignored.
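To illustrate the point, here is a minimal sketch (not OpenKM code) of how cell values are laid out in a worksheet part of an .xlsx file, which by convention sits at a path such as xl/worksheets/sheet1.xml. A cell with t="s" stores an index into sharedStrings.xml inside its <v> element, while a numeric cell stores the number itself in <v>, so an extractor that reads only sharedStrings.xml never sees it. The class name XlsxCellSketch and the method extractCells are hypothetical names chosen for this example, and the sketch takes the sheet XML as a string instead of opening a real archive. Inline strings (t="inlineStr") are not handled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of OpenKM.
class XlsxCellSketch {

    /** Returns the text of every <v> value in one worksheet part, numbers included. */
    static List<String> extractCells(String sheetXml, List<String> sharedStrings) throws Exception {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private String cellType;     // value of the t attribute on the current <c>
            private StringBuilder value; // non-null only while inside <v>

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("c".equals(qName)) {
                    cellType = atts.getValue("t");
                } else if ("v".equals(qName)) {
                    value = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (value != null) {
                    value.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("v".equals(qName)) {
                    String raw = value.toString();
                    // t="s": <v> is an index into the shared strings table.
                    // Otherwise <v> holds the value itself (for example a number).
                    out.add("s".equals(cellType)
                            ? sharedStrings.get(Integer.parseInt(raw))
                            : raw);
                    value = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(sheetXml)), handler);
        return out;
    }

    public static void main(String[] args) throws Exception {
        // One string cell (shared string index 0) and one numeric cell.
        String sheet = "<worksheet><sheetData><row>"
                + "<c r=\"A1\" t=\"s\"><v>0</v></c>"
                + "<c r=\"B1\"><v>42.5</v></c>"
                + "</row></sheetData></worksheet>";
        System.out.println(extractCells(sheet, List.of("Total"))); // prints [Total, 42.5]
    }
}
```

Only "Total" lives in sharedStrings.xml in this example; 42.5 exists only in the worksheet part, which matches the behaviour described above.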
Attachments: screen.png, screen2.png
 #50154  by jllort
 
Try configuring the parameter system.catdoc.xls2csv ( https://docs.openkm.com/kcenter/view/ok ... eters.html ), which is used by NativeMsExcelTextExtractor. If you are on Windows, take a look in the bin folder, where the exe file might already be available. If not, search Google for "catdoc xls2csv windows".
 #50161  by aamgad@planet.com.eg
 
Hello,
Thank you for your reply.
NativeMsExcelTextExtractor is used for .xls files (older than MS Office 2007) and it is working fine.
The problem is with .xlsx files, which use a different text extractor: com.openkm.extractor.MsOffice2007TextExtractor.
It only extracts strings and ignores numbers.
Thank you in advance.
