.xlsx TexteXtractor is not working correctly CE-6.3.9

aamgad@planet.com.eg
Fresh Boarder
Posts: 5
Joined: Sat Jun 06, 2020 11:39 pm

Post by aamgad@planet.com.eg »

Hello dear,
Hope you are doing great.
I was checking text extraction for .xlsx files and ran into a problem: it extracts only the strings from the file and ignores the numbers, which is not the case for .xls files. I have checked the source code on GitHub.
In the file document-management-system/src/main/java/com/openkm/extractor/SpreadsheetMLContentHandler.java, the method
    public String getFilePattern() {
        return "xl/sharedStrings.xml";
    }
reads only sharedStrings.xml from the .xlsx zip archive. That file contains only strings, so the numbers are ignored.
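For illustration, here is a minimal sketch of how both strings and numbers could be read from an .xlsx archive. It is not OpenKM's code: the class name XlsxTextSketch and the method extract are hypothetical, and it assumes the worksheet parts live under xl/worksheets/ and use unprefixed element names (c, v, si), as files written by Excel normally do. Numeric cell values are stored in the worksheet XML, not in sharedStrings.xml, so the sketch reads both parts.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of OpenKM.
public class XlsxTextSketch {

    /** Returns the text of every cell, resolving shared strings and keeping numbers. */
    public static String extract(InputStream xlsx) throws Exception {
        // Collect the relevant parts first: sharedStrings.xml may come after the sheets.
        Map<String, byte[]> parts = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
                String name = e.getName();
                if (name.equals("xl/sharedStrings.xml") || name.startsWith("xl/worksheets/sheet")) {
                    parts.put(name, zip.readAllBytes()); // reads the current entry only
                }
            }
        }

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations so untrusted files cannot trigger XXE.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

        // Shared string table: one <si> element per distinct string.
        List<String> shared = new ArrayList<>();
        byte[] sst = parts.remove("xl/sharedStrings.xml");
        if (sst != null) {
            NodeList items = parse(factory, sst).getElementsByTagName("si");
            for (int i = 0; i < items.getLength(); i++) {
                shared.add(items.item(i).getTextContent());
            }
        }

        StringBuilder out = new StringBuilder();
        for (byte[] sheet : parts.values()) {
            NodeList cells = parse(factory, sheet).getElementsByTagName("c");
            for (int i = 0; i < cells.getLength(); i++) {
                Element cell = (Element) cells.item(i);
                NodeList inline = cell.getElementsByTagName("is");
                NodeList value = cell.getElementsByTagName("v");
                String text = "";
                if (inline.getLength() > 0) {
                    text = inline.item(0).getTextContent();   // inline string cell
                } else if (value.getLength() > 0) {
                    text = value.item(0).getTextContent();    // number, or index if t="s"
                    if ("s".equals(cell.getAttribute("t"))) {
                        text = shared.get(Integer.parseInt(text.trim()));
                    }
                }
                if (!text.isEmpty()) {
                    if (out.length() > 0) out.append(' ');
                    out.append(text);
                }
            }
        }
        return out.toString();
    }

    private static Document parse(DocumentBuilderFactory factory, byte[] xml) throws Exception {
        return factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
    }
}
```

Calling XlsxTextSketch.extract(new java.io.FileInputStream("Book1.xlsx")) would return the cell text separated by spaces. Note that the values come out as stored, so dates appear as raw serial numbers and no number formatting is applied.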
Attachments: screen.png, Book1.xlsx (8.51 KiB), screen2.png

jllort
Moderator
Posts: 11123
Joined: Fri Dec 21, 2007 11:23 am
Location: Sineu - ( Illes Balears ) - Spain

Re: .xlsx TexteXtractor is not working correctly CE-6.3.9

Post by jllort »

Try configuring the parameter system.catdoc.xls2csv ( https://docs.openkm.com/kcenter/view/ok ... eters.html ), which is used by NativeMsExcelTextExtractor. If you are on Windows, look in the bin folder, where the exe file may already be available. If it is not there, search Google for "catdoc xls2csv windows".

aamgad@planet.com.eg

Re: .xlsx TexteXtractor is not working correctly CE-6.3.9

Post by aamgad@planet.com.eg »

Hello dear,
Thank you for your reply.
NativeMsExcelTextExtractor is used for .xls files (older than MS Office 2007), and it is working fine.
The problem is with .xlsx files, which use a different text extractor, com.openkm.extractor.MsOffice2007TextExtractor.
That one extracts only strings and ignores numbers.
Thank you in advance

jllort

Re: .xlsx TexteXtractor is not working correctly CE-6.3.9

Post by jllort »

Please open an issue at https://github.com/openkm/document-mana ... tem/issues . If possible, attach a sample .xlsx file. In the body of the issue, add a link to this post.
