• Check text extraction for docx issue

  • OpenKM has many interesting features, but requires some configuration process to show its full potential.
 #52935  by saleem55
 
Hello,
when I extract a docx document I get unreadable content.
Please see the attachments. Also, when I search for the content of the document, nothing is shown.
Attachments
docx trxt extraction.PNG
 #52946  by jllort
 
You must install the LibreOffice Arabic dictionary to get it working. OOTextExtractor is the LibreOffice (OpenOffice) text extractor. I think the problem is there: a language is missing in the application, which would explain why it is not able to open the file to get the content.
 #52949  by saleem55
 
jllort wrote: Sat Oct 16, 2021 6:37 pm You must install the LibreOffice Arabic dictionary to get it working. OOTextExtractor is the LibreOffice (OpenOffice) text extractor. I think the problem is there: a language is missing in the application, which would explain why it is not able to open the file to get the content.
Hello jllort,
this is an English document.
 #52956  by jllort
 
Looking at your screenshot again, the problem seems to be that this is not a docx file but a PDF file. If you look at the beginning of the raw output you will see "PDF-1.5" etc.
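A minimal sketch of that check, done outside OpenKM: a PDF file starts with the bytes "%PDF-", while a real docx is a ZIP container and starts with "PK\x03\x04". The function names below are hypothetical helpers for illustration, not part of OpenKM.

```python
# Well-known leading bytes ("magic numbers") of the two formats discussed here.
PDF_MAGIC = b"%PDF-"
ZIP_MAGIC = b"PK\x03\x04"  # a .docx file is a ZIP container


def sniff_format(header: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    if header.startswith(PDF_MAGIC):
        return "pdf"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of the file at `path` and classify them."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(8))
```

If `sniff_file` returns "pdf" for a file named with a .docx extension, the extension is misleading and a docx extractor cannot be expected to read it.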
 #53112  by ketarino
 
Hello. I have the exact same issue. It seems that when I upload a Word document (docx) it gets converted to PDF automatically, but the file extension remains the same.
 #53120  by jllort
 
OpenKM stores documents in their original format unless you have made a customization for this purpose (I suppose not). I suggest checking the type of the document before uploading it to OpenKM.
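One way to do that check before uploading, as a rough sketch: open the file as a ZIP archive and look for the main document part. This assumes the conventional part name "word/document.xml" that Word uses; `looks_like_docx` is a hypothetical helper, not an OpenKM function.

```python
import io
import zipfile


def looks_like_docx(data: bytes) -> bool:
    """Return True if `data` is a ZIP archive containing word/document.xml."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False  # e.g. a PDF saved with a .docx extension
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        # Assumption: the main part sits at the conventional location.
        return "word/document.xml" in archive.namelist()
```

For a file on disk, something like `looks_like_docx(open("report.docx", "rb").read())` would do, where the file name is only an example.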
 #53186  by silverspr
 
I also have the same issue. It is definitely a .docx file and was uploaded directly to the "check text extraction" tool under Administration, utilities. It looks like the wrong extractor is being used? OOT I have no idea why the output is indicating this as a PDF file.

Thanks
Attachments
docx 2022-01-10 124831.png
 #53194  by jllort
 
Sorry, but the image you shared is small and it is not possible to read anything in it. Try sharing a bigger one and, if possible, the document itself.
