MS Office files .doc and .xls not indexed during upload

Posted: Wed Jul 07, 2010 10:58 am
by timsen
Hi,
I noticed a problem with files saved by Microsoft Office, for example in .xls format.

I uploaded a .xls file with some unique patterns inside but I was not able to find these patterns via the "Search" function.

After I exported the file as an OpenOffice document (.ods) and uploaded it, I was able to find the unique patterns with the "Search" function.

I am running OpenKM version 4.1 (build 1683) on Debian Lenny.

Is this bug already known? Should I post it in another forum? Sorry for my question but I am new to this forum and I do not know how bug reports are handled.

Regards from Germany, Timsen

Re: MS Office files .doc and .xls not indexed during upload

Posted: Thu Jul 08, 2010 8:55 pm
by jllort
Upload the file here and we'll test it. Please also tell us which query you ran.

Re: MS Office files .doc and .xls not indexed during upload

Posted: Fri Jul 09, 2010 3:58 pm
by timsen
Hi jllort,
I uploaded two files to the current demo at http://demo.openkm.com/OpenKM/ as user07, into the folder "Thread_Excel_vs_OO". The files are called Nummeries.xls and Nummeries.otd.
[both documents have the same content]

I then switched to the "Search" tab and entered a search pattern, for example 888. The results show only the .otd file and not the .xls file, as seen in the attached screenshot.

Thanks for your reply!

Timsen

Re: MS Office files .doc and .xls not indexed during upload

Posted: Sat Jul 10, 2010 1:19 pm
by jllort
You'll have to upload them again: yesterday we upgraded the demo to 5.0 and wiped the whole repository before I saw your post.

Re: MS Office files .doc and .xls not indexed during upload

Posted: Mon Jul 12, 2010 7:33 am
by timsen
Hi,
I uploaded them a few minutes ago as .pdf, .xls and .ods. With the new version installed on your DEMO, none of the three file types seems to be indexed.
I cannot find the documents when I search for a unique pattern from the dummy content of the files.

Can you confirm that there is a problem with the indexing of new files on your DEMO environment?

Regards from Germany, Tim

Re: MS Office files .doc and .xls not indexed during upload

Posted: Mon Jul 12, 2010 2:44 pm
by jllort
Could you tell us where these documents are and which search values you entered, so that we can try it ourselves?

Re: MS Office files .doc and .xls not indexed during upload

Posted: Tue Jul 13, 2010 6:38 am
by timsen
Hi,
I created a folder "Thread_Excel_vs_OO" and placed a .pdf, an .xls and an .ods file in it. All three files have identical content, and the preview works well for each document type!

If I now search for a unique pattern that is in all three files (for example 88888), the search does not return any file, so I assume there is a problem with indexing during the upload of the files.

Does your DEMO environment get rolled back every night? The files I uploaded yesterday are gone again, so I uploaded them once more a few minutes ago.

Best regards and thanks in advance for your feedback so far!

Timsen

Re: MS Office files .doc and .xls not indexed during upload

Posted: Thu Jul 15, 2010 9:44 am
by pavila
I need these sample documents to test the text extraction, please.

Re: MS Office files .doc and .xls not indexed during upload

Posted: Thu Jul 15, 2010 1:00 pm
by timsen
Hi,
I have attached them as a .zip archive. Otherwise the problem is easy to reproduce by creating some sample files that contain a unique pattern:

1. Create a spreadsheet in OpenOffice and save it as .ods
2. Export the .ods as .pdf
3. Export the .ods as an MS .xls file
4. Upload all three files
5. Try to find the unique pattern with the search

Hope this helps!

Best regards from Germany, Timsen

Re: MS Office files .doc and .xls not indexed during upload

Posted: Thu Jul 15, 2010 2:01 pm
by jllort
There's no problem finding them on demo.openkm.com.

Simply go to the search tab and enter aaa* in the content field.

If you leave out the *, you are searching for the exact word, which in your case might be aaaaa.
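
The difference between an exact-word query and a wildcard query can be sketched with a toy index. This is only an illustration under stated assumptions, not OpenKM's actual search code: the functions `build_index` and `search` and the file names `sample.xls` and `sample.ods` are hypothetical, and the real indexer may tokenize text differently.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word token to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(name)
    return index


def search(index, query):
    """Exact-token lookup, or prefix lookup when the query ends with '*'."""
    query = query.lower()
    if query.endswith("*"):
        prefix = query[:-1]
        hits = set()
        for token, names in index.items():
            if token.startswith(prefix):
                hits |= names
        return hits
    # Without a wildcard, only a token equal to the whole query matches.
    return set(index.get(query, set()))


# Hypothetical documents with the same dummy content.
docs = {
    "sample.xls": "aaaaa 88888",
    "sample.ods": "aaaaa 88888",
}
index = build_index(docs)

print(sorted(search(index, "aaa")))    # [] because "aaa" is not a whole word
print(sorted(search(index, "aaa*")))   # ['sample.ods', 'sample.xls']
print(sorted(search(index, "aaaaa")))  # ['sample.ods', 'sample.xls']
```

In this sketch a search for 888 finds nothing in a file whose content is 88888, while 888* or 88888 finds it, which matches the behaviour described above.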

Re: MS Office files .doc and .xls not indexed during upload

Posted: Mon Jul 19, 2010 6:14 am
by timsen
Hi jllort,
thank you for your time and your testing! As you described, it works for all documents so far!

Best regards from Germany, Timsen