Open Source Document Management System | OpenKM

Posted: **Thu Feb 26, 2015 12:43 pm**

Hi OpenKM Support Team,
We have OpenKM Community Edition 6.3.0 installed on our machine. Browsers used: Mozilla Firefox, IE

In the content search we are facing the following issues:
1. The content search is not working for notepad content.
2. No search results are returned for styled text such as bold and italics.

Please let us know whether any configuration is needed so that the expected results are returned.

Regards,
Prajakta

Posted: **Fri Feb 27, 2015 5:42 pm**

Please provide some screenshots (zipped and attached to the post) and, if possible, a document that reproduces the problem.

Posted: **Mon Mar 02, 2015 10:10 am**

Please find attached screenshots.zip, containing screenshots of the OpenKM search, and notepad reference document.zip, containing the notepad file used for searching.
We were not able to attach the PDF reference document because of its size. The link to the PDF reference document is below:
http://docs.spring.io/spring/docs/2.5.x ... erence.pdf

Posted: **Fri Mar 06, 2015 5:24 pm**

Did you check whether the document has been processed by the text extractor queue? See Administration -> Stats -> pending extractor queue. Documents are not processed immediately on upload; they go into a queue and are processed later to extract their text.

In the Administration -> Crontab tab you have the task "Text extractor", which does this; you can force its execution from there.

Posted: **Wed May 13, 2015 1:05 pm**

Hi,

I am a colleague of the member who posted this issue, and I will be working on it.
After adding debug logs for the text extractor, I see the exception below in the logs:

Code:

2015-05-13 18:30:00,112 [Thread-29] INFO  com.openkm.extractor.TextExtractorWorker - processSerial.Working on {docUuid=00d1db8d-4dbd-4376-a1c1-47ddb8d851f8, docPath=/okm:trash/okmAdmin/date results.txt, docVerUuid=548b2da3-97ad-4053-93c0-ec6fd59dfbf4, date=Fri Oct 04 16:35:00 IST 2013}
2015-05-13 18:30:00,113 [Thread-29] WARN  com.openkm.extractor.TextExtractorWorker - /usr/share/apache-tomcat-7.0.53/repository/datastore/54/8b/2d/a3/548b2da3-97ad-4053-93c0-ec6fd59dfbf4 (No such file or directory)
java.io.FileNotFoundException: /usr/share/apache-tomcat-7.0.53/repository/datastore/54/8b/2d/a3/548b2da3-97ad-4053-93c0-ec6fd59dfbf4 (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:146)
        at com.openkm.module.db.stuff.FsDataStore.read(FsDataStore.java:68)
        at com.openkm.dao.NodeDocumentDAO.textExtractorHelper(NodeDocumentDAO.java:1291)
        at com.openkm.extractor.TextExtractorWorker.processSerial(TextExtractorWorker.java:138)
        at com.openkm.extractor.TextExtractorWorker.processQueue(TextExtractorWorker.java:125)
        at com.openkm.extractor.TextExtractorWorker.run(TextExtractorWorker.java:80)

Is it because of this exception that the text extractor is not completing? Please help.

Posted: **Thu May 14, 2015 2:30 pm**

It seems the document being processed is in the trash, under /okm:trash/okmAdmin. Can you go to Administration -> Utilities and run a repository check from the /okm:trash node (choose the version history check)? I suspect the version file /usr/share/apache-tomcat-7.0.53/repository/datastore/54/8b/2d/a3/548b2da3-97ad-4053-93c0-ec6fd59dfbf4 is missing, and the repository checker will tell you whether the repository has errors or not.
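For anyone who wants to check a handful of version files by hand before running the repository checker, here is a minimal sketch. It assumes the datastore layout that the path in the log suggests: the first four two-character pairs of the version UUID become nested directories, and the file is named after the full UUID. That layout is inferred from the log line, not taken from OpenKM documentation, and the helper names `datastore_path` and `missing_versions` are made up for this example.

```python
from pathlib import Path


def datastore_path(datastore_root, version_uuid):
    """Build <root>/54/8b/2d/a3/<uuid> from the first eight hex characters of the UUID.

    Assumption: this mirrors the layout seen in the log above.
    """
    prefix = version_uuid.replace("-", "")[:8]
    parts = [prefix[i:i + 2] for i in range(0, 8, 2)]
    return Path(datastore_root).joinpath(*parts, version_uuid)


def missing_versions(datastore_root, version_uuids):
    """Return the version UUIDs whose datastore file does not exist on disk."""
    return [
        uuid for uuid in version_uuids
        if not datastore_path(datastore_root, uuid).is_file()
    ]


root = "/usr/share/apache-tomcat-7.0.53/repository/datastore"
uuid = "548b2da3-97ad-4053-93c0-ec6fd59dfbf4"
print(datastore_path(root, uuid).as_posix())
```

The docVerUuid values to feed into `missing_versions` would come from the text extractor log lines; the script only reads the filesystem and changes nothing.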

Posted: **Mon May 18, 2015 6:58 am**

We are still getting the FileNotFoundException, even after running the repository checker from Administration -> Utilities as suggested in the post above.

Code:

2015-05-18 11:35:00,022 [Thread-2481] INFO  com.openkm.extractor.TextExtractorWorker - processSerial.Working on {docUuid=00d1db8d-4dbd-4376-a1c1-47ddb8d851f8, docPath=/okm:trash/okmAdmin/date results.txt, docVerUuid=548b2da3-97ad-4053-93c0-ec6fd59dfbf4, date=Fri Oct 04 16:35:00 IST 2013}
2015-05-18 11:35:00,023 [Thread-2481] WARN  com.openkm.extractor.TextExtractorWorker - /usr/share/apache-tomcat-7.0.53/repository/datastore/54/8b/2d/a3/548b2da3-97ad-4053-93c0-ec6fd59dfbf4 (No such file or directory)
java.io.FileNotFoundException: /usr/share/apache-tomcat-7.0.53/repository/datastore/54/8b/2d/a3/548b2da3-97ad-4053-93c0-ec6fd59dfbf4 (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:146)
        at com.openkm.module.db.stuff.FsDataStore.read(FsDataStore.java:68)
        at com.openkm.dao.NodeDocumentDAO.textExtractorHelper(NodeDocumentDAO.java:1291)
        at com.openkm.extractor.TextExtractorWorker.processSerial(TextExtractorWorker.java:138)
        at com.openkm.extractor.TextExtractorWorker.processQueue(TextExtractorWorker.java:125)
        at com.openkm.extractor.TextExtractorWorker.run(TextExtractorWorker.java:80)
        at sun.reflect.GeneratedMethodAccessor856.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at bsh.Reflect.invokeOnMethod(Unknown Source)

Is there any way we can do some cleanup to resolve this issue?

Posted: **Mon May 18, 2015 7:09 am**

One more thing I want to share: I am not able to purge the trash folder. When I select any folder or file and try to delete it, it does not get deleted. I do not get any error message either.

Posted: **Tue May 19, 2015 2:33 pm**

My suggestion is to upgrade to the nightly build (integration.openkm.com). The migration process you must follow is http://wiki.openkm.com/index.php/Migrat ... 3_to_6.3.1

There was a bug when deleting documents with more than one version: the versions were not deleted in the correct order, and that caused this problem. To solve it, you should create the missing files on the hard disk (you will probably have to run the process several times until it is solved).
1- Go to Administration -> Utilities -> check repository
2- For each missing file, execute the command
touch /usr/share/apache-tomcat-7.0.53/repository/datastore/54/8b/2d/a3/548b2da3-97ad-4053-93c0-ec6fd59dfbf4

For example, if the document had 5 versions, you will probably have to run the process 5 times (apologies for this tedious bug).
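If there are many missing files, the "touch" step can be scripted. The following is a minimal sketch, not an OpenKM tool: it assumes you have already collected the missing paths from the repository check or from the log, and the function name `create_placeholders` is made up for this example. It creates an empty file for each path and skips any path that already exists, so real content is never touched. It also works on Windows, where there is no "touch" command.

```python
from pathlib import Path


def create_placeholders(paths):
    """Create an empty file for every path that does not exist yet.

    Returns the list of paths that were actually created.
    """
    created = []
    for raw in paths:
        path = Path(raw)
        if path.exists():
            continue  # never overwrite or modify an existing version file
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
        created.append(path)
    return created
```

Calling `create_placeholders([...])` with the paths reported as missing would be one pass of the process; after each pass, the repository check has to be run again to see whether further files are reported.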

Posted: **Wed May 20, 2015 4:32 am**

Thank you for your suggestion, but we have some customizations in OpenKM that would make upgrading difficult.

I have OpenKM hosted on a Windows machine, so can you tell me what command to execute in place of "touch"?

Posted: **Fri May 22, 2015 1:40 pm**

Use this:

Code:

echo $null >> d:\repository\tenant_2\datastore\6e\c5\b5\ef\6ec5b5ef-b3de-4698-bd17-abf7cb8ea099

If you've modified the 6.3.0 code, you can create a patch and apply it to 6.3.1 (the current 6.3 branch); we've made minimal changes, so it should apply without conflicts.

Posted: **Thu Jul 09, 2015 12:05 pm**

Hi,

Content search is not working properly for notepad content or for styled text such as bold and italics.
I uploaded 10 sample text files, all with the same content (Admin & Admin).
Then I tried to search for the content "Admin & Admin",
but it didn't return any documents.
Then, as you suggested,
I checked whether the documents I had just uploaded had been processed by the text extractor queue (Administration -> Stats -> pending extractor queue),

but found that the documents stay in the pending queue for around 5 minutes.
What if I don't want to force execution from Crontab -> Text extractor?

Can you please tell me
where to find the configuration that controls how often the text extractor wakes up (in our case it is every 5 minutes)?
I tried adding a new property, managed.text.extraction.pool.timeout =1 minute,
but it is not working.

Posted: **Mon Jul 13, 2015 10:14 am**

Hi,

You cannot search by exact phrase; you are searching by keywords (tokens), so your query should be a single keyword: Admin. Keep in mind that when content goes into Lucene to be processed, some characters (stop characters) etc. are removed. You can set your own Lucene analyzer.
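To make the point about tokens concrete, here is a rough illustration in Python. It is not Lucene and does not reproduce any particular Lucene analyzer; it is a deliberately simplified, ASCII-only tokenizer plus a tiny inverted index, and every name in it is made up for this example. What it shows is that the index is keyed by individual tokens, that a character such as "&" never reaches the index, and that the raw phrase is therefore not something the index can look up as one key.

```python
import re


def tokenize(text):
    """Lowercase the text and split on anything that is not an ASCII letter or digit."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def build_index(docs):
    """Map each token to the set of document names that contain it."""
    index = {}
    for name, content in docs.items():
        for token in tokenize(content):
            index.setdefault(token, set()).add(name)
    return index


docs = {"a.txt": "Admin & Admin", "b.txt": "Bold and italic text"}
index = build_index(docs)

print(tokenize("Admin & Admin"))                 # the "&" is dropped
print(sorted(index.get("admin", set())))         # lookup by a single token
print(sorted(index.get("admin & admin", set()))) # the raw phrase is not a key
```

How the real search treats a multi-word query depends on the configured analyzer and on which search view is used, so this sketch only explains why a single keyword is the safest first test.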

The first step should be to see which content has been extracted:

Code:

SELECT * FROM OKM_NODE_DOCUMENT WHERE NBS_UUID='your doc uuid here';

Also, where are you running the query: in the simple or the advanced search view? The two do not do exactly the same thing.

Posted: **Mon Jul 13, 2015 12:37 pm**

Text extraction is not working properly.
I uploaded 5 documents with the same content and
found that the text extractor marked them as extracted in the database table OKM_NODE_DOCUMENT (NDC_TEXT_EXTRACTED = T),
but the NDC_TEXT column contains NULL.
Also, for some documents I can see an entry in the OKM_NODE_BASE table with the same NBS_UUID (present in OKM_NODE_DOCUMENT).

So even though the documents are processed by the text extractor, searching is not working.

Can you please tell us the root cause of this problem and how the extraction process works internally, so that we can fix it?
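To see how many documents are in this state, a query along these lines may help. This is a sketch under assumptions: the table and column names (OKM_NODE_DOCUMENT, NBS_UUID, NDC_TEXT, NDC_TEXT_EXTRACTED) and the 'T' flag value are the ones quoted in this thread, but the in-memory SQLite table below is only a stand-in so the example runs on its own. Against the real OpenKM database, only the SELECT would be used, through whatever client that database provides.

```python
import sqlite3

# Documents flagged as extracted whose extracted text is NULL or empty.
QUERY = """
SELECT NBS_UUID
FROM OKM_NODE_DOCUMENT
WHERE NDC_TEXT_EXTRACTED = 'T'
  AND (NDC_TEXT IS NULL OR NDC_TEXT = '')
ORDER BY NBS_UUID
"""


def extracted_but_empty(conn):
    """Return the UUIDs of documents marked extracted but holding no text."""
    return [row[0] for row in conn.execute(QUERY)]


# Stand-in table with made-up rows, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OKM_NODE_DOCUMENT ("
    "NBS_UUID TEXT PRIMARY KEY, NDC_TEXT TEXT, NDC_TEXT_EXTRACTED TEXT)"
)
conn.executemany(
    "INSERT INTO OKM_NODE_DOCUMENT VALUES (?, ?, ?)",
    [
        ("doc-1", None, "T"),           # flagged extracted, no text
        ("doc-2", "admin admin", "T"),  # extracted normally
        ("doc-3", None, "F"),           # still pending
    ],
)
print(extracted_but_empty(conn))
```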

Posted: **Thu Jul 16, 2015 4:01 pm**

What kind of documents are you uploading? It seems the extractor is not working correctly for your documents.

Content search not working for notepad content and styled text
