• Content search not working for notepad content and styled text

 #31431  by Prajakta
 
Hi OpenKM Support Team,
We have OpenKM Community Edition 6.3.0 installed on our machine. Browsers used: Mozilla Firefox, IE

In the content search we are facing the following issues:
1. The content search is not working for notepad content.
2. No search results are returned for styled text such as bold and italics.

Please let us know whether any configuration is needed so that the appropriate results are returned.

Regards,
Prajakta
 #31481  by Prajakta
 
Please find attached screenshots.zip, containing screenshots of the OpenKM search, and notepad reference document.zip, containing the notepad file used for searching.
We were not able to attach the PDF reference document because of its size. The link to the PDF reference document is below:
http://docs.spring.io/spring/docs/2.5.x ... erence.pdf
Attachments:
Reference (6.22 KiB)
Screenshots (543.32 KiB)
 #31513  by jllort
 
Have you checked whether the document has been processed by the text extractor queue? See Administration -> Stats -> pending extractor queue. Documents are not processed immediately; they go into a queue and are processed later to extract the text.

In Administration -> Crontab you will find the "Text extractor" task, which does this; you can force its execution from there.
 #39539  by ravikumar
 
Hi,

I am a colleague of the member who posted this issue, and I will be working on it.
After adding debug logs for the text extractor, I see the exception below in the logs:
Code:
2015-05-13 18:30:00,112 [Thread-29] INFO  com.openkm.extractor.TextExtractorWorker - processSerial.Working on {docUuid=00d1db8d-4dbd-4376-a1c1-47ddb8d851f8, docPath=/okm:trash/okmAdmin/date results.txt, docVerUuid=548b2da3-97ad-4053-93c0-ec6fd59dfbf4, date=Fri Oct 04 16:35:00 IST 2013}
2015-05-13 18:30:00,113 [Thread-29] WARN  com.openkm.extractor.TextExtractorWorker - /usr/share/apache-tomcat-7.0.53/repository/datastore/54/8b/2d/a3/548b2da3-97ad-4053-93c0-ec6fd59dfbf4 (No such file or directory)
java.io.FileNotFoundException: /usr/share/apache-tomcat-7.0.53/repository/datastore/54/8b/2d/a3/548b2da3-97ad-4053-93c0-ec6fd59dfbf4 (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:146)
        at com.openkm.module.db.stuff.FsDataStore.read(FsDataStore.java:68)
        at com.openkm.dao.NodeDocumentDAO.textExtractorHelper(NodeDocumentDAO.java:1291)
        at com.openkm.extractor.TextExtractorWorker.processSerial(TextExtractorWorker.java:138)
        at com.openkm.extractor.TextExtractorWorker.processQueue(TextExtractorWorker.java:125)
        at com.openkm.extractor.TextExtractorWorker.run(TextExtractorWorker.java:80)
Is this exception the reason the text extractor is not completing? Please help.
 #39559  by jllort
 
It seems the document being processed is in the trash, under /okm:trash/okmAdmin. Can you go to Administration -> Utilities and run a repository check from the /okm:trash node ( choose the version history check )? I suspect the version file /usr/share/apache-tomcat-7.0.53/repository/datastore/54/8b/2d/a3/548b2da3-97ad-4053-93c0-ec6fd59dfbf4 is missing, and the repository checker tool will show whether the repository has errors.
 #39583  by JavaDev
 
We are still getting the FileNotFoundException, even after running the repository checker from Administration -> Utilities as suggested in the post above.
Code:
2015-05-18 11:35:00,022 [Thread-2481] INFO  com.openkm.extractor.TextExtractorWorker - processSerial.Working on {docUuid=00d1db8d-4dbd-4376-a1c1-47ddb8d851f8, docPath=/okm:trash/okmAdmin/date results.txt, docVerUuid=548b2da3-97ad-4053-93c0-ec6fd59dfbf4, date=Fri Oct 04 16:35:00 IST 2013}
2015-05-18 11:35:00,023 [Thread-2481] WARN  com.openkm.extractor.TextExtractorWorker - /usr/share/apache-tomcat-7.0.53/repository/datastore/54/8b/2d/a3/548b2da3-97ad-4053-93c0-ec6fd59dfbf4 (No such file or directory)
java.io.FileNotFoundException: /usr/share/apache-tomcat-7.0.53/repository/datastore/54/8b/2d/a3/548b2da3-97ad-4053-93c0-ec6fd59dfbf4 (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:146)
        at com.openkm.module.db.stuff.FsDataStore.read(FsDataStore.java:68)
        at com.openkm.dao.NodeDocumentDAO.textExtractorHelper(NodeDocumentDAO.java:1291)
        at com.openkm.extractor.TextExtractorWorker.processSerial(TextExtractorWorker.java:138)
        at com.openkm.extractor.TextExtractorWorker.processQueue(TextExtractorWorker.java:125)
        at com.openkm.extractor.TextExtractorWorker.run(TextExtractorWorker.java:80)
        at sun.reflect.GeneratedMethodAccessor856.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at bsh.Reflect.invokeOnMethod(Unknown Source)
Is there any cleanup we can do to resolve this issue?
 #39592  by jllort
 
My suggestion is to upgrade to the nightly build ( integration.openkm.com ). The migration process you must follow is http://wiki.openkm.com/index.php/Migrat ... 3_to_6.3.1

There was a bug when deleting documents with more than one version: the versions were not deleted in the correct order, and that caused this problem. To solve it you should create the missing files on the hard disk ( you will probably have to run the process several times until it is solved ).
1- Go to Administration -> Utilities -> check repository
2- For each missing file, execute the command
touch /usr/share/apache-tomcat-7.0.53/repository/datastore/54/8b/2d/a3/548b2da3-97ad-4053-93c0-ec6fd59dfbf4

For example, if a document had 5 versions, you will probably have to run the process 5 times ( apologies for this tedious bug ).
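
The two steps above can also be scripted. What follows is only a rough sketch in Python, not an OpenKM tool: it assumes the missing paths are taken from the FileNotFoundException lines in the log, in the format shown earlier in this thread, and the function names are made up for illustration. It creates an empty file for each missing path, as the touch command does, and skips any path outside the datastore directory you pass in.

```python
import re
from pathlib import Path

# Matches the path reported by a log line such as:
# java.io.FileNotFoundException: /path/to/file (No such file or directory)
MISSING = re.compile(
    r"java\.io\.FileNotFoundException: (?P<path>.+?) \(No such file or directory\)"
)


def missing_paths(log_text):
    """Return the distinct missing file paths named in the log, in order."""
    seen = []
    for match in MISSING.finditer(log_text):
        path = match.group("path")
        if path not in seen:
            seen.append(path)
    return seen


def create_placeholders(paths, datastore_root):
    """Create an empty file for each missing path under datastore_root.

    Paths outside datastore_root are skipped, so a stray log line
    cannot cause files to be created elsewhere on the disk.
    """
    root = Path(datastore_root).resolve()
    created = []
    for raw in paths:
        target = Path(raw).resolve()
        if root not in target.parents:
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch(exist_ok=True)  # same effect as the touch command
        created.append(str(target))
    return created
```

To use it, read the Tomcat log into a string, pass it to missing_paths, and pass the result to create_placeholders together with the datastore directory. Since each run of the extractor may report a further missing version, the log may need to be re-read and the script repeated, as described above.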
 #39617  by jllort
 
Use this:
Code:
echo $null >> d:\repository\tenant_2\datastore\6e\c5\b5\ef\6ec5b5ef-b3de-4698-bd17-abf7cb8ea099
If you've modified the 6.3.0 code, you can create a patch and apply it to 6.3.1 ( the current 6.3 branch ). We've made minimal changes, so it should apply without conflicts.
 #40070  by Prajakta
 
Hi,

Content search is not working properly for notepad content or for styled text such as bold and italics.
I uploaded 10 sample .text files, all with the same content ( Admin & Admin ).
Then I searched for the content "Admin & Admin", but no documents were returned.
As you suggested, I then checked whether the recently uploaded documents had been processed by the text extractor queue ( Administration -> Stats -> pending extractor queue ).

I found that the documents stay in the pending queue for around 5 minutes.
What if I don't want to force execution from Crontab -> Text extractor?

Can you please tell me where to find the setting that controls how often the text extractor wakes up ( in our case it is 5 minutes )?
I tried adding a new property, managed.text.extraction.pool.timeout =1 minute, but it is not working.
 #40094  by jllort
 
Hi

You cannot search by exact phrase; you're searching by keywords ( tokens ), so your query should be a single keyword: Admin. Keep in mind that when content goes into Lucene to be processed, some characters are removed ( stop characters ), etc. You can set your own Lucene analyzer.

The first step should be to see which content has been extracted:
Code:
SELECT * FROM OKM_NODE_DOCUMENT WHERE NBS_UUID='your doc uuid here';
Also, where are you running the query: in the simple or the advanced search view? They do not behave exactly the same.
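
To make the point about keywords concrete, here is a toy illustration in Python. It is deliberately simplified and is not the analyzer that Lucene or OpenKM uses: the tokenize rule (lowercase, keep only letters and digits) and all function names are assumptions for the example. It shows that a character such as "&" never becomes a token, so it cannot be matched, while the keyword Admin can.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation.

    A simplification for illustration only; a real Lucene analyzer
    applies its own rules.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document names that contain it."""
    index = {}
    for name, text in documents.items():
        for token in tokenize(text):
            index.setdefault(token, set()).add(name)
    return index


def search(index, query):
    """Return the documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()  # nothing searchable survived tokenization
    results = [index.get(token, set()) for token in tokens]
    return set.intersection(*results)
```

With a document whose content is "Admin & Admin", the toy index holds only the token admin. A query for "&" alone finds nothing, because no such token was ever stored. How OpenKM's own query parser treats the full string "Admin & Admin" is not modelled here.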
 #40102  by Prajakta
 
Text extraction is not working properly.
I uploaded 5 documents with the same content and found that the text extractor marked them as extracted in the OKM_NODE_DOCUMENT table ( NDC_TEXT_EXTRACTED = T ), but the NDC_TEXT column holds the value "NULL".
Also, for some documents I can see an entry in the OKM_NODE_BASE table with the same NBS_UUID as in OKM_NODE_DOCUMENT.

So even though the documents have been processed by the text extractor, searching is not working.

Can you please tell me the root cause of this problem and how the extraction process works internally, so that we can fix it?
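
The symptom described above (rows flagged as extracted but holding no text) can be listed with a single query. The sketch below is in Python and runs against an in-memory SQLite stand-in, not a real OpenKM database. The table and column names are the ones mentioned in this thread; the reduced table layout, the sample rows and the function names are made up for illustration, and the real table has more columns.

```python
import sqlite3

# Documents flagged as extracted whose stored text is NULL or blank.
QUERY = """
SELECT NBS_UUID
FROM OKM_NODE_DOCUMENT
WHERE NDC_TEXT_EXTRACTED = 'T'
  AND (NDC_TEXT IS NULL OR TRIM(NDC_TEXT) = '')
ORDER BY NBS_UUID
"""


def extracted_without_text(conn):
    """Return the UUIDs of rows marked extracted that hold no text."""
    return [row[0] for row in conn.execute(QUERY)]


def demo_connection():
    """Build an in-memory stand-in table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE OKM_NODE_DOCUMENT ("
        "NBS_UUID TEXT PRIMARY KEY, "
        "NDC_TEXT_EXTRACTED TEXT, "
        "NDC_TEXT TEXT)"
    )
    conn.executemany(
        "INSERT INTO OKM_NODE_DOCUMENT VALUES (?, ?, ?)",
        [
            ("doc-1", "T", None),           # flag set, no text stored
            ("doc-2", "T", "admin admin"),  # flag set, text stored
            ("doc-3", "F", None),           # not yet extracted
            ("doc-4", "T", "   "),          # flag set, blank text
        ],
    )
    return conn
```

Calling extracted_without_text(demo_connection()) returns only the rows whose flag is set and whose text is empty. Against the real database the SELECT may need adjusting to that database's SQL dialect.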
