We are hosting OpenKM 6.2.5 on the Amazon cloud platform, and there seems to be a problem with the text extractor. Three days ago I uploaded 5 text files to test the findByContent web service. Of the 5 files, only two had their text extracted. I confirmed this by looking at the log and at the indexes built for these 5 files: in the index, the 'message Extracted' property is set to true only for the two files that returned text matches when tested with the findByContent web service. I am not sure about the status of the remaining 3 files, that is, whether text extraction failed or is still pending. I ran the query "select * from okm_activity where act_action = 'MISC_TEXT_EXTRACTION_FAILURE'", but none of the 3 files appeared in the result, so a text extraction failure is probably not the reason.
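For reference, the failure check described above can be sketched as a small script. This is only an illustration under stated assumptions: it runs against an in-memory SQLite stand-in rather than the real OpenKM database, the act_item column and the sample rows are hypothetical, and only the okm_activity table name, the act_action column and the MISC_TEXT_EXTRACTION_FAILURE value come from the query above.

```python
import sqlite3

# Action value taken from the query in the post.
FAILURE_ACTION = "MISC_TEXT_EXTRACTION_FAILURE"


def failed_extractions(conn, action=FAILURE_ACTION):
    """Return (act_item, act_action) rows logged with the given action."""
    cur = conn.execute(
        "SELECT act_item, act_action FROM okm_activity WHERE act_action = ?",
        (action,),
    )
    return cur.fetchall()


def demo_connection():
    """Build an in-memory stand-in for okm_activity.

    Assumption: the real table has more columns; act_item and the rows
    below are hypothetical and exist only to make the sketch runnable.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE okm_activity (act_item TEXT, act_action TEXT)")
    conn.executemany(
        "INSERT INTO okm_activity VALUES (?, ?)",
        [
            ("doc-a", "CREATE_DOCUMENT"),
            ("doc-b", "MISC_TEXT_EXTRACTION_FAILURE"),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in failed_extractions(demo_connection()):
        print(row)
```

An empty result from such a check only shows that no failure was logged for a file; it does not say whether the file is still waiting in the queue.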
That leaves only one possibility: the text extraction is still pending. I checked the 'Pending Text Extraction' option under the 'Stats' feature. The screen listed only 20 results, but the count showed that more than 7000 files are pending text extraction! I couldn't confirm whether the 3 files that returned no results in the text search were part of the pending queue, because the UI showed only 20 files and I couldn't find any option to view all the files pending extraction.
Next, I checked the Crontab: the text extractor job is scheduled to run every 5 seconds, but the pending count never changes. I even invoked the job manually using the 'Execute' option of the Crontab, but still nothing changed.
My questions are:
1) How can I find a particular document's text extraction status? (Some kind of database query would be helpful.)
2) Why does the text extractor job not seem to execute? Where can I check whether it is executing as scheduled?
3) If the text extractor job is not executing as scheduled, is there any way I can execute it manually? Also, can I select a particular file for text extraction?
4) Can the text extractor be triggered from code while a file is being uploaded through the OpenKM web services?
Please share your suggestions.
