Hi guys,
I have a question. I have OCR'd PDFs that contain text. The text extraction mechanism extracted everything successfully.
When I check the extraction in the backend (Administration -> Utilities -> Check text extraction), I find plenty of extracted text for the documents. So every document was extracted successfully.
However, full-text search does not find every document. A few of them are never found.
To check this, I went back to Administration -> Utilities -> List indexes and activated "Show terms". Many documents have the extracted text as terms. But the ones I can't search for don't contain any terms either.
I also tried rebuilding the indexes a few times (Administration -> Utilities -> Rebuild indexes -> Lucene indexes), but without success. The terms for some documents are still empty.
So my question is: where do these terms for the documents come from? And do you have any idea what's going wrong here?
Regards!
Last edited by Catscratch on Mon Jul 27, 2015 6:38 am, edited 2 times in total.
