Re: OCR function, PNG works except for PDF files

Posted: Fri Apr 10, 2015 5:59 pm
by pavila
The problem was the orientation of the image inside the PDF. When you open the PDF, the image is displayed with the correct orientation, but sometimes the extracted image is rotated. I now read the image rotation information stored in the PDF and rotate the image after extraction so that it has the correct orientation.
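The fix pavila describes (read the rotation stored in the PDF, then turn the extracted image to match) can be sketched as follows. This is a minimal illustration, not OpenKM's actual code. It assumes that the rotation information is the page's /Rotate entry, which the PDF specification defines as a clockwise angle in multiples of 90 degrees, and that the extracted image is a plain grid of pixel rows. Reading /Rotate itself requires a PDF library and is left out. An image placed with a rotated transformation matrix would need its angle derived from that matrix instead. `rotate_clockwise` is a hypothetical helper name.

```python
def rotate_clockwise(pixels, degrees):
    """Rotate a 2D pixel grid (a list of rows) clockwise by a multiple of 90 degrees.

    Assumption: `degrees` is the page's /Rotate value (clockwise, multiple of 90),
    and the extracted image must be turned by the same amount to match the viewer.
    """
    if degrees % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    turns = (degrees // 90) % 4  # normalise: -90 becomes 3 turns, 450 becomes 1 turn
    for _ in range(turns):
        # One 90-degree clockwise turn: reverse the row order, then transpose.
        pixels = [list(row) for row in zip(*pixels[::-1])]
    return pixels


if __name__ == "__main__":
    image = [[1, 2],
             [3, 4]]
    print(rotate_clockwise(image, 90))   # [[3, 1], [4, 2]]
    print(rotate_clockwise(image, 270))  # [[2, 4], [1, 3]]
```

Working on a plain list of rows keeps the sketch dependency-free. A real extractor would apply the same angle with its imaging library's own rotate call before passing the image to the OCR engine.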

Posted: Mon Apr 13, 2015 7:42 am
by fsouren
Now it makes sense ;) Thanks for the effort and patience!

Posted: Fri Apr 17, 2015 12:20 pm
by pavila
You're welcome 8)

Posted: Thu Jun 04, 2015 7:44 pm
by gwaitsi
Wonderful... I discovered this feature after hitting a Java heap error, but I couldn't see any recognisable text.

I have upgraded to 6.3.1 as suggested. The app migrated and runs OK.

I can see a number of files in /usr/local/openkm/temp, for example:
okm123
okm123.txt

If I view the .txt file, I can see OCR text in both English and German, so Tesseract is working with the language options.

But what happens after the files are deleted? There are no options in the application to view or make use of the OCR'd text.
I don't see the number of search words increase, etc., so I'm not clear on what this feature provides.

Posted: Sat Jun 06, 2015 10:51 am
by jllort
Copy a document's UUID from the Properties tab.
Go to Administration -> Database query.
At the bottom right, choose "jdbc".
Execute the query: SELECT * FROM OKM_NODE_DOCUMENT WHERE NBS_UUID='the uuid you copied';

The NDC_TEXT field contains the extracted text.
The NDC_TEXT_EXTRACTED field is 'T' or 'F', indicating whether the document has already been processed by the text extractor queue or is still waiting in the queue.
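To make the meaning of those two fields concrete, here is a minimal sketch of the same lookup done from a script rather than the admin screen. It is an illustration under stated assumptions, not an OpenKM tool. The table is mocked in an in-memory SQLite database with only the three columns named above, the UUIDs are made up, and `make_demo_db` and `extraction_status` are hypothetical helper names. The real table has more columns and lives in whichever database OpenKM is configured to use, so a real script would need that database's driver instead.

```python
import sqlite3


def make_demo_db():
    """Hypothetical stand-in for OpenKM's database: only the columns named in this post."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE OKM_NODE_DOCUMENT ("
        " NBS_UUID TEXT PRIMARY KEY, NDC_TEXT TEXT, NDC_TEXT_EXTRACTED TEXT)"
    )
    return conn


def extraction_status(conn, uuid):
    """Return (extracted?, text) for one document, or None if the UUID is not found."""
    row = conn.execute(
        "SELECT NDC_TEXT_EXTRACTED, NDC_TEXT FROM OKM_NODE_DOCUMENT WHERE NBS_UUID = ?",
        (uuid,),  # bound parameter, so no hand-typed quotes around the UUID
    ).fetchone()
    if row is None:
        return None
    flag, text = row
    # 'T' = processed by the text extractor queue, 'F' = still waiting in the queue
    return flag == "T", text


if __name__ == "__main__":
    conn = make_demo_db()
    conn.executemany(
        "INSERT INTO OKM_NODE_DOCUMENT VALUES (?, ?, ?)",
        [("uuid-done", "OCR text here", "T"), ("uuid-queued", None, "F")],
    )
    print(extraction_status(conn, "uuid-done"))    # (True, 'OCR text here')
    print(extraction_status(conn, "uuid-queued"))  # (False, None)
```

Binding the UUID as a parameter avoids the quoting mistakes that are easy to make when pasting a value into a hand-written WHERE clause.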

I hope this explanation helps you get more control over what OpenKM does.