Open Source Document Management System | OpenKM

Posted: **Fri Apr 10, 2015 5:59 pm**

The problem was the image orientation inside the PDF. You see the image with the right orientation when you open the PDF, but sometimes it comes out rotated when it is extracted. I now read the image rotation info (stored in the PDF) and rotate the image after extraction to restore the right orientation.
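
To illustrate the idea, here is a minimal sketch, not the actual OpenKM code. It assumes the stored rotation is a clockwise multiple of 90 degrees (as with the PDF page /Rotate entry) and models the extracted image as a plain grid of pixel values; the function name `rotate_clockwise` is hypothetical.

```python
def rotate_clockwise(pixels, degrees):
    """Rotate a row-major pixel grid clockwise by a multiple of 90 degrees."""
    if degrees % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    turns = (degrees // 90) % 4
    for _ in range(turns):
        # Reversing the rows and transposing gives one clockwise quarter turn.
        pixels = [list(row) for row in zip(*pixels[::-1])]
    return [list(row) for row in pixels]


# Hypothetical 2x2 "image" as extracted, plus a rotation of 90 read from the PDF.
extracted = [[1, 2],
             [3, 4]]
upright = rotate_clockwise(extracted, 90)
```

With a real image library the same step would be a single rotate call; the point is only that the rotation has to be applied after extraction, before the image goes to OCR.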

Posted: **Mon Apr 13, 2015 7:42 am**

Now it makes sense

Thanks for the effort and patience!

Posted: **Fri Apr 17, 2015 12:20 pm**

You're welcome

Posted: **Thu Jun 04, 2015 7:44 pm**

Wonderful... I discovered this feature after hitting a Java heap error, but I couldn't see any recognisable text.

I have upgraded to 6.3.1 as suggested; the app has migrated and runs OK.

In /usr/local/openkm/temp I see a number of files, including .txt files:
okm123
okm123.txt

If I view the .txt file I can see OCR text in both English and German, so Tesseract is working with the language options.

But what happens after the files are deleted? There is no option in the application to view or benefit from the OCR'd text.
I don't see the search words increase, etc., so I am not clear on what this feature provides.

Posted: **Sat Jun 06, 2015 10:51 am**

Copy a document's UUID from the Properties tab.
Go to Administration -> Database query.
At the bottom right choose "jdbc".
Execute the query: SELECT * FROM OKM_NODE_DOCUMENT WHERE NBS_UUID='the uuid you copied'

The field NDC_TEXT contains the extracted text.
The field NDC_TEXT_EXTRACTED is 'T' or 'F' and indicates whether the document has already been processed by the text extractor queue or is still waiting in the queue.
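
If you would rather script the check than use the admin screen, the sketch below shows the same lookup. It is only an illustration: it uses an in-memory SQLite table as a stand-in for the real OpenKM database, the table has just the three columns mentioned above, and the helper name `extraction_status` and the sample UUIDs are made up.

```python
import sqlite3


def extraction_status(conn, uuid):
    """Report whether the text extractor has processed the given document."""
    row = conn.execute(
        "SELECT NDC_TEXT, NDC_TEXT_EXTRACTED "
        "FROM OKM_NODE_DOCUMENT WHERE NBS_UUID = ?",
        (uuid,),
    ).fetchone()
    if row is None:
        return "not found"
    text, extracted = row
    if extracted != "T":
        return "still in queue"
    return "extracted, %d characters" % len(text or "")


# Stand-in database with two made-up documents (assumption: not real OpenKM data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OKM_NODE_DOCUMENT ("
    "NBS_UUID TEXT PRIMARY KEY, NDC_TEXT TEXT, NDC_TEXT_EXTRACTED TEXT)"
)
conn.executemany(
    "INSERT INTO OKM_NODE_DOCUMENT VALUES (?, ?, ?)",
    [
        ("uuid-done", "Hallo Welt, hello world", "T"),
        ("uuid-pending", None, "F"),
    ],
)

print(extraction_status(conn, "uuid-done"))
print(extraction_status(conn, "uuid-pending"))
```

Against a real installation you would connect to whatever database OpenKM is configured to use and keep the same WHERE clause on NBS_UUID.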

I hope this explanation helps you take more control over what OpenKM does.

OCR function, PNG works except for PDF files