• OCR function works for PNG but not for PDF files

  • OpenKM has many interesting features, but requires some configuration process to show its full potential.
 #38543  by pavila
 
The problem was the image orientation inside the PDF. The image appears with the correct orientation when you open the PDF, but it is sometimes rotated when extracted. I now read the image rotation info stored in the PDF and rotate the image after extraction to restore the correct orientation.
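
To illustrate the kind of correction described above, here is a minimal sketch and not the actual OpenKM code. It assumes the rotation stored in the PDF is a multiple of 90 degrees, applied clockwise, and that the extracted image is available as a row-major grid of pixels. The function name rotate_clockwise is hypothetical.

```python
def rotate_clockwise(pixels, degrees):
    """Rotate a row-major 2D grid clockwise by a multiple of 90 degrees.

    Assumption: `degrees` is the rotation value read from the PDF and the
    extracted image came out unrotated, so applying the same clockwise
    rotation restores the orientation seen in a PDF viewer.
    """
    if degrees % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    turns = (degrees // 90) % 4
    # Copy the grid so the caller's data is never modified.
    result = [list(row) for row in pixels]
    for _ in range(turns):
        # One clockwise quarter turn: reverse the row order, then transpose.
        result = [list(row) for row in zip(*result[::-1])]
    return result
```

For a real image you would hand the same angle to whatever imaging library the extractor uses; the grid version only shows the logic.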
 #39807  by gwaitsi
 
Wonderful... I discovered this feature after hitting a Java heap error, but I couldn't see any recognisable text.

I have upgraded to 6.3.1 as suggested; the app has migrated and runs OK.

In /usr/local/openkm/temp I see a number of files such as:
okm123
okm123.txt

If I view the .txt file I can see OCR text in both English and German, so Tesseract is working with the language options.

But what happens after the files are deleted? There is no option in the application to view or benefit from the OCR'd text.
I don't see the search words increase, etc., so I am not clear on what this feature provides.
 #39828  by jllort
 
Copy a document's UUID from the Properties tab.
Go to Administration -> Database query.
At the bottom right choose "jdbc".
Execute the query: SELECT * FROM OKM_NODE_DOCUMENT WHERE NBS_UUID='the uuid you copied';

The field NDC_TEXT contains the extracted text.
The field NDC_TEXT_EXTRACTED is 'T' or 'F' and indicates whether the document has been processed by the text extractor queue or is still waiting in the queue.

I hope this explanation helps you get more control over what OpenKM does.
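
For anyone who wants to script the same check, here is a rough sketch of how the two fields can be read and interpreted. It uses an in-memory SQLite table purely as a stand-in for the real OpenKM database, so the schema is reduced to the three columns named above. The UUID values and the helper names extraction_status and demo_connection are hypothetical.

```python
import sqlite3


def extraction_status(conn, uuid):
    """Return (extracted, text) for a document UUID, or None if not found."""
    row = conn.execute(
        "SELECT NDC_TEXT_EXTRACTED, NDC_TEXT "
        "FROM OKM_NODE_DOCUMENT WHERE NBS_UUID = ?",
        (uuid,),  # bound parameter instead of pasting the UUID into the SQL
    ).fetchone()
    if row is None:
        return None
    flag, text = row
    # 'T' means the text extractor queue has processed the document.
    return flag == "T", text


def demo_connection():
    """Build a stand-in table; this is NOT the real OpenKM schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE OKM_NODE_DOCUMENT ("
        "NBS_UUID TEXT PRIMARY KEY, NDC_TEXT TEXT, NDC_TEXT_EXTRACTED TEXT)"
    )
    conn.executemany(
        "INSERT INTO OKM_NODE_DOCUMENT VALUES (?, ?, ?)",
        [
            ("uuid-done", "text found by OCR", "T"),  # already processed
            ("uuid-queued", None, "F"),               # still in the queue
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    for uuid in ("uuid-done", "uuid-queued", "uuid-missing"):
        print(uuid, extraction_status(conn, uuid))
```

Against a real installation you would replace demo_connection with a connection to the database OpenKM is configured to use, and the driver's parameter placeholder may differ from the question mark used here.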
