Re: OCR function, PNG works except for PDF files
Posted: Thu Mar 12, 2015 1:19 pm
by fsouren
Thanks for your reply! I'll dig into it.
On the other hand, I'm probably not the only one trying to index Dutch-language PDF files, I guess. (PDFs with English text work great.)
I just can't figure out where it goes wrong.
Posted: Sun Mar 15, 2015 8:10 am
by jllort
You should debug the temporary files that are created before OCR is executed. Two weeks ago we released a portable development environment:
http://sourceforge.net/projects/openkmportabledev/ My suggestion is to download it, set a breakpoint in the PDF text extractor, and step through it to see what's happening, especially with the temporary files. (Upload only one document, then use crontab -> force indexing.)
Posted: Tue Mar 17, 2015 12:38 pm
by fsouren
Could you try one more thing for me?
I have 2 scans, doc1.pdf and doc3.pdf. doc1.pdf works, doc3.pdf doesn't.
What could be the difference?
http://www.famsouren.nl/doc1.pdf
http://www.famsouren.nl/doc3.pdf
doc3.pdf works if I first convert it to PNG manually and then upload the PNG file.
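The manual PDF-to-PNG workaround can also be reproduced outside OpenKM, to check whether Tesseract itself reads the Dutch text correctly. The Python sketch below is only an illustration, not OpenKM's own extraction code. It assumes ImageMagick's `convert` and the `tesseract` command-line tool are on the PATH and that Tesseract's Dutch language data (`nld`) is installed; the helper names, the 300 DPI density and the `ocr-tmp` working directory are hypothetical choices.

```python
import shutil
import subprocess
from pathlib import Path


def pdf_to_png_command(pdf: Path, out_pattern: Path, density: int = 300) -> list[str]:
    """Build an ImageMagick command that rasterizes every PDF page to PNG."""
    # -density goes before the input file so the PDF is rendered at that DPI.
    return ["convert", "-density", str(density), str(pdf), str(out_pattern)]


def tesseract_command(png: Path, out_base: Path, lang: str = "nld") -> list[str]:
    """Build a Tesseract command; "nld" is Tesseract's language code for Dutch."""
    # Tesseract writes the recognized text to <out_base>.txt.
    return ["tesseract", str(png), str(out_base), "-l", lang]


def page_number(png: Path) -> int:
    # "page-12.png" -> 12, so pages sort numerically rather than as strings.
    return int(png.stem.rsplit("-", 1)[1])


def ocr_pdf(pdf: Path, workdir: Path, lang: str = "nld") -> str:
    """Rasterize a PDF, OCR each page and return the concatenated text."""
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdf_to_png_command(pdf, workdir / "page-%d.png"), check=True)
    texts = []
    for png in sorted(workdir.glob("page-*.png"), key=page_number):
        out_base = workdir / png.stem
        subprocess.run(tesseract_command(png, out_base, lang), check=True)
        texts.append(out_base.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(texts)


if __name__ == "__main__":
    pdf = Path("doc3.pdf")
    # Only run when both tools are installed and the sample file is present.
    if shutil.which("convert") and shutil.which("tesseract") and pdf.exists():
        print(ocr_pdf(pdf, Path("ocr-tmp")))
```

If the text printed this way is clean while OpenKM still indexes garbage, the problem more likely lies in how the PDF is handled before OCR than in Tesseract itself.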
Posted: Sat Mar 21, 2015 6:50 pm
by jllort
I've tested it in our online demo and it seems to work correctly. I've attached the extracted text here.
Do you have the latest OpenKM version? Try the nightly build, because we've corrected some issues there:
http://integration.openkm.com/ and here is information about migration:
http://wiki.openkm.com/index.php/Migration_Guide
Posted: Sun Mar 22, 2015 8:51 am
by fsouren
I've tried what you suggested and upgraded to build 8186.
But I still get a lot of garbage when indexing doc3.pdf.
Sadly, I guess I should let it go; I just can't seem to get it working.

Posted: Tue Mar 24, 2015 12:18 pm
by fsouren
Is it possible to post the settings from the demo site here, so I can compare them?
I can't seem to view them when logging in as a demo user.
Posted: Sun Mar 29, 2015 3:07 pm
by jllort
The demo is based on the professional version, not the community version (both versions share a similar base, but they are quite different).
From what you told us, you get exactly the same problem with the nightly build, right? Can you post a text file with the extracted contents here?
Posted: Mon Mar 30, 2015 7:19 am
by fsouren
Yes, exactly the same problem.
Posted: Fri Apr 03, 2015 11:19 am
by fsouren
Anyone?
Posted: Sat Apr 04, 2015 9:13 am
by pavila
So you've tested with a recent nightly build, haven't you?
Posted: Sun Apr 05, 2015 6:50 am
by fsouren
Yes, I did. I even did a clean install with Ubuntu 14.04 and the OpenKM nightly build.
The only things I did were install LibreOffice and ImageMagick, then install OpenKM and replace OpenKM.war with a nightly one.
Posted: Tue Apr 07, 2015 9:20 am
by pavila
I've made some improvements to PDF text extraction; please try tonight's nightly build.
Check that you have installed Tesseract and configured com.openkm.extractor.Tesseract3TextExtractor in registered.text.extractors. If com.openkm.extractor.CuneiformTextExtractor is present, remove it.
Posted: Tue Apr 07, 2015 10:32 am
by fsouren
Thanks for looking into this!
So I should wait until tomorrow? Build 8189 still isn't working for me.
I still get the same text as in the doc3.zip I uploaded.
Posted: Tue Apr 07, 2015 3:06 pm
by pavila
Wait until tomorrow, when the new build will be generated.
Posted: Thu Apr 09, 2015 10:18 am
by fsouren
I installed the new build and it's working great now!
Could you try to explain what the underlying problem was? (In simple English, please.)