Open Source Document Management System | OpenKM - OCR function, PNG works except for PDF files

OCR function, PNG works except for PDF files

Re: OCR function, PNG works except for PDF files

#31587 by fsouren
Thu Mar 12, 2015 1:19 pm

Thanks for your reply! I'll dig into it.

But on the other hand, I'm probably not the only one trying to index Dutch-text PDF files, I guess. (English-text PDFs work great.)
I just can't figure out where it goes wrong.

Username

fsouren

Rank

Junior Boarder

Posts

Joined

Fri Feb 20, 2015 12:22 pm

Re: OCR function, PNG works except for PDF files

#31607 by jllort
Sun Mar 15, 2015 8:10 am

You should debug the temp files that are created before OCR is executed. Two weeks ago we released a portable dev environment: http://sourceforge.net/projects/openkmportabledev/ My suggestion is to download it, set some breakpoints in the PDF text extractor, and step through it to see what's happening, especially with the tmp files. (Upload only one document, and from crontab -> force indexing.)

Username

jllort

Rank

Moderator

Posts

12185

Joined

Fri Dec 21, 2007 11:23 am

Location

Sineu - ( Illes Balears ) - Spain

Contact

Re: OCR function, PNG works except for PDF files

#31621 by fsouren
Tue Mar 17, 2015 12:38 pm

Could you try one more thing for me?
I have 2 scans, doc1.pdf and doc3.pdf. doc1.pdf works, doc3.pdf doesn't.

What could be the difference?

http://www.famsouren.nl/doc1.pdf
http://www.famsouren.nl/doc3.pdf

doc3.pdf works if I first convert it manually to PNG and then upload the PNG file.
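
For anyone wanting to reproduce the manual PDF -> PNG -> OCR route outside OpenKM, here is a minimal sketch. It only builds the command lines and does not run them. It assumes ImageMagick's `convert` and the `tesseract` CLI are installed, that the Dutch language data (`nld`) is available, and that the PDF has a single page; the helper name `build_ocr_commands` and the 300 dpi default are made up for illustration and are not what OpenKM does internally.

```python
from pathlib import Path


def build_ocr_commands(pdf_path, lang="nld", density=300):
    """Return the two argv lists for a manual PDF -> PNG -> text run.

    Nothing is executed here; pass each list to subprocess.run() yourself.
    The function name and defaults are hypothetical, for illustration only.
    """
    pdf = Path(pdf_path)
    png = pdf.with_suffix(".png")
    out_base = pdf.with_suffix("")  # tesseract appends ".txt" to this base
    # ImageMagick: rasterize the PDF; -density has to come before the input.
    convert_cmd = ["convert", "-density", str(density), str(pdf), str(png)]
    # Tesseract: -l selects the language data ("nld" is Dutch).
    tesseract_cmd = ["tesseract", str(png), str(out_base), "-l", lang]
    return convert_cmd, tesseract_cmd


if __name__ == "__main__":
    for cmd in build_ocr_commands("doc3.pdf"):
        print(" ".join(cmd))
```

Comparing the text this produces with what OpenKM indexes for the same PDF may help narrow down whether the problem is in the rasterizing step or in the OCR step.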

Re: OCR function, PNG works except for PDF files

#31665 by jllort
Sat Mar 21, 2015 6:50 pm

I've tested it in our online demo and it seems to work correctly there. I've attached the extracted text here.

Do you have the latest OpenKM version? (I mean the nightly build, because we've corrected some issues there: http://integration.openkm.com/ and information about migration is here: http://wiki.openkm.com/index.php/Migration_Guide )

Attachments

3.txt.zip

(2.01 KiB) Downloaded 290 times

Re: OCR function, PNG works except for PDF files

#31671 by fsouren
Sun Mar 22, 2015 8:51 am

I've tried what you said and upgraded to build 8186.
But I still get a lot of garbage when indexing doc3.pdf.

Sadly, I guess I should let it go; I just can't seem to get it working.

Re: OCR function, PNG works except for PDF files

#32146 by fsouren
Tue Mar 24, 2015 12:18 pm

Is it possible to post the settings from the demo site here, so I can compare them?
I can't seem to view them when logged in as a demo user.

Re: OCR function, PNG works except for PDF files

#37384 by jllort
Sun Mar 29, 2015 3:07 pm

The demo is based on the professional version, not the community one (both versions have a similar base, but they are quite different).

From what you told us, you get exactly the same problem with the nightly build, right? Can you post a text file with the extracted contents here?

Re: OCR function, PNG works except for PDF files

#37414 by fsouren
Mon Mar 30, 2015 7:19 am

Yes, exactly the same problem.

Attachments

doc3.zip

(2.19 KiB) Downloaded 263 times

Re: OCR function, PNG works except for PDF files

#38396 by fsouren
Fri Apr 03, 2015 11:19 am

Anyone?

Re: OCR function, PNG works except for PDF files

#38403 by pavila
Sat Apr 04, 2015 9:13 am

So you've tested with a recent nightly build, haven't you?

Username

pavila

Rank

Moderator

Posts

3146

Joined

Tue Dec 11, 2007 6:02 pm

Location

Alicante, Spain

Contact

Re: OCR function, PNG works except for PDF files

#38465 by fsouren
Sun Apr 05, 2015 6:50 am

Yes, I did. I even did a clean install with Ubuntu 14.04 and OpenKM nightly.
The only things I did were install LibreOffice and ImageMagick, then OpenKM, and replace OpenKM.war with a nightly one.

Re: OCR function, PNG works except for PDF files

#38473 by pavila
Tue Apr 07, 2015 9:20 am

I've made some improvements to PDF text extraction; please try tonight's nightly build.

Check that you have installed Tesseract and configured com.openkm.extractor.Tesseract3TextExtractor in registered.text.extractors. If com.openkm.extractor.CuneiformTextExtractor is present, remove it.
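
The extractor change described above can be sketched as a small helper that takes the current value of registered.text.extractors and returns the corrected one. This is only an illustration: it assumes the property holds class names separated by whitespace (for example one per line), and the function name `fix_extractors` is hypothetical. You still have to paste the result back into the OpenKM configuration yourself.

```python
TESSERACT = "com.openkm.extractor.Tesseract3TextExtractor"
CUNEIFORM = "com.openkm.extractor.CuneiformTextExtractor"


def fix_extractors(value):
    """Drop the Cuneiform extractor and make sure Tesseract 3 is listed.

    `value` is the current property text; it is assumed (not confirmed)
    to be a whitespace-separated list of class names.
    """
    names = [name for name in value.split() if name != CUNEIFORM]
    if TESSERACT not in names:
        names.append(TESSERACT)
    # Return one class name per line, keeping the original order.
    return "\n".join(names)


if __name__ == "__main__":
    current = "com.example.OtherExtractor\n" + CUNEIFORM  # made-up input
    print(fix_extractors(current))
```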

Re: OCR function, PNG works except for PDF files

#38476 by fsouren
Tue Apr 07, 2015 10:32 am

Thanks for looking into this!

So I should wait till tomorrow? Build 8189 still isn't working for me.
I still get the same text as uploaded in doc3.zip.

Re: OCR function, PNG works except for PDF files

#38478 by pavila
Tue Apr 07, 2015 3:06 pm

Yes, wait until tomorrow, when a new build will be generated.

Re: OCR function, PNG works except for PDF files

#38536 by fsouren
Thu Apr 09, 2015 10:18 am

I installed the new build and it's working great now!
Could you try to explain what the underlying problem was? (In simple English, please.)
