OCR Timeout
Posted: Thu May 03, 2012 6:05 am
by Alexires
Hi all,
Would it be possible to build in a timeout for OCR processes? With some PDFs, OCR just hangs on a page, while other PDFs are fine. A built-in timeout would keep the server from sitting under load while it waits for an OCR run that never finishes.
Ubuntu 10.10, OpenKM 5.1.9
Re: OCR Timeout
Posted: Sat May 05, 2012 8:27 am
by jllort
I think we have added it in trunk; pavila can confirm.
Re: OCR Timeout
Posted: Sat May 05, 2012 8:41 pm
by pavila
This is implemented in the 5.1 branch; try the 5.1.10 nightly build to test it. The timeout is hardcoded to 5 minutes.
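For readers wondering what such a timeout involves, here is a minimal Java sketch of running an external OCR command with a hard deadline. It is not OpenKM's actual code: the class and constant names (`OcrRunner`, `OCR_TIMEOUT_SECONDS`) are hypothetical, and it assumes Java 9+ for `ProcessBuilder.Redirect.DISCARD` and `List.of`. It waits with `Process.waitFor(long, TimeUnit)` and, if the deadline passes, kills the child with `destroyForcibly()`.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class OcrRunner {
    // Hypothetical constant mirroring the hardcoded 5-minute limit mentioned in the thread.
    static final long OCR_TIMEOUT_SECONDS = 5 * 60;

    /**
     * Starts an external command and waits at most timeoutSeconds for it.
     * Returns true if it exited in time, false if it was forcibly killed.
     */
    static boolean runWithTimeout(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);                       // merge stderr into stdout
        pb.redirectOutput(ProcessBuilder.Redirect.DISCARD); // discard output so a full pipe cannot block the child
        Process process = pb.start();
        if (process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            return true;                                    // finished before the deadline
        }
        process.destroyForcibly();                          // hung: kill it so it stops consuming CPU
        process.waitFor();                                  // wait until the killed process is gone
        return false;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java OcrRunner <command> [args...]");
            return;
        }
        boolean finished = runWithTimeout(List.of(args), OCR_TIMEOUT_SECONDS);
        System.out.println(finished ? "OCR finished" : "OCR timed out and was killed");
    }
}
```

Example: `java OcrRunner cuneiform -o out.txt page.tif`. Taking the timeout as a parameter instead of reading the constant inside the method is what would make it straightforward to expose as an admin setting later.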
Re: OCR Timeout
Posted: Mon May 07, 2012 3:29 am
by Alexires
I think I fixed the problem by upgrading Cuneiform from 0.7.0 to 1.1.0 in Ubuntu via Aptitude. Good to know anyway. Thank you.
Re: OCR Timeout
Posted: Wed May 09, 2012 10:10 am
by pavila
Depending on the Cuneiform version and the operating system, the program may fail. Cuneiform 1.0.0 and more recent versions work fine, mostly on Ubuntu and Debian.
Re: OCR Timeout
Posted: Mon Jul 16, 2012 2:25 pm
by Alexires
Did this end up being included in 5.1.10? Can I change the timeout?
Re: OCR Timeout
Posted: Thu Jul 19, 2012 7:22 am
by jllort
As pavila said, "The timeout is hardcoded to 5 minutes." It cannot currently be changed from the administration configuration. I will add it to our feature ticket system:
http://issues.openkm.com/view.php?id=2215
Re: OCR Timeout
Posted: Tue Jul 24, 2012 3:00 pm
by Alexires
Having this in the admin configuration would be massively useful. As an example, on my system a cuneiform process rarely runs for more than 10 seconds, so a 5-minute timeout is far too long for me. When I upload many files, I have to sit there watching htop and kill the cuneiform processes that hang.
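Until the timeout is configurable, the manual htop routine could in principle be automated with an external watchdog. The Java sketch below uses the Java 9+ `ProcessHandle` API to find processes whose executable is named `cuneiform` and forcibly kill any that have run longer than a limit. The class name `CuneiformWatchdog` and the 30-second limit are hypothetical choices, not anything OpenKM provides, and the watchdog can only inspect and kill processes its user has permission to see.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

public class CuneiformWatchdog {
    // Hypothetical limit: comfortably above the ~10 s runs reported in the thread.
    static final Duration LIMIT = Duration.ofSeconds(30);

    /** True if a process started at `start` has been running longer than `limit` at `now`. */
    static boolean isOverdue(Instant start, Instant now, Duration limit) {
        return Duration.between(start, now).compareTo(limit) > 0;
    }

    /** True if the executable path's file name equals `name`, e.g. /usr/bin/cuneiform. */
    static boolean isProgram(String executablePath, String name) {
        Path fileName = Paths.get(executablePath).getFileName();
        return fileName != null && fileName.toString().equals(name);
    }

    /** Forcibly kills visible processes named `name` older than `limit`; returns how many kills were requested. */
    static int killOverdue(String name, Duration limit) {
        Instant now = Instant.now();
        List<ProcessHandle> candidates = ProcessHandle.allProcesses()
                .filter(ph -> ph.info().command().map(c -> isProgram(c, name)).orElse(false))
                .filter(ph -> ph.info().startInstant().map(s -> isOverdue(s, now, limit)).orElse(false))
                .collect(Collectors.toList());
        int killed = 0;
        for (ProcessHandle ph : candidates) {
            if (ph.destroyForcibly()) { // returns true if termination was successfully requested
                killed++;
            }
        }
        return killed;
    }

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            int n = killOverdue("cuneiform", LIMIT);
            if (n > 0) {
                System.out.println("Killed " + n + " hung cuneiform process(es)");
            }
            Thread.sleep(5_000); // poll every 5 seconds
        }
    }
}
```

Run it as the same user as OpenKM (or as root). Because it kills every long-running `cuneiform` process on the machine, a legitimately slow page would be killed as well, so the limit should sit well above normal OCR times.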
Re: OCR Timeout
Posted: Thu Jul 26, 2012 3:11 pm
by jllort
A temporary workaround could be to disable OCR while you upload and re-enable it afterwards. Image files will not be indexed while OCR is off, but the repository can be reindexed later.