Active
Project:
Aegir HTTPS
Version:
7.x-3.x-dev
Component:
Code
Priority:
Normal
Category:
Feature request
Assigned:
Unassigned
Reporter:
Created:
22 Jun 2018 at 23:33 UTC
Updated:
17 Nov 2020 at 17:50 UTC
Jump to comment: Most recent, Most recent file
Comments
Comment #2
bdragon commentedHere's an initial go at a patch.
I'm not completely happy with it, and it has not been tested, but would appreciate feedback.
Comment #3
bdragon commentedSo the theory here is as follows:
* Sites that have been verified within the last month or so shouldn't need to be reverified.
* Sites that have gone the longest without being verified should be verified first.
* We should be using the spread queue to spread the verifies out as much as possible.
Unknowns
* Not sure how the math works out with the spread queue -- maybe I should be leaving the count at all sites instead of eligible sites? -- should be fine as long as verifys actually happen
* Maybe be a bit more conservative on the math for eligible items and consider them ready for verify a few days LESS than the 7*4? Haven't thought out a scheduling simulation for different numbers of sites. Might need to have it set to (renewal window) - (schedule period)? Not sure about the math.
Comment #4
helmo commentedLooks like a good idea.
The spread queue was a fairly recent addition in #1405904: Hosting cron # of sites per run mismatch between UI and reality
It does however give me a 'PDOException: SQLSTATE[HY093]: Invalid parameter number: number of bound variables does not match number of tokens' though.
Comment #5
dpovshed commentedYep, I confirm - the same error
Comment #6
dpovshed commentedUpdated patch attached
Comment #7
dpovshed commented...
Comment #8
dpovshed commentedThis one is superseding my previous patches.
In addition here SQL request is updated - JOIN was used and the query itself converted to standard Drupal format.
This one seems finally working, let me see how it will proceed large number of sites.
Comment #9
helmo commentedNow on my dev ... seems to work.
I've committed it on the 2910575-verify-spread branch.
The $queue_only parameter could have a name that's more descriptive, but I've not sure what ... I've added some phpdoc comments to at least explain it a bit.
Comment #11
helmo commentedPS(note to self): An easy way to debug this function is
drush @hm php-eval "print_r(hosting_letsencrypt_get_sites(true, 2));"Comment #13
helmo commentedComment #15
acWould this change effectively show no sites in the queue if all sites had been verified recently?
Previously I always had 50+ sites in the queue at any given time but lately I noticed there are none. Running drush @hm php-eval "print_r(hosting_letsencrypt_get_sites(true, 2));" returns an empty array.
Comment #16
helmo commented@ac yes I think that was the idea here.
However I came to this issue because I noticed a site where the cert is about to expire in 14 days. The last verify task was 16 days ago so 12 days before the site is scheduled again in the renewal queue.
That is cutting it very close :( ... I'd like to avoid getting nervous from monitoring warnings. What about changing the 4 weeks to 2.
Comment #17
colanWhen I originally wrote this stuff, it was tricky staying between LE's rate limits and not getting sites updated in time. So any suggestions to optimize this would be welcome!
Comment #18
helmo commentedHere's the patch for that
Comment #20
ergonlogicAs Colan mentioned, when we originally wrote this, we took some shortcuts. Scheduling verify tasks without checking the certs' expiry was expedient, but perhaps a little heavy-handed. We could have the queue task check for imminent expiry (within a week, for example) and only queue the 'verify' task at that point.
Comment #21
colan+1
This would:
Setting back to Active as I'm now convinced this is what we should do (and we don't have a patch for it yet).
Comment #22
dpovshed commentedWhile the attached patch is not a rework with more modest usage of LetsEncrypt I decided to share it in case someone else with large number of files finds it useful.
The modification does two things:
- prevent the queue to be stuck by firing the same task(s) again and again - if the verification already scheduled it will not be scheduled for 2nd time;
- in case of verification has failed it will not be repeated for one week. This prevents misconfigured item to block the queue.