Problem/Motivation
Currently we run all stages of the update life cycle in one cron request. This might cause timeouts in some cases. See #3357632-2: [META] Update doc comment on BatchProcessor to specify why use the batch system for background info.
Also, because we run the complete life cycle in cron, it is not compatible with the Automated Cron module, which runs in the kernel terminate event.
Proposed resolution
Since we already have the Drush command, we should use that to run updates during cron.
- Change CronUpdateStage to not extend StageBase, but rather just fire off the Drush command to run updates. We should fire off the command but not wait for it to complete. In #3360485: Add Symfony Console command to allow running cron updates via console and by a separate user, for defense-in-depth we will change this to not require Drush.
- Change \Drupal\automatic_updates\CronUpdateStage::run to call the decorated cron service first. Since the unattended update will no longer happen directly in the web request, we do not have to worry about timeouts before the update.
- Remove automatic_updates_cron entirely and move running of the status checks into the Drush command.
- Remove CronServerValidator, because we are no longer making a sub-request to post-apply.
- Remove AutomatedCronDisabledValidator, as we should be compatible with Automated Cron. Add \Drupal\Tests\automatic_updates\Build\CoreUpdateTest::testAutomatedCron to prove we are compatible.
- Change DrushUpdateStage to directly extend UpdateStage, as CronUpdateStage won't be a stage. This means that most of the logic now in CronUpdateStage will be moved to DrushUpdateStage. Likewise, most coverage in CronUpdateStageTest will move to DrushUpdateStageTest. CronUpdateStageTest will now just test that hook_cron is invoked before the terminal command is invoked.
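The "fire off the command but not wait for it to complete" step can be sketched roughly as follows. This is a hedged illustration in Python rather than the module's actual PHP, and the spawned command is a placeholder, not the real Drush invocation:

```python
import subprocess

def launch_detached(command):
    """Start `command` in the background and return immediately.

    start_new_session=True puts the child in its own session, so it is
    not killed when the parent process (e.g. the cron request) ends.
    """
    return subprocess.Popen(
        command,
        stdout=subprocess.DEVNULL,  # don't tie the child to our pipes,
        stderr=subprocess.DEVNULL,  # or it may block or die with us
        stdin=subprocess.DEVNULL,
        start_new_session=True,
    )

# Placeholder command; the real code would invoke the Drush updater here.
process = launch_detached(["sleep", "2"])
# Control returns immediately; we never call process.wait() in cron.
```

The key design point is that cron only pays the cost of spawning the process, not of running the whole update life cycle.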
Closed out issues
This issue removes the need for
- #3360656: For web cron updates run each stage life cycle phase in a different request
- #3318587: Research alternatives to curl request on post apply event during cron updates (we won't be using curl anymore for post-apply)
Remaining tasks
- Test coverage that the drush command runs status checks.
This will likely be a functional test, because DrushTestTrait does not work with kernel tests.
Follow-up issues
- #3375940: Rename CronUpdateStage to CronUpdateRunner (just delayed to make this issue smaller)
- #3360485: Add Symfony Console command to allow running cron updates via console and by a separate user, for defense-in-depth
Since that issue will remove the requirement before beta, the current issue only does simple validation that Drush is available.
Issue fork automatic_updates-3357969
Comments
Comment #2
tedbow
Comment #3
tedbow
Comment #4
tedbow
Comment #5
tedbow
Comment #6
tedbow
Comment #7
effulgentsia commented
I think it makes a lot of sense to decouple unattended updates from cron.php and hook_cron(). Primarily so that unattended updates can run as a non-web user while cron.php runs as the web user. #3351895: Add Drush command to allow running cron updates via console and by a separate user, for defense-in-depth is starting on that separation, though I'm unclear whether it is providing the Drush command instead of or in addition to hook_cron(). My recommendation would be to do it instead of. In other words, to never run automatic updates as part of cron.php.
Scheduling the Drush command to run automatically would require adding it to the host system's crontab. If it's being done this way (and not via a web URL like cron.php), then I don't think there's a timeout to worry about. So I think we can do all the phases in a single request.
However, to support sites that can't add to their host system's crontab, we'd need to also implement something similar to the Automated Cron module, meaning our own subscriber to KernelEvents::TERMINATE that runs when the site receives traffic. Since this is a web-launched PHP request, it would only be suitable for a web-writable codebase (which I think is fine; site owners who are able to set up a non-web-writable codebase should also be able to add to their crontab), and it would be potentially subject to timeouts depending on how the web server interacts with PHP (e.g., via CGI, mod_php, php-fpm, etc.). Because of these timeouts, I agree that the KernelEvents::TERMINATE subscriber should only process one phase per request.
For a zero-traffic website that isn't running the Drush command in their crontab, we'd need to also handle the case that the first request to hit the website could be many hours after a security update is available, which could also mean that this request is itself an attack trying to exploit that vulnerability. So I think in addition to KernelEvents::TERMINATE, we'd need an early-in-the-request event that checks if unattended updates are enabled but too long has passed since we last checked for an update, in which case we should return a "site under maintenance" message and then proceed to run the KernelEvents::TERMINATE event.
Comment #8
wim leers
@effulgentsia is correct:
— https://www.php.net/manual/en/info.configuration.php#ini.max-execution-time
But there's a significantly higher amount of time needed for Automatic Updates than for pretty much every cron hook: the typical massive file copying that is necessary in the create and apply phases of the stage life cycle can easily take minutes.
See the preliminary performance test results at #3304108-13: Determine if rsync is faster than the PHP file copier for Composer stager and #3304108-16: Determine if rsync is faster than the PHP file copier for Composer stager: even for a very barebones site, it takes ~30 seconds for the begin stage to complete and another ~30 for the apply stage. That's a whole minute. That makes it very likely for this (very creative! 😁👏) use of KernelEvents::TERMINATE to exceed the time allowed by php-fpm etc.
Ah, interesting! So sites without a "proper" crontab in place which would execute the CLI command to apply automatic updates (see #3351895: Add Drush command to allow running cron updates via console and by a separate user, for defense-in-depth) would then spend effectively an entire request/response cycle on this. That would mean there is a higher likelihood of not running into timeout problems!
We could then even combine that with a <meta http-equiv="Refresh" content="120; URL… to have the browser reload the page automatically after 2 minutes (for example, exact logic/duration TBD). That would largely address the edge case concern I raised in the preceding paragraph?
Comment #9
wim leers
I checked how WordPress' auto-updater works.
(Yes, that's right, WordPress literally hardcoded lists of files and directories to add/remove per past release of WordPress. 🤯)
It seems they have gotten by just fine for the vast majority of their users? Or perhaps WP users complain not in their official issue tracker?
We validate many things already that they don't — such as file system permissions, so we pre-emptively catch many problems that WP does not seem to check.
But … their architecture has some performance/reliability/recoverability advantages that we don't:
If the copy from stage to active fails due to a timeout, we have no way to recover, other than asking the site owner to restore the DB + files from a back-up.
The reliability/recoverability part in PM/AU could be addressed by keeping track of which package versions were installed previously and are supposed to be installed now — independently of the state of the filesystem. That way, even when the composer.json|lock file does not match the actual state on the filesystem, we could redo stage as it was intended and re-copy it over active.
The performance part is not something we can truly improve in PM/AU AFAICT, since it's a fundamental requirement for security reasons, unless we start copying only the parts of the codebase that we actually know will change … which would be possible once #3343802: When determining scaffold file mappings, determine them consistently and completely, accounting for overrides is done (because Drupal core uses the drupal/core-composer-scaffold Composer plugin to copy files out of the typical locations into the web root etc). Thanks to #3331168: Limit trusted Composer plugins to a known list, allow user to add more, it's already certain that no additional Composer plugins are moving files around.
Comment #10
wim leers
Details for that last paragraph in #3343802-4: When determining scaffold file mappings, determine them consistently and completely, accounting for overrides.
Comment #11
wim leers
@tedbow posted an interesting observation in Slack:
Very interesting indeed! That reminds me of something:
max_execution_time does NOT apply to sleep() and other I/O — only to actual PHP execution. See #3357941-3: Write helpful message to FailureMarker if script execution time was exceeded during stage life cycle. Run that script locally to observe it.
— https://www.php.net/manual/en/function.set-time-limit.php (edited)
So, combining @tedbow's idea with that information:
- If in KernelEvents::TERMINATE we don't do any work but trigger a request to do the ACTUAL work and wait for that, it will NOT count towards PHP's execution time!
- It is still subject to the php-fpm etc. timeout of course, but it definitely is one layer of extra hardening.
- In KernelEvents::TERMINATE we'll basically do one request for create, one for require, one for apply and one for destroy … even if each of those takes 1 minute, the time the Terminate event handler has consumed will still be less than a second!
Comment #12
phenaproxima
I could absolutely see a contrib module (or custom code) modifying Package Manager to do exactly this. Really, all you'd have to do would be to override the Beginner and Committer services so that they do nothing at all, and then override the Stager so that it always does its thing in the active directory.
Another thing I could see a contrib case for -- using the event system to create a git snapshot of the site before the update, so that it can be rolled back if there's a failure. With some (hopefully small) changes to Composer Stager, you could probably diff the active and stage directories.
Comment #13
effulgentsia commented
Correct, but I also don't think we need to worry about max_execution_time at all, because we can call set_time_limit(0) if we want to. Are there cases where PHP can disallow set_time_limit(0) from being called and overriding max_execution_time?
The timeouts we need to worry about are the web server killing the PHP process. In the case of Apache and mod_php, I don't think there's a timeout. In the case of Apache calling PHP as a CGI script, I think Apache will time out waiting on the CGI script, and return a 504 or similar, but I don't think it actually kills the PHP script when that happens, so KernelEvents::TERMINATE can still take however long it needs to. However, I think there are cases where PHP-FPM can kill a PHP script that's taking too long, and I also don't know if there are shared hosting companies that implement hard timeouts on long-running PHP scripts as well.
Comment #14
wim leers
Yes: https://stackoverflow.com/questions/17434013/set-time-limit-has-been-dis... — so this is likely to be enab…
Hosting companies do use this:
But Dreamhost does not forbid this though.
This is why Drupal's \Drupal\Component\Utility\Environment::setTimeLimit() checks this explicitly!
IIRC stopping the request triggers the HTTP connection getting closed, which triggers PHP's shutdown functions. See the big comment block inside _batch_page() for that — introduced in #2851111: fastcgi_finish_request and shutdown functions (e.g. batch) do not play nicely together.
💯
For example, mod_fcgid defaults to a pretty aggressive 40 seconds: https://httpd.apache.org/mod_fcgid/mod/mod_fcgid.html#fcgidiotimeout
Comment #15
effulgentsia commented
Wow, that's aggressive. Nice find!
That's the IO timeout, so I think that's the case where it will return a 504, but the script can keep running, just without sending anything back to the client. However, at that point, I think Apache might consider that process to be idle, so the (default 5 minute) idle timeout might kick in, or maybe Apache will kill and restart that process before then if it needs it to serve another request?
Comment #16
wim leers
Right, understanding that on all common platforms is something we should also do. That will require a pretty significant amount of research though; there appears to be no centralized overview or authoritative article: https://duckduckgo.com/?q=understanding+php+timeout+fcgi+fpm&ia=web 😓
Comment #17
tedbow
Comment #19
tedbow
Assigning to myself to set up the boilerplate code.
Comment #20
wim leers
You seem to already have set up the boilerplate code in 8ffd5e6e? Curious what else you mean 🤓
Comment #21
wim leers
Related: see my proposal at #3318587-18: Research alternatives to curl request on post apply event during cron updates.
Comment #22
effulgentsia commented
I wonder if we can take advantage of this even in the case where someone can't set up a crontab by having KernelEvents::TERMINATE execute a background process to do the update and exit without waiting for that background process to finish. Then the web server would have no need to kill the original process, and the background process would follow the rules of a console process rather than a CGI/mod_php/php-fpm/etc. process. The answers to https://stackoverflow.com/questions/43235629/symfony-process-background-... seem promising, but I have not tested this myself. Some hosts may disallow exec and similar functions, and I don't know if proc_open to an intermediary shell process that runs a background process would work. If something like this works for a large majority of hosts, then I think I'd be in favor of saying that for the minority of hosts that disable any form of launching background processes, those sites are required to use a crontab entry.
If this works, the advantage would be that we'd only have to implement a console-based process that can do all the stages in a single request, and that same process would work the same way regardless of whether it's executed via crontab or via KernelEvents::TERMINATE.
Comment #23
wim leers
I'm not too hopeful that many shared hosting companies allow exec(), system(), proc_open() and friends.
Gathering data for that seems hard.
BUT!!!!!!!!!!!!!
php-tuf/composer-stager uses symfony/process, which … does this: … meaning that without this, NONE of Automatic Updates will work anyway. Which makes sense: we need to be able to execute the installed composer.
Which means that what @effulgentsia proposed in #22 should totally work?! 🤩🤯🥳🥳🥳
We'd just A) specify timeout NULL (meaning "no limit") and B) use the console command (well, for now Drush command) that #3351895: Add Drush command to allow running cron updates via console and by a separate user, for defense-in-depth introduced! Tada, problem solved!
Comment #24
tedbow
re #23: for users who can't set up the server crontab to run the Drush command, this would still require them to have Drush on their system to use unattended Automatic Updates?
Comment #25
effulgentsia commented
It does on my Mac. If I have the following test.php:
And the following test2.php:
Then when I visit test.php in the browser, I get the "done sleeping, exiting" response in 5 seconds, and after 20 seconds /tmp/file.txt gets created.
I don't know if some shared hosting companies have ways to prevent this or to periodically kill long-running processes, but short of such measures, I think the above is fairly robust.
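The test.php and test2.php listings did not survive extraction, but the shell trick being described can be reproduced in any language that can reach a shell. Here is a hedged Python analog (shorter sleeps than the original experiment, and a temp file instead of /tmp/file.txt):

```python
import os
import subprocess
import tempfile
import time

marker = os.path.join(tempfile.mkdtemp(), "file.txt")

# Analog of test.php launching test2.php: the subshell is backgrounded
# with '&', so `sh` exits immediately and control returns to us while
# the worker keeps running on its own.
subprocess.run(["sh", "-c", f"(sleep 1; echo -n done > {marker}) &"])

# We are back right away, before the worker has finished.
assert not os.path.exists(marker)

time.sleep(2)                      # give the orphaned worker time
assert os.path.exists(marker)      # ...and the file appears anyway
```

The shell's `&` is what detaches the worker: `sh` does not wait for background jobs before exiting, so the worker outlives both `sh` and the launching process.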
Comment #26
wim leers
#25: wow, very interesting! I didn't realize that was a shell feature.
In my experiment, I didn't use a shell, and that failed:
I'd be more worried about PHP being executed by a user that is not allowed to execute sh. IOW: that the user of the PHP process cannot access it.
Could that be as simple as is_executable() on the output of which sh? 🤔
Comment #27
tedbow
I talked with @Wim Leers yesterday about this issue.
My thoughts
I think the proposed solution is a smaller change that needs less research and will help with the problem, because it will significantly reduce the time we spend on updating in any one request and will make sure apply() is nearly the only thing that happens in one request.
I have also updated the title with a guess at what the wanted solution is.
(caveat I am not a hosting platform expert, just brainstorming 😉)
We have a long process that, if we could do it all at once without risk of timeouts, would really benefit sites that need to run the updates during the web request, especially low-traffic sites where, if we split the updating into multiple requests, it could take a long time to complete.
Most of the solutions discussed here look to solve the problem by figuring out "How can we spawn another process during a web request that will not be affected by the timeout restrictions of the web host?"
While solving that question would be very good for us, it seems like it would be in conflict with the intentions of timeout restrictions on hosting platforms, especially lower-priced ones, if not their actual configuration. It seems like hosting companies put in time restrictions so that one web request cannot take a very long time and a lot of resources.
My guess would be that we will not find a way to spawn another process that is not restricted on some level by timeout restrictions on the host. If we do find a way to spawn another process that is not subject to timeout restrictions, then it would probably be a good method for anyone who wants to work around timeout restrictions on their host and run processes as long as they want from a single request.
For that reason, I think even if we find a way that works now to spawn such a process, we should consider the possibility that hosting companies would want to figure out a way to put some time restriction on such processes.
Comment #28
tedbow
@Wim Leers: assigning to you to update the summary to reflect the discussion.
Comment #30
tedbow
Comment #31
tedbow
Comment #32
wim leers
RE #27.5: regarding violating the intentions of a host when they impose restrictions: agreed with what you wrote (and as we discussed several times several weeks ago). I just realized one more nuance though: the intentions of shared hosting companies are typically to keep resources available for serving requests.
A long-running PHP process in the background does not consume any of the available allocated web server (Apache/nginx/…) workers or PHP workers (PHP-FPM/…), nor does it trigger additional DB queries. In other words: it consumes none of the sold resources, and in particular it consumes almost no CPU.
What this long-running PHP process does consume is quite a bit of file I/O (temporarily some storage, but primarily I/O: moving bytes around) and some CPU time (not much though: primarily for computing how dependencies need to be updated, which since Composer 2 is very fast — in a Composer 1 world this would've been a hard blocker most likely).
Conclusion: it is not strongly in conflict with the intentions of a hosting company — because it's about different concerns … which means it may very well be acceptable.
Comment #33
tedbow
re #32: yeah, I agree with that.
Comment #34
tedbow
I'm going to give this a start.
Comment #36
pwolanin commented
Talking about this at DrupalCon; going to work on implementing this. I don't think we need to use the extra "sh", but in any case I don't think this is going to work on Windows, so those sites would need to run cron or the dedicated console command via the command line on a regular basis. We validated this restriction as acceptable with xjm in terms of the impact on future core releases.
Comment #37
effulgentsia commented
That would be great if possible. I don't know off-hand how PHP can put a process into the background without going through an intermediary shell process, but if there is a way, that would be great. Without putting the process into the background, I think the process would be terminated when the web request finishes (see #26).
Comment #38
pwolanin commented
Found this post that seems quite close to what we want to do:
https://stackoverflow.com/questions/50039911/how-to-run-a-background-pro...
Comment #39
pwolanin commented
OK, here's a variant on what @effulgentsia first tried. I ran this in a local Docker setup, checked the process list, and can see that we have an orphan process with a PPID of 1:
The output from the webserver:
The content of the temp file:
web/test.php:
web/test2.php:
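The web/test.php and web/test2.php listings did not survive extraction, but the "orphan process with a PPID of 1" behavior described above is the classic double fork. A rough Python equivalent (POSIX only, details assumed; this is an illustration, not the actual patch):

```python
import os
import tempfile
import time

def spawn_orphan(work):
    """Run `work` in a grandchild process that is re-parented to PID 1.

    The double fork: the child calls setsid() and forks again, then
    exits immediately. The grandchild's parent is gone, so init (PID 1)
    adopts it — matching the PPID of 1 seen in the process list.
    """
    pid = os.fork()
    if pid == 0:
        os.setsid()              # detach from the controlling terminal
        if os.fork() == 0:
            work()               # grandchild: the orphaned worker
            os._exit(0)
        os._exit(0)              # middle child exits right away
    os.waitpid(pid, 0)           # reap the middle child; no zombie left

marker = os.path.join(tempfile.mkdtemp(), "orphan.txt")
spawn_orphan(lambda: open(marker, "w").write("done"))
time.sleep(0.5)                  # let the orphan run
assert open(marker).read() == "done"
```

Because the grandchild has no living parent and its own session, the web server has nothing to kill when the original request ends, which is exactly the property the experiment above demonstrates.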
Comment #40
wim leers
Per @tedbow at #3360656-7: For web cron updates run each stage life cycle phase in a different request, this is now THE way to solve not only tighter security for those who want it, but also to prevent running into the PHP max execution time during cron/unattended updates 👍
#36: I wasn't around for that, but that's WONDERFUL news!
#39: 🤩🤩🤩👏
Comment #41
tedbow
Comment #43
wim leers
AFAICT this is the logic to spawn the background process:
We just discussed in a meeting at a high level the need for ensuring this does not run forever, both for testing and production purposes. We need to know whether the process is still doing things or not.
Ideas:
- register_shutdown_function() and writing a key/value pair to State
- Process relies on proc_open(). We can call $process->getPid(), store that PID in State, and then in any subsequent request we can use proc_get_status() to see if it's still running! (or even use the classical PID file approach)
- \Symfony\Component\Process\Process::signal() or pcntl_signal allows sending a signal to the opened process. We could make it at that time write something to State. Example code: https://www.php.net/manual/en/pcntl.example.php
- register_tick_function() 🤪

Thoughts on each:
pcntl PHP extension.
Comment #44
tedbow
@Wim Leers: thanks for the ideas!
Postponing on #3368741: Drush build test symlinks Drush, and does not install any of its dependencies, because the build test will not pass locally at least until then.
Comment #45
tedbow
Blocker committed.
Comment #46
effulgentsia commented
In case it helps with manual testing, the PHP-FPM configuration to do this is request_terminate_timeout. It would be great to incorporate that into an automated test, but DrupalCI containers don't use FPM, and I don't know what the equivalent configuration for Apache mod_php is.
Comment #47
tedbow
Postponing on #3357969: For web server dependent unattended updates run the entire life cycle in a separate process that will not be affected by hosting time limits, because this will make it easier to fix our last test failures.
Comment #48
tedbow
Comment #49
tedbow
Whoops, wrong issue: #3374753: HookCronTest is broken and doesn't test how often status checks will be run
Comment #50
tedbow#3374753: HookCronTest is broken and doesn't test how often status checks will be run was committed but now discovered #3375679: CronFrequencyValidator should not check if updates will be run via console
so leaving postponed
Comment #51
tedbow
#3375679: CronFrequencyValidator should not check if updates will be run via console was committed.
StatusCheckFailureEmailTest should pass now. \Drupal\Tests\automatic_updates\Kernel\HookCronTest::testStatusChecksRunOnCron will hopefully be the lone failure.
I think we need to change how we test this. We no longer have a hook_cron implementation in this module, so I think we could test it via CronUpdateStageTest, though it is tested somewhat here.
@phenaproxima, if you could start to look at this, that would be great. I am sure there is more to be done.
Comment #52
tedbow
Update summary
Comment #53
tedbow
Comment #54
phenaproxima
Okay, I carefully reviewed this entire thing. First of all, major kudos to @tedbow for taking this on! It's been nearly two months.
The one file I basically skipped over was DrushUpdateStageTest. It looks like essentially a rename of CronUpdateStageTest, without any actual cron runs.
I think the overall approach makes sense here. DrushUpdateStage is doing 95% of what CronUpdateStage used to do. CronUpdateStage, which will be renamed, is a relatively thin decorator around the cron service that knows to invoke the stage if there's an update waiting in the wings.
Most of my comments are about the details, making sure everything is properly commented. But if it works properly, I can get behind the architectural decisions that have been made here.
Comment #56
tedbow
@phenaproxima: I think all the threads are back to you.
Comment #57
phenaproxima
Kicking over for adjustments now that #3378774: Change package_manager_test_event_logger\EventSubscriber to log to a file to simplify it and code which asserts against it is in.
Comment #58
tedbow
@phenaproxima: I merged in 3.0.x now that we finished #3378793: Move error checking that should apply to all updates from cron build test to update assert and #3378774: Change package_manager_test_event_logger\EventSubscriber to log to a file to simplify it and code which asserts against it. Please take another look.
Comment #59
phenaproxima
Back to @tedbow for test failures...
Comment #60
tedbow
Fixed the test failure. It was because in DrushUpdateStageTest::testUpdateStageCalled() (which was moved unchanged from CronUpdateStageTest) we were mocking the event_dispatcher service to not fire validation. But now, since we are persisting that service, it doesn't work anymore.
I don't think we need to worry about not firing the validation. Maybe at the time there was some problem with one or more of the validators.
Comment #61
tedbow
DrushUpdateStageTest is still failing for some reason on DrupalCI.
Comment #62
tedbow
OK, the last test should be back to passing.
Comment #63
phenaproxima
Okay, I think I'm comfortable with where this is.
This is a major, seismic change in Automatic Updates that took months to do. It means we can't support Windows for the time being. Our class naming leaves something to be desired, and we need to get rid of the Drush integration as soon as humanly possible, since it adds a lot of complication that we'd best remove.
But this is still a big improvement -- we're finally compatible with Automated Cron, and this lays the groundwork for us to be even more secure, since updates don't have to be done via the web.
Let's ship it.
Comment #65
tedbow
Comment #66
wim leers
Epic to see this land! 🤩👏
935 additions and 1158 deletions 👈 not surprised at this epic diff stat 😅
Was very curious to see this, so I skimmed the MR and had a few questions:
Why 5 seconds? Is it related to
which I see elsewhere? 🤔
Unpostponed! 👍
Already finished by @phenaproxima! 😄
In which issue will that happen? 🤓
Comment #67
tedbow\Drupal\package_manager_test_event_logger\EventSubscriber\EventLogSubscriber::logEventInfo(). Since we maybe be for the whole update life cycle, in the case of cron update, this could take a few minutes. Basically so we don't check hundreds of times.In case of the UI updates where we fire this at the end of the update after we have seen confirmation in the UI we should never get to this wait in a passing test because it will have exited the loop right before this.
Comment #68
wim leers
Thanks! 👍