Problem/Motivation

Currently we run all stages of the update life cycle in a single cron request. This can cause timeouts in some cases. See #3357632-2: [META] Update doc comment on BatchProcessor to specify why use the batch system for background info.

Also, because we run the complete life cycle in cron, it is not compatible with the Automated Cron module, which runs on the kernel terminate event.

Proposed resolution

Since we already have the Drush command, we should use it to run updates during cron.

  1. Change CronUpdateStage so that it no longer extends StageBase, but instead just fires off the Drush command to run updates.
    We should fire off the command but not wait for it to complete.

    In #3360485: Add Symfony Console command to allow running cron updates via console and by a separate user, for defense-in-depth we will change this to not require drush.

  2. Change \Drupal\automatic_updates\CronUpdateStage::run to call the decorated cron service first. Since the unattended update will no longer happen directly in the web request we do not have to worry about timeouts before the update.
  3. Remove automatic_updates_cron entirely and move the running of status checks into the Drush command.
  4. Remove CronServerValidator because we are no longer making a sub-request to postApply
  5. Remove AutomatedCronDisabledValidator as we should be compatible with Automated Cron

    Add \Drupal\Tests\automatic_updates\Build\CoreUpdateTest::testAutomatedCron to prove we are compatible

  6. Change DrushUpdateStage to directly extend UpdateStage as CronUpdateStage won't be a stage.

    This means that most of the logic now in CronUpdateStage will be moved to DrushUpdateStage

    Likewise most coverage in CronUpdateStageTest will move to DrushUpdateStageTest

    CronUpdateStageTest will now just test that hook_cron is invoked before the terminal command is run.
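Step 1 above (fire off the command without waiting for it) can be sketched in shell terms. This is a hypothetical illustration: the `sh -c` payload stands in for the real update command, whose final name this issue has not settled.

```shell
# Hypothetical sketch of "fire off but don't wait": spawn the updater
# detached so the cron request returns immediately. The sh -c payload
# stands in for the real Drush update command.
rm -f /tmp/au-demo-done
nohup sh -c 'sleep 2; touch /tmp/au-demo-done' > /dev/null 2>&1 &
echo "cron request can return now; background PID: $!"
```

In PHP this maps to launching a Symfony Process with output disabled and no timeout, as explored in the comments below.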

Closed out issues

This issue removes the need for

  1. #3360656: For web cron updates run each stage life cycle phase in a different request
  2. #3318587: Research alternatives to curl request on post apply event during cron updates, since we won't be using curl anymore for post-apply

Remaining tasks

  1. Test coverage that the drush command runs status checks.

    This will likely be a functional test because DrushTestTrait does not work with kernel tests.

Follow-up issues

  1. #3375940: Rename CronUpdateStage to CronUpdateRunner (delayed just to make this issue smaller)
  2. #3360485: Add Symfony Console command to allow running cron updates via console and by a separate user, for defense-in-depth

    Since that issue will remove the requirement before beta, the current issue only does simple validation that Drush is available.

Comments

tedbow created an issue. See original summary.

tedbow’s picture

Issue summary: View changes
tedbow’s picture

Title: Use the cron queue worker system to run the avoid timeouts during cron » For unattended updates run each stage in a different request
tedbow’s picture

tedbow’s picture

Issue summary: View changes
tedbow’s picture

Title: For unattended updates run each stage in a different request » For unattended updates run each stage life cycle phase in a different request
effulgentsia’s picture

I think it makes a lot of sense to decouple unattended updates from cron.php and hook_cron(). Primarily so that unattended updates can run as a non-web user while cron.php runs as the web user. #3351895: Add Drush command to allow running cron updates via console and by a separate user, for defense-in-depth is starting on that separation, though I'm unclear whether it is providing the Drush command instead of or in addition to hook_cron(). My recommendation would be to do it instead of. In other words, to never run automatic updates as part of cron.php.

Scheduling the Drush command to run automatically would require adding it to the host system's crontab. If it's being done this way (and not via a web URL like cron.php), then I don't think there's a timeout to worry about. So I think we can do all the phases in a single request.

However, to support sites that can't add to their host system's crontab, we'd need to also implement something similar to the Automated Cron module, meaning our own subscriber to KernelEvents::TERMINATE that runs when the site receives traffic. Since this is a web-launched PHP request, it would only be suitable for a web-writable codebase (which I think is fine; site owners who are able to setup a non-web-writable codebase should also be able to add to their crontab), and it would be potentially subject to timeouts depending on how the web server interacts with PHP (e.g., via cgi, mod_php, php-fpm, etc.). Because of these timeouts, I agree that the KernelEvents::TERMINATE subscriber should only process one phase per request.

For a 0 traffic website that isn't running the Drush command in their crontab, we'd need to also handle the case that the first request to hit the website could be many hours after a security update is available, which could also mean that this request is itself an attack trying to exploit that vulnerability. So I think in addition to KernelEvents::TERMINATE, we'd need an early-in-the-request event that checks if unattended updates are enabled but too long has passed since we last checked for an update, in which case we should return a "site under maintenance" message and then proceed to run the KernelEvents::TERMINATE event.

wim leers’s picture

If it's being done this way (and not via a web URL like cron.php), then I don't think there's a timeout to worry about. So I think we can do all the phases in a single request.

@effulgentsia is correct:

This sets the maximum time in seconds a script is allowed to run before it is terminated by the parser. This helps prevent poorly written scripts from tying up the server. The default setting is 30. When running PHP from the command line the default setting is 0.

https://www.php.net/manual/en/info.configuration.php#ini.max-execution-time

Because of these timeouts, I agree that the

KernelEvents::TERMINATE

subscriber should only process one phase per request.

But there's a significantly higher amount of time needed for Automatic Updates than for pretty much every cron hook: to process the typical massive file copying that is necessary in the create and apply phases of the stage life cycle, it can easily take minutes.

See the preliminary performance test results at #3304108-13: Determine if rsync is faster than the PHP file copier for Composer stager and #3304108-16: Determine if rsync is faster than the PHP file copier for Composer stager: even for a very barebones site, it takes ~30 seconds for the begin stage to complete and another ~30 for the apply stage. That's a whole minute. That makes it very likely for this (very creative! 😁👏) use of KernelEvents::TERMINATE to exceed the time allowed by php-fpm, etc.

[…] in which case we should return a "site under maintenance" message and then proceed to run the KernelEvents::TERMINATE event.

Ah, interesting! So sites without a "proper" crontab in place which would execute the CLI command to apply automatic updates (see #3351895: Add Drush command to allow running cron updates via console and by a separate user, for defense-in-depth) would then spend effectively an entire request/response cycle on this. That would mean there is a higher likelihood of not running into timeout problems!

We could then even combine that with a <meta http-equiv="Refresh" content="120; URL… to have the browser reload the page automatically after 2 minutes (for example, exact logic/duration TBD). That would largely address the edge case concern I raised in the preceding paragraph?

wim leers’s picture

I checked how WordPress' auto-updater works.

  • they do have problems with timeouts — example: https://core.trac.wordpress.org/ticket/54166
  • https://github.com/WordPress/wordpress-develop/blob/trunk/src/wp-admin/i... ← they do avoid others from performing requests while the site is being updated, de facto announcing to the world that the site is being updated
  • This is the heart of their core update logic: https://github.com/WordPress/wordpress-develop/blob/d62f8581899be84fd59b... It all happens in a single request.
  • The essence of the process is documented in https://core.trac.wordpress.org/browser/trunk/src/wp-admin/includes/upda...
     * The steps for the upgrader for after the new release is downloaded and
     * unzipped is:
     *   1. Test unzipped location for select files to ensure that unzipped worked.
     *   2. Create the .maintenance file in current WordPress base.
     *   3. Copy new WordPress directory over old WordPress files.
     *   4. Upgrade WordPress to new version.
     *     4.1. Copy all files/folders other than wp-content
     *     4.2. Copy any language files to WP_LANG_DIR (which may differ from WP_CONTENT_DIR
     *     4.3. Copy any new bundled themes/plugins to their respective locations
     *   5. Delete new WordPress directory path.
     *   6. Delete .maintenance file.
     *   7. Remove old files.
     *   8. Delete 'update_core' option.
     *
     * There are several areas of failure. For instance if PHP times out before step
     * 6, then you will not be able to access any portion of your site. Also, since
     * the upgrade will not continue where it left off, you will not be able to
     * automatically remove old files and remove the 'update_core' option. This
     * isn't that bad.

    (Yes, that's right, WordPress literally hardcoded lists of files and directories to add/remove per past release of WordPress. 🤯)

  • It seems they have gotten by just fine for the vast majority of their users? Or perhaps WP users just don't complain in their official issue tracker?

    We already validate many things that they don't, such as file system permissions, so we pre-emptively catch many problems that WP does not seem to check for.

    But … their architecture has some performance/reliability/recoverability advantages that we don't:

  1. performance: they do not create a full copy of the site's codebase: they first block incoming requests, then modify the live codebase in-place 😱 → far less file I/O!
  2. reliability/recoverability: in case of problems, a rollback is possible: because they literally have lists of files + directories that are new/removed/changed between versions (at least for core) — if in our case the copying from stage to active fails due to a timeout, we have no way to recover, other than asking the site owner to restore the DB + files from a back-up.

The reliability/recoverability part in PM/AU could be addressed by keeping track of which package versions were installed previously and are supposed to be installed now — independently of the state of the filesystem. That way, even when the composer.json|lock file does not match the actual state on the filesystem, we could redo stage as it was intended and re-copy it over active.

The performance part is not something we can truly improve in PM/AU AFAICT since it's a fundamental requirement for security reasons, unless we start copying only the parts of the codebase that we actually know will change … which would be possible once #3343802: When determining scaffold file mappings, determine them consistently and completely, accounting for overrides is done (because Drupal core uses the drupal/core-composer-scaffold Composer plugin to copy files out of the typical locations into the web root etc). Thanks to #3331168: Limit trusted Composer plugins to a known list, allow user to add more, it's already certain that no additional Composer plugins are moving files around.

wim leers’s picture

@tedbow posted an interesting observation in Slack:

@wimleers one interesting idea is their spawn_cron https://github.com/WordPress/WordPress/blob/0008d8df0605c9eccf9c4b9015d3...
It seems like their version of automated_cron.
It sends a request to run cron through an HTTP request that doesn't halt page loading.
So this would also solve the problem of cron running at the end of a request with timeout issues.
We could possibly do something like this but for auto updates.
At an early part of the request, make a 2nd request out to auto-update.php that would only be responsible for updating, 1 phase at a time.
Or in KernelEvents::TERMINATE we could make a request to auto-update.php, but then that request would not have to worry about what else is going on in the page. The whole request would be for performing 1 phase (begin, stage, apply, destroy) of the update process.

Very interesting indeed! That reminds me of something: max_execution_time does NOT apply to sleep() and other I/O — only to actual PHP execution. See #3357941-3: Write helpful message to FailureMarker if script execution time was exceeded during stage life cycle; run that script locally to observe it.

Any time spent on activity that happens outside the execution of the script such as system calls using system(), stream operations, database queries, etc. is not included when determining the maximum time that the script has been running.

https://www.php.net/manual/en/function.set-time-limit.php

So, combining @tedbow's idea with that information:

  1. if in KernelEvents::TERMINATE we don’t do any work but trigger a request to do the ACTUAL work and wait for that, it will NOT count towards PHP’s execution time!
  2. that way it could easily last multiple minutes — which does not overcome the php-fpm etc timeout of course, but it definitely is one layer of extra hardening
  3. … because in the KernelEvents::TERMINATE we’ll basically do one request for create, one for require, one for apply and one for destroy … even if each of those take 1 minute, the time the Terminate event handler has consumed will still be less than a second!
phenaproxima’s picture

performance: they do not create a full copy of the site's codebase: they first block incoming requests, then modify the live codebase in-place 😱 → far less file I/O!

I could absolutely see a contrib module (or custom code) modifying Package Manager to do exactly this. Really, all you'd have to do would be to override the Beginner and Committer services so that they do nothing at all, and then override the Stager so that it always does its thing in the active directory.

reliability/recoverability: in case of problems, a rollback is possible: because they literally have lists of files + directories that are new/removed/changed between versions (at least for core) — if in our case the copying from stage to active fails due to a timeout, we have no way to recover, other than asking the site owner to restore the DB + files from a back-up.

Another thing I could see a contrib case for -- using the event system to create a git snapshot of the site before the update, so that it can be rolled back if there's a failure. With some (hopefully small) changes to Composer Stager, you could probably diff the active and stage directories.

effulgentsia’s picture

max_execution_time does NOT apply to sleep() and other I/O

Correct, but I also don't think we need to worry about max_execution_time at all, because we can call set_time_limit(0) if we want to. Are there cases where PHP can disallow set_time_limit(0) from being called and overriding max_execution_time?

The timeouts we need to worry about are the web server killing the PHP process. In the case of Apache and mod_php, I don't think there's a timeout. In the case of Apache calling PHP as a CGI script, I think Apache will time out waiting on the CGI script, and return a 504 or similar, but I don't think it actually kills the PHP script when that happens, so KernelEvents::TERMINATE can still take however long it needs to. However, I think there are cases where PHP-FPM can kill a PHP script that's taking too long, and I also don't know if there are shared hosting companies that implement hard timeouts on long running PHP scripts as well.

Also, I don't know how different web servers handle the case of a user clicking the browser's stop ("X") button. I don't think it's common for that to propagate back to the web server and result in the web server killing the PHP script, but maybe there are some web servers / configurations where it does?

wim leers’s picture

Are there cases where PHP can disallow set_time_limit(0) from being called and overriding max_execution_time?

Yes: https://stackoverflow.com/questions/17434013/set-time-limit-has-been-dis... — so this is likely to be enabled on some hosts.

Hosting companies do use this:

  • Gandi sets it to 180 seconds and does not allow overriding it.
  • Hostgator sets it to 30 seconds and does not allow overriding it.
  • … I can probably find more

But Dreamhost does not forbid this.

This is why Drupal's \Drupal\Component\Utility\Environment::setTimeLimit() checks this explicitly!

I don't think it's common for that to propagate back to the web server and result in the web server killing the PHP script, but maybe there are some web servers / configurations where it does?

IIRC stopping the request triggers the HTTP connection getting closed, which triggers PHP's shutdown functions. See the big comment block inside _batch_page() for that — introduced in #2851111: fastcgi_finish_request and shutdown functions (e.g. batch) do not play nicely together.

The timeouts we need to worry about are the web server killing the PHP process.

💯

For example, mod_fcgid defaults to a pretty aggressive 40 seconds: https://httpd.apache.org/mod_fcgid/mod/mod_fcgid.html#fcgidiotimeout

effulgentsia’s picture

Hostgator sets it to 30 seconds and does not allow overriding it.

Wow, that's aggressive. Nice find!

mod_fcgid defaults to a pretty aggressive 40 seconds

That's the IO timeout, so I think that's the case where it will return a 504, but the script can keep running, just without sending anything back to the client. However, at that point, I think Apache might consider that process to be idle, so the (default 5 minute) idle timeout might kick in, or maybe Apache will kill and restart that process before then if it needs it to serve another request?

wim leers’s picture

That's the IO timeout, so I think […]

Right, understanding that on all common platforms is something we should also do. That will require a pretty significant amount of research, though; there appears to be no centralized overview or authoritative article: https://duckduckgo.com/?q=understanding+php+timeout+fcgi+fpm&ia=web 😓

tedbow’s picture

Issue summary: View changes

tedbow’s picture

Assigned: Unassigned » tedbow

Assigning to myself to set up the boilerplate code

wim leers’s picture

You seem to already have set up the boilerplate code in 8ffd5e6e? Curious what else you mean 🤓

wim leers’s picture

effulgentsia’s picture

If it's being done this way (and not via a web URL like cron.php), then I don't think there's a timeout to worry about. So I think we can do all the phases in a single request.

I wonder if we can take advantage of this even in the case where someone can't setup a crontab by having KernelEvents::TERMINATE execute a background process to do the update and exit without waiting for that background process to finish. Then the web server would have no need to kill the original process, and the background process would follow the rules of a console process rather than a CGI/mod_php/php-fpm/etc. process. The answers to https://stackoverflow.com/questions/43235629/symfony-process-background-... seem promising, but I have not tested this myself. Some hosts may disallow exec and similar functions, and I don't know if proc_open to an intermediary shell process that runs a background process would work. If something like this works for a large majority of hosts, then I think I'd be in favor of saying that for the minority of hosts that disable any form of launching background processes that those sites are required to use a crontab entry.

If this works, the advantage would be that we'd only have to implement a console based process that can do all the stages in a single request and that same process would work the same way regardless of whether it's executed via crontab or via KernelEvents::TERMINATE.

wim leers’s picture

I'm not too hopeful that many shared hosting companies allow exec(), system(), proc_open() and friends.

Gathering data for that seems hard.

BUT!!!!!!!!!!!!!

php-tuf/composer-stager uses symfony/process, which … does this:

    public function __construct(array $command, string $cwd = null, array $env = null, mixed $input = null, ?float $timeout = 60)
    {
        if (!\function_exists('proc_open')) {
            throw new LogicException('The Process class relies on proc_open, which is not available on your PHP installation.');
        }

… meaning that without proc_open(), NONE of Automatic Updates will work anyway. Which makes sense: we need to be able to execute the installed composer.

Which means that what @effulgentsia proposed in #22 should totally work?! 🤩🤯🥳🥳🥳

We'd just A) specify timeout NULL (meaning "no limit") and B) use the console command (well, for now Drush command) that #3351895: Add Drush command to allow running cron updates via console and by a separate user, for defense-in-depth introduced! Tada, problem solved!

tedbow’s picture

Re #23: for users who can't set up the server crontab to run the Drush command, would this still require them to have Drush on their system to use unattended Automatic Updates?

effulgentsia’s picture

I don't know if proc_open to an intermediary shell process that runs a background process would work.

It does on my Mac. If I have the following test.php:

<?php
use Symfony\Component\Process\Process;

$autoloader = require_once 'autoload.php';

$process = new Process(['sh', '-c', 'php test2.php &'], __DIR__, null, null, 0);
$process->start();
sleep(5);
print 'done sleeping, exiting';

And the following test2.php:

<?php
sleep(20);
touch("/tmp/file.txt");

Then when I visit test.php in the browser, I get the "done sleeping, exiting" response in 5 seconds, and after 20 seconds /tmp/file.txt gets created.

I don't know if some shared hosting companies have ways to prevent this or to periodically kill long-running processes, but short of such measures, I think the above is fairly robust.

wim leers’s picture

#25: wow, very interesting! I didn't realize that was a shell feature.

In my experiment, I didn't use a shell, and that failed:

    $process = new Process(['chromedriver', '--port=4444', '&'], null, null, null, 0);
    $process->start();
    var_dump($process->isStarted());
    print 'yar';
    sleep(5);
    var_dump($process->getOutput());
    sleep(5);
    var_dump($process->getOutput());
    print 'done sleeping, exiting';
    exit;

I don't know if some shared hosting companies have ways to prevent this or to periodically kill long-running processes, but short of such measures, I think the above is fairly robust.

I'd be more worried about PHP being executed by a user that is not allowed to execute sh. IOW: that the user of the PHP process cannot access

$ which sh
/bin/sh

Could that be as simple as is_executable() on the output of which sh? 🤔
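The is_executable() probe suggested here could be checked up front; a rough shell equivalent of the same test is:

```shell
# Rough shell equivalent of the proposed is_executable() probe: confirm
# the PHP user can locate and execute sh before relying on
# Process::fromShellCommandline() to spawn background processes.
sh_path="$(command -v sh)"
if [ -n "$sh_path" ] && [ -x "$sh_path" ]; then
  echo "sh is executable at $sh_path"
else
  echo "cannot execute sh; shell-based background spawning unavailable" >&2
  exit 1
fi
```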

tedbow’s picture

Title: For unattended updates run each stage life cycle phase in a different request » For web server dependent unattended updates run the entire life cycle in a separate process that will not be affected by hosting time limits
Issue tags: +Needs issue summary update

I talked with @Wim Leers yesterday about this issue

My thoughts

  1. Most (all?) of the discussion on this issue has not revolved around my MR or proposed solution.
    I think the proposed solution is a smaller change that needs less research and will help with the problem, because it will significantly reduce the time we spend on updating in any one request and will make sure apply() is nearly the only thing that happens in one request.
  2. The discussion here about using processes in a way that is not affected by hosting timeouts is good, though it's a larger change that needs more research into actual hosting setups and may have some risks.
  3. For those reasons I have moved my proposal to the newly created #3360656: For web cron updates run each stage life cycle phase in a different request. It is easier to move an MR with no discussion to a new issue than to move the discussion. I will move the MR there and close it here.
  4. I have deleted everything in this issue's summary except the Problem/Motivation section. @Wim Leers please update the summary as you see fit.

    I have also updated the title with a guess at what the wanted solution is.

  5. I think these ideas are potentially very good. The one big problem I see with them is:
    (Caveat: I am not a hosting platform expert, just brainstorming 😉)

    We have a long process; if we could do it all at once without risk of timeouts, it would really benefit sites that need to run the updates during the web request, especially low-traffic sites where splitting the updating into multiple requests could take a long time to complete.

    Most of the solutions discussed here look to solve the problem by figuring out "How can we spawn another process during a web request that will not be affected by the timeout restrictions of the web host?"

    While solving that question would be very good for us, it seems like it would be in conflict with the intentions of timeout restrictions on hosting platforms, especially lower-priced ones, if not their actual configuration. It seems like hosting companies put in time restrictions so that one web request cannot take a very long time and consume a lot of resources.

    My guess would be that we will not find a way to spawn another process that is not restricted on some level by timeout restrictions on the host. If we do find a way to spawn another process that is not subject to timeout restrictions, then it would probably become a go-to method for anyone who wants to work around timeout restrictions on their host and run processes as long as they want from a single request.

    For that reason I think even if we find a way that works now to spawn such a process, we should consider the possibility that hosting companies would want to figure out a way to put some time restriction on such processes.

tedbow’s picture

Assigned: tedbow » wim leers
Issue summary: View changes

@Wim Leers assigning to you to update the summary to reflect the discussion

tedbow’s picture

Issue summary: View changes
tedbow’s picture

wim leers’s picture

RE #27.5: regarding violating the intentions of a host when they impose restrictions: agreed with what you wrote (and as we discussed several times several weeks ago). I just realized one more nuance though: the intentions of shared hosting companies are typically to keep resources available for serving requests.

A long-running PHP process in the background does not consume any of the available allocated web server (Apache/nginx/…) workers or PHP workers (PHP-FPM/…), nor does it trigger additional DB queries. In other words: it consumes none of the sold resources, and in particular it consumes almost no CPU.

What this long-running PHP process does consume is quite a bit of file I/O (temporarily some storage, but primarily I/O: moving bytes around) and some CPU time (not much though: primarily for computing how dependencies need to be updated, which since Composer 2 is very fast — in a Composer 1 world this would've been a hard blocker most likely).

Conclusion: it is not strongly in conflict with the intentions of a hosting company — because it's about different concerns … which means it may very well be acceptable.

tedbow’s picture

Assigned: wim leers » Unassigned

re #32 yeah I agree with that

tedbow’s picture

Assigned: Unassigned » tedbow

I'm going to give this a start

pwolanin’s picture

Talking about this at DrupalCon, going to work on implementing this. I don't think we need to use the extra "sh", but in any case I don't think this is going to work on Windows, so those sites would need to run cron or the dedicated console command via the command line on a regular basis. We validated this restriction as acceptable with xjm in terms of the impact on future core releases.

effulgentsia’s picture

I don't think we need to use the extra "sh"

That would be great if possible. I don't know off-hand how PHP can put a process into the background without going through an intermediary shell process, but if there is a way that would be great. Without putting the process into the background, I think the process would be terminated when the web request finishes (see #26).

pwolanin’s picture

Found this post that seems quite close to what we want to do:
https://stackoverflow.com/questions/50039911/how-to-run-a-background-pro...

pwolanin’s picture

OK, here's a variant on what effulgentsia first tried. I ran this in a local Docker setup, checked the process list, and can see that we have an orphan process with a PPID of 1:

UID        PID  PPID  C STIME TTY          TIME CMD
...
www-data   110    84  0 20:21 ?        00:00:00 php-fpm: pool www
root       116     0  0 20:22 pts/0    00:00:00 /bin/bash
www-data   130    86  0 20:24 ?        00:00:00 nginx: worker process
www-data   131    86  0 20:24 ?        00:00:00 nginx: worker process
www-data   213     1  1 20:39 ?        00:00:00 /usr/bin/php test2.php

The output from the webserver:

PHP path /usr/bin/php 2023-06-07T20:39:03+00:00 done sleeping, exiting

The content of the temp file:

2023-06-07T20:39:13+00:00

web/test.php:

<?php
use Symfony\Component\Process\Process;
use Symfony\Component\Process\PhpExecutableFinder;

$autoloader = require_once 'autoload.php';

$phpBinaryFinder = new PhpExecutableFinder();
$phpBinaryPath = $phpBinaryFinder->find();

$process = Process::fromShellCommandline($phpBinaryPath . ' test2.php &');
$process->setWorkingDirectory(__DIR__);
$process->disableOutput();
$process->setTimeout(0);
$process->start();

sleep(1);
print "PHP path $phpBinaryPath\n";
print gmdate('c') . "\n";
print 'done sleeping, exiting';

web/test2.php:

<?php
sleep(10);
file_put_contents("/tmp/file.txt", gmdate('c') . "\n");
wim leers’s picture

Per @tedbow at #3360656-7: For web cron updates run each stage life cycle phase in a different request, this is now THE way not only to provide tighter security for those who want it, but also to prevent running into the PHP max execution time during cron/unattended updates 👍

#36: I wasn't around for that, but that's WONDERFUL news!

#39: 🤩🤩🤩👏

tedbow’s picture

Issue tags: +Pittsburgh2023

wim leers’s picture

AFAICT this is the logic to spawn the background process:

    $process = Process::fromShellCommandline($phpBinaryFinder->find() . " $drush_path auto-update &");
    // $process = new Process([$phpBinaryFinder->find(), $drush_path, 'auto-update', '&']);
    $process->setWorkingDirectory($path_locator->getProjectRoot() . DIRECTORY_SEPARATOR . $path_locator->getWebRoot());
    $process->disableOutput();
    $process->setTimeout(0);

We just discussed in a meeting at a high level the need for ensuring this does not run forever, both for testing and production purposes. We need to know whether the process is still doing things or not.

Ideas:

  1. at a minimum, the Drush/console command AFAICT could guarantee that it notifies us when it either crashes or exits gracefully, by using register_shutdown_function() and writing a key/value pair to State
  2. Process relies on proc_open(). We can call $process->getPid(), store that PID in State, and then in any subsequent request check whether that PID is still alive, e.g. with posix_kill($pid, 0) (note that proc_get_status() only works on the handle returned by the original proc_open() call, so it cannot be used across requests), or even use the classical PID file approach!
  3. \Symfony\Component\Process\Process::signal() or pcntl_signal allows sending a signal to the opened process. We could make it write something to State at that time. Example code: https://www.php.net/manual/en/pcntl.example.php.
  4. Shell: https://stackoverflow.com/questions/117226/how-to-check-if-a-php-script-...
  5. register_tick_function() 🤪

Thoughts on each:

  1. 👎 Does not indicate current aliveness.
  2. 👍 No new system requirements, matches long-standing UNIX patterns.
  3. 👎 Requires the pcntl PHP extension.
  4. 👎 Triggers another process, which is … fine but unnecessary since we can do without: see 2. I think this is a good alternative to 2 if that turns out to be impossible.
  5. 👎 Deprecated in PHP 5 apparently. But still around … just in case 🤪😶‍🌫️
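A minimal sketch of idea 2, assuming a POSIX system (the State key name and the `sleep 2` child are made up for illustration): spawn a child, record its PID, then probe liveness with signal 0, which delivers nothing and only checks existence.

```php
<?php
// Sketch of idea 2: spawn a child process, record its PID, and check
// liveness later with an existence probe. In Drupal the PID would be
// stored in State under some key, e.g. 'automatic_updates.cron_pid'
// (hypothetical name).

$spec = [
  0 => ['pipe', 'r'],
  1 => ['file', '/dev/null', 'w'],
  2 => ['file', '/dev/null', 'w'],
];
$process = proc_open('sleep 2', $spec, $pipes);
$pid = proc_get_status($process)['pid'];
// \Drupal::state()->set('automatic_updates.cron_pid', $pid);

// In a later request only the stored $pid is available, so probe it
// directly; the /proc fallback covers hosts without the posix extension.
$alive = function_exists('posix_kill')
  ? posix_kill($pid, 0)
  : is_dir("/proc/$pid");
var_dump($alive); // bool(true) while the child is still running
```

Symfony Process's getPid() after start() returns the same kind of PID, so this probe composes with the spawn logic quoted above.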
tedbow’s picture

Status: Active » Postponed

@Wim Leers thanks for ideas!

Postponing on #3368741: Drush build test symlinks Drush, and does not install any of its dependencies because the build test will not pass, locally at least, until then

tedbow’s picture

Status: Postponed » Needs work

Blocker committed

effulgentsia’s picture

I think there are cases where PHP-FPM can kill a PHP script that's taking too long.

In case it helps with manual testing, the PHP-FPM configuration to do this is request_terminate_timeout. It would be great to incorporate that into an automated test, but DrupalCI containers don't use FPM, and I don't know what the equivalent configuration for Apache mod_php is.
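For reference, a minimal FPM pool fragment that triggers this hard kill (the path and the 30-second value are examples, not recommendations):

```ini
; In the FPM pool config (e.g. /etc/php/8.2/fpm/pool.d/www.conf):
; terminate any single request after 30 seconds, independently of
; PHP's own max_execution_time setting.
request_terminate_timeout = 30s
```

Reloading FPM after setting this makes it easy to manually reproduce a mid-update kill.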

tedbow’s picture

Title: For web server dependent unattended updates run the entire life cycle in a separate process that will not be affected by hosting time limits » [PP-1] For web server dependent unattended updates run the entire life cycle in a separate process that will not be affected by hosting time limits
Related issues: +#3357969: For web server dependent unattended updates run the entire life cycle in a separate process that will not be affected by hosting time limits
tedbow’s picture

Status: Needs work » Postponed
tedbow’s picture

Title: [PP-1] For web server dependent unattended updates run the entire life cycle in a separate process that will not be affected by hosting time limits » For web server dependent unattended updates run the entire life cycle in a separate process that will not be affected by hosting time limits
Assigned: tedbow » phenaproxima
Status: Postponed » Needs review

#3375679: CronFrequencyValidator should not check if updates will be run via console has been committed. StatusCheckFailureEmailTest should pass now.

\Drupal\Tests\automatic_updates\Kernel\HookCronTest::testStatusChecksRunOnCron will be the lone failure hopefully.

I think we need to change how we test this.

We no longer have a hook_cron implementation in this module,

so I think we could test it as follows:

  1. Test that the terminal command is invoked on cron. We could expand CronUpdateStageTest, though this is somewhat tested here already.
  2. Test that the terminal command does or does not run status checks; we could use the DrushTestTrait to test this directly.

@phenaproxima if you could start to look at this that would be great. I am sure there is more to be done.

tedbow’s picture

Issue summary: View changes
phenaproxima’s picture

Assigned: phenaproxima » tedbow
Status: Needs review » Needs work

Okay, I carefully reviewed this entire thing. First of all, major kudos to @tedbow for taking this on! It's been nearly two months.

The one file I basically skipped over was DrushUpdateStageTest. It looks like basically a rename of CronUpdateStageTest, without any actual cron runs.

I think the overall approach makes sense here. DrushUpdateStage is doing 95% of what CronUpdateStage used to do. CronUpdateStage, which will be renamed, is a relatively thin decorator around the cron service that knows to invoke the stage if there's an update waiting in the wings.

Most of my comments are about the details, making sure everything is properly commented. But if it works properly, I can get behind the architectural decisions that have been made here.

tedbow’s picture

Assigned: tedbow » phenaproxima
Status: Needs work » Needs review

@phenaproxima I think all the threads are back to you

phenaproxima’s picture

tedbow’s picture

phenaproxima’s picture

Assigned: phenaproxima » tedbow
Status: Needs review » Needs work

Back to @tedbow for test failures...

tedbow’s picture

Assigned: tedbow » phenaproxima
Status: Needs work » Needs review

Fixed the test failure. It was because in DrushUpdateStageTest::testUpdateStageCalled (which was moved unchanged from CronUpdateStageTest) we were mocking the event_dispatcher service so that validation would not fire. But now, since we are persisting that service, that doesn't work anymore.

I don't think we need to worry about not firing the validation anymore. Maybe at the time there was some problem with one or more of the validators.

tedbow’s picture

Assigned: phenaproxima » tedbow
Status: Needs review » Needs work

DrushUpdateStageTest is still failing for some reason on drupalci

tedbow’s picture

Assigned: tedbow » phenaproxima
Status: Needs work » Needs review

OK, the last test should be back to passing

phenaproxima’s picture

Assigned: phenaproxima » Unassigned
Status: Needs review » Reviewed & tested by the community

Okay, I think I'm comfortable with where this is.

This is a major, seismic change in Automatic Updates that took months to do. It means we can't support Windows for the time being. Our class naming leaves something to be desired, and we need to get rid of the Drush integration as soon as humanly possible, since it adds a lot of complication that we'd best remove.

But this is still a big improvement -- we're finally compatible with Automated Cron, and this lays the groundwork for us to be even more secure, since updates don't have to be done via the web.

Let's ship it.

  • tedbow committed 93e8392e on 3.0.x
    Issue #3357969 by tedbow, pwolanin, Wim Leers: For web server dependent...
tedbow’s picture

wim leers’s picture

Epic to see this land! 🤩👏

935 additions and 1158 deletions 👈 not surprised at this epic diff stat 😅

Was very curious to see this, so skimmed the MR and had a few questions:

  1.       // Wait a bit before checking again.
          sleep(5);
    

    Why 5 seconds? Is it related to

          $this->logger->error('Background update failed because the process did not start within 5 seconds.');
    

    which I see elsewhere? 🤔

  2.  * @todo Rename this class to CronUpdateRunner because it is no longer a stage
     *   in https://drupal.org/i/3375940.
    

    Unpostponed! 👍

  3.     // @todo Check if on Windows to not allow cron updates in
        //   https://drupal.org/i/3377237.
    

    Already finished by @phenaproxima! 😄

  4.     // For some reason 'vendor/bin/drush' does not exist in build tests but this
        // method will be removed entirely before beta.
        $command_path = $this->pathLocator->getVendorDirectory() . '/drush/drush/drush';
    

    In which issue will that happen? 🤓

tedbow’s picture

Status: Reviewed & tested by the community » Fixed
  1. No, this basically waits until the site has completed the update process, by checking that all the expected events have been logged via \Drupal\package_manager_test_event_logger\EventSubscriber\EventLogSubscriber::logEventInfo(). Since we may be waiting for the whole update life cycle, in the case of a cron update, this could take a few minutes; the sleep is basically so we don't check hundreds of times.

    In the case of UI updates, where we fire this at the end of the update after we have seen confirmation in the UI, we should never get to this wait in a passing test, because it will have exited the loop right before this.

  2. #3360485: Add Symfony Console command to allow running cron updates via console and by a separate user, for defense-in-depth. In that issue we will search our codebase for "drush" and make sure all references are gone, because we will remove the dependency. But yes, we should have linked the @todo.
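For readers following along, the wait described in point 1 is essentially a bounded polling loop. A simplified sketch, where the function name, log format, and parameters are illustrative rather than the actual test code:

```php
<?php
// Illustrative sketch of the wait described above: poll a log file
// until the expected number of event entries appears, sleeping
// between checks so we don't poll hundreds of times. Names are
// hypothetical; the real logic lives in the module's build tests.

function waitForLoggedEvents(string $log_path, int $expected_count, int $timeout = 300): bool {
  $deadline = time() + $timeout;
  while (time() < $deadline) {
    $count = file_exists($log_path) ? count(file($log_path)) : 0;
    if ($count >= $expected_count) {
      return TRUE;
    }
    // Wait a bit before checking again.
    sleep(5);
  }
  return FALSE;
}

// Example: returns immediately once the log has enough entries.
$log = sys_get_temp_dir() . '/events.log';
file_put_contents($log, "start\npre-apply\npost-apply\n");
var_dump(waitForLoggedEvents($log, 3, 10)); // bool(true)
```

In a passing UI-update test the expected count is already reached on the first iteration, which is why the sleep(5) is never hit there.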
wim leers’s picture

Thanks! 👍

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.