Problem/Motivation

Currently we run all stages of the update life cycle in one cron request. This might cause timeouts in some cases. See #3357632-2: [META] Update doc comment on BatchProcessor to specify why use the batch system for background info.

Proposed resolution

Cron is probably not the right mechanism for these unattended updates, considering that they would be split over 3 cron runs and the default interval for Automated Cron is 3 hours (see Remaining tasks).

Proposal

  1. Create a new service that subscribes to both KernelEvents::TERMINATE and KernelEvents::REQUEST
  2. On KernelEvents::TERMINATE, if no update is in progress, it would just check whether we have checked for new updates in the last X minutes (probably a 1-hour default)
  3. If a check is due, it would check for updates and return if none are available
  4. If there is an update to start, it would do the "begin" stage and set a state value, including the stage ID, indicating that "require" should be done in the next request
  5. On KernelEvents::TERMINATE, if the state value says that "require" should be done, it would do "require", set a state value that "apply" should be done in the next request, and put the site in maintenance mode
  6. On KernelEvents::REQUEST, it would check whether there is a state value saying that "apply" should be done.
    If so, it would do "apply" and then make the sub-request to handle "post-apply", as \Drupal\automatic_updates\CronUpdateStage::triggerPostApply does now.
    When post-apply is finished, it should take the site out of maintenance mode
  7. @todo In the request that does "apply", nothing else should happen, even though the user can access the site in maintenance mode.

    Not sure what the best way to do this is. We could just set a response in KernelEvents::REQUEST; it looks like \Symfony\Component\HttpKernel\HttpKernel::handleRaw() will then just return that response instead of calling the controller that would normally handle the page. We will have to investigate how maintenance mode itself works.
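
The per-request state machine in steps 1-6 could be sketched roughly like this. This is a framework-free illustration only: the class name, the array standing in for the state service, and the `updateAvailable()` placeholder are all hypothetical, and the real implementation would be an event subscriber wired to KernelEvents::TERMINATE and KernelEvents::REQUEST with Drupal's state API.

```php
<?php

// Hypothetical sketch of the proposed state machine (steps 1-6 above).
// In the real module, $state would be \Drupal::state() and the two
// methods would be kernel event subscriber callbacks.
final class UnattendedUpdateStateMachine {

  /** @var array<string, mixed> Stand-in for the state service. */
  private array $state = [];

  public bool $maintenanceMode = FALSE;

  /** Runs on KernelEvents::TERMINATE. */
  public function onTerminate(): void {
    $next = $this->state['next_phase'] ?? NULL;
    if ($next === 'require') {
      // Step 5: do "require", flag "apply" for the next request, and
      // put the site in maintenance mode.
      $this->state['next_phase'] = 'apply';
      $this->maintenanceMode = TRUE;
      return;
    }
    if ($next === NULL && $this->updateAvailable()) {
      // Step 4: do "begin" and record the stage ID plus the next phase.
      $this->state['stage_id'] = uniqid('stage_', TRUE);
      $this->state['next_phase'] = 'require';
    }
  }

  /** Runs on KernelEvents::REQUEST. */
  public function onRequest(): void {
    if (($this->state['next_phase'] ?? NULL) === 'apply') {
      // Step 6: do "apply" and "post-apply", then clear the state and
      // take the site out of maintenance mode.
      $this->state = [];
      $this->maintenanceMode = FALSE;
    }
  }

  private function updateAvailable(): bool {
    // Placeholder: steps 2-3 would rate-limit and perform the real check.
    return TRUE;
  }

}
```

Simulating three requests (each firing REQUEST then TERMINATE) walks the update through begin, require (with maintenance mode on), and finally apply.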

Deleting the stage

The above steps would leave the update applied, but the stage would not be deleted. There are a few ways we could handle deletion:

  1. Just delete the stage in KernelEvents::REQUEST when we apply the update.

    Con: might take too long and cause timeout issues

  2. Flag the stage for deletion in KernelEvents::REQUEST after it has been applied, and then actually delete it in the next request during KernelEvents::TERMINATE

    Con: if the next request is a UI request that tries to claim the stage, the user would get a notice that there is an existing stage

    We could mitigate this by having stages flag themselves as "ready to destroy"; then, if another stage tried to claim it, it would automatically be destroyed.

  3. Idea from @phenaproxima

    Change the way \Drupal\package_manager\StageBase::destroy() works to NOT actually delete the directory, but instead flag it for deletion, or rename it with a DELETE_ prefix to avoid a more complex system. Because each stage directory has a unique name under a site-unique staging directory, leaving the directory in place will not affect the next use of Package Manager. Then, in package_manager_cron, we would actually do the directory deletion.

    This change should make the call to \Drupal\package_manager\StageBase::destroy() very quick. We could then probably leave the delete in \Drupal\automatic_updates\CronUpdateStage::handlePostApply()

    This would have the side benefit of speeding up UI interactions. For instance, in Automatic Updates we could probably combine \Drupal\automatic_updates\BatchProcessor::postApply() and BatchProcessor::clean() and do both post-apply and destroy in one step.

    Con: can't think of any
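
Option 3 above can be illustrated as follows. This is a hedged sketch only: the function names are invented for the example and are not the real Package Manager API; it just shows that the flag-by-rename step is cheap (one `rename()`) while the slow recursive removal is deferred to cron.

```php
<?php

// Illustrative sketch of option 3: "destroy" only renames the stage
// directory with a DELETE_ prefix; a later cron run does the slow part.
// Function names are hypothetical, not the real Package Manager API.

function destroy_stage(string $stage_dir): string {
  // Cheap: a single rename, no recursive I/O at destroy time.
  $target = dirname($stage_dir) . '/DELETE_' . basename($stage_dir);
  rename($stage_dir, $target);
  return $target;
}

function cleanup_flagged_stages(string $staging_root): void {
  // Slow part, safe to defer to cron: remove every flagged directory.
  foreach (glob($staging_root . '/DELETE_*') ?: [] as $dir) {
    remove_dir_recursively($dir);
  }
}

function remove_dir_recursively(string $dir): void {
  $items = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS),
    RecursiveIteratorIterator::CHILD_FIRST
  );
  foreach ($items as $item) {
    $item->isDir() ? rmdir($item->getPathname()) : unlink($item->getPathname());
  }
  rmdir($dir);
}
```

Because each stage directory name is unique, the renamed directory never collides with the next stage, which is what makes the deferred deletion safe.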

Once we have #3351895: Add Drush command to allow running cron updates via console and by a separate user, for defense-in-depth we could also document that you should set up a cron job for this too. Making 2 external cron jobs is not twice as hard as making 1, considering that for many people the hard part is figuring out how to make one at all. This would allow people to keep regular cron at 3 hours and run ours much more frequently.

Previous Cron idea

Use the cron queue worker system to run each stage in its own queue item. We would need to determine whether the queue system would allow us to run multiple items in the same request, for instance if we determined that "create" and "apply" completed under a certain time.

There are a number of problems with using the cron system, documented in Remaining tasks.

Additionally, if we use the cron queue system, it runs after all the implementations of hook_cron. Since we don't know how long those will take, we still might have timeouts. We could use the queue system directly in our own implementation of hook_cron and make sure our implementation runs first. But even then, things that run after us would be more likely to time out, because as far as I can tell you can't tell cron not to run the items after yours.

If we did use the queue system, we could make sure ours runs first.
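
The "run multiple items in one request if time allows" part of this idea could look something like the sketch below. Everything here is illustrative: the function name and the time budget are assumptions, and Drupal's QueueWorker plumbing (plugin definitions, claim/release semantics) is omitted.

```php
<?php

// Hypothetical sketch of the previous cron idea: each life-cycle phase
// is its own queue item, and the worker keeps taking items only while a
// time budget remains, so fast phases can share one cron run while a
// slow phase ends the run early.
function process_phase_queue(array &$queue, float $budget_seconds, callable $run_phase): array {
  $done = [];
  $start = microtime(TRUE);
  while ($queue && (microtime(TRUE) - $start) < $budget_seconds) {
    $phase = array_shift($queue);
    $run_phase($phase);
    $done[] = $phase;
  }
  return $done;
}
```

With a generous budget all four phases run in one pass; with no budget left, the items stay queued for the next cron run.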

Remaining tasks

  1. Determine how this would affect how quickly critical security updates will happen. Right now in CronFrequencyValidator we warn if you have cron set to run every 3 hours and error if you have it set to run every 24 hours.

    This would mean that if you have cron set to run every 23 hours, it could take almost 3 days to apply a critical security update if it takes 3 cron runs. There are 4 stages, but by Destroy the update has already been applied.

  2. Determine if there are other UX considerations. I think that now, if a cron update is running, you could not start one via the UI unless you delete the current update (which you can't do if it is currently applying)

    If the update is going to take 3 cron runs, even if you have cron set to run every hour we still might want to show the user a prompt like "An unattended update is in progress, complete this update now" instead of waiting for the 3 cron runs. This could trigger a batch process that either destroys the cron stage and starts from scratch or just does the remaining stages of the cron stage. Doing the remaining stages of the cron stage might be tricky, as \Drupal\package_manager\StageBase::claim() checks that the update was started in the same session.

    We may want to prompt the user on the update form or, if the site is currently not on a secure version, prompt them on every page.

    If we wrote a CronUpdateInProgressValidator that subscribed to StatusCheckEvent, then we could inform the user and provide a link to complete the update. We could vary the message based on whether the site was currently insecure. If the site was NOT insecure, we could also provide a link to delete the cron update. This would allow other Package Manager UIs to be used even if a cron update had been started.

  3. Should we do the first "create" stage of the update in cron even if there is not an available update? This would mean that when a new update arrives, if each stage of the life cycle took its own cron run, we would only need 2 cron runs to apply the update. This could be very important for critical updates when cron is set to run infrequently.

    The logic in LockFileValidator could tell us if the existing stage that was created in a previous cron run was still valid. If it was not, we could recreate the stage.

User interface changes

API changes

Data model changes

Comments

tedbow created an issue. See original summary.

tedbow’s picture

Assigned: Unassigned » tedbow
Status: Active » Needs work
tedbow’s picture

I talked with @phenaproxima and he suggested a few things, which I have implemented:

  1. Decorate the maintenance mode service to ensure maintenance mode when we are going to apply an update
  2. Do all the phases in kernel terminate. I was doing "apply" in the kernel request event, which happens at the beginning of the request, but he pointed out that this would mean the maintenance mode response would not be returned to the user until after the update was applied, so it would take a long time. Since we always enforce maintenance mode on the request in which we are going to apply the update, we don't have to worry about the terminate event happening after a long request. It will only send the maintenance mode response message.
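
The decoration in point 1 could be sketched like this. The interface shape is simplified for the example (Drupal's real \Drupal\Core\Site\MaintenanceModeInterface::applies() takes a RouteMatchInterface), and the class names are hypothetical; the point is just that the decorator forces maintenance mode on while an apply is pending, without touching the stored setting.

```php
<?php

// Simplified stand-in for Drupal's maintenance mode service interface.
interface MaintenanceModeInterface {
  public function applies(): bool;
}

// Stand-in for the core service, backed by the stored setting.
final class CoreMaintenanceMode implements MaintenanceModeInterface {
  public function __construct(private bool $enabled = FALSE) {}
  public function applies(): bool {
    return $this->enabled;
  }
}

// Hypothetical decorator: reports maintenance mode whenever an "apply"
// is pending, regardless of the real setting.
final class UpdateAwareMaintenanceMode implements MaintenanceModeInterface {
  public function __construct(
    private MaintenanceModeInterface $inner,
    private bool $applyPending,
  ) {}

  public function applies(): bool {
    return $this->applyPending || $this->inner->applies();
  }
}
```

Decorating rather than toggling the setting means the site drops out of forced maintenance mode automatically once the pending-apply flag is cleared.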

Wim Leers made their first commit to this issue’s fork.

tedbow’s picture

Title: For unattended updates run each stage life cycle phase in a different request » For web cron updates run each stage life cycle phase in a different request
Assigned: tedbow » Unassigned
Status: Needs work » Closed (won't fix)
Issue tags: -sprint, -core-post-mvp
Related issues: +#3362978: Module does not work with Automated Cron

The current MR 3360656-split-requests suffers from the same problem that we discovered in #3362978: Module does not work with Automated Cron: basically, we can't use the shared temp store during kernel terminate.

I just closed #3362978: Module does not work with Automated Cron in favor of first trying #3357969: For web server dependent unattended updates run the entire life cycle in a separate process that will not be affected by hosting time limits

If we can get #3357969 to work we probably won't need to reopen this.

If #3357969 doesn't work out, we can look into splitting the phases into different requests off of the newly created \Drupal\automatic_updates\CronUpdateStage::run, which decorates the cron service. Or we could create our own route that could run independently of cron.

wim leers’s picture

👏

Thanks for the super clear comment in #7, that really helps, and is deeply appreciated!

See you in #3357969: For web server dependent unattended updates run the entire life cycle in a separate process that will not be affected by hosting time limits 😊