If an error occurs during an attended process, delete the stage on behalf of the user [#3346644]

Problem/Motivation

In #3277034: Unhandled Composer Stager exceptions leave the update process in an indeterminate state we are wrapping the line $this->beginner->begin($active_dir, $stage_dir, new PathList($event->getExcludedPaths()), NULL, $timeout); and $this->stager->stage($command, $active_dir, $stage_dir, NULL, $timeout);in try-catch structures which can handle Composer Stager exceptions.

But, if an event subscriber to the stage lifecycle throws an exception, it can result in bad, incongruous UX. For example, if a PostRequireEvent subscriber throws an exception, the user is still redirected to the confirmation page to continue the update (/admin/automatic-update-ready). They will see the error message, and they won't be able to continue the update...but if a PreRequireEvent subscriber throws an exception, users are redirected to the start page!

Worse yet, there may not be any way to recover from that error. The only recourse is to press the "Delete existing update" button and try again, and hope it works this time. That sucks.

Also there is a inconsistency in handling of composer stager exceptions in StageBase that is the require() catches all throwable exceptions and then catch it checks for the two classes( InvalidArgumentException and PreconditionException) while in apply() there are two catches one for the two classes and other for the all throwable exceptions

Proposed resolution

When an exception is thrown by a stage lifecycle event subscriber, we should destroy the stage on behalf of the user, before returning them to an appropriate page for them to continue their work. Change StageBase::create(), StageBase::require() and StageBase::postApply() to destroy stage when there is an error in dispatching the events. With this change the user will now won't have to click "Cancel update" button on the UpdateReady when there is an error because we now have destroyed the stage for them.

Currently BatchProcessor::postApply() throws the exception which causes it the site to be redirected to UpdateReady form when there is an error. Change this behaviour to match with code>BatchProcessor::clean() because if we got successfully till BatchProcessor::postApply() the update is already done and we don't need to redirect it back to update ready form when there is an error.

Also make handling of composer stager exceptions in StageBase::require() consistent with StageBase::apply().

Remaining tasks

file a follow-up to repeat the solution for Automatic Updates Extensions

Comment	File	Size	Author
#21	testUpdateStoppedByEventSubscriber-output.zip	35.84 KB	tedbow
#21	testUpdateStoppedByEventSubscriber-debug.patch	2.18 KB	tedbow

Issue fork automatic_updates-3346644

Show commands

Start within a Git clone of the project using the version control instructions.

Add & fetch this issue fork’s repository

Or, if you do not have SSH keys set up on git.drupalcode.org:

Add & fetch this issue fork’s repository

3346644-if-an-error changes, plain diff MR !909
Check out this branch for the first time

Check out existing branch, if you already have it locally
make-require-consistent changes, plain diff MR !915
Check out this branch for the first time

Check out existing branch, if you already have it locally

About issue forks

Comments

Comment #1

8 March 2023 at 07:55

kunal.sachdev created an issue. See original summary.

Comment #2

tedbow

he/him

English

Ithaca, NY, USA

commented 5 April 2023 at 12:35

Issue summary:	View changes
Issue tags:		+sprint

Comment #3

phenaproxima

he/him

English

Massachusetts

commented 17 April 2023 at 19:25

Issue summary:

View changes

Merging in some stuff from #3354699: If a PostRequireEvent subscriber throws an exception, users are redirected to the confirmation page, which I'm closing as a duplicate.

Comment #4

phenaproxima

he/him

English

Massachusetts

commented 17 April 2023 at 19:27

Title:

Determine if we should delete the stage in the Updater for any errors because there is not UI to recover from situation

» If an error occurs during an attended process, delete the stage on behalf of the user

Re-titling.

Comment #5

kunal.sachdev commented 8 June 2023 at 08:43

Assigned:

Unassigned

» kunal.sachdev

Comment #6

8 June 2023 at 10:24

kunal.sachdev opened merge request !909

Comment #7

kunal.sachdev commented 12 June 2023 at 08:44

Issue summary:

View changes

Comment #8

kunal.sachdev commented 12 June 2023 at 09:04

I think the problem that we want to solve in this issue is that if there are any errors during PreCreateEvent/PostCreateEvent or PreRequireEvent/PostRequireEvent then we want to destroy the stage in UpdateStage for the user. If that's right then why shouldn’t we do it in StageBase so that the same thing will not have to be repeated for Project Browser @tedbow, @phenaproxima? Actually @Wim Leers and I were discussing about this issue and he suggested this.

Comment #9

wim leers

Ghent 🇧🇪🇪🇺

commented 12 June 2023 at 10:22

Status:	Active	» Needs work
Issue tags:		+Needs issue summary update

Actually @Wim Leers and I were discussing about this issue and he suggested this.

Yep.

The rationale for doing it only for UpdateStage is not documented on this issue nor on #3277034: Unhandled Composer Stager exceptions leave the update process in an indeterminate state.

Furthermore, @tedbow wrote at #3277034:

[…] because we know in Automatic Updates we don't have a UI to recover from an error here to in Updater we would still want to delete the stage

… but implementing it in UpdateStage affects both manually triggered updates (i.e. "user is present") as well unattended updates, because CronUpdateStage extends UpdateStage.

Tagging Needs issue summary update for clarifying this.

@kunal.sachdev: Can you think of a reason why it should be implemented as you did, @kunal.sachdev? Like I explained on our meeting earlier, that'd help inform the discussion.

#8 is too succinct: it doesn't try to articulate a rationale for the current approach. I expected more detail.

Comment #10

kunal.sachdev commented 12 June 2023 at 11:21

@Wim Leers I don't think I had any reason for implementing it this way, I just followed the same way I implemented this in #3277034: Unhandled Composer Stager exceptions leave the update process in an indeterminate state in which @tedbow while reviewing initially requested changes in UpdateStage( Updater) https://git.drupalcode.org/project/automatic_updates/-/merge_requests/63... but later in https://git.drupalcode.org/project/automatic_updates/-/merge_requests/74... he suggested we should do this in separate issue and that is the current issue.

Comment #11

kunal.sachdev commented 12 June 2023 at 14:41

Discussed with @tedbow and @Wim Leers and decided that we should destroy the stage in the StageBase only. I reverted the changes in UpdateStage and added the logic to destroy the stage in StageBase itself.

Comment #12

kunal.sachdev commented 13 June 2023 at 08:54

Issue tags:

+Needs architectural review

\Drupal\Tests\package_manager\Kernel\LockFileValidatorTest::testLockFileChanged and \Drupal\Tests\package_manager\Kernel\LockFileValidatorTest::testLockFileDeleted is on Pre-require event while asserting status check results because now we are deleting stage on error during Pre-require event in StageBase see

    $on_error = $destroy_on_failure ? [$this, 'destroy'] : NULL;

    $this->dispatch(new PreRequireEvent($this, $runtime, $dev), $on_error);

and we pass the same stage while checking the status check results in test see

$stage = $this->assertResults([$result], $event_class);
// A status check should agree that there is an error here.
$this->assertStatusCheckResults([$result], $stage);

but now that stage is not available the error is not added. I think this behaviour is right as already during pre-require event we have shown the error and destroy the stage so we should have error added during status-check event after that.

So for now I think instead of

$stage = $this->assertResults([$result], $event_class);
// A status check should agree that there is an error here.
$this->assertStatusCheckResults([$result], $stage);

we should do

    $stage = $this->assertResults([$result], $event_class);
    $this->assertStatusCheckResults($event_class === PreRequireEvent::class
      // A status check should not surface this error, because the stage has
      // automatically been destroyed.
      ? []
      // A status check should agree that there is an error here.
      : [$result],
      $stage);

because this method test pre-apply event also during which we don't destroy the stage and we should see error during status check event.

Also while debugging this I found some inconsistencies handling composer stager exceptions in \Drupal\package_manager\StageBase::require and \Drupal\package_manager\StageBase::apply i.e. require() does

      try {
        $this->stager->stage($command, $active_dir, $stage_dir, NULL, $timeout);
      }
      catch (\Throwable $e) {
        // If the caught exception isn't InvalidArgumentException or
        // PreconditionException, a Composer operation was actually attempted,
        // and failed. The stage should therefore be destroyed, because it's in
        // an indeterminate and possibly unrecoverable state.
        assert(!$e instanceof InvalidArgumentException, sprintf("%s is calling Composer Stager with invalid arguments: this indicates a bug in your stage. The message was: %s.", get_class(), $e->getMessage()));
        assert(!$e instanceof InvalidArgumentException, sprintf("A precondition wasn't fulfilled while calling Composer Stager in %s: this indicates a bug in your stage. The message was: %s.", get_class(), $e->getMessage()));
        if ($destroy_on_failure) {
          $this->destroy();
        }
        elseif (!$e instanceof InvalidArgumentException && !$e instanceof PreconditionException) {
          $this->destroy();
        }
        $this->rethrowAsStageException($e);
      }
    };

while apply() does

    try {
      $this->committer->commit($stage_dir, $active_dir, $paths_to_exclude, NULL, $timeout);
    }
    catch (InvalidArgumentException | PreconditionException $e) {
      // The commit operation has not started yet, so we can clear the failure
      // marker.
      $this->failureMarker->clear();
      $this->rethrowAsStageException($e);
    }
    catch (\Throwable $throwable) {
      // The commit operation may have failed midway through, and the site code
      // is in an indeterminate state. Release the flag which says we're still
      // applying, because in this situation, the site owner should probably
      // restore everything from a backup.
      $this->setNotApplying()();
      // Update the marker file with the information from the throwable.
      $this->failureMarker->write($this, $this->getFailureMarkerMessage(), $throwable);
      throw new ApplyFailedException($this, $this->failureMarker->getMessage(), $throwable->getCode(), $throwable);
    }

. The inconsistency is that the require() catches all throwable exceptions and then catch it checks for the two classes( InvalidArgumentException and PreconditionException) while in apply() there are two catches one for the two classes and other for the all throwable exceptions. So tagging this issue as Needs architectural review.

Comment #13

kunal.sachdev commented 13 June 2023 at 12:04

Assigned:	kunal.sachdev	» Unassigned
Status:	Needs work	» Needs review

Comment #14

phenaproxima

he/him

English

Massachusetts

commented 14 June 2023 at 14:22

After reading #12, I too am wondering why they act differently. On its face, I can't see any reason for the discrepancy.

I'm going to open a separate MR on this same issue fork to change it and see what breaks. I think that what apply() does is cleaner.

Comment #15

14 June 2023 at 14:25

phenaproxima opened merge request !915

Comment #16

phenaproxima

he/him

English

Massachusetts

commented 14 June 2023 at 15:10

Issue tags:

-Needs architectural review

Looks like changing require() to be consistent with the way apply() works won't break anything. So let's go ahead and make them consistent, either in this issue or a follow-up.

Comment #17

kunal.sachdev commented 15 June 2023 at 07:37

Assigned:	Unassigned	» kunal.sachdev
Status:	Needs review	» Needs work

Comment #18

kunal.sachdev commented 15 June 2023 at 08:36

Issue summary:

View changes

I think we can make require() and apply() consistent in this issue. So, updating the issue summary.

Comment #19

kunal.sachdev commented 15 June 2023 at 08:37

Issue summary:

View changes

Comment #20

kunal.sachdev commented 15 June 2023 at 11:52

Assigned:	kunal.sachdev	» Unassigned
Status:	Needs work	» Needs review

Comment #21

tedbow

he/him

English

Ithaca, NY, USA

commented 21 June 2023 at 19:48

Status:	Needs review	» Needs work
Issue tags:		+Needs tests, +Usability

Status	File	Size
new	testUpdateStoppedByEventSubscriber-debug.patch	2.18 KB
new	testUpdateStoppedByEventSubscriber-output.zip	35.84 KB

I don't think the "Proposed resolution" actually covers what this issue is doing. It talks about "lifecycle event subscriber" but we are only handling "require". Is that right?

What happens if there is error on preApply()? In PostApply?

I think we should handle the other events in this issue or rescope the "Proposed resolution" make follow-ups for other events. I think we should handle them here.

Also even if we only handled require I think an issue like this would need functional test coverage

from the summary

For example, if a PostRequireEvent subscriber throws an exception, the user is still redirected to the confirmation page to continue the update (/admin/automatic-update-ready).

If that is not the case anymore we should have test coverage confirming this.

\Drupal\Tests\automatic_updates\Functional\UpdateErrorTest::testUpdateStoppedByEventSubscriber might be good place for this test coverage if we don't have it anywhere else. We actually already have todo in the test referencing this issue

In the "Proposed resolution" we should detail what the changes will be from the user's perspective now that they we will be deleting the stage for them. Will they be redirected to different page? Will they not have to click the "Cancel update" button on the UpdateReady from because we now have destroyed the stage for them?

for instance I debugged testUpdateStoppedByEventSubscriber. If an error happens PostApply the user is redirected back to UpdateReady and there only option is "Cancel update". They also get the LockFileValidator error because the stage has already been applied. In this case I think it seems reasonable that we would just delete the stage for them.

testUpdateStoppedByEventSubscriber also showed me that if an error happens in PreApply the user is also redirected back to UpdateReady but they have the option to "Continue" or "Cancel Update". In this case I think it is reasonable not to delete the stage automatically because the validation error might be something they could fix. But we may want to look into updating \Drupal\automatic_updates\BatchProcessor::finishCommit to tell the user they should try to resolve this issue if possible before continuing. That can be another issue.

Also in PostCreate if we get an error we don't delete the stage. There is already a @todo for this in the test to solve it in this issue.

Basically we need to figure out the situations where the only thing the user can do is "Cancel Update" and start over. In those situations we should probably be deleting the stage for them.

I am including a patch I used to output the page for all the test cases in testUpdateStoppedByEventSubscriber where the user is left. This can be used to determine when we should delete the stage.

Also including the zip file of the debug output I got from the current MR with this patch. It has all 14 test cases. Some of them are probably fine but I think at least PostApply and PostCreate are not great UX

Comment #22

tedbow

he/him

English

Ithaca, NY, USA

commented 21 June 2023 at 19:49

Assigned:

Unassigned

» kunal.sachdev

Comment #23

kunal.sachdev commented 28 June 2023 at 14:36

Created #3371076: [PP-1] There is an error shown along with 'Update complete!' on update page if there is an error. as per discussion with @tedbow during scrum.

Comment #24

kunal.sachdev commented 3 July 2023 at 10:48

I think we still need to destroy stage in \Drupal\automatic_updates\BatchProcessor::stage() for the case if $this->stager->stage($command, $active_dir, $stage_dir, NULL, $timeout); fails in \Drupal\package_manager\StageBase::require() with InvalidArgumentException or PreconditionException where we don't destroy stage.

Comment #25

kunal.sachdev commented 3 July 2023 at 11:37

Assigned:	kunal.sachdev	» Unassigned
Status:	Needs work	» Needs review

Comment #26

wim leers

Ghent 🇧🇪🇪🇺

commented 5 July 2023 at 09:38

Assigned:	Unassigned	» kunal.sachdev
Status:	Needs review	» Needs work

A new quarter, time to increase the expectations! 🤓 I know you can do it! 😊

#21 added Needs tests. Why is this marked Needs review if the tag is still present? Are the necessary tests present now?
I spotted a few small issues in the MR, and have a few questions.
Finally: I tagged this Needs issue summary update for something very specific in #9 >3 weeks ago. That clarification is still necessary. If you're able to push this forward, you're also able to address this, which would make reviews much more efficient.

Comment #27

kunal.sachdev commented 6 July 2023 at 15:03

Created #3372940: Tell the user to try resolve the issue if possible before continuing if there is an error in PreApply as per #21

testUpdateStoppedByEventSubscriber also showed me that if an error happens in PreApply the user is also redirected back to UpdateReady but they have the option to "Continue" or "Cancel Update". In this case I think it is reasonable not to delete the stage automatically because the validation error might be something they could fix. But we may want to look into updating \Drupal\automatic_updates\BatchProcessor::finishCommit to tell the user they should try to resolve this issue if possible before continuing. That can be another issue.

Comment #28

kunal.sachdev commented 6 July 2023 at 15:05

Issue tags:

-Needs tests

The test is already there i.e. \Drupal\Tests\automatic_updates\Functional\UpdateErrorTest::testUpdateStoppedByEventSubscriber which is updated accordingly.