Problem/Motivation

In #3277034: Unhandled Composer Stager exceptions leave the update process in an indeterminate state we are wrapping the line $this->beginner->begin($active_dir, $stage_dir, new PathList($event->getExcludedPaths()), NULL, $timeout); and $this->stager->stage($command, $active_dir, $stage_dir, NULL, $timeout);in try-catch structures which can handle Composer Stager exceptions.

But, if an event subscriber to the stage lifecycle throws an exception, it can result in bad, incongruous UX. For example, if a PostRequireEvent subscriber throws an exception, the user is still redirected to the confirmation page to continue the update (/admin/automatic-update-ready). They will see the error message, and they won't be able to continue the update...but if a PreRequireEvent subscriber throws an exception, users are redirected to the start page!

Worse yet, there may not be any way to recover from that error. The only recourse is to press the "Delete existing update" button and try again, and hope it works this time. That sucks.

Also there is a inconsistency in handling of composer stager exceptions in StageBase that is the require() catches all throwable exceptions and then catch it checks for the two classes( InvalidArgumentException and PreconditionException) while in apply() there are two catches one for the two classes and other for the all throwable exceptions

Proposed resolution

When an exception is thrown by a stage lifecycle event subscriber, we should destroy the stage on behalf of the user, before returning them to an appropriate page for them to continue their work. Change StageBase::create(), StageBase::require() and StageBase::postApply() to destroy stage when there is an error in dispatching the events. With this change the user will now won't have to click "Cancel update" button on the UpdateReady when there is an error because we now have destroyed the stage for them.

Currently BatchProcessor::postApply() throws the exception which causes it the site to be redirected to UpdateReady form when there is an error. Change this behaviour to match with code>BatchProcessor::clean() because if we got successfully till BatchProcessor::postApply() the update is already done and we don't need to redirect it back to update ready form when there is an error.

Also make handling of composer stager exceptions in StageBase::require() consistent with StageBase::apply().

Remaining tasks

file a follow-up to repeat the solution for Automatic Updates Extensions

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

kunal.sachdev created an issue. See original summary.

tedbow’s picture

Issue summary: View changes
Issue tags: +sprint
phenaproxima’s picture

phenaproxima’s picture

Title: Determine if we should delete the stage in the Updater for any errors because there is not UI to recover from situation » If an error occurs during an attended process, delete the stage on behalf of the user

Re-titling.

kunal.sachdev’s picture

Assigned: Unassigned » kunal.sachdev

kunal.sachdev’s picture

Issue summary: View changes
kunal.sachdev’s picture

I think the problem that we want to solve in this issue is that if there are any errors during PreCreateEvent/PostCreateEvent or PreRequireEvent/PostRequireEvent then we want to destroy the stage in UpdateStage for the user. If that's right then why shouldn’t we do it in StageBase so that the same thing will not have to be repeated for Project Browser @tedbow, @phenaproxima? Actually @Wim Leers and I were discussing about this issue and he suggested this.

wim leers’s picture

Status: Active » Needs work
Issue tags: +Needs issue summary update

Actually @Wim Leers and I were discussing about this issue and he suggested this.

Yep.

The rationale for doing it only for UpdateStage is not documented on this issue nor on #3277034: Unhandled Composer Stager exceptions leave the update process in an indeterminate state.

Furthermore, @tedbow wrote at #3277034:

[…] because we know in Automatic Updates we don't have a UI to recover from an error here to in Updater we would still want to delete the stage

… but implementing it in UpdateStage affects both manually triggered updates (i.e. "user is present") as well unattended updates, because CronUpdateStage extends UpdateStage.

Tagging Needs issue summary update for clarifying this.


@kunal.sachdev: Can you think of a reason why it should be implemented as you did, @kunal.sachdev? Like I explained on our meeting earlier, that'd help inform the discussion.

#8 is too succinct: it doesn't try to articulate a rationale for the current approach. I expected more detail.

kunal.sachdev’s picture

@Wim Leers I don't think I had any reason for implementing it this way, I just followed the same way I implemented this in #3277034: Unhandled Composer Stager exceptions leave the update process in an indeterminate state in which @tedbow while reviewing initially requested changes in UpdateStage( Updater) https://git.drupalcode.org/project/automatic_updates/-/merge_requests/63... but later in https://git.drupalcode.org/project/automatic_updates/-/merge_requests/74... he suggested we should do this in separate issue and that is the current issue.

kunal.sachdev’s picture

Discussed with @tedbow and @Wim Leers and decided that we should destroy the stage in the StageBase only. I reverted the changes in UpdateStage and added the logic to destroy the stage in StageBase itself.

kunal.sachdev’s picture

\Drupal\Tests\package_manager\Kernel\LockFileValidatorTest::testLockFileChanged and \Drupal\Tests\package_manager\Kernel\LockFileValidatorTest::testLockFileDeleted is on Pre-require event while asserting status check results because now we are deleting stage on error during Pre-require event in StageBase see

    $on_error = $destroy_on_failure ? [$this, 'destroy'] : NULL;

    $this->dispatch(new PreRequireEvent($this, $runtime, $dev), $on_error);

and we pass the same stage while checking the status check results in test see

$stage = $this->assertResults([$result], $event_class);
// A status check should agree that there is an error here.
$this->assertStatusCheckResults([$result], $stage);

but now that stage is not available the error is not added. I think this behaviour is right as already during pre-require event we have shown the error and destroy the stage so we should have error added during status-check event after that.

So for now I think instead of

$stage = $this->assertResults([$result], $event_class);
// A status check should agree that there is an error here.
$this->assertStatusCheckResults([$result], $stage);

we should do

    $stage = $this->assertResults([$result], $event_class);
    $this->assertStatusCheckResults($event_class === PreRequireEvent::class
      // A status check should not surface this error, because the stage has
      // automatically been destroyed.
      ? []
      // A status check should agree that there is an error here.
      : [$result],
      $stage);

because this method test pre-apply event also during which we don't destroy the stage and we should see error during status check event.

Also while debugging this I found some inconsistencies handling composer stager exceptions in \Drupal\package_manager\StageBase::require and \Drupal\package_manager\StageBase::apply i.e. require() does

      try {
        $this->stager->stage($command, $active_dir, $stage_dir, NULL, $timeout);
      }
      catch (\Throwable $e) {
        // If the caught exception isn't InvalidArgumentException or
        // PreconditionException, a Composer operation was actually attempted,
        // and failed. The stage should therefore be destroyed, because it's in
        // an indeterminate and possibly unrecoverable state.
        assert(!$e instanceof InvalidArgumentException, sprintf("%s is calling Composer Stager with invalid arguments: this indicates a bug in your stage. The message was: %s.", get_class(), $e->getMessage()));
        assert(!$e instanceof InvalidArgumentException, sprintf("A precondition wasn't fulfilled while calling Composer Stager in %s: this indicates a bug in your stage. The message was: %s.", get_class(), $e->getMessage()));
        if ($destroy_on_failure) {
          $this->destroy();
        }
        elseif (!$e instanceof InvalidArgumentException && !$e instanceof PreconditionException) {
          $this->destroy();
        }
        $this->rethrowAsStageException($e);
      }
    };

while apply() does

    try {
      $this->committer->commit($stage_dir, $active_dir, $paths_to_exclude, NULL, $timeout);
    }
    catch (InvalidArgumentException | PreconditionException $e) {
      // The commit operation has not started yet, so we can clear the failure
      // marker.
      $this->failureMarker->clear();
      $this->rethrowAsStageException($e);
    }
    catch (\Throwable $throwable) {
      // The commit operation may have failed midway through, and the site code
      // is in an indeterminate state. Release the flag which says we're still
      // applying, because in this situation, the site owner should probably
      // restore everything from a backup.
      $this->setNotApplying()();
      // Update the marker file with the information from the throwable.
      $this->failureMarker->write($this, $this->getFailureMarkerMessage(), $throwable);
      throw new ApplyFailedException($this, $this->failureMarker->getMessage(), $throwable->getCode(), $throwable);
    }

. The inconsistency is that the require() catches all throwable exceptions and then catch it checks for the two classes( InvalidArgumentException and PreconditionException) while in apply() there are two catches one for the two classes and other for the all throwable exceptions. So tagging this issue as Needs architectural review.

kunal.sachdev’s picture

Assigned: kunal.sachdev » Unassigned
Status: Needs work » Needs review
phenaproxima’s picture

After reading #12, I too am wondering why they act differently. On its face, I can't see any reason for the discrepancy.

I'm going to open a separate MR on this same issue fork to change it and see what breaks. I think that what apply() does is cleaner.

phenaproxima’s picture

Looks like changing require() to be consistent with the way apply() works won't break anything. So let's go ahead and make them consistent, either in this issue or a follow-up.

kunal.sachdev’s picture

Assigned: Unassigned » kunal.sachdev
Status: Needs review » Needs work
kunal.sachdev’s picture

Issue summary: View changes

I think we can make require() and apply() consistent in this issue. So, updating the issue summary.

kunal.sachdev’s picture

Issue summary: View changes
kunal.sachdev’s picture

Assigned: kunal.sachdev » Unassigned
Status: Needs work » Needs review
tedbow’s picture

Status: Needs review » Needs work
Issue tags: +Needs tests, +Usability
StatusFileSize
new2.18 KB
new35.84 KB

I don't think the "Proposed resolution" actually covers what this issue is doing. It talks about "lifecycle event subscriber" but we are only handling "require". Is that right?

What happens if there is error on preApply()? In PostApply?

I think we should handle the other events in this issue or rescope the "Proposed resolution" make follow-ups for other events. I think we should handle them here.

Also even if we only handled require I think an issue like this would need functional test coverage

from the summary

For example, if a PostRequireEvent subscriber throws an exception, the user is still redirected to the confirmation page to continue the update (/admin/automatic-update-ready).

If that is not the case anymore we should have test coverage confirming this.

\Drupal\Tests\automatic_updates\Functional\UpdateErrorTest::testUpdateStoppedByEventSubscriber might be good place for this test coverage if we don't have it anywhere else. We actually already have todo in the test referencing this issue

In the "Proposed resolution" we should detail what the changes will be from the user's perspective now that they we will be deleting the stage for them. Will they be redirected to different page? Will they not have to click the "Cancel update" button on the UpdateReady from because we now have destroyed the stage for them?

for instance I debugged testUpdateStoppedByEventSubscriber. If an error happens PostApply the user is redirected back to UpdateReady and there only option is "Cancel update". They also get the LockFileValidator error because the stage has already been applied. In this case I think it seems reasonable that we would just delete the stage for them.

testUpdateStoppedByEventSubscriber also showed me that if an error happens in PreApply the user is also redirected back to UpdateReady but they have the option to "Continue" or "Cancel Update". In this case I think it is reasonable not to delete the stage automatically because the validation error might be something they could fix. But we may want to look into updating \Drupal\automatic_updates\BatchProcessor::finishCommit to tell the user they should try to resolve this issue if possible before continuing. That can be another issue.

Also in PostCreate if we get an error we don't delete the stage. There is already a @todo for this in the test to solve it in this issue.

Basically we need to figure out the situations where the only thing the user can do is "Cancel Update" and start over. In those situations we should probably be deleting the stage for them.

I am including a patch I used to output the page for all the test cases in testUpdateStoppedByEventSubscriber where the user is left. This can be used to determine when we should delete the stage.

Also including the zip file of the debug output I got from the current MR with this patch. It has all 14 test cases. Some of them are probably fine but I think at least PostApply and PostCreate are not great UX

tedbow’s picture

Assigned: Unassigned » kunal.sachdev
kunal.sachdev’s picture

I think we still need to destroy stage in \Drupal\automatic_updates\BatchProcessor::stage() for the case if $this->stager->stage($command, $active_dir, $stage_dir, NULL, $timeout); fails in \Drupal\package_manager\StageBase::require() with InvalidArgumentException or PreconditionException where we don't destroy stage.

kunal.sachdev’s picture

Assigned: kunal.sachdev » Unassigned
Status: Needs work » Needs review
wim leers’s picture

Assigned: Unassigned » kunal.sachdev
Status: Needs review » Needs work

A new quarter, time to increase the expectations! 🤓 I know you can do it! 😊

  1. #21 added Needs tests. Why is this marked Needs review if the tag is still present? Are the necessary tests present now?
  2. I spotted a few small issues in the MR, and have a few questions.
  3. Finally: I tagged this Needs issue summary update for something very specific in #9 >3 weeks ago. That clarification is still necessary. If you're able to push this forward, you're also able to address this, which would make reviews much more efficient.
kunal.sachdev’s picture

Created #3372940: Tell the user to try resolve the issue if possible before continuing if there is an error in PreApply as per #21

testUpdateStoppedByEventSubscriber also showed me that if an error happens in PreApply the user is also redirected back to UpdateReady but they have the option to "Continue" or "Cancel Update". In this case I think it is reasonable not to delete the stage automatically because the validation error might be something they could fix. But we may want to look into updating \Drupal\automatic_updates\BatchProcessor::finishCommit to tell the user they should try to resolve this issue if possible before continuing. That can be another issue.
kunal.sachdev’s picture

Issue tags: -Needs tests

The test is already there i.e. \Drupal\Tests\automatic_updates\Functional\UpdateErrorTest::testUpdateStoppedByEventSubscriber which is updated accordingly.

kunal.sachdev’s picture

Issue summary: View changes
Issue tags: -Needs issue summary update

Updated the issue summary.

kunal.sachdev’s picture

Assigned: kunal.sachdev » Unassigned
Status: Needs work » Needs review
tedbow’s picture

Status: Needs review » Needs work
Issue tags: +Needs tests

looking good

kunal.sachdev’s picture

Issue tags: -Needs tests
kunal.sachdev’s picture

Status: Needs work » Needs review