Problem/Motivation
This was discovered by Wim in #3338666: Add functional test that proves there is reasonable UX whenever a stage event subscriber has an exception. Copying here for posterity:
th[e] stack trace [of an exception raised during batch processing] should also have been logged, to make diagnosis after the fact possible. We should assert that this indeed happened. Otherwise post-mortems in the real-world (where updates will be installed automatically, and hence no user will be present to witness the error) will become impossible.
This was discovered during form updates but we want exceptions from the event subscribers like this to be logged however they occur.
Proposed resolution
- In
\Drupal\package_manager\StageBase::dispatch()we should log any error with the backtrace. - Add test coverage in
StageBaseTestfor the logging - After 1) we should be able to remove the logging that would now be duplicated in StageBase
For example
We logging at the end of the
catchafter$stage_id = parent::begin(['drupal' => $target_version], $timeout); $this->stage(); $this->apply();We would no longer need to log this if the exception was a
StageEventExceptioninstance because this would already be logged.try { $this->postApply(); } catch (\Throwable $e) { $this->logger->error($e->getMessage()); }Would also not need to be logged if it is instance of
StageEventExceptioninstancetry { $this->destroy(); } catch (StageEventException $e) { $this->logger->error($e->getMessage()); }Looking at the comment before this I think we could remove the try/catch around this all together
Issue fork automatic_updates-3354701
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
wim leersWithout this, figuring out what actually happened on real sites would be impossible. Bumping to .
Comment #3
wim leersThis came up again at #3351895-13: Add Drush command to allow running cron updates via console and by a separate user, for defense-in-depth.
Pulling into sprint.
Comment #4
wim leersComment #5
yash.rode commentedComment #7
wim leersNo commits yet in >36 hours. Everything going okay here, @yash.rode?
Comment #8
yash.rode commentedI was wrong yesterday that PreDestroyEvent exception is not logged but the case is both PreDestroyEvent and PostDestroyEvent Exceptions are not getting logged, I tried debugging it, as we don't know where exactly the error messages are getting logged I could not find it why it is happening.
Comment #9
tedbowIn the quote from @Wim Leers in the summary we have
but in the suggested solution it says we should change
UpdateErrorTest::testExceptionFromEventSubscriber(),which doesn't exist anymore since #3354325: Consolidate UpdateErrorTest but was a functional that test of what happens when the user does the update manually so won't cover "no user will be present "Also the current MR changes
src/BatchProcessor.phpwhich also won't affect cron updates.This is another example of why we should not make issues that simply quote another issue as the "Problem/Motivation" because is unclear unless you really dig in the other issue what the scope of the problem to be addressed is.
Comment #10
phenaproxima@tedbow, to me this was perfectly clear. If an exception occurs from an event subscriber during batch processing, we want to be sure the exception and its backtrace are logged. Although the issue summary refers to a non-existent method, the gist is still correct, and what we're trying to implement. What's the problem here?
Comment #11
tedbowre #10
but the quote from the issue summary says
(emphasis mine)
If "no user will be present to witness the error" then we are talking about cron updates, cron updates do not use "batch processing"
Comment #12
wim leers@tedbow's point in #11 is fair, since apparently cron vs UI (non-cron) updates have different execution flow.
The real point here though is that exceptions should ALWAYS be logged, regardless of cron vs UI updates! And #11 may be correct in quoting what I said literally but is sort of ignoring the spirit: we cannot expect end users to meticulously take notes of everything they see happen. No matter whether that's during cron updates (where they CANNOT see it) or UI updates (where they COULD see it).
However … it's shocking news to me that cron updates and non-cron updates apparently do not use the exact same logic! 😳 It never even occurred to me that they'd be implemented differently. The stage life cycle is the same, only the way things get executed is different.
We need to document why these are implemented so differently. Why is it for UI-based installing of updates not okay to assume the entire stage life cycle finishes within a single request, but for cron-based installing of updates it is okay? 🤔
Unassigning @yash.rode, because @tedbow and @phenaproxima need to clarify this. We also need a follow-up issue that only either of them can work on to document the architectural differences between cron vs non-cron updates.
Comment #13
tedbowI agree. This should have been state in the summary more clearly. I don't think we should be making issues and handing them off to others follow the "spirit" of the issue.
StagedDBUpdateValidatorbut this is very important information and the user may decide not to run an update that has staged database updates(to run it at another time). Other contrib or custom code also follow this same pattern of listening to StatusCheckEvent and if there is staged update show some warning if it is not cron and an error if it is cron.In cron updates we don't allow updates to run if there are staged database updates. There wouldn't be anybody to show this information to anyways.
Comment #14
tedbowCreated a follow-up #3357632: [META] Update doc comment on BatchProcessor to specify why use the batch system
Comment #15
tedbowComment #16
omkar.podey commentedComment #17
omkar.podey commentedSo is the conclusion that cron sets a longer time hence can be assumed that it happens in a single request so this should be only done for cron and not for the UI updates ?, can you update the issue summary so it's more clear what the next steps are. and could this issue affected by #3357969: For web server dependent unattended updates run the entire life cycle in a separate process that will not be affected by hosting time limits
Comment #18
tedbow@omkar.podey I have update the summary. I think this is actionable now
The issue will not be affect by #3357969: For web server dependent unattended updates run the entire life cycle in a separate process that will not be affected by hosting time limits
Comment #19
omkar.podey commentedComment #20
omkar.podey commentedComment #21
omkar.podey commentedComment #22
omkar.podey commentedComment #23
wim leersA new quarter, time to increase the expectations! 🤓 I know you can do it! 😊
Comment #24
omkar.podey commentedComment #25
omkar.podey commented@Wim.leers I was trying to think of a way to test the back trace as i'm currently logging it as context. do you think that's necessary ?
Comment #26
phenaproximaSelf-assigning for review.
Comment #27
phenaproximaThis seems like a reasonable start, but there are some aspects that feel...porous to me. Maybe I'm missing something, though. In general I think we need better comments.
Comment #28
omkar.podey commentedComment #29
omkar.podey commentedComment #30
tedbowreviewing
Comment #31
tedbowLooking good. Just a few more things
Comment #32
tedbowComment #33
omkar.podey commentedComment #34
tedbowNeeds to be merged/rebased with 3.0.x
Comment #35
phenaproximaComment #36
tedbowComment #37
phenaproximaComment #38
tedbowLooks good!
Comment #40
phenaproxima