Problem/Motivation

I discovered this while working on #3166602: Ensure media_filter → media_embed mapping does not cause fatal errors and subsequently #3166930: Migrated filters may have invalid/incomplete settings applied.

If a fatal PHP error occurs in the middle of a migration, it leaves the migration in a weird state:

  1. the message of the fatal error is not logged into the Migration system's messages tables, but only in the PHP error log and into watchdog
  2. the migration is left in the \Drupal\migrate\Plugin\MigrationInterface::STATUS_IMPORTING state
  3. therefore the migration is effectively "stuck", and you need to manually unstuck it

Steps to reproduce

See the STR in #3166930: Migrated filters may have invalid/incomplete settings applied.

Proposed resolution

Set a custom exception handler while in MigrateExecutable, because these are currently caught by _drupal_exception_handler(), so surely the migration system can do the same, but provide extra context?

  1. In PHP 7, fatal errors can be caught! So let's catch them, and treat them like exceptions. See https://www.php.net/manual/en/language.errors.php7.php → see #2 👀
  2. We currently only have
      /**
       * Migration error.
       */
      const MESSAGE_ERROR = 1;
    
      /**
       * Migration warning.
       */
      const MESSAGE_WARNING = 2;
    
      /**
       * Migration notice.
       */
      const MESSAGE_NOTICE = 3;
    
      /**
       * Migration info.
       */
      const MESSAGE_INFORMATIONAL = 4;
    

    Also add MESSAGE_FATAL_ERROR so that it's possible to communicate this is even worse than MESSAGE_ERROR. → see #3 👀

Remaining tasks

The decision was made not to do this, see the complete comments, #16, #17, #22 and #33.
Discussed at a migrate meeting #3168931: [meeting] Migrate Meeting 2020-09-10, #3245469: [meeting] Migrate Meeting 2021-10-28 1400Z and #3247150: [meeting] Migrate Meeting 2021-11-04 2100Z

User interface changes

Fatal errors also show up.

API changes

Addition: \Drupal\migrate\Plugin\MigrationInterface:MESSAGE_FATAL_ERROR.

Data model changes

None.

Release notes snippet

Fatal errors while running a migration no longer cause the migration system to crash.

Comments

Wim Leers created an issue. See original summary.

wim leers’s picture

Issue summary: View changes
Status: Active » Needs review
StatusFileSize
new2.84 KB
wim leers’s picture

StatusFileSize
new4.45 KB
wim leers’s picture

Issue summary: View changes
StatusFileSize
new2.99 KB

Forgot the interdiff for #3.

wim leers’s picture

Issue tags: +PHP 7.0 (duplicate)

The last submitted patch, 2: 3167267-2.patch, failed testing. View results

Status: Needs review » Needs work

The last submitted patch, 3: 3167267-3.patch, failed testing. View results

wim leers’s picture

Status: Needs work » Needs review
StatusFileSize
new1.26 KB
new4.28 KB

#2's failures:

  • Drupal\Tests\migrate\Unit\MigrateExecutableTest + Drupal\Tests\migrate\Unit\MigrateExecutableMemoryExceededTest fail because they're simply tightly coupled to an existing function signature
  • Drupal\Tests\migrate\Kernel\MigrateSkipRowTest::testPrepareRowSkip
    Failed asserting that '3' matches expected 2.
    

    … this seems like a genuine failure: I should move the new catch statement to after the catch(MigrateSkipRowException $e) 😃

  • Drupal\Tests\language\Kernel\Migrate\d7\MigrateLanguageContentSettingsTest::testLanguageContent
    Failed asserting that an array is empty.
    

    … I suspect this has the same root cause as the previous point.

Status: Needs review » Needs work

The last submitted patch, 8: 3167267-7.patch, failed testing. View results

wim leers’s picture

Yay, that works! Only the unit tests need to be updated now, plus explicit test coverage.

wim leers’s picture

Issue tags: +Novice

Fixing the few remaining test failures are excellent for a novice contributor 😊

ridhimaabrol24’s picture

Status: Needs work » Needs review
StatusFileSize
new4.84 KB
new504 bytes

Fixing test cases!

benjifisher’s picture

Issue summary: View changes
Status: Needs review » Needs work
Issue tags: +Needs issue summary update

The Novice task has been done, so I am removing that tag.

It is really annoying that \Error does not correspond to 'error', but I do not think we can do anything about that (unless we go through a deprecation process). If we want to do that, it can be in a follow-up issue.

It is not too late to replace 'fatal' with 'critical', to match const MESSAGE_CRITICAL. Let's do that.

I am adding the tag for an issue summary update. Update the Proposed resolution section to match the current patch. (At least the name of the constant needs to be updated.) That seems like a Novice-level task, so maybe I will not remove that tag after all.

If we have explicit steps to reproduce, then we should be able to create a test. Probably it will have to be a functional test, not a real unit test.

Vidushi Mehta’s picture

Issue summary: View changes
Status: Needs work » Needs review
StatusFileSize
new4.85 KB
new1.94 KB

Added a patch which replace the 'fatal' as mentioned by #13.

wim leers’s picture

#12 and #14 look great!

If we have explicit steps to reproduce, then we should be able to create a test. Probably it will have to be a functional test, not a real unit test.

Add a custom migration that we run explicitly, that intentionally triggers a fatal PHP error.

mikelutz’s picture

Priority: Major » Normal
Issue tags: -Novice +Needs framework manager review

So, I'm going to go on the record as being opposed to this change. A quick search through core for /catch\s?\(.*\\Error\s.*\)/ tells me that nowhere else in Drupal core do we attempt to catch php errors, and for pretty good reasons. Having to trap errors at this level suggests that there are significant flaws in the migration system that should be addressed. There shouldn't be a way to write a custom migration using only core code that is capable of intentionally triggering a fatal error, unless you have a test module with a custom plugin that is intentionally doing something bad.
If there is a way to write a custom migration that triggers a fatal error that isn't handled by the migration system before it happens, we should fix that, not try to hide it.
If we have to write a broken custom plugin to test this, then we aren't really testing core code.
There are plenty of processes in core that would be left in a funky state if a fatal error occurs during them, and core passes off to uncontrolled contrib plugins all the time, so why should this particular system be the one instance where we try to catch a Fatal error?

I understand the reasons for wanting to do this, and at the end of the day, I'm happy to be overruled, but to me, this is not good coding. We shouldn't be catching errors, we should be programming defensively to make sure they don't occur.

mikelutz’s picture

Adding in a relevant thread from the recent migration meeting:

4️⃣ MigrateExecutable should catch not only exceptions, but also fatal errors

benjifisher #3167267: MigrateExecutable should catch not only exceptions, but also fatal errors
mikelutz I mentioned in the issues, but I’m opposed to this.
benjifisher @mikelutz thinks this is a bad idea. We should be fixing/preventing fatal errors, not catching them.
benjifisher x-post
mikelutz I couldn’t find another place in core where we try to catch actual php errors.
benjifisher I do not have a strong opinion here. If I am the hard-working developer trying to migrate my client's site, then I do not want to get into lots of trouble because one of the contrib modules has some unstable code. OTOH, something needs to be fixed in order to get the content migrated. I am not really in "lots of trouble" because I can just drush mrs my_broken_migration and get back to work.
benjifisher Let me think about this some more. If I agree with you, shall we just close the issue? Or should we let a framework manager decide (since you added t hat tag)?
xurizaemon I've run into fatal errors in migration code and they can be hard to identify at times, but I think I'm with Mike on not trapping them. As an aid to migration developers, and maybe an awful halfway measure, perhaps we could catch and rethrow them after logging? That might allow us to surface the underlying issue in the place a migration dev is already looking. OTOH the php error log is "the right" place for fatal errors to go already. (edited)
benjifisher Does it make a difference if we can get into this situation with a simple typo in the migration (YAML)? The PHP code is arguably correct, even if it is not 100% bullet-proof.After carefully analyzing the code, I came up with this example. (That is my story, and I am sticking to it.)Suppose I have a migration with destination: {plugin: 'entity:media'} and, instead of bundle: - plugin: default_value default_value: imageI have bundle: - plugin: default_value default_value: imagineWhen the migration tries to save a Media entity, it gets to this line in Drupal\media\Entity\Media::getSource(): return $this->bundle->entity->getSource();This leads to the PHP errorCall to a member function getSource() on null in Drupal\media\Entity\Media->getSource() (line 137 of /var/www/html/web/core/modules/media/src/Entity/Media.php)As I said on Thursday, the solution is drush mrs my_broken_migration , then fix the typo, clear the pluign cache, and try again.
mikelutz Assuming the underlying issue is that \Drupal::entityTypeManager()->getStorage('media')->create(['name' => 'media', 'type' => 'this_bundle_doesnt_exist']) Triggers a php error instead of throwing an exception, then that should be fixed in media.
mikelutz Either way, it doesn’t change my mind that the code that is throwing the error should be fixed.  Media should ensure there is a valid bundle before trying to call getSource();
benjifisher Pretty close. I will have to check, but I think create() is OK; it is save() that causes the error.

Participants:

benjifisher, mikelutz, xurizaemon

wim leers’s picture

A quick search through core for /catch\s?\(.*\\Error\s.*\)/ tells me that nowhere else in Drupal core do we attempt to catch php errors

… because this only became possible in Drupal 9, which hasn't existed for very long, since it requires PHP >=7.

Having to trap errors at this level suggests that there are significant flaws in the migration system that should be addressed.

Any migration source/destination/process plugin can trigger a fatal PHP error that can crash an entire migration. I'm arguing it should not.

If there is a way to write a custom migration that triggers a fatal error that isn't handled by the migration system before it happens, we should fix that, not try to hide it.

Can you point me to the places in the codebase and to issue queue discussions where this is an explicit rationale?

If we have to write a broken custom plugin to test this, then we aren't really testing core code.

Core code is generally more robust than contrib/custom code, so IMHO this is not a convincing justification.

We shouldn't be catching errors, we should be programming defensively to make sure they don't occur.

Agreed :) But we can't control all code that all people write. And I think this very measure would fall under the umbrella of "programming defensively"? 😅🤓


Not everyone is knowledgeable enough to debug PHP errors. Empowering people with less experience to get pretty far along the way in a migration without having to know xdebug to figure out the cause for a fatal PHP error would make running migrations more accessible.

mikelutz’s picture

A quick search through core for /catch\s?\(.*\\Error\s.*\)/ tells me that nowhere else in Drupal core do we attempt to catch php errors

… because this only became possible in Drupal 9, which hasn't existed for very long, since it requires PHP >=7.

Drupal 9 has been branched off for over a year at this point. Plenty of time to trap one error somewhere if that was what we were going to do.

Having to trap errors at this level suggests that there are significant flaws in the migration system that should be addressed.

Any migration source/destination/process plugin can trigger a fatal PHP error that can crash an entire migration. I'm arguing it should not.

This isn't compelling to me. ANY plugin(/hook/controller/event handler/theme function) that executes code can crash an entire page load or any other process that drupal manages. This is not unique to migrate plugins, yet you propose a unique solution.

If there is a way to write a custom migration that triggers a fatal error that isn't handled by the migration system before it happens, we should fix that, not try to hide it.

Can you point me to the places in the codebase and to issue queue discussions where this is an explicit rationale?

You want me to find precedent for the idea that we should fix errors rather than hide them?

If we have to write a broken custom plugin to test this, then we aren't really testing core code.

Core code is generally more robust than contrib/custom code, so IMHO this is not a convincing justification.

This isn't a justification, it's simply a fact. You would be testing a reaction to code using the core apis that is intentionally broken, and not just broken in that it is using the core apis in an unintended way (which we SHOULD be programming defensively for) but for code that decides to write $x = $y / ($z-$z); which has never been something we have previously attempted to save the user from.

We shouldn't be catching errors, we should be programming defensively to make sure they don't occur.

Agreed :) But we can't control all code that all people write. And I think this very measure would fall under the umbrella of "programming defensively"? 😅🤓

I do agree that we can't control all the code that all people write. Drupal's policy has always been to use contributed modules at your own risk. Our job is to provide a rock-solid core experience, and a robust api, and tools that contrib maintainers and contrib contributors can use to monitor their code and have issues reported. If core is going to be responsible for all catchable errors that contrib code might possibly cause, then I think that's a bigger discussion, not something we pioneer in the migration system. If this is the policy we are going to go with then we need to start wrapping any hook or event or plugin calls or anywhere else that can call custom or contrib code with try/catch(\Throwable). If we decide to go that route as a community, so be it, but I just don't think it's right to try tarbitrarilyly do it here.

wim leers’s picture

Plenty of time to trap one error somewhere if that was what we were going to do.

This is false. Core committers have intentionally kept 8.8.x, 8.9.x and 9.0.x as similar as possible, to not make development unnecessarily complicated.

This isn't compelling to me. ANY plugin(/hook/controller/event handler/theme function) that executes code can crash an entire page load or any other process that drupal manages. This is not unique to migrate plugins, yet you propose a unique solution.

Migrations are long-running, HTML responses are not.

You would be testing a reaction to code using the core apis that is intentionally broken

I strongly disagree, because the migration system allows through the use of YAML files to combine lots of migration plugins (PHP code) together, and those may be combined in novel ways, leading to fatal errors due to hitherto unseen edge cases.

which has never been something we have previously attempted to save the user from.

Eh … this was true in the D7 and earlier days, but not anymore. We've specifically made it harder for people to shoot themselves in the foot, with stricter validation, stricter types, helpful exceptions, et cetera — rather than "oh PHP fatal, you just use a debugger and figure it out".


I'm a little bit shocked by your very strong reaction, to be honest.

How does this patch adversely affect you?

Your argument boils down to "developers should write perfect code". I agree! But we all make mistakes, and we all miss edge cases.

For those developers that do write perfect code, this patch will have zero effect. For those that work with imperfect code, this helps their migrations to not crash midway. I'm genuinely wondering what the harm is here?

larowlan’s picture

Code review below - we need to be really sure that we want to do this before going ahead, particularly because there's a breaking API change for anyone that has sub-classed MigrateExecutable and handles exceptions differently. Yes, there's the 1:1 rule, but this would definitely be 9.1 only because of a breaking change - and we'd have to be reallllly sure we wanted to do it. Personally, I've been in the scenario where this has caught me out whilst I'm working on developing a new source/process plugin - but if someone is in that scenario - they are comfortable interpreting PHP errors, and resetting the migration status.

So, tl;dr I don't feel strongly either way (sorry).

  1. +++ b/core/modules/migrate/src/MigrateExecutable.php
    @@ -187,9 +187,9 @@ public function import() {
    +    catch (\Exception | \Error $e) {
    
    @@ -217,6 +217,11 @@ public function import() {
    +      catch (\Exception | \Error $e) {
    
    @@ -244,7 +249,7 @@ public function import() {
    +        catch (\Exception | \Error $e) {
    
    @@ -267,10 +272,10 @@ public function import() {
    +      catch (\Exception | \Error $e) {
    
    @@ -436,19 +441,20 @@ public function saveMessage($message, $level = MigrationInterface::MESSAGE_ERROR
    +   * @param \Exception|\Error $exception
    

    sidenote: we could just use \Throwable here as it's a common ancestor of both of these?

  2. +++ b/core/modules/migrate/src/MigrateExecutable.php
    @@ -436,19 +441,20 @@ public function saveMessage($message, $level = MigrationInterface::MESSAGE_ERROR
    -  protected function handleException(\Exception $exception, $save = TRUE) {
    +  protected function handleException($exception, $save = TRUE) {
    

    this is an api change, and if we changed the typehint to \Throwable, it doesn't help - see https://3v4l.org/RlS0n and https://3v4l.org/rIOsE

heddn’s picture

Why does try/catch have to be added to the executable? Why couldn't it be added to the UI or batch form that is calling the executable? That's how we handled it in migrate_tools, where we've added a --continue-on-failure flag to do sorta what is proposed here.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

wim leers’s picture

StatusFileSize
new6.38 KB

Thanks, @ridhimaabrol24 (#12) and @Vidushi Mehta (#14)!

#2969551: Migrate messages from caught exceptions need file and line details was committed and requires this patch to be rebased.

wim leers’s picture

StatusFileSize
new2.58 KB
new4.97 KB

#24:

diff --git a/core/modules/migrate/src/MigrateExecutable.php.rej b/core/modules/migrate/src/MigrateExecutable.php.rej
…

Obviously this didn't belong in here 😬


#21:

  1. 👍 Done.
  2. 😔— once again https://wimleers.com/talk/bc-evolvability-maintainability is relevant 😬

Attached is a rerolled #24 with #21 already addressed.

wim leers’s picture

#22: very, very, very good question! 🤔

The reason I put it in MigrateExecutable is because that's what is already catching \Exception and is logging that.

I think that in your reasoning, that should also be moved out of MigrateExecutable? Because why put one kind of exception handling in one place and another elsewhere?

Furthermore, there is one enormous limitation in the way that MigrateToolsCommands wraps the MigrateExecutable with its --continue-on-failure logic: it continues subsequent migration plugins, but it does not continue subsequent rows within the current migration plugin.

So the crucial difference is that this core patch allows source rows after the one that triggered a fatal error to still be imported, rather than skipping forward to the next migration plugin.

heddn’s picture

+++ b/core/modules/migrate/src/MigrateMessage.php
@@ -17,6 +17,7 @@ class MigrateMessage implements MigrateMessageInterface {
+    'critical' => RfcLogLevel::CRITICAL,

This might seem like a small thing, but adding an additional level of logging to the migrate message could be a BC issue. We've had issues before in migrate tools; I assume in migrate run as well. Could we remove the logging level to a follow-up so we don't bikeshed on it?

The rest of my concerns are pretty well addressed. I'm still a little concerned about continuing to run the migration once it hits errors. What if someone has logic built-in to their execution to handle them. But let's see what other have to say.

wim leers’s picture

This might seem like a small thing, but adding an additional level of logging to the migrate message could be a BC issue.

How could that cause a BC issue? It's not like we're inventing new log levels?

We've had issues before in migrate tools; I assume in migrate run as well. Could we remove the logging level to a follow-up so we don't bikeshed on it?

Happy to do that but I'd love to understand this first. I'd also love to understand what needs to happen to be able to introduce the sensible log level in a follow-up.

The rest of my concerns are pretty well addressed.

🥳

I'm still a little concerned about continuing to run the migration once it hits errors. What if someone has logic built-in to their execution to handle them.

What exactly is the concern? Can you share an example of such built-in logic? That'd probably help me understand your concern better and perhaps others too 😊

heddn’s picture

re #28.1: Drush uses different logging level and mapping. See https://gitlab.com/drupalspoons/migrate_tools/-/blob/8.x-5.x/src/Drush9L.... We also have similar logic added to migrate_upgrade and other migrate contrib modules.

re #28.2: If anything can be done, then it probably will be done in Drupal. I'm probably being overly cautious, but https://xkcd.com/1172/ seems appropriate.

quietone’s picture

Issue tags: -migrate-d7-d8

AFAIKT this issue is not specific to Drupal 7 sources, removing tag.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

wim leers’s picture

Assigned: Unassigned » quietone

#29:
RE #28.1: Is that a reason why we can't have it in the migration system itself? I'm not sure what point you're making here? 🤔
RE #28.2: I'm still not sure what that "built in to their execution" could be? Do you mean in a process plugin? Do you mean in an overridden MigrateExecutable?

@quietone I think @heddn asked for your thoughts in #27, where he wrote:

But let's see what other have to say.
quietone’s picture

Assigned: quietone » Unassigned
StatusFileSize
new39.6 KB
new12.49 KB
new56.05 KB

The proposed resolution in the issue summary needs an update. It should emphasize that this is proposing a change in behaviour in MigrateExecutable and explain the pros and cons of it. Especially, what are the uses cases where continuing despite fatal errors is an improvement. It should also explain what happens to the state of the migration when a fatal error occurs.

What stands out for me are these comments, all of which I agree with.

heddn: "I'm still a little concerned about continuing to run the migration once it hits errors."
mikelutz: "If core is going to be responsible for all catchable errors that contrib code might possibly cause, then I think that's a bigger discussion"
larowlan: "Personally, I've been in the scenario where this has caught me out whilst I'm working on developing a new source/process plugin - but if someone is in that scenario - they are comfortable interpreting PHP errors, and resetting the migration status."

And I am not yet convinced that using the migrate message tables to log fatal errors is the right thing to do.

I then tested this when using the Migrate UI. I made a divide by zero error in the process plugin, FieldInstanceSettings, when the field was a term reference. Without the patch the Ajax error (screenshot below) error is displayed and the migration is halted. Looking in the log, the latest error in the divide by zero error so it is easy for the user to find. With the patch the migration continues on after the fatal errors and the success message is displayed. I looked at the log (/admin/reports) and, of course, the divide by zero error was not there. The user has no feedback that there was an error. So, at best, the patch needs work for that.

Now, we know that the error message is in migrate_message_d7_field_instance but how will the user of the UI find it? Migrate tables are not accessible via the UI and even if they use something like PhpMyadmin, they would have to search all the message tables for errors. And then even if they do find it only contains, 'divide by zero', which clearly lacks the information needed to track down the problem.

All of which means that I do not agree with this proposal.

Still, as chx often said, 'migrate is a special snowflake'. Perhaps the error could be caught so that state could be changed to indicate a fatal error has occurred instead of IMPORTING. However, I do not want a behaviour change, I would still expect the migration to be 'stuck' in this state until some action is taken to un-stick it. That would prevent an accidental actions on the migration, such as ''mim --update' or rollback. Even with that suggestion I am still concerned about setting a precedent of capturing errors and what others may think that means for other parts of core.


Without the patch


Watchdog message without the patch

With the patch

Logged message in message table with the patch. There is no watchdog message.
1 eef7ce1a543f5239383e7a7e256d5d4b78706b000306068229... 4 Division by zero

anybody’s picture

Status: Needs review » Needs work

As of #33

quietone’s picture

Issue summary: View changes
Status: Needs work » Needs review

@Anybody, thanks. I left this at Needs Review because there is not an agreed upon path forward so working on the patch isn't needed now.

Let's see if I can make the clearer in the IS.

edurenye’s picture

StatusFileSize
new5.01 KB

The patch does no longer apply in 9.3.x, so I re-rolled it.

quietone’s picture

Issue summary: View changes
Status: Needs review » Closed (won't fix)

This was brought up at #3247150: [meeting] Migrate Meeting 2021-11-04 2100Z and the maintainer present, benjifisher, mikelutz, and quietone, agree to close this as won't fix after the meeting, assuming no further comments. Since there hasn't been I am closing this.

See the Remaining Tasks section in the Issue Summary for links to the relevant comments.

quietone’s picture

Issue summary: View changes

Add link to an earlier migrate meeting where this was discusses.

wim leers’s picture