Problem/Motivation

taxonomy_post_update_make_taxonomy_term_revisionable() fails if a term name is null. This isn't enforced by the 8.6.x schema and it looks like potentially devel generate can create terms like this (going from reports below, not reproduced).

We should either default to an empty string, or do the same thing we did on #3052147: comment_update_8701 fails if there are comments without field_name and prevent the update if this condition is found.

From #3052464: Cannot update to 8.7.0 because of taxonomy_post_update_make_taxonomy_term_revisionable OP

Update started: taxonomy_post_update_make_taxonomy_term_revisionable
>  [error]  Drupal\Core\Database\IntegrityConstraintViolationException: SQLSTATE[23000]: Integrity constraint violation: 1048 Column 'name' cannot be null: INSERT INTO {tmp_d305d9taxonomy_term_field_data} (tid, revision_id, vid, langcode, name, description__value, description__format, weight, changed, default_langcode, status, revision_translation_affected) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7, :db_insert_placeholder_8, :db_insert_placeholder_9, :db_insert_placeholder_10, :db_insert_placeholder_11); Array
> (
>     [:db_insert_placeholder_0] => 1
>     [:db_insert_placeholder_1] => 1
>     [:db_insert_placeholder_2] => tags
>     [:db_insert_placeholder_3] => tr
>     [:db_insert_placeholder_4] =>
>     [:db_insert_placeholder_5] =>
>     [:db_insert_placeholder_6] =>
>     [:db_insert_placeholder_7] =>
>     [:db_insert_placeholder_8] =>
>     [:db_insert_placeholder_9] =>
>     [:db_insert_placeholder_10] =>
>     [:db_insert_placeholder_11] => 1
> )
>  in Drupal\Core\Database\Connection->handleQueryException() (line 689 of /var/www/drupal8/web/core/lib/Drupal/Core/Database/Connection.php).
>  [error]  The entity update process failed while processing the entity type taxonomy_term, ID: 1.
>  [error]  Update failed: taxonomy_post_update_make_taxonomy_term_revisionable
 [error]  Update aborted by: taxonomy_post_update_make_taxonomy_term_revisionable
 [error]  Finished performing updates.

And from #3052464-53: Cannot update to 8.7.0 because of taxonomy_post_update_make_taxonomy_term_revisionable

I had this problem and try infinite ways to solve it. I try updating core first and the problem appear. I revert to 8.6.1 and the error was not there. I updated everything I could before retrying updating to 8.7.1. I even learn and did a composer installation and the error still appear.

The I read the #13 comment and check my db and found some Devel Generated Taxonomy terms. I couldn't delete it via phpMyAdmin because everything broke. So I did a fresh reinstall of 8.6.1 version and created a view for taxonomy terms showing ID.

I found the two terms with id but without name. I try to delete the via Views Bulks Operation but it didn't react to delete operation. So I went directly to the term via url and I after I open the term, that was completely empty, I delete them inside edit.

After that I try the core update without any problem, event though the db update gave a message, it was purely informative. The taxonomy terms where now revisionable.

And schema from an 8.6.x database:

MariaDB [aftlms]> DESCRIBE taxonomy_term_field_data;
+---------------------+------------------+------+-----+---------+-------+
| Field               | Type             | Null | Key | Default | Extra |
+---------------------+------------------+------+-----+---------+-------+
| tid                 | int(10) unsigned | NO   | PRI | NULL    |       |
| vid                 | varchar(32)      | NO   | MUL | NULL    |       |
| langcode            | varchar(12)      | NO   | PRI | NULL    |       |
| name                | varchar(255)     | NO   | MUL | NULL    |       |
| description__value  | longtext         | YES  |     | NULL    |       |
| description__format | varchar(255)     | YES  |     | NULL    |       |
| weight              | int(11)          | NO   |     | NULL    |       |
| changed             | int(11)          | YES  |     | NULL    |       |
| default_langcode    | tinyint(4)       | NO   |     | NULL    |       |
| status              | tinyint(4)       | NO   | MUL | NULL    |       |
+---------------------+------------------+------+-----+---------+-------+

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Menu links or taxonomy term with no default translation are now automatically fixed before making the entity type revisionable. See the change notice for more details on how to check for possible side-effects.

CommentFileSizeAuthor
#65 taxonomy-revisionable_update_data_fix-3056543-65.patch23.82 KBplach
#56 interdiff-52-56.txt1.28 KBjungle
#56 3056543-56.patch23.94 KBjungle
#55 interdiff-52-55.patch1.28 KBjungle
#55 3056543-55.patch23.94 KBjungle
#52 taxonomy-revisionable_update_data_fix-3056543-52.patch23.95 KBplach
#52 taxonomy-revisionable_update_data_fix-3056543-52.interdiff.txt722 bytesplach
#51 taxonomy-revisionable_update_data_fix-3056543-51.patch23.95 KBplach
#51 taxonomy-revisionable_update_data_fix-3056543-51.interdiff.txt21.58 KBplach
#49 taxonomy-revisionable_update_data_fix-3056543-49.patch21.32 KBplach
#49 taxonomy-revisionable_update_data_fix-3056543-49.interdiff.txt750 bytesplach
#48 taxonomy-revisionable_update_data_fix-3056543-48.patch21.06 KBplach
#48 taxonomy-revisionable_update_data_fix-3056543-48.interdiff.txt21.11 KBplach
#43 taxonomy-revisionable_update_data_fix-3056543-43.patch24.22 KBplach
#43 taxonomy-revisionable_update_data_fix-3056543-43.test.patch12.38 KBplach
#43 taxonomy-revisionable_update_data_fix-3056543-43.interdiff.txt2.74 KBplach
#42 taxonomy-revisionable_update_data_fix-3056543-42.patch24.24 KBplach
#42 taxonomy-revisionable_update_data_fix-3056543-42.interdiff.txt9.11 KBplach
#38 taxonomy-revisionable_update_data_fix-3056543-38.interdiff.txt3.91 KBplach
#38 taxonomy-revisionable_update_data_fix-3056543-38.patch22.61 KBplach
#8 term-name-null-test-3056543-8.patch1.95 KBBerdir
#30 taxonomy-revisionable_update_data_fix-3056543-30.patch11.4 KBplach
#8 term-name-null-3056543-8.patch3.01 KBBerdir
#15 taxonomy-revisionable_update_fix-3056543-15.patch13.62 KBplach
#14 taxonomy-revisionable_update_fix-3056543-14.test.patch10.35 KBplach
#15 taxonomy-revisionable_update_fix-3056543-15.interdiff.txt844 bytesplach
#14 taxonomy-revisionable_update_fix-3056543-14.patch13.61 KBplach
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

catch created an issue. See original summary.

catch’s picture

Issue summary: View changes
catch’s picture

Title: taxonomy_post_update_make_taxonomy_term_revisionable() fails when term name is NULL » taxonomy_post_update_make_taxonomy_term_revisionable() fails when term name or status is NULL
catch’s picture

Issue tags: +Drupal 8 upgrade path
catch’s picture

Issue tags: +WI critical

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

subson’s picture

I ran into similar issue when upgrading from 8.6.16 to 8.7.8.
I found out that there were 3 fields on a vocabulary which were causing the issues in this function preUpdateEntityTypeSchema() in SqlContentEntityStorageSchema.php on line 468:
$temporary_table_names = array_combine($sandbox['new_table_mapping']->getTableNames(), $sandbox['temporary_table_mapping']->getTableNames());

The tables from $sandbox['new_table_mapping']->getTableNames() (argument 1) should match with $sandbox['temporary_table_mapping']->getTableNames()(argument 2) which was different in our case, So I need to figure out for which tables temporary_table_mapping is not able to generate revision tables. we didn't have any content for those fields so we deleted the fields, but if you have content in the field tables then you need to figure out why revision tables are not generated.

Berdir’s picture

> This isn't enforced by the 8.6.x schema and it looks like potentially devel generate can create terms like this (going from reports below, not reproduced).

It is, though. Trying to create a term with a raw insert query based on the 8.0 filled dumps results in:

Drupal\Core\Database\IntegrityConstraintViolationException : SQLSTATE[23000]: Integrity constraint violation: 19 NOT NULL constraint failed: taxonomy_term_field_data.name: INSERT INTO {taxonomy_term_field_data} (tid, vid, langcode, name, weight, changed) VALUES (?, ?, ?, ?, ?, ?); Array
(
    [0] => 999
    [1] => tags
    [2] => en
    [3] => 
    [4] => 0
    [5] => 1579555999
)

Same with status really, although maybe combined with updating from a broken version when updating and coming from content_translation_status, you *might* have managed to end up with NULL. But I thought we learned our lesson from the block_content update and had considerably improved the initial value handling.

But actually, #3052464-107: Cannot update to 8.7.0 because of taxonomy_post_update_make_taxonomy_term_revisionable gave me a hint now at what might be going wrong. That reports that it can be fixed by setting default_language to 1. If I set my example data to default_language = 0 for single available translation, which obviously is not a valid situation then I can get it to fail on the update because it just doesn't know how to handle that data.

Basically, the data is then loaded in the wrong language key in \Drupal\Core\Entity\Sql\SqlContentEntityStorage::loadFromSharedTables(), I tried to fix that directly in there, but it fails weirdly in the update. Posting what I have so far, I'm very done for tonight.

The last submitted patch, 8: term-name-null-test-3056543-8.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

catch’s picture

Title: taxonomy_post_update_make_taxonomy_term_revisionable() fails when term name or status is NULL » taxonomy_post_update_make_taxonomy_term_revisionable() fails when terms have no default translation

. If I set my example data to default_language = 0 for single available translation, which obviously is not a valid situation then I can get it to fail on the update because it just doesn't know how to handle that data.

Basically, the data is then loaded in the wrong language key in

OK so next steps here seem to be:

1. For single-language entities with no default translation, we should be able to write a select to find them, and a select to update them. Seems like something we add to the very beginning of the update function. Although we might want a spin-off to figure out if it's still possible for 8.8+ sites to get into this state in the first place too.

2. For multiple-language entities with no default translation, I don't think we can make a decision on behalf of the site, so we should probably do another hook_requirements('update') telling people what's wrong so they can investigate it themselves.

Berdir’s picture

Hm, I don't quite agree with that :)

The default_langcode column can be reliable derived, we have the default langcode also in the base field table, that's what my fix is doing now transparently on entity load.

That works for single and multi-language entities just fine, it would actually be harder to figure out if an entity has one or multiple translations because we need to see if there are multiple rows in the data table and if none of them have default_langcode set.

So a query to find them would be select b.nid,b.langcode from node b inner join node_field_data d on b.nid = d.nid and b.langcode = d.langcode and d.default_langcode = 0;. And you can do the same for the node_revision/node_field_revision tables. For mysql, you could probably even do that in an update query, but we don't support updates with joins in our API.

So an update function could loop over all translatable entity types with the default storage, run that query until we have no results left, and fix the default_langcode field for the ones where it doesn't match. We could also detect those records that have it incorrectly set on the wrong translation in theory.

That said, I still don't really understand how you can get in this situation in the first place, and I'm a bit wary of just fixing up the data without understanding the root cause, but in my opinion this shouldn't be possible through the API. *Maaaybe* with some weird migrations? A one-time data fix will not prevent it from happening again.

catch’s picture

The default_langcode column can be reliable derived, we have the default langcode also in the base field table,

Oh good point...

That said, I still don't really understand how you can get in this situation in the first place, and I'm a bit wary of just fixing up the data without understanding the root cause, but in my opinion this shouldn't be possible through the API. *Maaaybe* with some weird migrations? A one-time data fix will not prevent it from happening again.

I think we should specifically fix it for taxonomy terms in taxonomy_post_update_make_taxonomy_term_revisionable() (and maybe menu links in case it's also affecting #3052318: Update from 8.6 to 8.7 fails due to corrupt menu_link_content or taxonomy_term entity data), but not do a blanket update for everything. This doesn't stop even those sites from getting into a similar situation again but it will allow them to update to a supported release of Drupal core.

We can open a follow-up to decide whether to do a one-time fix for everything else and to diagnose how people got into this state.

DamienMcKenna’s picture

Tagging as a requirement for Drupal 9.0-beta1.

plach’s picture

I had a few discussions about this with @Berdir, @amateescu and @catch.

TL;DR: We did not reach consensus on how to handle corrupt data yet.

We laid out a few possible solutions but it seems the main problem is that we do not agree on the "constraints " a valid solution should respect:

  • @Berdir and @catch are ok with manipulating user data: either manually fixing corrupt records before running the update (@catch) or as a consequence of an entity save operation following a "fix" on load (@Berdir's patch).
  • I'm not comfortable with manipulating user data but I am ok with optionally "leaving it behind": I suggested to add an opt-in update mode that would catch exceptions thrown while writing entity data to the new storage (more on this below).
  • @amateescu thinks user data should be all manually fixed before running the update and suggests implementing a "dry-run" mode that would allow people to attempt the update as many times as needed to fix everything manually.

Here are the reasons behind my proposal:

  • I'm not comfortable with the idea of manipulating user data: any change to it could alter the site behavior in an unexpected way, even if data can be reliably restored. If it was previously being filtered out because of joins or filters on default_langcode = 1 (which is a quite common filter), people might rely on it not showing up, which would be problematic in case of sensitive data, for instance.
  • OTOH not copying over broken data would still give people the opportunity to manually recover it from backup tables, without forcing them not to update until every single case is tracked down/resolved.
  • Removing broken data is exactly the solution users tried to make updates complete both here and over at #3052318-100: Update from 8.6 to 8.7 fails due to corrupt menu_link_content or taxonomy_term entity data. AAMOF this proposal would allow to "address" not only this issue, but any corrupt data issue, including menu link ones. It seems that non-technical site owners could deem losing some broken data worth compared to being stuck with an unsupported Drupal version.

As a side note, the "skip broken data" mode would be compatible with the "dry-run" mode @amateescu is suggesting: people wishing to fix everything before running the update would simply not enable it and instead use a dry run to identify all broken data and fix it however they see fit.

For these reasons I'm posting a patch implementing my solution. Happy to go other ways if consensus arises around one of the other solutions, but at least we have something else to evaluate besides #8.

plach’s picture

plach’s picture

plach’s picture

The last submitted patch, 14: taxonomy-revisionable_update_fix-3056543-14.test.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

The last submitted patch, 14: taxonomy-revisionable_update_fix-3056543-14.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

amateescu’s picture

Discussed this issue with @plach today, and my opinion is that skipping entities that couldn't to be written to the new storage is very risky, because no matter how many warnings we display at the end of the update process, there will be cases when people will ignore them and potentially end up with lost data.

IMO, the easiest and non-controversial solution that can be implemented quickly is a "data integrity checker" system, which would check various things like:

- field storages that are marked as required in the schema but have NULL values in the database (we already introduced something like this for comments in #3052147: comment_update_8701 fails if there are comments without field_name)
- entities which don't have a default translation
- etc.

This system would work very much like the entity definition change list (EntityDefinitionUpdateManager::getChangeSummary() and ::getChangeList()), and we would show them in the status report page, as well as raising update requirement errors if there any problems found.

catch’s picture

OK trying to summarise thoughts.

Summary: I think in this issue, we should implement the same approach we took in #3052147: comment_update_8701 fails if there are comments without field_name - a hook_requirements('update') that looks for data that can't be updated, informs the site owner they can't run updates, and points them to a change record for instructions on how to continue. More or less the same as @amateescu's position.

That change record can then inform people various ways to find, fix and/or remove the corrupt data on their sites, prior to attempting the update again.

This is not great UX for the site admin, but it is the most data friendly option, for potentially irreplaceable data that may have been added by end users.

Not summary:
It also seems to me though that in the case of one language and no default language, we do actually have a reliable query that we can run that would fix those taxonomy terms. @plach is right that this could result in 'missing' content showing up in places that it didn't, but that would also be the bug we're fixing. I think we could tackle that approach in a (possibly still critical, but not 9.x-blocking) follow-up.

Even if we decided to go with trying to automatically fix some data, we'd still have unresolvable cases where there are multi-language terms with no default language, so we'd need the hook_requirements() for those anyway, and a similar change record all of which we'd be adding here.

The current patch by @plach is interesting, but I don't think we can do that unless we had some kind of interactive update mode like
option 1 try to fix data
option 2, run updates and back-up data
option 3, fix data yourself then reload this form.
- but that requires infrastructure we don't have, not even sure we have an issue for that kind of interactive update stuff yet although it has been discussed elsewhere.

IMO, the easiest and non-controversial solution that can be implemented quickly is a "data integrity checker" system, which would check various things like

This is a good idea, but we want to backport this patch to 8.8.x, so would also move the data integrity system to a follow-up and we could refactor the hook_requirements() in 8.9/9.0 to use it as part of that issue.

Berdir’s picture

@catch: On single vs. multiple translations, we discussed that before in #10/#11, we can reliable figure out the correct default_langcode value also for translated entities. But I'm fine with that being done in a separate issue.

I'm still not exactly sure how to move forward here then. IMHO, the only thing that we could check in advance is the known default_langcode problem, but if we're going to hopefully fix that then I'm not sure what we end up with here. There could be other data integrity issues, but we don't even really know what they could be, so how could we detect for that apart from doing a full-blown dry-run of resaving all terms, which is a very time-consuming operation, I'm not sure where and how we'd do that.

Pretty much the only idea I still have is to improve the error that we display, point to a documentation page where we collect information on known issues and how to fix them manually. Still the user would have a half-updated database and would need to restore a backup?

The generic data-integrity-check-and-fix thing is interesting, but I don't really see how that would work for core. We wouldn't only need to backport it to 8.8, we would need to backport to 8.7, 8.6, 8.5, to help those stuck on those versions? Seems like it would make more sense to create that as a contrib module that these sites can install and fix their installation before updating to 8.8+.

catch’s picture

I don't think we need to account for unknown data integrity issues, if we've identified one and can reliably warn people about it, then that is fine. If people find additional issues, they can open additional bug reports but at least we'll know it's not for this one then.

I did forget it was possible to figure out the default for multiple language terms too, that means potentially doing work here we'd need to undo later, but most keen on getting something into 8.8/8.9 which is an improvement on the current state and the hook_update_N() should not make things worse for people than they already are.

catch’s picture

Another option, might be worse than all the others but just in case:

Fix the data, unpublish the terms, tell people what we did and add links.

No information disclosure but means hiding things which presumably show up somewhere?

catch’s picture

Having written #24, if we actually log and warn people what we updated, would that address @plach's and @amateescu's concern?

1. Find all the terms with broken data and update them, make a note of the tids.

2. In the hook_update_N() output (and log messages) link to the term and/or term edit pages for terms that have been fixed.

plach’s picture

I can live with #25 :)

catch’s picture

Status: Needs review » Needs work

OK let's try that, moving to needs work.

plach’s picture

I will likely work on this tomorrow, if none beats me to it.

plach’s picture

Assigned: Unassigned » plach

On this

plach’s picture

Here's a first attempt at implementing #25. If it looks good I will work on adding the menu link equivalent.

catch’s picture

  1. +++ b/core/modules/taxonomy/taxonomy.post_update.php
    @@ -231,7 +242,97 @@ function taxonomy_post_update_make_taxonomy_term_revisionable(&$sandbox) {
    +    $count = $sandbox['data_fix']['default_langcode']['processed'];
    +    // @todo Simplify with https://www.drupal.org/node/2548095
    +    $base_url = str_replace('/update.php', '', \Drupal::request()->getBaseUrl());
    +    $args = [
    +      ':url' => Url::fromRoute('dblog.overview', [], ['query' => ['type' => ['update'], 'severity' => [RfcLogLevel::WARNING]]])
    +        ->setOption('base_url', $base_url)
    +        ->toString(TRUE)
    +        ->getGeneratedUrl(),
    +    ];
    +    return new PluralTranslatableMarkup($count, 'Taxonomy terms have been converted to be revisionable. One term with data integrity issues was restored. More details have been <a href=":url">logged</a>.', 'Taxonomy terms have been converted to be revisionable. @count terms with data integrity issues were restored. More details have been <a href=":url">logged</a>.', $args);
    +  }
    

    This needs to check moduleExists('dblog'). When it's not there, we can probably just have the same message without a link.

  2. +++ b/core/modules/taxonomy/taxonomy.post_update.php
    @@ -231,7 +242,97 @@ function taxonomy_post_update_make_taxonomy_term_revisionable(&$sandbox) {
    +
    +    \Drupal::logger('update')
    +      ->warning('The term with ID @id had data integrity issues and was restored.', ['@id' => $record->id]);
    +  }
    +
    

    Would be nice to link to the term here if we could, but is that getting risky to do during an update?

xjm’s picture

Title: taxonomy_post_update_make_taxonomy_term_revisionable() fails when terms have no default translation » taxonomy_post_update_make_taxonomy_term_revisionable() and the menu link content equivalent fail when entities have no default translation

Expanding the scope of the title.

catch’s picture

The sqlite and posgres fails are real, for example:

1) Drupal\Tests\aggregator\Functional\Update\AggregatorUpdateTest::testSourceFeedRequired
The update failed with the following message: "Failed: Drupal\Core\Database\DatabaseExceptionWrapper: SQLSTATE[42703]: Undefined column: 7 ERROR: column "id" does not exist LINE 7: HAVING id IS NOT NULL AND default_langcode = 0 ^: SELECT MIN(b.tid) AS id, MIN(b.langcode) as langcode, SUM(d.default_langcode) AS default_langcode FROM {taxonomy_term_field_data} d LEFT JOIN {taxonomy_term_data} b ON d.tid = b.tid AND d.langcode = b.langcode AND d.default_langcode = 0 WHERE d.tid > :id GROUP BY d.tid HAVING id IS NOT NULL AND default_langcode = 0 ORDER BY id LIMIT 1000 OFFSET 0; Array ( [:id] => 0 ) in _taxonomy_post_update_make_taxonomy_term_revisionable__fix_default_langcode() (line 316 of /var/www/html/core/modules/taxonomy/taxonomy.post_update.php)."
plach’s picture

Yep, I'll look into them

plach’s picture

Would be nice to link to the term here if we could, but is that getting risky to do during an update?

That would require a term load, I was trying to keep the process as lightweight as possible.

catch’s picture

That would require a term load, I was trying to keep the process as lightweight as possible.

I think that's the right decision to just put the ID. Enough information to be able to check things, without adding risk to the update itself.

xjm’s picture

I agree with #36. We can mention the IDs and give them the instructions for fixing them in the CR.

Status: Needs review » Needs work
catch’s picture

+++ b/sites/default/default.settings.php
+++ b/sites/default/default.settings.php
@@ -76,7 +76,7 @@

@@ -76,7 +76,7 @@
  * specific needs.
  *
  * @code
- * $databases['default']['default'] = [
+ * $databases['default']['default'] = array (
  *   'database' => 'databasename',
  *   'username' => 'sqlusername',
  *   'password' => 'sqlpassword',

Looks like test failure is due to this patch cruft.

plach’s picture

Yep, I had some unwanted changes around due to branch switching and fs permissions. I'll get rid of those in the next patch.

plach’s picture

This gets rid of the cruft that caused the last failure in #38 and adds support for custom menu links.

Unassigning in case someone else wishes to work on this while I'm asleep ;)

Berdir’s picture

+++ b/core/modules/menu_link_content/menu_link_content.post_update.php
@@ -99,5 +110,101 @@ function menu_link_content_post_update_make_menu_link_content_revisionable(&$san
+  $query->addExpression("MIN(b.{$id_key})", 'id');
+  $query->addExpression("MIN(b.{$langcode_key})", 'langcode');

I tend to call that pseudo-generic code, using the variable names here is imho longer and more complicated than just hardcoding id, especially since it's hardcoded just a few lines above anyway. Don't really care that much though.

Are we sure that using an alias with the same name as the column isn't going to cause an issue on some database or so?

I wonder if it's really faster to put both things (it being 0 for the default langcode and no other row having 1) in the same query, requiring group by/having as opposed to checking that when we update them as an additional query. I'm assuming that even sites that are affected by this only have a handful of rows affected by this, but maybe that's not true.

+++ b/core/modules/menu_link_content/menu_link_content.post_update.php
@@ -6,12 +6,23 @@
  */
 function menu_link_content_post_update_make_menu_link_content_revisionable(&$sandbox) {
+  $finished = _menu_link_content_post_update_make_menu_link_content_revisionable__fix_default_langcode($sandbox);
+  if (!$finished) {
+    $sandbox['#finished'] = 0;
+    return NULL;
+  }
+

Wouldn't it be easier to make this a regular update function? That's guaranteed to be called first then, and we're only doing raw database stuff, which should make it a safe operation?

xjm’s picture

Are we sure that using an alias with the same name as the column isn't going to cause an issue on some database or so?

This is a good question. I queued a few more.

plach’s picture

Assigned: Unassigned » plach

Working on this again

catch’s picture

  1. +++ b/core/modules/menu_link_content/menu_link_content.install
    @@ -64,3 +70,89 @@ function menu_link_content_update_8601() {
     }
    +
    +/**
    + * Fix recoverable data integrity issues in the "default_langcode" field.
    + */
    +function menu_link_content_update_8701(array &$sandbox) {
    +  $storage = \Drupal::entityTypeManager()->getStorage('menu_link_content');
    +
    

    +1 to moving this to a hook_update_N() in terms of how it simplifies the code, but see below for how it complicates other things*

  2. +++ b/core/modules/menu_link_content/menu_link_content.install
    @@ -64,3 +70,89 @@ function menu_link_content_update_8601() {
    +  // having the "default_langcode" flag set to with 0, which is not supported.
    +  $query = $database->select($data_table_name, 'd');
    +  $query->leftJoin('menu_link_content', 'b', 'd.id = b.id AND d.langcode = b.langcode AND d.default_langcode = 0');
    +  $result = $query->fields('d', ['id', 'langcode'])
    +    ->condition('d.id', $last_id, '>')
    +    ->isNotNull('b.id')
    +    ->orderBy('d.id')
    +    ->range(0, $limit)
    +    ->execute();
    +
    

    The query is much, much easier to read now, good change.

*There are one or possibly two problems with moving the post update logic to hook_update_N(), which is that we are going to run into #3108658: [policy] Handling update path divergence between 11.x and 10.x.

We have three types of sites:

1. Sites updating to from <=8.7 which have neither run the hook_update_N() nor the post update, for these sites the hook_update_N() is great.

2. Sites updating from 8.8.x to 8.8.NEXT - these sites would run the hook_update_N(), and they've already run the post update. Maybe this is fine (although the query would be run agains the new schema instead of the old one)? If it's not fine, we could check key value post updates existing updates to see if the update has already run and return early.

3. Sites updating from 8.8.0 to 9.0.x will be skipping this update. Sites that install a new 9.0.0 site won't have it picked up for schema version. Now they don't need it as such, but it leaves them with an inconsistent schema version compared to sites that update to 8.8.4-ish first. If we were to increase taxonomy_update_last_removed() those 8.8.0 updaters will get an error which we also don't want. So... I think the only option would be to add the hook_update_N() to Drupal 9, emptied out, so that they get recorded without actually doing anything. Then every database is in sync.

For #3 we would just need a very simple 9.x patch to add the empty updates, I am not sure yet about #2 what the best course of action is.

plach’s picture

Myself, from a Slack thread with @catch and @Berdir:

- I think the only possible concern is the schema inconsistency mentioned in #3
- @catch happy to revert to a single post update, if that’s preferable
- on a second thought, it’s not ok if this update runs on a revisionable schema because revision tables would not be cleaned up
- so, either we make it support also revision tables or we bundle it back into the post-update IMO
- OTOH if the update run successfully before this, there’s likely nothing to clean up
- :bomb-boom:
- +1 on bundling back into the post update to avoid introducing real mess for what is likely a very minor use case
catch’s picture

So I wasn't necessarily suggesting #51 when I wrote #50, but... yes putting the genie back in the bottle seems like overall the least pain here. The query changes still look like a good simplification though.

Berdir’s picture

Only have a nitpick on the assert method, looks pretty good to me now. The query makes sense to me, and going back to the "nested post update" approach is tricky but avoids the problems that were pointed out in #50 as discussed.

  1. +++ b/core/modules/menu_link_content/tests/src/Functional/Update/MenuLinkContentUpdateTest.php
    @@ -101,6 +120,35 @@ public function testConversionToRevisionable() {
    +    $this->assertIdentical($title ?: '', $expected_title);
    

    Should use assertSame with reversed arguments.

  2. +++ b/core/modules/taxonomy/tests/src/Functional/Update/TaxonomyTermUpdatePathTest.php
    @@ -207,6 +226,34 @@ public function testConversionToRevisionable() {
    +    $this->assertIdentical($name ?: '', $expected_name);
    

    same.

jungle’s picture

jungle’s picture

FileSize
23.94 KB
1.28 KB

Sorry, wrong interdiff file extension at #55, Tests cancelled. upload again.

plach’s picture

Interdiff looks good to me, thanks!

RTBC +1

Berdir’s picture

Status: Needs review » Reviewed & tested by the community

Ok, lets get this done then.

Hopefully this really is the issue that people have been struggling with in regards to those updates and not something else, so far I think nobody with an actually affected site confirmed or disproved that. This is the only possibly issue we identified so far. I suspect that many who were active enough to post in the issue found some manual workaround by now, but there might be those who are following the issue and are waiting on this to be resolved.

I was involved in identifying what the problem seems to be and posted some patches here, but that was a completely different approach, so I think it's OK if I RTBC it. AFAIK @catch also didn't have any other remarks left that would block this.

catch’s picture

This is the only possible issue we identified so far.

It's actually one of three issues fixing the same upgrade paths.

There is #3052318: Update from 8.6 to 8.7 fails due to corrupt menu_link_content or taxonomy_term entity data, which we don't have confirmed reports of for taxonomy terms, but we did have several for menu links, and like this issue it's equally applicable to both in theory.

And #3056539: Updating an entity type from non-revisionable to revisionable fails if it has non-revisionable fields stored in dedicated tables which I think will only affect sites with additional field definitions, but accounted for at least one report on the original issue.

At least between the three, if we do get more reports, we've narrowed down the possibilities considerably so can hopefully improve the quality of bug reports we get if they keep coming in.

Apart from there being yet another cause of fatal errors in the same upgrade paths, the only other thing that concerns me is if there's an underlying bug in either core or a popular contrib module causing either of the two identified data integrity issues in the first place, but I don't see us being able to diagnose that without an actual site experiencing it and debugging. It's been extremely difficult to identify what we have so far from just error messages.

Not committing right this second, but will do a bit later, patch looks great.

  • catch committed 28c19ef on 8.9.x
    Issue #3056543 by plach, jungle, Berdir, catch, xjm, amateescu:...

  • catch committed a9f1812 on 8.8.x
    Issue #3056543 by plach, jungle, Berdir, catch, xjm, amateescu:...
catch’s picture

Version: 8.9.x-dev » 8.7.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

Committed c1d3332 and pushed to 8.9.x. Thanks!

Cherry-picked to 8.8.x

This could also be backported to 8.7.x, but the commit doesn't cherry-pick cleanly, so will need a re-roll if we want to do that.

plach’s picture

Issue summary: View changes
Issue tags: +8.9.0 release notes

Added a dedicated CR for this and a release notes snippet.

plach’s picture

Working on the backport

plach’s picture

Assigned: plach » Unassigned
Status: Patch (to be ported) » Needs review
FileSize
23.82 KB

Here it is

plach’s picture

Status: Needs review » Reviewed & tested by the community

Green, back to RTBC

  • catch committed 338bcf9 on 8.7.x
    Issue #3056543 by plach, jungle, Berdir, catch, xjm, amateescu:...
catch’s picture

Status: Reviewed & tested by the community » Fixed

Committed 338bcf9 and pushed to 8.7.x. Thanks!

xjm’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.