Problem/Motivation

The node table gets the wrong langcode because of an incorrect query in the d7_node_complete source plugin.

Steps to reproduce

  1. A Drupal 7 site with:
    • entity translation enabled
    • languages enabled: en, de, it
    • at least one existing node with English as the original language and translations in Italian and German
  2. A Drupal 9 site with translation enabled.
  3. Enable the modules (migrate_plus, migrate, migrate_tools, migrate_drupal, migrate_drupal_ui, drupal_upgrade) on the Drupal 9 site.
  4. Make sure a valid Drupal 7 database connection is configured in settings.php.
  5. Run migrations.

The node_field_data.langcode and node_field_data.content_translation_source fields have the correct values.
The node.langcode field is set to "de" instead of "en".

Proposed resolution

Change the orderBy to use etr.source instead of etr.language, so that the first row processed, and therefore the first content created, is the one in the original language.
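A minimal sketch of why the proposed change helps, modelling the source query's ORDER BY as Python sort keys rather than Drupal code. Assumptions: each row is a dict whose hypothetical keys `vid`, `language` and `source` stand in for nr.vid, etr.language and etr.source; the original-language row has an empty source (as described later in this issue); and the first row for a node is the one that determines node.langcode.

```python
# One D7 revision (vid=1) of a node created in English and translated into
# German and Italian, as in the reproduction steps. Keys are hypothetical
# stand-ins for nr.vid, etr.language and etr.source.
rows = [
    {"vid": 1, "language": "de", "source": "en"},
    {"vid": 1, "language": "en", "source": ""},   # original: empty source
    {"vid": 1, "language": "it", "source": "en"},
]

def current_order(rows):
    """Models the current ORDER BY nr.vid, etr.language."""
    return sorted(rows, key=lambda r: (r["vid"], r["language"]))

def proposed_order(rows):
    """Models the proposed ORDER BY nr.vid, etr.source (empty sorts first)."""
    return sorted(rows, key=lambda r: (r["vid"], r["source"]))

# The first row processed is the one that creates the node.
print(current_order(rows)[0]["language"])   # de  (wrong base language)
print(proposed_order(rows)[0]["language"])  # en  (original language)
```

Note that the two translations share the same source ("en"), so this key alone does not fix their relative order; that point comes up again in the discussion below.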

Remaining tasks

The proposed patch is attached in the next comment.

Issue fork drupal-3299938

Comments

robertom created an issue.

robertom

Status: Active » Needs review
Issue tags: +migrate-d7-d9, +migrate-d7-d8
File size: 579 bytes

Attached is the proposed patch.

Status: Needs review » Needs work

The last submitted patch, 2: 3299938-2.patch, failed testing.

robertom

I have cases where the creation dates do not reflect which translation is actually in the original language (I think because the language was changed after some content had been created).

Sorting by the source field also solves these cases, because the original-language row always has an empty source field and is therefore sorted first.
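To illustrate the case described above, here is a small model (not Drupal code) in which the timestamps disagree with the real original language. The rows, the hypothetical `created` timestamps and the key names are all invented for illustration; the only assumption carried over from the comment is that the original row has an empty source.

```python
# Hypothetical rows where the timestamps do not match reality: the German
# translation claims to be older than the English original.
rows = [
    {"vid": 1, "language": "en", "source": "",   "created": 1500000100},
    {"vid": 1, "language": "de", "source": "en", "created": 1500000000},
    {"vid": 1, "language": "it", "source": "en", "created": 1500000200},
]

# Sorting by creation time trusts the timestamps ...
by_created = sorted(rows, key=lambda r: (r["vid"], r["created"]))
# ... while sorting by source only relies on the original having no source.
by_source = sorted(rows, key=lambda r: (r["vid"], r["source"]))

print(by_created[0]["language"])  # de
print(by_source[0]["language"])   # en
```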

Attached is the new patch.

robertom

Status: Needs work » Needs review
robertom

Issue summary updated.
mikelutz

Status: Needs review » Needs work
Issue tags: -migrate-d7-d9, -migrate-d7-d8, +Needs tests

I've gone back and forth on this. One big issue here is that we only have one test fixture for migration tests in core. While we can manipulate it slightly for specific tests, it generally packs in as many examples of migration complexities as it can, so that we can test the most complicated cases. The drawback of this approach is that less complicated source databases sometimes produce different results that we don't test or code for, and I think that might be the issue here.

Here's a ramble of all the thoughts that went through my head over the last 24 hours while thinking about this. I rebuilt my D7 install from the test fixture to dig into this and make sure I wasn't missing anything.

One of the things that threw me here was your comment about the creation date not matching the original language. I'm not sure how that happened, but to my knowledge, and as far as I could tell from trying it on my D7 install, you can't change the original language of a node via the UI. If this was done via a script or some custom workaround, it's not something we can compensate for in the core migrations, and I nearly closed this as won't fix, thinking that was your underlying issue.

It bugged me a bit that we sort by langcode here. In the test fixture, the original language of the nodes we test is English and the translations are French and Icelandic, which means we always get the original language first for the revisions, and I wondered if that was an issue.

But as near as I could tell, that shouldn't matter. The language in the node table never changes; it's always the first language created for a node, and in our fixture and tests that will always be right, because the first revision of any node in this system has only one language: the original. For a full revisions+translations migration, which is what node complete is designed and tested for, the subsort of the individual languages within each revision doesn't really matter. In fact, as far as I can tell, the `$query->orderBy('etr.revision_id');` is completely pointless: we already sort by nr.vid and join nr.vid to etr.revision_id, so it just redundantly repeats the vid sort above.

Sorting by language here is done only for testing purposes: we need a deterministic source order for tests because of Postgres. MySQL in practice usually returns tied rows in the order they were added to the database, but Postgres does not, and testing is difficult without a deterministic sort order. (Incidentally, this is why we can't sort by source alone: it leaves every language except the source language in an indeterminate order.)
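The determinism point can be illustrated with a small hypothetical helper (not part of core) that reports which rows are tied under a given sort key. SQL does not define the relative order of tied rows, so any group it reports could come back in a different order between runs. The rows mirror the fixture's language mix described above (English original, French and Icelandic translations); the key names are hypothetical stand-ins for the etr columns.

```python
from itertools import groupby

# One revision shaped like the core fixture described above. Keys are
# hypothetical stand-ins for nr.vid, etr.language and etr.source.
rows = [
    {"vid": 1, "language": "en", "source": ""},
    {"vid": 1, "language": "fr", "source": "en"},
    {"vid": 1, "language": "is", "source": "en"},
]

def by_source(r):
    return (r["vid"], r["source"])

def by_source_then_language(r):
    return (r["vid"], r["source"], r["language"])

def ambiguous_groups(rows, key):
    """Return groups of two or more rows that compare equal under `key`.

    A database is free to return such rows in any order, so a sort key
    that leaves groups like these is not deterministic.
    """
    ordered = sorted(rows, key=key)
    groups = [list(g) for _, g in groupby(ordered, key=key)]
    return [g for g in groups if len(g) > 1]

tied = ambiguous_groups(rows, by_source)
print(sorted(r["language"] for r in tied[0]))          # ['fr', 'is']
print(ambiguous_groups(rows, by_source_then_language))  # []
```

Python's own `sorted` is stable, so it would hide the problem; the helper makes the ties explicit instead of relying on insertion order.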

Anyway, all that got me wondering what would happen if revisions were not being saved on D7. I have not tried to set this up or test it yet, but I wonder if THAT is the problematic case. If we don't have the full node history, the first revision we process may already contain several languages, and whichever of them is created first becomes the base language. If another langcode precedes the correct base langcode alphabetically, it would be created first here, and I could see that causing this problem.

But I wasn't sure how to test this, given the constraints of our fixture. Then I realized I could roughly simulate it by deleting the oldest revisions of one of the nodes we test. THAT got me into a situation where the first processed revision of a node had multiple translations instead of just the original one. The sort by language happened to put the English translation first in this test case, but I can see where that would not always be the case.

So, here's the test case we need to add to the fixture: create a node with French as the base language, translate it into English (and maybe Icelandic), and then delete the original French revision. I THINK that will give us a test run where we can prove the bug, and then we can figure out the best way to fix it. I'm sorry to say that for your specific case, that is probably going to be subsorting the etr table by created date for the deterministic sort, and doing things in the correct order for a valid database. That said, I'm open to arguments that sorting by source is correct, provided we then sort by language or creation date so the final order is fully deterministic.
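The proposed fixture case can be modelled in a few lines to compare the candidate sort orders. This is a sketch under stated assumptions, not the core test: the rows, the hypothetical `created` timestamps and the key names are invented, and it assumes that once the French-only revision (vid=1) is deleted, the oldest surviving revision (vid=2) holds all three languages, with the French row keeping an empty source.

```python
# Hypothetical data for the proposed fixture case: node created in French,
# translated into English and Icelandic, original French-only revision deleted.
rows = [
    {"vid": 2, "language": "en", "source": "fr", "created": 1600000100},
    {"vid": 2, "language": "fr", "source": "",   "created": 1600000000},
    {"vid": 2, "language": "is", "source": "fr", "created": 1600000200},
]

def first_language(rows, key):
    """Language of the first row, i.e. the one that would create the node."""
    return sorted(rows, key=key)[0]["language"]

def by_language(r):  # current behaviour: vid, then language
    return (r["vid"], r["language"])

def by_source(r):    # source first, language as the final tie-breaker
    return (r["vid"], r["source"], r["language"])

def by_created(r):   # created date first, language as the final tie-breaker
    return (r["vid"], r["created"], r["language"])

print(first_language(rows, by_language))  # en  (base language lost)
print(first_language(rows, by_source))    # fr
print(first_language(rows, by_created))   # fr
```

Both candidate keys end with language, so either one gives a fully deterministic order even when the leading column ties.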

I'd like to bring this up at the next migration meeting and get @quietone's thoughts. We could conceivably say that D7 sites with revisions turned off are not a case node complete was designed for, and that you should use the classic node and node translation migrations instead. However, I don't believe we've given much consideration to what node complete does if some revisions have been deleted, and that is definitely something we should test for so that we know what happens.

quietone

@mikelutz, thanks for the analysis. I have added this to my list for this week.

TheodorosPloumis made their first commit to this issue’s fork.

TheodorosPloumis

We have the same problem on a migration from a multilingual Drupal 7.x site (using the entity_translation module), where the sort order of the revisions causes new nodes on the Drupal 10.x site to get the wrong "Original language" (source).

The sort order was as follows (for an "en/de/nl/fr" language setup):

vid=1, language=de
vid=1, language=en
vid=1, language=fr
vid=1, language=nl

The solution was to add an extra sort on the "created" column, because the source translation revision is always created first. After the patch, the sort order was:

vid=1, language=en
vid=1, language=de
vid=1, language=fr
vid=1, language=nl
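The before/after orders listed above can be reproduced with a small Python model of the query's ORDER BY. The `created` timestamps are hypothetical; the only assumption that matters is the one stated in the comment (the source translation is created first). The model also assumes the created sort is applied before the language sort, which the reported result implies.

```python
# Rows for one revision (vid=1) in an en/de/nl/fr setup, English original.
# The "created" timestamps are hypothetical.
rows = [
    {"vid": 1, "language": "de", "created": 1700000200},
    {"vid": 1, "language": "en", "created": 1700000000},
    {"vid": 1, "language": "fr", "created": 1700000300},
    {"vid": 1, "language": "nl", "created": 1700000400},
]

before = sorted(rows, key=lambda r: (r["vid"], r["language"]))
# Extra sort on "created", keeping language as a final tie-breaker so rows
# with identical timestamps still come back in a deterministic order.
after = sorted(rows, key=lambda r: (r["vid"], r["created"], r["language"]))

print([r["language"] for r in before])  # ['de', 'en', 'fr', 'nl']
print([r["language"] for r in after])   # ['en', 'de', 'fr', 'nl']
```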

Even with this patch, we still need to add specific tests.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.