Problem/Motivation
Was getting errors from a translation migration (didn't keep the details) because the language was NULL. When I looked at the source database there were entries in the relevant i18n table but not a corresponding entry in the locales_target table, which is the one with the language. I recall seeing this before and at least one of the source plugins has a condition to only retrieve rows where language is not NULL. This condition should be applied to all the translation source plugins that involve the i18n tables.
The alternative is to modify all the migrations to skip_on_empty on the language field. It just adds extra processing.
The source plugins with an ->isNotNull('language')
condition are d6/ProfileFieldTranslation.php and d6/FieldLabelDescriptionTranslation.php. There is actually a mix of conditions in the source plugin in an attempt to make sure there is data. Maybe they should all use the same 'safest' choice?
core/modules/field/src/Plugin/migrate/source/d6/FieldLabelDescriptionTranslation.php: ->isNotNull('language')
core/modules/field/src/Plugin/migrate/source/d6/FieldLabelDescriptionTranslation.php: ->isNotNull('translation');
core/modules/field/src/Plugin/migrate/source/d6/FieldOptionTranslation.php: ->isNotNull('translation');
core/modules/user/src/Plugin/migrate/source/d6/ProfileFieldOptionTranslation.php: ->isNotNull('translation');
core/modules/block_content/src/Plugin/migrate/source/d7/BlockCustomTranslation.php: ->isNotNull('lt.lid');
core/modules/taxonomy/src/Plugin/migrate/source/d6/TermLocalizedTranslation.php: $query->isNotNull('i18n.lid');
core/modules/menu_link_content/src/Plugin/migrate/source/d6/MenuLinkTranslation.php: $query->isNotNull('i18n.lid');
core/modules/config_translation/src/Plugin/migrate/source/d6/ProfileFieldTranslation.php: ->isNotNull('language');
Proposed resolution
Use an innerjoin instead of isNotNull to prevent rows where the language is Null which would have to be skipped in the process pipeline.
Modify the source plugin tests to have a row in an 'extra' row in the i18n_table that does NOT have a corresponding row in the locales_table.
There are the relevant source plugins.
$ grep -r locales_target core/modules/*/src/Plugin/migrate/source/ | awk '{print $1}' | sort -u
core/modules/block_content/src/Plugin/migrate/source/d7/BlockCustomTranslation.php:
core/modules/block/src/Plugin/migrate/source/d7/BlockTranslation.php:
core/modules/config_translation/src/Plugin/migrate/source/d6/ProfileFieldTranslation.php:
core/modules/content_translation/src/Plugin/migrate/source/I18nQueryTrait.php:
core/modules/field/src/Plugin/migrate/source/d6/FieldLabelDescriptionTranslation.php:
core/modules/field/src/Plugin/migrate/source/d6/FieldOptionTranslation.php:
core/modules/field/src/Plugin/migrate/source/d7/FieldLabelDescriptionTranslation.php:
core/modules/field/src/Plugin/migrate/source/d7/FieldOptionTranslation.php:
core/modules/menu_link_content/src/Plugin/migrate/source/d6/MenuLinkTranslation.php:
core/modules/taxonomy/src/Plugin/migrate/source/d6/TermLocalizedTranslation.php:
core/modules/taxonomy/src/Plugin/migrate/source/d6/VocabularyTranslation.php:
core/modules/taxonomy/src/Plugin/migrate/source/d7/TermLocalizedTranslation.php:
core/modules/user/src/Plugin/migrate/source/d6/ProfileFieldOptionTranslation.php:
Remaining tasks
Review
Commit
User interface changes
N/A
API changes
N/A
Data model changes
N/A
Release notes snippet
N/A
Comment | File | Size | Author |
---|---|---|---|
#23 | 3030937-23.patch | 19.98 KB | quietone |
#21 | 3030937-21.patch | 21.53 KB | quietone |
#20 | 3030937-30.patch | 13.13 KB | quietone |
#5 | 3030937-5.patch | 11.8 KB | quietone |
#5 | interdiff-3-5.txt | 4.53 KB | quietone |
Comments
Comment #2
quietone CreditAttribution: quietone as a volunteer commentedComment #3
quietone CreditAttribution: quietone as a volunteer commentedSomething like this. I'm heading out and haven't had time to run all the tests but wanted to get something posted here.
Comment #4
quietone CreditAttribution: quietone as a volunteer commentedComment #5
quietone CreditAttribution: quietone as a volunteer commentedThis adds an entry to the locales_target table in the source plugin tests where the language is NULL.
Comment #6
heddnfrom Gabor in slack:
Comment #8
quietone CreditAttribution: quietone as a volunteer commentedI think we are talking about two different things. I'm specifically addressing what is stored in the various i18n tables and locales_target and having a query constructed that returns NULL for the language. In the i18n tables in the fixture and locales_target language is a key and should not be NULL. The one exception is Drupal 6 i18n_block where the language is not a key but it is still set to be not NULL. That says to me that a language column will not be NULL in any of the tables and if a query returns NULL then it is corrupted data. If that happens the row has no language and can't save the translation (assuming one is found). If that is true, then that case can be avoided in the query.
Of course, this can be handled in the pipeline with a skip row, which has the advantage of recording the skip.
Now, having said that it seems that this isn't a problem in that no one is reporting problems.
Comment #9
Gábor HojtsyHm, is a "no language specified" block/term, etc. in Drupal 6/7 storing null in the DB or are they storing empty string or something else? If this patch will result in data skipped for items that had no language specified (which is still valid data) then that is a problem. If null is not stored for items where no language was specified then this should be fine.
Comment #10
mikelutzI assumes it stored 'und', as we didn't separate out undefined and unspecified back then, but I'm not certain. I'm moving to NW until there is an answer.
Comment #11
quietone CreditAttribution: quietone as a volunteer commentedTrying to answer #9 and #10 here. I looked at the case of taxonomy terms in D6. When the language is not specified the language field is an empty string. That value is not changed in the source plugin or in d6_taxonomy_term_translation. But, in this particular case the migration does not use the i18n tables anyway.
Then took a look at the d7 custom block migration. That does use the i18n tables but the source query is already limited by isNotNull('locales_target.lid').
And just for myself I went back and ran this query on the d6 and d7 test fixtures.
select i18n.lid, lt.language from i18n_string i18n left join locales_target lt on lt.lid=i18n.lid where isNULL(lt.language)
For D6 zero rows are returned but for D7 there are 109. So, there are quite a few rows in the D7 i18n table that do not, in fact, have corresponding translation data. And that is what started this whole issue for me. The field/term/block or whatever that has an entry in the i18n string table has already been migrated by the non translation version of that migration. The entry in the 18n string tables seems to be a placeholder (though I haven't read the i18n source code).
Comment #12
quietone CreditAttribution: quietone commentedWhile I think this is safe to do I don't want to spend more time on it. If you think this is too risky, set to Won't Fix.
Comment #13
Gábor HojtsyI don't know what could we do with data in the locales_target table that has null as langcode. I don't know how to interpret the language of that. It would be good to know if i18n module is using this for some kind of special data tracking in some case of term/block/etc translation. That the D7 fixture has strange data could also just mean that it is how it got built together from hundreds of commits throughout the years. That does not mean an actual Drupal 7 site would have those?
Comment #14
quietone CreditAttribution: quietone commentedIn the d7 fixture, the data is the locales_target always has a language. It just because the query uses a left join that 109 rows with no language are return. I guess that could just change to an inner join.
Comment #15
Gábor HojtsyOk so we are talking about this producing null values from locales_target for missing translations in locales_target basically. If we only care about the translations in all of the "problematic" migrations, then doing an inner join should be fine as if there was no translation then no translations should be saved in Drupal 8 either.
If we do other things with the data, then an inner join would loose us some original items so then a left join is better. But if we only care for translations, then we can inner join them.
Comment #16
quietone CreditAttribution: quietone at Acro Commerce commentedYes, that is the question. 'do we only care about translations' for these i18n migrations. I think the answer is yes.
On principle, migration source plugins get as much data as possible and leave it up to the process pipeline to decide what to do thus allowing custom migrations maximum flexibility. In this case a custom migration would be able to identify every item that is translatable but does not have a translation. Using a migration to get that data seems odd to me. And as we see the D7 test fixture has 109 cases like that, a real site will have many more. How many is unknown but having many skipped rows may cause folks to spend time investigating the cause. Time they could spend elsewhere. Yes, documentation will help. And we have to remember that some of the i18n migration source plugins already do this and some don't. Whatever is decided we should make them consistent which means either change from left join to inner join or keep the left join and add a skip row if language is empty.
Comment #17
heddnI prefer inner join. Let's get the data needed. skip rows are messy.
Comment #18
Gábor HojtsyAgree with inner join.
Comment #19
quietone CreditAttribution: quietone at Acro Commerce commentedIt will be good to review all these queries.
However, I realized that some of these queries start with $query = parent::query() which means I don't know how to change it to use an inner join between the i18n_string and locales_target table. In those cases the notNull('language') is simpler.
Comment #20
quietone CreditAttribution: quietone at Acro Commerce commentedOK, made a few tweaks.
Comment #21
quietone CreditAttribution: quietone at Acro Commerce commentedWorking on this found that #2979964: Migrate translations for D7 i18n taxonomy 'localized' terms doesn't have a test of the source plugin. Made a follow up for that oversight, Follow up added #3073442: Add test of d7 term localized source plugin.
Now here is a patch for all the source plugins that use the locales_target table. It turns out that in locales_target that the language cannot be NULL so many of the source plugin tests are changed to remove rows where the language was set to NULL. Instead a row in i18n_string is added that does not have a corresponding row in locales_target. This was all a bit tedious. Let's see if the changes to the source plugins cause any failures.
Comment #23
quietone CreditAttribution: quietone at Acro Commerce commentedDarn bad patch. Had a version of d7/TermLocalizedTranslationTest.php that is being fixed in #3073442: Add test of d7 term localized source plugin. So, just removing that error.
Comment #24
quietone CreditAttribution: quietone at Acro Commerce commentedComment #25
Gábor HojtsyChanges look good. Thanks for adding the additional test coverage with no translations :)
Comment #26
Gábor HojtsyComment #28
Gábor HojtsyThanks all, committed this given no concerns raised for 2 weeks :)