Problem/Motivation
First, a quick tip of the hat to a certain legendary core developer, who shall here remain nameless, for discovering this problem.
Migrate includes an EntityRevision destination, which can migrate data into specific revisions of entities. Because EntityRevision inherits from the EntityContentBase destination, it also supports migrating data into specific translations of specific revisions of entities.
However, there is a critical hole in that functionality: the migrated translation data is keyed only by revision ID, and not by langcode. This can potentially result in data loss -- one language's data being overwritten with data intended for another language. Uh-oh!
It also means that it is impossible to look up the source identifier by destination revision ID and language, since the langcode was not part of the destination identifier.
Proposed resolution
EntityRevision needs to allow migrated data to be keyed by langcode in addition to revision ID.
Remaining tasks
Write a patch- Review it
- Improve it
- Commit it
- Breathe a big fat sigh of relief
User interface changes
None.
API changes
Nothing user- or developer-facing.
Data model changes
Nothing user- or developer-facing.
Comment | File | Size | Author |
---|---|---|---|
#86 | 2921661-86-8.6.x.patch | 6.89 KB | quietone |
#76 | 2921661-76.patch | 15.3 KB | heddn |
Comments
Comment #2
phenaproximaBy committer agreement, Migrate criticals are now just plain critical. Sorry!
Comment #3
phenaproximaBehold, a patch with a test, and a test-only patch to prove the problem. Boo-yah!
Comment #4
phenaproximaComment #5
heddnI like all these asserts. Can we have a few more to assert we can load and have all the history of revisions for all the title changes? That will help us know if we overwrote anything.
Default 1
French 1
French 2
Titre nouveau, tabarnak!
Comment #6
phenaproximaI should clarify: data will not be overwritten in a normal, direct migration (like the kind in the test). It works more or less by accident, but it does work.
Here's what I believe will break it: trying to migrate some data into a previously-migrated revision translation. I believe that will load the right revision, but the wrong translation, and the revision's default language will end up getting data that was intended for one of its translations.
I'll add another test to confirm that case.
Comment #7
phenaproximaSo I know that this can lead to potential data loss, and therefore I said it was critical. However, until I can prove the nature of the data loss (i.e., with a fail patch), I don't think I want to add more stuff to our collective plate. So I'm downgrading this for now. Once I have a fail patch that can prove the data loss, we can return this to critical.
Comment #8
xjmUnfortunately data loss is pretty much always critical regardless of what subsystem it's in or which plates it's on, so reupgrading. This is one of the most clear-cut points in https://www.drupal.org/core/issue-priority#critical-bug. We wouldn't downgrade this unless there was some mitigation that made this circumstance impossible under normal migration operation.
We also prefer to leave issues at critical until we've verified that they've not so that they don't get lost. The committers will probably discuss this more the next time we do a triage.
Retitling to describe the impact of the bug; let me know if that title's accurate. Thanks! Also thanks for the clear issue summary.
Comment #9
heddnI've added a test for what I asked for in #5. While we aren't perhaps able to reproduce what originally caused the issue for @phenaproxima in the OP, we can confirm that there is a bug with the tests.
Comment #10
heddnArg, here's the patches.
Comment #11
heddnHere's some minor improvements(?) based on a self-review of the patch. Mainly some naming changes.
Comment #12
phenaproximaHonestly...having had time to consider it, I think this patch looks good. It is thoroughly testing that revision translations are properly keyed by revision ID and language, which means there can be absolutely no ambiguity when doing data lookups from previous migrations (the ambiguity would be the source of the data loss, since the translation to load would not be specified, thus potentially overwriting data in the default translation).
I can't RTBC since I had a heavy hand in this one, but it has my blessing. I vote RTBC.
Comment #13
quietone CreditAttribution: quietone at Acro Commerce commentedLooks really nice, just a one question.
Why the change? If this is restored to the original the new test still passes. The doc for updateEntity says it will return NULL if the entity is the same as input, but I don't see that. What have I missed here?
Needs a short description in the doc block.
Like the extra tests. Makes it extra clear what is being tested.
Comment #14
maxocub CreditAttribution: maxocub as a volunteer commentedI think this looks good, except for what @quietone mentioned above. And I like the extra test too, it clearly show that the translation was updated, and not the default language.
Comment #15
phenaproximaLooking at the code, the doc comment is wrong. updateEntity() never returns NULL, so we might as well change that now.
Comment #16
heddnSo, this fixes some things. Mostly comments. Bumping back to NW though to provide an upgrade path. We need to loop through all migrations that use EntityRevision as the destination. And alter the sql table schema so it includes the langcode column. For this update, we should load the revision, get its default langcode, and set the langcode that.
Comment #17
heddnOK, here's the patches from 16 and the new ones for 17.
Comment #18
heddnSo, an interesting thought to consider as I was reviewing the code in the last patch this morning. Core doesn't have any migrations that fill the criteria of needing the update hook in this patch. Why? Because the fix associated with this issue is only for revisions that also support translations. There are none of those in core right now. Which makes it really hard to test the update hook.
Based on that information, I'm going to re-categorize this as a feature request. We cannot have a bug for something that we don't currently support. I'd also argue this isn't critical either. But this has already been bumped back and forth from a critical once, so I'll leave that to someone else to decide.
The following is the line of code in the update hook that made me come to this conclusion. The key is isTranslationDestination()
if ($destination_plugin instanceof EntityRevision && $migration->getIdMap() instanceof Sql && $migration instanceof EntityContentBase && $migration->isTranslationDestination()) {
We only need this functionality if we are dealing with an EntityRevision destination that is a translation destination.
Comment #19
heddnSee #2746541: Migrate D6 and D7 node revision translations to D8 for the feature where we are still working on building out this feature. Based on the fact it is only major, I'm going to drop priority on this issue.
And I'm also going to post a new patch here after my next meeting that removes the update hook. No need for it.Strike that. I'm going to tag as novice and have someone else re-roll the patch without those changes. Just remove all the hook_update code and re-post a patch.
Comment #20
jofitz CreditAttribution: jofitz at ComputerMinds commentedComment #21
badmetevils CreditAttribution: badmetevils at OpenSense Labs commentedDId a Reroll of last uploaded patch
Comment #23
heddnLooks like a legit error. Not sure why, since the patch was working earlier.
Comment #24
jofitz CreditAttribution: jofitz at ComputerMinds commentedGot the test version of EntityRevision::updateEntity() to return a sensible value.
Comment #25
heddnComment #26
quietone CreditAttribution: quietone at Acro Commerce commentedBrief review, just looking at comments. Leaving at NR to allow others to feedback.
What? Admittedly, not familiar with this hook but the summary line isn't helping either.
Do we know if this is a translated entity here? Or updateEntity called to make sure translations are updated, if it has translations?
Comment #27
heddn26.2: fixed comment.
26.1: That function in Sql is protected, so we don't have access to it. I've added some comments to link to it better.
Comment #29
quietone CreditAttribution: quietone at Acro Commerce commentedAssigning to myself for review
Comment #30
quietone CreditAttribution: quietone at Acro Commerce commentedI just have a question about the comment in #18
Not sure if this means there is more work to do here or in a followup.
Comment #31
quietone CreditAttribution: quietone at Acro Commerce commentedComment #32
heddnRe #30: there is no more work. Either here or in a follow-up. No update is needed. Because there are no core migrations to update.
Comment #33
quietone CreditAttribution: quietone at Acro Commerce commentedAssigning to myself for review
Comment #34
quietone CreditAttribution: quietone at Acro Commerce commentedReread the issue, reviewed the code (just one small item) and ran the tests locally. It all looks great but I can't RTBC because I don't know much about .install and how the new function _migrate_get_field_schema fits into the picture.
One nit
s/MigrateSourceInterface/MigrateSourceInterface::getIds()/
Comment #35
quietone CreditAttribution: quietone at Acro Commerce commentedUnassigning myself
Comment #36
maxocub CreditAttribution: maxocub as a volunteer commentedThe
_migrate_get_field_schema()
function should be removed. It was used in thehook_update()
that was removed in #21, so it is not used anywhere anymore.Other than that, I like the patch. Having tried to work on #2746541: Migrate D6 and D7 node revision translations to D8 in the past, I can see now why it didn't work.
Comment #37
quietone CreditAttribution: quietone at Acro Commerce commented@maxocub, thx! Now it makes sense, I just missed looking at the removed hook_update code.
Comment #38
heddnWell, that's an easy fix. Less code now.
Comment #39
quietone CreditAttribution: quietone at Acro Commerce commentedOne nit and good to go.
Not used, can be deleted.
Comment #40
heddnComment #41
quietone CreditAttribution: quietone at Acro Commerce commentedThanks heddn.
I think this is now good to go. I know I've gone over it many times, phenaproxima did a self review and supports RTBC (#12). Although that was a while ago the changes since then added a hook update and that code that was then removed. My misunderstanding about the function in .install was cleared up my maxocub in #36. And we have good tests as well.
Comment #42
alexpottWhy not just pass in the error message and create and throw the exception in addDestinationId(). All the exceptions passed in are the same class. And if we need support different exception types then lets pass in the class name rather than creating an object.
Also should this be back to critical bug?
Comment #43
heddnI feel (not very strongly) this should stay as a major and a feature. This is a very edge case and is currently Data loss is the reason for critical. This isn't an upgrade situation. This is a pure migrate issue at this point as is only an API enhancement, not a bug.
That's why we even removed the update hooks in early versions of this issue. We just don't currently support what we are doing here. See #18
I've also re-titled things. I'm not sure if we need a CR here because again, this is something we didn't previously even support. It is a new capability.
The reason this is a D8 only issue is because: Only in D8 did we start supporting multiple languages for the same entity id. In Drupal 7 and before, each language was its own node id or term id with a related language code.
New patch attached to address the feedback in #42.
Comment #44
maxocub CreditAttribution: maxocub as a volunteer commentedFeedback addressed, back to RTBC.
I also agree that this should stay Major since there's no support for migrating multilingual revisions yet.
Comment #45
alexpottRe featurey-ness and majory-ness doesn't the fact we now have current Drupal sources impact this issue and make it a critical bug?
Doing some more thinking about this. Given this is a base class that people extend and migrate is now stable I'm not 100% about adding protected methods to it without really good reason. We don't have to do this. We can just add the langcode key in in EntityRevision::getIds() the same way the revision ID is added in HEAD. I think we should be more cautious about breaking things now we are stable.
Comment #46
heddnLet's remove the addDestinationId protected method and add that logic inline.
Comment #47
maxocub CreditAttribution: maxocub as a volunteer commentedAddressing #45 & #46.
Comment #48
quietone CreditAttribution: quietone at Acro Commerce commentedIssues raised in #42 and #45 have been addressed, this can return to RTBC.
Should there be a follow up to add addDestinationId() later?
Comment #49
catchWe should have the entity type available here, so could use that in the exception message?
Comment #50
maxocub CreditAttribution: maxocub as a volunteer commentedI added the entity type to the exception messages and I added a test for the EntityRevision exception messages because there were none.
Comment #51
heddnFeedback from #49 is addressed. Back to rtbc.
Comment #53
heddnUnrelated.
Comment #54
alexpottI think given that in non-translatable circumstances the entity passed in is also updated we should continue to do that and rely on the side effects that we already rely on. To do this all we have to do is pass the object in by reference.
Comment #55
maxocub CreditAttribution: maxocub as a volunteer commentedWon't that break BC? In core we have the Book module that extends
EntityContentBase
and overridesupdateEntity()
. I suppose there's might be some contrib modules that also do that.Comment #56
alexpott@maxocub looking at the book override...
It's already relying on the side effect. I think we should update them to use the pass by reference (atm we're passing by pointer) and actually isn't the book one quite broken for multi-lingual revisioned books?
Comment #57
heddnI'm not aware of contrib overriding. The only contrib project I thought might was ERR, but they just call it. Since the method is not on an interface and is protected, I think that means its internal implementation details. But still, I don't think anyone is overriding it anyway.
From my limited research though, this will only result in a Warning, not an error. So, let's try it and see what happens.
Comment #59
heddnInterdiff in #57 is correct. Patch was wrong. Here's what should have been uploaded.
Comment #60
maxocub CreditAttribution: maxocub as a volunteer commentedWe can remove " ?: $entity" since
updateEntity()
will always return the entity.Comment #61
heddnComment #62
maxocub CreditAttribution: maxocub as a volunteer commentedAll feedback are addressed, back to RTBC.
Comment #63
alexpottI've done a bit more research and discovered that we also have \Drupal\migrate\Plugin\migrate\destination\EntityContentBase::updateEntity() which also has quite a few overrides. The config version always works via the side effect. I think we should only work that way in Drupal 9. I've added a deprecation and CR to cover this change - https://www.drupal.org/node/2960492.
Comment #65
heddnCR here.
Also, we need test coverage of the BC layer.
Comment #66
alexpottFixing the patch.
Comment #67
heddnThis will not catch cases were things like ERR overrides getEntity. Its pretty common for contrib to override getEntity. However, hopefully this communicate to enough folks. And we aren't breaking BC, so this is OK.
This will break BC if someone overrides getEntity. They will not get the object returned.
Comment #68
heddnBack to NW for 67.2
Comment #69
maxocub CreditAttribution: maxocub as a volunteer commentedI'm not sure what we should do here. To return or not to return $entity?
I also want to remind that this is blocking #2746541: Migrate D6 and D7 node revision translations to D8 which is almost ready to migrate revisions of node translations.
Comment #70
maxocub CreditAttribution: maxocub as a volunteer commentedLooking at this again, I see that the changes we made to the return doc of updateEntity() says that it is deprecated and will be removed in Drupal 9. So we should not break BC and remove it, not before Drupal 9.
Comment #72
heddnI'm not sure what to do here. I see what we are trying to do. We're are attempting to warn folks to not return values from
updateEntity
. But that call is for protected code. And most people do not override updateEntity. They override getEntity and call updateEntity. Most people, meaning not very many people and the only case I've seen where folks do any overrides is ERR and they override getEntity. But not updateEntity. So I'm of the opinion we break BC and don't return the value. And then update the docs on updateEntity. Most folks are already assuming its value is changed internal to updateEntity, so we'd probably be OK. And again, this is internal API stuff.Comment #73
heddnHere's what I've suggested.
Comment #74
maxocub CreditAttribution: maxocub as a volunteer commentedI agree with #73, but can't RTBC.
Comment #75
mikeryanJust to walk through my thought process here (and clarify for anyone else coming in and wondering about this change), I'm not thrilled with passing objects by reference - in general, it's a surprise to have the actual object reference rather than just the object contents altered. But, in this case to handle translations we do need to potentially swap the parent entity object for the appropriate translation. We had been doing this by returning the (potentially new) entity and assigning it in the caller, but changing the argument to by reference makes this explicit. So, OK.
Out of scope, but - how does this work without calling parent::updateEntity()?
The original comment here seems vague and incomplete (I admit, there's probably at least a 50% chance it's mine:P). And the additional comment doesn't really help - although I understand it means to point out that the entity object itself (rather than just its contents) may be altered, it actually just ends up rephrasing the original comment from "updates an entity" to "the entity is updated".
This seems like a good place to point out that the abstract Entity destination class getEntity() calls updateEntity(), but the expected function signature is not documented in Entity. I think we should have
abstract protected function updateEntity()
declared in Entity (with a better comment) and EntityConfigBase and EntityContentBase can @inheritdoc.This seems the place to point out the argument is by reference.
Similar (out-of-scope) digression as for Book above - shouldn't this call the parent?
Comment #76
heddnSo, let's back-up to the patch in #50. Before we did all the pass by reference stuff.
Here's a verbatim re-roll of that patch. No interdiff, because this is a re-roll.
Comment #77
phenaproximaI tried to find things to complain about in this patch, but I couldn't. I'd say it's ready to go.
Comment #79
heddnUnrelated failure.
Comment #81
phenaproximaAnother unrelated test failure.
Comment #84
catchCommitted/pushed to 8.6.x and cherry-picked to 8.5.x, thanks!
Comment #85
mondrakeI think this is causing trouble for testing in non DrupalCI environments when running test directly via
phpunit
.I started getting this which I was not getting before:
See for example https://travis-ci.org/mondrake/drudbal/jobs/399699209
Comment #86
quietone CreditAttribution: quietone as a volunteer commented@mondrake, thanks for reporting this.
This patch should fix the problem for 8.6.x. I haven't looked at 8.5 and I don't have time now.
Comment #88
neclimdulConfirmed, Bisected to this issue as well.
Comment #89
mondrakeConfirming patch in #86 fixed the problem for me.
Thanks
EDIT: on 8.6.x.
Comment #90
webflo CreditAttribution: webflo at UEBERBIT GmbH commenteddrupal-project started to fail too. https://travis-ci.org/drupal-composer/drupal-project/builds/399953427
Comment #91
quietone CreditAttribution: quietone as a volunteer commentedOK. good, this works for 8.5.x as well, considering the files it is touching, that makes sense.
Comment #92
alexpottCommitted and pushed 369c0ffd6a to 8.6.x and dfbe8684c1 to 8.5.x. Thanks!
Thanks for the quick follow-up.
Comment #95
Dave ReidIs there a plan to get an updated core release since the latest 8.5.5 tag is unable to run tests due to the fatal error?
Comment #96
Gábor HojtsyPublished https://drupal.org/node/2960492
Comment #97
Chris Gillis CreditAttribution: Chris Gillis commentedIs there any way to rush through an 8.5.6 release? Killing all the unit tests also kills automated build processes, and seems like a pretty CRITICAL bug. I've added a hard limit to my system to prevent upgrading Drupal for now, but would ideally like to remove that.
Comment #98
mnshantz CreditAttribution: mnshantz commentedConfirming patch in #86 fixed the problem for me.
Comment #99
mnshantz CreditAttribution: mnshantz commentedConfirming patch in #86 fixed the problem for me on 8.5.x.
Comment #100
vermario CreditAttribution: vermario commentedJust wanted to report that the patch is still necessary on 8.5.6.
Comment #102
oknateThis patch is included in Drupal 8.5.7.