Problem/Motivation
There is an unmet need to flag entities as "syncing" when doing migration updates, so that they can be handled differently from ordinary updates.
For example, updating an entity sets the entity's changed time to the current time. This creates an issue when doing migration updates, which update entities on the destination that have been changed on the source since the last migration.
In an initial migration, the entity's changed time is set to match the changed time in the data source. For example, if a node of the page content type has a "changed" timestamp 1651775012 in the source, the node will also have a "changed" timestamp 1651775012 in the destination, regardless of how much time has actually elapsed since then.
This behavior should be the same when the initial migration is updated (via e.g. "drush mim upgrade_d7_node_page --update"). But it's not. Rather, the "changed" timestamp on the destination entity is set to the current time, instead of the value of the "changed" timestamp on the source node. As a result, the migrated data on the destination does not faithfully mirror the source.
Among other side effects, views that include a sort on the entity's changed time may not work correctly on the destination after a migration update.
An additional example of the need for a "syncing" flag can be seen in issue #2803717: Allow 'syncing' content to be updated in content moderation without forcing the creation of a new revision, concerning updates in content moderation.
Proposed resolution
Multiple patches are involved. This one simply flags an entity as "syncing" during a migration.
The following pending patch looks for the "syncing" flag and skips updating the changed time for entities with that flag:
#2329253: Allow the ChangedItem to skip updating the entity's "changed" timestamp when synchronizing (f.e. when migrating)
This fixed issue uses a "syncing" flag to solve a similar problem in content moderation migrations:
None. This patch simply sets the "syncing" flag that can be used to alter the behavior of an entity migration. It is blocking #2329253: Allow the ChangedItem to skip updating the entity's "changed" timestamp when synchronizing (f.e. when migrating) and would provide a simpler method to implement #2803717: Allow 'syncing' content to be updated in content moderation without forcing the creation of a new revision.
Steps to reproduce
See the steps in the issue #3195139: migrate:import --update creates new revisions, if content_moderation active because the current one will fix it automatically
Remaining tasks
None
User interface changes
None
API changes
None
Data model changes
None
Release notes snippet
This patch marks entities as "syncing" during migrations, so that other code (including issues #2329253: Allow the ChangedItem to skip updating the entity's "changed" timestamp when synchronizing (f.e. when migrating) and #2803717: Allow 'syncing' content to be updated in content moderation without forcing the creation of a new revision) can alter the handing of migrating changes to entities.
Comment | File | Size | Author |
---|---|---|---|
#59 | 3052115-59.patch | 11.86 KB | HitchShock |
| |||
#58 | 3052115-nr-bot.txt | 144 bytes | needs-review-queue-bot |
#53 | 3052115-53.patch | 11.88 KB | edysmp |
#47 | interdiff_44-47.txt | 398 bytes | ranjith_kumar_k_u |
#47 | 3052115-47.patch | 12.33 KB | ranjith_kumar_k_u |
Issue fork drupal-3052115
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedUploading half the initial patch split from #2329253: Allow the ChangedItem to skip updating the entity's "changed" timestamp when synchronizing (f.e. when migrating).
Comment #4
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedComment #5
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedComment #7
RoSk0Blocked only on #2329253: Allow the ChangedItem to skip updating the entity's "changed" timestamp when synchronizing (f.e. when migrating) now. Adding blockers to related.
Comment #9
firewaller CreditAttribution: firewaller commented+1
Comment #11
herved CreditAttribution: herved commentedReroll from current 9.1.x HEAD
Edit: this works well to prevent content_moderation to create new revisions during migrations.
I didn't have time to look at those tests failures yet, and I'm not sure how to deal with some of those.
Comment #13
quietone CreditAttribution: quietone as a volunteer commentedThis really needs an issue summary update. ;-)
Comment #14
hchonovThere is no reason to postpone it just because it needs issue summary update.
Comment #15
quietone CreditAttribution: quietone as a volunteer commented@hchonov, I set this to postponed because the IS says it is blocked on #2329253: Allow the ChangedItem to skip updating the entity's "changed" timestamp when synchronizing (f.e. when migrating)
Comment #16
hchonovComment #17
donquixote CreditAttribution: donquixote as a volunteer commentedI ran into the same problem and already created an issue here, #3195139: migrate:import --update creates new revisions, if content_moderation active.
(or rather different problem, same solution)
It is now a duplicate.
Comment #18
donquixote CreditAttribution: donquixote as a volunteer commentedAbout the patch:
Would it not be simpler to call the ->setSyncing() in the ->save() method?
Do we need this for anything else? What is the effect e.g. on delete?
Also, for these temporary state changes I usually use try / finally, like so:
This makes sure the state is restored even if an exception occurs.
I could upload a new patch, but perhaps I am missing something essential.
Comment #19
muthuraman-s CreditAttribution: muthuraman-s commentedHi, I am also facing a similar issue updating the 1k volume of existing content.
But at the same time, I don't want to update the changed time. I am working
on the Drupal 8.9 Version. above patch #11 not working for me!! any suggestion?
Comment #21
codebymikey CreditAttribution: codebymikey at Zodiac Media commentedI don't think this issue is currently blocked by the other.
Should this be behind a configuration flag, is there ever a use-case where one would like to keep the current "non-sync" behaviour?
I think it'd be nice if the migration destination plugins have some kind of "pre/post-migration-save" hooks which can allow third party modules to manipulate the destination entities before they're saved without having to extend and reimplement the destination plugins with the desired behaviours.
This should improve DX and avoid having to wait for core's release cycle to get new features in, allowing contrib modules like
migrate_plus
to introduce more helpful APIs to handle issues like this.Comment #22
Wim Leers@huzooka and I are running into this same problem. He spent days pinning it down to this root cause! 🤯
Confirmation of the problem and clarifying it more 🔍
content_moderation
module. Added tag for that. Also bumped to , because it's making migration of D7 contrib data into the succeeding D8|9 module in Drupal core impossible.drush migrate:import
command.We can reproduce it even when not updating, just importing, and without using
drush
. All you need is:changed
timestampd7_node_complete
configured to sethigh_water_property
for itssource
plugin:high_water_property: { name: changed, alias: n }
.Both have the same root cause: a single migration plugin's source row is being imported multiple times.
(Using
high_water_property
is essential if you want to have remotely acceptable incremental migrations on big Drupal 7 source sites — it'd take prohibitively long otherwise.)Getting this moving forward again 🤞
So, I started looking at why this issue is actually marked #2329253: Allow the ChangedItem to skip updating the entity's "changed" timestamp when synchronizing (f.e. when migrating). AFAICT that happened because this was originally split out from #2329253, to reduce the scope there. But per @hchonov in #2329253-59: Allow the ChangedItem to skip updating the entity's "changed" timestamp when synchronizing (f.e. when migrating):
on… which confirms what I suspected: this issue can land independent of that other one!
Because then at least the migration can succeed, even though when updating already migrated entities, until #2329253: Allow the ChangedItem to skip updating the entity's "changed" timestamp when synchronizing (f.e. when migrating) lands, their
changed
field will be set to the current time inappropriately. Still, that is better than a failing migration.Comment #23
Wim LeersFYI: just RTBC'd #2329253 after a detailed review: #2329253-79: Allow the ChangedItem to skip updating the entity's "changed" timestamp when synchronizing (f.e. when migrating).
Comment #24
huzookaHere is a test-only patch.
I already have a local fix, but for first I want to check all the tests which were failing in #11.
AND: While I was checking all the destination plugins I noticed that the EntityShortcutSet plugin was created only to call
setSyncing(TRUE)
before the save!Comment #25
huzookaComment #26
Wim LeersWhy do we need this if you also updated
protected static $modules
? 🤔Nit: This is the same as the parent; can be removed.
Nit: can also be dropped.
Comment #27
huzookaAddressing #26, and adding a similar test for Drupal 6 migrations.
We have to fight A LOT with the preexisting migrations since some of their destination entities (like Role) are doing some data petting in a presave: for example, Role sorts its permissions if the actual entity isn't syncing...
Comment #28
BerdirAre we sure that we want to do this for content and config entities?
config entity sync is something very different from content entity sync conceptually.
I would assume that the main benefit here is content entities and config entities is an edge case. We could consider that in a separate issue?
Comment #29
huzookaThe existence of EntityShortcutSet destination plugin and thefact that in the Drupal 6 test there are 6 extra (duplicated ?) actions after the import prove that config entities are in scope.
But I don't say they shouldn't be fixed in a separate issue.
Comment #30
huzookaComment #31
huzookaThis is a content-only fix.
Comment #32
huzookaComment #33
huzookaComment #34
huzookaComment #35
Wim LeersTest/proof of necessity review
Before this patch:
😱
Before this patch:
😱
→ proven beyond any doubt! 👍
Functional change review
🙏 Can you add a comment here explaining why we do this only for content entities, and not config entities?
Previous patches were doing a
instanceof
\Drupal\Core\Entity\SynchronizableInterface
check first.But both content and config entities are always synchronizable since sue #2985297: Generalize the concept of entity synchronization. So … that makes sense.
But I think it's still safer to have that instance check simply because
EntityInterface
does not implement this interface; it's theoretically possible that non-content, non-config entities are being migrated…Comment #36
BerdirOn the functional part. The second call is in the content entity specific destination, so that does not require an instanceof check. For the first part, maybe we could do that in that content entity specific implementation too? If we later on want to do it for config too then I can see the benefit of keeping it in the base class but we could still move it back if we want later?
Comment #37
Wim LeersD'oh, of course! 😅
@huzooka just explained to me that he did that to address your feedback in #28 👍
He also just explained to me that we could do what you're describing but then he'd have to change it in both
EntityContentBase
(for e.g.d7_node
) andEntityContentComplete
(for e.g.d7_node_complete
). But then there's a complication due to\Drupal\migrate\Plugin\migrate\destination\EntityContentComplete::rollback()
's implementation:… so we'd need to copy/paste basically all of the code from
Entity::rollback()
intoEntityContentComplete::rollback()
, so … it'd end up not simplifying anything 😳And there could be contrib modules too that extend
Entity
and notEntityContentBase
orEntityContentComplete
: for example\Drupal\entity_reference_revisions\Plugin\migrate\destination\EntityReferenceRevisions::rollback()
uses theEntity
one, because it chose to not extendEntityContentBase
!IOW: also for maximum compatibility, the current approach is best/safest.
Considering all that … I think this is good to go as-is 😊
As far as I'm concerned, this is RTBC if we update the issue summary to specify that this is only tackling the problem for content entities, not config entities; and to file a follow-up to also do this for config entities. What do you think, @Berdir?
Comment #38
BerdirSounds good to me. I didn't mean to suggest that we should not do it for config entities, just that I suspect it's basically a 50/50 switch of intended and unintended consequences from a quick glance while content entities gives us big improvements (revisions and update fields together with the other issue) without known drawbacks, so it seems like a good idea to split it up.
Comment #39
Wim Leers😅 — that's exactly what @huzooka said! :D
So does that mean you also think this is RTBC as soon as the follow-up issue exists and the issue summary here has been updated?
Comment #41
quietone CreditAttribution: quietone at PreviousNext commentedDiscussed at #3253983: [meeting] Migrate Meeting 2021-12-16 2100Z. mikelutz, jibran and myself. As already tagged, this needs and Issue Summary update and without that it makes this difficult to review. We thought this is a feature request with priority of normal. We didn't have time to read all the comments so tagging 'Needs subsystem maintainer review'.
Setting to Needs work for an issue summary update. It would help to include the justification for the suggested change.
Thanks.
Comment #42
DamienMcKennaWould it be useful to have more comments in the code (not the tests) to explain what the changes mean?
Comment #43
quietone CreditAttribution: quietone at PreviousNext commented@DamienMcKenna, as a reviewer I would rather have the IS up to date so that I can then determine if the code is doing what it should and in this case how it fits in with the related issues. The IS implies that the changed timestamp is incorrect, is that for all migrations or just some? Nor does it explain how introducing syncing will fix it.
Comment #44
floydm CreditAttribution: floydm at Affinity Bridge commentedPatch 34 fails on 9.3.2. Attached is a reroll that applies.
Comment #45
rclemings CreditAttribution: rclemings commentedUpdated issue summary per comment #43 (https://www.drupal.org/project/drupal/issues/3052115#comment-14345378) in hopes of getting this much-needed patch moving again.
Comment #46
rclemings CreditAttribution: rclemings commentedComment #47
ranjith_kumar_k_u CreditAttribution: ranjith_kumar_k_u at Zyxware Technologies commentedComment #48
DamienMcKennaThis patch was not enough on its own to resolve the problem of content revision timestamps being modified when a migration is reran (migrate_tools, drush migrate:import MIGRATIONNAME --update), so I'm testing to see if #2329253 also helps.
Comment #49
rclemings CreditAttribution: rclemings commentedComment #50
DamienMcKennaJust to mention it, the site I'm having this problem on uses d7_node_complete for the migration instead of d7_node, so it's hitting the exact problem Wim mentioned in #37.
Comment #51
Wim Leers@DamienMcKenna: Any chance you'd be willing to do a thorough review + test the patch so we can mark this RTBC and get this in? 🤞
Comment #53
edysmpreroll for 9.4
Comment #54
quietone CreditAttribution: quietone at PreviousNext commentedFormat issue links in the IS.
Comment #55
jwilson3+1 I can confirm that the latest patch from this issue, combined with the patch from #2329253: Allow the ChangedItem to skip updating the entity's "changed" timestamp when synchronizing (f.e. when migrating) solves the most common underlying issue facing people that are running multiple migrations: #3280140: Running migrate:import for node entity with --update modifies CHANGED timestamp.
Note: we are not migrating revisions, so sadly I cannot do a complete review here.
Comment #57
fengtan+1, patch 53 combined with #2329253: Allow the ChangedItem to skip updating the entity's "changed" timestamp when synchronizing (f.e. when migrating) fixes the migrated "changed" dates for us. Without these two patches, the changed date would be set to the migration date on the destination.
Comment #58
needs-review-queue-bot CreditAttribution: needs-review-queue-bot as a volunteer commentedThe Needs Review Queue Bot tested this issue. It either no longer applies to Drupal core, or fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".
Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #59
HitchShockreroll for 10.1.x-dev
Comment #60
HitchShockThe patch really works well for me, +1 RTBC.
Also, found one more issue related to the content syncing, but create a separate issue for that because it's related not only to the migration -
#3338260: Skip changing 'revision translation affected' field if the old revision is syncing
Also, I agree with other people that the patch from #2329253: Allow the ChangedItem to skip updating the entity's "changed" timestamp when synchronizing (f.e. when migrating) is very useful in this case too.
Comment #62
benjifisher@rclemings: Thanks for updating the issue summary. I am keeping the tag for that because I think it still needs work.
@HitchShock: Thanks for updating the patch (as a MR). When updating an issue that was previously RTBC but needs a "reroll", it is OK to mark the issue RTBC again. Otherwise, it is unusual to mark an issue RTBC when you have worked on the code.
I am setting the status back to NW for several reasons.
Is this issue a bug report?
I think there is a pretty good argument that it is. If so, then the issue summary should have steps to reproduce. It looks as though the automated tests implicitly have those steps, but it helps to have them in the issue summary. @quietone already made this point. (See Comments #42, #43.) There may be enough in the existing comments (such as #22 and #35 from @Wim Leers) to make the case that this issue is a bug. See also #11, #55.
If this issue is not a bug report, then we should not change the behavior unconditionally. We can add an option to the destination plugin to use the new behavior. I see that @codebymikey already made this point in #21.
Do not change the Entity base class.
This is just a question of OO design. The base class should handle common functionality, and the child classes should handle more specific requirements. In this case,
EntityContentBase::rollback()
can in-line the parent method (2 or 3 lines) instead of modifying the parent method.I see from #36 to #38 that this has already been discussed, and it is more complicated than that. Still, I wonder if there is a better way.
Maybe I am reading too much into the phrase, ‘simply flags an entity as "syncing"’ in the issue description. It suggests to me that this issue will not have any effect itself, and follow-up issues will use that flag. @Berdir knows the effects of
setSyncing()
, but for the rest of us it would help to have an explanation in the issue summary.Comment #63
smulvih2Patch #47 works for me on drupal/core 9.3.22
Comment #64
HitchShockHi @benjifisher
About your points:
That is, it seems that this is not a bug fix, but the introduction of the new feature. A feature that will allow us to fix #2329253: Allow the ChangedItem to skip updating the entity's "changed" timestamp when synchronizing (f.e. when migrating) and #3338260: Skip changing 'revision translation affected' field if the old revision is syncing in the future.
But at the same time, this solution automatically solves the issue #3195139: migrate:import --update creates new revisions, if content_moderation active, or rather, the issue will be fixed by the issues #3052115: Mark an entity as 'syncing' during a migration update and #2803717: Allow 'syncing' content to be updated in content moderation without forcing the creation of a new revision.
So, in the end, this is not just a new feature, but a new feature + a bug fix.
This is also confirmed by the tests for content_moderation that were created as part of this issue.
Not sure which category should be used in this case.
Done:
Also, the patch was tested on real projects.
It's RTBC for me.
Comment #66
longwaveI've been running this for some time along with the related patches in order to fix various issues with migrations; I think it's about time that this one lands in order to be able to move some of those other issues along - hopefully we can get those into 10.2.0 as well.
Removing the issue summary update tag as it's no longer required. Leaving the followup tag as I think #38/39 intend to discuss doing this for config entities as well?
Committed and pushed 68c1000f770 to 11.x (10.2.x). Thanks!
Comment #69
longwaveReverted, because this broke HEAD: https://www.drupal.org/pift-ci-job/2693233
And similarly in the other new test
Comment #70
HitchShock@longwave
Updated MR from 10.1.x and there are no test errors atm.
Looks like it was some temporary issue. Sometimes I see that right after the merge, there may be errors in the tests, but a second run shows that everything is fine.
Comment #72
longwaveIt looks like I accidentally committed the patch instead of the MR! I got the right one this time and ran the tests locally to double check.
Committed and pushed cd1b41fdbf to 11.x (10.2.x). Thanks!
Comment #74
quietone CreditAttribution: quietone at PreviousNext commentedSetting to NW for the followups.
Comment #75
quietone CreditAttribution: quietone at PreviousNext commentedI made the follow up #3425339: Mark a config entity as 'syncing' during a migration update
Comment #76
john.oltman CreditAttribution: john.oltman at SiteBasin, Inc. commentedThis seems like a meaningful change for sites with ongoing migrations and content moderation enabled. Yet I did not see this mentioned in the release notes for 10.2 (sorry if it is there and I missed it). As a result we lost expected revision records since we updated to 10.2 a few weeks ago, which makes it hard to see what the net change is per migration update. The change is not present in 10.1.
By the way I agree with the change. Just wish it had been called out. The easy fix to go back to how things had been working was a presave hook:
Comment #77
jasonawantI found myself in a similar situation as @john.oltman above.
I was relying on 10.1 behaviors to migrate entity revisions' moderation_state values and (I think) relying on ModerationHandler::onPresave() set publish state accordingly.
@john.oltman, thanks for the solution!
Comment #78
jasonawantHi! I created a change record for the 10.1/10.2 changed behavior. Is this needed? If so, could someone review it and then I'll publish it?
https://www.drupal.org/node/3426397