Active
Project:
Drupal core
Version:
main
Component:
migration system
Priority:
Normal
Category:
Plan
Assigned:
Unassigned
Issue tags:
Reporter:
Created:
19 Nov 2019 at 19:18 UTC
Updated:
3 Jul 2020 at 16:27 UTC
Jump to comment: Most recent, Most recent file
Comments
Comment #2
wim leersThanks for creating this, @mikelutz! I'll watch the recording ASAP.
I'm personally frightened by the fact that we've been running migrations without running entity validation. I'm very grateful #2745797: Add option to content entity destinations for validation happened. It's a step in the right direction.
I'd have loved to have seen a "enable entity validation for all the things" flag. It's true that setting that would likely break existing in-development migration projects. But I'm just starting to migrate
wimleers.comas an experiment, so that would actually be welcome. Hence the attached patch that simply forces entity validation for all entities.Where I'm coming from: migrating my own site from D7 to D8
After a number of
dateandfilterfixes, thewimleers.commigration mostly works when usingmigrate_drupal_ui🥳Albeit that in quite a few places, I’m getting entity validation errors if I go to a node edit form and hit “save” without changing anything! 😨
Which confirms the need for us to be able to run with entity validation turned on.
So what happens when I migrate my own site with the attached patch applied?
The results are kinda frightening. It looks like
migrate_drupalis not taking dependencies into account when generating and ordering migration tasks.At the very least,
path_aliasshould run the absolute latest (they’re entities now in 8.8, and hence have validation!), andtaxonomy_termshould run very early. Those two tweaks alone would fix a lot. I’m not sure a full dependency-based approach is feasible.But in theory we could introspect the data model for each entity type/bundle and selectively migrate all entities within a given bundle together, and introspect their dependencies (i.e. things they could reference).
OTOH, if there's one thing that we actually have reasonable certainty about, it's that references continue to work. So I think it'd be interesting to have a validate everything except for
\Drupal\Core\Entity\Plugin\Validation\Constraint\ValidReferenceConstraintmode.That's the next thing I will do: investigate how/where that migration execution order is determined, first manually tweak it, and hopefully make it smart by using a dependency graph.
EDIT: and @mikelutz wrote: — this gives me hope that we can make a leap forward here 😊It makes sense that we shipped with the simplest possible approach first, but now that all the foundations exist, I think we're finally in a position where we can do better!
Comment #3
mikelutzRelevant slack discussion:
Comment #4
wim leersI'm still doing more digging, but I've got a very rough PoC patch working for this:
Doing that cuts most of the noise away until we figure out a more migration execution order that respects dependencies (references).
I am finding lots of legitimate validation violations. Expect a big update tomorrow. :)
Comment #5
mikelutzThat is another reason we don't validate that I hadn't even thought about. The core migrations as ran through the UI preserve all entity ids across the board. So without validation running migrations or using stubs was unnecessary. Once the entire set of migrations ran, then all the references would validate.
I would like to see it all tightened up, though I also worry about circular references and other situations where it might be difficult to manage.
This is out of scope, but I'm just brainstorming here:
Core has no concept of migration groups, nor any concept of running several migrations together. It only supports migration relations as dependencies. Contrib provides the concept of groups, and a drush command to run a set of migration by group or tag, but under the hood, it still spins up a MigrateExecutable instance for each migrations and runs them one after another, completely independently (after using the dependency tree to sort them)
I wonder if it would help both with validation, and with the concept of allowing a dry run if core had some sense of migrations that were not independent, and were meant to be run together as a group. Some sort of Task runner that manages running a set of migrations and can do some stuff at the beginning and end of a group of runs. For the purposes of a dry run, it could keep track of the mapping data a dependent migration would need, and then make sure it's all cleared out at the end. For validation, it could keep track of saved entities and potentially validate them (particularly the references) at the end of the entire group run. OF course at that point the entities are already saved, so about all we could do is report, and maybe roll everything back.
I don't think it would be able to validate references during a dry-run, which makes the dry-run less useful (Although (and this is out of scope for my out of scope thought), maybe we could create a database connection with a new prefix, clone the tables, run the dry-run on THAT, and provide some sort of introspection, though there is a whole other can of worms. Still, to be useful it would require core having some concept of running more than one migration together, which it currently doesn't)
This could also be something done in contrib instead, but we lose the possibility of core doing a post migration group reference validation. Anyway, I'm just brainstorming long term ideas. They would be a lot of work, and I haven't fully thought through the practicality.
Comment #6
wim leersYep, hence my proposal to exempt specifically entity references. We might be able to detect circular references potential by reasoning about the data model. But even if we can deduce that circular references are possible, we still need to be able to migrate. That's when #4 comes into play.
Interesting. This risks being brittle too though. I think a simpler approach would be to track for which entities
ValidReferenceconstraints were not validated, and then validate all those entities again after the migration. Or simpler still, just run validation again after the full migration has been completed. If during migration all entities passed validation except for theValidReferenceconstraint and after migration all entities pass validation completely, that would allow us to safely conclude that all migrated entities are valid, including circular references :)That approach does seem highly practical, at least to my new-to-migration-and-its-complexities eyes. Do you see problems in that approach?
Comment #7
wim leersBased on my initial exploration work and extrapolating from migrating D7
wimleers.comto D8 (which itself was upgraded from D5), I think that gradually enabling entity validation should be doable.The argument in #2745797: Add option to content entity destinations for validation was that we don't want to break existing migrations. That is true for custom migrations. But there's nothing stopping
migrate_drupalin Drupal core from addingvalidate: trueto all generated migrations. That would not affect custom migrations at all.When @webchick attempted to migrate her blog about half a year ago, she ran into the problem that it's really hard to act based on the encountered errors. Migrate Drupal UI tells you to go to
/admin/reports/dblogfor the "detailed upgrade log". But that's literally just a log, often with incomplete information that is extremely hard to act on, even for developers.I believe the first step is to surface migration messages per migration, which tends to also mean per entity type and if it's a bundleable entity type, per entity type + bundle. This makes it easier for the migrator to create a mental model of which things are failing and how that may have effects on depending migrations.
At that time, #2745797 did not land yet, so that did not include validation errors. But the patch here does enable entity validation upon migration for every entity. When applying the patch in #4 to migrate

wimleers.com(along with #3095922: The comment "language" column in D7 might be empty after migrating from D6, this is invalid in D8 applied, which I would never have discovered until months or years after the migration to D8 and it'd then have caused obscure bugs), the super rough PoC UI for surfacing migration messages per migration, posted at #3063856-5: Add ability to view migrate_message table data, looks like this:That gives you a good sense of what's going on. Lots of entity types have zero validation errors 🥳 Most of the other ones are related to uploaded files that are larger than what is allowed and/or have disallowed extensions (turns out this uncovered a problem in the destination environment, something that would also be excellent for #3061676: Create an audit plugin class/manager for migrations to check!). A few things point to invalid data on the source environment that could easily be fixed on the source environment (caused by validation not being strict enough in D6/D7, or the upgrade having been imperfect in the past). This would empower the migrator to take more targeted action (and so far you don't even have to be a developer).
(We haven't even mentioned #2969551: Migrate messages from caught exceptions need file and line details, but that would of course make #3063856 even more valuable. But that's not strictly related to this issue.)
A consequence of running migrations with entity validation turned on is that we need to become smarter about the order in which we run migrations. I'm happy to have been completely wrong in my assessment in #2: migrations already are run in order that respects dependencies! But there's some dependency information missing I think, but that only becomes apparent once you turn on entity validation. So that should be simple 🙂
So this is what I propose as next steps:
migrate_drupal(_ui)to opt in to migrate with entity validation. It should excemptValidReferenceConstraintlike in #4, but then also run entity validation on all entities again post-migration, like #5 + #6 suggest. That would give us 100% confidence about the data in the destination site, and hence rule out any future weird bugs once running Drupal 8.Comment #8
wim leersNew child issues in addition to those listed in #7.3 — all related to improving migration dependencies, to make it possible for entities to be valid when they get migrated:
Comment #9
heddnSeems a lot of child issues about dependency ordering. Let's add dedicated test coverage about an expected order of running migrations as a first step.
Comment #10
xjmComment #11
wim leers