Problem/Motivation

When migrating Drupal 7 data to Drupal 8+, it is desirable to retain universally unique identifiers (UUIDs) of each entity. Although UUIDs are provided by a contributed UUID module in Drupal 7, Drupal 8+ supports them natively, which can lead people to expect that they can import the uuid field by simply mapping it in the process portion of a migration YML file. This works as expected for files and taxonomy terms because the source plugins for those entities include all fields from the source table.

However, the Node.php source plugin explicitly names the fields to import from the node table, not including the uuid field. This results in new data being generated during import instead of retaining these identifiers and can prevent the migrations from being updated.

Steps to reproduce

Create a custom migration (see migrate_plus module) of nodes from a Drupal 7 site that uses UUID module. In the process part of the YML file, map the uuid field. Run the migration. Compare the uuid data of the imported nodes to their original uuids. Note that the data does not match.

Now attempt to run the migration again with the --update flag (see migrate_tools module). Note that the migration fails because the UUIDs do not match.

Proposed resolution

The simplest solution would be to specify $query->fields('n'); in the Node.php source plugin rather than explicitly naming all the fields to use. This approach works when I test it locally, however it is discouraged in the documentation because collisions with fields from other tables such as node_revision could produce unexpected results.

Instead, my solution is to test for the presence of the uuid field in the node table and add it if it is present.

Remaining tasks

Make a fork and/or patch.

Issue fork drupal-3299806

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

BenStallings created an issue. See original summary.

benstallings’s picture

StatusFileSize
new811 bytes

Please disregard the fork on this branch. I screwed it up, and it contains the wrong changes. Use this patch instead:

benstallings’s picture

Status: Active » Needs review

Ready for review. Thanks!

benstallings’s picture

Version: 9.4.x-dev » 10.1.x-dev

Boosting to 10.1 since it's a feature request.

benstallings’s picture

StatusFileSize
new812 bytes

Adding a space after the comma to satisfy coding standards.

mikelutz’s picture

Status: Needs review » Needs work
Issue tags: +Needs tests, +Needs issue summary update, +Needs change record

Really, if we are going to do this, we need to do it for all entity types, and find a way to migrate the values in the core migrations if they exist, which is going to be a bit tricky.

tstoeckler’s picture

Re #8: So the issue is that for a bunch of entity types (just checked comment, term and user, I'm sure there are a bunch of others) the UUID is available "automatically" for the migration, because the source plugins just adds all fields from the respective D7 base table. E.g. from Comment:

    $query = $this->select('comment', 'c')->fields('c');

Since the D7 UUID module adds the UUID field to the {comment} table it gets picked up automatically and as soon as you map uuid: uuid in the migration process it just works™️.

Node is an exception because it specifies the fields to add for each table manually (because of the more involved revision and translation handling in contrast to other entities (at least I think that's why)). So IMHO it would be valid to actually add this one-off for nodes as there's really not anything to do for other entity types (at least not any that I checked, there may be others that need special handling).

mikelutz’s picture

Makes sense, but while I get the desire to have the core sources include this, I'm going to have a hard time justifying to the committers why we are writing extra code to add it to the core sources if the core migrations aren't going to use it when it's available. If the only place it would be used is a custom migration, then that custom migration should be responsible for adding it to the source, either through an extension of the core sources, or a prepare_row hook.

But using it in core is a bit tricky, IIRC, if we add it directly to the migrations as uuid:uuid, or even with a 'skip_on_empty', then we are going to set uuid to null if it doesn't exist, and I think that will blow things up versus actually not having it in the processes at all, and letting the entity api generate one. So we either have to generate and provide a uuid if it doesn't exist (bad for a few reasons, deterministic testing being a big one) or we have to do something in the node migration deriver or elsewhere to add it in dynamically only if the uuid module exists in D7.

tstoeckler’s picture

Ahh OK, I wasn't aware that that was the policy. Yeah, the point of this issue was (as far as I can tell from the original post which is also exactly what drives my interest in it) to save people from having to write custom (PHP) code and just be able to map uuid: uuid in their migrations and be done with it. In particular since UUIDs are so ubiquitous in D8 when you get handed a D7 database for a site which you don't really know and the {node} table happens to have a uuid column you don't spend a lot of time thinking about the fact that that column is not created Drupal core, you just assume that you're going to be able to map the UUID - at least that's what happened to me recently.

But yeah, I guess then this is closed (won't fix)?! 🤷

mikelutz’s picture

Well no, it means we need to find a way for the core migrations to migrate the uuid if it is present. Then it falls into the "We moved the functionality of the uuid module from d7 contrib to d8 core, so we should provide a migration for it" category.

tstoeckler’s picture

💡 ahh that totally makes sense @mikelutz. I misinterpreted your statement, thanks for clarifying!

I think you're right that we can't just put uuid: uuid in the core migrations as that would result in empty UUIDs on the destination but I guess we should validate that to see if the entity system does not still automatically generate UUIDs before saving as that would be the nicest solution I think.

Otherwise, I guess what we could do is always provide a uuid source property and then either have that be the D7 UUID if it exists (i.e. if the D7 UUID contrib module was installed) or just generate a random UUID in the node source's prepareRow(). That would make it still functionally equivalent to now as there really is no difference if the UUID is generated from the migration system or from the entity system, it's just as "nice" from a separation-of-concerns point of view. Not sure if that would still be acceptable, though, in order to claim support for the D7 UUID module in migrations.

mikelutz’s picture

I think that would end up being acceptable with some caveats. If we did go that route, I'd again want to include the uuid for all core entities that may have one from the uuid module, nodes, terms, comments, files, and users. We'd need testing to include both having the uuid module enabled and not, meaning we either add the uuid module to the fixture for testing, and then run some tests where we first disable the module from the test database and remove the columns, or we leave the fixture alone and run tests where we first make the adjustments to the source database to simulate the module being installed. we would want deterministic tests testing the uuid generation when it's not there, which we can do by mocking or overriding the uuid service easily enough.

Near as I can tell from the code, the uuid module generates uuids for existing entities on install, so I don't think we have to worry about some entities having them and some not, we can just assume they are there if the module is installed on d7. uuid provides vuuids for node revisions, but we don't do that in d9, so we can ignore those. What I don't know off the top of my head, and don't seem to have an old database around to tell me is how uuid module interacts with either method of translation used in d7, or if that would require any special handling for the various translation migrations or the node_complete migration.

The other, and perhaps easier option here is to go ahead and add the field to the node source (and again for completeness here, should be done for at least d7 node and node_complete. Really should be included in d6 sources too, but I'm inclined to push that to a follow-up will likely never be completed, but at least we can have an issue such that if someone really needs it, they could work on it there). Once it's in the source if the module exists, we could add `uuid:uuid` to the proper migrations in migrate_drupal_migration_plugins_alter, which would save us from having to generate them in migrate.

tstoeckler’s picture

StatusFileSize
new1.53 KB

Yes, totally makes sense, that if we're going to do add this to the node migrations, we also need to add it to all the other entities.

I'm not sure I understand the two approaches you describe exactly, @mikelutz, maybe need to read through the comment again.

However, I did look into whether maybe we need this complexity at all and if setting the UUID to NULL in the migration would still have the entity system create UUIDs automatically and I think that is actually the case. See the following in ContentEntityStorageBase::create():

    // Assign a new UUID if there is none yet.
    if ($this->uuidKey && $this->uuidService && !isset($values[$this->uuidKey])) {
      $values[$this->uuidKey] = $this->uuidService->generate();
    }

Note that the array is checked with isset() and not with array_key_exists() so as far as I can tell, the code in the if should be reached in this case.

So maybe actually the patch in #7 is not that far off, if we just add uuid: uuid to all the entity migrations and add some test coverage. That will probably be the most complex, as we need to test both with and without source UUIDs as you describe.

To make sure that it actually works, though, here's a quick test patch that just adds the uuid: uuid to the node migration (and does not add any source UUIDs) and asserts that the resulting node has a valid UUID.

tstoeckler’s picture

Status: Needs work » Needs review
tstoeckler’s picture

Status: Needs review » Needs work
mikelutz’s picture

As I woke up this morning, and groggily glanced at my notifications, I saw this and got excited right up until I sat down with my first coffee. :-) My emotions went from happy, to confused, to disappointment, to dread, lol.

When I said we couldn't just add uuid:uuid to the migrations, I didn't brain through the whole thing, it was just programmers intuition. I have flashes of seeing "SQL ERROR: Column uuid can not be null" while building custom migrations over the years. When I saw this work, I was confused until I realized when those happened. It's not an issue on the initial migration run. It's when I run them with --update and the uuid isn't coming through. Matter of fact, I suspect I've done exactly this, added uuid:uuid to a node migration without realizing the node source doesn't supply it. It runs fine the first time, but on update, we load the node, null out the existing uuid, and try to save. I'm fairly certain that THAT is where I see the sql error, because the code you mentioned only runs in ::create(), but if we are updating, we are loading and saving only.

Even if the entity save code were to detect the empty uuid and add one (it doesn't) We would still be re-generating uuids on every run, which is also bad. I realize now that this would also be a problem were we to try to provide uuids in the sources if they don't exist on d7. Every update run of the migration would change the uuids, which wouldn't be acceptable.

The dread comes from realizing we don't have test coverage over this, and we probably need it. We test the destination code that updates entities in a few places, but mostly with test entities, or embedded sources and custom test migrations. These result suggests that we don't test the actual migrations that core provides in an update scenario. It's understandable why, the core migrations were made for the core upgrade UI, which was originally built to be one and done, there was no method to re-run the migrations as updates (though I think that option exists now). So, this exposes an unfortunate gap in our test coverage that we likely need to fill either here or in another issue.

As for solutions, We could explore special uuid handling in EntityContentBase::updateEntity(). I'm not super excited about special handling for a field there, but uuids might warrant it. That might still be easier than trying to dynamically add the process to the entity migrations, but would require some good testing.

c-logemann’s picture

Issue tags: +UUID
tstoeckler’s picture

Hey, thanks for the that in-depth comment @mikelutz! As you probably guessed I absolutely did not think about updates for a second.

And yes, all of that makes very much sense (unfortutately?! ;-)).

Looking around the code I stumbled across the overwrite_properties feature that we have for destinations and was wondering whether that couldn't be used to achieve this. I.e. if we always fill overwrite_properties with everything but the UUID field, then the behavior should be as expected, right? Not sure if that would be considered viable to add that to all the core migrations or if there is a problem with that that I'm not thinking of. Alternatively, we could also add the inverse feature, something skip_properties or something and add the UUID field to that to avoid having to add all the other fields. Since we have overwrite_properties already it seems like it wouldn't too unnatural of an API addition?

Having written that, I think the same logic should apply to ID fields, as well, right? It seems there is no concrete bug there, but conceptually it totally doesn't make sense to update the IDs when running a migration update, right?

mikelutz’s picture

Well, the main problem with using overwrite_properties is that the content migrations are dynamic. Nodes specifically get derived into migrations per bundle, and each bundle has a different set of fields that would need to be added to the list. This would be doable in the deriver, but at that point, the deriver could just decide whether uuids exist or not, and add that process (or not). So if we go that route, you are over complicating it from the original 'just add the uuid process dynamically if needed' idea, not to mention making it more complicated for folks customizing those migration to add fields.

There is an issue out there for a skip properties type configuration item for the entity destination, #3291578: Allow overwrite_properties to be negated on destination plugins. I'm not completely sold on adding that in.

IDs are already somewhat of a special case, in that they determine whether the destination is creating or updating an entity. Whatever the ids are that get to the destination, the destination tries to load that entity, and if it finds it, it updates it, and if it doesn't, it creates a new one. If the migration provides no ids, it falls back to trying to load the id stored in the map table from a previous run, and if THAT doesn't exist, it creates a new entity with the next auto_increment ids and stores it in the map table for future updates. This does allow you to change a migration process in such a way that you provide a different id than what may be in the map table, causing the destination to create (or load if it happens to exist) an entity with a new id, save that new id to the map table, and essentially have migrate 'forget' about the original while leaving it in the system. I don't really consider this a bug, it's the intended behavior. While rare, I have intentionally manipulated this feature before to move some things around in very specific circumstances, and in concert with some other code to manage the situation to meet a specific client need.

The point being that while changing the id in the migration might change the id in the map table and create or update a different entity, there isn't a situation where changing an id in a migration could load the old entity and try to save it with a different id. The entity you load to do the update will never have its ID changed before save because the final ID is what is used to load the entity in the first place, if that makes any sense.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

donquixote’s picture

I like patch #7 as an incremental improvement.
For it to work correctly, we also need to add 'uuid' in the ->fields() method, with the same condition.

As for adding the "uuid: uuid" mapping to migrations:
One option is to conditionally add this mapping. So it would not be present in the default yml files.

If we always want to add the "uuid: uuid" mapping, then we need to auto-fill this if the uuid module was not installed in D7.

The usual random uuid is not great, it will produce a different value each time the migration is run.
This causes issues e.g. with views that reference taxonomy terms by uuid.

I can see two options to generate "stable" uuids, if uuid module was not enabled in D7:
- Store the generated uuids in a yml file somewhere, which would be committed to code. I don't like this, because this file would have to scale with the amount of content.
- Generate fake uuids that are actually hashes based on entity type, entity id, and a fixed salt defined somewhere. E.g. create a sha256, get a substring, insert the dashes for uuid format, done.

Maybe all of this would have to be opt-in, to not mess with existing migrations.

And yes, test coverage.

donquixote’s picture

StatusFileSize
new2.35 KB

Here is a patch that adds to the #7 patch.
This only covers the first part of the problem, exposing uuid for nodes.

Actually it might be a good idea to treat this as two separate issues:
- Add uuid for node source plugin, if uuid module was installed in D7.
- Follow-up to add uuid mapping to the default migrations.

donquixote’s picture

And btw, if I run "migrate:import --update" with a node or term migration with this patch, it does work, and it does update the entity uuid to the correct value.

We probably do need something else for revision uuids.
But in a first version we can live without them.
Term uuids and some content uuids are important if they are referenced from elsewhere, esp in config.
Revision uuids are quite unlikely to be referenced in config. The main reason to support them would be a sense of technical completeness.

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.