Problem/Motivation

Scenario:

  1. d7_file migration runs before d7_user migration, because d7_user has an (optional) dependency on d7_file (for the user picture)
  2. d7_file migrates files on the filesystem but also the file_managed table
  3. the file_managed table contains uids
  4. but the D7 source site's file_managed table contains referenced to user IDs that no longer exist, because the users have been deleted

Consequence: data integrity problem; the files no longer have a valid owner.

If you're also validating all your migrated file entities (like you should be), which is possible since #2745797: Add option to content entity destinations for validation (see https://www.drupal.org/node/3073707 for the CR), then all of those rows will also trigger validation errors.

Proposed resolution

Note that d7_file will trigger an entity validation error for every single row out of the box, since no User entities will exist yet. That can be solved by updating d7_file to use the migration_lookup instead of just migrating uid blindly — this way stubs will be created.

Remaining tasks

TBD

User interface changes

TBD

API changes

TBD

Data model changes

TBD

Release notes snippet

CommentFileSizeAuthor
#15 3156733-15.patch1022 bytesomkar.podey
#2 3156733-2.patch596 byteswim leers

Comments

Wim Leers created an issue. See original summary.

wim leers’s picture

Status: Active » Needs review
StatusFileSize
new596 bytes
benjifisher’s picture

Status: Needs review » Needs work
Issue tags: +Needs tests

I have not tested, but I am afraid that the proposed resolution will lead to other problems.

Suppose that the files with fid 1, 2, 3 have uid equal to 3, 2, 1. If stub users are created during the d7_file migration, then the migrate_map table will map 1 to 3 and 3 to 1, right? What happens when we get to the d7_user migration, which has uid: uid?

I hope I am wrong, since then it will be much easier to fix this problem.

Either way, we need some test coverage before we can accept a patch on this issue.

It might be easier to add the tests after we finish #2974445: [Meta] Refactor Migrate Drupal UI Functional tests.

wim leers’s picture

Interesting question.

But isn't the same true when nodes with nid 1, 2, 3 have uid equal to 3, 2, 1?

benjifisher’s picture

That would be a problem if the node migration ran before the user migration and the node migration used migration_lookup.

The only place I see migration_lookup used in core node migrations is for D6 body formats:

$ grep -riB 1 migration_lookup core/modules/node/migrations/
core/modules/node/migrations/d6_node_complete.yml-  'body/format':
core/modules/node/migrations/d6_node_complete.yml:    plugin: migration_lookup
--
core/modules/node/migrations/d6_node.yml-  'body/format':
core/modules/node/migrations/d6_node.yml:    plugin: migration_lookup
--
core/modules/node/migrations/d6_node_revision.yml-  'body/format':
core/modules/node/migrations/d6_node_revision.yml:    plugin: migration_lookup

When I do a custom migration, I never try to preserve entity IDs, so I use migration_lookup everywhere. The core migrations do not work that way.

wim leers’s picture

#5: Fascinating! I never realized that. 😅 So that is intentional? Is this documented somewhere?

Do you see an alternative solution that can solve the problem I reported?

benjifisher’s picture

I do not know if it is documented anywhere. While thinking about this issue the other day, I did notice this comment in core/modules/file/migrations/d7_file.yml:

process:
  # If you are using this file to build a custom migration consider removing
  # the fid field to allow incremental migrations.
  fid: fid

IMO the right fix is to change core migrations to do what I do in my custom migrations: do not try to preserve entity IDs. But then you have to worry about <a hfref="/node/123">About Us</a> in the Body field. We added the DOM process plugins to the Migrate Plus module to handle that. I guess we would also have to bring those into core.

If we are going to stick with the core strategy of preserving entity IDs, then we have the chicken-and-egg problem that managed files have a uid field and users have a user_picture field. I see that the d7_user migration uses migration_lookup for that, which is odd because d7_file preserves the file ID. I guess the point is that migration_lookup will tell us when a file (user picture) was not migrated.

Maybe the simplest solution is to use default_value in the d7_file migration to set uid = 1 for all files. Then add d7_file_depends_on_d7_user that only has two fields in the process section: fid (use migration_lookup based on d7_file) and uid (use migration_lookup based on d7_user). If we skip any rows, then at least they will have a valid uid.

wim leers’s picture

But then you have to worry about <a hfref="/node/123">About Us</a> in the Body field.

Much worse: then you need to worry about https://www.w3.org/Provider/Style/URI 😨

I'll need to think more about the last two paragraphs you write in #7. I'm not sure I fully comprehend the consequences yet.

I will already say that I am surprised that your analysis seems to confirm that there is a chicken-and-egg problem WRT users and files (on sites where users have a user picture and hence a file), and that core has not yet solved this very common scenario 🤔

quietone’s picture

I know the related issue is closed as won't fix but it provides some related history.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

quietone’s picture

Why set all the files to be owned by uid 1 if the owner is going to be changed in a later migration?

Adding a new migration 'd7_file_depends_on_d7_user' looks like the way to solve this. A new source plugin that just gets files not owned by user 1 may be worth the effort. Then, all that is required is a new source plugin test and new Kernel migration test. All the files in the d7 fixture are owned by uid 1 so that should be updated in the Kernel test before the migration is executed.

I think d6 has the same problem, so that can be done in this issue as well.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

wim leers’s picture

Assigned: Unassigned » benjifisher

@benjifisher: do you agree with @quietone's proposed plan of attack in #11?

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

omkar.podey’s picture

StatusFileSize
new1022 bytes

Updated patch to cover d7_file_private migration.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

mikelutz’s picture

Status: Needs work » Closed (works as designed)

If your d7 database has files owned by users that have been deleted, then the data integrity problem exists in D7, and it is not the responsibility of the core migrations to manage that or fix it.

Regarding #3 that's not an issue. The destination ids on stubs aren't random or incremental. the stub is created by running the user migration with only the uid in the source data. The destination uid is managed by the migration, i.e. the uid:uid that is used in the user migration is used for both the stub and the actual user migration. Stubs would be rather useless if they weren't creating destination ids the same way that the migration is going to when it runs.

The bigger problem here is that you would end up creating stubbed users that would never be migrated because they don't exist on D7. If I recall correctly Wim is not a big fan of this either( #3156730: Stubs should only be created if the referenced source row actually exists ) If you combine that with this one, Then you are most likely going to end up reassigning these files to anonymous. That's not a decision the migration system can make. Whether these files should be deleted, assigned to uid 0, assigned to uid 1, or left assigned to the deleted user id is a question the administrator needs to answer, but more importantly It has nothing to do with the migration. The admin could fix it in d7, fix it in d9, or ignore it, but the migration system is about duplicating the d7 content in d9 faithfully to the best of its abilities. If that is files owned by deleted users in d7, then they should be files owned by deleted users in d9.

Regarding validation, if you choose to create custom migrations that use validation to try to fix data integrity errors on D7, by all means do so, but you are responsible for the whole process of stubbing references and all that. The core migrations are designed to copy the d7 data over as-is.

anybody’s picture

@mikelutz: Totally agree with your points in #17!

In my comment here over at #3307506: "Missing migrations upgrade_d7_field_instance" because of duplicate field_name_bundle entries in D7 field_config_instance I suggested the following, where the reason is also an inconsistency in the Drupal 7 source site:

@quietone: What do you think about adding a service to precheck such (clear) known issues in the source database for Drupal 6/7 migrations?
I think what can be well done in code, should be done there and not (only) in documentation. The question is, if the migration team is interested in such a feature and if it's still worth it, looking at the D7 EOL.

Another possible approach would be to create a separate module like "migrate_upgrade_inspector" (or sth.) which checks such upgrade preconditions or even "health_inspector" which checks general known schema/data issues in Drupal 7 ( / 6) with or without migration planned.

Or simply add a check in the specific d7 migration for this?

Before touching anything, we should definitely have a plan and feedback so the work isn't done for nothing.

So would it perhaps make sense to discuss any of the suggestions in a separate issue?

We're very technical, while by far not every Drupal 6 or Drupal 7 website owner is a developer and I guess the migration is still very frustrating. With such a preflight-check module (or whatever way) we might be able to provide a solution for some of these typical issues in the (also Drupal!) source.

What do you think?