Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
File migration needs to be able to fetch remote files (pull files directly from a Drupal 6 public file directory, say). file_unmanaged_copy() does not work with remote files - right up top it does a file_exists, which does not work with the http wrapper. So, we need to call copy() directly, just like Migrate Classic did.
Comment | File | Size | Author |
---|---|---|---|
#19 | 2244555-19.patch | 3.29 KB | benjy |
#13 | interdiff.txt | 1.13 KB | ultimike |
#13 | 2244555-13.patch | 3.3 KB | ultimike |
Comments
Comment #1
mikeryanMade this change as well as beefing up error handling. Also, the source URLs need to be urlencoded before passing to copy, added an option to turn this on/off.
@benjy, note this is a Migrate framework change, not migrate_drupal.
Comment #2
mikeryanWell, no reason not to incorporate this into ui_poc - so I did.
Comment #3
mikeryanWrong issue, I did not commit this one.
Comment #4
benjy CreditAttribution: benjy commentedMoving this over to the core queue since it's against the migrate API. I'm also going to postpone it until #2121299: Migrate in Core: Drupal 6 to Drupal 8 goes in so we don't break anything else.
Maybe we should have a test for a remote file as well?
Comment #5
mikeryan#2121299: Migrate in Core: Drupal 6 to Drupal 8 is in, unpostponing.
Any thoughts on testing remote files? Like, where the remote file being tested would be?
Comment #6
Damien Tournoud CreditAttribution: Damien Tournoud commentedAlternatively, what about fixing
file_unmanaged_copy()
? Written as it is, it is riddled with race conditions.It is really bad to check if the file exists and then copy it, because it could have been removed in the meantime. So removing this separate file exist would kill two birds in one stone.
Comment #7
mikeryanSure, and there's also the poor error-handling: #2244513: Move the unmanaged file APIs to the file_system service (file.inc) (opened before I realized I couldn't use file_unmanaged_copy() at all in its current state).
I guess refactoring file_unmanaged_copy() should come under #2229865: [meta] Modernize File/StreamWrapper API (specifically #2050759: Move drupal_chmod and other code in file.inc to a Drupal\Core\File\FileSystem class) - not sure how much momentum that effort has, though.
Comment #8
benjy CreditAttribution: benjy commentedI was just thinking accessing a URL over HTTP was enough so it could come from the test site?
Comment #9
mikeryanRerolled.
Comment #10
mikeryanComment #12
benjy CreditAttribution: benjy commentedI think this is RTBC for me, just a few minor doc errors:
All on one line.
Missing doc comments.
Comment #13
ultimikeUpdated the patch based on @benjy's comments.
-mike
Comment #14
chx CreditAttribution: chx commentedMaybe. Wouldn't it be better to patch file_unmanaged_copy instead? Add an optional flag to make it work with remote files (which would skip exists)?
Comment #15
sunI'm not sure whether
file_unmanaged_copy()
is the right tool for job in this case, so I can get behind the idea of not using it.FWIW,
file_unmanaged_copy()
is full of custom expectations, of which the vast majority applies to user-triggered UI operations on local files only. It's more or less a specialized implementation ofcopy()
by System module, which only exists for System module (and the installer), because it has tons of hard-coded UI behavior mixed in. We need a proper reimplementation of that code, but the topic is a bit larger: #2229865: [meta] Modernize File/StreamWrapper APIIn light of speaker-deck demos like this, shouldn't there be at least some bare minimal validation that the source file is actually what we think it is?
Not sure why anyone would want to specify source URLs that are not valid URLs?
At minimum, this needs to use
parse_url()
.Comment #16
benjy CreditAttribution: benjy commentedWhat kind of validation are you thinking? When you're running a migration, I feel pretty confident you'd know the source files were "safe"?
Regarding the urlencode, it's copied directly from the D7 module, the relevant issues:
https://www.drupal.org/node/1849350
https://www.drupal.org/node/1780850
Are you suggesting breaking it up with parse_url(), encoding the relevant parts and then putting it back together?
Comment #17
ultimikeComment #18
benjy CreditAttribution: benjy commented@sun could you please reply to #16 so we can get this issue moving.
This issue stops the upgrade path migrating files so it would be good to push forward.
Comment #19
benjy CreditAttribution: benjy commentedOK, i've just given this patch a final review and it looks good to me. This patch is crucial for users to start testing migrations as file migrations don't work without it. I think we need to get this in before Amsterdam where there will likely be lots of testing.
If we still want to discuss our options for updates to file_unmanaged_copy() i'd rather that happened in a follow-up.
Patch no longer applied, just a straight re-roll by git, didn't change anything.
Comment #20
webchickchx checked this over and couldn't find anything untoward security-wise. I agree we want to remove as many blockers as we can prior to the Amsterdam sprints.
Committed and pushed to 8.x. Thanks!
Comment #23
chx CreditAttribution: chx at Migrate Rocks commentedThis needs to be rolled back as it is a huge hack. Either fix file_unmanaged_copy or separate downloads but this solution is not acceptable.
Comment #24
chx CreditAttribution: chx at Migrate Rocks commentedNevermind.