File migration needs to be able to fetch remote files (pull files directly from a Drupal 6 public file directory, say). file_unmanaged_copy() does not work with remote files - right up top it does a file_exists, which does not work with the http wrapper. So, we need to call copy() directly, just like Migrate Classic did.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

mikeryan’s picture

Status: Active » Needs review
FileSize
3.24 KB

Made this change as well as beefing up error handling. Also, the source URLs need to be urlencoded before passing to copy, added an option to turn this on/off.

@benjy, note this is a Migrate framework change, not migrate_drupal.

mikeryan’s picture

Status: Needs review » Fixed

Well, no reason not to incorporate this into ui_poc - so I did.

mikeryan’s picture

Status: Fixed » Needs review

Wrong issue, I did not commit this one.

benjy’s picture

Project: IMP » Drupal core
Version: » 8.x-dev
Component: Code » migration system
Status: Needs review » Postponed

Moving this over to the core queue since it's against the migrate API. I'm also going to postpone it until #2121299: Migrate in Core: Drupal 6 to Drupal 8 goes in so we don't break anything else.

Maybe we should have a test for a remote file as well?

mikeryan’s picture

Status: Postponed » Needs review

#2121299: Migrate in Core: Drupal 6 to Drupal 8 is in, unpostponing.

Any thoughts on testing remote files? Like, where the remote file being tested would be?

Damien Tournoud’s picture

Alternatively, what about fixing file_unmanaged_copy()? Written as it is, it is riddled with race conditions.

It is really bad to check if the file exists and then copy it, because it could have been removed in the meantime. So removing this separate file exist would kill two birds in one stone.

mikeryan’s picture

Sure, and there's also the poor error-handling: #2244513: Move the unmanaged file APIs to the file_system service (file.inc) (opened before I realized I couldn't use file_unmanaged_copy() at all in its current state).

I guess refactoring file_unmanaged_copy() should come under #2229865: [meta] Modernize File/StreamWrapper API (specifically #2050759: Move drupal_chmod and other code in file.inc to a Drupal\Core\File\FileSystem class) - not sure how much momentum that effort has, though.

benjy’s picture

Any thoughts on testing remote files? Like, where the remote file being tested would be?

I was just thinking accessing a URL over HTTP was enough so it could come from the test site?

mikeryan’s picture

Rerolled.

mikeryan’s picture

Status: Needs review » Needs work
Issue tags: +Needs tests

Status: Needs work » Needs review
benjy’s picture

I think this is RTBC for me, just a few minor doc errors:

  1. +++ b/core/modules/migrate/src/Plugin/migrate/destination/EntityFile.php
    @@ -50,14 +52,53 @@ public function import(Row $row, array $old_destination_id_values = array()) {
    +      throw new MigrateException(t('File %source could not be copied to %destination.',
    +        array('%source' => $source, '%destination' => $destination)));
    

    All on one line.

  2. +++ b/core/modules/migrate/src/Plugin/migrate/destination/EntityFile.php
    @@ -50,14 +52,53 @@ public function import(Row $row, array $old_destination_id_values = array()) {
    +   * @param $filename
    +   *
    +   * @return string
    

    Missing doc comments.

ultimike’s picture

FileSize
3.3 KB
1.13 KB

Updated the patch based on @benjy's comments.

-mike

chx’s picture

Maybe. Wouldn't it be better to patch file_unmanaged_copy instead? Add an optional flag to make it work with remote files (which would skip exists)?

sun’s picture

I'm not sure whether file_unmanaged_copy() is the right tool for job in this case, so I can get behind the idea of not using it.

FWIW, file_unmanaged_copy() is full of custom expectations, of which the vast majority applies to user-triggered UI operations on local files only. It's more or less a specialized implementation of copy() by System module, which only exists for System module (and the installer), because it has tons of hard-coded UI behavior mixed in. We need a proper reimplementation of that code, but the topic is a bit larger: #2229865: [meta] Modernize File/StreamWrapper API

  1. +++ b/core/modules/migrate/src/Plugin/migrate/destination/EntityFile.php
    @@ -50,14 +52,54 @@ public function import(Row $row, array $old_destination_id_values = array()) {
    +      $copied = copy($source, $destination);
    

    In light of speaker-deck demos like this, shouldn't there be at least some bare minimal validation that the source file is actually what we think it is?

  2. +++ b/core/modules/migrate/src/Plugin/migrate/destination/EntityFile.php
    @@ -50,14 +52,54 @@ public function import(Row $row, array $old_destination_id_values = array()) {
    +  protected function urlencode($filename) {
    ...
    +      // Actually, we don't want certain characters encoded
    +      $filename = str_replace('%3A', ':', $filename);
    +      $filename = str_replace('%3F', '?', $filename);
    +      $filename = str_replace('%26', '&', $filename);
    

    Not sure why anyone would want to specify source URLs that are not valid URLs?

    At minimum, this needs to use parse_url().

benjy’s picture

In light of speaker-deck demos like this, shouldn't there be at least some bare minimal validation that the source file is actually what we think it is?

What kind of validation are you thinking? When you're running a migration, I feel pretty confident you'd know the source files were "safe"?

Regarding the urlencode, it's copied directly from the D7 module, the relevant issues:

https://www.drupal.org/node/1849350
https://www.drupal.org/node/1780850

Are you suggesting breaking it up with parse_url(), encoding the relevant parts and then putting it back together?

ultimike’s picture

Status: Needs review » Needs work
benjy’s picture

Issue tags: +Needs reroll

@sun could you please reply to #16 so we can get this issue moving.

This issue stops the upgrade path migrating files so it would be good to push forward.

benjy’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
3.29 KB

OK, i've just given this patch a final review and it looks good to me. This patch is crucial for users to start testing migrations as file migrations don't work without it. I think we need to get this in before Amsterdam where there will likely be lots of testing.

If we still want to discuss our options for updates to file_unmanaged_copy() i'd rather that happened in a follow-up.

Patch no longer applied, just a straight re-roll by git, didn't change anything.

webchick’s picture

Status: Reviewed & tested by the community » Fixed

chx checked this over and couldn't find anything untoward security-wise. I agree we want to remove as many blockers as we can prior to the Amsterdam sprints.

Committed and pushed to 8.x. Thanks!

  • webchick committed 3149c4b on 8.0.x
    Issue #2244555 by benjy, ultimike, mikeryan: Use copy() directly instead...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

chx’s picture

Status: Closed (fixed) » Active

This needs to be rolled back as it is a huge hack. Either fix file_unmanaged_copy or separate downloads but this solution is not acceptable.

chx’s picture

Status: Active » Closed (fixed)

Nevermind.