Problem/Motivation
The EntityFile currently does:
$file = $row->getSourceProperty($this->configuration['source_path_property']);
This is wrong, the destinations should never look at the source directly. Its also a complicated and opinionated destination, if you already have the files in the right place for example, its quite tricky to use. Also, it is the only destination plugin which contains a preg_replace
clearly showing it's more process than destination.
Proposed resolution
- Create a FileCopy process plugin to handle most of the move/copy logic in the destination.
- Remove the urlencode feature, this can be done in the source if you need it.
- Remove destination_path_property from EntityFile, it doesn't need to be configurable, if you need that, write a new destination.
- Remove source_path_property from EntityFile, no longer needed.
- Remove source_base_path from EntityFile, no longer needed.
- Fix file_umanaged_copy() to support remote file, new issue?
We can now use the process plugin like so:
source_full_path:
plugin: concat
delimiter: /
source:
- constants/source_base_path
- source_relative_path
uri:
plugin: file_copy
source: @source_full_path
Remaining tasks
Write patch
User interface changes
n/a
API changes
Yes, if you were using the destination in a custom migration, you may need to now use the process plugin in your migration yml.
Data model changes
n/a
Comment | File | Size | Author |
---|---|---|---|
#183 | refactor_entityfile_and-2695297-183.patch | 52.38 KB | mikeryan |
#183 | interdiff.txt | 1.67 KB | mikeryan |
#176 | interdiff.txt | 7.29 KB | mikeryan |
#176 | refactor_entityfile_and-2695297-176.patch | 52.26 KB | mikeryan |
#168 | refactor_entityfile_and-2695297-168.patch | 54.29 KB | ohthehugemanatee |
Comments
Comment #2
benjy CreditAttribution: benjy at PreviousNext commentedComment #3
benjy CreditAttribution: benjy at PreviousNext commentedComment #4
heddnFrom the proposed solution, #6 should be punted. File streams in PHP are notoriously hard and if someone needs to do that to s3 or some other remote file system, they are probably going to need to write something custom. So give them the basic building blocks and let them build it themselves.
Comment #5
vasi CreditAttribution: vasi at Evolving Web commentedI'm not sure I like (all of) this idea.
I agree that the destination shouldn't use anything from the source. If the destination needs info, it should be mapped.
I don't really want to lose the destination_path_property, I think that's useful.
More importantly, I think anything that we might want to rollback should be done by the destination, not a process. I don't like the idea that the process could download the file, and then if the destination failed for some reason, we'd have a stray file sitting around, with no way to roll it back.
Comment #6
benjy CreditAttribution: benjy at PreviousNext commentedWell we could keep it although this is Drupal's EntityFile, i'm not sure it's needed. Also, we maybe be able to make it so you can just map to a new property and work it out from there once we refactor into a process plugin.
The destination would delete the file entity which in turn should clean-up the file.
Comment #7
vasi CreditAttribution: vasi at Evolving Web commentedYes, I like the idea of making destination_path_property a property that can be mapped to. That's fine.
That's the "if everything is as expected" situation, so of course it works. But migrate should be robust. For example, a subsequent process could throw a skip row exception—now we just have a junk file sitting around, from a row we thought we skipped.
Comment #8
mikeryanComment #9
mikeryanComment #10
mikeryanWorking on this at the NOLA sprint, and I have a file_copy process plugin working with d6_file thusly:
The EntityFile destination plugin is reduced to getEntity() and processStubRow() implementations. Feeling good about this... Remaining:
I'll post a patch once I've got #1 & 2.
Comment #11
mikeryanOK, here it is, we'll let the testbot show us where tests need to be updated.
Next steps:
All good stuff for DrupalCon sprinters to tackle...
Comment #13
benjy CreditAttribution: benjy at PreviousNext commentedI think this is looking pretty good, lets try get it green then i'll do a full review.
Comment #14
quietone CreditAttribution: quietone as a volunteer commentedComment #15
quietone CreditAttribution: quietone as a volunteer commentedMeant to preview, but found save instead.
Here's the interdiff for the above patch. The D7 template wasn't using the source_base_path, so I copied it from the D6 version.
Comment #16
quietone CreditAttribution: quietone as a volunteer commentedDidn't know the test would not start on #14. So, uploading again for testing.
Comment #18
mikeryanWorking on fixing the tests now - picked up a couple of issues I'm also dealing with here.
Remove, urlencode is handled by its own plugin.
Remove from file_copy plugin, is now its own plugin.
Comment #19
mikeryanProgress on the tests - added a unit test for the urlencode plugin, and transmogrified the EntityFile destination plugin test (which was only testing file-copying functionality) into a CopyFile process plugin test. I think there's more downstream to fix, but let's let the testbot tell us where...
Comment #21
mikeryanThis fixes some of the test failures - I'm not currently seeing what's triggering the others.
Comment #23
vasi CreditAttribution: vasi at Evolving Web commentedThe d7\MigrateFileTest failure is because we no longer need to append a slash to source_base_path. Now the concat has a delimiter, so that adds a slash. We potentially end up with THREE slashes, which vfsRoot can't handle.
This shouldn't affect the behavior of what the user types into MigrateUpgradeForm, because that migration won't be using vfsRoot.
Comment #24
vasi CreditAttribution: vasi at Evolving Web commentedComment #25
vasi CreditAttribution: vasi at Evolving Web commentedI made some progress on the remaining failures:
* The file migrations require that source_base_path is set. But d6/MigrateFileTest does two migrations, only the first one sets that. Should be fixed by using a method to setup the migration, and calling it each time it's needed.
* Setting an empty source_base_path doesn't work anymore, because we no longer just append filename, but concat with a slash delimiter. So we get
implode('/', ['', 'foo.jpg']) => '/foo.jpg'
, when we really wanted ./foo.jpg . Fixed by using $this->root as the source_base_path.But the above fix breaks tmp migrations. Because now we're doing
implode('/', ['/path/to/drupal/root', '/tmp/foo'']) => '/path/to/drupal/root/tmp/foo'
. We really want that to stay just '/tmp/foo'.I'm thinking that we can't rely on concat to build paths. Either we need a custom plugin to handle path concatenation (with correct behavior for absolute/relative paths). Or we need some sort of interesting process steps to get this right.
Comment #29
quietone CreditAttribution: quietone as a volunteer commentedThis has a new process plugin for the d6 file source path. This fixed /d6/MigrateFileTest.php locally. That just leaves the /d6/MigrateUserPictureFileTest.php to fix.
Comment #31
mikeryanOK, I see we need to update d6_user_picture_file.yml for the new regime, looking at that...
Comment #32
mikeryanOK, this fixes the user picture file error. One error remaining, around fid 6 in the database fixture.
And the filename and filepath are modified in migrateDumpAlter() to a randomly-generated name. Need a little debugging to see why that temporary file is no longer migrated...
Comment #34
vasi CreditAttribution: vasi at Evolving Web commentedI'm not sure the logic in d6_file_source is correct. Tried to fix it up, and explained what it's doing.
Comment #36
vasi CreditAttribution: vasi at Evolving Web commentedForgot to make sure source_base_path was set on all test migrations that need it.
Comment #37
vasi CreditAttribution: vasi at Evolving Web commentedIt looks like the failing file in the migrate_drupal_ui test is the temp file. I don't really know how that file is supposed to have gotten in /tmp in the first place, how did it ever work before?
Comment #38
quietone CreditAttribution: quietone as a volunteer commentedyea, I had an error in the process plugin. But don't we still need to check for an empty source_base_path? Otherwise we return "/$filepath".
Comment #40
vasi CreditAttribution: vasi at Evolving Web commentedOoh, that's a good point. Ok, I'll update it.
Comment #41
vasi CreditAttribution: vasi at Evolving Web commentedOk, I think this one should work.
Comment #42
quietone CreditAttribution: quietone as a volunteer commentedYay!
I could be wrong, but perhaps because the destination plugin was created with this configuration
And then it set the source as
$source = $this->configuration['source_base_path'] . $file;
So /tmp would not be have anything prepended.
Comment #43
mikeryanmigrate_dump_alter() creates the temp file.
So, this is great - now we need a proper review by someone who didn't contribute to the implementation.
Comment #44
benjifisheruse
statements are still there. It looks to me as though the only ones you need are these:Is the comment on the EntityFile class still needed?
Indentation error
This description seems inadequate. This plugin handles copying the file.
Can we remove the parameter (and its code comment)?
indentation error
I think that "Determines" would be clearer than "Returns".
Insert a blank line after the short description. Fix indentation of the @return comment.
I think that "Determines" would be clearer than "Returns".
That is all for more, but I am still reviewing.
Comment #45
Anonymous (not verified) CreditAttribution: Anonymous commentedThis issue will unblock #2505283: Handle import of private files.
Comment #46
mikeryanThanks #benjifisher, I'll update the patch and respond.
Comment #47
mikeryanRemoved obsolete uses.
The comment really isn't right to begin with, and generally we don't have comments on the destination entity extensions, so removing.
Fixed.
Copypasta, fixed.
Yes we can!
Fixed.
Done.
Done.
Awaiting further feedback before rolling a new patch.
Comment #48
benjifisherJust a few more formatting complaints from CodeSniffer:
Add a space before the opening brace on the
class
line.Add ending punctuation to the code comments. Ditto for the comment on Line 30.
missing parameter type
Comment #49
mikeryanLast couple of comments addressed, here we go.
Comment #50
benjifisherI checked the interdiff in #49, and it resolves all the issues I pointed out. According to the testbot, we did not break anything.
I did not test the functionality, so I can only claim the "R" part of RTBC.
Comment #51
quietone CreditAttribution: quietone as a volunteer commentedJust ran a D6->D8 upgrade via the UI. The D6 site was devel generated with about 210 files (file fields, uploads and images) and the migration was successful.
I did notice that the D8 database is cluttered with 765 migrate_map_'hash' and migrate_message_'hash' tables. I checked a few tables and found that they are empty and all the map tables do not have any destination ids. Repeating the migration without the patch results in the same 'extra' map and message tables being generated. So, it is not related to this patch, but why some many?
Comment #52
quietone CreditAttribution: quietone as a volunteer commentedAnd same results as above for D7->D8.
Comment #53
mikeryan@quietone - map/message tables with the generated hash names are due to #2575135: Dummy map/message tables being created. Although, my D6 site generates 9 of each table, 765 seems a bit... much.
Comment #54
benjy CreditAttribution: benjy at PreviousNext commentedLooking at the logic in the d6_file_source plugin, it is not clear to me why that logic would be D6 specific and for D7 we can just use concat?
I don't think 'uri' should be hardcoded in the destination, this is the only thing stopping someone using this destination with a custom entity that doesn't use 'uri' ?
We return early if it's unchanged, that's before we check if the user wanted to overwrite the file? Looks like it could be pre-existing though.
I know this is pre-existing but the logic is pretty hard to follow, maybe we could simplify it, something like this (not tested):
Comment #55
vasi CreditAttribution: vasi at Evolving Web commentedFor D6, there are some tests that pass an empty source_base_path and a relative filepath. I don't know whether this is just a testing artifact, or something that's likely to actually happen.
Comment #56
heddnI'd like to do a quick review sometime in the next day or two to see if this supports what is outlined in #2706405: File migrations *always* copy in files, even if the files were off-line copied to destination, or if that should come in as a follow-up issue.
Comment #57
heddnAlthough an important thing, I won't mark this as needs work. We can re-add 'preserve_files' back in under #2706405: File migrations *always* copy in files, even if the files were off-line copied to destination after this gets committed. Other feedback below. Marking needs work so we can respond to some of this feedback.
Nit: Proper camel case is UrlEncode
This seems fragile. Is there really no other way around this?
And if we camel case the class under test, then this should too.
Comment #58
mikeryanDamn, you sent me down a rathole here... :P. So, this was introduced based on #25 above, and one aspect mentioned there was handling temp files (which might be something like /tmp/... - i.e., absolute paths which shouldn't have the source_base_path prepended). Looking at how D6 temp files are handled:
Well, that's out of scope and should be a follow-up issue... In the context of the present issue, though, I would say that since migration of temporary files is totally broken now and probably should be removed anyway, this patch should not worry about them.
So, the main thing the d6_file_source plugin is doing is accommodating an empty source_base_path, which really doesn't make sense if you actually want to copy the files - it would only work if the filepath coming in is an absolute local path (as it would be for the temp files) and the source and destination sites are on the same server, which certainly should not be the default assumption. So, I think we can leave off this custom plugin and just concat? Not changing yet in this patch - @vasi, @quietone, am I missing something?
This is the file entity destination, and the file entity's URI value is always in the 'uri' column, right? Are you thinking of some custom file/media entity which extends this entity class, in which case it could override this... Maybe to accommodate that 'uri' would be a class constant the deriving entity could override?
Yes, that's the existing logic. Hmm, right, that would just render the overwriteMode() totally irrelevant - this code would appear to never do anything with the file if it already exists (although heddn suggests in practice it doesn't work that way). Let's call fixing this here out of scope, and when we followup on #2706405: File migrations *always* copy in files, even if the files were off-line copied to destination we can make sure all three responses to an existing destination file (replace, rename, preserve) work as they should.
Done.
Done.
We need to urlencode the path components, yet preserve the path syntax. Seems like a common-enough thing to want that there should be a standard mechanism, but I don't know what that is. Anyway, changing this logic is not in the scope of this issue.
Done.
Comment #59
vasi CreditAttribution: vasi at Evolving Web commentedI concur that the d6_file_source is extremely weird. I can't think of any normal situation where one would want an empty source_base_path—the tests that do this should be changed. I do think we will run into lots of annoying-to-fix tests, though. Eg: The VFS mock filesystem deals fine with paths containing two consecutive slashes, but NOT three slashes. Ugh.
We should probably document what makes a valid source_base_path and/or filepath. Eg: Can they be empty? Start with slash? End with slash? etc.
Comment #60
mikeryan#2729369: Remove support for migrating temporary files added.
Comment #61
mikeryanThis version of the patch:
If that cheat isn't acceptable, maybe we should do #2729369: Remove support for migrating temporary files first to get rid of the temporary file junk, then this patch will be clean of it.
Comment #63
mikeryanMissed one d6_file_source...
Comment #64
quietone CreditAttribution: quietone as a volunteer commented@mikeryan, thanks for delving in that rabbit hole. I don't think you missed anything. And if temp files don't need to be migrated, then the d6 and d7 migrations are similar and both are easier to understand. Thats a positive for me.
Found one small copy/paste error.
Duplicated line.
Comment #65
mikeryanNit fixed, thanks!
Comment #66
quietone CreditAttribution: quietone as a volunteer commentedTested this yesterday and today by migrating a D6 and a D7 site, both devel generated. It worked as expected. And the D6 site has a temp file and that did not migrate.
Comment #67
phenaproximaAssigning to myself for review.
Comment #68
phenaproximaOverall, I love this approach. It makes so much more sense for plugins to do one thing, and do it well. And this embraces that philosophy beautifully. Bravo!
The patch is pretty good so far, but I did find some things...
IMHO, there should be a sanity check to ensure that $field_definitions[static::URI_COLUMN] actually exists.
Both of these properties need a description in the doc comment.
Can't inherit constructor docs :)
Why are we doing this twice? Shouldn't we try to prepare the directory first, then call $this->writeFile() once?
Can this be made more explicit, just for clarity's sake? (if strpos() > 0)
What's the difference between rawurlencode() and urlencode()? A comment here would be helpful.
str_replace() can take arrays as arguments, so we don't have to call it three times here. Yay, micro-optimization!
Should be @inheritdoc.
Nit: \Drupal::getContainer() can be $this->container. Also, won't the public stream wrapper be automatically registered in a kernel test?
Because this is a PHPUnit-based kernel test, there's no need to loop over the result of $this->localFileDataCopyProvider(). Better to use the data provider functionality provided by PHPUnit -- unless it's hugely slowing the test down, since that *would* reinitialize the test before every set of data.
Ideally, it'd be preferable to use vfsStream for this, since file system mocking is exactly what it's for. But, no urgent need for that if it's a huge pain in the ass.
Same thing here about using the data provider pattern the way it's supposed to be used :)
Let's use the @expectedException annotation for this test, rather than catching and asserting the exception manually.
I think there is an assertFileExists() or similar that we can use here, rather than assertTrue().
Let's use $this->container instead of \Drupal::getContainer().
This looks like a simple utility method. Why is it making an assertion?
This test might benefit from the data provider pattern that PHPUnit implements. Not urgent, though.
What's up with this? It at least needs a comment explaining why it's empty.
Comment #69
mikeryanI don't think there's a need for runtime sanity-checking here - the only reason the field definition would not exist is if someone extended the destination plugin for their own file entity variant and mucked up the URI_COLUMN, which the ensuing errors would quickly inform them of. I don't see anything in core doing this sanity-check (parallel example in EntityUser).
Done.
Done (although you might want to take a look at most of our other process plugins...
You might want to check with the dev that added this:P. I presume that the reason is performance - not wanting to call file_prepare_directory() on every single file being migrated. This way, file_prepare_directory() is only called at most once per distinct directory on the destination side (and the second writeFile() is only called at those times). Added a comment to that effect.
Done.
rawurlencode() follows RFC 3986, unlike urlencode which converts spaces to plus-signs. Added comment.
Done.
Done.
Done.
Haven't tried without it, but I'm following the example of MigrateFileTest here.
Will look at this.
Will look at this.
Will look at this.
Done.
Done.
Done.
The assertion is performed for each test that calls this method.
Enough else to do, skipping.
Comment added.
Comment #70
mikeryanDidn't have much luck with vfsStream, let's at least see how the rest of the changes fare.
Unforunately, the previous patch doesn't apply any more, which makes interdiff unhappy.
Comment #72
mikeryanManaged to get an interdiff, now let's see what broke the tests...
Comment #73
mikeryanSo, the problem there was that the data providers are referencing $this->root, but it appears that data providers are not executed in the context of the test class itself - $this->root is empty, screwing everything up. Presumably this is why the original test was iterating the provider explicitly, so restoring to that.
Interdiff is against the reviewed patch.
Comment #74
mikeryanNote that I've added a change record.
Comment #75
benjy CreditAttribution: benjy at PreviousNext commentedThis looks like it is about ready to me, just a few nitpicks on code style and some extra comments.
Lets drop the else { on the first block so this code is a consistent style
Same here.
Do we know why this is the case?
Extra white space.
Comment #76
Sharique CreditAttribution: Sharique as a volunteer commentedUpdated patch, addressed 1,2 and 4.
Comment #77
benjy CreditAttribution: benjy at PreviousNext commentedI still see one here. I think this patch is RTBC for me, would be good to get a comment on the URL encode stuff though.
Comment #78
Sharique CreditAttribution: Sharique as a volunteer commentedRemoved that also.
Comment #79
benjy CreditAttribution: benjy at PreviousNext commentedRest looks good, some more manual testing would be great but the best way for that is probably to have it committed.
Comment #80
mikeryanThe "certain characters" are part of the URL syntax - ':', '?', '&' - which must remain for the URL to actually work.
Comment #81
mikeryanClarified the comment.
Comment #82
mikeryanOops, ignore the interdiff in the last comment.
Comment #83
mikeryanComment #84
ressa CreditAttribution: ressa as a volunteer commentedI would like to test this, is there an example somewhere? I found this one, but it doesn't seem to work with 8.1...
Comment #85
vasi CreditAttribution: vasi at Evolving Web commentedHi @ressa, that blog post was for 8.0, and wasn't quite related to this. You can look at the new d6_file and d7_file migrations in the patch for an example.
Comment #86
mikeryan@ressa - Also see the related change record.
Comment #87
mikeryanBlocker for #2560795: Source plugins have a hidden dependency on migrate_drupal.
Comment #88
mikeryanA note - this has no BC layer, so should only be committed to 8.2.x.
Comment #90
phenaproximaUm...no it didn't.
Comment #91
alexpottA few things:
Is there any equivalent test - this says it is testing a remote path to local - are we doing that now?
Comment #92
mikeryanLogically, transformations of source data into the proper form for saving should be done in the process pipeline, while the destination plugin should focus on creating the resulting entity (or whatever) in Drupal. It's precisely the point of this patch to render unto the destination what is the destination's.
In IRC you said "we only used to do this change to the source url when writing - now we change the source for everything" - the source itself is not changed, only the value that gets assigned to the destination is affected by applying the urlencode plugin, and no other uses of the source value are affected (getSourceProperty('uri') will not get the urlencoded version).
I'll add a comment describing the use case (it's needed if the URL will be invoked by the migration process, if it's not already encoded).
True - while the only migrations in core that are using it are for file entities, there could be custom file/media-handling use cases that might make use of this, as well as the file_copy plugin. Perhaps they should move to migrate?
It said it was testing a remote path, but it wasn't and the test was redundant so I took it out. Is there a good way to test accessing remote (http://) files? I recall in the context of another issue attempting to make such a test and giving up in frustration.
Now, my own comment... On my current project I had need of that file_copy plugin (see #2635622: Process plugin for importing/creating files by URL) to create and map file entities within the context of an image field mapping in a node migration. When the source file was missing, the MigrateException thrown by the file_copy plugin threw out the entire node, which was not desirable. Indeed, my general feeling is that process plugins should at worst throw MigrateSkipProcessException (throwing out only the field they are populating, not the entire entity they're migrating), and it was an oversight on my part to blindly copy the MigrateException throws (which made sense at the entity destination level). I will replace them with MigrateSkipProcessException.
Comment #93
chx CreditAttribution: chx at Smartsheet commentedThis should be split and remote file access separately supported via guzzle. All this is just fumbling around a known and solved problem. Bonus: we can add proxy support, authentication and everything else needed.
Comment #94
mikeryanNew patch reflecting #92.
@chx: Can you describe a bit more how you would visualize that?
I have to say, trying hard to stabilize the Migrate API for 8.2.x, I'm reluctant to take on any more ambitious refactoring than what we already have in progress...
Comment #95
alexpott@mikeryan
confuses me because the in HEAD it is happening to the source value...
Comment #97
chx CreditAttribution: chx at Smartsheet commentedWell, let's consider we are migrating core here and core does not even support remote storage in the very first place. It supports public:// and private:// and that's what we need to. Indeed the right course would've been to won't fix #2244555: Use copy() directly instead of file_unmanaged_copy() because the very premise of that issue was wrong:
Nope, it doesn't. The migrate module needs to have support for remote but migrate_drupal doesn't. So we should remove urlencode entirely from this patch.
Then in a followup we should introduce the file download plugin wrapping a Guzzle HTTP request.
Comment #98
mikeryanI've fixed a couple of test problems but it looks like my change from MigrateException to MigrateSkipProcessException is wrong when used in file entity migrations as in core (although my contrib use case will need MigrateSkipProcessException). So, I think file_copy needs an option to choose how heavy an exception to throw. Thinking about that...
That would be a regression - right now we support a remote source_base_path for upgrades. Every time I test D6 migration on my site, I pass http://mikeryan.name/. This may not a good idea for sites with a large volume of files, or where the source server is distant network-wise, but is more than adequate for your typical personal blog site, and I don't want to require people to have to copy their files to the destination server by scp or similar means.
Comment #99
chx CreditAttribution: chx at Smartsheet commented> I pass http://mikeryan.name/.
Whatever you pass that to should change the migration to use the download plugin.
Comment #100
mikeryanBacking off on the exception changes - MigrateException is right for the core upgrade use case, I'll open a follow-up for supporting a skip-process option for other use cases.
This needed a reroll for this morning's commits, interdiff couldn't apply last patch. Removed a redundant test (when I moved the copy plugin and its test from file to migrate, the original test also remained under file).
Comment #101
mikeryanHrrm, how did I manage that? Here's the patch...
Comment #102
mikeryanFollow-up #2766369: Add skip-method option to file_copy and download process plugins opened.
Comment #104
mikeryanBotched test namespaces... Plus, there was an odd unrelated failure in there, let's see how we do this time...
Comment #106
mikeryanLooks like a random test failure, requeued.
Comment #107
chx CreditAttribution: chx at Smartsheet commented> That would be a regression - right now we support a remote source_base_path for upgrades.
That only works because of this very problematic urlencode hack. Clean up and re-add this feature in a followup.
Comment #108
phenaproximaI'd like this to say something like "Every migration that references files migrated from a Drupal 6 site should specify this migration as an optional dependency." Otherwise it sounds like *any* migration that touches files in any way needs to be dependent on d6_file.
s/uris/URIs. Should also include something like "See source_full_path configuration in this migration's process pipeline", so that people can look at an example.
Let's do here as in the d6_file migration.
Not a big deal, but why is this a constant? Could it perhaps be a default value in $this->configuration instead?
s/uri/URI
DRUPAL_ROOT is deprecated, we are supposed to be using \Drupal::root() now.
Needs @inheritdoc.
Ideally this should take advantage of PHPUnit's @dataProvider pattern (I think it will result in better error reporting), but that's just a nice-to-have.
s/Url/URL
s/url/URL
Can there be a little more explanation here? What should the input values be, what will be returned, what error conditions are there, etc.
Exception messages are not supposed to end with periods.
Should we maybe use the FILE_MODIFY_PERMISSIONS flag here as well (and adjust the exception message accordingly)?
Let's remove the period from the end of the message.
Maybe we should sanity-check that $value is, in fact, a string.
Why are we doing explode() here? It seems like that might produce unexpected consequences with certain inputs. Can we use parse_url() instead? That gives us the added benefit of being able to error if the input value cannot be parsed as a URL.
$migration should be type hinted.
Why not use the @dataProvider annotation on this method? There's no need to call the data provider directly.
Given the assertion messages later on, I'm a little unclear what $expected is supposed to represent.
We should test what happens with remote URIs.
Use @dataProvider rather than calling the data provider method directly.
Needs @inheritdoc.
Not a big deal, but this method should be using the @dataProvider pattern.
s/uris/URIs
Again, let's adjust this comment so it's clear that only migrations which reference files *imported from a D6 site* depend on this migration.
Comment #109
ohthehugemanatee CreditAttribution: ohthehugemanatee at Amazee Labs commentedminor tweak for compatibility with current 8.2.x . Will follow up to address @phenaproxima's comments, but this way it's clearer
Comment #110
ohthehugemanatee CreditAttribution: ohthehugemanatee at Amazee Labs commentedTried to take care of the easier feedback items.
1-3 : adjusted comments for clarity.
4 : Left it as a constant for now. I'm also interested in the reasoning.
5-7: applied the suggested changes.
8 : used @dataProvider, and reused the comments as value array keys.
9-10: applied the suggested changes.
11: Left it alone for now, but agreed this needs more clarity.
12-15: applied the suggested changes.
16: Switched to parse_url(), but we still need to explode on forward slashes so we're only url encoding path components.
17-18: applied the suggested changes.
19: Given that $expected was always always TRUE, I just removed it entirely and used assertTrue instead.
20: Still needs to be written.
21-25: applied the suggested changes.
STILL TO DO:
Comment #111
phenaproximaThanks for snapping up this patch, @ohthehugemanatee!
The constants need to be bitwise-ORed, not added.
Let's change this to: "Could not create or write to directory '$dir'"
Comment #113
ohthehugemanatee CreditAttribution: ohthehugemanatee at Amazee Labs commented1. Bitwise operator fixed
2. Changed the wording
Also tried to make it pass testing:
* I guess pecl_http is in my environment, but not the test environment. :| Replaced http_build_url with GuzzleHttp\Psr7\Uri::fromParts() .
* removed the explode() logic from URL handling entirely.
* fixed special character handling.
Comment #114
ohthehugemanatee CreditAttribution: ohthehugemanatee at Amazee Labs commented- whoops, missed that exception text change.
- added a remote file test
I had another look at the two "to do" items.
- const vs config: replaced the const with a config item that has a default value. It's pretty rare that someone will need a custom column, but hey let's support it.
- core/modules/migrate/src/Plugin/migrate/process/FileCopy.php "Can there be a little more explanation here?": Looking at all the other process plugins we provide, none of the others provide more explanation than this. If we want to raise the bar for how much info to include in process plugin docblocks, that's another ticket IMO.
If this passes tests, this is ready for review.
Comment #115
ohthehugemanatee CreditAttribution: ohthehugemanatee at Amazee Labs commentedle sigh - caught some other 8.2.x commits in my last patch. Fixed in this version.
Comment #117
ohthehugemanatee CreditAttribution: ohthehugemanatee at Amazee Labs commenteddouble le sigh - attaching a valid patchfile this time.
Comment #119
quietone CreditAttribution: quietone as a volunteer commentedJust want to point out that there is an issue proposing to put the documentation for the process plugins in the codebase #2776179: [meta] Add process plugin documentation to the codebase.
Comment #121
ohthehugemanatee CreditAttribution: ohthehugemanatee at Amazee Labs commented* Fixed the ipmlementation of @dataprovider for testSuccessfulMoves
* moved remote file copy into its own test
I don't know why it's having permission trouble, though. I get a different permission error on my local (Drupal\Tests\migrate\Unit\process\CopyFileTest::testSuccessfulCopies with data set "local to local copy" ('/core/modules/simpletest/file...st.jpg', 'public://file1.jpg') PHPUnit_Framework_Exception: sh: 1: : Permission denied). But when I pull the code out and run it outside of the test context, it works fine. Ideas?
Comment #123
ohthehugemanatee CreditAttribution: ohthehugemanatee at Amazee Labs commentedRefactored CopyFile test to extend FileTestBase, because they have convenient file creation and setup methods. Not sure if this is bad form or not, but they're Doing It Right and I want some of that.
anyway, i'm having trouble with phpunit getting "permission denied" for specific tests on my local setup... even when the tests are empty. So sorry for the spam, testbot, but I need you to run this for me.
Comment #126
mikeryanThe column name business comes from https://www.drupal.org/node/2695297#comment-11192565 point #2. It's a hypothetical and unlikely-in-the-real-world edge case and I don't think it's worth having configuration for it - let's just drop it and hardcode uri.
Comment #128
quietone CreditAttribution: quietone as a volunteer commentedI've been finding these interdiffs hard to read. So, I made myself a few and the most useful is between #109 and #123. Maybe it will help someone else. And the difference between #103 and #108 was only in the migration yml files, removing the 'provider: migrate_drupal' statements.
s/Url/URL
s/Url/URL
s/filesystem/fileSystem. But the test still fails.
Comment #129
mikeryan@quietone: Thanks for the interdiff! That helped me get caught up on the changes since I left...
Having trouble with tests locally (insisting the temporary stream wrapper isn't there, and not finding t() in FileTestBase), so letting the testbot try this iteration.
Comment #130
mikeryanYep, helps to actually upload the files...
Comment #131
mikeryanJust noticed extra indentation here - let's fix this on the next iteration.
Comment #134
mikeryanOK, the error here is the same one I saw locally running CopyFileTest via run-tests.sh:
FileTestBase extends Drupal\KernelTests\KernelTestBase. Note that Drupal\simpletest\KernelTestBase registers the temporary stream wrapper in setUp():
registerStreamWrapper() is part of the simpletest KernelTestBase, but not the Drupal KernelTestBase. From that registerStreamWrapper() implementation, I added
to CopyFileTest::setUp(), but I'm still getting the complaint about the temporary stream wrapper not existing. I don't see what I'm missing, but here's the patch using this...
Digression: StreamWrapperManager::registerWrapper() begins with:
So, you call registerWrapper() with a scheme that's already registered, and it unregisters it? Surprise!
Comment #135
mikeryanOh, never mind, I see that it unregisters so it can reregister with (potentially) new info...
Comment #138
ohthehugemanatee CreditAttribution: ohthehugemanatee at Amazee Labs commentedThanks @mikeryan and @quietone!
The remaining issue is
PHP Fatal error: Call to undefined function Drupal\KernelTests\Core\File\t() in /var/www/html/core/tests/Drupal/KernelTests/Core/File/FileTestBase.php on line 198
The offending line is from FileTestBase::createUri() :
I'm not sure what to do here.
* Why would that line fail here, but pass during the File tests? I don't see other children of FileTestBase doing anything special to make t() available.
* Why is FileTestBase using procedural t() in the first place? AFAIK it should be using the translation service, or StringTranslationTrait... but correcting FileTestBase is out of scope for this task.
* FileTestBase::createUri() is calling assertTrue() wrong anyway. There should only be 2 parameters, not 3... but correcting FileTestBase is out of scope for this task.
* FileTestBase::createUri() is the only example I can find in core of using string translation in a test result message. So I don't think this is required. But correcting FileTestBase is out of scope for this task.
We could re-implement createUri() on CopyFileTest, using the translation service. It would mean we can use the translation service for all our other messages. But that seems unnecessary, and I don't see other core tests using string translation for messages.
Attached a patch that removes string translation from that line in FileTestBase::createUri(). There are a couple of other instances of t() in there, but not in methods we use. Is this the Right Way to do this?
Comment #139
ohthehugemanatee CreditAttribution: ohthehugemanatee at Amazee Labs commentedComment #141
ohthehugemanatee CreditAttribution: ohthehugemanatee at Amazee Labs commentedOK, now we're back to
Unable to find the wrapper "temporary" - did you forget to enable it when you configured PHP?
I'm stumped on this. We register the "temporary" wrapper in setUp. Besides, none of the other children of FileTestBase have this problem, and they don't explicitly register any wrappers.
I feel like there's something fundamental I'm missing here. What's the difference between the run contexts of Drupal\KernelTests\Core\File\UnmanagedCopyTest and Drupal\Tests\migrate\Unit\process\CopyFileTest ?
Comment #142
quietone CreditAttribution: quietone as a volunteer commentedDebugged this and learned that the dataProvider functions were being called before the setup. And without the setup the file system wasn't setup and thus the failures. I talked to mikeryan on IRC about this and we agreed to rework this to remove the dataproviders. I'm sure someone will speak up if that isn't the way to proceed.
I started from the patch in #134 in order to restore the string translation removed in #138. Seems to me that if the translation needs work, it is for another issue.
Comment #144
quietone CreditAttribution: quietone as a volunteer commentedOops. Here is the correctly named interdiff.
Comment #145
quietone CreditAttribution: quietone as a volunteer commentedThe patch in #142 passed, so setting to NR.
Comment #146
ohthehugemanatee CreditAttribution: ohthehugemanatee at Amazee Labs commentedAwesome, good find!
So I'm trying to test with this config
But no file is saved to the node. Will dig into it this afternoon.
Comment #147
mikeryan@ohthehugemanatee: You're not passing a destination path to the file_copy plugin.
Comment #148
mikeryanTry this:
Comment #149
ohthehugemanatee CreditAttribution: ohthehugemanatee at Amazee Labs commentedaw jeez @mikeryan way to chime in with the easy answer. :)
(and thanks)
why back to 8.2.x? This is a >1500 line patch... does this mean it can make it into 8.2?
Comment #150
mikeryanI've been trying hard to get all BC breaks into 8.2.x so we can upgrade the basic framework to beta experimental status and let everyone count on a stable API from 8.2.x forward, I *really* don't have to wait another 6 months for that.
Comment #151
ohthehugemanatee CreditAttribution: ohthehugemanatee at Amazee Labs commentedTested with these configs:
Straightforward file copy: WORKS
File copy with move=TRUE: WORKS
File copy with rename=TRUE: BROKEN
First I created public://drupalicon_overwriteme.png . Then ran this migration.
It created drupalicon_overwriteme_0.png as expected, but the file path returned was still public://druplicon_overwriteme.png . So we need work on FileCopy::writeFile() to have it return the actual destination. Or maybe just call file_destination() to get the foreseen destination in ::transform(). This means we need to review the test, too.
Comment #152
ohthehugemanatee CreditAttribution: ohthehugemanatee at Amazee Labs commentedComment #153
quietone CreditAttribution: quietone as a volunteer commented@ohthehugemanatee, thx for the testing.
This should address the destination file name problem.
Comment #154
ohthehugemanatee CreditAttribution: ohthehugemanatee at Amazee Labs commentedahh @quietone you beat me to the punch! :) I actually did a double take looking at your patch just now, because we even chose the same variable names.
Attaching my version of the patch here, too. It's extremely similar. IMO a tiny bit preferable because of a simpler FileCopy::writeFile and switching from assertTrue(is_file()) to assertFileExists().
changes to FileCopy:
* simplified FileCopy::writeFile() to just choose between file_unmanaged_move and file_unmanaged_copy, returning the actual destination.
* Corresponding changes to ::transform to return the destination we get back from ::writeFile(), instead of the requested destination.
changes to CopyFileTest:
* doImport now returns the destination that the importer returns
* every time we run doImport, I added a test to make sure the returned value is as expected (except for testNonExistentSourceFile)
* replaced manual file existence checks with assertFileExists and assertFileNotExists.
This patch passed all three manual tests above.
Comment #155
mikeryanSo, between the two nearly-identical patches, I prefer the return of the file_unmanaged_move() result in #154, but not:
Unfortunately, file_unmanaged_copy() won't work with remote URLs (we actually do have a reason for copy() here).
Comment #156
ohthehugemanatee CreditAttribution: ohthehugemanatee at Amazee Labs commentedreplaced that file_unmanaged_copy with copy
Comment #157
chx CreditAttribution: chx at Smartsheet commentedYeah. This was done before and it was wrong then as well. Do not skip Drupal core APIs because they do not do what you need from them. That's what contrib does: a fortunate hack that happens to do what you want to do -- until you get badly burned later. I already mentioned remote capabilities do not belong to this patch and you need a followup for a download process plugin utilizing Guzzle.
Alternatively, file a patch to fix the file API.
Comment #158
mikeryanThe use of copy() to support remote files is not being introduced by this patch, it's being moved, and I don't want to expand the scope of this issue - let's fix what this patch is intended to do (moving configuration and processing out of the destination plugin and into their logical places in source and process) and followup with a more robust means of handling remote files.
Comment #159
mikeryanComment #160
chx CreditAttribution: chx at Smartsheet commented> The use of copy() to support remote files is not being introduced by this patch
Nope it was introduced long ago and I didn't catch it then and it was a hack then and it's an excellent time to get rid of it. As I said in #97 core has public and temporary wrappers and while downloading is a good idea it simply doesn't belong here.
Comment #161
mikeryanThe current patch needs someone else to review it.
I still disagree this is the place to change the copy() behavior.
Comment #162
chx CreditAttribution: chx at Smartsheet commentedSure. https://www.drupal.org/node/2244555#comment-11503439 is the place to get rid of it then.
Comment #163
ohthehugemanatee CreditAttribution: ohthehugemanatee at Amazee Labs commentedI don't understand why the status goes to "postponed". We were already using copy() for remote files, and this patch maintains that behavior. It in no way depends on a better implementation of remote file handling.
I agree that we should get rid of copy(), but "change how we handle remote files", "make file_unmanaged_copy support remote files", or "add a process plugin for guzzling remote files" are all way outside the scope of this issue. Whichever approach we take, it's also not a blocker. We have a working patch that addresses this issue as scoped, and doesn't change the existing approach to remote files.
Comment #164
quietone CreditAttribution: quietone as a volunteer commentedSorry, but two small items.
Can this be changed to reflect that writeFile will do a copy or move? s/copy/copy or move/ ?
But sometimes not a boolean, sometimes the destination. What is the standard for this? I don't see anything in https://www.drupal.org/node/1354#return. File.inc just has '@return' without a data type for cases like this but other files in core have multiple return data types separated by an or, i.e. '@return string|bool'. Personally, I prefer the later, showing all possible data types that will be returned.
Comment #165
chx CreditAttribution: chx at Smartsheet commented> We were already using copy() for remote files, and this patch maintains that behavior.
And that was wrong. Adding anything to core is trivial, even if it's wrong and removing it as this issue shows is impossible.
Do whatever. Unfollowed.
Comment #166
chx CreditAttribution: chx at Smartsheet commentedParting shot: if someone has basic auth (for example for allowing the migration of private files from remote) then this hack breaks. The clear solution is , as I repeatedly said is to remove hackish remote capabilities and in a followup add a file download plugin.
Comment #167
ohthehugemanatee CreditAttribution: ohthehugemanatee at Amazee Labs commented@mikeryan did I do this right? #2783075: Add a "download" process plugin, remove remote capability from FileCopy
Comment #168
ohthehugemanatee CreditAttribution: ohthehugemanatee at Amazee Labs commentedUpdated comments per @quietone's suggestions. Needs someone else to test...
Comment #170
quietone CreditAttribution: quietone as a volunteer commentedRetesting. The error was unrelated to this patch.
Comment #171
quietone CreditAttribution: quietone as a volunteer commentedTests passing, back to needs review.
Comment #172
chx CreditAttribution: chx at Smartsheet commentedComment #173
Chi CreditAttribution: Chi commented@ohthehugemanatee This might be your issue too.
https://github.com/sebastianbergmann/environment/pull/13
Comment #174
ohthehugemanatee CreditAttribution: ohthehugemanatee as a volunteer commentedWow, fantastic find @Chi! This is indeed the problem. Nice job figuring that out.
Comment #175
phenaproximaOh, we are getting there!
I'd like to add a sanity check here -- if $destination is empty, can we throw a MigrateException? I ask because file migration is prone to failure because of all the moving parts, and any extra defenses against weird, hard-to-trace failures will be good things.
I'd like to wrap this logic in a protected method so that, if the source_base_path constant is not set for some reason (or otherwise invalid), we can throw an exception. Play defense!
Nit: There should be a blank line between the description and @dataProvider.
Kind of a nitpick, but why is this in its own method? Can't we move all this directly into testUrls()?
s/uri/URI
Nit: The "any" makes this line exceed 80 characters.
Same thing here, with that last period.
This seems a bit too trusting to me. Can we use parse_url() (or a helper method wrapping around it) here to prove that it really is a URL? And if it's not, I think we should maybe throw \InvalidArgumentException rather than return the input value.
In other words, let's just do everything we do in the if block anyway, for all input values :)
TBH, I find this loop a little awkward. I'd prefer it to be straight-up procedural for clarity and simplicity's sake...but I concede that this is more a code style nitpick than anything. So consider this request a "nice to have", rather than "back to the ol' drawing board".
Nit: Need a space after (string).
I'd rather name this prepareMigration() than processMigration(). The word "process" sounds like it's going to do some seriously heavy lifting, and I'm guessing that is not the point of this method.
Missing description.
This belongs in a data provider method, unless there's a compelling reason not to use one...but I don't see any real reason not to...? (It seems OK to me to move the $file preparation logic into the data provider. If I'm not mistaken, it will only run once.)
Ditto for all this.
I think this file might be duplicated in the File module, in this same patch.
Comment #176
mikeryanDone.
It's valid for source_base_path to not be set (when the source paths are absolute).
Done.
Not sure if I did this, or maybe ohthehugemanatee, but I don't see a compelling reason to move it. I think it does help clarify what this chunk of code is doing.
Done.
Done.
It doesn't have to be an URL, it may be a local file path.
Comment #177
phenaproximaOkay, fair enough. Assuming the tests pass, I think this is RTBC as far as I'm concerned. I'd like to get confirmation on that from @benjy, though.
Comment #178
chx CreditAttribution: chx at Smartsheet commentedCan we recognize #2783075: Add a "download" process plugin, remove remote capability from FileCopy is fairly ready and nuke remote in here already?
Comment #179
phenaproximaI am very much in favor of a dedicated plugin for remote file handling, rather than over-engineering FileCopy.
But I'm also pragmatically considering the fact that this issue is a Migrate-critical blocker and #2783075: Add a "download" process plugin, remove remote capability from FileCopy is not. If it's not going to be much trouble to remove the remote file support from this patch (and I don't see why it would be), I say let's do it now and get the other patch in ASAP. If it does turn out to be difficult, let's commit this patch as it is and remove remote support from FileCopy in #2783075: Add a "download" process plugin, remove remote capability from FileCopy.
Comment #180
phenaproximaDiscussed in the weekly Migration hangout with the other maintainers. We've decided to proceed with this patch as-is for now, then move remote file support into the download plugin as a follow-up. This is not to say that remote file support is unimportant (precisely the opposite, in fact), but this is blocking further work, it's Migrate critical, and in light of that it's OK for this not to be architecturally pure in the short term.
Comment #181
chx CreditAttribution: chx at Smartsheet commented> it's OK for this not to be architecturally pure in the short term.
In my opinion it is worth more to be architecturally clean finally (the hack was added two years ago) and lack remote support for the short term. But. That's old term thinking apparently. Welcome to Drupal 8. I have not changed the status of this issue and I will not do more than voice my protest.
Comment #182
alexpottNot used.
I think we should add a sub directory with spaces here. Just to make sure %2f's are replaced with / properly.
Comment #183
mikeryanFinal (Mike crosses his fingers) patch...
Comment #184
phenaproximaRockin'.
Comment #185
alexpottCommitted and pushed 78714b9 to 8.3.x and f3de7e6 to 8.2.x. Thanks!
Comment #187
neclimdulThe friendly neighborhood "this broke the phpunit runner" notice. At least in a bare checkout running the unit test suite. Seems this is a kernel test in the unit test suite which causes problems if you don't have all the requirements to run a kernel test setup.
Comment #188
neclimdulPosted follow up.
Comment #189
mikeryanWhat's the policy on publishing change records? When there's a release?
Speaking of follow-up issues: #2793731: Obsolete destination properties in d6_file/d6_user_picture_file.
Comment #190
benjifisherAccording to https://groups.drupal.org/node/402688,
That is more than 2.5 years old, and I do not see anything official on d.o. :-(
Another thing that may or may not be out of date is my recollection that any issue with the "needs change record" tag should be marked critical.
Comment #191
mikeryanWe have the draft change record - the question was when to publish it. The relevant clause:
So, I guess it should be published now. The open question is who does that (which is vague above) - my assumption was always that the core maintainer who marked the issue fixed would do that. But, maybe it's up to the patch author (or one of the patch authors)?
Comment #192
benjy CreditAttribution: benjy at PreviousNext commentedI think anyone can publish them, I've published the CR just now. https://www.drupal.org/node/2747025