I've been successfully doing D6 migrations (including files) from my personal website, hosted on Acquia Cloud, for a while. Today I tried a D7 migration from a site on my local Mac, and files with spaces in the filenames all failed with 404s. Adding urlencode to the d7_file destination configuration fixed this, but begged the question "why is this different from D6?". After banging my head for a while, I discovered the difference was not between D6 and D7 but between the webservers - when I attempted a D6 migration using a local copy of my site, filenames with spaces failed for that as well. So, I think the AC servers are configured such that spaces work, but we should generally urlencode by default.
I'll see if I can do a failing test first (although, it might not be reproducible depending on the testbot servers), then submit the tiny little patch to fix this.
Comment | File | Size | Author |
---|---|---|---|
#5 | drupal_file_migrations-2553311-5.patch | 1.21 KB | mikeryan |
Comments
Comment #2
mikeryanThe issue is specific to retrieving files through a web server (http://), and I'm not quite sure how to test that (there don't seem to be any existing tests in the file module using remote files). So, anyway, here's the fix...
Comment #4
mikeryanIt seemed so simple... Turns out that although the file destination supports a urlencode setting, it's not in the schema...
Comment #5
mikeryanWith extra schema goodness.
Comment #6
mikeryanComment #7
safetypinPatch #5 worked for me, no more file access errors.
Comment #8
quietone CreditAttribution: quietone commentedThe patch is fine (always good to look at bits of migrate code new to me), certainly straightforward. I just wish I could reproduce the problem. I tried locally, just with D6, but couldn't get a failure. But it clearly fixes a problem for two people.
+1 RTBC
Comment #9
webchickI can reproduce the problem with webchick.net's DB, which has some files like
http://webchick.net/files/Screen Shot 2015-06-05 at 12.46.08 PM.png
. Happy to test this patch out on my next run. However, we need tests here.Comment #10
phenaproximaIt may be slightly risky, but I have an idea for how to test this: create a stream wrapper which will simulate the web server, returning the responses needed in order to verify and fix the problem. You could even override PHP's native http wrapper, but I don't know what kind of effect that would have on other tests.
Comment #11
BerdirHm, that is tricky. Are you sure this is a migration bug? Local files are *not* url encoded, doing that would actually be wrong IMHO. file.uri isn't really a standards-conform URI.. in most cases, it is a local path "masked" as a URI. And there, spaces are perfectly valid and should *not* be escaped.
See #2278073: Files with spaces in URIs fail entity TypedData validation and #2412509: Change LinkItem.uri to type 'uri' rather than 'string' and introduce user-path: scheme, which made it possible to have spaces in the file uri (or any uri, I guess).
About tests, you probably want to save a file twice and have it in the database twice.. once as a local path with spaces and once as a http thing. Maybe with additional special characters and not just space. You can get the http URL with getAbsoluteUrl() or so in a test. Not sure how to access that in a database dump file but you can probably inline it in a test. And both cases will need to work. I think that a url encode will break the local path.
Comment #12
mikeryanThe OP may have been unclear in using the word "local" - in this case, that's my local web server, files are being fetched from HTTP either way. Specifically, using the migrate-upgrade command, passing --legacy-root=http://mikeryan.name successfully copies the files, while --legacy-root=http://mryan1.dev.local:8083/ fails.
The mystery in the difference is that, with no URL encoding, filenames with spaces should fail either way. I spent some time in the debugger trying to understand how copy($filename_with_spaces, $destination) succeeds with mikeryan.name when that filename is not urlencoded - strangely, when I just hit continue after a breakpoint, it succeeds just like when run normally, but if I step through line-by-line it fails. WTFF!?
And, the copy command (copy-and-pasting exactly what the debugger showed the copy() parameters to be) fails as expected:
Anyway, explaining how copying filenames with spaces works in particular circumstances is a diversion from the main point - it shouldn't work, and usually doesn't work, so we need to add the urlencode.
As for writing tests, getAbsoluteUrl() is a WebTestBase thing and no good here, the whole migration test framework extends KernelTestBase. Which means we really can't test anything involving http://, can we? Do we need to build a parallel framework for this off of WebTestBase? Ugghh...
Comment #14
mikeryanI just don't see how testing this is possible - the scenario only happens when copying files over HTTP (so where does the test put the test files to be copied), and only seems to happen in certain server configurations. The fix is extremely straightforward.
Comment #15
quietone CreditAttribution: quietone commentedIt seems to me that this is a nice win for migraters and not likely to cause problems. With that in mind Is it OK to RTBC this and make a follow up issue for adding tests?
Comment #16
mikeryanCertainly OK by me - been waiting for someone to offer!
Comment #17
quietone CreditAttribution: quietone commentedCreated new Issue for writing a test #2613722: Figure out how to test migration of remote (http://) files.
Comment #18
webchickOk, tests at a later time it is.
Committed and pushed to 8.0.x. Thanks!