Problem/Motivation
When drupal moves a file it issues a copy() and then an unlink() this causes a very significant amount of I/O. If the source and destination are on the same filesystem and rename() is issued instead then virtually no I/O occurs becuase filesystems can perform this operation by just changing a pointer on disk.
Proposed resolution
Instead of file_unmanaged_move() calling file_unmanaged_copy(), break out the common code into a new function file_unmanaged_prepare() and have file_unmanaged_move() attempt to issue a rename(), falling back to copy() and unlink() if this does not work.
This optimizes drupal_build_css_cache(), drupal_build_js_cache(). It also prevents a race condition with these functions where other processes or servers may start reading the files before they are fully written.
A further advanatage of issuing a rename() is that some remote schemes have implemented this method, e.g. it would be possible for a file to be moved to a different location on Amazon S3 without having to download it and re-upload it to the correct location.
Remaining tasks
Reviewing.
User interface changes
None.
API changes
None.
Original report by bvanmeurs
Hi, I'm using Drupal to host a huge photo archive (one node per file). I automatically create taxonomy tags for new directories, and files within those directories are automatically inserted as a node.
When a tag name is updated, I want to automatically change the directory name by creating a new directory, and move all of the files to the new directory. I do this using the function file_move.
When I update the root tag, containing all files (about 1000) it takes more than a minute to complete. I cachegrinded the request and found out that 99% of the resources were spent in the file_unmanaged_move function.
Researching the function, I found out that it simply copies the file and deletes it. I am working with hughe photo files (10mb each) so that means that about 10 gig of data is copied. That takes a whopping time of course, whereas a move would be very fast. I do not see any reason not to implement an efficient move function here. Just adding a parameter to the file_unmanaged_copy and using rename instead of copy would do the trick for me!
Any chance for a fix?
Comment | File | Size | Author |
---|---|---|---|
#87 | 1377740-75.patch | 14.05 KB | catch |
#85 | 2688389-1.patch | 4.32 KB | catch |
#75 | optimize-file-move.patch | 12.77 KB | jbrown |
Comments
Comment #1
bvanmeurs CreditAttribution: bvanmeurs commentedAnd here's a patch. I wonder if it may be better to create a new directory that is called by both file_unmanaged_copy and file_unmanaged_move as a helper-function. What do you guys think?
Comment #2
bvanmeurs CreditAttribution: bvanmeurs commentedMade a mistake.. please ignore the previous patch
Comment #3
bvanmeurs CreditAttribution: bvanmeurs commentedI re-factored the changes. There is now one prepare function, and both file_unmanaged_copy and file_unmanaged_move call this function. Original APIs are kept; functionality is unchanged but has become just WAY faster for big (multi-gig) files in file_unmanaged_move, which previously didn't really move files but copied and deleted them.
Tested using the File SimpleTest and ran successfully.
Comment #4
jbrown CreditAttribution: jbrown commentedThanks for this!
I've been meaning to write this patch for years. I'm sometimes hack core to make this change. It prevents servers from falling over when massive files are being uploaded.
The operating system can move a multi-gigabyte file to a new location with just a tiny amount of I/O. Copying and then deleting the original file produced millions of times the amount of I/O.
Great approach!
D7 patches are written first for D8 and then backported once committed.
See #1240256: file_unmanaged_copy() Fails with Drupal 7.7+ and safe_mode or open_basedir.
I think file_unmanaged_prepare_copy_or_move() should be called file_unmanaged_prepare_destination().
Your patch introduces a couple of blank lines with whitespace in them - these need to be blank.
I am wondering if there are any compatibility issues with renaming a file across schemes?
Comment #5
bvanmeurs CreditAttribution: bvanmeurs commentedHi jbrown,
Before I made the changes I had a look at the previous approach, which worked by calling file_unmanaged_copy and file_unmanaged_delete serially. In my patch, file_unmanaged_move really does exactly the same as file_unmanaged_copy does now, with the exception that the copy command is replaced by a rename command. file_unmanaged_delete simply checks if the paths are correct (but this already happened in the file_unmanaged_prepare_copy_or_move function) and then calls drupal_unlink.
Drupal_unlink in turn chmods the file to be (re)moved, and then uses the basic 'unlink' php command. Note that a stream context may be supplied, but that isn't used by file_unmanaged_move. So I moved the chmod-part to the file_unmanaged_move function and re-patched it. I also changed the function name to your suggested name. I agree it describes its use better.
Patch is tested again.
Are you in the position to commit this to the core?
Regards,
Bas
Comment #6
bvanmeurs CreditAttribution: bvanmeurs commentedBtw, the patch is for Drupal 7.x-dev (but also works for Drupal 7.10)! Don't know if it also works for Drupal 8.x-dev.
Comment #7
bvanmeurs CreditAttribution: bvanmeurs commented.
Comment #8
drewish CreditAttribution: drewish commentedI'm not sure where this block came from:
Could you explain it?
Comment #9
jbrown CreditAttribution: jbrown commentedYeah - don't know about that hunk.
Attached is work in progress.
Comment #10
bvanmeurs CreditAttribution: bvanmeurs commentedThat's part of drupal_unlink and checks the source for validity. Also, it makes sure that the source cán be moved. Notice that this piece of code was also executed in the current file_unmanaged_move so, not willing to break things, I have included it in this case too.
Why the source stream wrapper must be checked in this case is a question for me too. Especially because it is not checked in file_unmanaged_copy. But, as said, I've included it because it is part of drupal_unlink which we are bypassing because of using the rename function.
Comment #11
jbrown CreditAttribution: jbrown commentedI put the Windows fix back in and it now falls back to the old procedure for remote schemes where rename() doesn't work.
I made file_unmanaged_copy() and file_unmanaged_move() docblocks consistent.
@bvanmeurs only dries, webchick and catch have commit access.
Comment #12
bvanmeurs CreditAttribution: bvanmeurs commentedThanks jbrown! Your patch is looking good.
Comment #13
jbrown CreditAttribution: jbrown commentedNo need to use drupal_unlink() - we are already handling the Windows problem.
Comment #14
jbrown CreditAttribution: jbrown commentedFurther cleanup.
Comment #15
jbrown CreditAttribution: jbrown commented#14: optimize-file-move.patch queued for re-testing.
Comment #16
Jorrit CreditAttribution: Jorrit commentedI agree with this patch entirely. Since Drupal 7 introduced stream wrappers, file_move can also be used for remote locations like S3. When moving between two S3 locations an efficient S3-side copy & delete operation can be used, which takes just one round trip to the S3 server. However when copy is used, a stream copy takes place which transfers the entire file from S3 to the web server and back to S3, which is very inefficient. An example where this is used is in the File (Field) Paths module.
I hope this patch is quickly committed to Drupal 7 and Drupal 8 so this inefficiency is removed.
Comment #17
JordanMagnuson CreditAttribution: JordanMagnuson commentedI'm also having severe issues with S3 because of this.
Comment #18
marcingy CreditAttribution: marcingy commented#14: optimize-file-move.patch queued for re-testing.
Comment #20
Jorrit CreditAttribution: Jorrit commentedRerolled patch #14
Comment #21
chx CreditAttribution: chx commentedThanks for fixing this issue! It indeed has been bothering us for a very long time.
The original code in
drupal_unlink
read:why did this change?
On the other hand,
$filepath = @copy($source, $destination);
this needs to change. Originally, filepath was the return of file_unmanaged_copy which does return a filepath but copy returns a boolean. I suggest$success
.Comment #22
Jorrit CreditAttribution: Jorrit commentedMy guess is that $scheme is not needed separately in this new implementation. Also, I would also put the == WIN check before the scheme check, because string comparisons are faster than the valid scheme check.
Comment #23
jbrown CreditAttribution: jbrown commentedRe-roll of my patch from #14.
The modifications to the windows fix - it makes more sense and is slightly faster to check first if we are indeed on windows. file_stream_wrapper_valid_scheme() can do what it is supposed to do - we don't need to check the scheme we pass to it.
See also #1395524: drupal_move_uploaded_file() produces twice as much I/O than necessary
Comment #24
jbrown CreditAttribution: jbrown commentedSimplified the code further - it makes sense to always attempt to resolve URIs to filepaths before issuing a copy() or a rename().
If the URIs can be resolved (i.e. they are local), then it works with safe_mode and open_basedir (#1240256: file_unmanaged_copy() Fails with Drupal 7.7+ and safe_mode or open_basedir). It also means rename() can work (and be fast) across schemes.
If URIs do not resolve (i.e. they are remote) then rename() can move files around on remote schemes fast (without having to download them).
If rename() fails (the schemes are different and not local) then the slower copy() then unlink() procedure is invoked.
Comment #25
jbrown CreditAttribution: jbrown commentedImproved comments.
Comment #26
jbrown CreditAttribution: jbrown commented#25: optimize-file-move.patch queued for re-testing.
Comment #27
Dave ReidComment #28
jbrown CreditAttribution: jbrown commented#25: optimize-file-move.patch queued for re-testing.
Comment #29
jbrown CreditAttribution: jbrown commentedComment #31
jbrown CreditAttribution: jbrown commented#29: optimize-file-move.patch queued for re-testing.
Comment #32
jbrown CreditAttribution: jbrown commented#29: optimize-file-move.patch queued for re-testing.
Comment #34
jbrown CreditAttribution: jbrown commentedComment #35
jbrown CreditAttribution: jbrown commented#34: optimize-file-move.patch queued for re-testing.
Comment #37
jbrown CreditAttribution: jbrown commented#34: optimize-file-move.patch queued for re-testing.
Comment #38
jbrown CreditAttribution: jbrown commented#34: optimize-file-move.patch queued for re-testing.
Comment #39
jbrown CreditAttribution: jbrown commented#34: optimize-file-move.patch queued for re-testing.
Comment #40
mikeytown2 CreditAttribution: mikeytown2 commentedGoing to go ahead and mark #818818: Race Condition when using file_save_data FILE_EXISTS_REPLACE as a duplicate of this issue as this issue has 2x as many followers. Note that the linked issue has patches for 6.x, 7.x and 8.x; with all 3 passing tests.
Blog post on this issue as well: http://mikeytown2.drupalgardens.com/content/all-file-operations-future-v...
Comment #41
andypostAny reason to close earlier issue that have patches in favour of this? I think this one should be marked as duplicate!
Comment #42
mikeytown2 CreditAttribution: mikeytown2 commentedOnly reason is this one has 2x as many followers.
Comment #43
jbrown CreditAttribution: jbrown commentedThe patches in this issue and the issue in #40 are quite different.
This issue is for the much broader concept of file_unmanaged_move() issuing rename() instead of copy() & unlink(). There are lots of reasons why this is a good idea. The patch for this is a clean refactoring that breaks out common functionality required by both file_unmanaged_move() and file_unmanaged_copy() into a new function file_unmanaged_prepare().
The patch in the other issue specifically deals with a race condition by issuing a rename() inside file_unmanaged_copy(), but is not a clean solution. file_unmanaged_save_data() already writes to a temporary file and with the patch in this issue it is renamed into place instead of copying.
Comment #44
jbrown CreditAttribution: jbrown commentedRemoved unnecessary introduction of blank lines.
Add Windows fix from #818818: Race Condition when using file_save_data FILE_EXISTS_REPLACE
Comment #44.0
jbrown CreditAttribution: jbrown commentedAdd issue summary.
Comment #45
jbrown CreditAttribution: jbrown commentedComment #46
jbrown CreditAttribution: jbrown commented#44: optimize-file-move.patch queued for re-testing.
Comment #48
jbrown CreditAttribution: jbrown commented#44: optimize-file-move.patch queued for re-testing.
Comment #49
jbrown CreditAttribution: jbrown commented#44: optimize-file-move.patch queued for re-testing.
Comment #50.0
(not verified) CreditAttribution: commentedAdd more information to proposed solution.
Comment #51
mikeytown2 CreditAttribution: mikeytown2 commentedMarking #1762772: Notice: Trying to get property of non-object in image_style_deliver() as a duplicate of this issue.
Comment #52
mikeytown2 CreditAttribution: mikeytown2 commentedComment #53
onco_p53 CreditAttribution: onco_p53 commented44: optimize-file-move.patch queued for re-testing.
Comment #55
anrikun CreditAttribution: anrikun commentedA patch for D7
It provides a fast move by using rename() instead of copy() + unlink() when possible.
This patch is strongly recommended when using File Resumable Upload for large files.
Comment #56
anrikun CreditAttribution: anrikun commentedTest passed for D7.
Reverting to 8.x.
Comment #57
andypostComment #58
rpayanmComment #59
pwieck CreditAttribution: pwieck commentedComment #64
mvcreroll of #58
Comment #65
kristofferwiklund CreditAttribution: kristofferwiklund commentedLooks very good to me.
Comment #67
kristofferwiklund CreditAttribution: kristofferwiklund commentedTesting problem because of this #2489100: Random fail in Drupal\views\Tests\Plugin\RowRenderCacheTest
Comment #69
anrikun CreditAttribution: anrikun commented@kristofferwiklund
Shouldn't this be "Reviewed & tested by the community" again?
Comment #70
kristofferwiklund CreditAttribution: kristofferwiklund commentedIt should. I forgot to update the status after I rerun the test.
Comment #71
hass CreditAttribution: hass commentedI'm not like how you implemented this. Using optional parameters sounds useless to me. We should always try a rename and fallback automatically to copy/delete if rename is not possible. I think this does not require any additional parameters. Hide this from the api sounds best to me.
Comment #74
jbrown CreditAttribution: jbrown as a volunteer commented-
Comment #75
jbrown CreditAttribution: jbrown as a volunteer commentedThis is a reroll of the patch from #44.
Comment #76
jbrown CreditAttribution: jbrown as a volunteer commentedComment #77
slashrsm CreditAttribution: slashrsm commentedComment #78
iwant2fly CreditAttribution: iwant2fly commentedThis works great. I would love to see it committed so I don't have to patch core.
Comment #79
joelpittet@iwant2fly, if you have reviewed and tested it you are welcome to let us know how you did so and set the status. Otherwise this will not get into core. Committers don't look at issues that haven't been vetted by the community and tested to ensure they work.
Comment #80
iwant2fly CreditAttribution: iwant2fly commentedOK I am happy to do that. What kind of details do you need about how I tested this?
Comment #81
joelpittet@iwant2fly considering this has to do with PHP functions and file systems. Any relevant information about your system, folder/file permissions and more importantly the method of how you "tested" to show that this patch fixes the "bug" which is outlined in the issue summary.
Comment #82
joelpittetDoesn't have to be in depth, though since this doesn't have automated tests the environment that needs fixing needs to be tested on.
And change the status to "Reviewed and Tested by the Community":) aka RTBC
Comment #83
quicksketchA note on the latest patch (#75) and the previous version (#44). I think this patch is not justified in the approach of using absolute file system paths any more. Neither safe_mode nor open_basedir are supported in PHP 5.3+, so this probably isn't a concern much longer (D7 still officially support PHP 5.2 though https://www.drupal.org/requirements). Additionally, resolving to local file paths will work only on local file systems, meaning this won't help remote file systems like S3, where the file copying on rename problem is the worst.
However, I think that this set of patches had a good idea separating out the common code that would be shared between the rename and copy functions. The other set of patches (#64 and #58) put a in a hack into
file_unmanaged_copy()
that made it move files instead of copying them. That's not a very clean solution and it dirties the parameters for a function that wasn't intended to do both move and copy operations.So basically, both approaches probably need work.
Comment #84
catchOpened #2688389: file_unmanaged_save_data()/file_unmanaged_move() are not atomic yesterday.
Bumping to major bug report because this is a race condition as well as saving on i/o.
Comment #85
catch@beejeebus posted this patch on the other issue, uploading here.
Comment #87
catchAlso re-uploading #75 which conveniently still applies, don't think it's possible to get a new test run on patches that old.
Comment #88
Fabianx CreditAttribution: Fabianx as a volunteer commentedRTBC - This looks good to me.
Note that in theory this still does not make file_unmanaged_move atomic, because a failing rename call due to the file already existing since it was last checked, is not distinguished from the rename failing because it is on e.g. another incompatible stream wrapper. Potentially a flag could be added to allow to fail the rename?
But it improves it a lot and hopefully the file that is put in place will be the same anyway.
Therefore RTBC.
And this should be backported to D7.
Comment #91
Anonymous (not verified) CreditAttribution: Anonymous commentedcatch credited beejeebus.
Comment #92
Anonymous (not verified) CreditAttribution: Anonymous commentedcatch credited beejeebus.
Comment #93
catchCommitted/pushed to 8.2.x and cherry-picked to 8.1.x. Thanks!
It's not a perfect fix, but gets us a long way with a long-standing and very severe issue that's hard to debug.
Opened the 7.x backport issue as a child of this.
Comment #94
Eric_A CreditAttribution: Eric_A commentedWhat's with the theme.inc changes? Accidental, I guess?
Comment #96
Fabianx CreditAttribution: Fabianx as a volunteer commentedIndeed the theme.inc changes are unintentional.
Comment #99
catchWell spotted.
Committed/pushed a revert of that hunk to 8.2.x, and 8.1.x
Comment #100
Fabianx CreditAttribution: Fabianx as a volunteer commentedComment #102
mfbThis needs to be re-opened or a new issue created for the backport.Found a new issue for Drupal 7 backport at #2752783: [D7] file_unmanaged_move() should issue rename() where possible instead of copy() & unlink()
Comment #103
apadernoComment #104
NWOM CreditAttribution: NWOM commentedCreated a new D7 issue using the patch at #55: #3098526: Edit file_unmanaged_move() should issue rename() where possible instead of copy() & unlink()