Problem/Motivation

When drupal moves a file it issues a copy() and then an unlink() this causes a very significant amount of I/O. If the source and destination are on the same filesystem and rename() is issued instead then virtually no I/O occurs becuase filesystems can perform this operation by just changing a pointer on disk.

Proposed resolution

Instead of file_unmanaged_move() calling file_unmanaged_copy(), break out the common code into a new function file_unmanaged_prepare() and have file_unmanaged_move() attempt to issue a rename(), falling back to copy() and unlink() if this does not work.

This optimizes drupal_build_css_cache(), drupal_build_js_cache(). It also prevents a race condition with these functions where other processes or servers may start reading the files before they are fully written.

A further advanatage of issuing a rename() is that some remote schemes have implemented this method, e.g. it would be possible for a file to be moved to a different location on Amazon S3 without having to download it and re-upload it to the correct location.

Remaining tasks

Reviewing.

User interface changes

None.

API changes

None.

Original report by bvanmeurs

Hi, I'm using Drupal to host a huge photo archive (one node per file). I automatically create taxonomy tags for new directories, and files within those directories are automatically inserted as a node.

When a tag name is updated, I want to automatically change the directory name by creating a new directory, and move all of the files to the new directory. I do this using the function file_move.

When I update the root tag, containing all files (about 1000) it takes more than a minute to complete. I cachegrinded the request and found out that 99% of the resources were spent in the file_unmanaged_move function.

Researching the function, I found out that it simply copies the file and deletes it. I am working with hughe photo files (10mb each) so that means that about 10 gig of data is copied. That takes a whopping time of course, whereas a move would be very fast. I do not see any reason not to implement an efficient move function here. Just adding a parameter to the file_unmanaged_copy and using rename instead of copy would do the trick for me!

Any chance for a fix?

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

bvanmeurs’s picture

And here's a patch. I wonder if it may be better to create a new directory that is called by both file_unmanaged_copy and file_unmanaged_move as a helper-function. What do you guys think?

bvanmeurs’s picture

Made a mistake.. please ignore the previous patch

bvanmeurs’s picture

I re-factored the changes. There is now one prepare function, and both file_unmanaged_copy and file_unmanaged_move call this function. Original APIs are kept; functionality is unchanged but has become just WAY faster for big (multi-gig) files in file_unmanaged_move, which previously didn't really move files but copied and deleted them.

Tested using the File SimpleTest and ran successfully.

jbrown’s picture

Title: file_unmanaged_move slower than it should be » file_unmanaged_move() produces millions of times more I/O than necessary
Version: 7.x-dev » 8.x-dev
Category: feature » bug
Priority: Major » Normal
Status: Needs review » Needs work

Thanks for this!

I've been meaning to write this patch for years. I'm sometimes hack core to make this change. It prevents servers from falling over when massive files are being uploaded.

The operating system can move a multi-gigabyte file to a new location with just a tiny amount of I/O. Copying and then deleting the original file produced millions of times the amount of I/O.

Great approach!

D7 patches are written first for D8 and then backported once committed.

See #1240256: file_unmanaged_copy() Fails with Drupal 7.7+ and safe_mode or open_basedir.

I think file_unmanaged_prepare_copy_or_move() should be called file_unmanaged_prepare_destination().

Your patch introduces a couple of blank lines with whitespace in them - these need to be blank.

I am wondering if there are any compatibility issues with renaming a file across schemes?

bvanmeurs’s picture

Hi jbrown,

Before I made the changes I had a look at the previous approach, which worked by calling file_unmanaged_copy and file_unmanaged_delete serially. In my patch, file_unmanaged_move really does exactly the same as file_unmanaged_copy does now, with the exception that the copy command is replaced by a rename command. file_unmanaged_delete simply checks if the paths are correct (but this already happened in the file_unmanaged_prepare_copy_or_move function) and then calls drupal_unlink.

Drupal_unlink in turn chmods the file to be (re)moved, and then uses the basic 'unlink' php command. Note that a stream context may be supplied, but that isn't used by file_unmanaged_move. So I moved the chmod-part to the file_unmanaged_move function and re-patched it. I also changed the function name to your suggested name. I agree it describes its use better.

Patch is tested again.

Are you in the position to commit this to the core?

Regards,
Bas

bvanmeurs’s picture

Version: 8.x-dev » 7.x-dev

Btw, the patch is for Drupal 7.x-dev (but also works for Drupal 7.10)! Don't know if it also works for Drupal 8.x-dev.

bvanmeurs’s picture

Version: 7.x-dev » 8.x-dev

.

drewish’s picture

I'm not sure where this block came from:

+  //Make sure that Drupal can move the file
+  $scheme = file_uri_scheme($source);
+  if ((!$scheme || !file_stream_wrapper_valid_scheme($scheme)) && (substr(PHP_OS, 0, 3) == 'WIN')) {
+    chmod($source, 0600);
+  }

Could you explain it?

jbrown’s picture

Status: Needs work » Needs review
FileSize
11.84 KB

Yeah - don't know about that hunk.

Attached is work in progress.

bvanmeurs’s picture

Status: Needs review » Needs work

That's part of drupal_unlink and checks the source for validity. Also, it makes sure that the source cán be moved. Notice that this piece of code was also executed in the current file_unmanaged_move so, not willing to break things, I have included it in this case too.

Why the source stream wrapper must be checked in this case is a question for me too. Especially because it is not checked in file_unmanaged_copy. But, as said, I've included it because it is part of drupal_unlink which we are bypassing because of using the rename function.

jbrown’s picture

Status: Needs work » Needs review
FileSize
12.47 KB

I put the Windows fix back in and it now falls back to the old procedure for remote schemes where rename() doesn't work.

I made file_unmanaged_copy() and file_unmanaged_move() docblocks consistent.

@bvanmeurs only dries, webchick and catch have commit access.

bvanmeurs’s picture

Thanks jbrown! Your patch is looking good.

jbrown’s picture

FileSize
12.57 KB

No need to use drupal_unlink() - we are already handling the Windows problem.

jbrown’s picture

FileSize
12.43 KB

Further cleanup.

jbrown’s picture

#14: optimize-file-move.patch queued for re-testing.

Jorrit’s picture

I agree with this patch entirely. Since Drupal 7 introduced stream wrappers, file_move can also be used for remote locations like S3. When moving between two S3 locations an efficient S3-side copy & delete operation can be used, which takes just one round trip to the S3 server. However when copy is used, a stream copy takes place which transfers the entire file from S3 to the web server and back to S3, which is very inefficient. An example where this is used is in the File (Field) Paths module.

I hope this patch is quickly committed to Drupal 7 and Drupal 8 so this inefficiency is removed.

JordanMagnuson’s picture

I'm also having severe issues with S3 because of this.

marcingy’s picture

#14: optimize-file-move.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, optimize-file-move.patch, failed testing.

Jorrit’s picture

Status: Needs work » Needs review
FileSize
12.81 KB

Rerolled patch #14

chx’s picture

Status: Needs review » Needs work

Thanks for fixing this issue! It indeed has been bothering us for a very long time.

  if (substr(PHP_OS, 0, 3) == 'WIN' && !file_stream_wrapper_valid_scheme(file_uri_scheme($source))) {
    chmod($source, 0600);

The original code in drupal_unlink read:

  $scheme = file_uri_scheme($uri);
  if ((!$scheme || !file_stream_wrapper_valid_scheme($scheme)) && (substr(PHP_OS, 0, 3) == 'WIN')) {
    chmod($uri, 0600);
  }

why did this change?

On the other hand, $filepath = @copy($source, $destination); this needs to change. Originally, filepath was the return of file_unmanaged_copy which does return a filepath but copy returns a boolean. I suggest $success.

Jorrit’s picture

My guess is that $scheme is not needed separately in this new implementation. Also, I would also put the == WIN check before the scheme check, because string comparisons are faster than the valid scheme check.

jbrown’s picture

Title: file_unmanaged_move() produces millions of times more I/O than necessary » file_unmanaged_move() should issue a rename() where possible instead of copy() followed by unlink()
Status: Needs work » Needs review
FileSize
12.65 KB

Re-roll of my patch from #14.

The modifications to the windows fix - it makes more sense and is slightly faster to check first if we are indeed on windows. file_stream_wrapper_valid_scheme() can do what it is supposed to do - we don't need to check the scheme we pass to it.

See also #1395524: drupal_move_uploaded_file() produces twice as much I/O than necessary

jbrown’s picture

FileSize
12.38 KB

Simplified the code further - it makes sense to always attempt to resolve URIs to filepaths before issuing a copy() or a rename().

If the URIs can be resolved (i.e. they are local), then it works with safe_mode and open_basedir (#1240256: file_unmanaged_copy() Fails with Drupal 7.7+ and safe_mode or open_basedir). It also means rename() can work (and be fast) across schemes.

If URIs do not resolve (i.e. they are remote) then rename() can move files around on remote schemes fast (without having to download them).

If rename() fails (the schemes are different and not local) then the slower copy() then unlink() procedure is invoked.

jbrown’s picture

FileSize
12.47 KB

Improved comments.

jbrown’s picture

#25: optimize-file-move.patch queued for re-testing.

Dave Reid’s picture

Issue tags: +sprint, +Media Initiative
jbrown’s picture

#25: optimize-file-move.patch queued for re-testing.

jbrown’s picture

FileSize
12.65 KB

The last submitted patch, optimize-file-move.patch, failed testing.

jbrown’s picture

Status: Needs work » Needs review

#29: optimize-file-move.patch queued for re-testing.

jbrown’s picture

#29: optimize-file-move.patch queued for re-testing.

Status: Needs review » Needs work
Issue tags: +sprint, +Media Initiative

The last submitted patch, optimize-file-move.patch, failed testing.

jbrown’s picture

Status: Needs work » Needs review
FileSize
12.65 KB
jbrown’s picture

Issue tags: -sprint, -Media Initiative

#34: optimize-file-move.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, optimize-file-move.patch, failed testing.

jbrown’s picture

Status: Needs work » Needs review

#34: optimize-file-move.patch queued for re-testing.

jbrown’s picture

#34: optimize-file-move.patch queued for re-testing.

jbrown’s picture

Issue tags: +sprint, +Media Initiative

#34: optimize-file-move.patch queued for re-testing.

mikeytown2’s picture

Going to go ahead and mark #818818: Race Condition when using file_save_data FILE_EXISTS_REPLACE as a duplicate of this issue as this issue has 2x as many followers. Note that the linked issue has patches for 6.x, 7.x and 8.x; with all 3 passing tests.

Blog post on this issue as well: http://mikeytown2.drupalgardens.com/content/all-file-operations-future-v...

andypost’s picture

Any reason to close earlier issue that have patches in favour of this? I think this one should be marked as duplicate!

mikeytown2’s picture

Only reason is this one has 2x as many followers.

jbrown’s picture

The patches in this issue and the issue in #40 are quite different.

This issue is for the much broader concept of file_unmanaged_move() issuing rename() instead of copy() & unlink(). There are lots of reasons why this is a good idea. The patch for this is a clean refactoring that breaks out common functionality required by both file_unmanaged_move() and file_unmanaged_copy() into a new function file_unmanaged_prepare().

The patch in the other issue specifically deals with a race condition by issuing a rename() inside file_unmanaged_copy(), but is not a clean solution. file_unmanaged_save_data() already writes to a temporary file and with the patch in this issue it is renamed into place instead of copying.

jbrown’s picture

FileSize
3.48 KB
12.91 KB

Removed unnecessary introduction of blank lines.
Add Windows fix from #818818: Race Condition when using file_save_data FILE_EXISTS_REPLACE

jbrown’s picture

Issue summary: View changes

Add issue summary.

jbrown’s picture

Title: file_unmanaged_move() should issue a rename() where possible instead of copy() followed by unlink() » file_unmanaged_move() should issue rename() where possible instead of copy() & unlink(), optimize drupal_build_[css/js]_cache()
jbrown’s picture

Issue tags: -sprint, -Media Initiative

#44: optimize-file-move.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, optimize-file-move.patch, failed testing.

jbrown’s picture

Status: Needs work » Needs review

#44: optimize-file-move.patch queued for re-testing.

jbrown’s picture

#44: optimize-file-move.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, optimize-file-move.patch, failed testing.

Anonymous’s picture

Issue summary: View changes

Add more information to proposed solution.

mikeytown2’s picture

mikeytown2’s picture

Title: file_unmanaged_move() should issue rename() where possible instead of copy() & unlink(), optimize drupal_build_[css/js]_cache() » file_unmanaged_move() should issue rename() where possible instead of copy() & unlink(), optimize drupal_build_[css/js]_cache() & image toolkit save
Issue summary: View changes
onco_p53’s picture

Status: Needs work » Needs review

44: optimize-file-move.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, 44: optimize-file-move.patch, failed testing.

anrikun’s picture

Version: 8.0.x-dev » 7.x-dev
Status: Needs work » Needs review
FileSize
2.99 KB

A patch for D7

It provides a fast move by using rename() instead of copy() + unlink() when possible.

This patch is strongly recommended when using File Resumable Upload for large files.

anrikun’s picture

Version: 7.x-dev » 8.0.x-dev
Status: Needs review » Needs work

Test passed for D7.
Reverting to 8.x.

andypost’s picture

Issue tags: +Needs reroll
rpayanm’s picture

Status: Needs work » Needs review
FileSize
2.97 KB
pwieck’s picture

Issue tags: -Needs reroll

Status: Needs review » Needs work

The last submitted patch, 58: fast_move-1377740-58-D8.patch, failed testing.

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 58: fast_move-1377740-58-D8.patch, failed testing.

mvc’s picture

Status: Needs work » Needs review
FileSize
2.96 KB

reroll of #58

kristofferwiklund’s picture

Status: Needs review » Reviewed & tested by the community

Looks very good to me.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 64: fast_move-1377740-64-D8.patch, failed testing.

kristofferwiklund’s picture

Status: Needs work » Needs review
anrikun’s picture

@kristofferwiklund
Shouldn't this be "Reviewed & tested by the community" again?

kristofferwiklund’s picture

Status: Needs review » Reviewed & tested by the community

It should. I forgot to update the status after I rerun the test.

hass’s picture

I'm not like how you implemented this. Using optional parameters sounds useless to me. We should always try a rename and fallback automatically to copy/delete if rename is not possible. I think this does not require any additional parameters. Hide this from the api sounds best to me.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 64: fast_move-1377740-64-D8.patch, failed testing.

Status: Needs work » Needs review
jbrown’s picture

Title: file_unmanaged_move() should issue rename() where possible instead of copy() & unlink(), optimize drupal_build_[css/js]_cache() & image toolkit save » file_unmanaged_move() should issue rename() where possible instead of copy() & unlink()
FileSize
12.79 KB
12.77 KB

-

jbrown’s picture

FileSize
12.77 KB

This is a reroll of the patch from #44.

slashrsm’s picture

Issue tags: +D8Media
iwant2fly’s picture

This works great. I would love to see it committed so I don't have to patch core.

joelpittet’s picture

@iwant2fly, if you have reviewed and tested it you are welcome to let us know how you did so and set the status. Otherwise this will not get into core. Committers don't look at issues that haven't been vetted by the community and tested to ensure they work.

iwant2fly’s picture

OK I am happy to do that. What kind of details do you need about how I tested this?

joelpittet’s picture

@iwant2fly considering this has to do with PHP functions and file systems. Any relevant information about your system, folder/file permissions and more importantly the method of how you "tested" to show that this patch fixes the "bug" which is outlined in the issue summary.

joelpittet’s picture

Doesn't have to be in depth, though since this doesn't have automated tests the environment that needs fixing needs to be tested on.

And change the status to "Reviewed and Tested by the Community":) aka RTBC

quicksketch’s picture

Status: Needs review » Needs work

Simplified the code further - it makes sense to always attempt to resolve URIs to filepaths before issuing a copy() or a rename(). If the URIs can be resolved (i.e. they are local), then it works with safe_mode and open_basedir

A note on the latest patch (#75) and the previous version (#44). I think this patch is not justified in the approach of using absolute file system paths any more. Neither safe_mode nor open_basedir are supported in PHP 5.3+, so this probably isn't a concern much longer (D7 still officially support PHP 5.2 though https://www.drupal.org/requirements). Additionally, resolving to local file paths will work only on local file systems, meaning this won't help remote file systems like S3, where the file copying on rename problem is the worst.

However, I think that this set of patches had a good idea separating out the common code that would be shared between the rename and copy functions. The other set of patches (#64 and #58) put a in a hack into file_unmanaged_copy() that made it move files instead of copying them. That's not a very clean solution and it dirties the parameters for a function that wasn't intended to do both move and copy operations.

So basically, both approaches probably need work.

catch’s picture

Opened #2688389: file_unmanaged_save_data()/file_unmanaged_move() are not atomic yesterday.

Bumping to major bug report because this is a race condition as well as saving on i/o.

catch’s picture

Version: 8.0.x-dev » 8.1.x-dev
Status: Needs work » Needs review
FileSize
4.32 KB

@beejeebus posted this patch on the other issue, uploading here.

Status: Needs review » Needs work

The last submitted patch, 85: 2688389-1.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
14.05 KB

Also re-uploading #75 which conveniently still applies, don't think it's possible to get a new test run on patches that old.

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: +Needs D7 backport

RTBC - This looks good to me.

Note that in theory this still does not make file_unmanaged_move atomic, because a failing rename call due to the file already existing since it was last checked, is not distinguished from the rename failing because it is on e.g. another incompatible stream wrapper. Potentially a flag could be added to allow to fail the rename?

But it improves it a lot and hopefully the file that is put in place will be the same anyway.

Therefore RTBC.

And this should be backported to D7.

  • catch committed 26cf55b on 8.2.x
    Issue #1377740 by jbrown, bvanmeurs, catch, anrikun, Jorrit, rpayanm,...

  • catch committed 1cafb0d on 8.1.x
    Issue #1377740 by jbrown, bvanmeurs, catch, anrikun, Jorrit, rpayanm,...
Anonymous’s picture

catch credited beejeebus.

Anonymous’s picture

catch credited beejeebus.

catch’s picture

Status: Reviewed & tested by the community » Fixed

Committed/pushed to 8.2.x and cherry-picked to 8.1.x. Thanks!

It's not a perfect fix, but gets us a long way with a long-standing and very severe issue that's hard to debug.

Opened the 7.x backport issue as a child of this.

Eric_A’s picture

Status: Fixed » Reviewed & tested by the community

What's with the theme.inc changes? Accidental, I guess?

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 87: 1377740-75.patch, failed testing.

Fabianx’s picture

Assigned: Unassigned » catch
Issue tags: +needs revert

Indeed the theme.inc changes are unintentional.

  • catch committed 032d127 on 8.2.x
    Issue #1377740: revert unintentional theme.inc changes.
    

  • catch committed 5a7797e on 8.1.x
    Issue #1377740: revert unintentional theme.inc changes.
    
    (cherry picked...
catch’s picture

Status: Needs work » Fixed

Well spotted.

Committed/pushed a revert of that hunk to 8.2.x, and 8.1.x

Fabianx’s picture

Issue tags: -needs revert

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

mfb’s picture

This needs to be re-opened or a new issue created for the backport.

Found a new issue for Drupal 7 backport at #2752783: [D7] file_unmanaged_move() should issue rename() where possible instead of copy() & unlink()

apaderno’s picture

Issue tags: -Needs D7 backport +Needs backport to D7
NWOM’s picture