Problem/Motivation

Right now we don't use the rsync file copier as the default file copier used by Composer Stager for 2 reasons

  1. We can't test it on DrupalCI
  2. There might be platform variations that would be hard to test

But rsync might perform better than the PHP file copier. It also might be better at handling symlinks if choose to support them in some form.

Proposed resolution

As a first step to determine whether it is worth trying to see if rsync should be the default copier we should test to see if performs better and if so how much better.

How to test

We can only test this locally because Drupalci doesn't have rsync available

Probably the easier to test this to make branch where record times to a file. We just append the time to the same file at various points in the stage life cycle

The file copier is only used for create and apply the stage

  1. In \Drupal\package_manager\StageBase::create before and after $this->beginner->begin($active_dir, $stage_dir, new PathList($event->getExcludedPaths()), NULL, $timeout);
  2. In \Drupal\package_manager\StageBase::apply() before and after $this->committer->commit($stage_dir, $active_dir, $paths_to_exclude, NULL, $timeout);

I would make a utility function on StageBase that records the time to file so it would be like

$this->recordTimeToFile('before begin');
$this->beginner->begin($active_dir, $stage_dir, new PathList($event->getExcludedPaths()), NULL, $timeout);
$this->recordTimeToFile('after begin');

`recordTimeToFile` should make the file name based on the stage id, the current stage class, developers name(hard coded to tell difference between locals) and the copier being used. that way we will have 1 file per life cycle, know which copier was used and which updater. The directory can just be hardcoded.

We should upload the files to this issues. Each person testing should run multiple. A good way to test to just run \Drupal\Tests\automatic_updates\Build\CoreUpdateTest so we know it the same for everyone

The copier can be set in settings.php via the config `package_manager.settings.file_syncer` see \Drupal\package_manager\FileSyncerFactory::create for values

Even though we are not testing on drupalci we should still push the branch so others can test because it might be different systems

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

tedbow created an issue. See original summary.

effulgentsia’s picture

Thank you for this issue! It would be good to test this with scenarios that include a large number of files that need to get copied. For example:

  • Updating a major core version (e.g., 9.4 to 10.0-alpha).
  • Installing (e.g., via Project Browser) a contrib module that brings in a lot of (large) dependencies.
tedbow’s picture

Issue tags: +core-post-mvp
tedbow’s picture

Version: 8.x-2.x-dev » 3.0.x-dev

tedbow’s picture

Issue summary: View changes
tedbow’s picture

Assigned: Unassigned » yash.rode
wim leers’s picture

Issue tags: +Performance

🚀

yash.rode made their first commit to this issue’s fork.

yash.rode’s picture

I tried running testApi() for both php and rsync and following are the results I got, rsync is faster than php file copier.

php
php1
php2

rsync
rsync1
rsync2

yash.rode’s picture

Assigned: yash.rode » Unassigned
Status: Active » Needs work

Needs testing from someone else,
Change the file path of file_put_contents() in StageBase.php to match to your local and try running \Drupal\Tests\automatic_updates\Build\CoreUpdateTest::testApi with php file copier and rsync.

wim leers’s picture

Assigned: Unassigned » yash.rode

#10 contains raw data. Can you please post your interpretation/analysis of that data? 🙏 😊

yash.rode’s picture

Assigned: yash.rode » Unassigned
StatusFileSize
new712.5 KB


Left side are the file contents when php file copier is used and right 2 are the results for rsync. The difference between before and after the begin/commit operation has done is noted and rsync takes ~2 sec lesser time but it can be a problem with my machine, as it's bit slower.

wim leers’s picture

Assigned: Unassigned » yash.rode

Thanks, #13 is a start for an analysis, but is nowhere near enough.

Missing fundamental pieces:

  1. Description of the project size: A) is it just Drupal core, or Drupal core plus many contrib modules?, B) how many files in total on disk?, C) how many directories in total on disk?, D) how many bytes stored on disk? (Use find, du etc.)
  2. Extrapolation of how the observed performance difference scales: A) 0 contrib modules, 10 contrib modules, 100 contrib modules, B) arbitrarily create an enormous & deep directory structure that gets copied over, C) arbitrarily put a few very large files in the project such as videos that get copied over
yash.rode’s picture

Assigned: yash.rode » Unassigned

As discussed in yesterday's call, I am running \Drupal\Tests\automatic_updates\Build\CoreUpdateTest::testApi and we want to test this on other machine as we found in past that my machine is slower compared to others. So the first step should be running the same test on @omkar.podey and @kunal.sachdev's machine so that we can confirm that we are getting the correct results and as described in the issue summary if we try to run the same test on different machines we can come to a conclusion from the observations from several developers.

kunal.sachdev’s picture

I also tried running testApi() for both php and rsync and following are the results :
Php1
before begin:1683287269
after begin:1683287287
Total time for begin: 18
before apply:1683287332
after apply:1683287360
Total time for apply: 28

Php2
before begin:1683287635
after begin:1683287654
Total time for begin: 19
before apply:1683287699
after apply:1683287728
Total time for apply: 29

Rsync1
before begin:1683286476
after begin:1683286496
Total time for begin: 20
before apply:1683286542
after apply:1683286575
Total time for apply: 33

Rsync2
before begin:1683286850
after begin:1683286868
Total time for begin: 18
before apply:1683286911
after apply:1683286940
Total time for apply: 29

wim leers’s picture

Great call, @yash.rode, to rule out that it's a machine/environment-specific observation that you were seeing! 👍

@kunal.sachdev confirmed @yash.rode's surprising discovery (that rsync and php file syncer have similar performance).

Great! 😊

Now it's time to address my review from #14, to build an understanding of when the performance is comparable between the two: in all circumstances, or only some?

omkar.podey’s picture

Assigned: Unassigned » omkar.podey
omkar.podey’s picture

I'm not sure if the tests are actually using rsync as i tried to set

$config['package_manager.settings']['file_syncer'] = 'rsync';

but when i tried to get the setting using

$type = \Drupal::config('package_manager.settings')->get('file_syncer');

i always get php instead of rsync.

wim leers’s picture

GREAT FIND, @omkar.podey! That completely invalidates the test results in #13 and #16

omkar.podey’s picture

Issue summary: View changes
StatusFileSize
new149.05 KB


My results show similar times, and just a 2 sec difference in the apply operation, now expanding the test to copy more files/larger files.

effulgentsia’s picture

Do the begin and apply operations run precondition checks? If so, it's possible that those are a significant portion of the time and are therefore masking the difference between php and rsync for the actual copying part. I think it would be good to run these tests without running the precondition checks (for testing purposes, is it possible to override services in such a way as to remove the precondition checks?).

Also, is this being tested on a Mac? If so, I think it would also be good to test this on Linux, since that's the more common production environment. I think doing that within Docker would be good enough, though another possibility would be to use an account on DigitalOcean or somewhere similar.

wim leers’s picture

If so, it's possible that those are a significant portion of the time and are therefore masking the difference between php and rsync for the actual copying part.

This is a very good point. I've actually noticed that the Status Report page is horrendously slow when Automatic Updates is installed, so I think this is a very likely theory!

If so, I think it would also be good to test this on Linux, since that's the more common production environment.

Another great point. Although … I don't see why performance characteristics of rsync vs the naïve PHP-based alternative would be so wildly different on different operating systems? You're right that this would be better for sure.

I think doing that within Docker would be good enough,

But isn't Docker-on-Mac known for its abysmal file system performance? 😅

omkar.podey’s picture

This should not be closed, i tried to add an early return in the \Drupal\package_manager\dispatch to avoid invoking validators, but the test fails to register times from apply operation so it's not as simple to just add an early return and test the file copiers without the validators.

So we need to figure out how to disable validators with the test passing.

wim leers’s picture

So we need to figure out how to disable validators with the test passing.

One possible solution: new test-only module that is installed for this test that removes the event_subscriber service tag from all services except for the essential ones.