So our migrate/clone tasks suck, performance-wise. We can be frank about it: we all experience it from time to time. Even when processing one task at a time, migrate can kill a server by tarring and clearing caches all over the place.

This issue aims to gather the various issues that were reported in attempts to address this problem:

* stop dealing with files/ - #1205458: Move modules/themes/libraries/files/private directories out of /sites/example.com
* do not flush caches in verify, but create another task for this - this is the infamous #1083386: Dont rebuild any cache / menu / modules on verify, as those things cant change. ( ~30 times faster verify )
* maybe simpler: #1093468: Exclude tables (i.e. cache*) from backups (w/workaround)
* make rename O(1) - #956998: rename (and perhaps clone) should not invoke provision-deploy (or avoid invoking drush updatedb)

Those may also be affected by this, although I find them fairly confusing so I can't say for sure:

* #1083366: Make the spokes authoritative for files/ and private/ directories
* #1077022: optimize site cloning
* #1561102: Allow spokes to change their themes and modules

Comments

jason.fisher

Updated http://drupal.org/node/1093468#comment-6027782 with a workaround to exclude data from selected tables. Many of my SQL dumps are now almost half the size.

anarcat

Version: 7.x-2.x-dev » 6.x-2.x-dev

I have studied the code surrounding this issue extensively during my last train retreat. Here is my report.

First tests

I have tried the following:

1. Make provision-backup's tar command follow symlinks. This introduces a serious security issue: a malicious site user may create a symlink to another site and steal its database credentials (along with all its files).

2. Dereference symlinks before passing them through tar. This doesn't actually work because the paths are now absolute, and the archive will therefore not extract properly.

Both approaches fail, so optimising migrate will *not* be as simple as choosing whether or not to follow symlinks. Another approach needs to be taken.

First solution: move files out of the directory

First, to resolve the sites directory issue, a proper solution would be a flag in deploy/install that first creates the site in a directory outside the sites dir, then creates a symlink to that directory in the sites dir. A few ideas come to mind, but I think this could simply be the client's directory, which stays stable through upgrades and could be a good candidate for a chroot.

The core of this refactoring needs to happen in the install and deploy hooks, most specifically in _provision_drupal_create_directories(). Verify also uses that function, but maybe just to change permissions.

I think we should split that function between directory creation and permission verification. This would allow the ACL code to be merged in, or at least hooked in better, and would also allow us to avoid making modifications to core for every contrib module.

This first solution is what #1205458: Move modules/themes/libraries/files/private directories out of /sites/example.com is about. It doesn't necessarily give us performance improvements, and it breaks backups, because we cannot reliably follow symlinks. We will therefore also need to adjust the backup command to add those files to the archive, which in turn means doing the backup in two steps (tar, then gzip) to be able to append to the archive, which sucks. An alternative would be to start using "zip" instead of tgz, which supports appending natively.
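To make the two-step backup concrete, here is a minimal sketch, assuming a tar that supports -r (append) and a standard gzip. The function name and the directory layout are hypothetical and not part of provision; it only illustrates why compression has to come last.

```shell
# Hypothetical helper, not provision code: archive a site directory, append
# files that live outside it, and only compress at the very end.
# $1 = site directory (absolute path)
# $2 = external files directory (absolute path)
# $3 = output .tar path (absolute path, outside of $1)
backup_site_two_step() {
    # Step 1: create an uncompressed archive of the site directory.
    tar -cf "$3" -C "$1" . || return 1
    # Step 2: append the external directory under its own base name.
    # Appending (-r) only works on uncompressed archives.
    tar -rf "$3" -C "$(dirname "$2")" "$(basename "$2")" || return 1
    # Step 3: compress the finished archive, producing "$3.gz".
    gzip -f "$3"
}
```

Something like `backup_site_two_step /var/aegir/platforms/p/sites/example.com /var/aegir/clients/c/files /tmp/example.tar` (paths made up for illustration) would leave /tmp/example.tar.gz behind.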

Second solution: optimize migrate

Second, to resolve the performance issue, we need to change the way migrate works significantly. It should stop using `backup` and `deploy` to deploy files, except for migrations between remote servers, and instead use `cp -al` or a similar (pluggable?) backup system.
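As a rough sketch of what the `cp -al` copy could look like, assuming GNU cp and a destination on the same filesystem as the source (hard links cannot cross filesystems). The function name is hypothetical; nothing like it exists in provision today.

```shell
# Hypothetical helper, not provision code: copy a site directory using hard
# links instead of duplicating file contents (GNU cp).
# $1 = source site directory
# $2 = destination directory (must not exist yet, same filesystem as $1)
copy_site_files() {
    if [ -e "$2" ]; then
        echo "destination already exists: $2" >&2
        return 1
    fi
    # -a recurses and preserves modes, timestamps and symlinks (and ownership
    # where permitted); -l creates hard links instead of copying file data.
    cp -al "$1" "$2"
}
```

One caveat with this approach: both trees share the same inodes afterwards, so anything that modifies a file in place (rather than replacing it) changes it in both sites. That is harmless for migrate, where the original is deleted, but would need more thought for clone.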

This mostly happens in platform/migrate.provision.inc for migrate. The process would look something like this, and would be fairly similar in both clone and migrate, the difference being that migrate deletes the original site, as usual.

1. create the second site (_provision_drupal_create_directories())
2. create the second site's credentials (see db/deploy.provision.inc)
3. put the first site offline
4. copy files in place (new function! with cp -al or pluggable?)
5. generate the settings files (_provision_drupal_create_settings_file() and drush_provision_drupal_pre_provision_deploy())
6. copy the database using a mysql pipeline
7. do some verify magic (remainder of deploy?)
8. put new site online
9. put old site back online (clone) or delete it (migrate)
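The "mysql pipeline" in step 6 could be as simple as the sketch below. It assumes the stock mysqldump and mysql clients, credentials supplied through something like ~/.my.cnf, and a destination database that was already created in step 2. The function name is hypothetical.

```shell
# Hypothetical helper, not provision code: copy one database into another
# without writing an intermediate dump file to disk.
# $1 = source database name
# $2 = destination database name (must already exist)
copy_database() {
    # --single-transaction dumps InnoDB tables from a consistent snapshot
    # without holding table locks for the duration of the dump.
    mysqldump --single-transaction "$1" | mysql "$2"
}
```

Note that in a plain sh pipeline the exit status is that of the last command (mysql), so a failing mysqldump can go unnoticed. A real implementation would need to check for that.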

This would have the added advantage that migrate/clone wouldn't depend on provision-backup anymore, and we could therefore start porting that code to drush archive functions.

An alternative to this would be to use the engine system to implement various backup/restore systems.

anarcat

Issue summary: View changes

add another related issue

helmo

Issue summary: View changes
Status: Active » Closed (outdated)

The 6.x-2.x branch will go EOL along with Drupal this week. So I'm closing
this issue. If it remains a confirmed issue in 7.x-3.x, feel free to re-open,
or better yet, create a new issue referencing this one.

PS: The hosting_sync task has some speed improvements related to this.