For a site with an 80MB unpacked SQL dump, cloning takes about 1 hour. This was tested on a 16 Mbit / 5 Mbit connection; the platform size is 120MB unpacked.

- deploying the platform takes 1.30 minutes, so that is perfectly fine (120MB uncompressed)
- cloning the site _without_ files takes at least one hour (80MB uncompressed)
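
A rough back-of-the-envelope check of these numbers, assuming 1 MB = 8 Mbit and ignoring protocol overhead (both are simplifications, and `transfer_seconds` is a hypothetical helper, not part of Aegir). It only estimates how long the raw transfer would take at each link speed:

```shell
# Estimated raw transfer time in seconds: size_MB * 8 / link_Mbit
transfer_seconds() {
    # $1: payload size in MB, $2: link speed in Mbit/s
    echo $(( $1 * 8 / $2 ))
}

transfer_seconds 80 5     # 80MB dump over the 5 Mbit link   -> 128
transfer_seconds 80 16    # 80MB dump over the 16 Mbit link  -> 40
transfer_seconds 120 5    # 120MB platform over 5 Mbit       -> 192
transfer_seconds 120 16   # 120MB platform over 16 Mbit      -> 60
```

Under these assumptions the dump itself should move in a couple of minutes at most, so raw bandwidth alone would not account for an hour.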

As the site needs to be imported after the clone, that step takes forever as well.
I am not sure where the issue comes from yet, but:
1. Why don't we compress the DB dump (sites/database.sql) and import it locally on the remote, e.g. mysql -u .. < database.sql? As far as I can see, the dump is currently imported through the external port, or is it? As we need the mysql command on the remote anyway (due to the dummy check), we can import that way. This would speed up the process a lot, I think even by a huge factor.
2. If we add tar + gz to the "remote deps", which is not asking too much, we could speed up files / platform deployment a lot (as we can untar / ungzip remotely). In this case I am not 100% sure, though: most of the files will be images (where compression is useless), but compression is done in the backup anyway. So we could extract the backup on the Aegir master and on the remote, instead of syncing it. I don't think this would harm the process, except:
- If we are doing chmods / chowns on the Aegir master (doing a chown is bad..), we would have to re-tar before deploying (kind of bad). We could move all chmod / chown processing to the remote, which I think is the best bet. This way we don't run into UID / GID issues anymore and can handle chown / chmod the same way.
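
To make point 1 concrete, here is a minimal sketch of "compress, then import with the local mysql client". The function names are hypothetical, and it assumes the MySQL credentials come from an option file such as ~/.my.cnf on the target host; it is not how Aegir currently does the import:

```shell
# Hypothetical helpers, not Aegir code.

# On the source host: compress the dump before it goes over the wire.
compress_dump() {
    # $1: path to the plain SQL dump, e.g. sites/database.sql
    gzip -c "$1" > "$1.gz"
}

# On the target host: decompress and pipe into the local mysql client,
# so the import does not go through the external port.
import_dump_locally() {
    # $1: compressed dump, $2: database name, $3: database user
    # Assumption: the password is supplied via an option file (~/.my.cnf).
    gunzip -c "$1" | mysql -u "$3" "$2"
}
```

SQL dumps are plain text and usually compress well, so the transferred file should be a good deal smaller than the 80MB unpacked dump.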

In my understanding, extracting "data" like the database or files on the Aegir master is a waste of time and space, as it is not needed. AFAIK the database is never imported for the bootstrap on the master, and neither are the files used. So this is the whole process I am thinking of:
Platform:
- rsync it the normal way from the master to the remote
- never hold "data" or extract it on the master, only settings if needed for a verify bootstrap
Site:
- make a backup of the "data" (files, database) as tgz. Make this on the remote (or on the master, if the site is local).
- cp that backup to the Aegir master, move it to the remote and untar it there
- import the database on the remote using mysql locally
- chmod / chown files remotely
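
The site steps above could be sketched roughly like this. The function names, the directory layout (a site directory containing `files` and `database.sql`) and the permission mode are assumptions for illustration only:

```shell
# Hypothetical helpers, not Aegir code.

# On the host where the site lives: pack files and dump into one tgz.
pack_site_data() {
    # $1: site directory, $2: output archive
    tar -czf "$2" -C "$1" files database.sql
}

# On the target host: unpack, then fix permissions there, so UID / GID
# handling never depends on the master.
unpack_site_data() {
    # $1: archive, $2: target site directory
    mkdir -p "$2"
    tar -xzf "$1" -C "$2"
    # Assumed mode: owner read/write, group read, directories traversable.
    chmod -R u+rwX,g+rX "$2/files"
}
```

The master only ever passes the archive through unopened; the database import would then run on the target against the unpacked database.sql.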

I think this is valid even for the spoke model. As long as we tar and extract exactly the same file, it is just a "transport" implementation. "Compressing" during rsync is pretty bad, as it needs to be done several times (remote -> master -> remote), which makes it even more useless.

Part of this issue is a bug, part is more of a feature request. I don't think it is desirable for a clone of a 200MB volume to take over 2 hours altogether.

Comments

Anonymous’s picture

Some good ideas here. I am wondering whether issue #998484: Can we speed up site install / migration / backup / restore for remote servers that are geographically distant? is basically about the same thing (DB bottleneck when verifying, migrating, cloning remote sites). Happy to close that older one and talk about this here. What do you think?

EugenMayer’s picture

Well:

- Backup: is totally unrelated, as it is tarred remotely (-> we already have tgz as a dep, so the approach above would not even add one). AFAIK even the dump is created remotely and not through the external port (that needs to be verified).
- Verify: I think this may need an extra issue, as it most probably comes down to node_access_rebuild, which should never happen on verify at all (it should only happen on clone/migrate due to possibly new / gone modules). It is not related to the transport issues themselves, but rather to "what happens".
- Clone/Migrate: Yes, I think that's exactly the same issue.

mig5, what is the scope for this? 0.5, 1.0, (0.4)?

EugenMayer’s picture

Correcting myself: we don't tar on the remote, we only tar locally (so tar is not a dep for a remote yet)... see http://drupal.org/node/1079274

EugenMayer’s picture

One step on the road: http://drupal.org/node/1083386 .. removing "clear" for remotes speeds things up 30x

anarcat’s picture

Title: Cloning takes horrible long » optimize site cloning
Version: » 6.x-1.0-rc2
Category: bug » feature
Issue tags: +optimization

(this was actually filed against beta2, but i have updated the version to match more closely)

this should be better now that we don't rebuild the node access table. turning into a feature request as there are other optimizations possible here, obviously.

Steven Jones’s picture

Version: 6.x-1.0-rc2 » 6.x-2.x-dev

I think that we should be able to make certain optimisations here. If, say, the source and target are both on the same server, we should be able to do local copies rather than a backup and restore. This would be a less intense version of #998484: Can we speed up site install / migration / backup / restore for remote servers that are geographically distant?, which would involve running Drush commands on remote machines. It maybe would only work on the master server at first, so we could make some nice assumptions (once we can run Drush on a remote machine, those assumptions become valid there too).
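
A minimal sketch of that decision, assuming servers can be compared by a simple name. The function and the server names are hypothetical, and this is not how Aegir identifies servers:

```shell
# Hypothetical sketch, not Aegir code.
clone_site_files() {
    # $1: source server, $2: target server, $3: source dir, $4: target dir
    if [ "$1" = "$2" ]; then
        # Same server: plain local copy, no backup / restore round trip.
        mkdir -p "$4" && cp -a "$3/." "$4/" && echo "local-copy"
    else
        # Different servers: fall back to the existing backup and restore path.
        echo "backup-restore"
    fi
}
```

For example, `clone_site_files web1 web1 /path/to/source /path/to/target` would take the local-copy branch, while two different server names would report the backup and restore fallback.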

ergonlogic’s picture

Version: 6.x-2.x-dev » 7.x-3.x-dev

New features need to be implemented in Aegir 3.x, then we can consider back-porting to Aegir 2.x.