A test case would be http://drupal.org/node/1057736. If cloning fails because of this, for example, you end up
- having no "site" node in aegir - so no way to delete the site as it does not exist officially
- having a vhost.d entry on the new remote, which did not get removed
- having all files and the sites/newdomain/ folder on the new server
- having the SSL folder with the wrong cert on the new remote

And you have to delete all of this by hand, as you cannot revert or delete a clone (at least not using the GUI).

Once again, beta2

Comments

EugenMayer’s picture

Side note: the database is removed correctly.

In the clone log, "Changes for drush_http_post_provision_deploy module have been rolled back." is logged, and all those deletes actually show green, but the files and configs are still there.

EugenMayer’s picture

Title: If a clone fails, files and configurations are not removed » when a mirgation fails, newly created DB is not removed
Version: 6.x-0.2-beta1 »
Priority: Normal » Minor

RC1:
Easy to reproduce:
- add a drush_set_error somewhere in platform/migrate.provision.inc
- run migrate

Migrating will fail, unsurprisingly. The odd thing is that the "newly" created and unneeded DB is not removed in the rollback methods.

Still to check: I am pretty sure the "new" site folder is not deleted on the Aegir master either, but I did not verify this right now.

Marking as low priority, as it's not mission critical. You just end up with ghost DBs, but everything else stays functional.

memtkmcc’s picture

Priority: Minor » Major

It is even worse: not only is a zombie database left behind, but also a dbuser with a random password. Any attempt to create or migrate the site again will then fail, because you end up with two grants for the same dbuser with two different passwords. I have seen that many times already.

Changing to major, as it is a pretty serious bug related to an unclean rollback procedure.

anarcat’s picture

Title: when a mirgation fails, newly created DB is not removed » migration rollback not working on remote servers
Priority: Major » Critical
Issue tags: +multiserver

If I understand this correctly, rollback is not working, but only on multi-server: clarifying title.

Note that #1057736: [SSL] Cloning SSL sites will lead to a non SSL site .. but certificate is still copied should be fixed in rc1, so I'd be curious to see why you still have a failure in the migrate. Can you post the full debug log?

EugenMayer’s picture

I have no failure there; I provoke it to reproduce the bug (that's why I add drush_set_error). But there are other reasons why this can happen, and then you need a proper rollback. So #1057736: [SSL] Cloning SSL sites will lead to a non SSL site .. but certificate is still copied works fine.

anarcat’s picture

Note that this may be a regression introduced while fixing #976300: web server migration results in content removal when site is verified...

anarcat’s picture

Priority: Critical » Normal

So. I understand there's an issue here, but I'd like to reproduce it. So please state exactly where you added the drush_set_error() to trigger that bug. File name and line number, from our current HEAD or RC1.

From what I understand, the underlying bug that was triggering this one was fixed, so this is less of a priority now, downgrading priority to normal, unless we can find a case where this bug gets triggered by forces outside of provision that we can't fix another way.

EugenMayer’s picture

Add it after http://git.aegirproject.org/?p=provision.git;a=blob;f=platform/migrate.p...

I am not sure lowering the priority is the right call, but maybe just try to reproduce it and make up your mind about the issue. I think it's still critical.

I am not sure about #976300: web server migration results in content removal when site is verified. But I pretty much know the data flow:
1. pre backs up the database + files; the tgz lands in backup http://git.aegirproject.org/?p=provision.git;a=blob;f=platform/migrate.p...
2. that backup gets deployed in http://git.aegirproject.org/?p=provision.git;a=blob;f=platform/migrate.p...
a) That is basically untarring the file on the Aegir master into platforms
b) syncing files with the remote
c) importing the db http://git.aegirproject.org/?p=provision.git;a=blob;f=db/deploy.provisio...

So 2.c is our issue: db/deploy.migrate.inc has a rollback http://git.aegirproject.org/?p=provision.git;a=blob;f=db/deploy.provisio... but those rollbacks are not called recursively.

So, e.g., if a later step fails, such as the provision call in http://git.aegirproject.org/?p=provision.git;a=blob;f=platform/migrate.p... or the verify in http://git.aegirproject.org/?p=provision.git;a=blob;f=platform/migrate.p... (which often fails), the database is never rolled back, as db/deploy.migrate.inc/rollback is never called.

This is something we really have to think about more deeply, as it comes down to:
If a task calls other tasks which complete, but the global task fails and is rolled back, the subtasks should get rolled back as well - all of them.
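
A rough sketch of that rule, for illustration only. This is a toy model in Python, not Provision's PHP, and every name in it (RollbackJournal, migrate, run_migrate, the "site_dir" and "database" labels) is made up here. The idea is simply that each subtask which completes leaves an undo action behind, and a failure in the global task unwinds all of them in reverse order:

```python
# Toy model only: hypothetical names, not Provision or drush code.

class RollbackJournal:
    """Remembers one undo action for every subtask that completed."""

    def __init__(self):
        self._undo = []

    def record(self, label, undo):
        self._undo.append((label, undo))

    def unwind(self):
        """Run all recorded undo actions, most recent first."""
        undone = []
        while self._undo:
            label, undo = self._undo.pop()
            undo()
            undone.append(label)
        return undone


def migrate(journal, state, fail_in_verify):
    # Subtask 1: deploy the files. It completes, so its undo is recorded.
    state.add("site_dir")
    journal.record("files", lambda: state.discard("site_dir"))

    # Subtask 2: import the database. It also completes.
    state.add("database")
    journal.record("database", lambda: state.discard("database"))

    # Final step of the global task, which may fail after the subtasks succeeded.
    if fail_in_verify:
        raise RuntimeError("verify failed")


def run_migrate(fail_in_verify):
    state = set()
    journal = RollbackJournal()
    try:
        migrate(journal, state, fail_in_verify)
    except RuntimeError:
        # The global task failed: roll back the completed subtasks too.
        return state, journal.unwind()
    return state, []
```

With this model, run_migrate(True) ends with an empty state (no ghost database, no leftover site folder), while run_migrate(False) keeps both and rolls nothing back.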

anarcat’s picture

Status: Active » Postponed (maintainer needs more info)

If a task calls other tasks which complete, but the global task fails and is rolled back, the subtasks should get rolled back as well - all of them.

That is correct.

So I guess in this case, it would simply be making sure that provision-verify provokes a rollback if it fails.

I suspect this is the case right now.

Can you please provide a clear patch of what you have done to trigger the (non) rollback? Just "drush_set_error()" is not enough to raise an error, I believe.

Please provide a task backlog too to help in debugging.

EugenMayer’s picture

Just to give this one more drive and some more info:

If the master process fails after triggering 4 backend forks which all succeeded, and the master process (clone / migrate) decides to roll back, only its own drush rollback hooks (clone_rollback) are called, but NOT those of the backend forks, e.g.:
- deploy rollback (db..)

as they succeeded. So what we actually need is to "pass down the rollback" to all subevents in the call stack tree.
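
To make the difference concrete, here is a small sketch of the two behaviours on a task tree. It is Python and purely hypothetical (Task, rollback_own_hook, rollback_tree and the task names are invented for this example, and the children-first ordering is just one possible choice), not how drush or Provision is implemented:

```python
# Toy model of a task tree: hypothetical names, not the drush/Provision API.

class Task:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def rollback_own_hook(self, log):
        # Behaviour described above: only the failing task's own hook runs.
        log.append(self.name)

    def rollback_tree(self, log):
        # Proposed behaviour: walk the subtasks in reverse start order,
        # recursing into each one, then run this task's own hook.
        for child in reversed(self.children):
            child.rollback_tree(log)
        log.append(self.name)


def build_migrate():
    # migrate started a backup and a deploy; deploy itself started two subtasks.
    deploy = Task("deploy", [Task("files"), Task("db")])
    return Task("migrate", [Task("backup"), deploy])


own_only = []
build_migrate().rollback_own_hook(own_only)

whole_tree = []
build_migrate().rollback_tree(whole_tree)
```

Here own_only contains just "migrate", so the db rollback is never reached, while whole_tree visits "db", "files", "deploy", "backup" and finally "migrate".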

ergonlogic’s picture

Version: » 7.x-3.x-dev
Issue tags: +Smarter remotes

tagging