This has been a long-running problem; if you are used to Aegir, you just "deal with it".

Let's be honest: it's a big cause of some of the AegirHate out there. If you can't cleanly delete things from the system it is very frustrating. It just feels broken.

Scenario #1:

  1. Site Delete task is triggered, and deletes database, but fails to update status to Success for some reason.
  2. Since the task failed, User assumes Site Delete didn't work. Hits "Retry".
  3. Second run of Site Delete task fails because "Unable to Drop Database".
  4. Site is permanently stuck in "enabled" mode, even though it's been destroyed. The only recourse is to manually delete the node by visiting node/%/delete or the admin pages.

Solution: In this scenario, the second "Site Delete" task should finish with a warning, because the desired end state (no database) is reached.

Scenario #2:

  1. Platform Delete task is triggered, and deletes files, but fails to update status to success for some reason.
  2. Since the task failed, User assumes Platform Delete didn't work. Hits "Retry".
  3. Second run of Platform Delete task fails because "The directory /var/aegir/platforms/drupal does not contain a valid Drupal installation".
  4. Platform is permanently stuck, in an active state, because you can't delete it or verify it.

This also occurs if one were to manually delete the files from the platform and then run a Delete Platform task.

Solution: In this scenario, the second "Platform Delete" task should finish with a warning, because the desired end state (no files) is reached.
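
As a rough illustration of the behaviour proposed in both scenarios, here is a minimal Python sketch (not Aegir's actual PHP code). The function name and the status strings are hypothetical; the only point is that a retry which finds the target already gone reports a warning instead of failing.

```python
import shutil
from pathlib import Path

# Hypothetical status names; Aegir's real task statuses are not modeled here.
SUCCESS, WARNING = "success", "warning"


def delete_platform_dir(path):
    """Remove a platform directory, treating 'already gone' as a warning."""
    target = Path(path)
    if not target.exists():
        # The desired end state (no files) already holds: warn, do not fail.
        return WARNING, f"{target} was already removed"
    shutil.rmtree(target)
    return SUCCESS, f"{target} removed"
```

Called twice on the same directory, this returns a success status the first time and a warning the second, so a "Retry" no longer leaves the task in a failed state.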

Let's try to find, describe, and fix problems like these.


Comments

Jon Pugh created an issue. See original summary.

ergonlogic’s picture

First off, referring to hatred of the project is simply demotivating, and reminds me of this little diatribe. Let's try to avoid it.

Secondly, neither of the scenarios you describe results in the "desired end state". The reason the delete task failed in the first place presumably still exists. Furthermore, setting the task status to "warning" won't have any effect on the state of the site or platform, so it would still need to be manually cleaned up.

I think a more productive approach would be to detect and address the types of failures that lead to such scenarios. Also, I've begun work to add "Purge" tasks, that would, to the extent possible, safely automate clean up of such error states: http://cgit.drupalcode.org/hosting_dev/tree/modules/purge/README.md.

Jon Pugh’s picture

Apologies if my comments were demotivating; my intent was the exact opposite. My hope is that we can be honest about our project's pain points so that we can address them.

And I understand about handling all of these situations better.

So in that spirit... the site deletion problem is one that can't be worked around without manually deleting the site node, site alias, etc. You can also get into this state by following a "proper" procedure: running drush provision-delete via CLI, then going into hostmaster and hitting the "delete" button.

It also results in a site that appears "enabled" in the front-end, even if it's been totally destroyed in the back.

Here's a bad delete task... It looks like there is a warning, and there is also an error. Seems like at some point Aegir was ok with the warning, then at some point we added the error.

Happy to help address this when I have some more time next week.

Jon Pugh’s picture

At the very least, we should tell the user why it failed to drop the database and offer some kind of recourse.

Jon Pugh’s picture

The bigger problem with Site Delete failing is that it does not delete the sites/DOMAIN/settings.php file. All other files in the sites folder get deleted.

With this file still there, you cannot delete the platform until you SSH in and remove it; otherwise you get the message "There are still sites on this platform".

So, as a first step, we should figure out how to let Site Delete tasks finish as much as they can (including deleting their sites/DOMAIN folder).

This would at least allow you to continue and delete the platform.

Jon Pugh’s picture

Assigned: Unassigned » Jon Pugh

After many years of this, I finally learned that provision-delete command already has a "force" option.

I also understand now why the code triggers a task failure: it is unable to make a backup.

If I add a "force" option to the hosting_task_form 'parameters', it fully deletes a site that failed to install!

We can simply add a variable now: "When deleting sites, fail if a backup cannot be made."

I think we should make this variable TRUE by default.

Jon Pugh’s picture

Title: Failed "Delete" tasks should end in a warning when intended state is reached. » Allow --force option on Delete tasks.
Priority: Normal » Major
Status: Active » Needs review

Updating the title to reflect the solution: if we are able to run with --force, this is no longer an issue.

Solution is a new variable:

Force sites, platforms, and servers to be removed when running the Delete task.
If something goes wrong when deleting a site, platform, or server, the task will fail. Check this box to force contexts to be removed when Delete tasks are run.
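
A minimal sketch of how such a setting could map onto the command line, written in Python for illustration only. The helper name and the force_enabled parameter are hypothetical; --force is the provision-delete option mentioned above, and the real change is implemented in PHP, not like this.

```python
def build_delete_command(site_alias, force_enabled):
    """Assemble a drush provision-delete call, adding --force when enabled.

    Both the function and its parameters are illustrative assumptions;
    they are not part of Aegir's code.
    """
    args = ["drush", f"@{site_alias}", "provision-delete"]
    if force_enabled:
        # Mirrors the checkbox: force removal even if part of the task fails.
        args.append("--force")
    return args
```

With the box unchecked, the command is unchanged, so existing behaviour (fail when something goes wrong) stays the default.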

Screenshot:

screenshot of new setting

  • Jon Pugh committed a555a14 on 2666158-delete-force-setting
    Issue #2666158: Allow --force option on Delete tasks.
    

  • helmo committed db1567c on 2666158-delete-force-setting
    Issue #2666158: Remove duplicate check
    
helmo’s picture

Status: Needs review » Reviewed & tested by the community

Sounds good to have that on /admin/hosting/settings

I added one minor commit: you were checking variable_get('hosting_delete_force', FALSE) twice.

  • helmo committed 3f052e4 on 7.x-3.x
    Issue #2666158: Merge remote-tracking branch 'origin/2666158-delete-...
  • Jon Pugh committed a555a14 on 7.x-3.x
    Issue #2666158: Allow --force option on Delete tasks.
    
  • helmo committed db1567c on 7.x-3.x
    Issue #2666158: Remove duplicate check
    
helmo’s picture

Status: Reviewed & tested by the community » Fixed

merged

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Jon Pugh’s picture

Reopening to cover the issue:

"Don't try to backup a site provision cannot access."

  1. User creates site. Site install fails. Aegir rolls back database and settings.php. There is no database.
  2. User triggers Delete task.
  3. Delete site task fails, only because "provision-backup" is invoked on a site that has no data to back up.

The branch I pushed fixes this: it first checks whether drush can bootstrap the database. If it can, it makes a backup. If it can't, it throws a warning that there was no DB to delete.
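
The control flow described here can be sketched roughly as follows, in Python for illustration only. The three callables are hypothetical stand-ins for the bootstrap check, the backup step, and the actual removal; none of them are real Provision functions.

```python
def delete_site(can_bootstrap_db, make_backup, remove_site):
    """Back up only when the database is reachable, then delete.

    All three arguments are hypothetical callables supplied by the caller.
    Returns the list of warnings collected along the way.
    """
    warnings = []
    if can_bootstrap_db():
        make_backup()
    else:
        # No reachable database: skip the backup instead of failing the task.
        warnings.append("No database found; skipping backup before delete.")
    remove_site()
    return warnings
```

The deletion itself always runs; only the backup step becomes conditional, so a site whose install was rolled back can still be removed.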

Jon Pugh’s picture

Project: Hosting » Provision
Status: Closed (fixed) » Needs review
Jon Pugh’s picture

Issue tags: +devshop patches

This has been working great on devshop for a long time: https://git.drupalcode.org/project/provision/compare/7.x-3.x...2666158-d...

  • Jon Pugh committed a5c4b5c on 7.x-3.x
    Issue #2666158 by Jon Pugh, helmo, ergonlogic: Don't invoke a
    "provision...
Jon Pugh’s picture

Status: Needs review » Fixed
colan’s picture

So do we still need https://www.drupal.org/project/hosting_dev ? Or is it no longer useful now that this stuff is fixed?

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.