In #1033072: migrate interface takes forever (litterally) to load, I have a fix to improve the performance on huge hosting_package_instance tables. However, those huge tables are usually a sign of zombie entries more than huge deployments (although we do want to allow that so the index fix probably should stay anyways).

I wonder what is the proper way to cleanup such old installs. In #934864: platform verification tasks take forever, and then some more, omega8cc refers to that problem and specifically to those two comments: http://drupal.org/node/907248#comment-3429476 and this: http://drupal.org/node/907248#comment-3436884

I am not sure how does comment relate to the zombies, but I'd be curious about how to fix older installs.

Anyone?

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Anonymous’s picture

Is there a clear discrepancy, i.e many more entries in hosting_package vs hosting_package_instance? Maybe we can delete anything that doesn't have an instance of that package? Although I suppose that is unsafe (unless we then force a Verify of platforms afterward in case we delete packages on platforms that haven't had sites installed on them).

Not really sure :s

omega8cc’s picture

This can be tricky, because you can have two categories of zombies there: duplicates and those "normal" - left from no longer used and/or modified platforms.

I'm afraid the only reliable and secure method would be to purge them all (all records) completely and re-verify all platforms. I didn't check the code yet, but I hope it will be enough.

[EDIT] Ah, full purge would require re-verifying also all sites. Not the best in the world idea, probably :(

bwood’s picture

subscribe

ergonlogic’s picture

Title: how do we cleanup zombie entries in hosting_package_* tables? » Cleanup zombie entries in hosting_package_* tables?
Version: 6.x-0.4-alpha3 » 6.x-2.x-dev
Category: support » feature
Status: Active » Needs review

We could build an array of active sites and platforms using _hosting_get_platforms() and hosting_get_sites_by_status(). We could then delete any entry in hosting_package_instance where the rid isn't among those nids.

The suggestion in #1 should work for hosting_package, since we add instances for platforms (without sites) too.

Presumably it would be safe to delete any entries in hosting_package_languages where there isn't a corresponding iid in hosting_package_instance, but I don't know if that's even a problem.

We could perhaps start calling hosting_package_instance_sync() with just the $rid argument in post site and platform delete hooks. Since there won't be any packages passed in, it should just delete any instances associated to rid. We could then do the one-off purge in an update hook.

I'm making this a feature request, and setting as 'needs review' to get opinions. Since I've just been working with the packages system, I'd be happy to add this fix, if we agree that it's a valid path to pursue.

anarcat’s picture

I wonder if we shouldn't do what is suggested in #2036283: Make Aegir aware of site-specific packages, that is TRUNCATE the _instance table and rebuild it. But instead of doing a verify on each site (which will trash the cache), why couldn't do a lightweight version somehow?

Maybe we could work on a temporary table or something... We *do* have all the data anyways, it's just that there's too much of it. Now, within #2036283: Make Aegir aware of site-specific packages we actually change the database structure, so that may require an actual verify to get more data from the backend, but that's a different story.

ergonlogic’s picture

We could perhaps start calling hosting_package_instance_sync() with just the $rid argument in post site and platform delete hooks. Since there won't be any packages passed in, it should just delete any instances associated to rid. We could then do the one-off purge in an update hook.

We could start here, since it should at least avoid additional build-up of zombie entries. Then we could work out how to do a one-time cleanup.

anarcat’s picture

What I am worried is that there is some data in the instance table that is not associated with a package or even a platform, so even if we call it with $rid, it will not clean anything up...

omega8cc’s picture

Yes, there is usually a ton of duplicated leftovers, so we probably need to re-build it from scratch, which in turn should trigger re-verify for platforms and sites :/

ergonlogic’s picture

Right, but I don't think that doing a one-time cleanup will fix this in the long run. We don't currently remove package instances when platforms or sites are deleted:

▾ 5 functions call hosting_package_instance_sync()
hosting_migrate_post_hosting_migrate_task in modules/hosting/migrate/hosting_migrate.drush.inc
hosting_platform_post_hosting_verify_task in modules/hosting/platform/hosting_platform.drush.inc
hosting_site_post_hosting_import_task in modules/hosting/site/hosting_site.drush.inc
hosting_site_post_hosting_install_task in modules/hosting/site/hosting_site.drush.inc
hosting_site_post_hosting_verify_task in modules/hosting/site/hosting_site.drush.inc

So, what I'm suggesting in #6 is that we, at least, start cleaning up after ourselves systematically.

ergonlogic’s picture

Also, while we define hosting_package_instance_delete(), we don't actually call it from anywhere. We could change it to delete an array of iid's, or write a new hosting_package_instances_delete() to do it. Then:

We could build an array of active sites and platforms using _hosting_get_platforms() and hosting_get_sites_by_status(). We could then delete any entry in hosting_package_instance where the rid isn't among those nids.

Orphans instances should get deleted by doing this, regardless of whether they have an rid or not. This would be pretty easy to do in a hook_update(), without having to verify everything.

ergonlogic’s picture

Here's a patch for #6

anarcat’s picture

Oooh, I see. Of course we need to cleanup after ourselves! :)

I see what you mean, that's a good way to go!

ergonlogic’s picture

Status: Needs review » Needs work

I merged the patch in #11 in 210d94f. There may be more cleanup like that we can do too.

I'll work on #10, unless anyone has better suggestions?

anarcat’s picture

go.

mrfelton’s picture

Manually applied the patch in #11 to my aegir 6.10 install (it didn't apply cleanly), truncated my hosting_package_instance table and reverified all platforms and sites. My hosting_package_instance has reduced in size from 2297248 rows to just 72665, and aegir is far speedier now taking about 5 seconds to load the migrate or clone modal dialog, vs the 30 or so it took before hand. I have confirmed that deleting a platform now actually deletes entries from the hosting_package_instance table which should help to keep its size down. Thanks for the tips, shame this didn't make it into the recent 1.10 release though.

ergonlogic’s picture

Version: 6.x-2.x-dev » 7.x-3.x-dev
Status: Needs work » Active

We don't have a patch for this anymore :)

  • Commit 210d94f on 6.x-2.x, 7.x-3.x, dev-ssl-ip-allocation-refactor, dev-sni, dev-helmo-3.x by ergonlogic:
    Issue #1034520: Cleanup package instances when deleting sites and...

  • Commit 210d94f on 6.x-2.x, 7.x-3.x, dev-ssl-ip-allocation-refactor, dev-sni, dev-helmo-3.x by ergonlogic:
    Issue #1034520: Cleanup package instances when deleting sites and...

  • ergonlogic committed 210d94f on dev-helmo-3.x
    Issue #1034520: Cleanup package instances when deleting sites and...

  • ergonlogic committed 210d94f on 7.x-3.x-1995506
    Issue #1034520: Cleanup package instances when deleting sites and...

  • ergonlogic committed 210d94f on dev-2223033
    Issue #1034520: Cleanup package instances when deleting sites and...

  • ergonlogic committed 210d94f on dev-2359571
    Issue #1034520: Cleanup package instances when deleting sites and...

  • ergonlogic committed 210d94f on 7.x-4.x
    Issue #1034520: Cleanup package instances when deleting sites and...