I regularly see huge hosting_task_log tables in Aegir installs. This table alone can easily be 100x the size of the rest of the frontend site's database combined, and it is over 1GB on at least one Aegir I work on regularly. The hosting_task_arguments and hosting_task tables come in 2nd (100MB) and 4th (24MB) in terms of size. The only other table that's even in the same order of magnitude is node_revisions (3rd at 39MB).

There's already a contrib module that adds a queue to clean up Task data from deleted sites. I've just suggested that it allow for retention policies per task type and status (#2053915). This would make it much more aggressive in cleaning up these tables. I'm going to experiment in contrib, and see whether there are any negative consequences to this approach. Depending on the results there, maybe we can bring some of that into core. Hence marking this as 'postponed'.
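
To make the idea of a retention policy per task type and status concrete, here is a minimal sketch in Python. It is not how the contrib module works; the `Task` fields, the `RETENTION` table and the retention periods are all made up for illustration.

```python
from dataclasses import dataclass

DAY = 86400  # seconds

# Hypothetical policy table: (task_type, status) -> maximum age in seconds.
# The types, statuses and periods are illustrative assumptions only.
RETENTION = {
    ("verify", "success"): 7 * DAY,
    ("backup", "success"): 30 * DAY,
}
# Fallback for any (type, status) pair not listed above, also an assumption.
DEFAULT_RETENTION = 90 * DAY


@dataclass
class Task:
    nid: int
    task_type: str
    status: str
    executed: int  # Unix timestamp of when the task ran


def expired_tasks(tasks, now):
    """Return the tasks that are older than the retention period
    configured for their (task_type, status) pair."""
    result = []
    for task in tasks:
        max_age = RETENTION.get((task.task_type, task.status), DEFAULT_RETENTION)
        if now - task.executed > max_age:
            result.append(task)
    return result
```

With a table like this, successful verify tasks could be dropped after a week while failed tasks of any type are kept for the longer default period.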

Don't get me wrong... I firmly believe that the data in these tables is just about the most valuable in Aegir. I just think there's a bunch of redundant entries that don't bring any value. Having the Aegir frontend itself be one of the biggest sites on a given install just seems wrong :)

Comments

j0nathan’s picture

In our installation:

  1. hosting_task_log 1.6 GiB
  2. hosting_package_instance 246.4 MiB
omega8cc’s picture

We have been using the aforementioned Hosting task garbage collection module in BOA for over a year already. It really should be in core. There is no reason to keep these logs for deleted sites, platforms and any other orphans.
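
As a rough illustration of what cleaning up orphaned logs means, here is a small Python sketch using SQLite. The three-table schema (`site`, `task`, `task_log`) is a simplified stand-in invented for this example, not the real Aegir schema, and the function is not taken from the module.

```python
import sqlite3

# Simplified, hypothetical schema: a task belongs to a site, a log row
# belongs to a task. The real Aegir tables have different columns.
SCHEMA = """
CREATE TABLE site (id INTEGER PRIMARY KEY);
CREATE TABLE task (id INTEGER PRIMARY KEY, site_id INTEGER);
CREATE TABLE task_log (id INTEGER PRIMARY KEY, task_id INTEGER, message TEXT);
"""


def delete_orphan_task_logs(conn):
    """Delete log rows whose task points at a site that no longer exists.

    Returns the number of log rows removed.
    """
    cur = conn.execute(
        """
        DELETE FROM task_log
        WHERE task_id IN (
            SELECT t.id
            FROM task AS t
            LEFT JOIN site AS s ON s.id = t.site_id
            WHERE s.id IS NULL
        )
        """
    )
    conn.commit()
    return cur.rowcount
```

Running something like this in small batches from a queue, rather than as one large DELETE, would keep the load on a busy frontend database low.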

Rebuilding hosting_package_instance is another, separate thing, discussed in some other issue, I recall.

omega8cc’s picture

Another really useful module we have been using for a long time already: Revision Deletion

anarcat’s picture

Status: Postponed » Needs work

i think it's a good idea to merge this into core, or at least add it to the makefile, but i'd keep this to 3.x.

patch anyone?

chertzog’s picture

Here is a patch that 1.) ports hosting garbage collection to D7, and 2.) adds it to Hosting.

chertzog’s picture

Status: Needs work » Needs review
helmo’s picture

The other hosting sub-modules don't have 'hosting_' as directory prefix. So I guess we should add this as the 'task_gc' directory.

My very quick test just now failed to run the queue ... I'll try to look into that next week.

helmo’s picture

I got it working ... here's an updated patch with some cleanup.

One TODO could be to also reduce the number of node revisions on tasks for sites that are not deleted. But maybe we could use an existing module for that .... https://drupal.org/project/node_revision_delete (I have not tried this module)
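
For the revision TODO, the selection logic could look roughly like the following Python sketch. It only works out which revisions would be pruned; the `(nid, vid)` input format and the `keep` default are assumptions for illustration, and it is unrelated to how node_revision_delete is implemented.

```python
from collections import defaultdict


def revisions_to_prune(revisions, keep=5):
    """Given (nid, vid) pairs, return the vids that fall outside the
    newest `keep` revisions of each node, in ascending order.

    Assumes a higher vid means a newer revision.
    """
    by_node = defaultdict(list)
    for nid, vid in revisions:
        by_node[nid].append(vid)

    prune = []
    for vids in by_node.values():
        newest_first = sorted(vids, reverse=True)
        prune.extend(newest_first[keep:])  # everything past the newest `keep`
    return sorted(prune)
```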

helmo’s picture

Added #2217745: Merging into Aegir 7.x-3.x to the hosting_task_gc queue to let Dane Powell know.

A next step is to look at feature/thermonuclear in the hosting_task_gc repo, borrowing from #2066179: Dealing with platform logs too, or: the thermonuclear option

  • helmo committed d73aca2 on dev-helmo-3.x
    Issue #2053929 by helmo, chertzog | ergonlogic: Added Trim hosting_task...

  • helmo committed d73aca2 on 7.x-3.x, dev-helmo-3.x
    Issue #2053929 by helmo, chertzog | ergonlogic: Added Trim hosting_task...
helmo’s picture

Status: Needs review » Fixed

merged to 7.x-3.x

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.