Aegir should check the condition of its environment before dispatching new tasks/crons. It is easy to kill the server with just a few heavy distros with very busy crons, with a batch migrate, or with a mass re-verify of sites/platforms on Aegir upgrade.
In future enhancements it would be useful for Aegir to monitor the resources used by sites/platforms, such as estimated CPU and RAM usage based on requests per minute, site size (number of users/nodes/comments), bandwidth, etc. Before trying to implement that, it is also possible to introduce a simple server load control before running the Aegir cron.
Every minute we can run a simple shell script from cron to check the current load and decide whether we can run the Aegir dispatcher safely.
Example:
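#!/bin/sh
# Give this script a higher priority (niceness -9); this needs root.
renice -9 $$
control()
{
  # 1-minute load average times 100, as an integer (Linux-only: reads /proc/loadavg).
  NOW_LOAD=$(awk '{print int($1*100)}' /proc/loadavg)
  # Threshold: 200 corresponds to a load average of 2.00.
  CTL_LOAD=200
  echo "load is $NOW_LOAD while maxload is $CTL_LOAD"
  if [ "$NOW_LOAD" -lt "$CTL_LOAD" ]; then
    echo "... now doing CTL..."
    # Run the dispatcher as the aegir user.
    su - aegir -c "sh /home/aegir/aegir.sh"
  else
    echo "...we have to wait..."
  fi
}
control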
The /home/aegir/aegir.sh script can contain just:
#!/bin/sh
/path/to/php '/path/to/drush/drush.php' hosting dispatch --root='/path/to/aegir/domain' --uri=http://aegir.domain










Comments
Comment #1
omega8cc commented
BTW: we can also run the dispatcher every 15 seconds, just as a crash test or to avoid waiting too long for a safe load :)
Comment #2
adrian commented
We can't depend on the /proc filesystem, and we can't depend on bash.
We need PHP functions to test these things if we are to test them at all, and I don't object to putting a switch into the dispatcher so that it does not fire if the system load is too heavy.
Comment #3
anarcat commented
I'll attack this, I'm tired of Aegir finishing off my struggling servers. I found this:
http://ca2.php.net/manual/fr/function.sys-getloadavg.php
Comment #4
anarcat commented
This required a little restructuring. I wanted to abort in a drush_init() so that we bootstrap as little as possible. So I had to move the hosting-dispatch command from a callback to a regular command (r8543f5255148).
The fix for this itself is in rf39e00906f0f. I added two functions: provision_count_cpus() and provision_load_critical().
Both should work on all platforms (except Windows), but provision_count_cpus() currently returns FALSE on anything other than Linux. Extensions can easily be written for other platforms. Unfortunately, there's no way to reliably tell the number of CPUs from within PHP, so we need hacks to figure that out. So basically, provision_count_cpus() is Linux-only.
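A minimal shell sketch of one such Linux-only hack, assuming /proc/cpuinfo lists one "processor" line per logical CPU (the usual layout on x86 Linux; other architectures may differ). The function name count_cpus and its optional file argument are hypothetical, and this is not necessarily how provision_count_cpus() does it:

```shell
#!/bin/sh
# Hypothetical helper: print the number of logical CPUs, or fail
# (non-zero exit status, no output) when it cannot be determined.
count_cpus() {
  cpuinfo="${1:-/proc/cpuinfo}"
  # No readable cpuinfo file: not Linux, or /proc is not mounted.
  [ -r "$cpuinfo" ] || return 1
  # grep -c prints 0 and exits non-zero when nothing matches.
  n=$(grep -c '^processor' "$cpuinfo" || true)
  # Zero matches means the file does not have the layout assumed here.
  [ "$n" -gt 0 ] || return 1
  echo "$n"
}
```

A caller can treat a failure the same way the PHP code treats FALSE: as "number of CPUs unknown".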
provision_load_critical(), which decides what "critical" means, uses the number of CPUs to interpret the load. That part is platform-agnostic (except Windows).
The number of CPUs is important to make some sense of the load and evaluate whether it's critical or not. A load of 4 on a 4-CPU system is not uncommon and shouldn't be too much of a problem. A load of 4 on a single CPU is bad and the machine feels unresponsive.
If we don't know the number of CPUs, we assume that a load of 10 (magic number!) is critical. Otherwise, we assume a load of 5 processes per CPU (e.g. 5 for a single CPU, 20 for a 4-CPU machine) is critical.
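The rule above can be sketched in shell as follows. The thresholds (10 when the CPU count is unknown, 5 per CPU otherwise) come from the description; the function name load_critical and its arguments are hypothetical, and treating only a load strictly above the threshold as critical is an assumption of this sketch, not something confirmed for provision_load_critical():

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 if the load counts as critical, 1 otherwise.
#   $1  load average (for example the first field of /proc/loadavg)
#   $2  number of CPUs; empty, missing or 0 when unknown
load_critical() {
  load="$1"
  ncpus="${2:-0}"
  if [ "$ncpus" -gt 0 ]; then
    threshold=$((ncpus * 5))   # 5 processes per CPU
  else
    threshold=10               # fallback "magic number" when the CPU count is unknown
  fi
  # The load is fractional, so compare in awk rather than with the integer-only [ -gt ].
  awk -v l="$load" -v t="$threshold" 'BEGIN { exit !((l + 0) > (t + 0)) }'
}
```

With these numbers, a load of 4.00 on a 4-CPU machine is not critical (threshold 20), while a load of 6.00 on a single CPU is (threshold 5).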
When load is critical, drush just doesn't run, which should help recovery during critical situations (as opposed to now, where Aegir aggravates the problem by spawning more drush bootstrap sequences, see #695244: hosting-cron killed my server for a few good examples).
So basically, I think this answers the spec: it works everywhere, because we have sane defaults if the platform-specific stuff cannot be figured out. We get the load in a platform-agnostic manner.
Testing would be appreciated, but I'm running this in production already.
Comment #5
anarcat commented
Oh, and I had good help from the following references, obviously found on Google:
http://groups.google.com/group/sage-devel/browse_thread/thread/d65209f7a...
http://ca2.php.net/manual/en/function.sys-getloadavg.php
http://github.com/certik/sysconf/blob/master/ncpus.c
Comment #6
anarcat commented
Ah, and another thing: I have implemented the controls in provision, which means they affect any drush command that does not define a callback. This may not be desirable: maybe we want to apply the controls only to hosting-dispatch. If that's the case, we'd just need to move provision_drush_init() and the code for CPU and load detection to hosting.
I do think, however, that it's useful to have such controls at the backend level, especially in the context of multiple server support: who cares about the load on the master server if the slave that's supposed to run the task is hosed?
Comment #7
anarcat commented