I'm regularly running into "load on system too heavy" errors when trying to migrate, with the result that I can't migrate (upgrade) the site. That has obvious security implications.

Latest error:
"load on system too heavy (5.66 3.03 1.81), aborting
Drush command terminated abnormally due to an unrecoverable error."

I'm happy to slow the server to a crawl temporarily to get this done, but is there a setting that will allow me to override that test or raise the threshold without hacking provision_load_critical()?

More generally, is there anything that can be done to reduce Aegir/Drush's resource usage? (Even something as crude as putting a sleep between steps?)

See also #1490974: load on system too heavy


Comments

Steven Jones

This check happens here: http://drupalcode.org/project/provision.git/blob/refs/heads/6.x-1.x:/pro...
Not sure it's really configurable; maybe the best you could do would be to comment those lines out and skip the check?

willmoy

Just in case anyone else finds this, the hack I actually used was to make provision_load_critical(), which defines what counts as excessive system load, use a higher value by putting $threshold = 10 before the return line. That gives you a bit more headroom without completely skipping the safety feature.

The comments on provision_load_critical() [1] say: "It's not a really reliable metric" but it's also true that it's not easy to come up with a better one.

But why should Aegir abort when the check is triggered? Why not sleep for a bit, retest, and only abort if the load continues to be excessive?
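For illustration, the sleep-and-retest idea might look something like this. It's a minimal sketch in Python, not Provision's PHP; it assumes the check only looks at the 1-minute load average from `os.getloadavg()` (Unix-only), and the name `wait_for_load` as well as the retry count and delay values are hypothetical.

```python
import os
import time
from typing import Callable, Sequence


def wait_for_load(threshold: float,
                  retries: int = 3,
                  delay: float = 20.0,
                  get_load: Callable[[], Sequence[float]] = os.getloadavg,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Hypothetical helper: True once the 1-minute load is <= threshold.

    Rechecks up to `retries` extra times, sleeping `delay` seconds between
    checks. Returns False if the load is still too high after the last
    check, which is the point where the caller would abort.
    """
    for attempt in range(retries + 1):
        one_minute = get_load()[0]  # first value = 1-minute average
        if one_minute <= threshold:
            return True
        if attempt < retries:  # no point sleeping after the last check
            sleep(delay)
    return False


# Usage with fake readings instead of the real system load:
readings = iter([(5.66, 3.03, 1.81), (4.2, 3.1, 1.9), (2.9, 2.8, 1.9)])
ok = wait_for_load(4.0, get_load=lambda: next(readings), sleep=lambda s: None)
```

Here the first two readings are over the (hypothetical) threshold of 4.0 and the third is under it, so `ok` ends up True after two sleeps.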

I would be happy to roll a patch if you think it might be worth looking at.

[1] http://drupalcode.org/project/provision.git/blob/refs/heads/6.x-1.x:/pro...

anarcat

Title: load on system too heavy aborting » have a workaround for the "load on system too heavy aborting" errors

One thing that could be done here would be to make the threshold customizable. The code certainly makes that possible, and it could be a simple variable in the system-wide drushrc...

The problem with waiting and retesting is that the load average is averaged over about a minute, so transient overloads are not likely to disappear within a reasonable delay. We'd need to wait something like 60 seconds, which doesn't seem like an acceptable delay...
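To put a rough number on that delay: assuming Linux behaviour, where the 1-minute figure is an exponentially damped average (updated every 5 seconds, approximated here as continuous decay with a 60-second time constant), and assuming the number of runnable tasks dropped to zero instantly, the time to fall from a load L0 to a target T is 60 · ln(L0 / T). A running migration keeps the load up, so real waits would be longer. The target of 4.0 below is hypothetical; the actual threshold isn't stated in this thread.

```python
import math


def seconds_to_decay(start_load: float, target_load: float,
                     window: float = 60.0) -> float:
    """Seconds for an exponentially damped load average to fall from
    start_load to target_load, assuming no runnable tasks meanwhile
    (an idealised lower bound, not a measurement)."""
    if start_load <= target_load:
        return 0.0
    # L(t) = L0 * exp(-t / window)  =>  t = window * ln(L0 / T)
    return window * math.log(start_load / target_load)


# 5.66 is the 1-minute figure from the error in the original report.
print(f"{seconds_to_decay(5.66, 4.0):.1f} s")  # prints 20.8 s
```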

helmo

Version: 7.x-2.x-dev » 7.x-3.x-dev
ergonlogic

Status: Active » Needs review
Attached patch: 912 bytes

Here's a patch to get things going. It makes the load multiplier and threshold Drush options, so they can easily be overridden. Once this (or something like it) is committed, we can look at writing these options to our drushrc.php when the hostmaster site is verified.
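As a rough sketch of how such an override could behave (Python, not the actual PHP patch): an explicit threshold option wins, otherwise the threshold is the CPU count times a multiplier. The option names `threshold` and `multiplier`, the per-CPU reading of "multiplier", and the default of 5.0 are assumptions for illustration, not what the patch necessarily uses.

```python
import os
from typing import Mapping, Optional, Sequence


def load_critical(load: Sequence[float],
                  options: Mapping[str, float],
                  cpus: Optional[int] = None) -> bool:
    """True if the 1-minute load is above the effective threshold.

    Hypothetical option names: 'threshold' (absolute override) and
    'multiplier' (per-CPU factor, placeholder default 5.0).
    """
    if "threshold" in options:
        threshold = options["threshold"]
    else:
        cpu_count = cpus if cpus is not None else (os.cpu_count() or 1)
        threshold = cpu_count * options.get("multiplier", 5.0)
    return load[0] > threshold


# The load figures from the original error, on an assumed single-CPU box:
load = (5.66, 3.03, 1.81)
print(load_critical(load, {}, cpus=1))                 # True: 5.66 > 1 * 5.0
print(load_critical(load, {"threshold": 10}, cpus=1))  # False: 5.66 <= 10
print(load_critical(load, {"multiplier": 8}, cpus=1))  # False: 5.66 <= 8
```

The explicit threshold of 10 mirrors willmoy's hack above, but as a setting rather than an edit to provision_load_critical().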

ergonlogic

Status: Needs review » Fixed

Fixed in 14760417.

  • Commit 1476041 on 7.x-3.x by ergonlogic:
    Issue #1785472: have a workaround for the 'load on system too heavy...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

ergonlogic

See: #2295923: Allow configuration of load thresholds from the UI for follow-up on the front-end component.