My Setup: I am running Aegir a14 using Barracuda and Octopus on Ubuntu.

I tried to clone a site that I had imported into Aegir. The clone task began normally but then got stuck. Here are the log messages:

Log message
Task starts processing
Running: /data/disk/master/tools/drush/drush.php @wcstaging.brainowl.com provision-clone '@affiliate.wcstaging.brainowl.com' '@platform_WealthCycles' --backend

This very well may be some sort of error with how I imported the site, or the code itself, but that brings me to issue 2:

There is no way to cancel a processing task. It just sits there processing, and no new tasks are processed in the meantime. It would be nice to have some sort of force-stop option.

Comments

Anonymous’s picture

There is no real way to cancel a 'processing' task, because there is no way to be sure which stage the task has reached, and so no way to know what needs cleaning up.

Just find the nid of the task itself (open the task output in a new window/tab) then go to /node/$nid/delete and delete the task. That'll get rid of the stalled task for you.

entrigan’s picture

Category: bug » support

Makes sense. What if tasks had a status field (non-CCK, obviously)? Then Aegir could "kill" a task by setting its status to aborted.

Anonymous’s picture

Tasks do have a status (queued, processing, error, success), so you could manually change the status in the db if you like (as opposed to deleting the task altogether).

We do have the Cancel button on a task when it is queued, but we can't really set it to be applied to a task of status 'processing' for the reasons in #1. (too dangerous in my opinion).

The only other way out of this would be to have Aegir look at the time the task started executing and if it's past a certain threshold (maybe an hour) it could consider the task as stalled and set it to 'error' or something like that.

Of course I would love to figure out why some tasks stall altogether (not the first time I've seen it.)

Anonymous’s picture

FYI the task statuses are


/**
 * The task is being processed
 */
define('HOSTING_TASK_PROCESSING', -1);


/**
 * The task is queued
 */
define('HOSTING_TASK_QUEUED', 0);

/**
 * The command completed successfully.
 */
define('HOSTING_TASK_SUCCESS', 1);

/**
 * The command was not successfully completed. This is the default error
 * status.
 */
define('HOSTING_TASK_ERROR', 2);
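
The threshold idea mentioned earlier in this thread could be sketched roughly as follows. This is only an illustration in Python, not Aegir code: the status values are copied from the constants above, while the function name `resolve_stalled_status`, the `started_at` timestamp argument, and the one-hour default are assumptions made for the example.

```python
import time

# Status values copied from the constants quoted above.
HOSTING_TASK_PROCESSING = -1
HOSTING_TASK_QUEUED = 0
HOSTING_TASK_SUCCESS = 1
HOSTING_TASK_ERROR = 2

# Assumption: one hour, the figure floated earlier in the thread.
STALL_THRESHOLD_SECONDS = 60 * 60


def resolve_stalled_status(status, started_at, now=None,
                           threshold=STALL_THRESHOLD_SECONDS):
    """Return the status a task should have after a stall check.

    Hypothetical helper: `started_at` and `now` are Unix timestamps.
    Only a task that is still 'processing' and has been running longer
    than `threshold` seconds is flipped to 'error'; every other task
    keeps its current status.
    """
    if now is None:
        now = time.time()
    if status == HOSTING_TASK_PROCESSING and now - started_at > threshold:
        return HOSTING_TASK_ERROR
    return status
```

A check like this only touches tasks that are still marked as processing, so queued and finished tasks are left alone. It does not clean up whatever the stalled task left behind, which is the concern raised in #1.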

entrigan’s picture

I agree with you that a threshold would be much safer than an abort button. A system that relies on human skill is a system that will fail.

So I ran the stalled command through the command line with --verbose:

DRUSH_BACKEND_OUTPUT_START>>>{"output":"","object":[],"error_status":1,"log":[{"type":"bootstrap","message":"Drush bootstrap phase : _drush_bootstrap_drush()","timestamp":1287089123.4411,"memory":1257864,"error":null},{"type":"notice","message":"Load alias @wcstaging.brainowl.com","timestamp":1287089123.448,"memory":1268184,"error":null},{"type":"error","message":"A Drupal installation directory could not be found","timestamp":1287089123.8701,"memory":2508200,"error":"DRUSH_NO_DRUPAL_ROOT"},{"type":"error","message":"The drush command '@wcstaging.brainowl.com provision-clone @wcstaging2.brainowl.com @platform_WealthCycles' could not be found.","timestamp":1287089123.8703,"memory":2509768,"error":"DRUSH_COMMAND_NOT_FOUND"}],"error_log":{"DRUSH_NO_DRUPAL_ROOT":["A Drupal installation directory could not be found"],"DRUSH_COMMAND_NOT_FOUND":["The drush command '@wcstaging.brainowl.com provision-clone @wcstaging2.brainowl.com @platform_WealthCycles' could not be found."]},"context":{"backend":true,"verbose":true}}<<<DRUSH_BACKEND_OUTPUT_END

I am thinking {"type":"notice","message":"Load alias @wcstaging.brainowl.com","timestamp":1287089123.448,"memory":1268184,"error":null},{"type":"error","message":"A Drupal installation directory could not be found","timestamp":1287089123.8701,"memory":2508200,"error":"DRUSH_NO_DRUPAL_ROOT"} means my alias is having problems. But checking in my .drush folder, mysite.mydomain.com.alias.drushrc.php exists and seems to be configured correctly.
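
For anyone else reading output like this, a small script can pull just the error entries out of the backend blob. This is a rough sketch, not part of Drush or Aegir: it assumes the output is wrapped in the `DRUSH_BACKEND_OUTPUT_START>>>` and `<<<DRUSH_BACKEND_OUTPUT_END` markers and contains a JSON object with a `log` list, as in the paste above. The function name `extract_backend_errors` is made up for the example.

```python
import json
import re

# Matches the JSON payload between the two markers seen in the paste above.
_BACKEND_RE = re.compile(
    r"DRUSH_BACKEND_OUTPUT_START>>>(.*?)<<<DRUSH_BACKEND_OUTPUT_END",
    re.DOTALL,
)


def extract_backend_errors(raw_output):
    """Return (error_code, message) pairs for every 'error' log entry.

    Hypothetical helper: returns an empty list when the markers are
    not present in `raw_output`.
    """
    match = _BACKEND_RE.search(raw_output)
    if match is None:
        return []
    payload = json.loads(match.group(1))
    return [
        (entry.get("error"), entry.get("message"))
        for entry in payload.get("log", [])
        if entry.get("type") == "error"
    ]
```

Run against the output above, this would list the `DRUSH_NO_DRUPAL_ROOT` entry followed by the `DRUSH_COMMAND_NOT_FOUND` entry, which makes it easier to see that the missing Drupal root is reported first.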

So if nothing else hopefully we can track down the source of the stall so that the "provision-clone" can be improved to handle the error. Thoughts?

entrigan’s picture

P.S. @mig5 your help is amazing. Your articles, screencasts and issue queues, irc: seriously amazing.

entrigan’s picture

Also interestingly, my cpu usage, disk IO and network usage all spike when I try running clone on this site. Clone does btw work for other sites I have in aegir.

Anonymous’s picture

Are you sure the settings in the alias match those of the system (i.e. that the @platform is correct)?

Try verifying the current platform, the target platform you're cloning the site to (if it differs) and then the site in question, then attempt the clone again?

Anonymous’s picture

Status: Active » Closed (cannot reproduce)

Closing this from lack of response (standard procedure, don't take offense), please reopen if you have more info so we can help.

The "A Drupal installation directory could not be found" error should be easily fixable if your frontend and backend are all in sync (run verify tasks).

entrigan’s picture

Ya, I agree with the status. I will try again after upgrading to a15 (at some point in the future) and report back if there is anything interesting. Thanks.

Also, I forgot to respond to #8 but reverifying did not solve the issue.

omega8cc’s picture

BTW: this happens when your site has some serious bugs, or for some other reason is using too many server resources, and the process simply times out (in the database or in PHP), so it fails to report success or failure to the frontend.

The common fix is:

1. Delete the task node.
2. Re-verify the platform, since it is possible the site has been cloned anyway.
3. If Aegir discovers the clone, it will verify it and we are home.
4. If not, try increasing the PHP/MySQL limits and try again.

I hope this helps a bit.