I accidentally ran the command drush self-update. This overwrote the Aegir-compatible drush with a newer version. When I realized the problem two days later, I replaced the new drush with the old one (4.6-dev), which I found in the backups.
The issue I am facing is that the tasks in the Aegir queue have stopped running. They show a message saying they should run at the next cron, in about a minute, but then nothing happens. Aegir keeps telling me that the last cron run was two days ago.

I have tried sudoing as Aegir and running the following:

drush vdel hosting_queue_cron_running -y
in case a stale semaphore was left behind, but this gives the message "hosting_queue_cron_running not found".

/var/aegir/drush/drush.php '@hostmaster' hosting-dispatch --debug
and it runs fine, which means that drush kicks off the cron, but for some reason it doesn't reach Aegir.

I tried running barracuda up-stable to recover .drush, but it didn't recover it. I also tried clearing the cache and rebuilding the registry. The server time is correct. Could you suggest a way to debug the issue further?

Comments

josevitalsouto’s picture

After upgrading to the latest version (BOA 2.2.2), I also have a problem running cron jobs.
When testing cron manually, I found that the Drupal cron URL returns a 403 error, so there is probably something wrong in the Nginx settings.

eg:
Trying to access: http://mysite.com/cron.php?cron_key=SnEqeXdE7T_mLcTS5gbm_zuMtuvDvFweuqPT...
Error 403 Forbidden

omega8cc’s picture

Status: Active » Postponed (maintainer needs more info)

Please attach required config/log files.

omega8cc’s picture

Also, please note that access to /cron.php is allowed only for local requests on the server.

radiobuzzer’s picture

Attached two new files (1.89 KB and 1.15 KB).
josevitalsouto’s picture

Attached two new files (1.56 KB and 1.92 KB).
radiobuzzer’s picture

Edit: when I run the command /var/aegir/drush/drush.php '@hostmaster' hosting-dispatch --debug as aegir, cron runs normally. This is the end of the debug output on the command line:

Bootstrap to phase 6. [0.27 sec, 22.93 MB]                                            [bootstrap]
Drush bootstrap phase : _drush_bootstrap_drupal_login() [0.27 sec, 22.93 MB]          [bootstrap]
Successfully logged into Drupal as Anonymous (uid=0) [0.27 sec, 22.93 MB]             [bootstrap]
Found command: hosting-tasks (commandfile=hosting) [0.27 sec, 22.93 MB]               [bootstrap]
Running tasks queue [0.27 sec, 22.93 MB]                                                 [notice]
Command dispatch complete [0.27 sec, 22.93 MB]                                           [notice]
Peak memory usage was 21.13 MB [0.27 sec, 22.93 MB]                                      [memory]
Command dispatch complete [0.27 sec, 21.74 MB]                                           [notice]
 Timer  Cum (sec)  Count  Avg (msec) 
 page   0.202      1      201.65     
radiobuzzer’s picture

I have been debugging the code.

Line 891 of hostmaster/modules/hosting/task/hosting_task.module:

<?php
$result = db_query("SELECT t.nid FROM {hosting_task} t INNER JOIN {node} n ON t.vid = n.vid WHERE t.task_status = %d GROUP BY t.rid ORDER BY n.changed, n.nid ASC LIMIT %d", 0, $limit);
while ($node = db_fetch_object($result)) {
  $return[$node->nid] = node_load($node->nid);
}
return $return;
?>

This code never enters the while loop, although the same query works when run manually in the mysql console.

Nevertheless, line 24 of hostmaster/modules/hosting/dispatch.hosting.inc
$queues = hosting_get_queues();
returns the right number of task items.

omega8cc’s picture

@vitalsouto Please note that the original issue here is not about cron (not) running for sites, but about Aegir backend tasks, and it is related to the Master Instance only (which shouldn't be used, by the way). Please open a separate issue if your problem is not related to this one (and it sounds like it certainly is not).

omega8cc’s picture

@radiobuzzer I don't believe your problem can be related to BOA or Aegir code, especially since the last known issue has been fixed: #2229715: Tasks in queue aren't running in the Master Instance

I would suggest finding and removing the Drush copy you have installed and replacing /var/aegir/drush with the copy you can find in /opt/tools/drush/4/drush

Make sure to run chown -R aegir:aegir /var/aegir/drush
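
The replacement described above could be sketched roughly as follows. The function name replace_drush is hypothetical, and moving the old copy aside instead of deleting it is an added precaution that the advice above does not ask for. The paths and the aegir:aegir owner in the usage line come from this comment.

```shell
# replace_drush SRC DEST OWNER  (hypothetical helper, not part of BOA)
# Move the existing DEST aside, copy SRC into its place, then fix ownership.
replace_drush() {
  src="$1"
  dest="$2"
  owner="$3"
  if [ ! -e "$src" ]; then
    echo "source $src not found" >&2
    return 1
  fi
  # Keep the old copy under a timestamped name instead of removing it.
  if [ -e "$dest" ]; then
    mv "$dest" "$dest.bak.$(date +%Y%m%d%H%M%S)" || return 1
  fi
  cp -a "$src" "$dest" || return 1
  chown -R "$owner" "$dest"
}

# Usage with the paths from this thread (run as root):
# replace_drush /opt/tools/drush/4/drush /var/aegir/drush aegir:aegir
```

Once the new copy is confirmed to work, the timestamped backup can be deleted.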

radiobuzzer’s picture

Status: Postponed (maintainer needs more info) » Active

Thanks, I did that and it still does not work. I am wondering whether this is still related to the forced upgrade of drush.

I ask because I also ran a system up-stable upgrade from BOA 2.1.3 to 2.2.2, which means the problem may have been caused by the system upgrade. I wasn't aware of issue #2229715, so I have to check whether anything there is relevant.

radiobuzzer’s picture

OK. I followed advice from issue #2229715 and ran bash /var/xdrago/run-o1. I saw an error I hadn't noticed before: /opt/local/bin/php was missing. I created a symbolic link to /usr/bin/php and it works now. The pending tasks have also started running again, in batches every minute.

I don't know why /opt/local/bin/php disappeared. Anyway, the problem is now solved. If anybody else's php disappears, please re-open the issue.

radiobuzzer’s picture

Status: Active » Fixed

See last comment

radiobuzzer’s picture

Title: Aegir tasks not running, after accidentally upgrading drush » Aegir tasks not running, due to missing /opt/local/bin/php
omega8cc’s picture

Status: Fixed » Closed (cannot reproduce)

It is simple: if it is still trying to use /opt/local/bin/php, you haven't upgraded the Octopus instance yet (and maybe not the Master Instance either).

Please remove that symlink, or more bad things may happen, and run a proper, full barracuda upgrade (including the Master Instance) followed by a complete upgrade of all Octopus instances.

omega8cc’s picture

Title: Aegir tasks not running, due to missing /opt/local/bin/php » Aegir tasks not running due to incomplete barracuda+octopus upgrade
radiobuzzer’s picture

Thanks. I have tried up-stable, up-stable all and up-stable system and the situation remains the same.

Indeed, the upgrade was totally broken. I am now running the correct upgrade.

omega8cc’s picture

This is not the correct method to run a complete upgrade.

Please read and follow the docs: https://github.com/omega8cc/boa/blob/master/docs/UPGRADE.txt

radiobuzzer’s picture

So I ran

 barracuda up-stable
 octopus up-stable all

The installation proceeded, but the tasks are still not being executed.
Running bash /var/xdrago/run-o1 still gives this error:

/data/disk/o1/aegir.sh: line 4: /opt/local/bin/php: No such file or directory
CTL done
omega8cc’s picture

You should enable debug mode and force the upgrade with octopus up-stable all both

Then review any errors you will see displayed in the terminal window.

If the cron script is still trying to use the old php path, then for some reason the upgrades did not complete.
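
One way to check which php path a script still references is sketched below. The function name check_php_paths is hypothetical, and the sketch assumes the interpreter appears in the script as an absolute path whose last component is exactly "php", as /opt/local/bin/php does in the error quoted earlier in this thread.

```shell
# check_php_paths SCRIPT  (hypothetical helper, not part of BOA)
# Print every absolute path ending in "/php" that SCRIPT mentions, marked
# OK or MISSING. Returns 1 if any is missing, 2 if SCRIPT is unreadable.
check_php_paths() {
  script_file="$1"
  missing=0
  if [ ! -r "$script_file" ]; then
    echo "cannot read $script_file" >&2
    return 2
  fi
  # Pull out path-like tokens, then keep those whose last component is "php".
  php_paths=$(grep -oE '/[A-Za-z0-9_./-]+' "$script_file" | grep -E '/php$' | sort -u)
  for php_path in $php_paths; do
    if [ -x "$php_path" ]; then
      echo "OK $php_path"
    else
      echo "MISSING $php_path"
      missing=1
    fi
  done
  return "$missing"
}

# Usage with the script named in the error above:
# check_php_paths /data/disk/o1/aegir.sh
```

A MISSING line for /opt/local/bin/php after an upgrade would suggest the script was not regenerated, which matches the incomplete-upgrade explanation given here.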

radiobuzzer’s picture

Thanks, it seems this solved it.

snlnz’s picture

Unfortunately, this issue is occurring on one of our hosts, and I'm not sure whether it belongs in the barracuda or octopus queue, or whether I should re-open a closed issue or create a new one.

I'll supply as much info as I can anyway:
http://pastebin.com/iUTB3WWW - /root/.barracuda.cnf
http://pastebin.com/9475SiXe - /data/disk/USER/log/octopus_log.txt
http://pastebin.com/FHLxHpU3 - /var/log/barracuda_log.txt
http://pastebin.com/rF4dbzV6 - /root/.USER.octopus.cnf

I've run the upgrade a couple of times now just to be sure.

barracuda up-stable log
octopus up-stable all both log

I can confirm Aegir tasks are pending in the queue indefinitely.
If I run cron manually (drush @hostmaster hosting-tasks --debug), the tasks start without any error and everything appears to execute without issues.

I tried running /var/xdrago/run-oct, which completes in a couple of seconds with no errors.
I also tried running /var/xdrago/daily.sh, but I have not seen it complete before my ssh session times out, so I have started it again using screen. It appears to take a long time to execute without giving any feedback.

Any thoughts?

snlnz’s picture

Status: Closed (cannot reproduce) » Active
omega8cc’s picture

Status: Active » Closed (fixed)

This means that you probably don't have a /var/xdrago/run-USER script matching the instance's USER. At least I don't see any other reason, if running the queue manually works without any errors. Since you have run proper upgrades, your problem is not related to this issue, so please don't re-open it. Instead, debug this on your end until you find why the cron is not invoked.
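
A quick check for a missing run-USER script might look like the sketch below. The function name check_run_scripts is hypothetical, and it assumes that every instance directory under /data/disk contains an aegir.sh file, as /data/disk/o1 does in the error quoted earlier in this thread.

```shell
# check_run_scripts [DISK_ROOT] [XDRAGO_DIR]  (hypothetical helper)
# For every DISK_ROOT/USER directory that contains aegir.sh, report whether
# XDRAGO_DIR/run-USER exists. Returns 1 if any run script is missing.
check_run_scripts() {
  disk_root="${1:-/data/disk}"
  xdrago_dir="${2:-/var/xdrago}"
  missing=0
  for launcher in "$disk_root"/*/aegir.sh; do
    [ -e "$launcher" ] || continue   # the glob matched nothing
    instance_user=$(basename "$(dirname "$launcher")")
    if [ -f "$xdrago_dir/run-$instance_user" ]; then
      echo "OK run-$instance_user"
    else
      echo "MISSING run-$instance_user"
      missing=1
    fi
  done
  return "$missing"
}

# Usage with the default paths from this thread:
# check_run_scripts
```

If every instance reports OK, the next place to look would be whatever scheduler is supposed to invoke those scripts.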

snlnz’s picture

OK, do you have any idea where to start debugging this?
This issue appears to affect all cron jobs, so cron for the sites in the octopus instance isn't running either.

omega8cc’s picture

No idea, honestly. It just works for me (tm)