First, let me state that, despite the nearly identical titles, this is unrelated to #2150787: Queue daemon fails to restart itself. I was also experiencing the issue described there, where the queue daemon fails to restart on PEAR-based installs, but I resolved that with the included patch.
Periodically, and seemingly at random, I discover that the Hosting queue daemon is not running after waiting a few minutes for an Aegir task to kick off. I first check /admin/hosting/queued: if the "Last started:" entry is older than the 30 minute restart interval I have set, the daemon is probably not running. I then confirm this by checking the status of the hosting-queued service from the command line of the server. I can start the service from the command line and it always starts successfully. It will run just fine for hours, days, or even weeks at a time and then, without explanation or any error that I can find, it will fail to come back up during one of the regularly scheduled daemon restarts.
I know I can write a quick cron job or something to start the daemon if it isn't already running, but isn't that the point of what the daemon already does? I am mostly looking for anyone who has experienced a similar issue and how they solved it, or for any ideas on where on the system to check for a log message or something similar to help me determine why it is failing to restart. I am running RHEL. The instantaneous task execution advantage of using hosting-queued over the cron-based task runner is quickly lost when my coworkers and I are unable to get tasks to execute, so this is obviously something we are very interested in fixing. I am tempted to just go back to the cron runner, but I feel like there is something deeper going on here that isn't working the way it should.
Comments
Comment #2
jpwester commented
Comment #3
helmo commented
The 7.x version has some improvements on this, but it's also not perfect.
One cause I've seen is a database server that is down. Even a few seconds of downtime during an upgrade ... gives a fatal PHP error which is not caught.
Comment #4
jpwester commented
Yes, I suspect that might be my issue as well.
Without abandoning the daemon altogether, there are two options as I see it. The first would obviously be to write the cron job I mentioned to check the status of the daemon and start it if it isn't running. The other option is likely a little more complicated. At least in my setup, the aegir user on our servers doesn't have the ability to start/stop the service; I have to use my own non-aegir user with sudo for that. In theory, I suppose I could get our Sys Admins to allow the aegir user to start/stop the service and then use a hook or something to check the status of the service upon the creation of a task. I haven't looked too closely at the code, but knowing Drupal, I'm sure there's a way to do that. :)
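For illustration, the first option (a check run from cron) might look roughly like the sketch below. It is written in Python purely as an example and is not taken from Aegir. The names `ensure_running` and `run` are hypothetical, and it assumes that `service hosting-queued status` exits with 0 when the daemon is running, which is the usual init-script convention but has not been verified for this particular script.

```python
import subprocess

SERVICE = "hosting-queued"  # service name as used in this thread


def ensure_running(service=SERVICE, run=subprocess.call):
    """Start the service if its status check fails.

    Returns "running" if it was already up, "started" if the start
    command succeeded, or "failed" if the start command failed.

    Assumption: `service <name> status` exits 0 only when the daemon
    is running. `run` takes an argument list and returns an exit code;
    it is a parameter so the logic can be exercised without touching
    a real service.
    """
    if run(["service", service, "status"]) == 0:
        return "running"
    if run(["service", service, "start"]) == 0:
        return "started"
    return "failed"
```

A small script that calls `ensure_running()` could then be scheduled every few minutes from the crontab of a user that is allowed to manage the service (root, or a user granted that right via sudo). Note that this only papers over the problem: it brings the daemon back, but does not explain why the scheduled restart failed.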
Comment #5
jpwester commented
I meant to add some sort of question to my last post. Do these two options sound realistic or am I missing something? Has anybody else done anything similar in the past? I just don't want to get too deep into this and find out I was way off base. Thanks!
Comment #6
helmo commented
There has been talk about replacing our own PHP code with something else, a more standalone, mature queue thing ... but that's long term. ergonlogic has been experimenting with this, though.
Comment #7
jpwester commented
Interesting, I'd be curious to see what comes of that for sure!
What I ended up doing was getting our server guys to allow the apache user to manage the service, and I wrote a tiny module that checks the status of the service and starts it if necessary on any node insert or update (hook_nodeapi). This was the best way I came up with to fire up the service upon task creation in Aegir.
Comment #8
ergonlogic
The 6.x-2.x branch will go EOL along with Drupal this week. So I'm closing this issue. If it remains a confirmed issue in 7.x-3.x, feel free to re-open, or better yet, create a new issue referencing this one.
As for the queue daemon, see: #2672530: Adopt Python queue daemon replacement