After running the apt-get upgrade command, aegir3-hostmaster gets stuck half way through being configured.
At first it appeared to be stuck at "Platforms path /var/aegir/platforms is writable".
Thanks to a helpful person on IRC, I was able to get additional debugging information. See the attached file.
Now I see it is actually stuck on:
Executing: mysql --defaults-extra-file=/tmp/drush_07o4Pu --database=exampledatabase_0 --host=localhost --port=3306 --silent < /tmp/drush_ixjEKl
We were also able to determine that the mysql connection is open, but idle.
| 409 | exampledatabase_0 | localhost | exampledatabase_0 | Sleep | 1 | | NULL | 0.000 |
| 1168 | exampledatabase_0 | localhost | exampledatabase_0 | Sleep | 0 | | NULL | 0.000 |
| 1211 | exampledatabase_0 | localhost | exampledatabase_0 | Sleep | 1 | | NULL | 0.000 |
I am using Debian Jessie and all other pending upgrades have processed successfully.
Any assistance is highly appreciated since this has caused our server to become unavailable.
Workaround
Check to see if there's a task in the aegir task queue. Either make sure it's finished before upgrading or remove it.
When viewing the task node, use the 'Edit' tab to find the node id...and then visit example.com/node/1234/delete to get rid of it.
| Comment | File | Size | Author |
|---|---|---|---|
| #18 | aegir_upgrade_stuck-2773223-18.patch | 1.44 KB | helmo |
| #17 | aegir_upgrade_stuck-2773223-17.patch | 1.44 KB | helmo |
| #10 | aegir_upgrade_stuck-2773223-10.patch | 589 bytes | helmo |
| | debug.txt | 53.09 KB | g33kg1rl |

Comments
Comment #2
g33kg1rl commented:
An strace on the process showed the following, repeated over and over:
poll([{fd=6, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
sendto(6, "p\0\0\0\3SELECT COUNT(t.vid) FROM ho"..., 116, 0, NULL, 0) = 116
recvfrom(6, "\1\0\0\1\1\"\0\0\2\3def\0\0\0\fCOUNT(t.vid)\0\f?"..., 16384, 0, NULL, NULL) = 67
poll([{fd=6, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
sendto(6, "x\0\0\0\3SELECT count(t.nid) FROM no"..., 124, 0, NULL, 0) = 124
recvfrom(6, "\1\0\0\1\1\"\0\0\2\3def\0\0\0\fcount(t.nid)\0\f?"..., 16384, 0, NULL, NULL) = 67
write(2, "\0DRUSH_BACKEND:{\"type\":\"message\""..., 186) = 186
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0},
Comment #3
g33kg1rl commented:
I checked what was in the /tmp/drush_ixjEKl file and found SHOW TABLES;
Comment #4
g33kg1rl commented:
OK, I figured out the solution. I had to do a manual upgrade as per this documentation: http://docs.aegirproject.org/en/latest/install/upgrade/#upgrades-with-de...
Comment #5
colan commented:
In my case, I couldn't get debugging turned on, as neither of these added any extra output:
I had a task (of which I was unaware) that was still running (status -1). It was causing a wait state in drush_hosting_pause_validate(). Because debugging messages couldn't be turned on, I couldn't see this:
Once I deleted the task, all was well, and apt could continue.
Thanks to helmo for getting me on the right track!
Comment #6
matthewgann commented:
I can confirm that I had multiple tasks stuck at processing (-1) that were causing the update to stall, as well as a few other issues. I created a view to list those tasks and removed them. The upgrade then finished processing.
Thanks @colan and @helmo
Comment #7
g33kg1rl commented:
I have been running into this issue every time I try to upgrade. I can confirm that creating a view to list the processing tasks and removing them allows the upgrade to finish.
Comment #8
helmo at Initfour websolutions for Aegir Cooperative commented
Comment #9
helmo at Initfour websolutions for Aegir Cooperative commented:
Just closed a duplicate in #2866614: Upgrading 3.9 to 3.10 via debian packages is stalling
Comment #10
helmo at Initfour websolutions for Aegir Cooperative commented:
I'd like to address the root cause here: the upgrade process waits until all tasks in the queue are finished.
Waiting for running tasks seems a good thing; let's address the visibility of such messages in #2861696: Extra output via debugging messages cannot be enabled on Debian package upgrades
I propose to NOT wait for tasks that are queued... this patch does just that.
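The proposal above, waiting for running tasks but not for queued ones, can be sketched as follows. This is only an illustration in Python, not the patch itself (the real code is PHP). The thread confirms that -1 is the processing status; the queued value of 0, the function names, and the polling parameters are assumptions made for this example.

```python
import time
from typing import Callable, Iterable, Optional

# -1 is the "processing" status mentioned in this thread.
STATUS_PROCESSING = -1
# Assumption for this sketch: queued tasks carry status 0.
STATUS_QUEUED = 0


def count_blocking(statuses: Iterable[int], wait_for_queued: bool) -> int:
    """Count the tasks that should hold up the upgrade."""
    blocking = {STATUS_PROCESSING}
    if wait_for_queued:
        # Old behaviour: queued tasks also block the upgrade.
        blocking.add(STATUS_QUEUED)
    return sum(1 for status in statuses if status in blocking)


def wait_until_idle(
    get_statuses: Callable[[], Iterable[int]],
    wait_for_queued: bool = False,
    poll_seconds: float = 1.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no blocking task remains.

    Returns True once idle, or False if max_polls was reached first.
    """
    polls = 0
    while count_blocking(get_statuses(), wait_for_queued) > 0:
        if max_polls is not None and polls >= max_polls:
            return False
        sleep(poll_seconds)
        polls += 1
    return True
```

With wait_for_queued=False, a queue that holds only queued tasks no longer stalls the upgrade, which matters when no queue runner is active to ever pick those tasks up.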
Comment #11
Jon Pugh commented:
I'm now experiencing this while trying desperately to get the upgrade test working in Travis for devshop: https://travis-ci.org/opendevshop/devshop/jobs/224067830
Funny I spent a lot of time struggling to figure out why it is hanging before finding this issue.
Well that explains why I can't make it work in the upgrade test, because it has to run as a single process, all the way through to the behat tests. There's no separate queue runner at all in docker, so...
I'll try out this patch in that devmaster test!
Comment #12
Jon Pugh commented:
It's not just Debian. This is happening in the hostmaster-migrate command.
Comment #13
Jon Pugh commented:
OK, I have a better question: what is the point of this command?
All this command does is stop Crontab. I don't see why this specific drush command should wait for tasks to finish processing. It doesn't stop a Hosting Queued, it doesn't stop tasks from being run manually with drush.
In fact, I can see this being a problem, in that if there are tasks stuck in processing, cron will just keep on running and will never be turned off, which might trigger new tasks... which will keep the command waiting even longer!
I propose we remove this validate hook completely and see what happens.
Comment #14
Jon Pugh commented:
Sadly, after testing I have found that during hostmaster-migrate, the drush hosting-pause command runs from the old codebase.
So we are essentially stuck with this problem until after the next release.
I guess that means we need some kind of manual intervention when upgrading?
Still looking into this...
Comment #15
helmo at Initfour websolutions for Aegir Cooperative commented:
We could extend the workaround in the issue summary to suggest applying a small patch to the previous platform.
But yes, hosting-pause might date from a time when we had no queue daemon. Removing it seems very tempting.
Comment #16
helmo at Initfour websolutions for Aegir Cooperative commented:
In the Debian package upgrade we call 'service hosting-queued stop' from debian/aegir3-hostmaster.postinst, so both cron and queued would be disabled during such an upgrade.
I think there are certainly things that can go wrong when you run regular tasks during a hostmaster-migrate.
It might be nice to let hosting-pause also block the queued though.
Comment #17
helmo at Initfour websolutions for Aegir Cooperative commented:
Here's an UNTESTED patch. It adds a time-out to the query for running tasks, currently set to 3600 secs (1 hour).
Comment #18
helmo at Initfour websolutions for Aegir Cooperative commented:
New patch ... < should be >
Comment #19
Jon Pugh commented:
So it only loads tasks that were started in the last hour?
Comment #22
helmo at Initfour websolutions for Aegir Cooperative commented:
I increased the time-out to 8 hours.
Comment #23
helmo at Initfour websolutions for Aegir Cooperative commented:
@Jon Pugh: It's not loading them, just checking for any running tasks. It now ignores a task that has the running status but was started more than 8 hours ago.
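The check described here can be illustrated with a minimal Python sketch. It is not the patch itself: the Task type, its executed field (assumed to hold a Unix start timestamp), and the function name are hypothetical. Only the -1 running status and the 8 hour limit come from this thread.

```python
from dataclasses import dataclass
from typing import Iterable

# -1 is the running/processing status mentioned in this thread.
STATUS_PROCESSING = -1
# 8 hours, the time-out chosen in comment #22.
MAX_AGE_SECONDS = 8 * 3600


@dataclass
class Task:
    status: int
    executed: int  # Hypothetical field: Unix timestamp when the task started.


def count_running(
    tasks: Iterable[Task], now: int, max_age: int = MAX_AGE_SECONDS
) -> int:
    """Count running tasks, ignoring those started more than max_age ago."""
    cutoff = now - max_age
    # A task counts only if it started AFTER the cutoff. Flipping this
    # comparison would count only the stale tasks instead.
    return sum(
        1
        for task in tasks
        if task.status == STATUS_PROCESSING and task.executed > cutoff
    )
```

A task that has shown the running status for longer than the limit is treated as stale and no longer blocks the upgrade, while a recently started task still does.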
Comment #25
milovan commented:
I confirm the patch from #18 works on the latest version. It solved the issue; please commit it if you haven't already.