We are getting deadlocks when doing platform upgrades. Is there a setting that will prevent multiple tasks from running at once?

Debian version: 7.11
BOA version: 3.2.0

Errors:
WD node: PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction: INSERT INTO {node_access} (nid, realm, gid, grant_view, grant_update, grant_delete) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5); Array
(
[:db_insert_placeholder_0] => 11244
[:db_insert_placeholder_1] => hosting task
[:db_insert_placeholder_2] => 1
[:db_insert_placeholder_3] => 1
[:db_insert_placeholder_4] => 0
[:db_insert_placeholder_5] => 0
)
in node_access_write_grants() (line 3591 of /data/disk/o1/aegir/distro/015/modules/node/node.module).

exception 'PDOException' with message 'SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction' in /data/disk/o1/aegir/distro/015/includes/database/database.inc:2227
Stack trace:
#0 /data/disk/o1/aegir/distro/015/includes/database/database.inc(2227): PDOStatement->execute(Array)
#1 /data/disk/o1/aegir/distro/015/includes/database/database.inc(697): DatabaseStatementBase->execute(Array, Array)
#2 /data/disk/o1/aegir/distro/015/includes/database/mysql/query.inc(36): DatabaseConnection->query('INSERT INTO {no...', Array, Array)
#3 /data/disk/o1/aegir/distro/015/modules/node/node.module(3591): InsertQuery_mysql->execute()
#4 /data/disk/o1/aegir/distro/015/modules/node/node.module(3537): node_access_write_grants(Object(stdClass), Array, NULL, true)
#5 /data/disk/o1/aegir/distro/015/modules/node/node.module(1184): node_access_acquire_grants(Object(stdClass))
#6 /data/disk/o1/aegir/distro/015/profiles/hostmaster/modules/aegir/hosting/task/hosting_task.module(579): node_save(Object(stdClass))
#7 /data/disk/o1/aegir/distro/015/profiles/hostmaster/modules/aegir/hosting/task.hosting.inc(66): hosting_add_task('10614', 'verify', Array)
#8 [internal function]: drush_hosting_task_validate('@platform_custo...', 'verify')
#9 /data/disk/o1/tools/drush/includes/command.inc(422): call_user_func_array('drush_hosting_t...', Array)
#10 /data/disk/o1/tools/drush/includes/command.inc(231): _drush_invoke_hooks(Array, Array)
#11 [internal function]: drush_command('@platform_custo...', 'verify')
#12 /data/disk/o1/tools/drush/includes/command.inc(199): call_user_func_array('drush_command', Array)
#13 /data/disk/o1/tools/drush/lib/Drush/Boot/BaseBoot.php(67): drush_dispatch(Array)
#14 /data/disk/o1/tools/drush/includes/preflight.inc(66): Drush\Boot\BaseBoot->bootstrap_and_dispatch()
#15 /data/disk/o1/tools/drush/drush.php(12): drush_main()
#16 {main}

If you need any other information, please let me know. Thank you!

Attachment (comment #4): running tasks.jpg, 133.27 KB, uploaded by sgardapee

Comments

helmo commented:

Are you sure there are two or more separate hosting tasks being executed?

I saw such an error recently in #2866279: Deadlock with concurrent webhooks. Unfortunately, it happens more than it should, but it is not unique to Aegir.

sgardapee commented:

Yes, more than one task is running; I verified this in the database. I just submitted a platform migrate on a test server, migrating 5 sites. Judging by task_status, three tasks appear to be running ("-1" indicates a running task, as I understand it). The attached screenshot shows the statuses of my tasks. I have done this upgrade a few times now, and one site has failed on most of those migrates. I have to migrate 100 sites this weekend, so I'm concerned about multiple failures.

If you are suggesting this is not specific to Aegir, can you help me understand what I can do?
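
For anyone wanting to repeat the task_status check described above, here is a minimal sketch. It assumes a table named hosting_task with a task_status column where -1 means "running", as stated in this comment. The reduced table layout, the sample rows, and the helper names count_running_tasks and demo are hypothetical, and an in-memory SQLite database stands in for the real Aegir database.

```python
import sqlite3

# Assumption taken from the comment above: -1 marks a task that is running.
TASK_RUNNING = -1


def count_running_tasks(conn):
    """Return how many rows in hosting_task have task_status == -1."""
    row = conn.execute(
        "SELECT COUNT(*) FROM hosting_task WHERE task_status = ?",
        (TASK_RUNNING,),
    ).fetchone()
    return row[0]


def demo():
    # In-memory stand-in for the hostmaster database.
    # The table layout is hypothetical and reduced to three columns.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hosting_task "
        "(nid INTEGER, task_type TEXT, task_status INTEGER)"
    )
    # Made-up sample rows: three running tasks, one queued, one finished.
    conn.executemany(
        "INSERT INTO hosting_task VALUES (?, ?, ?)",
        [
            (1, "migrate", -1),
            (2, "migrate", -1),
            (3, "verify", -1),
            (4, "migrate", 0),
            (5, "migrate", 1),
        ],
    )
    return count_running_tasks(conn)
```

If the count is greater than one while a migrate is in progress, more than one task is marked as running at the same moment, which is the situation reported here.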

sgardapee commented:

UPDATE: We ran the platform upgrade in production over the weekend. The tasks ran single-threaded, with no deadlocks. Something must be different on the test server; I will investigate. Thanks for all your time.