SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction

Going to repeat the unique part:

SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction

Re-installed but wasn't able to reproduce the error, was going to test if #926426: system_check_http_request() causes deadlock on single threaded webserver was related >.O.

PHP: 5.2.5, www.entropy.ch Release 6
MySQL: 5.0.88
(Apache, OS X or whatever this Mac's operating system is)


Comments

drunken monkey’s picture

I receive the same error frequently on node pages when I try to open several nodes at once (with JavaScript, so really simultaneously, not one after another). The error isn't shown in a normal drupal_set_message()-style message box but is seemingly just output with echo, appearing either before any content or somewhere in the header of the HTML code, but always above the actual site. The errors are also not logged with watchdog.
Once, a fatal error even occurred with roughly the same message (sadly, I didn't save it and now can't reproduce it), which for the first time mentioned the cause: an INSERT INTO {history} seems to be the problem, which is plausible (two simultaneous page requests could easily lead two threads to insert at the same time, and the history table is probably the only one changed on simple page requests).

Anyway, this seems to me to be a bit more serious than stated above, since end users could easily receive the same message. Since it doesn't seem to follow any error-reporting rules, it probably also can't be turned off along with normal error reporting. Maybe something in PDO? Or only in some specific version?

This, in any case, is the system:
PHP 5.2.6-1+lenny9
Linux 2.6.26-2-amd64 x86_64
Apache web server
PDO Driver for MySQL, client library version 5.0.51a

hefox’s picture

Title: DEADLOCK error on installation » DEADLOCK errors

If I remember correctly, ditto on the errors displaying on the top of the screen instead of drupal_set_message;
Updating the title to reflect what drunken monkey said.

Anonymous’s picture

hefox, drunkenmonkey - can one or both of you write up some reproduce steps? or, even better, a test that fails due to this bug?

also, anyone else seeing this issue because they've encountered a DEADLOCK error - if possible, please run 'SHOW ENGINE INNODB STATUS', because along with a lot of other stuff, it will show information about the last deadlock. Pasting that output here would help us fix this issue.

drunken monkey’s picture

I don't think we'll be able to make a conventional test for this, as it requires (for me, at least) several simultaneous page requests, which a normal test case AFAIK can't do.

Steps to reproduce: Open several dozen node pages simultaneously.
E.g., with FireGestures (Firefox plugin), mark a list of links to different node pages and execute the "Open all links in selection" gesture.

mysql> SHOW ENGINE INNODB STATUS;
------------------------
LATEST DETECTED DEADLOCK
------------------------
101029 18:06:49
*** (1) TRANSACTION:
TRANSACTION 0 11499410, ACTIVE 0 sec, process no 4388, OS thread id 3218288 inserting
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 320, 3 row lock(s)
MySQL thread id 80, query id 14197 localhost drupal update
INSERT INTO cache_menu (cid, serialized, created, expire, data) VALUES ('links:devel:page:node/35:de:1:0', '1', '1288368408', '0', 'a:4:{s:9:\"min_depth\";i:1;s:9:\"max_depth\";N;s:8:\"expanded\";a:1:{i:0;i:0;}s:12:\"active_trail\";a:1:{i:0;i:0;}}')
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 2290 n bits 120 index "PRIMARY" of table "drupal7"."cache_menu" trx id 0 11499410 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;

*** (2) TRANSACTION:
TRANSACTION 0 11499411, ACTIVE 0 sec, process no 4388, OS thread id 12364656 inserting, thread declared inside InnoDB 500
mysql tables in use 1, locked 1
4 lock struct(s), heap size 320, 3 row lock(s)
MySQL thread id 78, query id 14196 localhost drupal7 update
INSERT INTO cache_menu (cid, serialized, created, expire, data) VALUES ('links:devel:page:node/33:de:1:0', '1', '1288368408', '0', 'a:4:{s:9:\"min_depth\";i:1;s:9:\"max_depth\";N;s:8:\"expanded\";a:1:{i:0;i:0;}s:12:\"active_trail\";a:1:{i:0;i:0;}}')
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 2290 n bits 120 index "PRIMARY" of table "drupal7"."cache_menu" trx id 0 11499411 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 2290 n bits 120 index "PRIMARY" of table "drupal7"."cache_menu" trx id 0 11499411 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;

*** WE ROLL BACK TRANSACTION (2)
mr.baileys’s picture

Ran into the same issue today on a site under very light load (not live):

PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction: INSERT INTO {history} (uid, nid, timestamp) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2); Array ( [:db_insert_placeholder_0] => 1 [:db_insert_placeholder_1] => 745 [:db_insert_placeholder_2] => 1306241824 ) in node_tag_new() (line 313 of /export/www/photo/html/modules/node/node.module).

Unfortunately I don't have permission to run SHOW ENGINE INNODB STATUS on the server.

mikeryan’s picture

FWIW, I hit this today on an INSERT into {node}. This was on a development server with no traffic - I was doing two migration imports in separate terminal sessions with drush, each importing into a different node type. Unfortunately, my MySQL account doesn't have full privileges, so I can't run SHOW ENGINE INNODB STATUS.

Tough to reproduce this sort of thing, let alone create automated tests... I'm not sure there's much to be done - try to catch the error and retry the offending query after a brief (small number of ms) pause?
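
The catch-and-retry idea could be sketched roughly like this (plain PHP, not Drupal API; retry_on_deadlock() and its parameters are hypothetical names for illustration):

```php
<?php
// Sketch: retry a callable when MySQL reports a deadlock (SQLSTATE 40001,
// MySQL error 1213), backing off briefly between attempts. Function name
// and defaults here are illustrative, not part of Drupal core.
function retry_on_deadlock(callable $query, $max_attempts = 3, $wait_us = 10000) {
  for ($attempt = 1; ; $attempt++) {
    try {
      return $query();
    }
    catch (PDOException $e) {
      // 40001 is the serialization-failure SQLSTATE used for deadlocks.
      $is_deadlock = $e->getCode() == '40001';
      if (!$is_deadlock || $attempt >= $max_attempts) {
        // Not a deadlock, or out of retries: re-throw to the caller.
        throw $e;
      }
      // Brief, linearly growing pause before retrying.
      usleep($wait_us * $attempt);
    }
  }
}
```

One open question with any retry approach is that InnoDB has already rolled back the whole transaction, so only retrying the single offending statement is not always safe.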

ParisLiakos’s picture

Same here

PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction: INSERT INTO {search_index} (word, sid, type, score) VALUES (..............

shabana.navas’s picture

Getting the same error:

PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction: INSERT INTO {search_index} (word, sid, type, score) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3); Array ( [:db_insert_placeholder_0] => upcoming [:db_insert_placeholder_1] => 6893 [:db_insert_placeholder_2] => node [:db_insert_placeholder_3] => 1 ) in search_index()

But these errors don't pop up in the pages, it can only be seen in the Log Messages. Should I be worried about it?

catch’s picture

Title: DEADLOCK errors » DEADLOCK errors on {history} {search_index} etc.
Version: 7.x-dev » 8.x-dev
Priority: Normal » Major
Issue tags: +Needs backport to D7

Bumping to major, enough people are getting this under light enough load.

jbrown’s picture

catch’s picture

Looking through this is likely merge queries, see #910010: Potential race condition on variable_set() due to merge query implementation for some background.

Both the search_index and history tables get hit with a db_merge(), which uses SELECT .. FOR UPDATE.

{history} also has a ranged delete with no index #165061: New indexes on flood.timestamp and history.timestamp for cron runs., looking at http://stackoverflow.com/questions/2332768/how-to-avoid-mysql-deadlock-f... the situation described there looks similar (and includes situations with frequent inserts that don't need a SELECT .. FOR UPDATE to trigger a deadlock).

The MySQL docs suggest using the READ COMMITTED transaction isolation level, but setting that is not an option for most people, so I'd rather see SELECT .. FOR UPDATE nuked if we can: http://dev.mysql.com/doc/refman/5.0/en/innodb-deadlocks.html

cangeceiro’s picture

I can also confirm this, using apachesolr and Drupal 7, and only when using drush.

# drush solr-reindex library
WD php: PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction: DELETE FROM  [error]
{apachesolr_search_node} 
WHERE  (nid IN  (SELECT node.nid AS nid
FROM 
{node} node
WHERE  (type = :db_condition_placeholder_0) )) ; Array
(
    [:db_condition_placeholder_0] => library
)
 in apachesolr_rebuild_index_table() (line 500 of /var/www/cmog/drupal/sites/all/modules/apachesolr/apachesolr.module).
PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction: DELETE FROM {apachesolr_search_node} 
WHERE  (nid IN  (SELECT node.nid AS nid
FROM 
{node} node
WHERE  (type = :db_condition_placeholder_0) )) ; Array
(
    [:db_condition_placeholder_0] => library
)
 in apachesolr_rebuild_index_table() (line 500 of /var/www/cmog/drupal/sites/all/modules/apachesolr/apachesolr.module).
Drush command terminated abnormally due to an unrecoverable error. 
yang_yi_cn’s picture

I think this happens a lot on Drupal 6 too, if you set all db engines to use InnoDB. In my case it happens on the {cache} table all the time; not sure if it's because the two web servers are both trying to write to the cache table.

BeaPower’s picture

This happened too in Drupal 7.8 when I tried to create a content type named "groups" for Organic Groups.
I immediately cleared the cache, and haven't seen the error again so far... hope it's gone for good.

Shane Birley’s picture

I am seeing this error in a D6 > D7 update - always into {cache_update} from the update.module.


WilliamB’s picture

We launched our Drupal site three days ago and I noticed that error in the logs too.
Subscribing.

Damien Tournoud’s picture

I found one cause for this. Please see #1369332: Avoid deadlock issues on the {node} table.

There are other causes, notably the gap-locking feature. Those seem to go away with setting innodb_locks_unsafe_for_binlog = 1 in the MySQL configuration.

I have seen this type of pattern all the time:

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS ... lock_mode X insert intention waiting

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS ... lock_mode X
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS ... lock_mode X insert intention waiting

... and I don't understand why MySQL would want to promote a lock_mode X into a lock_mode X insert intention (the latter being weaker, as far as I understand). We need input from MySQL guys here.

basic’s picture

If we can get this reproduced, it would be nice to see if upgrading to the latest MariaDB 5.3 helps. Its updates to MySQL's locking should help work around these deadlocks.

basic’s picture

I'm hoping the latest patch here fixes the history and watchdog deadlocks: #165061: New indexes on flood.timestamp and history.timestamp for cron runs.

geerlingguy’s picture

Subscribing. I had this happen when simply trying to do an INSERT INTO {realname} on a relatively low-traffic site. Every now and then I see another deadlock pop up; it doesn't seem to be very repeatable or consistent, but it probably happens once a day or so.

deviantintegral’s picture

I was able to reproduce this by simply running a few drush scripts in parallel that generated nodes. Usually the deadlock would be in a cache_* table. Setting innodb_locks_unsafe_for_binlog = 1 worked for me as well, though I haven't found anything that actually explains why that would fix the deadlocks.

Heine’s picture

The key to this problem appears to be InnoDB creating a gap lock on the index when a key is NOT in the index on a SELECT ... FOR UPDATE.

(The output below from SHOW ENGINE INNODB STATUS requires the table 'innodb_lock_monitor' in the database. Create it via CREATE TABLE innodb_lock_monitor(a int) ENGINE=INNODB;)

SELECT ... FOR UPDATE with an existing cid results in:

TABLE LOCK table `XXX`.`cache_menu` trx id 1FDB808 lock mode IX
RECORD LOCKS space id 0 page no 19215 n bits 88 index `PRIMARY` of table `XXX`.`cache_menu` trx id 1FDB808 lock_mode X locks rec but not gap

SELECT ... FOR UPDATE with a cid that's not in cache_menu:

TABLE LOCK table `XXX`.`cache_menu` trx id 1FDB807 lock mode IX
RECORD LOCKS space id 0 page no 19209 n bits 112 index `PRIMARY` of table `XXX`.`cache_menu` trx id 1FDB807 lock_mode X locks gap before rec

Depending on what's in your table, and what you are trying to insert, the gap lock can also hold the supremum pseudorecord.

To reproduce, start three MySQL sessions and point them at your database (LOG1, T1, T2). Make sure no other processes use the database (server). Then set up as follows:

On LOG1 execute:

CREATE TABLE innodb_lock_monitor(a int) ENGINE=INNODB;

On T1 and T2 execute:

set autocommit=0; begin;

On T1 execute:

SELECT 1 AS expression FROM cache_menu cache_menu WHERE ( (cid = 'test:one' ) ) FOR UPDATE;

Next, execute on T2:

SELECT 1 AS expression FROM cache_menu cache_menu WHERE ( (cid = 'test:two') ) FOR UPDATE;

Execute SHOW ENGINE INNODB STATUS on LOG1; note the supremum lock.

---TRANSACTION 1FDBC05, ACTIVE 3 sec
2 lock struct(s), heap size 320, 1 row lock(s)
MySQL thread id 391, OS thread handle 0xa77ffb40, query id 2074 localhost root
TABLE LOCK table `XXX`.`cache_menu` trx id 1FDBC05 lock mode IX
RECORD LOCKS space id 0 page no 19215 n bits 88 index `PRIMARY` of table `XXX`.`cache_menu` trx id 1FDBC05 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

---TRANSACTION 1FDBC04, ACTIVE 23 sec
2 lock struct(s), heap size 320, 1 row lock(s)
MySQL thread id 9, OS thread handle 0xb6f49b40, query id 2071 localhost root
TABLE LOCK table `XXX`.`cache_menu` trx id 1FDBC04 lock mode IX
RECORD LOCKS space id 0 page no 19215 n bits 88 index `PRIMARY` of table `XXX`.`cache_menu` trx id 1FDBC04 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

Now, time to INSERT some data.

e.g. on T2:

INSERT INTO cache_menu (cid, data) VALUES('test:two', '');

You'll notice that this query blocks, waiting. From LOG1:

TRANSACTIONS
------------
Trx id counter 1FDBC06
Purge done for trx's n:o < 1FDBC04 undo n:o < 0
History list length 1041
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 1FDBC05, ACTIVE 111 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 320, 2 row lock(s)
MySQL thread id 391, OS thread handle 0xa77ffb40, query id 2076 localhost root update
INSERT INTO cache_menu (cid, data) VALUES ('test:two', '')
------- TRX HAS BEEN WAITING 6 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 19215 n bits 88 index `PRIMARY` of table `XXX`.`cache_menu` trx id 1FDBC05 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

------------------
TABLE LOCK table `XXX`.`cache_menu` trx id 1FDBC05 lock mode IX
RECORD LOCKS space id 0 page no 19215 n bits 88 index `PRIMARY` of table `XXX`.`cache_menu` trx id 1FDBC05 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

RECORD LOCKS space id 0 page no 19215 n bits 88 index `PRIMARY` of table `XXX`.`cache_menu` trx id 1FDBC05 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

---TRANSACTION 1FDBC04, ACTIVE 131 sec
2 lock struct(s), heap size 320, 1 row lock(s)
MySQL thread id 9, OS thread handle 0xb6f49b40, query id 2071 localhost root
TABLE LOCK table `XXX`.`cache_menu` trx id 1FDBC04 lock mode IX
RECORD LOCKS space id 0 page no 19215 n bits 88 index `PRIMARY` of table `XXX`.`cache_menu` trx id 1FDBC04 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

On T1:

INSERT INTO cache_menu (cid, data) VALUES('test:one', '');

InnoDB aborts with a deadlock. Info from LOG1:

LATEST DETECTED DEADLOCK
------------------------
120703 17:01:02
*** (1) TRANSACTION:
TRANSACTION 1FDBC05, ACTIVE 172 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 320, 2 row lock(s)
MySQL thread id 391, OS thread handle 0xa77ffb40, query id 2078 localhost root update
INSERT INTO cache_menu (cid, data) VALUES ('test:two', '')
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 19215 n bits 88 index `PRIMARY` of table `XXX`.`cache_menu` trx id 1FDBC05 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** (2) TRANSACTION:
TRANSACTION 1FDBC04, ACTIVE 192 sec inserting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 320, 2 row lock(s)
MySQL thread id 9, OS thread handle 0xb6f49b40, query id 2079 localhost root update
INSERT INTO cache_menu(cid, data ) VALUES ('test:one', '')
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 19215 n bits 88 index `PRIMARY` of table `XXX`.`cache_menu` trx id 1FDBC04 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 19215 n bits 88 index `PRIMARY` of table `XXX`.`cache_menu` trx id 1FDBC04 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** WE ROLL BACK TRANSACTION (2)

If you try it, a first INSERT on T1 also blocks on acquiring the insert intention lock.

All on MySQL server 5.5.24-0ubuntu0.12.04.1

I have no more time to look into this further at the moment, hence the info dump.

Heine’s picture

Title: DEADLOCK errors on {history} {search_index} etc. » DEADLOCK errors on MergeQuery INSERT due to InnoDB gap locking when condition in SELECT ... FOR UPDATE results in 0 rows

After thinking about it: it makes sense for locking reads to prevent insertions into the table when a primary key has not been found, in order to keep SELECTs within the transaction returning the same information.

What to do? Re-issue the transaction?

NB Other deadlock causes may exist, claiming this issue for MergeQueries (and thus cache) on InnoDB.

Damien Tournoud’s picture

Well, that still doesn't explain why MySQL tries to promote a lock X to a lock X + insert intention on T1. It's my understanding that the latter is actually weaker.

Heine’s picture

Right, per http://dev.mysql.com/doc/refman/5.6/en/innodb-locks-set.html it seems it always tries to get the intention lock, even though it might not be necessary (except for performance reasons).

As it cannot acquire the lock (T2 is waiting before it in the queue), we have a deadlock.

edit, from http://lists.mysql.com/internals/21065:

The 'insert intention' lock is a tag that waits until the gap is cleared of the locks of other users. Then we can retry the insertion. But remember, the lock does not give us a permission to insert. In the gaps, locks are purely 'inhibitive'.

So, T1 *must* get an insert intention lock. If it just inserted, T2's repeatable read would possibly be violated.

Damien Tournoud’s picture

I pinged Michael Widenius (Monty) about this by email and he reported that this is the expected behavior. At this point, I'm not quite sure how to fix our MERGE query implementation. We could go back to an INSERT / if-it-fails-UPDATE strategy (aka INSERT OR UPDATE), but it cannot possibly result in exactly the same behavior in all cases of MERGE query.

At this point, I think we should just explain that you should run your MySQL server with binlog_format=row and innodb_locks_unsafe_for_binlog=1 (which is, despite its name, safe to use with row-level replication). Oh, and also require MySQL 5.1 or later, because MySQL 5.0 will mostly not work as intended.
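
For reference, that corresponds to something like the following in my.cnf (assuming MySQL 5.1+; as far as I know innodb_locks_unsafe_for_binlog is not a dynamic variable, so it needs a server restart):

```ini
[mysqld]
# Row-based binary logging; this is what makes the setting below safe.
binlog_format = row
# Disables InnoDB gap locking (despite the name, safe with row-based replication).
innodb_locks_unsafe_for_binlog = 1
```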

Heine’s picture

Because a new MergeQuery implementation is quite difficult, what needs to be done is (from DamZ on IRC):

- test scenarios with innodb_locks_unsafe_for_binlog=1
- implement a hook_requirements that checks the variable and provides feedback
- provide documentation to alert people using replication that they need to use row-level replication (and thus MySQL 5.1+)
- provide documentation that MySQL 5.0 is only supported for trivial setups
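
The check in the second bullet could be sketched like this (hypothetical function names, not the committed patch; the pass condition is: binary logging off, or binlog_format is ROW, since row-based logging is what makes READ COMMITTED usable on MySQL 5.1+):

```php
<?php
// Sketch of the binlog requirements check described above. The requirement
// passes when binary logging is disabled, or when it is enabled and
// binlog_format is ROW. Function name is hypothetical.
function _binlog_settings_ok($log_bin_enabled, $binlog_format) {
  return !$log_bin_enabled || strtoupper($binlog_format) === 'ROW';
}

// In hook_requirements() one would feed it the server variables, e.g.:
//   $log_bin = db_query('SELECT @@log_bin')->fetchField();
//   $format  = db_query('SELECT @@binlog_format')->fetchField();
//   if (!_binlog_settings_ok($log_bin, $format)) {
//     // Raise a REQUIREMENT_ERROR with guidance to set binlog_format=row.
//   }
```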

cheyer’s picture

innodb_locks_unsafe_for_binlog=1 did not work for me.

We have lots of users filling in their user profile fields (including profile images), and I still see the serialization error here and there:

PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction: UPDATE {users} SET name=:db_update_placeholder_0, pass=:db_update_placeholder_1,

wnl’s picture

We are seeing this issue (I am fairly certain) on one of our sites. Unfortunately we are using Amazon's RDS (with master/slave replication), and it doesn't allow us to change either binlog_format or innodb_locks_unsafe_for_binlog. It sounds like there is no general solution coming from Drupal in the near future, which means our only way to resolve this is to set up our own servers and abandon RDS, correct?

Damien Tournoud’s picture

Supposing that RDS runs a fairly recent MySQL version, you can probably change the transaction isolation to READ COMMITTED. In your settings.php, you should be able to specify something like this:

$databases['default']['default'] = array(
  // ...
  'init_commands' => array(
    'SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED',
  ),
);
vinmassaro’s picture

Seeing this as well occasionally with Drush:

[exec] WD php: PDOException: SQLSTATE[40001]: Serialization failure: 1213  [error]
[exec] Deadlock found when trying to get lock; try restarting transaction:
[exec] DELETE FROM {cache} 
[exec] WHERE  (cid LIKE :db_condition_placeholder_0 ESCAPE '\\') ; Array
[exec] (
[exec]     [:db_condition_placeholder_0] => schema:%
[exec] )
[exec]  in cache_clear_all() (line 176 of
[exec] /data01/d7/dev.d7.yale.edu/includes/cache.inc).
[exec] Drush command terminated abnormally due to an unrecoverable error.  [error]
jhodgdon’s picture

Issue tags: +Random test failure

I have seen two random test failures in the last 24 hours due to this error, so it needs to be fixed. I also found several other issues in the core queue that are duplicates of this one. I guess it's already at "major"; looks like it should stay that way...

catch’s picture

This is bad enough I think we should bump it to critical, at least for Drupal 8.

Marked #1862222: Random deadlock issues in simpletest as duplicate.

chx’s picture

Status: Active » Needs review
FileSize
843 bytes

Status: Needs review » Needs work

The last submitted patch, 937284_35.patch, failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
824 bytes

That's too verbose :) I did think about adding a set array with key-value pairs, but it'd be slower to process, so this is it.

Status: Needs review » Needs work

The last submitted patch, 937284_36.patch, failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
824 bytes

Sorry. Note that innodb_locks_unsafe_for_binlog is deprecated in MySQL 5.6.3 so we can't use that.

chx’s picture

Status: Needs review » Needs work

Talked to Damien: we need a requirements check that passes if a) binlog is off, or b) binlog_format is ROW, because changing it requires SUPER privileges: http://dev.mysql.com/doc/refman/5.1/en/binary-log-setting.html

To change the global binlog_format value, you must have the SUPER privilege. This is also true for the session value as of MySQL 5.1.29.

And you need ROW because of http://dev.mysql.com/doc/refman/5.1/en/set-transaction.html

As of MySQL 5.1, if you use READ COMMITTED [...] you must use row-based binary logging.

I think we might need to bump the minimum to 5.0.40, however, because of http://bugs.mysql.com/bug.php?id=21409. There are 32 MySQL 5.0 InnoDB bugs relating to READ COMMITTED, and I can't find anything in 5.0 later than 5.0.40.

For 5.1, we need to make sure that if someone runs 5.1, it's at least 5.1.30; see http://bugs.mysql.com/bug.php?id=40386 and http://bugs.mysql.com/bug.php?id=40360.

catch’s picture

Priority: Major » Critical

Didn't actually change status.

chx’s picture

The other solution, proposed by David Strauss and pointed out by Damien, is to drop FOR UPDATE: http://drupal.org/node/715108#comment-3159442

chx’s picture

Version: 8.x-dev » 7.x-dev
Status: Needs work » Needs review
FileSize
624 bytes

Because we will need extensive testing, and that won't happen in D8, I am breaking process and posting a D7 patch first.

yched’s picture

Issue tags: -Random test failure

Just a note that we seem to have other random testbot fails regarding "Deadlock found", but on INSERT queries - #1862222: Random deadlock issues in simpletest
http://qa.drupal.org/pifr/test/406773 is an example.

Berdir’s picture

I tried, but wasn't able to reproduce this, on both my desktop and my considerably slower laptop. Both are running MySQL 5.5.28, which could be the reason?

The only thing I can confirm is that the same test script I was using to force some locks here works fine with the patch from #43.

I think this is the fix for #1862222: Random deadlock issues in simpletest, so we should probably move it back to 8.x and get a patch for it?

It's a 2-line patch that is trivial to revert, so my suggestion would be to just commit it to 8.x and watch whether we still get deadlocks...

Edit: Here's the snippet that I used and called with 10 parallel requests:

  // Store some big data to slow down the write.
  $data = str_repeat('DEADLOCK!!!!', 1000);

  try {
    for ($i = 1; $i < 1000; $i++) {
      state()->set('deadlock_test', $data);
    }
  }
  catch (Exception $e) {
    watchdog_exception('deadlock', $e);
  }

I was also experimenting with one that used 'deadlock_test' . rand(0, 1000000) as the key, due to what Heine said about missing keys.

chx’s picture

Version: 7.x-dev » 8.x-dev
FileSize
654 bytes
catch’s picture

Status: Needs review » Reviewed & tested by the community

I'm going to commit this to 8.x (once it comes back green) to see if it removes the random deadlocks from simpletest. While we're seeing how it runs we can see if there's a better option, but looking through this issue and the linked ones I'm not sure there is.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 937284_46.patch, failed testing.

Fabianx’s picture

#46: That's rather nice, we now get a direct test failure for all previous deadlocks, it seems.

Anonymous’s picture

http://privatepaste.com/d68fd36556

perhaps we need a rollback:

catch (Exception $e) {
  // The insert query failed, maybe it's because a racing insert query
  // beat us in inserting the same row. Retry the select query, if it
  // returns a row, ignore the error and continue with the update
  // query below.
  $transaction->rollback();
  if (!$select->execute()->fetchField()) {
    throw $e;
  }
}
chx’s picture

Status: Needs work » Needs review
FileSize
3.78 KB
1.39 KB

After a lot of testing and discussion: if we are to lose the FOR UPDATE, the transaction becomes moot. I included a do-not-test patch rolled with diff -b for easier review.

jthorson’s picture

Status: Needs review » Reviewed & tested by the community

RTBC based on #47 (assuming we had the right root cause, this should bring a green result).

Status: Reviewed & tested by the community » Needs work
Issue tags: -Needs backport to D7

The last submitted patch, 937284_51.patch, failed testing.

tim.plunkett’s picture

Status: Needs work » Needs review

#51: 937284_51.patch queued for re-testing.

EDIT: that was not a deadlock failure, it was

Found field with name 'user_admin_role' and value 3 Browser UserRoleUpgradePathTest.php 71 Drupal\system\Tests\Upgrade\UserRoleUpgradePathTest->testRoleUpgrade()

Status: Needs review » Needs work
Issue tags: +Needs backport to D7

The last submitted patch, 937284_51.patch, failed testing.

tim.plunkett’s picture

Status: Needs work » Needs review
FileSize
4.96 KB

So I was able to reproduce locally, there was actually an error on the page, nothing specific to roles.

PDOException: SQLSTATE[42S22]: Column not found: 1054 Unknown column 'tags' in 'field list' in menu_local_tasks() (line 1909 of /Users/tim/www/d8/core/includes/menu.inc).

So I put $this->assertResponse(200) in there for good measure.

And then, I was trying to print out that query, and lo and behold not executing it, and letting the foreach trigger that, "fixed" it?!
I don't understand.

EDITED
Yeah I must be on drugs. foreach does not do anything of what I just said above. The test passed locally because that change meant the query wasn't run.

BUT! It does help remove the red herring of the role, and I can debug more.

Status: Needs review » Needs work

The last submitted patch, deadlock-937284-56.patch, failed testing.

chx’s picture

My Rackspace instance DOES manage to reproduce the testbot fail. Yay! I will debug but if you want access find me and drop me an SSH key.

chx’s picture

Status: Needs work » Needs review
FileSize
6.27 KB

There are so many bugs here it makes my head dizzy. The main culprit is that system_update_8007() passed prefixed tables to db_field_exists(), which expects non-prefixed ones, which means the cache tables didn't get a tags column. The merge query from cache however ran happily... I suspect #1872020: Insert throws the wrong exception is what's going on. Meanwhile, #1158322-104: Add backtrace to all errors makes debugging an interesting challenge, combined with the fact that debug_backtrace on the minimal PHP version doesn't support IGNORE_ARGS. We really badly need to move to 5.3.10.

Anyways. We revert to using a fixed list of cache bins and provide a helper so contrib can update their cache bins. Instead of writing five new update functions I just fire all eleven core bins from system. It won't hurt, because there is (there needs to be) a db_table_exists() check in the helper.

Thanks goes to berdir for pointing out the fixed list as an approach (and Carl Orff and Corvus Corvax for making awesome music out of Carmina Burana that kept me awake till 4am).

chx’s picture

Note that salvaging the "let's find cache tables" routine is not impossible, but it is very challenging due to the possibility of per-table prefixes. You would need to go over the per-table prefixes (if any) where the table starts with 'cache' and build a mapping of the prefixed table to the non-prefixed one. Then compare the found tables to this map, and if not found, remove the default prefix if there is one. Then you would need to somehow test this. I vote against it. (Oh, and it's not exactly forbidden to use prefixes to make two cache tables point to the same table! It's madness, but it'd not always be broken. Really, let's not try.)

Alternatively we could patch db_field_exists and db_add_field to work with prefixed tables. That's better... but it's quite a bit of work for not a lot of benefit.

Status: Needs review » Needs work
Issue tags: -Needs backport to D7

The last submitted patch, 937284_59.patch, failed testing.

chx’s picture

Status: Needs work » Needs review
Issue tags: +Needs backport to D7

#59: 937284_59.patch queued for re-testing.

Edit: it was a lock test fail, could acquire a second lock in the same request. Hope it's unrelated.

Berdir’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
566 bytes
6.32 KB

Here is a small update to improve the docs on that function.

I'm not sure about the function name, but we also have system_schema_cache_8007(), so it kind of makes sense.

This will need a change notification, either an update somewhere or a new one, to inform contrib that they will need to update their cache bins.

webchick’s picture

Version: 8.x-dev » 7.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

I confess I don't follow everything in here, but I know catch looked at a slightly earlier version and said it was good to go. I'm also told this might help cut down on random testbot failures.

Committed and pushed to 8.x, but this needs some more discussion and testing before committing to D7.

deviantintegral’s picture

Status: Patch (to be ported) » Needs review
FileSize
3.54 KB

Here's a backport to 7.x of 532ea17 (#64), dropping the changes to the cache tables. I resurrected the site that I was able to replicate this on and the patch totally fixes the deadlocks. Excellent! I look forward to maxing out many cores with devel generate :)

Peter Bowey’s picture

#65 works fine in Drupal 7.18
Tested on a medium size database site

chx’s picture

I am not RTBCing this because #1792536: Remove the install backend and stop catching exceptions in the default database cache backend needed to change how Merge retries in order to get tests to pass once the exception catch from cache -- the main Merge consumer in core -- is removed. I recommend moving that change over; search for retry in the patch.

YesCT’s picture

Status: Needs review » Needs work

NW for #67

For clarification, should this wait on #1792536: Remove the install backend and stop catching exceptions in the default database cache backend?
Or should that be closed and totally incorporated into this?

hass’s picture

Status: Needs work » Needs review
YesCT’s picture

#65: 937284.65-deadlock-merge.patch queued for re-testing.

mcrittenden’s picture

Just a note that #1792536: Remove the install backend and stop catching exceptions in the default database cache backend is fixed now, but only for D8 and a backport isn't being discussed, so I'm not sure how it affects this. Chx?

Gastonia’s picture

I can confirm that #65 did not work for me. Running 7.19 with the patch applied. Like many others I am using drush scripts simultaneously to batch-save nodes, but this comes up as well when users try to post at the same time.

I also applied the mysql configuration settings as suggested above, and that did not work for me either.

PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction: INSERT INTO {location_instance} (nid, vid, uid, genid, lid) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4); Array ( [:db_insert_placeholder_0] => 31451 [:db_insert_placeholder_1] => 31451 [:db_insert_placeholder_2] => 0 [:db_insert_placeholder_3] => cck:field_location:31451 [:db_insert_placeholder_4] => 25092 ) in location_save_locations() (line 974 of /var/www/cityposted.com/httpdocs/sites/all/modules/location/location.module).

I have written about this before, most notably here:
https://drupal.org/node/1369332 starting with #45. There it was suggested that I was having a problem with the Location module, although this has happened to me on other tables as well. It's still unclear to me whether this fix targets specific tables, or whether it should eventually fix all deadlock problems regardless of table or module.

EDIT: Above I had not hit the save button on the my.cnf file before running the test.

Setting the mysql binary log and innodb settings as mentioned above seems to have greatly improved the situation. My scripts have been running for an hour straight, non-stop, with no errors. I'll crank it up a few notches and report back.

Anyone know why those settings make a difference?

chx’s picture

If #65 doesn't work for you, I can't help. That was our best generic effort. Can you try #39 however?

mikeytown2’s picture

Without changing mysql settings #65 does work for me. Trying it out with 12 apache processes doing user_save across 20k users. Before the patch in #65 drupal would error out on me. I would say RTBC for #65.

Jooblay.net’s picture

Great work, Drupal community. Thanks to everyone...

mikeytown2’s picture

I did encounter a deadlock even with the patch in #65, but the frequency of occurrences has been greatly reduced. In this case I would classify the deadlock as a bug caused by abuse of the variables table: #1903498: PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; .

Still see this as RTBC.

chx’s picture

Status: Needs review » Reviewed & tested by the community

Go with it then. It's better for sure.

David_Rothstein’s picture

Status: Reviewed & tested by the community » Needs work
-        catch (Exception $e) {
-          // The insert query failed, maybe it's because a racing insert query
....
+      catch (IntegrityConstraintViolationException $e) {
+        // The insert query failed, maybe it's because a racing insert query

This can't be working correctly - that class doesn't exist in Drupal 7.

Otherwise, the overall patch makes sense to me (and awesome research in this issue!), though removing transactions has the potential to cause as many problems as it fixes... Only a couple of people have commented above that they actually tested the patch in any scenario. Has anyone else in the interim?
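Since Drupal 7 has no IntegrityConstraintViolationException class, a backport would need to catch something that does exist there. A hedged sketch of the pattern (the $insert/$update query objects are placeholders, not the patch's actual variables):

```php
try {
  // Attempt the INSERT branch of the merge first; a racing request may
  // already have inserted the row.
  $insert->execute();
}
catch (PDOException $e) {
  // SQLSTATE class '23' covers integrity constraint violations such as a
  // duplicate key from a racing insert; anything else should propagate.
  if (strpos((string) $e->getCode(), '23') !== 0) {
    throw $e;
  }
  $update->execute();
}
```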

mikeytown2’s picture

For what it's worth I've been using this patch in production on a mainly user authenticated site for the last month.

Anybody’s picture

The patch works great for me. Any plans to bring it into Drupal 7 Core soon?

Damien Tournoud’s picture

Also, I don't know why the transaction was removed here, but I think we should restore it.

Fabianx’s picture

#81: Removing the transaction is the whole point of the patch ...

#80: The patch needs work as it is trying to catch an Exception that does not exist. (as per #78)

Damien Tournoud’s picture

@Fabianx: no, removing the ->forUpdate() is the point of the patch. The transaction is useful and should be kept.

Berdir’s picture

Without the forUpdate(), doing the transaction becomes useless. I unfortunately can't find the discussion in here, but it was discussed (with David Strauss among others, I think); see #51.

We had patches without forUpdate() and the transaction, and it still blew up. Note that we later found and fixed the source of the problem for 8.x (each tearDown() cleared the persistent caches of the parent site), but similar things can still happen on real sites with a lot of requests trying to write caches.

jthorson’s picture

The privatepaste in #50 (now expired) showed why the transaction was being removed... IIRC (and these are some foggy memories!), we were doing some mysql testing and had identified that a race condition between two simultaneous transactions (i.e. which happened on testbot due to concurrent runs) would result in unexpected behaviour. I don't have the exact details, but it was related to having two transactions: inserting into one, then issuing the same insert in the other, would hang until the first transaction was closed; after which a read in the second still gave you the pre-transaction values (presumably because that was the db state when the second transaction started, even though the value had been modified by a different transaction in the meantime).

In addition, after removing the forUpdate(), I think we were down to a single db call ... which made the transaction redundant.

Not sure if chx and beejeebus may have a better memory than I. :)

Fabianx’s picture

#85 Yes, I remember this like you. The transaction definitely led to the deadlock.

fishkaz’s picture

Status: Needs work » Needs review

#59: 937284_59.patch queued for re-testing.

Fabianx’s picture

FWIW, I encountered this bug reproducibly on a production system.

xmacinfo’s picture

Any traction here? We're having deadlock issues with FOR UPDATE on Drupal 7.23.

Anonymous’s picture

Shame to keep this issue stalled on an invalid exception type. Reverted it to the original catch-all Exception.

chx’s picture

Status: Needs review » Reviewed & tested by the community

let's do this. Consider https://drupal.org/node/715108#comment-3159540 as the main endorsement for dropping FOR UPDATE: it buys nothing.

xmacinfo’s picture

We had to temporarily switch the table type to MyISAM to prevent the lock. Will be glad to switch back to InnoDB once fixed.

vinmassaro’s picture

Is this patch supposed to be used in conjunction with changes to database configuration (READ-COMMITTED or innodb_locks_unsafe_for_binlog)? We've been load testing one of our sites with a fast user login test via CAS authentication, 100 users every 15 seconds, ramping up to 500 simultaneous logins over a few minutes. We use LDAP to map a few fields to their user profile upon login. During testing, we see deadlocks like this:

---TRANSACTION 2B6E04D6, ACTIVE 51 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 376, 1 row lock(s)
MySQL thread id 29003, OS thread handle 0x7feddf0c3700, query id 5470792 127.0.0.1 d770a41b22e3238d statistics
SELECT 1 AS expression
FROM
cache cache
WHERE ( (cid = 'ldap_user_synch_mapping') ) FOR UPDATE
------- TRX HAS BEEN WAITING 50 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 835345 page no 26 n bits 72 index "PRIMARY" of table "d770a41b22e3238d"."cache" trx id 2B6E04D6 lock_mode X locks rec but not gap waiting
------------------
TABLE LOCK table "d770a41b22e3238d"."cache" trx id 2B6E04D6 lock mode IX
RECORD LOCKS space id 835345 page no 26 n bits 72 index "PRIMARY" of table "d770a41b22e3238d"."cache" trx id 2B6E04D6 lock_mode X locks rec but not gap waiting

After changing isolation level to READ-COMMITTED and running the test again, deadlocks are greatly reduced, going from ~150+ down to 2. This was at the recommendation of Percona support. We thought this change was a panacea until they brought up that we could run into this bug since some tables set up by contrib modules across many of our sites (this is a non-multisite, managed university Drupal environment) do not have primary keys, and this could cause replication to get out of sync.

We reverted READ-COMMITTED and ran our test again using this patch but still saw nearly as many deadlocks as our first test. Wondering if anyone has any ideas. Thanks in advance.

chx’s picture

We do not touch the isolation level, see #40 for more.

JordanMagnuson’s picture

I'm getting a number of deadlocks coming from node_tag_new() (node.module) when trying to db_merge('history').

Example watchdog error:

PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction: INSERT INTO {history} (uid, nid, timestamp) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2); Array ( [:db_insert_placeholder_0] => 17701 [:db_insert_placeholder_1] => 1450 [:db_insert_placeholder_2] => 1383542928 ) in node_tag_new() (line 314 of /srv/www/pixelscrapper.com/public_html/modules/node/node.module).
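For context, the failing statement comes from a merge roughly like the following (paraphrased from Drupal 7's node_tag_new()); db_merge() first tries the INSERT shown in the error and falls back to an UPDATE when the row already exists:

```php
db_merge('history')
  ->key(array(
    'uid' => $user->uid,
    'nid' => $node->nid,
  ))
  ->fields(array('timestamp' => REQUEST_TIME))
  ->execute();
```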

Here's a screenshot of the full watchdog output: http://i.imgur.com/oRWe5ef.png

Here's output from SHOW ENGINE INNODB STATUS:

=====================================
131104  9:50:44 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 16 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 1208256 1_second, 1208188 sleeps, 116373 10_second, 52888 background, 52885 flush
srv_master_thread log flush and writes: 1276576
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 221157, signal count 280014
Mutex spin waits 1688909, rounds 2390109, OS waits 22938
RW-shared spins 156125, rounds 4004392, OS waits 129266
RW-excl spins 13422, rounds 2196498, OS waits 66859
Spin rounds per wait: 1.42 mutex, 25.65 RW-shared, 163.65 RW-excl
------------------------
LATEST DETECTED DEADLOCK
------------------------
131104  9:31:47
*** (1) TRANSACTION:
TRANSACTION 2DD02DF6, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 376, 2 row lock(s)
MySQL thread id 833230, OS thread handle 0x7f9ba6e0c700, query id 143861100 localhost pixelscrapper update
INSERT INTO semaphore (name, value, expire) VALUES ('variable_init', '16787998065277da037db949.04680936', '1383586308.5141')
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 5518 page no 3 n bits 120 index "PRIMARY" of table "pixelscrapper"."semaphore" trx id 2DD02DF6 lock_mode X locks rec but not gap waiting
Record lock, heap no 47 PHYSICAL RECORD: n_fields 5; compact format; info bits 32
0: len 13; hex 7661726961626c655f696e6974; asc variable_init;;
1: len 6; hex 00002dd02de4; asc   - - ;;
2: len 7; hex 2300000140221f; asc #   @" ;;
3: len 30; hex 3131393039333632313135323737646130333362636639382e3833333232; asc 11909362115277da033bcf98.83322; (total 33 bytes);
4: len 8; hex bc740f81f69dd441; asc  t     A;;

*** (2) TRANSACTION:
TRANSACTION 2DD02DF3, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 376, 2 row lock(s)
MySQL thread id 833229, OS thread handle 0x7f9ba6ed0700, query id 143861091 localhost pixelscrapper update
INSERT INTO semaphore (name, value, expire) VALUES ('variable_init', '9968984195277da037c3f71.61259930', '1383586308.5056')
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 5518 page no 3 n bits 120 index "PRIMARY" of table "pixelscrapper"."semaphore" trx id 2DD02DF3 lock mode S locks rec but not gap
Record lock, heap no 47 PHYSICAL RECORD: n_fields 5; compact format; info bits 32
0: len 13; hex 7661726961626c655f696e6974; asc variable_init;;
1: len 6; hex 00002dd02de4; asc   - - ;;
2: len 7; hex 2300000140221f; asc #   @" ;;
3: len 30; hex 3131393039333632313135323737646130333362636639382e3833333232; asc 11909362115277da033bcf98.83322; (total 33 bytes);
4: len 8; hex bc740f81f69dd441; asc  t     A;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 5518 page no 3 n bits 120 index "PRIMARY" of table "pixelscrapper"."semaphore" trx id 2DD02DF3 lock_mode X locks rec but not gap waiting
Record lock, heap no 47 PHYSICAL RECORD: n_fields 5; compact format; info bits 32
0: len 13; hex 7661726961626c655f696e6974; asc variable_init;;
1: len 6; hex 00002dd02de4; asc   - - ;;
2: len 7; hex 2300000140221f; asc #   @" ;;
3: len 30; hex 3131393039333632313135323737646130333362636639382e3833333232; asc 11909362115277da033bcf98.83322; (total 33 bytes);
4: len 8; hex bc740f81f69dd441; asc  t     A;;

*** WE ROLL BACK TRANSACTION (2)
------------
TRANSACTIONS
------------
Trx id counter 2DD1140C
Purge done for trx's n:o < 2DD0D24C undo n:o < 0
History list length 2554
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 834270, OS thread handle 0x7f9ba9960700, query id 143997008 localhost root
SHOW ENGINE INNODB STATUS
---TRANSACTION 2DD1140B, ACTIVE 0 sec
mysql tables in use 1, locked 0
MySQL thread id 834213, OS thread handle 0x7f9ba6a69700, query id 143997006 localhost pixelscrapper Sorting result
SELECT alias FROM url_alias WHERE source = 'node/3676' AND language IN ('en', 'und') ORDER BY language ASC, pid DESC
Trx read view will not see trx with id >= 2DD1140C, sees < 2DD1140C
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (read thread)
I/O thread 4 state: waiting for i/o request (read thread)
I/O thread 5 state: waiting for i/o request (read thread)
I/O thread 6 state: waiting for i/o request (write thread)
I/O thread 7 state: waiting for i/o request (write thread)
I/O thread 8 state: waiting for i/o request (write thread)
I/O thread 9 state: waiting for i/o request (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 1
44079 OS file reads, 19873198 OS file writes, 11847308 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 19.94 writes/s, 11.75 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 525, seg size 527, 4404 merges
merged operations:
insert 5492, delete mark 4193, delete 129
discarded operations:
insert 0, delete mark 0, delete 0
Hash table size 8850487, node heap has 4878 buffer(s)
10304.61 hash searches/s, 6506.66 non-hash searches/s
---
LOG
---
Log sequence number 212858222496
Log flushed up to   212858222426
Last checkpoint at  212855358506
0 pending log writes, 0 pending chkp writes
9841917 log i/o's done, 9.87 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 4395630592; in additional pool allocated 0
Dictionary memory allocated 4761575
Buffer pool size   262144
Free buffers       174227
Database pages     83039
Old database pages 30654
Modified db pages  744
Pending reads 0
Pending writes: LRU 0, flush list 1, single page 0
Pages made young 659, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 54614, created 28641, written 14754970
0.00 reads/s, 0.00 creates/s, 26.25 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 83039, unzip_LRU len: 0
I/O sum[938]:cur[72], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread process no. 11884, id 140301130860288, state: flushing buffer pool pages
Number of rows inserted 6564509, updated 7838295, deleted 5529746, read 4662558497
6.62 inserts/s, 9.06 updates/s, 3.31 deletes/s, 20585.34 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
mikeytown2’s picture

mikeytown2’s picture

Been playing around with deadlocks today, trying to figure out a good place to put transaction retry logic: #476048-17: MySQL transient error handling. Haven't found a good place yet. But I'm able to repro deadlocks (with the latest patch in #90 applied) when using the database as a cache backend with multiple threads doing node saves while enabling/disabling modules. This got me thinking about clearing/setting other things inside of a transaction. It seems a good way to trigger a deadlock is to run multiple threads of a function that calls db_transaction(): node_save() starts a transaction, and enabling a module triggers registry_update(); thus the repro case.

I have an interesting idea: what if the core database cache backend used its own copy of the database connection, so that cache_set/clear calls do not happen inside of a transaction? If using something like memcache, these cache sets/clears would not be rolled back as part of the transaction. The same applies to lock.inc. More info on creating database connections: https://drupal.org/node/18429. This would place these db operations (create, insert, update, delete) outside of the transaction, as a different database connection would be used for the cache and for locks. We might want to expand it to other things as well if needed; but I know this should be ok, as this is effectively what memcache does.
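A sketch of what that could look like in settings.php: define an extra target on the default connection and hand it to the cache backend. The 'cache' target name and the credentials here are illustrative assumptions:

```php
// settings.php: a second target on the default connection. Queries sent
// through it run on their own MySQL connection, outside any transaction
// opened on the default target.
$databases['default']['cache'] = array(
  'driver' => 'mysql',
  'database' => 'drupal',
  'username' => 'drupal',
  'password' => 'secret',
  'host' => 'localhost',
);

// Somewhere in the cache backend (sketch):
$connection = Database::getConnection('cache');
```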

David_Rothstein’s picture

Issue summary: View changes
Status: Reviewed & tested by the community » Fixed
Issue tags: +7.25 release notes

Alright, based on the additional research and testing (thanks for that!) it sounds like we can go with #90. Complicated issue, so hopefully it doesn't backfire on us :)

Committed to 7.x - thanks! http://drupalcode.org/project/drupal.git/commit/70e0b16

Sounds like there could be additional improvements here (based on the last couple of comments above) but I suppose those should be in new issues?

mikeytown2’s picture

Thanks for getting #90 in :)

Followup issue based off my observations in #97
#2164849: Enforce READ COMMITTED transaction isolation level
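For Drupal 7, the isolation level can be requested per connection from settings.php via init_commands, without touching my.cnf; a hedged sketch (note that with binary logging enabled, READ COMMITTED requires row-based or mixed binlog_format):

```php
// settings.php: ask MySQL for READ COMMITTED on every new connection.
$databases['default']['default']['init_commands'] = array(
  'isolation' => "SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED",
);
```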

jerdiggity’s picture

If it helps, here are the details of my recent encounter with the same (or similar) error which occurred while running update.php on a D7 production site. The only updates being applied were:

  • system module
    • 7079 - Convert the 'filesize' column in {file_managed} to a bigint.
  • node module
    • 7014 - Add an index on {node}.language.

Toward the end of applying the updates I was redirected to update.php?op=finished (as opposed to the normal update.php?op=results), at which point I was shown this message:

Error message
PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction: DELETE FROM {cache_bootstrap} WHERE (cid = :db_condition_placeholder_0) ; Array ( [:db_condition_placeholder_0] => system_list ) in cache_clear_all() (line 165 of /var/www/somesite/includes/cache.inc).

NB: the main reason I'm adding this comment to this particular (fixed) issue is that it's still tagged with needs backport to D7.

mikeytown2’s picture

@jerdiggity
This has been fixed in Drupal 7.25 as #98 says. So if you are still experiencing deadlock issues and you are using 7.25, you should follow the issue linked in #99.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

pounard’s picture

Maybe a stupid question, but why don't the merge queries use the INSERT ... ON DUPLICATE KEY ... syntax on MySQL? Forget it, I found the explanation elsewhere!

pwolanin’s picture

Version: 7.x-dev » 6.x-dev
Status: Closed (fixed) » Patch (to be ported)
Issue tags: +Needs backport to D6

Any reason this shouldn't be applied to Drupal 6 also?

pwolanin’s picture

Version: 6.x-dev » 7.x-dev
Status: Patch (to be ported) » Closed (fixed)

Never mind - we are seeing some insert deadlocks in 6, but the code is quite different.