SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction
Re-installed but wasn't able to reproduce the error; was going to test whether #926426: system_check_http_request() causes deadlock on single threaded webserver was related >.O
PHP: 5.2.5, www.entropy.ch Release 6
Mysql: 5.0.88
(Apache, os-x or whatever this mac's operating system is)
Comment | File | Size | Author |
---|---|---|---|
#90 | resolve_merge_deadlock-937284-90.patch | 3.51 KB | Anonymous (not verified) |
#65 | 937284.65-deadlock-merge.patch | 3.54 KB | deviantintegral |
#63 | deadlock-merge-937284-63.patch | 6.32 KB | Berdir |
#63 | deadlock-merge-937284-63-interdiff.txt | 566 bytes | Berdir |
#59 | 937284_59.patch | 6.27 KB | chx |
Comments
Comment #1
drunken monkey commented

I receive the same error frequently on node pages when I try to open several nodes at once (with JavaScript, so really simultaneously, not one after another). The error isn't written in a normal drupal_set_message()-type message box but is seemingly output directly with echo, appearing either before any content or suddenly somewhere in the header of the HTML code, but always displayed above the actual site. The errors are also not logged with watchdog.

Once, even a fatal error occurred with about the same message (sadly, I didn't save it and now can't reproduce it), which for the first time mentioned the cause: an INSERT INTO {history} seems to be the problem, which is plausible (two simultaneous page requests could easily bring two threads to insert at the same time, and the history table is probably the only one being changed on simple page requests).

Anyway, this seems a bit more serious than stated above, since end users could easily receive the same message. Since it doesn't seem to follow any error-reporting rules, it probably also can't be turned off along with normal error reporting. Maybe something in PDO? Or only in some specific version?
This, in any case, is the system:
PHP 5.2.6-1+lenny9
Linux 2.6.26-2-amd64 x86_64
Apache web server
PDO Driver for MySQL, client library version 5.0.51a
Comment #2
hefox CreditAttribution: hefox commented

If I remember correctly, ditto on the errors displaying at the top of the screen instead of via drupal_set_message().

Updating the title to reflect what drunken monkey said.
Comment #3
Anonymous (not verified) CreditAttribution: Anonymous commented

hefox, drunken monkey: can one or both of you write up some steps to reproduce? Or, even better, a test that fails due to this bug?

Also, anyone else seeing this issue because they've encountered a DEADLOCK error: if possible, please run SHOW INNODB STATUS, because along with a lot of other output it shows information about the last deadlock. Pasting that output here would help us fix this issue.
Comment #4
drunken monkey commented

I don't think we'll be able to write a conventional test for this, as it requires (for me, at least) several simultaneous page requests, which a normal test case afaik can't do.
Steps to reproduce: Open several dozen node pages simultaneously.
E.g., with FireGestures (Firefox plugin), mark a list of links to different node pages and execute the "Open all links in selection" gesture.
Comment #5
mr.baileys commented

Ran into the same issue today on a site under very light load (not live). Unfortunately I don't have permission to run SHOW INNODB STATUS on the server.

Comment #6

mikeryan commented

FWIW, I hit this today on an INSERT into {node}. This was on a development server with no traffic; I was doing two migration imports in separate terminal sessions with drush, each importing into a different node type. Unfortunately, my MySQL account doesn't have full privileges, so I can't run SHOW ENGINE INNODB STATUS.
Tough to reproduce this sort of thing, let alone create automated tests... I'm not sure there's much to be done - try to catch the error and retry the offending query after a brief (small number of ms) pause?
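The catch-and-retry idea can be sketched generically; this is not core code, and the retry count, back-off, and helper name are arbitrary choices:

```php
<?php
// Sketch of catch-and-retry for deadlocked queries (not Drupal API;
// the function name, retry count, and back-off are illustrative).
function execute_with_retry(callable $query, $max_attempts = 3) {
  for ($attempt = 1; ; $attempt++) {
    try {
      return $query();
    }
    catch (PDOException $e) {
      // SQLSTATE 40001 / MySQL error 1213 means InnoDB rolled the
      // transaction back because of a deadlock; it is safe to retry.
      if ($e->getCode() != '40001' || $attempt >= $max_attempts) {
        throw $e;
      }
      usleep(10000 * $attempt); // brief pause before retrying
    }
  }
}
```

Note that when the deadlock aborts a whole transaction rather than a single statement, the entire transaction body has to be re-run, not just the last query.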
Comment #7
ParisLiakos CreditAttribution: ParisLiakos commented

Same here:
PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction: INSERT INTO {search_index} (word, sid, type, score) VALUES (..............
Comment #8
shabana.navas CreditAttribution: shabana.navas commented

Getting the same error:
PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction: INSERT INTO {search_index} (word, sid, type, score) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3); Array ( [:db_insert_placeholder_0] => upcoming [:db_insert_placeholder_1] => 6893 [:db_insert_placeholder_2] => node [:db_insert_placeholder_3] => 1 ) in search_index()
But these errors don't pop up on the pages; they can only be seen in the log messages. Should I be worried about it?
Comment #9
catch

Bumping to major; enough people are getting this under light enough load.
Comment #10
jbrown CreditAttribution: jbrown commented

See #879346: Transactions that are automatically rolled back due to deadlock are not reattempted.
Comment #11
catch

Looking through this, it is likely merge queries; see #910010: Potential race condition on variable_set() due to merge query implementation for some background.

Both the search_index and history tables get hit with a db_merge(), which uses SELECT ... FOR UPDATE.

{history} also has a ranged delete with no index (#165061: New indexes on flood.timestamp and history.timestamp for cron runs.). Looking at http://stackoverflow.com/questions/2332768/how-to-avoid-mysql-deadlock-f... the situation described there looks similar (and includes situations with frequent inserts that don't need a SELECT ... FOR UPDATE to trigger a deadlock).

The MySQL docs suggest using the READ COMMITTED transaction isolation level, but setting that is not an option for most people, so I'd rather see SELECT ... FOR UPDATE nuked if we can: http://dev.mysql.com/doc/refman/5.0/en/innodb-deadlocks.html
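For readers unfamiliar with the D7 MergeQuery, the statement sequence it issues looks roughly like this; a sketch only, with illustrative {history} column names and values, not code copied from core:

```sql
-- Rough sketch of what a db_merge() on {history} does under the hood
-- (illustrative columns/values; the exact SQL is an assumption).
START TRANSACTION;
-- Locking read: if no matching row exists, InnoDB locks the index *gap*
-- instead of a row, which is what lets two sessions deadlock later.
SELECT COUNT(*) FROM history WHERE uid = 5 AND nid = 42 FOR UPDATE;
-- if the count was > 0:
UPDATE history SET timestamp = UNIX_TIMESTAMP() WHERE uid = 5 AND nid = 42;
-- otherwise:
INSERT INTO history (uid, nid, timestamp) VALUES (5, 42, UNIX_TIMESTAMP());
COMMIT;
```

Two concurrent page views of a node each run this sequence for the same (uid, nid), which is the contention pattern the rest of the thread dissects.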
Comment #12
cangeceiro CreditAttribution: cangeceiro commented

I can also confirm this using apachesolr and Drupal 7, and only when using drush.
Comment #13
yang_yi_cn CreditAttribution: yang_yi_cn commented

I think this happens a lot on Drupal 6 too, if you set all db tables to use InnoDB. In my case it happens on the {cache} table all the time; not sure if it's because our two web servers are both trying to write to the cache table.
Comment #14
BeaPower CreditAttribution: BeaPower commented

This happened too in Drupal 7.8 when I tried to create a content type for Organic Groups, named "groups". Immediately I cleared the cache, and haven't seen the error again so far... hope it's gone for good.
Comment #15
Shane Birley CreditAttribution: Shane Birley commented

I am seeing this error in a D6 > D7 update, always into {cache_update} from the update.module.

Comment #16

Shane Birley CreditAttribution: Shane Birley commented

I am seeing this error in a D6 > D7 update, always into {cache_update} from the update.module.
Comment #17
WilliamB CreditAttribution: WilliamB commented

Just launched our Drupal site three days ago, and I noticed that error in the logs too.
Subscribing.
Comment #18
Damien Tournoud CreditAttribution: Damien Tournoud commented

I found one cause for this. Please see #1369332: Avoid deadlock issues on the {node} table.

There are other causes, notably due to the gap-locking feature. Those seem to go away after setting innodb_locks_unsafe_for_binlog = 1 in the MySQL configuration.

I have seen this type of pattern all the time:

... and I don't understand why MySQL would want to promote a lock_mode X into a lock_mode X insert intention (the latter being weaker, as far as I understand). We need input from MySQL people here.

Comment #19
basic CreditAttribution: basic commented

If we can get this reproduced, it would be nice to see if upgrading to the latest MariaDB 5.3 helps. Its updates to MySQL locking should help work around these deadlocks.
Comment #20
basic CreditAttribution: basic commented

I'm hoping the latest patch here fixes the history and watchdog deadlocks: #165061: New indexes on flood.timestamp and history.timestamp for cron runs.
Comment #21
geerlingguy CreditAttribution: geerlingguy commented

Subscribing. I had this happen when simply trying to do an INSERT INTO {realname}, on a relatively low-traffic site. Every now and then I see another deadlock pop up, and it seems not to be very repeatable or consistent. But it probably happens once a day or so.
Comment #22
deviantintegral CreditAttribution: deviantintegral commented

I was able to reproduce this by simply running a few drush scripts in parallel that generated nodes. Usually the deadlock would be in a cache_* table. Setting innodb_locks_unsafe_for_binlog = 1 worked for me as well, though I haven't found anything that actually explains why that fixes the deadlocks.

Comment #23
Heine CreditAttribution: Heine commented

The key to this problem appears to be InnoDB creating a gap lock on the index when a key is NOT in the index on a SELECT ... FOR UPDATE.

(The output below from SHOW ENGINE INNODB STATUS requires the table 'innodb_lock_monitor' in the db. Create it via CREATE TABLE innodb_lock_monitor(a int) ENGINE=INNODB;)

SELECT ... FOR UPDATE with an existing cid results in:

SELECT ... FOR UPDATE with a cid that's not in cache_menu:
Depending on what's in your table, and what you are trying to insert, the gap lock can also hold the supremum pseudorecord.
To reproduce, start three mysql sessions and point them to your database (LOG1, T1, T2). Make sure no other processes use the database (server). Then setup as follows:
On LOG1 execute:
On T1 and T2 execute:
On T1 execute:
Next, execute on T2:
Execute SHOW ENGINE INNODB STATUS on LOG1 and note the supremum lock.

Now, time to INSERT some data, e.g. on T2. You'll notice that this query blocks, waiting. From LOG1:
On T1:
INNODB aborts with a deadlock. Info from LOG:
If you try it, a first INSERT on T1 also blocks on acquiring the insert intention lock.
All on MySQL server 5.5.24-0ubuntu0.12.04.1
I have no more time to look into this further at the moment, hence the info dump.
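The session transcripts pasted above did not survive, but a minimal scenario of the same shape can be sketched; the table schema and values below are assumptions for illustration, not Heine's exact statements:

```sql
-- Hypothetical minimal reproduction of the gap-lock deadlock
-- (assumed schema/values, not the original transcript).

-- LOG1: enable the lock monitor so SHOW ENGINE INNODB STATUS shows locks.
CREATE TABLE innodb_lock_monitor (a INT) ENGINE=INNODB;

-- T1 and T2:
START TRANSACTION;

-- T1: cid 'foo' does not exist, so InnoDB takes a gap/next-key lock.
SELECT * FROM cache_menu WHERE cid = 'foo' FOR UPDATE;

-- T2: the same locking read; gap locks are compatible with each other,
-- so this also succeeds, and both sessions now hold the gap.
SELECT * FROM cache_menu WHERE cid = 'foo' FOR UPDATE;

-- T2: blocks, waiting for an insert intention lock on the gap held by T1.
INSERT INTO cache_menu (cid, data) VALUES ('foo', 'x');

-- T1: also needs an insert intention lock on the gap held by T2;
-- InnoDB detects the cycle and aborts one transaction with error 1213.
INSERT INTO cache_menu (cid, data) VALUES ('foo', 'x');
```

The asymmetry is the trap: the SELECT ... FOR UPDATE gap locks are shared, but the insert intention locks each INSERT needs conflict with the other session's gap lock, producing the cycle.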
Comment #24
Heine CreditAttribution: Heine commented

After thinking about it: it makes sense for locking reads to prevent insertions into the table when a primary key has not been found, in order to keep SELECTs within the transaction returning the same information.

What to do? Re-issue the transaction?

NB: Other deadlock causes may exist; claiming this issue for MergeQueries (and thus cache) on InnoDB.
Comment #25
Damien Tournoud CreditAttribution: Damien Tournoud commented

Well, that still doesn't explain why MySQL tries to promote a lock X to a lock X + insert intention on T1. It's my understanding that the latter is actually weaker.

Comment #26
Heine CreditAttribution: Heine commented

Right; per http://dev.mysql.com/doc/refman/5.6/en/innodb-locks-set.html it seems InnoDB always tries to get the insert intention lock, even though it might not be necessary (except for performance reasons). As it cannot acquire the lock (T2 is waiting before it in the queue), we have a deadlock.

Edit, from http://lists.mysql.com/internals/21065:

So, T1 *must* get an insert intention lock. If it just inserted, T2's repeatable read would possibly be violated.
Comment #27
Damien Tournoud CreditAttribution: Damien Tournoud commented

I pinged Michael Widenius (Monty) about this by email, and he reported that this is the expected behavior. At this point, I'm not quite sure how to fix our MERGE query implementation. We could go back to an INSERT / if-it-fails UPDATE strategy (aka INSERT OR UPDATE), but that cannot possibly result in the exact same behavior in all cases of a MERGE query.

At this point, I think we should just explain that you should run your MySQL server with binlog_format=row and innodb_locks_unsafe_for_binlog=1 (which is, despite its name, safe to use with row-level replication). Oh, and also use MySQL 5.1 or later, because MySQL 5.0 will mostly not work as intended.

Comment #28
Heine CreditAttribution: Heine commented

Because a new MergeQuery implementation is quite difficult, what needs to be done is (from DamZ on IRC):

- test scenarios with innodb_locks_unsafe_for_binlog=1
- implement a hook_requirements() that checks the variable and provides feedback
- provide documentation to alert people using replication that they need to use row-level replication (and thus MySQL 5.1+)
- provide documentation that MySQL 5.0 is only supported for trivial setups
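The server settings recommended in #27/#28 would go in the MySQL configuration file; a config sketch (file location and section header vary by distribution):

```ini
# /etc/mysql/my.cnf (location varies by distribution)
[mysqld]
# Use row-based replication so relaxing gap locking stays safe.
binlog_format = ROW
# Relax gap locking for locking reads. Note (per #39 below) that this
# option is deprecated as of MySQL 5.6.3; READ COMMITTED isolation is
# the longer-term replacement.
innodb_locks_unsafe_for_binlog = 1
```

Both settings require a server restart, and changing binlog_format on a replicating master should be coordinated with the slaves.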
Comment #29
cheyer CreditAttribution: cheyer commented

innodb_locks_unsafe_for_binlog=1 did not work for me. We have lots of users filling in their user profile fields (including a profile image), and I still see the serialization error now and then:
PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction: UPDATE {users} SET name=:db_update_placeholder_0, pass=:db_update_placeholder_1,
Comment #30
wnl CreditAttribution: wnl commented

We are seeing this issue (I am fairly certain) on one of our sites. Unfortunately we are using Amazon's RDS (with master/slave replication), and it doesn't allow us to change either binlog_format or innodb_locks_unsafe_for_binlog. It sounds like there is no general solution coming from Drupal in the near future. Does that mean our only way to resolve this is to set up our own servers and abandon RDS?
Comment #31
Damien Tournoud CreditAttribution: Damien Tournoud commented

Supposing that RDS runs a fairly recent MySQL version, you can probably change the transaction isolation level to READ COMMITTED. In your settings.php, you should be able to specify something like this:

Comment #32
vinmassaro CreditAttribution: vinmassaro commented

Seeing this as well occasionally with Drush:
Comment #33
jhodgdon

I have seen two random test failures in the last 24 hours due to this error, so it needs to be fixed. I also found several other issues in the core queue that are duplicates of this one. I guess it's already at "major"; looks like it should stay that way...
Comment #34
catch

This is bad enough that I think we should bump it to critical, at least for Drupal 8.
Marked #1862222: Random deadlock issues in simpletest as duplicate.
Comment #35
chx CreditAttribution: chx commented

Comment #37

chx CreditAttribution: chx commented

That's too verbose :) I did think about adding a set array with key-value, but it'd be slower to process, so this is it.
Comment #39
chx CreditAttribution: chx commented

Sorry. Note that innodb_locks_unsafe_for_binlog is deprecated in MySQL 5.6.3, so we can't use that.
Comment #40
chx CreditAttribution: chx commented

Talked to Damien; we need a requirements check that passes if a) binlog is off, or b) binlog_format is ROW, because changing it requires SUPER privileges: http://dev.mysql.com/doc/refman/5.1/en/binary-log-setting.html. And you need ROW because of http://dev.mysql.com/doc/refman/5.1/en/set-transaction.html

I think we might need to bump the requirement to 5.0.40, however, because of http://bugs.mysql.com/bug.php?id=21409. There are 32 bugs filed against MySQL 5.0 InnoDB for READ COMMITTED; I can't find anything in 5.0 fixed later than 5.0.40.

For 5.1, we need to make sure that if someone runs 5.1 it's at least 5.1.30; see http://bugs.mysql.com/bug.php?id=40386 and http://bugs.mysql.com/bug.php?id=40360.
Comment #41
catch

Didn't actually change status.
Comment #42
chx CreditAttribution: chx commented

The other solution, proposed by David Strauss and pointed out by Damien, is to drop FOR UPDATE: http://drupal.org/node/715108#comment-3159442
Comment #43
chx CreditAttribution: chx commented

Because we will need extensive testing, and that won't happen in D8, I am breaking process and posting a D7 patch first.
Comment #44
yched CreditAttribution: yched commented

Just a note that we seem to have other random testbot failures regarding "Deadlock found", but on INSERT queries: #1862222: Random deadlock issues in simpletest
http://qa.drupal.org/pifr/test/406773 is an example.
Comment #45
Berdir

I tried but wasn't able to reproduce this, on both my desktop and my considerably slower laptop. Both are running MySQL 5.5.28, which could be the reason?

So the only thing I can confirm is that the test script I was using to force some locks here works fine with the patch from #43.

I think this is the fix for #1862222: Random deadlock issues in simpletest, so we should probably move it back to 8.x and get a patch for it. It's a 2-line patch that is trivial to revert, so my suggestion would be to just commit it to 8.x and watch whether we still get deadlocks...

Edit: Here's the snippet that I used and called with 10 parallel requests:
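The pasted snippet was not preserved; a hypothetical reconstruction of a script of the same shape (assumed bin, key, and loop count, not Berdir's actual code) might look like:

```php
<?php
// Hypothetical reconstruction of a deadlock stress script (the original
// snippet was lost). Bootstrap Drupal, then db_merge() the same row in
// a tight loop; run ~10 copies of this request in parallel.
define('DRUPAL_ROOT', getcwd());
require_once DRUPAL_ROOT . '/includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

for ($i = 0; $i < 1000; $i++) {
  db_merge('cache_bootstrap')          // assumed bin; any cache table works
    ->key(array('cid' => 'deadlock_test'))
    ->fields(array(
      'data' => serialize($i),
      'created' => REQUEST_TIME,
      'expire' => 0,
      'serialized' => 1,
    ))
    ->execute();
}
```

With a fixed key that usually exists, the merges contend on a row lock; per Heine's analysis, a key that is usually missing should deadlock even more readily via the gap lock.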
I was also experimenting with a variant that used 'deadlock_test' . rand(0, 1000000) as the key, due to what Heine said about missing keys.

Comment #46

chx CreditAttribution: chx commented

Comment #47

catch

I'm going to commit this to 8.x (once it comes back green) to see if it removes the random deadlocks from simpletest. While we're seeing how it runs, we can see if there's a better option; but looking through this issue and the linked ones, I'm not sure there is.
Comment #49
Fabianx CreditAttribution: Fabianx commented

#46: That's rather nice; we now get a direct test failure for all previous deadlocks, it seems.
Comment #50
Anonymous (not verified) CreditAttribution: Anonymous commented

http://privatepaste.com/d68fd36556

Perhaps we need a rollback:
Comment #51
chx CreditAttribution: chx commented

After a lot of testing and discussion: if we are to lose the FOR UPDATE, the transaction becomes moot. I included a do-not-test patch rolled with diff -b for easier review.
Comment #52
jthorson CreditAttribution: jthorson commented

RTBC based on #47 (assuming we had the right root cause, this should bring a green result).
Comment #54
tim.plunkett

#51: 937284_51.patch queued for re-testing.

EDIT: that was not a deadlock failure, it was:

Found field with name 'user_admin_role' and value 3 Browser UserRoleUpgradePathTest.php 71 Drupal\system\Tests\Upgrade\UserRoleUpgradePathTest->testRoleUpgrade()
Comment #56
tim.plunkett

So I was able to reproduce locally; there was actually an error on the page, nothing specific to roles:

PDOException: SQLSTATE[42S22]: Column not found: 1054 Unknown column 'tags' in 'field list' in menu_local_tasks() (line 1909 of /Users/tim/www/d8/core/includes/menu.inc).

So I put $this->assertResponse(200) in there for good measure.

And then, I was trying to print out that query, and lo and behold, not executing it and letting the foreach trigger it "fixed" it?! I don't understand.

EDITED

Yeah, I must be on drugs. foreach does not do anything of what I just said above. The test passed locally because that change meant the query wasn't run.

BUT! It does help remove the red herring of the role, and I can debug more.
Comment #58
chx CreditAttribution: chx commented

My Rackspace instance DOES manage to reproduce the testbot failure. Yay! I will debug, but if you want access, find me and drop me an SSH key.
Comment #59
chx CreditAttribution: chx commented

There are so many bugs here it makes my head dizzy. The main culprit: system_update_8007() passed prefixed table names to db_field_exists(), which expects non-prefixed ones, which means the cache tables didn't get a tags column. The merge query from cache, however, ran happily... I suspect #1872020: Insert throws the wrong exception is going on. Meanwhile, #1158322-104: Add backtrace to all errors makes debugging an interesting challenge, combined with the fact that debug_backtrace() on the minimal PHP version doesn't support IGNORE_ARGS. We really badly need to move to 5.3.10.

Anyway: we revert to using a fixed list of cache bins and provide a helper so contrib can update their cache bins. Instead of writing five new update functions, I just fire all eleven core bins from system. It won't hurt, because there is (there needs to be) a db_table_exists() in the helper.

Thanks go to Berdir for pointing out the fixed list as an approach (and to Carl Orff and Corvus Corax for making awesome music out of Carmina Burana that kept me awake till 4am).
Comment #60
chx CreditAttribution: chx commented

Note that salvaging the "let's find cache tables" routine is not impossible but very challenging, due to the possibility of per-table prefixes. You would need to go over the per-table prefixes (if any) where the table starts with 'cache' and build a mapping of the prefixed table to the non-prefixed one. Then compare the found tables to this map and, if not found, remove the default prefix if there is one. Then you would need to somehow test this. I vote against it. (Oh, and it's not exactly forbidden to use prefixes to make two cache tables point to the same table! It's madness, but it'd not always be broken. Really, let's not try.)

Alternatively, we could patch db_field_exists() and db_add_field() to work with prefixed tables. That's better... but it's quite a bit of work for not a lot of benefit.
Comment #62
chx CreditAttribution: chx commented

#59: 937284_59.patch queued for re-testing.

Edit: it was a lock test failure; it could acquire a second lock in the same request. Hope it's unrelated.
Comment #63
Berdir

Here is a small update to improve the docs on that function. I'm not sure about the function name, but we also have system_schema_cache_8007(), so it kind of makes sense.

This will need a change notification, either an update to an existing one or a new one, to inform contrib that they will need to update their cache bins.
Comment #64
webchick

I confess I don't follow everything in here, but I know catch looked at a slightly earlier version and said it was good to go. I'm also told this might help cut down on random testbot failures.

Committed and pushed to 8.x, but this needs some more discussion and testing before committing to D7.
Comment #65
deviantintegral CreditAttribution: deviantintegral commented

Here's a backport to 7.x of 532ea17 (#64), dropping the changes to the cache tables. I resurrected the site on which I was able to replicate this, and the patch totally fixes the deadlocks. Excellent! I look forward to maxing out many cores with devel generate :)
Comment #66
Peter Bowey CreditAttribution: Peter Bowey commented

#65 works fine in Drupal 7.18. Tested on a medium-sized database site.
Comment #67
chx CreditAttribution: chx commented

I am not RTBCing this, because #1792536: Remove the install backend and stop catching exceptions in the default database cache backend needed to change how Merge retries in order to get tests passing once the exception catch in cache (the main Merge consumer in core) is removed. I recommend moving that over; search for "retry" in the patch.
Comment #68
YesCT CreditAttribution: YesCT commented

NW per #67.

For clarification, should this wait on #1792536: Remove the install backend and stop catching exceptions in the default database cache backend? Or should that be closed and totally incorporated into this?
Comment #69
hass CreditAttribution: hass commented

Comment #70

YesCT CreditAttribution: YesCT commented

#65: 937284.65-deadlock-merge.patch queued for re-testing.
Comment #71
mcrittenden CreditAttribution: mcrittenden commented

Just a note that #1792536: Remove the install backend and stop catching exceptions in the default database cache backend is fixed now, but only for D8, and a backport isn't being discussed, so I'm not sure how it affects this. chx?
Comment #72
Gastonia CreditAttribution: Gastonia commented

I can confirm that #65 did not work for me. Running 7.19 with the patch applied. Like many others, I am using drush scripts simultaneously to batch-save nodes, but this also comes up when users try to post at the same time. I also applied the MySQL configuration settings as suggested above, and that did not work for me either.

I have written about this before, most notably at https://drupal.org/node/1369332 starting with #45. There it was suggested that I was having a problem with the Location module, although this has happened to me on other tables as well. It's still unclear to me whether this fix distinctly targets specific tables, or should eventually fix all deadlock problems regardless of table or module.

EDIT: Above, I had not saved the my.cnf file before running the test. Setting the MySQL binary log and InnoDB settings as mentioned above seems to have greatly improved the situation. My scripts have been running for an hour straight, non-stop, no errors. I'll crank it up a few notches and report back.

Anyone know why those settings make a difference?
Comment #73
chx CreditAttribution: chx commented

If #65 doesn't work for you, I can't help; that was our best generic effort. Can you try #39, however?
Comment #74
mikeytown2 CreditAttribution: mikeytown2 commented

Without changing MySQL settings, #65 does work for me. Trying it out with 12 Apache processes doing user_save() across 20k users. Before the patch in #65, Drupal would error out on me. I would say RTBC for #65.
Comment #75
Jooblay.net CreditAttribution: Jooblay.net commented

Great work, Drupal community; thanks to everyone...
Comment #76
mikeytown2 CreditAttribution: mikeytown2 commented

I did encounter a deadlock even with the patch in #65, but its occurrences have been greatly reduced. In this case I would classify the deadlock as a bug caused by abuse of the variables table: #1903498: PDOException: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; .

Still see this as RTBC.
Comment #77
chx CreditAttribution: chx commented

Go with it, then. It's better for sure.
Comment #78
David_Rothstein CreditAttribution: David_Rothstein commented

This can't be working correctly; that class doesn't exist in Drupal 7.

Otherwise, the overall patch makes sense to me (and awesome research in this issue!), though removing transactions has the potential to cause as many problems as it fixes... Only a couple of people have commented above that they actually tested the patch in any scenario at all. Has anyone else in the interim?
Comment #79
mikeytown2 CreditAttribution: mikeytown2 commented

For what it's worth, I've been using this patch in production on a mostly authenticated-user site for the last month.
Comment #80
Anybody

The patch works great for me. Any plans to bring it into Drupal 7 core soon?
Comment #81
Damien Tournoud CreditAttribution: Damien Tournoud commented

Also, I don't know why the transaction was removed here, but I think we should restore it.
Comment #82
Fabianx CreditAttribution: Fabianx commented

#81: Removing the transaction is the whole point of the patch ...

#80: The patch needs work, as it is trying to catch an exception that does not exist (as per #78).
Comment #83
Damien Tournoud CreditAttribution: Damien Tournoud commented

@Fabianx: no, removing the ->forUpdate() is the point of the patch. The transaction is useful and should be kept.

Comment #84
Berdir

Without the forUpdate(), doing the transaction becomes useless. I unfortunately can't find the discussion in here, but it was discussed (with David Strauss among others, I think); see #51.

We had patches without forUpdate() but with the transaction, and it still blew up. Note that we later found and fixed the source of the problem for 8.x (each tearDown() cleared persistent caches of the parent site), but similar things can still happen on real sites with a lot of requests trying to write caches.
Comment #85
jthorson CreditAttribution: jthorson commented

The privatepaste in #50 (now expired) showed why the transaction was being removed... IIRC (and these are some foggy memories!), we were doing some MySQL testing and had identified a race condition between two simultaneous transactions (i.e. which happened on testbot due to concurrent runs) that would result in unexpected behaviour. I don't have the exact details, but it was related to having two transactions, inserting into one, then running the same insert in the other: the second would hang until the first transaction was closed, after which a read on the second still gave you the pre-transaction values (presumably because that was the db state when the second transaction started, even though the value had been modified by a different transaction in the meantime?).

In addition, after removing the forUpdate(), I think we were down to a single db call... which made the transaction redundant.

Not sure if chx and beejeebus have a better memory of this than I. :)
Comment #86
Fabianx CreditAttribution: Fabianx commented

#85: Yes, I remember it the same way. The transaction definitely led to the deadlock.
Comment #87
fishkaz CreditAttribution: fishkaz commented

#59: 937284_59.patch queued for re-testing.
Comment #88
Fabianx CreditAttribution: Fabianx commented

FWIW, I encountered this bug reproducibly within a production system.
Comment #89
xmacinfo

Any traction here? We have deadlock issues with FOR UPDATE on Drupal 7.23.
Comment #90
Anonymous (not verified) CreditAttribution: Anonymous commented

Shame to keep this issue stalled on an invalid exception type. Reverted it to the original catch-all Exception.
Comment #91
chx CreditAttribution: chx commented

Let's do this. Consider https://drupal.org/node/715108#comment-3159540 the main endorsement for dropping FOR UPDATE: it buys nothing.
Comment #92
xmacinfo

We had to temporarily switch the table type to MyISAM to prevent the lock. Will be glad to switch back to InnoDB once this is fixed.
Comment #93
vinmassaro CreditAttribution: vinmassaro commented

Is this patch supposed to be used in conjunction with changes to the database configuration (READ-COMMITTED or innodb_locks_unsafe_for_binlog)? We've been load testing one of our sites with a fast user login test via CAS authentication: 100 users every 15 seconds, ramping up to 500 simultaneous logins over a few minutes. We use LDAP to map a few fields to the user profile upon login. During testing, we see deadlocks like this:

After changing the isolation level to READ-COMMITTED and running the test again, deadlocks are greatly reduced, going from ~150+ down to 2. This was at the recommendation of Percona support. We thought this change was a panacea until they pointed out that we could run into a problem: some tables set up by contrib modules across many of our sites (this is a non-multisite, managed university Drupal environment) do not have primary keys, and this could cause replication to get out of sync.

We reverted READ-COMMITTED and ran our test again using this patch, but still saw nearly as many deadlocks as in our first test. Wondering if anyone has any ideas. Thanks in advance.
Comment #94
chx CreditAttribution: chx commented

We do not touch the isolation level; see #40 for more.
Comment #95
JordanMagnuson CreditAttribution: JordanMagnuson commented

I'm getting a number of deadlocks coming from node_tag_new() (node.module) when trying to db_merge('history').

Example watchdog error:

Here's a screenshot of the full watchdog output: http://i.imgur.com/oRWe5ef.png

Here's output from SHOW ENGINE INNODB STATUS:
Comment #96
mikeytown2 CreditAttribution: mikeytown2 commented

Related info:
http://www.mysqlperformanceblog.com/2013/12/12/one-more-innodb-gap-lock-...
Comment #97
mikeytown2 CreditAttribution: mikeytown2 commented

Been playing around with deadlocks today, trying to figure out a good place to put retry-transaction logic (#476048-17: MySQL transient error handling). Haven't found a good place yet. But I'm able to repro deadlocks (with the latest patch, #90, applied) when using the database as a cache backend and running multiple threads doing node saves while enabling/disabling modules. This got me thinking about clearing/setting other things inside a transaction. It seems a good way to trigger a deadlock is to run multiple threads of a function that calls db_transaction(): node_save() starts a transaction, and enabling a module triggers registry_update(); thus the repro case.

I have an interesting idea: what if the core database cache backend used a copy of its own database connection, so that cache_set()/cache_clear_all() calls do not happen inside a transaction? If using something like memcache, these cache sets/clears would not be rolled back as part of the transaction; the same applies to lock.inc. More info on creating database connections: https://drupal.org/node/18429. This would place those db operations (create, insert, update, delete) outside the transaction, as they would use a different database connection for the cache and for locks. We might want to expand this to other things as well if needed, but I know it should be OK, as it is effectively what memcache does.
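That separate-connection idea can be sketched with the D7 Database API; the 'cache' target name is an arbitrary choice for illustration, and this is an idea sketch, not what core does:

```php
<?php
// Sketch (D7): register a second connection target that reuses the
// default credentials, so cache writes run on their own connection
// and are not enrolled in (or rolled back with) the main transaction.
// The target name 'cache' is arbitrary.
$info = Database::getConnectionInfo('default');
Database::addConnectionInfo('default', 'cache', $info['default']);

// A cache backend could then write via that target:
Database::getConnection('cache')
  ->merge('cache_bootstrap')                 // illustrative bin
  ->key(array('cid' => 'some_cid'))
  ->fields(array('data' => serialize($value), 'serialized' => 1))
  ->execute();
```

The trade-off is the same one memcache accepts: a rolled-back transaction can leave a cache entry that was written on the side connection, so cache consistency has to tolerate that.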
Comment #98
David_Rothstein CreditAttribution: David_Rothstein commented

Alright, based on the additional research and testing (thanks for that!), it sounds like we can go with #90. Complicated issue, so hopefully it doesn't backfire on us :)

Committed to 7.x - thanks! http://drupalcode.org/project/drupal.git/commit/70e0b16

It sounds like there could be additional improvements here (based on the last couple of comments above), but I suppose those should go in new issues?
Comment #99
mikeytown2 CreditAttribution: mikeytown2 commented

Thanks for getting #90 in :)

Follow-up issue based on my observations in #97: #2164849: Enforce READ COMMITTED transaction isolation level
Comment #100
jerdiggity CreditAttribution: jerdiggity commented

If it helps, here are the details of my recent encounter with the same (or a similar) error, which occurred while running update.php on a D7 production site. The only updates being applied were:

Toward the end of applying the updates I was redirected to update.php?op=finished (as opposed to the normal update.php?op=results), at which point I was shown this message:

NB: the main reason I'm adding this comment to this particular (fixed) issue is that it's still tagged with "needs backport to D7".

Comment #101
mikeytown2 CreditAttribution: mikeytown2 commented

@jerdiggity: This has been fixed in Drupal 7.25, as #98 says. So if you are still experiencing deadlock issues on 7.25, you should follow the issue linked in #99.
Comment #103
pounard

Maybe a stupid question, but why don't the merge queries use the INSERT ... ON DUPLICATE KEY ... syntax on MySQL? Forget it, found the explanation elsewhere!

Comment #104
pwolanin CreditAttribution: pwolanin at Acquia commented

Any reason this shouldn't be applied to Drupal 6 also?

Comment #105

pwolanin CreditAttribution: pwolanin at Acquia commented

Never mind; we are seeing some insert deadlocks in 6, but the code is quite different.