Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
MySQL can generate some transient errors when running queries. These can be due to an overloaded MySQL server or due to database deadlocks (when using InnODB).
In many cases, simply rerunning the query would make it succeed, but Drupal has no handling for this currently.
The attached patch attempts to re-execute a failed query N times until it succeeds, based on the error that was thrown. It works for MyISAM as well as InnoDB.
The patch is based on information from http://openquery.com/blog/error-handling-for-mysql-applications and applies cleanly to Drupal 6.12.
Comments
Comment #1
cafuego CreditAttribution: cafuego commentedActually, having this kind of error handling integrated into D7 would be great too.
Comment #2
cafuego CreditAttribution: cafuego commentedI've had a look at the D7 database code, but I can't easily get my head around it.
Still, it would be great to have this error handing go into D7 before it is released, so I'm changing the version tag on this to 7.x.
Comment #3
Damien Tournoud CreditAttribution: Damien Tournoud commentedHighly transactional systems, like Drupal + MySQL should never try to reissue transactions. It can only provoke meltdowns (on the front-end server, or on the database server, depending on the root cause of the initial error).
The only error codes I think are worth discussing are: 2006 (CR_SERVER_GONE_ERROR) and 2013 (CR_SERVER_LOST). Those two can be triggered by scripts running longer then the wait_timeout, which can easily happen on cron (while fetching new RSS feeds, for example). Those generate the infamous "MySQL Server has gone away" errors.
Comment #4
arjenlentz CreditAttribution: arjenlentz commentedDamien, Drupal is not highly transactional, it merely issues a lot of queries. Not the same thing at all ;-)
But the issue is that we're talking about transient errors, i.e. non-fatal error situations; the specific intent of those errors is that the app appropriately re-issues the query or transaction (a db server can't magically re-run an entire transaction as params could change, so it's something the app needs to handle). In any case, they are not "errors" at all, in the usual sense of the word.
Regarding them as fatal errors is, strictly speaking, a bug.
With any transactional engine (or other RDBMS, including PostgreSQL, Oracle, etc) a deadlock, for instance, is a perfectly normal and *transient* error, and an app should re-run the statement/transaction at least once.
Lock wait timeouts should also be handled, the key there is that the default innodb_wait_timeout is 50 seconds which is way too long. But that setting is not directly a Drupal-code concern, since if a user is affected by the 50 second wait, re-running the transaction/query (which will then succeed) will then at least come up with a real result rather than a failure. And on a properly configured production server, the user won't notice at all exactly because the handling is correct.
I should also note in this respect that these things are really not MySQL specific; however the misconceptions on error handling are mainly unfortunate bad development habits from the old "MyISAM non-transactional way of doing things". But the even when using the MyISAM storage engine, the "proper" workflow for these cases works. For more detail on that background, more possible "errors" to handle, and appropriate ways of handling them, please see:
http://openquery.com/blog/error-handling-for-mysql-applications
and
http://openquery.com/blog/migrating-mysql-myisam-apps-to-the-innodb-stor...
Comment #5
arjenlentz CreditAttribution: arjenlentz commentedCode-wise, I have the following suggestions.
For the transient errors
- there doesn't need to be a delay before re-running the query.
- if inside an explicit transaction, issue a ROLLBACK before restarting.
- only retry once - perhaps twice.
For
case 11: // Resource temporarily unavailable (EAGAIN)
case 1040: // Too many connections
I'd agree that rerunning on these errors can cause serious trouble; it needs to be reported through the backend.
Comment #6
Damien Tournoud CreditAttribution: Damien Tournoud commented@arjenlentz: if your front-end server receives 200 requests per second, blocking a request for 50ms to reissue a query means that you are blocking 10 requests. Drupal doesn't issue a lot of queries, but as all web applications, it is highly transactional. In that context, it is critical never to block.
I don't know about the lock-related error codes (1205, 1213, 1614), because I never saw those happening on production, but clearly retrying on 11 and 1040 (EAGAIN, Too many connections) is a very bad idea.
Comment #7
arjenlentz CreditAttribution: arjenlentz commented@Damien: okidoki, then I suggest cafuego simply applies my code suggestions, and we'll be right!
For the transients no delay before reissuing is necessary, and I agree with you on 11 and 1040.
Agree?
Comment #8
Crell CreditAttribution: Crell commentedI'll be perfectly honest and say that I've never seen one of those errors happen to me in production. Just how common are they? I'm a little concerned about adding complexity for an extreme edge case.
Comment #9
arjenlentz CreditAttribution: arjenlentz commentedThey happen, naturally more under load. And also when components use multi-statement transactions. That's not reason not to do them, it's just a fact and perfectly normal. All good db apps have this logic built-in.
It's not an extreme edge case. It's just that MyISAM, through its design, is deadlock free. This has made app coders lazy, in a way - in reality they're just unaware that all other db servers and MySQL engines need to deal with it.
Comment #10
cafuego CreditAttribution: cafuego commented@Crell; I have deadlocks happen on Drupal (and other applications) from time to time; even when load is not particularly high. It just depends on what the site users are doing.
You'll never hit these particular errors if you're using MyISAM though.
I'll rework the patch as per @arjenlentz suggestion.
Comment #11
cafuego CreditAttribution: cafuego commentedI've updated the patch to no longer check errors 11 and 1040 - and also added the transient error check to the mysqli connector.
I'm running on my Drupals and it seems to be fine.
Comment #12
cafuego CreditAttribution: cafuego commentedUgh. I got the mysqli_query() call back to front. More coffee needed. Updated patch (tested *and* working) attached.
Comment #13
mikeytown2 CreditAttribution: mikeytown2 commentedre-roll of #12.
Comment #14
Damien Tournoud CreditAttribution: Damien Tournoud commentedIf you are coming here because of DEADLOCK issues on InnoDB, you are on the wrong issue. See #937284: DEADLOCK errors on MergeQuery INSERT due to InnoDB gap locking when condition in SELECT ... FOR UPDATE results in 0 rows for the cause and potential solutions.
Bumping this to 8.x since it needs to be dealt with there.
I'm not sold on retrying, expect in case we are 100% certain it is safe to do so (we already do that in SQLite in a particular case). We should only retry once, and then give up. Looking at the current patch:
^ Those looks decently safe.
^ I doubt retrying will be idempotent (the error says that a transaction branch was rolled back, after all), so we should not retry.
^ We don't auto-reconnect, so there is no point in retrying on error 2013.
Comment #15
mikeytown2 CreditAttribution: mikeytown2 commentedD6 patch for pressflow
Comment #16
mikeytown2 CreditAttribution: mikeytown2 commentedFixed a bug with the last patch
Comment #17
mikeytown2 CreditAttribution: mikeytown2 commentedHow to catch these errors in D7. Question is where would it go, and how to restart the transaction?
https://api.drupal.org/api/drupal/7/search/%3A%3Aexecute
Comment #18
Damien Tournoud CreditAttribution: Damien Tournoud commentedContrary to what I said in #14, it seems like it is only safe to retry the last query on error
occurs after a configurable timeout, which is of 50s by default. Retrying anything after having waited for 50s is just a no-go for a transactional application like Drupal.
This makes this whole issue won't fix.