Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Even after using #2193149: Deadlocks occur in cache_field table. & #2222635: Waiting for table metadata lock on cache_field table I'm still getting deadlocks that cause performance issues. I decided to use async queries outside of the transaction as a way to fix this. Side benefit of this is a 5x speedup of all cache_set() calls.
Comments
Comment #1
mikeytown2 CreditAttribution: mikeytown2 commentedComment #2
mikeytown2 CreditAttribution: mikeytown2 commentedComment #4
DamienMcKennaJust out of interest, does it make any difference when entitycache is enabled?
Comment #5
mikeytown2 CreditAttribution: mikeytown2 commentedLooking into the failures and I believe it is caused by the async query. cache_set no longer blocks until the query has completed; thus for tests where it does a cache_set followed by a cache_get the cached object isn't there yet. I might be able to reduce this issue by using this connection for cache_get as I set the isolation level to READ-UNCOMMITTED.
I have not tested it with entity cache yet.
Comment #6
mikeytown2 CreditAttribution: mikeytown2 commentedMain issue with the last patch was database prefixes; I've fixed that (along with some other minor things) and I now get intermittent failures on the cache tests (same test rarely fails twice in a row). To get this to work 100% of the time with simpletest I would need to use the same db connection for the cache_gets. With the new patch, speedup is only 2x instead of 5x; but cache_set seems to be working 100% of the time (was getting random failures before excluding simpletest).
Comment #8
mikeytown2 CreditAttribution: mikeytown2 commentedcache_get now uses the async connection if a cache_set was done; should make the test bot happy.
Comment #11
mikeytown2 CreditAttribution: mikeytown2 commentedOne small change
Comment #15
mikeytown2 CreditAttribution: mikeytown2 commentedLatest patch failed with this message.
Field is translatable. Other field.test 3190 FieldTranslationsTestCase->testTranslatableFieldSaveLoad()
Patch passes 2nd time around.
Comment #16
mikeytown2 CreditAttribution: mikeytown2 commented$mysqli->reap_async_query can be super slow in rare cases. Will see what I can do to address this issue.
Comment #17
mikeytown2 CreditAttribution: mikeytown2 commentedEnded up using a connection pool to work around the slow queries. I've recorded some cachegrinds from cache clears and it shows that drupal will sometimes spend more time in cache_get than in cache_set with this patch; mind blown because writes are effectively faster than reads.
Comment #18
mikeytown2 CreditAttribution: mikeytown2 commentedReal patch
Comment #19
mikeytown2 CreditAttribution: mikeytown2 commentedSet the wait_timeout to 45 seconds and it will ping the DB after 40 seconds to make sure the connection is up before using it again.
I tried using persistent connections but $mysqli->close() seems to stall so getting that working seems like a no go.
Comment #23
mikeytown2 CreditAttribution: mikeytown2 commentedLooks like the wait_timeout and reconnect logic needs some help. I'll up the timeout to 90 seconds as testing takes longer than 45 once I get the reconnect stuff figured out.
Comment #24
Fabianx CreditAttribution: Fabianx commentedUhm, there is a better way to get a new DB connection, you just need to use a new key in settings.php or even just a new 'target'.
Comment #25
mikeytown2 CreditAttribution: mikeytown2 commentedThe big performance gain comes from asynchronous queries; which is not available with PDO
http://php.net/manual/en/mysqlinfo.api.choosing.php
Comment #26
Fabianx CreditAttribution: Fabianx commentedAH :), makes sense. So we could still write a DB driver for async mysql, but would need to use mysqli directly instead of PDO.
Comment #27
mikeytown2 CreditAttribution: mikeytown2 commentedWriting a DB driver for async mysqli is essentially what I've done to a very limited extent here. I'll work on the reconnect logic and post a new patch once I have that working.
Comment #28
Fabianx CreditAttribution: Fabianx commentedWe discussed this briefly and this could solve the critical of #2347867: Race conditions with lock/cache using non-database storage - add a non-transactional database connection while making the async support optional, but easily available via a config change / contrib module.
Comment #29
mikeytown2 CreditAttribution: mikeytown2 commentedThe wait_timeout and reconnect is now working at 1 second; so that bug has now been fixed. This patch uses 90 seconds as that seems to be a good happy medium; in the future the current wait_timeout should be taken into account when setting this value. I also fixed some issues with the connection pool management; the $db_connections array is now keyed off of $mysqli->thread_id. This allows for way better house keeping when using mysqli_poll(); luckily $mysqli->thread_id seems to be a noop from my testing.
pg_send_query() seems to be the way to do this using PostgreSQL; I don't use PostgreSQL though.
Comment #30
Fabianx CreditAttribution: Fabianx commentedLooks great! :)
Comment #31
btopro CreditAttribution: btopro commentedis there some suggestion of how to test this? an operation that you know invokes a ton of cache sets to do an A / B comparison / try and bench mark? Thinking this would reduce the pain after a full cache clear yes?
Comment #32
mikeytown2 CreditAttribution: mikeytown2 commentedIt knocks 5 seconds off of a cache clear (on our test server), so yeah things that use cache_set will see the biggest improvements
Comment #33
mikeytown2 CreditAttribution: mikeytown2 commentedGave everything proper documentation & did some micro optimizations (1 less query per connection if # of connections is greater than 1). The only place that has hard coded values is now inside of the advanced drupal_static() pattern; thus those can be overwritten by modifying
drupal_static('drupal_mysqli_get_object')
if it's really needed.Comment #34
mikeytown2 CreditAttribution: mikeytown2 commentedLittle bit more refactoring; created 2 new functions and reduced code duplication... Looks like I need to merge this in better next time; some of the documentation was accidentally changed. Will wait for the testbot to pass before fixing it. Should also handle the $error array from mysqli_poll.
Comment #35
mikeytown2 CreditAttribution: mikeytown2 commentedThis one handles errors (like dropped connections) now. This is in it's final form as a D7 core patch. Future goals for asynchronous MySQLi connection pools is to combine it with Cache MultiGet & Alternative Database Cache to create the best DB backed cache backend. This is needed even when things like memcache/APC are in use do to the nature of the cache_field table in D7.
Getting an asynchronous connection pool into D8 core would be nice; unfortunately I won't be the one creating the patch for it. I hope someone picks up the torch as all five issues listed here https://groups.drupal.org/node/415883 make drupal core scale way better by its self.
Comment #36
AnybodyWhao this looks great! Is there any change in configuration one has to make after the patch has been roled against core or does it simply work?
Comment #37
btopro CreditAttribution: btopro commentedno. it simply "Just Works" (TM). We run this in production as part of our pressflow fork -- https://github.com/btopro/Presser-Flow-FORK
Comment #38
gielfeldt CreditAttribution: gielfeldt commentedTwo questions:
1.) Does this belong in core? The way the Drupal is designed, this would easily fit into a custom cache backend.
2.) Ins't this approach inherently "dangerous"? Using async queries for writing cache may improve speed, but the application is now under the wrong assumption that the operation has been performed. I can think of at least one scenario where this could become a problem:
#1 Read from cache.
#2 Cache is empty
#3 Acquire lock
#4 Fetch from DB.
#5 Write to cache (cache write is actually delayed. This is my assumption of how async queries work, since I've never used the mysql async feature before, please correct me if I'm wrong. If I'm wrong, just ignore this post :-).
#6 Release lock
The above scenario is a standard cache/lock pattern, and using async queries for the cache could potentially introduce a thundering herd problem, which the lock is supposed to eliminate. It sounds like this is the issue you addressed for the test bot. However, I think your solution with re-using the connection from the pool will only solve the problem inside that particular thread. Once concurrency is added, the problem will remain.
Also; using another connection for the cache introduces this problem: #1679344: Race condition in node_save() when not using DB for cache_field
Comment #39
mikeytown2 CreditAttribution: mikeytown2 commentedI recently created a cache module. It works around the cache_field race condition by using the core db connection so that operation happens inside of the transaction.
https://www.drupal.org/project/apdqc
I'll be creating a lock.inc for apdqc (#2387371: Create a lock back end that uses the APDQC connection) which can have some logic so that the lock doesn't get released until the write to the db is complete.
Comment #40
Sara101 CreditAttribution: Sara101 commentedIs this patch still recommended if using the Asynchronous Prefetch Database Query Cache module or is the module a replacement for this issue?
Comment #41
ibtisem CreditAttribution: ibtisem commentedHi ,
I am working on Drupal 7.69 for website on linux server and mysql . when i run the cache since backend : (/admin/config/development/performance) I face an error :-
PDOException: SQLSTATE[HY000]: SQLSTATE[40001]: Serialization failure: General error: 1205 Lock wait timeout exceeded; try restarting transaction ... after that the server freezes and website inaccessible .
After searching on similar issues I found that solution suggested are using :
add to settings.php file
$databases['default']['default']['init_commands']['isolation'] = "SET SESSION tx_isolation='READ-COMMITTED'";
can this line correct the deadlock error ?
Any suggestions would be appreciated.
Thanks