In the last couple of days I've had my connection count to MySQL go above a set threshold of 35, sometimes peaking at over 150. When this happens, SHOW PROCESSLIST and SHOW ENGINE INNODB STATUS are run so I can investigate what is causing the issue. After looking it over, it appears that TRUNCATE cache_field is the culprit. Looking over the options, it appears that TRUNCATE should not be run on a live table. This means we need to rename the table before the truncate happens.
http://bugs.mysql.com/bug.php?id=61935#c368453
http://stackoverflow.com/questions/17762361/truncate-table-taking-very-l...
http://dba.stackexchange.com/questions/20600/what-can-cause-truncate-tab...
http://whatwebwhat.com/en/page/466/truncating-big-mysql-innodb-tables
Comment | File | Size | Author
---|---|---|---
#46 | Query-61.txt | 17.52 KB | shardesty
#46 | show_process_list_2.txt | 9.38 KB | shardesty
#46 | show_process_list_1.txt | 19.44 KB | shardesty
#46 | query_2.txt | 1.44 KB | shardesty
#46 | query_1.txt | 1.19 KB | shardesty
Comments
Comment #1
mikeytown2 CreditAttribution: mikeytown2 commented
Comment #2
mikeytown2 CreditAttribution: mikeytown2 commented
The plot thickens... this means that AUTOCOMMIT will not help.
TruncateQuery::__toString
Comment #3
mikeytown2 CreditAttribution: mikeytown2 commented
Created a patch that does this.
This way the DDL statement will no longer lock other queries, and it's valid syntax: http://dev.mysql.com/doc/refman/5.0/en/rename-table.html
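The patch itself isn't inlined in this thread, but the rename-based approach can be sketched as follows. The `__truncated` suffix and the bin name are illustrative, not necessarily what the patch uses; multi-table RENAME is atomic per the linked manual page.

```sql
-- Atomically swap the live bin with a pre-created empty copy, then truncate
-- the old data off the hot path (table names are illustrative).
RENAME TABLE cache_field TO cache_field__tmp,
             cache_field__truncated TO cache_field,
             cache_field__tmp TO cache_field__truncated;

-- The slow part now runs against a table nothing is selecting from:
TRUNCATE TABLE cache_field__truncated;
```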
Comment #4
mikeytown2 CreditAttribution: mikeytown2 commented
Comment #5
danblack CreditAttribution: danblack commented
#2229013 is related, and as you can see in comment #6 there, I've done a delete by create time so that it doesn't block new inserts. Accesses of rows being deleted will, AFAIK, get a deadlock error, which is caught by the cache module and ignored.
The problem with renaming is that you've taken your entire cache offline for that time (despite it being quicker than deleting all rows). More recent MySQL versions effectively implement TRUNCATE TABLE the way you've done it in SQL.
Comment #6
mikeytown2 CreditAttribution: mikeytown2 commented
I'm having issues where a truncate takes over 30 seconds on MySQL 5.6, and I then get over 150 connections to the database, all waiting on metadata for the cache_* table, in a short amount of time. I do agree that since the rename is atomic, MySQL will lock here, temporarily preventing access to that table while the rename is happening; I believe it will not error out and will merely wait for the rename to finish (from what I've seen on our MySQL server). Drupal does not use any foreign keys on the cache tables, so luckily that is one less thing to worry about here. I'm also not dropping any tables, as that is slow.
There are some disadvantages when comparing DELETE FROM to TRUNCATE according to http://dba.stackexchange.com/questions/20600/what-can-cause-truncate-tab..., but I think you addressed this in #2229013: PDOException: SQLSTATE[HY000]: General error: 1030 Got error -1 from storage engine: TRUNCATE {cache_field} ; Array ( ) in cache by using the created column. The main issue I see with using created is accurate timekeeping in a multi-webhead environment and time resolution. There is the possibility of stale data persisting in the cache. Some of these concerns could be mitigated by using time() instead of REQUEST_TIME. Out-of-sync system clocks in a multi-webhead environment are where the created-based delete isn't as good as this patch, from my point of view. There is also a slight chance of old data not getting cleared because it is "<" instead of "<=".
Comment #7
danblack CreditAttribution: danblack commented
Yes, I believe I've addressed the DELETE FROM by adding the date criteria.
In addition to the previous time-based concerns mentioned above, there is the overhead of indexes, #165061.
How about adding a serial column on the cache table and then doing SELECT serialid FROM cache ORDER BY serialid DESC LIMIT 1; DELETE FROM cache WHERE serialid < {selected_id}? It mitigates the time issues and metadata locks, and allows new cache entries to be created after this. (I'll wait until #2224847 is committed before doing a patch on this approach.)
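Assuming `serialid` is an auto-increment column, the proposal amounts to something like the following (table and column names taken from the comment, the rest illustrative):

```sql
-- Capture the current high-water mark, then delete only up to it; rows
-- inserted after the SELECT get a larger serialid and are left alone.
SELECT serialid INTO @selected_id
FROM cache ORDER BY serialid DESC LIMIT 1;

DELETE FROM cache WHERE serialid < @selected_id;
```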
Comment #8
mikeytown2 CreditAttribution: mikeytown2 commented
I like what you're trying to do in #2224847: Automatically shorten cid's in Cache\DatabaseBackend and PhpBackend. Do you think using a serialid (assuming an auto-increment column) would introduce new issues, like hitting the 32-bit limit?
ps https://drupal.org/filter/tips#filter-filter_autop
Comment #9
danblack CreditAttribution: danblack commented
serial:big perhaps :-)
Postgres has a CYCLE option. MySQL will fail once the limit is reached; however, it can be reset with an ALTER TABLE statement, though that will need a test to see what happens with existing data. Once this happens, the logic of the previous optimisation falls down quickly, and you end up with:
Comment #10
mikeytown2 CreditAttribution: mikeytown2 commented
Truncating the cache on cron if the maximum serialid is within 1 billion of the column's maximum value should take care of that issue.
Comment #11
danblack CreditAttribution: danblack commented
Time options
Given the availability of alternate, higher-performing caching options, should we even worry about the index insert time for created?
I guess we need to assess all of these: https://api.drupal.org/api/drupal/includes%21cache.inc/function/calls/ca...
Looking - most seem to be admin, cron or test related.
Autoincrement overflow handling
> Truncate cache on cron if max is within 1 billion of the max value should take care of that issue.
Or catch the exception on reaching it and process it:
renumber, or better in a locking sense:
Comment #12
mikeytown2 CreditAttribution: mikeytown2 commented
I think your approach (delete where serialid) will work out well for D8. I think it's too risky to do in D7, though; people access the cache_* tables directly instead of using the API, so doing something more conservative would be ideal (my vote for D7 is the rename trick).
Catching the exception when auto-increment overflow happens could cause a major headache, as that table cannot be written to and will keep throwing exceptions. I do like the update-cache-id trick, though; that should work instead of truncate. I think we should do both: run a check on cron in order to be proactive, and if cron stops working we will still catch the exception and deal with it.
Comment #13
danblack CreditAttribution: danblack commented
> I think your approach (delete where serialid) will work out well for D8.
Works for me.
> I think it's too risky to do in D7 though; people access the cache_* tables directly instead of using the API
crazy.
> , thus doing something more conservative would be ideal (my vote is for D7 is the rename trick).
Works for me. Looked at your patch. It seems fine, with one fairly significant exception: CREATE TABLE ... LIKE isn't available in SQLite.
> Catching the exception when auto increment overflow happens could cause a major headache as that table can not be written to and will keep throwing exceptions.
for at least the duration of a SELECT COUNT; ALTER TABLE
fair call though.
> . I do like the update cache id trick though; that should work instead of truncate. I think we should do both; run a check on cron in order to be proactive and if cron stops working then we will still catch the exception and deal with it.
How about, instead of cron, we grab last_insert_id in a cache set call?
* throw a warning after $early_limit
* perform the renumber on some low-probability request after $high_limit
* provide an API / UI to run the process?
Comment #14
danblack CreditAttribution: danblack commented
I've done part one (assuming my cache-hash patch gets accepted); it covers a basic delete.
I wanted to get the structure in place before we had to deal with database migration scripts (which would be horribly blocking for anyone running an early 8.x).
I'm still up for doing a renumber function. As a bonus, because cid/cidhash is no longer a unique key, the retry in doSet can probably go away too.
Comment #16
danblack CreditAttribution: danblack commented
Locking proof of concept.
By doing a DELETE FROM cache WHERE id <= max_id we don't block new inserts.
session1
As this isn't committed, the locks are still active, as we can see from the record locks in session 2, which lock the pages of the primary key. We can also see that inserts in session 2 aren't blocked by the delete.
session2
ps. the warnings are from the double use of NOW() on an int field.
(This means I need to get transaction isolation level support done, otherwise it would only avoid blocking with < max.)
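The session transcripts aren't reproduced in this excerpt; a minimal sketch of what the proof of concept exercises, with table and column names assumed from the description:

```sql
-- session 1: range-bounded flush; only rows that already exist are locked
START TRANSACTION;
SELECT MAX(id) INTO @max_id FROM cache;
DELETE FROM cache WHERE id <= @max_id;
-- (transaction deliberately left open to observe the record locks)

-- session 2: this insert receives an id greater than @max_id, so it is
-- not blocked by session 1's uncommitted delete
INSERT INTO cache (cid, data, created)
VALUES ('example:cid', 'payload', UNIX_TIMESTAMP(NOW()));
```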
Comment #17
mikeytown2 CreditAttribution: mikeytown2 commented
Moving this issue to D8 and setting it as a beta blocker, as the upgrade path would suck to write. Creating a new tag called "database cache" and going to try to tag all the DB cache issues.
Comment #18
xjm commented
Only core maintainers should set the beta blocker tag, and only critical issues should be beta-blocking; did you speak to catch about this issue?
Comment #19
catch commented
If TRUNCATE isn't safe, we should change the implementation in the MySQL driver, not work around it in the cache backend.
Since this will need a backport to 7.x, which will require an update, I don't see why it should block an 8.x beta. The update can drop and recreate all the cache tables if necessary, rather than trying to actually update them.
Comment #20
mikeytown2 CreditAttribution: mikeytown2 commented
More results from testing the rename-table truncate trick.
TRUNCATE isn't quick; it takes around 150ms to run on an empty table due to the use of DROP + CREATE internally. The rename trick does take care of the deadlock issue on cache tables, which is nice; but deadlocks can still happen if the request takes longer than 30 seconds, due to the semaphore expiring. After adding a new field to a node type, the request took over 30 seconds, and I got a deadlock & process list emailed to me (happens when the number of active DB connections hits a certain limit). The process list looked good; nothing was over 1 second, but the deadlock was on 'field_info:field_map' even with this patch: #2193149: Deadlocks occur in cache_field table. This makes me think that the truncate on the cache__truncated table is causing the slowdown.
Running the TRUNCATE via a register_shutdown_function would be better, but this leads me to the conclusion that being able to TRUNCATE in the background would be ideal.
Not having to issue a truncate at all would be nice as well, but this syntax doesn't seem to work
Comment #21
Damien Tournoud CreditAttribution: Damien Tournoud commented
I don't understand where the discussion is going here. The fact that the database cache is transactional is a feature, not a bug; we should not try to work around it.
Why not simply do a DELETE unconditionally, instead of trying to do a TRUNCATE if outside a transaction and a DELETE otherwise?
Comment #22
danblack CreditAttribution: danblack commented
> I don't understand where the discussion is going here. The fact that the database cache is transactional is a feature, not a bug, we should not try to workaround it.
Yep. We agree.
> Why not simply doing a DELETE unconditionally, instead of trying to do a TRUNCATE if outside a transaction, a DELETE otherwise?
An unconditional DELETE in repeatable-read will lock all inserts and take a while. A read-committed level will delete new cache entries. A range-based delete will only lock the records that exist and still allow new cache entries to be inserted. #2238203: DatabaseCacheBackend/CacheBackendInterface get method should have timeout should stop existing records in the process of being deleted from undergoing a long lock timeout (I'm working on test cases for that at the moment).
Comment #23
Damien Tournoud CreditAttribution: Damien Tournoud commented
> Why not simply doing a DELETE unconditionally, instead of trying to do a TRUNCATE if outside a transaction, a DELETE otherwise?
We don't want repeatable-read, but that's another question. I guess what you are saying is only true for statement-based replication, in which InnoDB will add a gap lock?
From the perspective of Drupal 8 (which only has tag-based mass deletion), what's actually desirable is to delete all the entries matching the tag *and* prevent new entries from being created.
From the perspective of Drupal 7, it seems desirable to delete any entries matching the wildcard *and* prevent new entries matching the wildcard from being created. Of course, that last one is not possible with standard SQL tools, but it's what sounds desirable from a transactional perspective.
So, as a second best, locking the whole table doesn't sound too bad from this perspective.
Comment #24
mikeytown2 CreditAttribution: mikeytown2 commented
Locking the table is an awful idea when you have a lot of activity on a site; that's the reason for using InnoDB: row-based locks instead of table-based ones.
I'll work on a patch Monday that does the truncate of the spare cache table in a shutdown request, and I'll make sure the spare table is empty before promoting it to be the cache bin. Putting the truncate of the spare table inside a non-blocking request would be ideal; similar to this issue: #1599622: Run poormanscron via a non-blocking HTTP request
Comment #25
Damien Tournoud CreditAttribution: Damien Tournoud commented
@mikeytown2:
The burden of proof is on you. If you have a full wildcard flush of a cache bin, it's desirable that it blocks further inserts into the table until the transaction is committed. See #23.
If it's a performance problem, the solution is to not do such full wildcard flushing...
That will make the cache non-transactional, which is a horrible idea.
Drupal barely supports concurrency properly right now. The only thing that saves us from horrible inconsistencies is that the default cache implementation is transactional. Start using memcache as the field cache, for example, and everything falls apart. See #1679344: Race condition in node_save() when not using DB for cache_field.
I think in many ways you are looking at the wrong end of the problem. Look at why/where those cache flushings are done and fix them.
Comment #26
mikeytown2 CreditAttribution: mikeytown2 commented
Patch for D7 is attached.
The rename will lock the table, preventing anything from happening; timing-wise, the lock will be a much smaller window than before.
If we want to get very tricky, this code can be used to roll back a TRUNCATE thanks to the rename: simply swap the tables again and presto, the TRUNCATE has been undone. Transactions when using TRUNCATE are thus possible in theory, but this would most likely be a bad idea.
In order to do this, a master cache ID for that cache bin would be needed, from my understanding. It would allow entries to be created, but they won't be used in the future because the master cache ID won't match (thus having a similar effect). In AdvAgg 7.x I do something similar; I call it a global counter. Once the counter gets incremented, old processes still running with old data will write that info with the old counter. New processes with the new counter value will then write new data. The next request comes in using the new counter. Doing anything less than some sort of global counter will not give the desired outcome, if I'm understanding this correctly.
The advantage of using a global counter is that clearing out old cache data can be done on cron; using old data if new data isn't there is possible as well; and alt cache backends should be happy too. The main disadvantage is duplicate cache data.
Going down the global counter route would take a lot of R&D to do correctly. If the cache tags had a version number in them, and a hash of the cache tags was used as some sort of global counter, that might be ideal; I don't know if it is, though. This would require a lot of testing.
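A rough illustration of the global-counter idea described above. All table and column names here are hypothetical (this is not the AdvAgg implementation, just the shape of it):

```sql
-- "Clearing" the bin is just bumping its counter; no cache rows are touched.
UPDATE cache_counter SET value = value + 1 WHERE bin = 'cache_field';

-- Reads treat rows written under an older counter value as cache misses.
SELECT c.data
FROM cache_field c
JOIN cache_counter g ON g.bin = 'cache_field'
WHERE c.cid = 'field_info:field_map'
  AND c.counter = g.value;

-- Stale rows can then be purged lazily, e.g. on cron.
DELETE FROM cache_field
WHERE counter < (SELECT value FROM cache_counter WHERE bin = 'cache_field');
```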
Comment #27
mikeytown2 CreditAttribution: mikeytown2 commented
Comment #28
mikeytown2 CreditAttribution: mikeytown2 commented
Passes tests, moving back to D8.
Comment #29
Fabianx CreditAttribution: Fabianx commented
Btw. TRUNCATE itself is implemented that way (rename, create like, drop, rename) and had bugs before MySQL 5.5 (still buggy in 5.1):
http://bugs.mysql.com/bug.php?id=42643
Comment #30
mikeytown2 CreditAttribution: mikeytown2 commented
I'm using MySQL 5.6, and the patch in #26 has helped us with deadlocks.
From [8 Jul 2010 1:38] Paul Dubois
This makes the cache table a bottleneck when TRUNCATE TABLE is issued, due to the exclusive metadata lock.
Comment #31
Fabianx CreditAttribution: Fabianx commented
I thought about the whole patch again and it totally makes sense.
It is pretty simple to get into a performance situation with this during a revert of field_instances. In SHOW PROCESSLIST you see TRUNCATE cache_field and TRUNCATE cache_token each running for 1-2 seconds.
So it takes up to 2s to execute the truncate. The reason is the global metadata lock, because there are new SELECTs going on during the TRUNCATE.
This patch and the rename trick solve it because:
The 2nd TRUNCATE, on the table__truncated copy, will be _fast_, as nothing is trying to select from it (besides other processes doing a count query in the same function); because it is fast, this should still improve things very much.
And we know that consistency is not a problem with the cache here, so it is okay if INSERTs already scheduled fail.
The only thing I don't get is the !inTransaction() part.
Would that not mean that the rename is called but the __truncated table might not exist yet,
if the first to hit this code path were in a transaction? (unlikely, but possible)
Because I think this double-buffers the table, right?
EDIT:
I think it should then look like:
EDIT 2:
What I outlined above is basically single-threaded, which means I had no concurrency issues, which means the TRUNCATE was running in the background anyway. That might mean it has no effect on run-time performance. I'm going to check that.
Comment #32
mikeytown2 CreditAttribution: mikeytown2 commented
Feel free to create a patch with modified inTransaction() logic. I was trying to make sure the code path was the same. The toString method is where we have the delete or the rename query; thus the execute method was patched differently as well.
Comment #33
Fabianx CreditAttribution: Fabianx commented
I see now; we also had query cache lock problems.
A co-worker also pointed out that SELECT * FROM table LIMIT 1 should return no rows when empty and something when not, and that COUNT can be a performance bottleneck by itself.
JFYI
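The co-worker's point, roughly: an existence probe stops at the first row, whereas COUNT(*) on InnoDB scans an index. A sketch, with an assumed bin name:

```sql
-- Returns 1 if the bin has any rows, 0 if it is empty; unlike
-- SELECT COUNT(*), this stops as soon as one row is found.
SELECT EXISTS (SELECT 1 FROM cache_field LIMIT 1) AS has_rows;
```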
Comment #34
Anybody commented
The patch from #26 helped us a lot. How can we get some more eyes on this? I'm looking forward to seeing this patch in a future D7/D8 release! Great work!
Comment #35
shardesty CreditAttribution: shardesty commented
Not totally sure if this is the right place for this, but this seemed the most relevant of all the related threads:
About three weeks ago, we began to experience seemingly random site crashes. Looking through the logs, we saw PDOException errors and realized we had a database lockup. Kicking the database server restored the site but did not, of course, eliminate the problem. Over the next couple of weeks, we installed a number of patches to Drupal and made some changes to our database setup. The number of PDOException errors dramatically decreased, but we still had intermittent site failures.
We did everything on this list: "Fixes for MySQL Deadlocks in D7" (https://groups.drupal.org/node/415883) and also applied the patch from this thread: https://www.drupal.org/node/930876.
* Use READ COMMITTED for MySQL transactions
* Do not use InnoDB for the semaphore table, use Memory
* Deadlocks occur in cache_field table (patch to modules/field/field.info.class.inc and modules/field/field.info.inc)
* Use asynchronous MySQLi connection to avoid RECORD LOCKS (Deadlocks) on cache tables (patch to includes/cache.inc and includes/database/mysql/database.inc)
* PDOException: SQLSTATE[08004] [1040] Too many connections in lock_may_be_available() (patch to lock.inc)
* Waiting for table metadata lock on cache_field table (patch to includes/database/query.inc)
Having eliminated most of the other PDOExceptions we were able to see that the exact sequence of events that led to the site going down was the generation of some PDOException and *then* clearing the cache.
That last patch, to the query.inc file, is intended to help fix the issue of the website locking up in the middle of a cache clear. But applying that patch did not solve our issue. Clearing the cache alone does not drop the site. However, we have another (related or not, no idea) issue where you cannot unblock a user using the admin UI. Attempting to unblock a user throws a PDOException, and if you clear the cache after that, the site goes down. (If you use drush first, it works. If you attempt it in the UI first, it won't work in the UI or in drush.)
Here are the two errors, drush and from the dlog:
***
root@CNS0904:/var/www# drush uublk dsherman
WD user: PDOException: SQLSTATE[HY000]: General error: 1205 Lock wait [error]
timeout exceeded; try restarting transaction: DELETE FROM
{cache_entity_user}
WHERE (cid IN (:db_condition_placeholder_0)) ; Array
(
[:db_condition_placeholder_0] => 1252
)
in cache_clear_all() (line 165 of /var/www/includes/cache.inc).
WD php: PDOException: SQLSTATE[HY000]: General error: 1205 Lock wait [error]
timeout exceeded; try restarting transaction: DELETE FROM
{cache_entity_user}
WHERE (cid IN (:db_condition_placeholder_0)) ; Array
(
[:db_condition_placeholder_0] => 1252
)
in cache_clear_all() (line 165 of /var/www/includes/cache.inc).
PDOException: SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction: DELETE FROM {cache_entity_user}
WHERE (cid IN (:db_condition_placeholder_0)) ; Array
(
[:db_condition_placeholder_0] => 1252
)
in cache_clear_all() (line 165 of /var/www/includes/cache.inc).
Drush command terminated abnormally due to an unrecoverable error. [error]
***
Type: php
Date: Monday, November 3, 2014 - 6:45am
User: Anonymous (not verified)
Location: http://default/index.php
Referrer:
Message: PDOException: SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction: DELETE FROM {cache_entity_user}
WHERE (cid IN (:db_condition_placeholder_0)) ; Array ( [:db_condition_placeholder_0] => 1252 ) in cache_clear_all() (line 165 of /var/www/includes/cache.inc).
Severity: error
Hostname: 127.0.0.1
***
Once the site went down after those errors and a clear cache, I restarted the database server and the site came back up. Went to drush and was able to unblock the user successfully.
So:
* Need to know why using the UI to unblock a user throws that error. Searching on the error leads to all the fixes listed above.
* I think I have an idea of why the database locks up when we clear cache, but why have all the previous fixes and patches not prevented it from going down in this case?
* And what changed three weeks ago such that PDOExceptions suddenly started locking up the database and killing the site? Our logs show random PDOExceptions as far back as we've looked, but we weren't dropping the site.
Other relevant information:
* We updated to 7.31 on the 10/16, and saw the first site crash on the 21st. (That could be because no one clicked the clear cache button, possibly.)
* We installed the SQL Injection vulnerability patch (not the whole 7.32 update, just the patch), also on the 16th.
* The production site is load-balanced, two servers synced with unison. PDO errors were mostly on the primary but all the cache errors were on the secondary.
* We have been completely unable to replicate the error on any of our development servers. (Not load balanced, but otherwise pretty close to production.)
* Site sits on Ubuntu 14.04 lts, and the mysql server is version: 5.5.40-0ubuntu0.14.04.1 (Ubuntu)
Any guidance would be greatly appreciated.
Comment #36
mikeytown2 CreditAttribution: mikeytown2 commented
@shardesty
The patch in this issue currently works for when a cache clear is used to wipe the whole table. In your case this query
DELETE FROM cache_entity_user WHERE cid IN (1252)
is only targeting a single row. The error that the PDO driver throws is "Lock wait timeout exceeded; try restarting transaction". The lock wait timeout referred to here is usually the innodb_lock_wait_timeout variable; its default value is 50 seconds. If you have 2 different threads trying to operate on the same row, this InnoDB timeout can get triggered. In my async patch (#2336521-35: Use asynchronous MySQLi connection to avoid RECORD LOCKS (Deadlocks) on cache tables) I change this timeout to 2 seconds instead of the default 50 via SET SESSION innodb_lock_wait_timeout = 2. You can try adjusting this value from its default; that should help with the database locking up. Be careful, though: the 2 seconds I use is there just for the cache, i.e. data that isn't the end of the world if it doesn't get written. Adjusting innodb_lock_wait_timeout so the timeouts happen sooner has the potential to negatively affect normal (non-cache-bin) queries.
If you can trigger this database stall fairly easily, can you trigger it and then run SHOW ENGINE INNODB STATUS via phpMyAdmin or some other MySQL tool, and put the output of it here? Be sure to capture it while the database is locked up, otherwise it won't have the info that is needed to look deeper into this issue. If you could also run SHOW PROCESSLIST, that would be helpful as well; but the INNODB STATUS is the most important one to get.
In order to improve calls to cache_clear_all() I would want to create an alt cache backend (#2361183: Database cache master plan) so async deletes can happen. I've taken patching core to its limits, and an alt database cache backend seems like the best way forward.
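The knob and diagnostics mentioned above, collected in one place. The 2-second value is from the async patch and is deliberately aggressive; it applies only to the current connection:

```sql
-- Fail fast on row-lock waits for this session's cache queries:
SET SESSION innodb_lock_wait_timeout = 2;

-- What to capture while the database is stalled:
SHOW ENGINE INNODB STATUS;  -- transactions, held and waited-on locks
SHOW PROCESSLIST;           -- threads stuck 'Waiting for table metadata lock'
```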
Comment #37
shardesty CreditAttribution: shardesty commented
@mikeytown2
Thanks for replying so quickly. We're talking to our client to arrange bringing the site down for a few minutes to gather the information you requested. That will happen sometime later today, and we'll post our findings as soon as we have them.
Comment #38
shardesty CreditAttribution: shardesty commented
@mikeytown2
We captured the information you asked about, attached.
Comment #39
mikeytown2 CreditAttribution: mikeytown2 commented
Thanks for the info :)
This is the thread that is trying to do something (clear the cache_entity_user table)
This is the thread that has locked up the table; it appears to be just sitting there doing nothing, as no query is listed. That means this thread is inside a transaction or took an exclusive lock on a table. If you want to free up your database, kill this DB connection.
The thread ID is 188, so KILL 188 should bring your database back from the dead; yes, this is a "sleeping" thread. Note that in this stackoverflow issue they have the timeout being exceeded; in your case you actually want the timeout to happen so the zombie DB connection gets killed. You can try to fix your case by adjusting the wait_timeout variable. SET SESSION wait_timeout = 90 is what I use in the async patch; it means that if a MySQL thread has been idle for over 90 seconds, the server will close the connection and a "Warning: MySQL server has gone away" error will appear. With async connections I have it set up to auto-reconnect if this happens, and no fatal error is displayed (think about a long-running cron job where 90 seconds might not be enough time).
All of this usually points towards an issue in the application (Drupal in our case), where some rows/tables get a lock, that lock doesn't get released, and the DB connection never closes.
Do you have a query log of all queries made by thread 188? That would be very helpful.
Comment #40
shardesty CreditAttribution: shardesty commented
@mikeytown2, thanks
Just to clarify in my own head - the first transaction is coming from your truncate patch, correct? Also, are you saying that this thread 188 is a sort of zombie thread that has been causing our deadlocks all along, or just in this particular case? (And by extension, is killing this thread the solution, or just treating a symptom?)
We've been rebooting the mysql server whenever it deadlocks - this brings our site back, and I would have thought that it would have killed any threads, because reboot. Of course, I know next to nothing about mysql.
Comment #41
mikeytown2 CreditAttribution: mikeytown2 commented
The locking transaction is coming from thread 188 (thread IDs get reset on MySQL server restarts). The query that ran on that connection somehow got a lock on the cache_entity_user table and never released it. Then the truncate query comes along (thread 597) and wants to clear the cache, but it can't do so because of the lock on the table from thread 188. This results in a metadata lock, and a bad one, because thread 188 isn't really doing anything yet it holds a database lock; thus you have to wait for that connection to time out.
In order to manually clear this issue without too much hassle (deciphering INNODB STATUS output is hard), SHOW PROCESSLIST is helpful. Find a thread that has been sleeping for about as long as, or longer than, the query that's been running the longest with a metadata lock, and kill it. The thread will look like it's doing nothing (sleeping). Most of the time you'll kill the correct thread and things start to work again. Every now and then the kill won't do it, so find another sleeping thread that is about as old as the thread with the metadata lock and kill that one as well. This should release the lock, and then things work as normal.
Comment #42
Fabianx CreditAttribution: Fabianx commented
Wow, thanks mikeytown2 for a _lot_ of background information here.
Comment #43
shardesty CreditAttribution: shardesty commented
mikeytown2, thanks for all the information so far. So, we now have a manual option for restoring database function by killing the thread, so we don't have to reboot the database server every time we get a deadlock. However, manually performing that is not a long-term plan, especially as we've asked our clients to refrain from clicking the clear-cache button or modifying user accounts until this is resolved.
We're planning on triggering a deadlock intentionally again tomorrow morning, with the query logs turned on. Is there any other information that we should gather that would be helpful in diagnosing the actual cause of the deadlocks?
You've mentioned adjusting the innodb_lock_wait_timeout to 2 seconds, and the wait_timeout variable to 90 - which of these do you recommend, and why? If the problem is, as you suggest, in Drupal, what can we do to stop the user modify / cache clear / deadlock situation?
Comment #44
mikeytown2 CreditAttribution: mikeytown2 commented
Adjusting the wait_timeout should, in theory, help this issue auto-resolve. The biggest issue I see with setting a low wait_timeout value is cron. Looking at drupal_cron_run(), I think the best option is to use hook_cron_queue_info() to re-adjust the wait_timeout value to something like 300 seconds, and then in your settings.php file set the wait_timeout value to 60 seconds.
In terms of the query log: get the thread ID for every query in the log, as well as the output from SHOW ENGINE INNODB STATUS while the site is locked.
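The suggested wait_timeout split, expressed as the session statements involved. The values 60 and 300 are the examples from the comment; wiring each SET to settings.php versus a cron queue hook is the Drupal side of it:

```sql
-- Normal web requests: idle connections are reaped after a minute.
SET SESSION wait_timeout = 60;

-- Re-issued at the start of a long cron run so the connection survives.
SET SESSION wait_timeout = 300;
```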
Comment #45
mikeytown2 CreditAttribution: mikeytown2 commented
I have an experimental query that might shed some light on the issue while it's happening.
Don't run this in phpMyAdmin if you have it with frames (the default), as it will then query all of the tables in the information_schema database and your MySQL server will be slow or crash. If phpMyAdmin shows you a list of tables on the left side, then you shouldn't run the above query in it. If you do want to run it with phpMyAdmin, be sure to do it outside of a frame; /tbl_sql.php?db=information_schema&table=innodb_trx would be a good URL for a starting point.
Here is another query; the same rules about phpMyAdmin apply to it as well.
This query should show all processes that have been waiting for the same amount of time (+-10 seconds) as your longest-running 'Waiting for table metadata lock' query.
Inspiration for the queries comes from http://illumineconsulting.blogspot.com/2013/10/digging-into-mysql-innodb...
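The queries themselves were posted as attachments; as a rough sketch of the approach (my reconstruction, not the exact query, assuming MySQL 5.5+ with InnoDB), the starting point is joining the open-transaction list to the processlist so a connection that is idle but still holds a transaction stands out:

```sql
-- List open InnoDB transactions alongside their connection state. A zombie
-- shows up as command = 'Sleep' with a large trx_age_seconds.
SELECT t.trx_mysql_thread_id AS thread_id,
       t.trx_state,
       t.trx_started,
       TIMESTAMPDIFF(SECOND, t.trx_started, NOW()) AS trx_age_seconds,
       p.command,
       p.time,
       p.info AS current_query
FROM information_schema.INNODB_TRX t
JOIN information_schema.PROCESSLIST p
  ON p.id = t.trx_mysql_thread_id
ORDER BY t.trx_started;
```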
Comment #46
shardesty CreditAttribution: shardesty commented
@mikeytown2:
This morning we took down the site again and gathered yet more info, attached. (The two query files are the results of your queries in the previous post. The two process list files are from during the deadlock, before and after killing thread 61. Query-61 is an excerpt from the query log for thread 61. If you need more of the query log, just say so; we have over 500MB for the five minutes we had query logging turned on.) Also, we watched closely what was happening in phpmyadmin as the site went down, so we knew before we saw the results of your custom queries that thread 61 was the initial problem. We can take the site down - cause a deadlock in the database - by clicking "save" on a user edit page and then clearing the cache. This morning, as soon as I clicked save, thread 61 appeared and immediately went to sleep / null:
61 dbuser xxx.xx.xxx.xxx:38286 database Sleep 448 NULL
We waited about 30 seconds, then clicked the clear cache button. Within a couple seconds, thread 112 collided with thread 61.
112 dbuser xxx.xx.xxx.xxx:38292 database Query 393 Waiting for table metadata lock RENAME TABLE cache_entity_user TO cache_entity_use...
And then we had the big pile up of threads "Waiting for table metadata lock " right after it:
We waited until everything was fully locked up, and then killed thread 61. We saw the results in "show_process_list_2" for a few seconds, and then it all cleared up and the site was back to normal.
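For reference, the manual remediation described here is a single statement once the blocking connection id is known (61 in this test):

```sql
-- Kills connection 61, rolling back its open transaction and releasing
-- the metadata lock so the queued RENAME/TRUNCATE and everything piled
-- up behind it can proceed.
KILL 61;
```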
Next, we put in place the changes to settings.php and created a new module, and installed/enabled that. Ran the test again with results that were functionally identical to the above, just with different thread ID numbers.
Looking back over the last couple weeks, the only things of note that happened before the site started going down were:
* Updating core from 7.28 to 7.31 (five days previous)
* Ubuntu updates to libmysqlclient18 mysql-client-5.5 mysql-client-core-5.5 mysql-common mysql-server mysql-server-5.5 mysql-server-core-5.5 (five days previous)
* Ubuntu updates to libssl1.0.0 openssl (one day previous)
I haven't seen any issues related to being unable to save changes to users. So maybe I need to post that as a new bug report - but the problem certainly feels related to all these other PDOException type issues.
Comment #47
mikeytown2 CreditAttribution: mikeytown2 commented
Looks like a TRANSACTION was started but never finished. Thus when the cache was cleared in another thread, the cache_entity_user table was locked, and for some reason thread 61 just sat and did nothing for the rest of the time.
Sounds like the wait_timeout idea didn't help... You waited at least 60 seconds for the sleeping thread before killing the test, right? Just trying to rule things out.
Comment #48
mikeytown2 CreditAttribution: mikeytown2 commented
Looks like the first query from #45 was able to pinpoint the zombie id. The second one had too small a range to catch it. Good to know in the future. Thanks for your work with this.
The heart of the issue is database queries to the cache tables being executed inside a transaction and then that thread stalling for an unknown reason. If you're already using this patch in prod (#2336521-35: Use asynchronous MySQLi connection to avoid RECORD LOCKS (Deadlocks) on cache tables), then it looks like the only solution at this point is to find out why the transaction is not getting committed in user_save(). I will be working on an alternative database backend in order to avoid the transaction issue, but that's not going to help you today.
Comment #49
mikeytown2 CreditAttribution: mikeytown2 commented
Looking into this more, I think there is an issue inside field_attach_update('user', $account); or one of the hooks it uses. Looking at where the queries died points to it happening in there.
Comment #50
shardesty CreditAttribution: shardesty commented
We waited almost 400 seconds in the second test, after we installed the patched settings.php and module; we wanted to be sure we gave it a chance to work.
Glad we could help testing your queries, just wish we didn't have the opportunity. We'll look more into field_attach_update, and thanks once more for your help with this.
Comment #51
mikeytown2 CreditAttribution: mikeytown2 commented
Looks like timeouts by default do not roll back the transaction: http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_inn...
The advice here says that the wait_timeout should roll it back http://stackoverflow.com/questions/9936699/mysql-rollback-on-transaction...
Eh... transactions in mysql are a pain.
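For anyone checking their own server: innodb_lock_wait_timeout only rolls back the timed-out statement by default, not the surrounding transaction, and the variable that changes that behavior can only be set at server startup. A quick way to inspect it:

```sql
-- OFF by default; when OFF, a lock wait timeout aborts only the current
-- statement and the transaction keeps its locks.
SHOW VARIABLES LIKE 'innodb_rollback_on_timeout';

-- Turning it ON requires a restart, e.g. in my.cnf:
--   [mysqld]
--   innodb_rollback_on_timeout = ON
```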
Comment #52
mikeytown2 CreditAttribution: mikeytown2 commented
Adjusted the query so it only shows queries that hold a row or table lock. Also added a couple more columns from the processlist table.
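The adjusted query was posted as an attachment; a hedged sketch of the idea (my reconstruction, using the MySQL 5.5/5.6 information_schema tables, which were replaced by performance_schema.data_locks in 8.0):

```sql
-- INNODB_LOCKS lists locks that are being requested or are blocking
-- another transaction, which is exactly the contended set we care about.
SELECT t.trx_mysql_thread_id AS thread_id,
       l.lock_type,            -- 'RECORD' (row) or 'TABLE'
       l.lock_table,
       p.user, p.host, p.db,
       p.command, p.time, p.state,
       p.info AS current_query
FROM information_schema.INNODB_LOCKS l
JOIN information_schema.INNODB_TRX t
  ON t.trx_id = l.lock_trx_id
JOIN information_schema.PROCESSLIST p
  ON p.id = t.trx_mysql_thread_id;
```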
Comment #53
Fabianx CreditAttribution: Fabianx commented
I would strongly suggest not using the entitycache module with the DB cache backend.
Do you have the option to have at least that bin in memcache or redis?
Comment #54
mikeytown2 CreditAttribution: mikeytown2 commented
@shardesty
One more question, what version of PHP are you using?
Comment #55
mikeytown2 CreditAttribution: mikeytown2 commented
Found a post that talks about auto-killing hanging transactions. It uses a stored procedure that gets called every X seconds. Looks like an interesting idea.
http://mysqlblog.fivefarmers.com/2012/08/28/identifying-and-killing-bloc...
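The core of that idea can be sketched as a selection query (the 30-second threshold and the Sleep filter are illustrative, not taken from the post; the post wraps the logic in a stored procedure driven by the event scheduler):

```sql
-- Generate KILL statements for connections that are asleep but still
-- have an InnoDB transaction open past the threshold. Review the output
-- before executing it.
SELECT CONCAT('KILL ', t.trx_mysql_thread_id, ';') AS kill_statement
FROM information_schema.INNODB_TRX t
JOIN information_schema.PROCESSLIST p
  ON p.id = t.trx_mysql_thread_id
WHERE p.command = 'Sleep'
  AND TIMESTAMPDIFF(SECOND, t.trx_started, NOW()) > 30;
```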
Comment #56
Marko B CreditAttribution: Marko B commented
I see in #48 something very similar to what I am having now in D7, described in more detail here:
http://drupal.stackexchange.com/questions/137979/user-import-from-other-...
maybe it is somehow related so my problem could bring some light to it.
Comment #57
mikeytown2 CreditAttribution: mikeytown2 commented
@Marko B
Check out slides 74-106 from my DIY Drupal 7 Performance presentation; you should be able to get it working correctly.
Comment #58
Marko B CreditAttribution: Marko B commented
Thanx Mikey. I'll just add that my issue was with something else: I had a cURL connection that was slow and it made the site time out. Sorry for this.
Comment #59
mikeytown2 CreditAttribution: mikeytown2 commented
I just encountered this while editing field info. The detection query works very well; it identified the stuck query and I was able to kill it without issue.
I've worked on reducing the query down to the parts I care about. More refinements might come later.
I've been working on a new module, Asynchronous Prefetch Database Query Cache, which is being built to avoid issues like this one. It incorporates this patch, but the module is not production ready.
Comment #60
Berdir
Somewhat related: #2347867: Race conditions with lock/cache using non-database storage - add a non-transactional database connection
Comment #61
shardesty CreditAttribution: shardesty commented
Belated update:
@Fabianx: We uninstalled the entitycache module and attempted to recreate the error. The error occurred anyway. We inherited this site from another company, and the cache modules, like many other aspects of the site, were apparently poorly thought out. The site is now consolidated on the default Drupal cache, and all the various contrib cache modules are disabled.
@mikeytown2: PHP version is:
PHP 5.5.9-1ubuntu4.5 (cli) (built: Oct 29 2014 11:59:10)
Zend Engine v2.5.0 with Zend OPcache v7.0.3
We waited until our weekly maintenance window to do some more testing, seeing as the site stayed up as long as no one clicked the save user button. What we found was strange: previously, we had been completely unable to replicate the error in any of our other environments, despite our best efforts to make those other environments match production as closely as possible.
The one difference that was most difficult to replicate was the fact that our production environment is load-balanced, with two servers hitting the same database, and files replicated between the two servers with unison. Outside users would hit one of the two servers randomly, but all administrative functions went through just one, the primary (server 9). We'd checked, and the pdo errors all occurred on server 9, which you'd expect since that's where the user save and cache clear would originate.
So, we rewired things a bit so that the secondary server (server 10) could be used for Drupal admin as well. Since all the files and settings are constantly synced from 9 to 10, we expected that when we hit save user and then cleared the cache, we'd get the database lockup. But we didn't. Hitting the same database, but from server 10, we could not recreate the database lockup at all. So far as we can determine, all settings - php, mysql, apache, ubuntu, drupal, whatever - were identical between the two servers.
As best we can now figure, there is some obscure difference, some setting, on server 9 that uniquely causes the error condition. Some research showed that server 9 was built by a different individual than server 10, but aside from that, we have no clue as to what the difference might be.
Having isolated the issue to one server - we made the decision to just blow the server away, and rebuild using the same procedure as we used for server 10, then restore the site. Since that time, we have not experienced any issues with PDO errors, or anything else. Our db performance is significantly better, though, thanks to all the tuning patches from mikeytown2.
Thanks to everyone for all their help with this issue - though it seems at this point that we'll not have any answer for why, exactly, the error was happening. The various patches that we applied were essential, though, in narrowing down the issue - without clearing away many of the errors we were seeing through the successive patches and without the sql scripts that allowed us to watch what was happening on the sql server in real time, it would have been much harder if not impossible to see that the issue was coming from that one server.
Comment #62
SocialNicheGuru CreditAttribution: SocialNicheGuru commented
With the patch from #26, I am running into a number of these __truncate tables not being defined in the database for Drupal 7:
cache__truncate
cache_views__truncate
cache_variable__truncate
cache_rules__truncate
cache_libraries__truncate
cache_field__truncate
If you uninstall a module that has a __truncate table, the __truncate table does not get removed. Shouldn't it be?
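For context, these helper tables come from the rename-then-truncate pattern the patch uses so that TRUNCATE never runs against the live table. A sketch of the idea (the exact mechanics of the patch may differ; table names follow the __truncate convention listed above):

```sql
-- Atomically swap a standing empty copy in for the live table, then
-- truncate the old data outside the hot path. RENAME TABLE is atomic
-- across all tables listed in one statement.
RENAME TABLE cache_field            TO cache_field__tmp,
             cache_field__truncate  TO cache_field,
             cache_field__tmp       TO cache_field__truncate;
TRUNCATE TABLE cache_field__truncate;

-- Manual cleanup for a helper table left behind after its module is
-- uninstalled:
DROP TABLE IF EXISTS cache_field__truncate;
```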
Comment #63
morgantocker CreditAttribution: morgantocker commented
So I'm a little late to this issue. There are some known issues w.r.t. MySQL:
- 5.6 and above has a 'fast' DROP TABLE (but not truncate)
- 5.7 and above has a 'fast' TRUNCATE
See this MySQL bug for more details. I am simplifying 'fast' a little, but essentially with a large buffer pool MySQL can take some time to remove the deleted table's pages from the buffer pool. The new approach is to do this lazily.
Drupal supports MySQL 5.5+ so that technically means neither statement is safe for large buffer pools.
Comment #65
daffie CreditAttribution: daffie commented
I think this problem will be fixed in #2684359: The truncate method is does different things on different databases and depending if in a transation.
Comment #77
catch
Drupal requires MySQL 5.7 now, so I believe this is resolved by https://bugs.mysql.com/bug.php?id=68184
Closing as outdated.
#2684359: The truncate method is does different things on different databases and depending if in a transation is still open for some of the other issues.