We've been seeing an issue on our production environment where cache tags are not getting cleared from the Varnish server using the Late Runtime Processor.
For example, when the tags to be cleared are as follows:
entity_types node:21604 node_list content_moderation_state:2339 content_moderation_state_list
on our production environment, it would reliably clear these tags instead:
entity_types content_moderation_state:2339 node_list content_moderation_state:2339 content_moderation_state_list
So the node:21604 tag is missing, and replaced by a duplicated instance of the content_moderation_state tag.
This environment is running on a Galera database cluster. One of the tricks necessary for multi-master MySQL replication is to change the behaviour of AUTO_INCREMENT:
From https://mariadb.com/kb/en/library/mariadb-galera-cluster-known-limitations/
Do not rely on auto-increment values to be sequential. Galera uses a mechanism based on autoincrement increment to produce unique non-conflicting sequences, so on every single node the sequence will have gaps.
In the method DatabaseQueue::createItemMultiple, there is an assumption that the IDs generated by the database will be sequential:
// A multiple row-insert doesn't give back all the individual IDs, so
// calculate them back by applying subtraction.
for ($i = 1; $i <= count($records); $i++) {
$item_ids[] = $id;
$id++;
}
Which causes the databases IDs recorded in the TxBuffer to be incorrect, so the wrong object gets loaded from the buffer in QueueService::claim, leading to the tag confusion issue.
| Comment | File | Size | Author |
|---|---|---|---|
| #15 | interdiff_14-15.txt | 448 bytes | bpizzillo |
| #15 | purge-3094343-15.patch | 2.03 KB | bpizzillo |
| #14 | interdiff_11-14.txt | 1.26 KB | bpizzillo |
| #14 | purge-3094343-14.patch | 2.28 KB | bpizzillo |
| #11 | purge-3094343-11.patch | 2.11 KB | bpizzillo |
Issue fork purge-3094343
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
bklineThis bit us as well, with no database replication involved. Took quite a long time to track down the reason stale content kept getting stuck in the Akamai cache. Patch attached.
Comment #3
bklineComment #4
bklineComment #5
bklineFixed typo in the patch (mismatched variable name).
Comment #6
bklineComment #7
weynhamzGot several complaints from a client about the home page cache did not get updated after publishing some new nodes, the odd thing is that this seems only happens in Acquia Cloud environment, it behaves perfectly fine for local debugging and our ci environment. Spent all most two days debugging then landed to the same location as pointed out in this issue summary.
In my case, it was basically the 'node_list' cache tag that is missing in the invalidation request.
Below is some debugging detail.
Firstly, I suspected that the `node_list` cache tag was not inserted into the queue for some reason, be below confirms that the `node_list` is indeed inserted.
I did notice the unusual auto-increment id calculation when I read the code, but I didn't think it could be the cause of the cache invalidation problem.
https://git.drupalcode.org/project/purge/-/blob/8.x-3.x/src/Plugin/Purge...
Until, later on, I was able to narrow down to the following block of code that lose the `node_list` tag.
https://git.drupalcode.org/project/purge/-/blob/8.x-3.x/src/Plugin/Purge...
See debugging output below
Basically, https://git.drupalcode.org/project/purge/-/blob/8.x-3.x/src/Plugin/Purge... tries leveraging the cached invalidation instance in the buffer based on item_id, but with Aquia Cloud, the database auto_increment_id is not sequential, then this leads to getting the wrong invalidation instance always for the second element of the array, which is `node_list`.
Comment #8
randy tang commentedComment #9
randy tang commentedComment #10
avpadernoComment #11
bpizzillo commentedRe-rolling after PHPCS changes.
Comment #12
bpizzillo commentedComment #13
bpizzillo commentedComment #14
bpizzillo commentedPrevious patch hardcoded the table name and used
$this->connection->query. For some reason this broke a large number of tests. Usingstatic::TABLE_NAMEand$this->connection->insertdoes not seem to throw errors. Let's see if we can get a good build.Comment #15
bpizzillo commentedAccidentally left in old use statement from previous patch.
Comment #16
bpizzillo commentedComment #17
marvin_b8 commentedLooks pretty good so far, I just noticed the following things:
- unset the transaction should not be necessary (https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21Entity%21...)
- The logic for the id could be reversed.
So that it looks something like this:
and If it's a performance problem, we can also think about putting the whole thing in a different queue.
Comment #18
davidwbarratt commentedI accidentally created a duplicate of this issue at #3469458: Late Runtime Processor purges incorrect tags. This problem seems exceptionally bad with SQLite.
Comment #19
davidwbarratt commentedI tested the patch and it does not fix the issue I had in #3469458: Late Runtime Processor purges incorrect tags so I'm re-opening it.
Comment #20
avpaderno@davidwbarratt If the patch does not fix the bug described by this issue, the status should be changed.
Comment #21
avpadernoLet's create a merge request, now that patches are no longer tested.
Comment #22
davidwbarratt commentedTo clarify, the patch clearly fixes a problem, but I'm not sure if that's the problem I'm having in #3469458: Late Runtime Processor purges incorrect tags or if that's the same problem described in this issue. From the comment thread, it looks like the patch does fix the issue for some folks so it's possible there is more than one problem and the patch isn't a comprehensive solution.
Comment #23
bkline@davidwbarratt
That would seem to imply that you are able to reproduce the unwanted behavior using this patch with other databases than SQLite (just not as frequently). Can you confirm that this is true?
Comment #24
davidwbarratt commentedI added repo steps to #3469458: Late Runtime Processor purges incorrect tags it happens every time I use Late Runtime Processor, if I use the drush or cron processor the issue goes away.
I didn't test it on any other database, but I am using SQLite, I assumed since the problem happens consistantly that that could be a clue, but it might not be.
I figured this out because I was writting a Cloudflare Worker that receives the purge requests and I noticed it was missing a tag. Upon further investigation, the tag isn't being sent.
Comment #25
davidwbarratt commentedI did try switching from mod_php to php-fpm. I thought it might since the former holds on to the response and perhaps I was hitting a timeout or something, but it didn't make a difference.
Comment #27
davidwbarratt commentedI have fixed it!
The code basically relies on the indexing of the array to always match. Instead of making that assumption, we'll just use whatever is passed in and ensure the indexing matches up.
Comment #28
davidwbarratt commentedComment #32
godotislateWhile it's certainly safer to ensure the keys align, looking at the code in
QueueService::commitAdding(), I can't see how the keys would be anything other than sequential 0-indexed, since thearray_chunk()call does not preserve keys. Nevertheless, if it indeed addresses an issue, then the keys need to be preserved in all implementations ofQueueInterface::createItemMultiple(), so I've made the respective changes to MemoryQueue and QueueBase.I looked at how to write some tests for this, but I'm not sure it's possible to change the autoincrement setting of the database in tests, and otherwise mocking the database service is a huge lift. Leaving at Needs tests for now.
Comment #33
godotislateOne note about this solution is that since one multi-value insert query is replaced by multiple insert queries, the performance gain from creating multiple at once is lost. I don't have an alternate solution of how to address this. I thought about doing a select query to get the last
count(items)right after the insert query, but to do that, need to be sure that there were no other insertions. Which would mean using a lock and makes things much more complicated.