We've been seeing an issue on our production environment where cache tags are not getting cleared from the Varnish server using the Late Runtime Processor.

For example, when the tags to be cleared are as follows:
entity_types node:21604 node_list content_moderation_state:2339 content_moderation_state_list

on our production environment, it would reliably clear these tags instead:
entity_types content_moderation_state:2339 node_list content_moderation_state:2339 content_moderation_state_list

So the node:21604 tag is missing, and replaced by a duplicated instance of the content_moderation_state tag.

This environment is running on a Galera database cluster. One of the tricks necessary for multi-master MySQL replication is to change the behaviour of AUTO_INCREMENT:

From https://mariadb.com/kb/en/library/mariadb-galera-cluster-known-limitations/

Do not rely on auto-increment values to be sequential. Galera uses a mechanism based on autoincrement increment to produce unique non-conflicting sequences, so on every single node the sequence will have gaps.

In the method DatabaseQueue::createItemMultiple, there is an assumption that the IDs generated by the database will be sequential:

      // A multiple row-insert doesn't give back all the individual IDs, so
      // calculate them back by applying subtraction.
      for ($i = 1; $i <= count($records); $i++) {
        $item_ids[] = $id;
        $id++;
      }

Which causes the databases IDs recorded in the TxBuffer to be incorrect, so the wrong object gets loaded from the buffer in QueueService::claim, leading to the tag confusion issue.

Issue fork purge-3094343

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

grahamC created an issue. See original summary.

bkline’s picture

This bit us as well, with no database replication involved. Took quite a long time to track down the reason stale content kept getting stuck in the Akamai cache. Patch attached.

bkline’s picture

StatusFileSize
new2.09 KB
bkline’s picture

Status: Active » Needs review
bkline’s picture

StatusFileSize
new2.09 KB

Fixed typo in the patch (mismatched variable name).

bkline’s picture

weynhamz’s picture

Got several complaints from a client about the home page cache did not get updated after publishing some new nodes, the odd thing is that this seems only happens in Acquia Cloud environment, it behaves perfectly fine for local debugging and our ci environment. Spent all most two days debugging then landed to the same location as pointed out in this issue summary.

In my case, it was basically the 'node_list' cache tag that is missing in the invalidation request.

Below is some debugging detail.

Firstly, I suspected that the `node_list` cache tag was not inserted into the queue for some reason, be below confirms that the `node_list` is indeed inserted.

I did notice the unusual auto-increment id calculation when I read the code, but I didn't think it could be the cause of the cache invalidation problem.

https://git.drupalcode.org/project/purge/-/blob/8.x-3.x/src/Plugin/Purge...

INSERT INTO {purge_queue} (data, created) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1), (:db_insert_placeholder_2, :db_insert_placeholder_3), (:db_insert_placeholder_4, :db_insert_placeholder_5), (:db_insert_placeholder_6, :db_insert_placeholder_7), (:db_insert_placeholder_8, :db_insert_placeholder_9), (:db_insert_placeholder_10, :db_insert_placeholder_11), (:db_insert_placeholder_12, :db_insert_placeholder_13), (:db_insert_placeholder_14, :db_insert_placeholder_15)
Array
(
    [0] => Array
        (
            [data] => a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:11:"node:134386";i:3;a:0:{}}
            [created] => 1591339804
        )

    [1] => Array
        (
            [data] => a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:9:"node_list";i:3;a:0:{}}
            [created] => 1591339804
        )

    [2] => Array
        (
            [data] => a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:29:"content_moderation_state:7021";i:3;a:0:{}}
            [created] => 1591339804
        )

    [3] => Array
        (
            [data] => a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:29:"content_moderation_state_list";i:3;a:0:{}}
            [created] => 1591339804
        )

    [4] => Array
        (
            [data] => a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:41:"handy_cache_tags:content_moderation_state";i:3;a:0:{}}
            [created] => 1591339804
        )

    [5] => Array
        (
            [data] => a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:66:"handy_cache_tags:content_moderation_state:content_moderation_state";i:3;a:0:{}}
            [created] => 1591339804
        )

    [6] => Array
        (
            [data] => a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:21:"handy_cache_tags:node";i:3;a:0:{}}
            [created] => 1591339804
        )

    [7] => Array
        (
            [data] => a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:38:"handy_cache_tags:node:important_notice";i:3;a:0:{}}
            [created] => 1591339804
        )

)
Calculated ids
Array
(
    [0] => 225686
    [1] => 225687
    [2] => 225688
    [3] => 225689
    [4] => 225690
    [5] => 225691
    [6] => 225692
    [7] => 225693
)
Ids in database
225686
225691
225696
225701
225706
225711
225716
225721

Until, later on, I was able to narrow down to the following block of code that lose the `node_list` tag.

https://git.drupalcode.org/project/purge/-/blob/8.x-3.x/src/Plugin/Purge...

    dpm('claimed items before');
    dpm($items);

    // Iterate the $items array and replace each with full instances.
    foreach ($items as $i => $item) {

      // See if the inv. object is still buffered locally, else instantiate it.
      if (!($inv = $this->buffer->getByProperty('item_id', $item->item_id))) {
        $inv = $this->purgeInvalidationFactory->getFromQueueData($item->data);
      }

      // Ensure it is buffered, has the right state and properties, then add it.
      $this->buffer->set($inv, TxBuffer::CLAIMED);
      $this->buffer->setProperty($inv, 'item_id', $item->item_id);
      $this->buffer->setProperty($inv, 'created', $item->created);
      $items[$i] = $inv;
    }

    $this->logger->debug("claimed @no items.", ['@no' => count($items)]);

    dpm('claimed items after');
    dpm($items);

    return $items;
  }

See debugging output below

claimed items before
Array
(
    [0] => stdClass Object
        (
            [__CLASS__] => stdClass
            [item_id] => 225686
            [data] => Array
                (
                    [0] => tag
                    [1] => Array
                        (
                        )

                    [2] => node:134386
                    [3] => Array
                        (
                        )

                )

            [expire] => 0
            [created] => 1591339804
        )

    [1] => stdClass Object
        (
            [__CLASS__] => stdClass
            [item_id] => 225691
            [data] => Array
                (
                    [0] => tag
                    [1] => Array
                        (
                        )

                    [2] => node_list
                    [3] => Array
                        (
                        )

                )

            [expire] => 0
            [created] => 1591339804
        )

    [2] => stdClass Object
        (
            [__CLASS__] => stdClass
            [item_id] => 225696
            [data] => Array
                (
                    [0] => tag
                    [1] => Array
                        (
                        )

                    [2] => content_moderation_state:7021
                    [3] => Array
                        (
                        )

                )

            [expire] => 0
            [created] => 1591339804
        )

    [3] => stdClass Object
        (
            [__CLASS__] => stdClass
            [item_id] => 225701
            [data] => Array
                (
                    [0] => tag
                    [1] => Array
                        (
                        )

                    [2] => content_moderation_state_list
                    [3] => Array
                        (
                        )

                )

            [expire] => 0
            [created] => 1591339804
        )

)

claimed items after
Array
(
    [0] => stdClass Object
        (
            [__CLASS__] => Drupal\purge\Plugin\Purge\Invalidation\TagInvalidation
            [id:protected] => 0
            [context:protected] => 
            [expression:protected] => node:134386
           ...
        )

    [1] => stdClass Object
        (
            [__CLASS__] => Drupal\purge\Plugin\Purge\Invalidation\TagInvalidation
            [id:protected] => 5
            [context:protected] => 
            [expression:protected] => handy_cache_tags:content_moderation_state:content_moderation_state
            ...
        )

    [2] => stdClass Object
        (
            [__CLASS__] => Drupal\purge\Plugin\Purge\Invalidation\TagInvalidation
            [id:protected] => 8
            [context:protected] => 
            [expression:protected] => content_moderation_state:7021
            ...
        )

    [3] => stdClass Object
        (
            [__CLASS__] => Drupal\purge\Plugin\Purge\Invalidation\TagInvalidation
            [id:protected] => 9
            [context:protected] => 
            [expression:protected] => content_moderation_state_list
            ...
        )
)

Basically, https://git.drupalcode.org/project/purge/-/blob/8.x-3.x/src/Plugin/Purge... tries leveraging the cached invalidation instance in the buffer based on item_id, but with Aquia Cloud, the database auto_increment_id is not sequential, then this leads to getting the wrong invalidation instance always for the second element of the array, which is `node_list`.

randy tang’s picture

StatusFileSize
new3.13 KB
randy tang’s picture

StatusFileSize
new2.41 KB
avpaderno’s picture

Status: Needs review » Needs work
bpizzillo’s picture

StatusFileSize
new2.11 KB

Re-rolling after PHPCS changes.

bpizzillo’s picture

Status: Needs work » Needs review
bpizzillo’s picture

Status: Needs review » Needs work
bpizzillo’s picture

StatusFileSize
new2.28 KB
new1.26 KB

Previous patch hardcoded the table name and used $this->connection->query. For some reason this broke a large number of tests. Using static::TABLE_NAME and $this->connection->insert does not seem to throw errors. Let's see if we can get a good build.

bpizzillo’s picture

StatusFileSize
new2.03 KB
new448 bytes

Accidentally left in old use statement from previous patch.

bpizzillo’s picture

Status: Needs work » Needs review
marvin_b8’s picture

Status: Needs review » Reviewed & tested by the community

Looks pretty good so far, I just noticed the following things:

- unset the transaction should not be necessary (https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21Entity%21...)
- The logic for the id could be reversed.

So that it looks something like this:

  public function createItemMultiple(array $items) {
    // Wrap these in our own loop so we get the right ID values.
    $ids = [];
    $now = time();
    $transaction = $this->connection->startTransaction();
    try {
      foreach ($items as $data) {
        $id = $this->connection->insert(static::TABLE_NAME)
          ->fields([
            'data' => serialize($data),
            'created' => $now,
          ])->execute();

        if (empty($id)) {
          // This will trigger a rollback of the transaction.
          throw new \Exception('Inserted item did not return an ID');
        }

        $ids[] = (int) $id;
      }
    }
    catch (\Exception $e) {
      $transaction->rollBack();
      return FALSE;
    }

    return $ids;
  }

and If it's a performance problem, we can also think about putting the whole thing in a different queue.

davidwbarratt’s picture

I accidentally created a duplicate of this issue at #3469458: Late Runtime Processor purges incorrect tags. This problem seems exceptionally bad with SQLite.

davidwbarratt’s picture

I tested the patch and it does not fix the issue I had in #3469458: Late Runtime Processor purges incorrect tags so I'm re-opening it.

avpaderno’s picture

@davidwbarratt If the patch does not fix the bug described by this issue, the status should be changed.

avpaderno’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs reroll

Let's create a merge request, now that patches are no longer tested.

davidwbarratt’s picture

To clarify, the patch clearly fixes a problem, but I'm not sure if that's the problem I'm having in #3469458: Late Runtime Processor purges incorrect tags or if that's the same problem described in this issue. From the comment thread, it looks like the patch does fix the issue for some folks so it's possible there is more than one problem and the patch isn't a comprehensive solution.

bkline’s picture

@davidwbarratt

This problem seems exceptionally bad with SQLite.

That would seem to imply that you are able to reproduce the unwanted behavior using this patch with other databases than SQLite (just not as frequently). Can you confirm that this is true?

davidwbarratt’s picture

I added repo steps to #3469458: Late Runtime Processor purges incorrect tags it happens every time I use Late Runtime Processor, if I use the drush or cron processor the issue goes away.

I didn't test it on any other database, but I am using SQLite, I assumed since the problem happens consistantly that that could be a clue, but it might not be.

I figured this out because I was writting a Cloudflare Worker that receives the purge requests and I noticed it was missing a tag. Upon further investigation, the tag isn't being sent.

davidwbarratt’s picture

I did try switching from mod_php to php-fpm. I thought it might since the former holds on to the response and perhaps I was hitting a timeout or something, but it didn't make a difference.

davidwbarratt’s picture

I have fixed it!

[DEBUG] [purge] [2024-08-23T22:32:11] queue: claimed 3 items: node_list, node_list:article, node:49 | uid: 1 | request-uri: http://localhost:8888/node/49/edit?destination=%2Fadmin%2Fcontent | refer: http://localhost:8888/node/49/edit?destination=/admin/content | ip:  127.0.0.1 | link:
[DEBUG] [purge] [2024-08-23T22:32:11] purger_cloudflare_worker_0b7da3a883: Received Purge Request for node_list, node_list:article, node:49 | uid: 1 | request-uri: http://localhost:8888/node/49/edit?destination=%2Fadmin%2Fcontent | refer: http://localhost:8888/node/49/edit?destination=/admin/content | ip:  127.0.0.1 | link:
[DEBUG] [purge] [2024-08-23T22:32:11] purger_cloudflare_worker_0b7da3a883: Executing purge request for node_list, node_list:article, node:49 | uid: 1 | request-uri: http://localhost:8888/node/49/edit?destination=%2Fadmin%2Fcontent | refer: http://localhost:8888/node/49/edit?destination=/admin/content | ip:  127.0.0.1 | link:

The code basically relies on the indexing of the array to always match. Instead of making that assumption, we'll just use whatever is passed in and ensure the indexing matches up.

davidwbarratt’s picture

Status: Needs work » Needs review

davidwbarratt changed the visibility of the branch 3094343-queue-confusion-on-replicated-databases to hidden.

davidwbarratt changed the visibility of the branch 3094343-queue-confusion-on-replicated-databases to active.

godotislate made their first commit to this issue’s fork.

godotislate’s picture

Issue tags: -Needs reroll +Needs tests

The code basically relies on the indexing of the array to always match. Instead of making that assumption, we'll just use whatever is passed in and ensure the indexing matches up.

While it's certainly safer to ensure the keys align, looking at the code in QueueService::commitAdding(), I can't see how the keys would be anything other than sequential 0-indexed, since the array_chunk() call does not preserve keys. Nevertheless, if it indeed addresses an issue, then the keys need to be preserved in all implementations of QueueInterface::createItemMultiple(), so I've made the respective changes to MemoryQueue and QueueBase.

I looked at how to write some tests for this, but I'm not sure it's possible to change the autoincrement setting of the database in tests, and otherwise mocking the database service is a huge lift. Leaving at Needs tests for now.

godotislate’s picture

One note about this solution is that since one multi-value insert query is replaced by multiple insert queries, the performance gain from creating multiple at once is lost. I don't have an alternate solution of how to address this. I thought about doing a select query to get the last count(items) right after the insert query, but to do that, need to be sure that there were no other insertions. Which would mean using a lock and makes things much more complicated.