Problem/Motivation

APCu writes need an internal write lock, which is slow and can bring down a site that is mistakenly writing too much to APCu. (This is more unlikely now that the change making cache tag clearing invalidate ChainedFast has landed.)

However, APCu has an atomic compare-and-swap operation that only needs a reader lock, which is WAY faster because it does not lock the whole list, only the individual item.

But in ChainedFast, 99% of the time we write the same item again; only the expiration timestamp has changed.

Steps to reproduce

- Make backend "dirty" all the time
- Loadtest umami
- See that apcu is becoming a bottleneck, because writes are piling up

Proposed resolution

- Instead of writing items again and again (which are 99% the same), we just update the timestamp
- And instead of writing to the same cache key again and again, we use a cheap CAS-lock to ensure it's only written once

I wrote some rough code years ago, which I am hereby sharing with the community.

Unfortunately, the backend needs to be aware that it is used from a chained fast backend.

Here is the code for that backend (https://gist.github.com/LionsAd/25a60c064199a1c87e000cb7a2a4a73c):

<?php

namespace Drupal\Core\Cache;

use Drupal\Component\Assertion\Inspector;

/**
 * Stores cache items in the Alternative PHP Cache User Cache (APCu).
 */
class CFAwareApcuBackend extends ApcuBackend {

  /**
   * The base time to use.
   *
   * @var int
   */
  protected $baseTime;

  /**
   * {@inheritdoc}
   */
  public function __construct($bin, $site_prefix, CacheTagsChecksumInterface $checksum_provider) {
    parent::__construct($bin, $site_prefix, $checksum_provider);
    $this->baseTime = strtotime('midnight first day of this month') + 2;
  }

  /**
   * {@inheritdoc}
   */
  public function get($cid, $allow_invalid = FALSE) {
    $cids = [$cid];
    $data = $this->getMultiple($cids, $allow_invalid);
    return !empty($data[$cid]) ? $data[$cid] : FALSE;
  }

  /**
   * {@inheritdoc}
   */
  public function getMultiple(&$cids, $allow_invalid = FALSE) {

    // Translate the requested cache item IDs to APCu keys.
    $map = [];
    $extra_map = [];

    foreach ($cids as $cid) {
      $key = $this->getApcuKey($cid);
      $map[$key] = $cid;

      $created_key = $key . ':apcu_created_time';
      $lock_key = $key . ':apcu_update_lock';

      $extra_map[$created_key] = $key;
      $extra_map[$lock_key] = $key;
    }

    $result = apcu_fetch(array_keys($map + $extra_map));
    $cache = [];
    if ($result) {
      foreach ($result as $key => $item) {
        // Ignore extra keys.
        if (isset($extra_map[$key])) {
          continue;
        }

        // Prepare item.
        $item = $this->prepareItem($item, $allow_invalid);
        if (!$item) {
          continue;
        }

        // Store APCu data in the item.
        $item->apcu_data = [];

        // Overwrite created value from APCu.
        $created_key = $key . ':apcu_created_time';

        $item->created = 0;
        if (isset($result[$created_key])) {
          $item->apcu_data[$created_key] = $result[$created_key];
          $item->created = $result[$created_key] / 1000 + (float) $this->baseTime;
        }

        // Store lock data values from APCu in the item.
        $lock_key = $key . ':apcu_update_lock';

        if (isset($result[$lock_key])) {
          $item->apcu_data[$lock_key] = $result[$lock_key];
        }

        $cache[$map[$key]] = $item;
      }
    }
    unset($result);

    $cids = array_diff($cids, array_keys($cache));
    return $cache;
  }

  /**
   * {@inheritdoc}
   */
  public function set($cid, $data, $expire = CacheBackendInterface::CACHE_PERMANENT, array $tags = [], $items = []) {
    assert(Inspector::assertAllStrings($tags), 'Cache tags must be strings.');

    $created_microtime = round(microtime(TRUE), 3);
    $created = (int) (($created_microtime - $this->baseTime) * 1000);

    // Fetch data, write lock and created values.
    $key = $this->getApcuKey($cid);
    $created_key = $key . ':apcu_created_time';
    $lock_key = $key . ':apcu_update_lock';

    // Do we have a valid old item?
    $old_item = FALSE;
    if (isset($items[$cid]) && $items[$cid]->valid) {
      $old_item = $items[$cid];
      $old_result = $old_item->apcu_data;
    }
    else {
      $old_result = apcu_fetch([$key, $created_key, $lock_key]);
      if (isset($old_result[$key])) {
        $old_item = $this->prepareItem($old_result[$key], FALSE);
      }
    }

    // Ensure created key exists.
    $old_created = isset($old_result[$created_key]) ? $old_result[$created_key] : 0;
    if (!isset($old_result[$created_key])) {
      apcu_add($created_key, 0);
    }

    // Is the old item data still valid?
    if ($old_item && is_string($old_item->data) && $old_item->data === $data) {
      // Just update the created timestamp.
      apcu_cas($created_key, $old_created, $created);
      return;
    }

    // Ensure lock key exists.
    $old_lock_value = isset($old_result[$lock_key]) ? $old_result[$lock_key] : 0;
    if (!isset($old_result[$lock_key])) {
      apcu_add($lock_key, 0);
    }

    // See if we win the race to update the item.
    $lock_value = mt_rand(0, 2147483647);
    if (apcu_cas($lock_key, $old_lock_value, $lock_value)) {
      parent::set($cid, $data, $expire, $tags);

      // If this fails, someone else has updated the item already.
      apcu_cas($created_key, $old_created, $created);
    }
  }

  /**
   * {@inheritdoc}
   */
  public function setMultiple(array $items = []) {
    foreach ($items as $cid => $item) {
      $this->set($cid, $item['data'], isset($item['expire']) ? $item['expire'] : CacheBackendInterface::CACHE_PERMANENT, isset($item['tags']) ? $item['tags'] : []);
    }
  }

  /**
   * {@inheritdoc}
   */
  protected function getIterator($search = NULL, $format = APC_ITER_ALL, $chunk_size = 100, $list = APC_LIST_ACTIVE) {
    throw new \Exception("This should not be called as it can deadlock the APCu cache.");
  }

}

And here is the hacked patch for the ChainedFast backend (https://gist.github.com/LionsAd/60b2fa2fcc602405010c1b075b1bb41a):

diff --git a/web/core/lib/Drupal/Core/Cache/ApcuBackendFactory.php b/web/core/lib/Drupal/Core/Cache/ApcuBackendFactory.php
--- a/web/core/lib/Drupal/Core/Cache/ApcuBackendFactory.php
+++ b/web/core/lib/Drupal/Core/Cache/ApcuBackendFactory.php
@@ -46,6 +46,9 @@ public function __construct($root, $site_path, CacheTagsChecksumInterface $check
     else {
       $this->backendClass = 'Drupal\Core\Cache\Apcu4Backend';
     }
+
+    $this->backendClass = 'Drupal\Core\Cache\CFAwareApcuBackend';
   }
 
   /**
diff --git a/web/core/lib/Drupal/Core/Cache/ChainedFastBackend.php b/web/core/lib/Drupal/Core/Cache/ChainedFastBackend.php
--- a/web/core/lib/Drupal/Core/Cache/ChainedFastBackend.php
+++ b/web/core/lib/Drupal/Core/Cache/ChainedFastBackend.php
@@ -166,7 +166,10 @@ public function getMultiple(&$cids, $allow_invalid = FALSE) {
         $cache[$item->cid] = $item;
         // Don't write the cache tags to the fast backend as any cache tag
         // invalidation results in an invalidation of the whole fast backend.
-        $this->fastBackend->set($item->cid, $item->data, $item->expire);
+        if (!$allow_invalid || $item->valid) {
+          // @todo Add interface check.
+          $this->fastBackend->set($item->cid, $item->data, $item->expire, [], $items);
+        }
       }
     }

The main difference is that the fast backend now gets the context of all the $items being stored, so it can look up an existing item in $items. That is why set() gets an additional argument in this interface.

It needs the $items (or the original $item) to look at the apcu_data: when the data came from the ChainedFastBackend in the first place, this avoids the read lookup, because only the new timestamp needs to be compared.

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Comments

Fabianx created an issue. See original summary.

fabianx’s picture

Issue summary: View changes
fabianx’s picture

Let me quickly write up the idea behind the code:

When a cache entry is retrieved from the primary cache backend and has not changed in the fast backend, all we do is write the invalidation timestamp again and again.

We might also have several processes writing the same item at once - again the same data, except for the timestamp. Where before the conflict was resolved naturally by "last writer wins", it is now resolved using an APCu lock.

This lock is the second part of the patch, but should probably be a follow-up as it makes an already non-trivial architecture harder to understand.

Given these properties we can split up each cache item into two parts:

- The data to be written
- The timestamp of the cache item

By splitting it up, we can "store" the new timestamp using a compare-and-swap operation without having to write the whole item.
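The split can be sketched with a plain-PHP stand-in for APCu (an array plus a cas() helper that mimics apcu_cas() semantics; the key names mirror the patch, everything else is illustrative):

```php
<?php

// In-memory stand-in for APCu, to illustrate the split only; the key
// names mirror the patch but the values are made up.
$apcu = [
  'cid:item' => 'serialized cache data',  // payload, written rarely
  'cid:item:apcu_created_time' => 1000,   // timestamp, CASed often
];

// Mimics apcu_cas() semantics: swap only if the stored value still
// matches $old. In APCu this is atomic and only needs a reader lock.
function cas(array &$store, string $key, int $old, int $new): bool {
  if (($store[$key] ?? NULL) === $old) {
    $store[$key] = $new;
    return TRUE;
  }
  return FALSE;
}

// set() with unchanged data: refresh only the timestamp, never rewrite
// the payload, so the heavy writer lock is avoided entirely.
$old_created = $apcu['cid:item:apcu_created_time'];
$refreshed = cas($apcu, 'cid:item:apcu_created_time', $old_created, 2000);

// get(): fetch both keys and overwrite the item's created time.
$created = $apcu['cid:item:apcu_created_time'];
```

The payload key is untouched by the refresh; only the small timestamp key changes.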

So what we do on a set() from the chained fast backend:

- Retrieve items from APCu: [$old_cache_item, $timestamp] = apcu_fetch(...)
- Compare whether $old_cache_item->data === $data [and probably the same for tags now]
- If it is not the same: write $cache_item normally [todo: locking code for sets]
- Compare-and-swap the new timestamp in, because we know the data is correct right now

And on a get():

- [$cache_item, $timestamp] = apcu_fetch(...)
- Overwrite the timestamp in $cache_item with the $timestamp from APCu - unless it is 0.

fabianx’s picture

An optional optimization that the IS patch also made:

- Because we already retrieved the items from the fast backend, we can continue to make use of them.

There is no need to re-fetch the timestamp and cache item when we have just done so.

This is why ChainedFastBackend.php was patched to give ->set() an additional argument.

It would probably be better to make this explicit with a separate set() method for items that have just been retrieved from the cache - if we wanted to do that.

fabianx’s picture

Lastly, to explain the lock:

The lock ensures that only one process at a time ever writes to the same cache key.

This, again, can be done independently.

It works like this:

- Retrieve the old value of the lock entry
- Create a random value
- Do a CAS (compare-and-swap) on the old value

If we win the race to swap, then we write the new entry.

Note: Several writes can still happen one after another, but at least not concurrently.

Again, the check whether the data is still the same can help here.
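The race can be simulated with a plain-PHP stand-in for apcu_cas() (the cas() helper and the values are illustrative, not the APCu API):

```php
<?php

// Stand-in for apcu_cas() semantics: sequential here, atomic in APCu.
function cas(int &$lock, int $old, int $new): bool {
  if ($lock === $old) {
    $lock = $new;
    return TRUE;
  }
  return FALSE;
}

// Both "processes" read the same old lock value before the race.
$lock = 0;
$old_seen = $lock;

$winners = 0;
foreach ([111, 222] as $random_value) {
  if (cas($lock, $old_seen, $random_value)) {
    // Only the winner would go on to call parent::set().
    $winners++;
  }
}
// The second CAS fails because the lock already changed to 111.
```

Exactly one caller wins the swap; the loser simply skips the write, which is acceptable here because the losing process was about to write the same data anyway.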

fabianx’s picture

Here is the code for set() using just the lock mechanism:

public function set($cid, $data, $expire = CacheBackendInterface::CACHE_PERMANENT, array $tags = []) {
  $key = $this->getApcuKey($cid);
  $lock_key = $key . ':apcu_update_lock';

  $results = apcu_fetch([$lock_key]);
  $old_lock_value = $results[$lock_key] ?? FALSE;

  // Ensure lock key exists.
  if ($old_lock_value === FALSE) {
    apcu_add($lock_key, 0);
    $old_lock_value = 0;
  }

  $lock_value = mt_rand(0, 2147483647);
  if (apcu_cas($lock_key, $old_lock_value, $lock_value)) {
    parent::set($cid, $data, $expire, $tags);
  }
}

This brings to mind the question of why APCu itself could not protect its apcu_store() mechanism in this way.

But it would take quite some time to get this into APCu, so it is still useful here.

fabianx’s picture

Here is another optimization of it:


public function set($cid, $data, $expire = CacheBackendInterface::CACHE_PERMANENT, array $tags = []) {
  $key = $this->getApcuKey($cid);
  $lock_key = $key . ':apcu_update_lock';

  $results = apcu_fetch([$lock_key]);
  $old_lock_value = $results[$lock_key] ?? FALSE;

  // Ensure lock key value exists.
  if ($old_lock_value === FALSE) {
    // Initialize the lock with a random value using apcu_add().
    $old_lock_value = mt_rand(0, 2147483647);
    apcu_add($lock_key, $old_lock_value);
  }

  $lock_value = mt_rand(0, 2147483647);
  if (apcu_cas($lock_key, $old_lock_value, $lock_value)) {
    parent::set($cid, $data, $expire, $tags);
  }
}

Here is the ChatGPT analysis of this function:

You’ve highlighted an important aspect of how these APCu functions interact with the underlying data store, and you are correct in your understanding of the different lock mechanisms they employ.

### apcu_cas()

- `apcu_cas()` operates with a lighter locking mechanism.
- It only requires a reader lock on the entry.
- It performs a compare-and-swap (CAS) operation, which is atomic.
- This means that it can check the current value and, if it matches the expected value, update it in a single atomic operation.
- Because it only needs a reader lock, it doesn't block other reads from happening simultaneously.

### apcu_store()

- On the other hand, `apcu_store()` requires a writer lock.
- A writer lock is more restrictive; it blocks both other writers and readers until the operation is complete.
- This can lead to longer wait times and potential bottlenecks, especially under high load or when dealing with large data sets.

### Why Your Approach Is Beneficial

- By initializing the lock with `apcu_add()` (which needs the same writer lock as `apcu_store()`, but is only called once at the start), and then using `apcu_cas()` for subsequent lock acquisitions, you ensure that the heavier writer lock is acquired less frequently.
- Most of the synchronization relies on `apcu_cas()`, which uses the lighter reader lock, resulting in less contention and better performance.
- This is especially beneficial in a high-concurrency environment, where many processes are trying to acquire the lock simultaneously.

By understanding and leveraging the underlying mechanisms of these APCu functions, your approach efficiently reduces lock contention, leading to better performance and scalability. This kind of optimization is crucial in performance-critical applications and high-load scenarios.

As background reading.

fabianx’s picture

I think this is independently useful as it replaces one write in 99% of concurrent cases with two reads and just one non-concurrent write plus one additional cache key.

wim leers’s picture

Assigned: Unassigned » kristiaanvandeneynde
Priority: Normal » Major

Catching up on long-forgotten tabs … from last year’s European DrupalCon just before this year’s starts 😄

I think Kristiaan’s work on VariationCache has given him the necessary deep insight in this area to assess the impact of Fabian’s proposal. 😊🤞

wim leers’s picture

Status: Active » Needs review

Oops!

catch’s picture

Status: Needs review » Needs work

No MR here but since there is code, splitting the difference at needs work.

kristiaanvandeneynde’s picture

Assigned: kristiaanvandeneynde » Unassigned

Okay so I've just read up on this issue and I think I understand the problem space and proposed solution.

While I'm on board with the idea behind the proposed change, a few questions arise:

  1. Can changing the way we store things be considered a BC break? I.e.: Could people, for some reason, be relying on the current behavior and have targeted workarounds in place either in code or at the system level? If so, would this change break that?
  2. Do we have a way to properly test this? I currently only see unit tests for ChainedFastBackend using the PhpBackend and kernel tests using the MemoryBackend. For ApcuBackend we only have unit tests. The approach suggested here seems to be specific to APCu and I'm not sure we have sufficient testing to ensure everything remains working as is.
  3. Can we performance test this? The IS mentions Umami and we do have performance tests for Umami, but are they tailored to what's suggested here? If not, what would a new test case look like, keeping in mind PerformanceTestBase is a FunctionalJavascriptTest.

Having said all of the above, I do not see how this relates to VariationCache whatsoever, so I am unassigning myself from the issue :) As a subsystem maintainer, I am keeping an eye on this, however.

catch’s picture

Can changing the way we store things be considered a BC break? I.e.: Could people, for some reason, be relying on the current behavior and have targeted workarounds in place either in code or at the system level? If so, would this change break that?

We should only make this change in a minor release, but otherwise I think it's completely fine - we never guarantee database schema, much less for things like the cache API.

Can we performance test this? The IS mentions Umami and we do have performance tests for Umami, but are they tailored to what's suggested here? If not, what would a new test case look like, keeping in mind PerformanceTestBase is a FunctionalJavascriptTest.

I don't think this is possible unless/until we add instrumentation of specific PHP functions to performance testing (which is theoretically possible with the opentelemetry php extension) - then we'd be able to count function calls in requests and things like that.

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.