Problem/Motivation

After #3498193: Optimize last_write_timestamp writes in ChainedFastBackend we write the last_write_timestamp in chained fast backend 50ms ahead when the slower cache backend is written to. This allows us to avoid writing the timestamp on every single cache write (which could be thousands of times per second on cold starts - think every field storage + field config entity etc.).

However, we still fetch items from the fast backend even if there's zero chance they're going to be accepted, and we also write items back to the fast backend even if their created time is going to be older than the $last_write_timestamp.

Since this case will by definition happen when there's a cache stampede, it could be one of the causes of APCu locks as described in #3396880: Enhancing APCu Lock Efficiency (for ChainedFast for High-Concurrency Cache Invalidation). The previous MR doesn't make a previous situation worse, but it allows us to do this issue and hopefully make it better.

The combination of changes here result in over 1000 less writes to APCu on a cold cache request to Umami, and even second or third requests result in about 1-200 cache items being written, showing that APCu gets used for 'hot path' cache items, rather than everything that might be written to on a cold cache. This should considerably reduce write contention as well as potentially reducing APCu cache size.

Steps to reproduce

Proposed resolution

In ChainedFastBackend::getMultiple(), don't even bother looking up items from the fast backend or writing back to it, if $last_write_timestamp is >= now.

However also know that if the last_write_timestamp has just been written, it's likely to be written again (the next cache set to the consistent backend), so change the grace period when updating $last_write_timestamp from 50 milliseconds to 1 second. This means for the first second after a consistent backend write, all reads and writes skip the fast backend, but as soon as that second has passed, the fast backend will take over.

Relatedly, we currently always write through to the fast backend on ::set and ::setMultiple(), but again - we know that in cold cache situations, there could be thousands of sets, and every one of those sets invalidates the fast backend again, so you can end up writing to APCu 1000 times to get a single valid cache item in there. The MR just removes the through writes and adds a comment.

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

CommentFileSizeAuthor
#10 set-1-second.txt5.87 KBcatch
#9 set-after.txt38.95 KBcatch
#9 set-before.txt49.39 KBcatch

Issue fork drupal-3526080

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

catch created an issue. See original summary.

catch’s picture

Status: Active » Needs work
Issue tags: +Performance

Pushed what I think we need.

Didn't implement the 'additional grace period' because I'm not sure that's what we want. If there are lots and lots of cache sets/invalidations happening, then the $last_write_timestamp will keep being set ahead, and as soon as it's not, that's when we want to write to the fast backend, so what's here might be enough.

However it would be easy to add that additional grace period - we just need to subtract 1 second (or 500ms or whatever) from $compare.

catch’s picture

Status: Needs work » Needs review
Issue tags: +needs profiling

Moving to needs review.

Adding needs profiling because I think that running xhprof on a page request immediately after a cache clear should hopefully show a clear before/after here - e.g. there should be dozens or hundreds less apcu reads and writes but almost no other changes. And then after refreshing that page a couple of times, everything should be the same with/without the MR applied.

catch’s picture

Issue summary: View changes
fabianx’s picture

Looks great to me!

catch’s picture

Issue tags: -needs profiling

Removing the needs profiling tag because by happy accident this made existing unit tests fail, which gave me an idea how we should unit test this - unit test proves that this skips the fast backend when the timestamp is in the future.

However it would probably still be nice to profile it to actually show the real difference in apcu reads/writes.

catch’s picture

There's another potential benefit here although it will depend on what actually happens in real-life situations.

Field config entities (for example) aren't usually requested at all when caches are warm, they're used to build things like the views data cache, which then gets cached separately. See #3380145: ViewsData should not cache by language and #3454277: Allow fields to be opted out of views data for some background.

If those higher level caches are built and written during the 'grace period' added here, we won't write field config entities to the fast backend during that time, and then by the time we're out of the grace period, the higher level caches might be all built, which means we don't need to get those lower level cache items anyway, so we don't write them back to the fast backend even after the grace period.

So instead of writing the same amount of data to APCu, but 50ms later and less times, it may result in writing less data to APCu overall.

Might be able to verify this by logging fast backend cache ID writes to a file and comparing after a few page requests with and without the MR.

catch’s picture

StatusFileSize
new49.39 KB
new38.95 KB

I ran the following test:

Applied this patch to core:

diff --git a/core/lib/Drupal/Core/Cache/ApcuBackend.php b/core/lib/Drupal/Core/Cache/ApcuBackend.php
index bf5e4d98ca9..39b154255c5 100644
--- a/core/lib/Drupal/Core/Cache/ApcuBackend.php
+++ b/core/lib/Drupal/Core/Cache/ApcuBackend.php
@@ -183,6 +183,7 @@ protected function prepareItem($cache, $allow_invalid) {
    * {@inheritdoc}
    */
   public function set($cid, $data, $expire = CacheBackendInterface::CACHE_PERMANENT, array $tags = []) {
+    file_put_contents('/tmp/set.txt', $cid . "\n", FILE_APPEND);
     assert(Inspector::assertAllStrings($tags), 'Cache tags must be strings.');
     $tags = array_unique($tags);
     $cache = new \stdClass();
drush cr
rm /tmp/set.txt

This clears caches, but then clears out any apcu sets from the command line (e.g. via router rebuilds), so we're only seeing what happens with the first full page load. I could have checked PHP_SAPI instead of requiring the rm set.txt step but it was too late before I thought about that.

Then visited the front page.

Repeated this with and without the MR.

cat set-before.txt | wc -l
1236
cat set-after.txt | wc -l
889

So that's apcu sets reduced by around 1/3rd on the first page load after a drush cr.

This makes me think we should try a longer grace period, so will have a look at that next.

catch’s picture

Priority: Normal » Major
Issue summary: View changes
StatusFileSize
new5.87 KB

Tried a longer grace period and found more things.

First of all, I realised that in set/setMultiple we always unconditionally write through to the chained fast backend. The idea here was to avoid a consistent cache lookup on the next request (especially in single server setups), but this is the cause of hundreds of writes to the fast backend, which may or may not then be used on the next request anyway - e.g. any low level cache items underneath a layered cache (views data discovery, some plugins derivers) might not be requested on the next request. Also the next write to the consistent backend will invalidate the previous one, so it's 'last one wins' which means hundreds of writes to save one cache lookup on a cold cache, and at maximum one consistent lookup saved on a warmer cache. So I removed those two write-throughs.

This took 889 apcu writes down to just over 120.

I then added the 1 second grace period, and this reduced 120 acpu writes down to 2 or 3.

I then did a further test where I reloaded the Umami front page again after the first couple of steps, and then also clicked over to the 'recipes' page, to see what would happen with a couple more requests, and I got a total of 199 writes. I'm uploading that file - it's not an apples to apples comparison with the previous tests, but it shows that we have less writes even after more activity.

There is probably a bit of discussion to be had about what an ideal grace period should be, but tbh 1 second seems about right, it's usually enough time for discovery caches to be rebuilt, and it's discovery caches (views data, block plugins) are the ones that can result in lots and lots of small config entity and similar cache misses. It's also short enough that the fast backend will start to take load of the consistent backend pretty quickly still - even if we do more writes, if it keeps getting invalidated, the extra fast backend traffic doesn't result in less load.

catch’s picture

Issue summary: View changes

Added unit test coverage for the ::set issue.

catch’s picture

Title: Add a grace period to ChainedFastBackend when last_write_timestamp is ahead » Reduce write contention to the fast backend in ChainedFastBackend
catch’s picture

Issue summary: View changes
kristiaanvandeneynde’s picture

Gave this a review, generally like the idea and it seems promising in reducing a cache stampede's impact. However, I have some questions about whether we are profiling all the pieces involved in the process.

Edit: Mixed up FastBackendChain with ChainedBackend in my brain below.

If I understand this correctly, we are completely eliminating fast backend lookups and writes for a whole second whenever anything is written to the persistent backend. Won't that simply move the stampede from the fast backend to the persistent backend?

We know that ChainedBackend is used for cache items that are expensive and/or requested frequently during a request. So turning off for a whole second the very thing why people use ChainedBackend seems a bit... odd?

If we are talking about higher and lower level caches being warmed, wouldn't it make more sense for the lower level caches to not use ChainedBackend if we're going to be storing calculations derived from the lower level data in a higher level cache? Talking about your example from #8.

Either way, not sure about this.

kristiaanvandeneynde’s picture

Edit: Mixed up FastBackendChain with ChainedBackend in my brain below.

It seems to me that we are limited by the $lastWriteTimestamp being shared regardless of CID. Meaning if we write 5 different items with a small delay between them, entries one through four will be considered invalid because of the fifth item being written. Why?

From the docs:

 * Because this backend will mark all the cache entries in a bin as out-dated
 * for each write to a bin, it is best suited to bins with fewer changes.

Isn't that bit worth reconsidering instead? If I have an expensive item I know will be requested multiple times, why would I accept it not being found in the cache because another item in the same bin so happened to got written just a few ms before my retrieval attempt? Seems like keeping a last write per CID could also go a long way here.

kristiaanvandeneynde’s picture

Also if we are to go ahead with the current approach, the class docs need updating. This bit won't hold true anymore:

 * (chained) cache backend. Those cache entries that were created before the
 * last write are discarded, but we use their cache IDs to then read them from
 * the consistent (slower) cache backend instead; at the same time we update
 * the fast cache backend so that the next read will hit the faster backend
 * again. Hence we can guarantee that the cache entries we return are all
 * up-to-date, and maximally exploit the faster cache backend.
catch’s picture

I don't think we have chained fast for items that are requested multiple times per page, for that we have memory/static caching.

The reason is that we have a couple of dozen cache items that are requested on almost any request, especially state, block config for the current theme etc. and we can offload all of those from the database (or redis) for the majority of requests.

Last write per cid would mean as many persistent cache gets as there are fast backend gets and undermine the database offload entirely.

One thing I wondered about was whether we could have the fast backend store a random key to identify the local cache, and add that key to items in the persistent backend, and then only use last_write_timestamp when it doesn't match. This would help on single web head setups a lot, and maybe a bit with more web heads too. It would mean stuffing it into the persistent item transparently (like nesting the actual data in an array with the hash.

kristiaanvandeneynde’s picture

Yeah for a second there I confused ChainedFastBackend with BackendChain where we combine a memory and database cache. That is often what we use for items that are requested multiple times per page. Okay let me revisit what I wrote and see which bits still make sense.

kristiaanvandeneynde’s picture

Okay, after resolving the mix-up in my brain between ChainedFastBackend and BackendChain I actually really like this MR now. I still have some notes about spelling and docs in general, but the code looks good to me.

On a cold start we don't want several processes trying to write the same thing to APCu (or other software) thousands of times, so the 50ms (now 1s) locking mechanism prevents that. We still are able to retrieve the calculated thing from the DB cache, so it's not that bad. Over time, we will populate the fast backend properly and then we get the full benefit of it. But at least we won't be choking it when on cold caches.

The only thing I could still see is that both the 50ms and 1s window seem arbitrary. But something is definitely better than nothing here, so I suppose that's fine.

Other than the comments on the docs, I don't have any objections here. Seems like a great addition.

One thing I wondered about was whether we could have the fast backend store a random key to identify the local cache, and add that key to items in the persistent backend, and then only use last_write_timestamp when it doesn't match. This would help on single web head setups a lot, and maybe a bit with more web heads too. It would mean stuffing it into the persistent item transparently (like nesting the actual data in an array with the hash.

So each item in the persistent cache would be flagged with a key like so: ['some_key' => 'abcdef...', 'data' => $the_actual_cache_data] and upon retrieval we convert that back to $the_actual_cache_data?

But to what purpose? I'm not sure I can follow what you mean with "to identify the local cache" here.

nicxvan’s picture

I have a client struggling with performance on dev, I've backported this to 10.4 and deployed it and it seems to have helped after the first couple of page views.

smustgrave’s picture

Any recommendation for testing or reviewing this one?

catch’s picture

You could try to reproduce the steps in #9. The exact numbers will differ every request, but before/after applying the MR should show a big difference.

#3526080-9: Reduce write contention to the fast and consistent backend in ChainedFastBackend

berdir’s picture

Status: Needs review » Needs work

This looks great and all changes makes sense to me. Just one nitpick on a comment and how we refer to the interface for setting up mock objects.

catch’s picture

Status: Needs work » Needs review

Couple of dodgy looking test failures here, which could be new random failures, but could be unlucky:

https://git.drupalcode.org/issue/drupal-3526080/-/jobs/7370709

https://git.drupalcode.org/issue/drupal-3526080/-/jobs/7370711

catch’s picture

Re-running on gitlab CI the standard recipe test fail moved to a different place.

I have a hunch this might be #2066993: Use magic methods to sync container property to \Drupal::getContainer in functional tests, which just became committable. After a rebase, the MR was green first time. Running it a second time just in case to flush anything out.

edit: all green twice. Not perfectly scientific but I think that was probably it.

berdir’s picture

Status: Needs review » Reviewed & tested by the community

The semi-random test fails are a bit concerning, we did have two green runs since the rebase. Can't really explain why this would case them, only idea I have is that by making rebuilds faster that we're more likely to hit race conditions?

All the changes make sense to me, docs are clear now as well. It's not visible in the performance tests, since it's internal to the cache backend.

kristiaanvandeneynde’s picture

Had another look and comments and tests look clear to me now. +1

catch’s picture

Can't really explain why this would case them, only idea I have is that by making rebuilds faster that we're more likely to hit race conditions?

Yes that's my only idea too - the MR means we treat the fast backend as invalid more than we do in HEAD (on the basis that it's going to be invalid anyway for a lot of those reads, to save writes to the consistent backend), so I don't see how it could actually introduce a cache mis-match, but it can make an existing race condition more likely by making cold cache requests faster.

We do have #3496257: Race conditions in CacheCollector/State (again) which I got very stuck on, but don't think the failures we saw are related to state.

catch’s picture

Title: Reduce write contention to the fast backend in ChainedFastBackend » Reduce write contention to the fast and consistent backend in ChainedFastBackend
Issue tags: +11.3.0 release highlights

Because we track the last write timestamp in the consistent backend, this reduces contention in both - updating the title. Also adding a hopeful tag - can go with other stuff in the performance improvements section.

alexpott’s picture

Version: 11.x-dev » 11.3.x-dev
Status: Reviewed & tested by the community » Fixed

Committed and pushed e4311e61d54 to 11.x and c040ba1dd87 to 11.3.x. Thanks!

Fingers-crossed @berdir's concern in #26 doesn't cause issues.

Now that this issue is closed, review the contribution record.

As a contributor, attribute any organization that helped you, or if you volunteered your own time.

Maintainers, credit people who helped resolve this issue.

  • alexpott committed c040ba1d on 11.3.x
    fix: #3526080 Reduce write contention to the fast and consistent backend...

  • alexpott committed e4311e61 on 11.x
    fix: #3526080 Reduce write contention to the fast and consistent backend...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.