Problem/Motivation

09:54:39 WimLeers: catch: Fabianx-screen: damiankloip: dawehner: alexpott: I just got a tell from lussoluca, with an interesting question:
09:54:46 WimLeers: Druplicon: WimLeers: 14 hours 55 min ago <lussoluca> tell wimleers why when ChainedFastBackend::getMultiple writes to fastBackend doesn’t set exipires and tags?
09:55:17 dawehner: probably because we have been too lazy to write proper unit test coverage
09:55:20 WimLeers: I looked at the issue where ChainedFastBackend was introduced (https://www.drupal.org/node/2248767) and neither was not discussed/mentioned a single time
09:55:24 Druplicon: https://www.drupal.org/node/2248767 => Use fast, local cache back-end (APCu, if available) for low-write caches (bootstrap, discovery, and config) #2248767: Use fast, local cache back-end (APCu, if available) for low-write caches (bootstrap, discovery, and config) => 128 comments, 22 IRC mentions
09:55:50 WimLeers: dawehner: we have pretty complete generic unit test coverage for cache backends
09:55:58 WimLeers: (GenericCacheBackendUnitTestBase)
09:56:57 dawehner: WimLeers: so just checked it, at least ApcuBackend has support for $expires/$tags
09:57:18 WimLeers: dawehner: yep
09:57:34 dawehner: WimLeers: so yeah it sounds like a clear bug without any intention
09:57:54 WimLeers: My initial thought was (and I still think) that it's like this because of the markAsOutdated()/lastWriteTimeStamp stuff
09:58:00 alexpott: WimLeers: get but the ChainedBackend is more complex than what the GenericCacheBackendUnitTestBase was implemented to test
09:58:18 WimLeers: well, it tests tag deletion/invalidation
09:58:55 WimLeers: so the chained back-end uses two actual back-ends: a fast one and a consistent one
09:59:00 WimLeers: *any* write, *any* invalidation to the chained backend immediately invalidates the entire  fast back-end
09:59:54 WimLeers: But I think it's correct that this is still problematic for the 'expires' part. If no writes happen for a long time, then we'll continue to use expired data
10:00:00 alexpott: WimLeers: I think you are correct - it doesn't matter because of the markAsOutdated()
10:00:11 alexpott: WimLeers: yep
10:00:14 WimLeers: … and we don't have test coverage for 'expires' in GenericCacheBackendUnitTestBase
10:00:21 alexpott: WimLeers: :)
10:00:25 WimLeers: :P
10:00:51 alexpott: it's official... it's a bug!
10:01:05 alexpott: lussoluca++
10:01:09 WimLeers: :)
10:01:21 WimLeers: I'll open an issue, unless you already are doing that

Beta phase evaluation

Reference: https://www.drupal.org/core/beta-changes
Issue priority Critical because a malfunctioning cache system can lead to wrong results, and even potentially information disclosure.
Disruption Zero disruption, straight bugfix.

Proposed resolution

Make the chained back-end set the expire timestamp and the cache tags.

Remaining tasks

  1. Extend GenericCacheBackendUnitTestBase to also test the expire property on cache items. The lack of such a test is what allowed this problem to go unnoticed for a long time in the first place.
  2. Review.

User interface changes

None.

API changes

None.

Comments

wim leers’s picture

Status: Active » Needs review
Issue tags: +Needs tests
StatusFileSize
new753 bytes
berdir’s picture

Interesting.

Expiration yes, I can imagine that could be a problem. But I'm actually not sure about cache tags.

Any set or delete of single items or cache tags results in a complete invalidation of the fast cache, so why bother with checking them?

wim leers’s picture

#2: I came to the exact same conclusion, see the IRC log in the IS:

09:59:00 WimLeers: *any* write, *any* invalidation to the chained backend immediately invalidates the entire  fast back-end
09:59:54 WimLeers: But I think it's correct that this is still problematic for the 'expires' part. If no writes happen for a long time, then we'll continue to use expired data

Glad to see that confirmed :)

berdir’s picture

Yeah, should have read that.

Discussed with @WimLeers, as we apparently do not bother to send invalidated/deleted cache tags to the fast cache backend (which does make sense because we invalidate anyway), we should not save the cache tags, as that will just result in unnecessary processing in AcpuBackend::prepareItem().

But we need to document why.

Status: Needs review » Needs work

The last submitted patch, 1: 2386161-1.patch, failed testing.

wim leers’s picture

Issue summary: View changes
catch’s picture

Yep #3/#4 looks right to me as well, but agreed we need explicit documentation on why it gets treated like this.

lussoluca’s picture

#4: I found this bug when implementing the cache datacollector for the Webprofiler module, If we want Webprofiler to collect and show cache tags we need to save them also to the fast cache backend

Anonymous’s picture

:-(

will look at this today.

Anonymous’s picture

i have no idea why the chained backend tests don't extend GenericCacheBackendUnitTestBase. beejeebus--

olli’s picture

wim leers’s picture

Oh, hah, I hadn't even noticed that!

Anonymous’s picture

Status: Needs work » Needs review
StatusFileSize
new3.96 KB

here's a patch.

berdir’s picture

Issue tags: +Ghent DA sprint
StatusFileSize
new2.35 KB

I think that the changes to ChainedFastBackend::getMultiple() are not necessary. The fast cache backend should already check if an item is expired itself, that is part of the API contract and I don't see why we can't rely on that.

To #8, why do you need the cache tags in the fast cache backend? As mentioned elsewhere, we immediately invalidate the fast cache backend on any cache tag invalidation, so adding them just results in pointless cache tags processing on getMultiple(), as they will never be invalidated...

Here's a patch without the expiration stuff and unified and documented the tags stuff to never set them on the fast backend.

lussoluca’s picture

I need them into the fast cache backend only for profiling purpose, not for cache expiration. If this cause a performance regression I just have to replace the fast cache backend service with one that saves cache tags when Webprofiler is enabled, no worries.

RTBC for me

berdir’s picture

Assigned: Unassigned » berdir

Working on integrating #2304413: Add generic cache backend unit test for ChainedFastBackend and providing test coverage through that, seeing some issues with that...

catch’s picture

Let's open a normal task to discuss tags in the fast backend. I doubt it would be a measurable performance regression (it's supposed to be a fast backend after all), but it's also right in the critical path and we've seen some strange numbers with unserialize() and APC in other issues.

berdir’s picture

Ok, this turned out to be a bit more complicated than I thought bit I finally figured it out, thanks to @alexpott.

I merged in the patch from #2304413: Add generic cache backend unit test for ChainedFastBackend and looked for expiration tests in the generic cache backend, so we have real tests for the expiration, with an actual implementation for the fast backend, to proof what I said above.

Turns out that we *do* have expiration tests ther but it passes already for ChainedFastBackend (as visible in #2304413: Add generic cache backend unit test for ChainedFastBackend) because it only tests for immediate set()/get(), which already works fine, it's only on the write-through after invalidation, that it was not working. So extended the new test there with a specific scenario that writes another cache entry in the middle for the chained fast backend.

That test also currently expects cache tags to be returned, so I reverted that part from my patch for now, we can look into avoiding that in a non-critical issue I think.

When running those tests however, I had weird random fails. Took me a long time to figure out out, but I think I finally have and I had to make the following changes:
- mark as outdated needs to happen *after* we update the persistent cache, so that we can be sure that other clients will get the correct data when they update their fast caches
- The actual problem I think was that that the cache created time is milliseconds and the last write was microseconds, and due to rounding, it sometimes happened that the millisecond was rounded up and created was then after last write in the comparison. So I changed that to round, which then resulted in even more fails is was then always the same millisecond. So the final change is to put the last write timestamp 1 millisecond in the future, so we possibly have to re-fetch a few cache entries from the persistent cache, but we can be sure that there is never wrong data returned.

Also, don't forget to credit @olli for his work in #2304413: Add generic cache backend unit test for ChainedFastBackend.

berdir’s picture

Forgot to re-enable the new test.

wim leers’s picture

Assigned: berdir »

This now has tests, so removing tag.

I think beejeebus needs to review this, so assigning to him.

Anonymous’s picture

Assigned: » Unassigned
berdir’s picture

Issue tags: -Needs tests
wim leers’s picture

- mark as outdated needs to happen *after* we update the persistent cache, so that we can be sure that other clients will get the correct data when they update their fast caches

This is the key change, and the reason I preferred beejeebus to review it.

So this is what

+++ b/core/lib/Drupal/Core/Cache/ChainedFastBackend.php
@@ -171,34 +171,34 @@ public function getMultiple(&$cids, $allow_invalid = FALSE) {
-    $this->markAsOutdated();
     $this->consistentBackend->setMultiple($items);
     $this->fastBackend->setMultiple($items);
+    $this->markAsOutdated();

When we introduced ChainedFastBackend (in #2231595: Add a cache backend that checks an inconsistent cache, then falls back to a consistent cache backend), we argued that we wanted to guarantee that we should mark it as outdated first, to guarantee that other web heads would get the updated data from the consistent back-end ASAP.

Unfortunately, what we missed, was that this makes it so that the other web HEAD might read the outdated data from the consistent back-end, since that hasn't been written to at the time we call markAsOutdated(). Hence this is a sound change.

That test also currently expects cache tags to be returned, so I reverted that part from my patch for now, we can look into avoiding that in a non-critical issue I think.

Sounds good. Opened a major task for that: #2392235: ChainedFastBackend shouldn't write cache tags to the fast cache back-end.


Besides these nitpicks, the patch looks good:

  1. +++ b/core/modules/system/src/Tests/Cache/ChainedFastBackendUnitTest.php
    @@ -0,0 +1,53 @@
    +   * @return
    

    Missing interface.

  2. +++ b/core/modules/system/src/Tests/Cache/ChainedFastBackendUnitTest.php
    @@ -0,0 +1,53 @@
    +  public function testExpirationonWriteThrough() {
    

    I think we should move this into GenericCacheBackendUnitTestBase.

    And s/Expirationon/ExpirationOn/ :)

… except that there is something wrong in the test, because this test-only patch (extracted from #19) is going to be green, whereas it should be red, hence it's not testing the right thing :)

(NR for testing this test-only patch.)

wim leers’s picture

Status: Needs review » Needs work
berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new2.17 KB
new5.94 KB
new2.73 KB

Right, so I was confused.

The test can not work, because the consistent backend will obviously never return an expired item that we could put into the fast storage, so we can not reproduce this in a kernel test, it has to be in the unit test. Which did not actually properly check for this yet, so I removed the custom test method and instead ensured in the unit test that we're calling it as expected.

The last submitted patch, 25: fast-backend-expire-2386161-25-test-only.patch, failed testing.

wim leers’s picture

Status: Needs review » Reviewed & tested by the community
berdir’s picture

Status: Reviewed & tested by the community » Needs review
StatusFileSize
new6.09 KB
new960 bytes

Ah, forgot one thing, we discussed the order of set/markAsOutdated(), and the best approach there is to write to persistent first, then mark as outdated, then update our local fast cache. That is safe and we don't already invalidate or own set.

wim leers’s picture

Status: Needs review » Reviewed & tested by the community

Right!

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Added @olli to the commit credit for the dupe issue. This issue addresses a critical bug and is allowed per https://www.drupal.org/core/beta-changes. Committed 106e3d4 and pushed to 8.0.x. Thanks!

  • alexpott committed 106e3d4 on 8.0.x
    Issue #2386161 by Berdir, Wim Leers, beejeebus, olli: ChainedFastBackend...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.