Problem/Motivation
09:54:39 WimLeers: catch: Fabianx-screen: damiankloip: dawehner: alexpott: I just got a tell from lussoluca, with an interesting question:
09:54:46 WimLeers: Druplicon: WimLeers: 14 hours 55 min ago <lussoluca> tell wimleers why when ChainedFastBackend::getMultiple writes to fastBackend doesn’t set exipires and tags?
09:55:17 dawehner: probably because we have been too lazy to write proper unit test coverage
09:55:20 WimLeers: I looked at the issue where ChainedFastBackend was introduced (https://www.drupal.org/node/2248767) and neither was not discussed/mentioned a single time
09:55:24 Druplicon: https://www.drupal.org/node/2248767 => Use fast, local cache back-end (APCu, if available) for low-write caches (bootstrap, discovery, and config) #2248767: Use fast, local cache back-end (APCu, if available) for low-write caches (bootstrap, discovery, and config) => 128 comments, 22 IRC mentions
09:55:50 WimLeers: dawehner: we have pretty complete generic unit test coverage for cache backends
09:55:58 WimLeers: (GenericCacheBackendUnitTestBase)
09:56:57 dawehner: WimLeers: so just checked it, at least ApcuBackend has support for $expires/$tags
09:57:18 WimLeers: dawehner: yep
09:57:34 dawehner: WimLeers: so yeah it sounds like a clear bug without any intention
09:57:54 WimLeers: My initial thought was (and I still think) that it's like this because of the markAsOutdated()/lastWriteTimeStamp stuff
09:58:00 alexpott: WimLeers: get but the ChainedBackend is more complex than what the GenericCacheBackendUnitTestBase was implemented to test
09:58:18 WimLeers: well, it tests tag deletion/invalidation
09:58:55 WimLeers: so the chained back-end uses two actual back-ends: a fast one and a consistent one
09:59:00 WimLeers: *any* write, *any* invalidation to the chained backend immediately invalidates the entire fast back-end
09:59:54 WimLeers: But I think it's correct that this is still problematic for the 'expires' part. If no writes happen for a long time, then we'll continue to use expired data
10:00:00 alexpott: WimLeers: I think you are correct - it doesn't matter because of the markAsOutdated()
10:00:11 alexpott: WimLeers: yep
10:00:14 WimLeers: … and we don't have test coverage for 'expires' in GenericCacheBackendUnitTestBase
10:00:21 alexpott: WimLeers: :)
10:00:25 WimLeers: :P
10:00:51 alexpott: it's official... it's a bug!
10:01:05 alexpott: lussoluca++
10:01:09 WimLeers: :)
10:01:21 WimLeers: I'll open an issue, unless you already are doing that
Beta phase evaluation
Proposed resolution
Make the chained back-end set the expire timestamp and the cache tags.
Remaining tasks
- Extend
GenericCacheBackendUnitTestBase to also test the expire property on cache items. The lack of such a test is what allowed this problem to go unnoticed for a long time in the first place.
- Review.
User interface changes
None.
API changes
None.
Comments
Comment #1
wim leersComment #2
berdirInteresting.
Expiration yes, I can imagine that could be a problem. But I'm actually not sure about cache tags.
Any set or delete of single items or cache tags results in a complete invalidation of the fast cache, so why bother with checking them?
Comment #3
wim leers#2: I came to the exact same conclusion, see the IRC log in the IS:
Glad to see that confirmed :)
Comment #4
berdirYeah, should have read that.
Discussed with @WimLeers, as we apparently do not bother to send invalidated/deleted cache tags to the fast cache backend (which does make sense because we invalidate anyway), we should not save the cache tags, as that will just result in unnecessary processing in AcpuBackend::prepareItem().
But we need to document why.
Comment #6
wim leersComment #7
catchYep #3/#4 looks right to me as well, but agreed we need explicit documentation on why it gets treated like this.
Comment #8
lussoluca#4: I found this bug when implementing the cache datacollector for the Webprofiler module, If we want Webprofiler to collect and show cache tags we need to save them also to the fast cache backend
Comment #9
Anonymous (not verified) commented:-(
will look at this today.
Comment #10
Anonymous (not verified) commentedi have no idea why the chained backend tests don't extend GenericCacheBackendUnitTestBase. beejeebus--
Comment #11
olli commentedThere's a test patch in #2304413: Add generic cache backend unit test for ChainedFastBackend.
Comment #12
wim leersOh, hah, I hadn't even noticed that!
Comment #13
Anonymous (not verified) commentedhere's a patch.
Comment #14
berdirI think that the changes to ChainedFastBackend::getMultiple() are not necessary. The fast cache backend should already check if an item is expired itself, that is part of the API contract and I don't see why we can't rely on that.
To #8, why do you need the cache tags in the fast cache backend? As mentioned elsewhere, we immediately invalidate the fast cache backend on any cache tag invalidation, so adding them just results in pointless cache tags processing on getMultiple(), as they will never be invalidated...
Here's a patch without the expiration stuff and unified and documented the tags stuff to never set them on the fast backend.
Comment #15
lussolucaI need them into the fast cache backend only for profiling purpose, not for cache expiration. If this cause a performance regression I just have to replace the fast cache backend service with one that saves cache tags when Webprofiler is enabled, no worries.
RTBC for me
Comment #16
berdirWorking on integrating #2304413: Add generic cache backend unit test for ChainedFastBackend and providing test coverage through that, seeing some issues with that...
Comment #17
catchLet's open a normal task to discuss tags in the fast backend. I doubt it would be a measurable performance regression (it's supposed to be a fast backend after all), but it's also right in the critical path and we've seen some strange numbers with unserialize() and APC in other issues.
Comment #18
berdirOk, this turned out to be a bit more complicated than I thought bit I finally figured it out, thanks to @alexpott.
I merged in the patch from #2304413: Add generic cache backend unit test for ChainedFastBackend and looked for expiration tests in the generic cache backend, so we have real tests for the expiration, with an actual implementation for the fast backend, to proof what I said above.
Turns out that we *do* have expiration tests ther but it passes already for ChainedFastBackend (as visible in #2304413: Add generic cache backend unit test for ChainedFastBackend) because it only tests for immediate set()/get(), which already works fine, it's only on the write-through after invalidation, that it was not working. So extended the new test there with a specific scenario that writes another cache entry in the middle for the chained fast backend.
That test also currently expects cache tags to be returned, so I reverted that part from my patch for now, we can look into avoiding that in a non-critical issue I think.
When running those tests however, I had weird random fails. Took me a long time to figure out out, but I think I finally have and I had to make the following changes:
- mark as outdated needs to happen *after* we update the persistent cache, so that we can be sure that other clients will get the correct data when they update their fast caches
- The actual problem I think was that that the cache created time is milliseconds and the last write was microseconds, and due to rounding, it sometimes happened that the millisecond was rounded up and created was then after last write in the comparison. So I changed that to round, which then resulted in even more fails is was then always the same millisecond. So the final change is to put the last write timestamp 1 millisecond in the future, so we possibly have to re-fetch a few cache entries from the persistent cache, but we can be sure that there is never wrong data returned.
Also, don't forget to credit @olli for his work in #2304413: Add generic cache backend unit test for ChainedFastBackend.
Comment #19
berdirForgot to re-enable the new test.
Comment #20
wim leersThis now has tests, so removing tag.
I think beejeebus needs to review this, so assigning to him.
Comment #21
Anonymous (not verified) commentedComment #22
berdirComment #23
wim leersThis is the key change, and the reason I preferred beejeebus to review it.
So this is what
When we introduced
ChainedFastBackend(in #2231595: Add a cache backend that checks an inconsistent cache, then falls back to a consistent cache backend), we argued that we wanted to guarantee that we should mark it as outdated first, to guarantee that other web heads would get the updated data from the consistent back-end ASAP.Unfortunately, what we missed, was that this makes it so that the other web HEAD might read the outdated data from the consistent back-end, since that hasn't been written to at the time we call
markAsOutdated(). Hence this is a sound change.Sounds good. Opened a major task for that: #2392235: ChainedFastBackend shouldn't write cache tags to the fast cache back-end.
Besides these nitpicks, the patch looks good:
Missing interface.
I think we should move this into
GenericCacheBackendUnitTestBase.And s/Expirationon/ExpirationOn/ :)
… except that there is something wrong in the test, because this test-only patch (extracted from #19) is going to be green, whereas it should be red, hence it's not testing the right thing :)
(NR for testing this test-only patch.)
Comment #24
wim leersComment #25
berdirRight, so I was confused.
The test can not work, because the consistent backend will obviously never return an expired item that we could put into the fast storage, so we can not reproduce this in a kernel test, it has to be in the unit test. Which did not actually properly check for this yet, so I removed the custom test method and instead ensured in the unit test that we're calling it as expected.
Comment #27
wim leersComment #28
berdirAh, forgot one thing, we discussed the order of set/markAsOutdated(), and the best approach there is to write to persistent first, then mark as outdated, then update our local fast cache. That is safe and we don't already invalidate or own set.
Comment #29
wim leersRight!
Comment #30
alexpottAdded @olli to the commit credit for the dupe issue. This issue addresses a critical bug and is allowed per https://www.drupal.org/core/beta-changes. Committed 106e3d4 and pushed to 8.0.x. Thanks!