The database backend is slow, but works everywhere Drupal does.

We now have an APCu backend in core, and a ChainedFast cache backend that uses it. The ChainedFast backend with APCu is much faster than the database backend for cache bins that don't see many writes.

Cache services can be tagged with default_backend: BACKEND_SERVICE_NAME. For example:

  cache.config:
    class: Drupal\Core\Cache\CacheBackendInterface
    tags:
      - { name: cache.bin, default_backend: cache.backend.chainedfast }
    factory_method: get
    factory_service: cache_factory
    arguments: [config]

This creates a config cache bin and informs the factory to use cache.backend.chainedfast by default.

This patch does some feature detection in CacheFactory and uses ChainedFastBackend with APCu by default for the config, bootstrap, and discovery bins when APCu is available.
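As an illustration of that feature detection, here is a minimal, hypothetical sketch (function and variable names are made up; this is not the actual core code) of choosing a backend per bin based on APCu availability:

```php
<?php

// Hypothetical sketch (not the actual core code) of the feature detection
// described above: only use the ChainedFast backend for selected low-write
// bins, and only when APCu is actually available.
function pick_default_backend(string $bin, array $fast_bins): string {
  // APCu exposes apcu_fetch(); the older APC extension exposed apc_fetch().
  $has_apcu = function_exists('apcu_fetch') || function_exists('apc_fetch');
  if ($has_apcu && in_array($bin, $fast_bins, TRUE)) {
    return 'cache.backend.chainedfast';
  }
  // Fall back to the database backend, which works everywhere Drupal does.
  return 'cache.backend.database';
}

// The low-write bins named in the summary.
$fast_bins = ['config', 'bootstrap', 'discovery'];
echo pick_default_backend('render', $fast_bins), "\n"; // always the database
```

High-write bins like render never opt in, so only the bins with a stable set of entries pay the (small) cost of the extra fast layer.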

Comment  File  Size  Author
#131  interdiff.txt  3.86 KB  effulgentsia
#131  2248767-131.patch  20.9 KB  effulgentsia
#130  interdiff.txt  1.15 KB  effulgentsia
#130  2248767-130.patch  20.77 KB  effulgentsia
#128  interdiff.txt  6 KB  effulgentsia
#128  2248767-128.patch  20.77 KB  effulgentsia
#127  interdiff.txt  403 bytes  effulgentsia
#127  2248767-127.patch  15.25 KB  effulgentsia
#124  interdiff.txt  2.82 KB  effulgentsia
#124  2248767-124.patch  15.04 KB  effulgentsia
#123  2248767-123.patch  17.79 KB  effulgentsia
#118  115-118-interdiff.txt  571 bytes  effulgentsia
#118  2248767-118.patch  17.83 KB  effulgentsia
#115  87-115-interdiff.txt  1.88 KB  effulgentsia
#115  2248767-115.patch  17.83 KB  effulgentsia
#113  2248767-113.patch  17.41 KB  Anonymous (not verified)
#99  2248767-87.patch  17.16 KB  effulgentsia
#97  2248767.95.patch  16.11 KB  alexpott
#97  87-95-interdiff.txt  5.88 KB  alexpott
#87  interdiff.txt  3.32 KB  effulgentsia
#87  2248767-87.patch  17.16 KB  effulgentsia
#84  interdiff-2248767-84.txt  3.16 KB  Anonymous (not verified)
#84  2248767-84.patch  17.15 KB  Anonymous (not verified)
#78  interdiff.txt  1.38 KB  effulgentsia
#78  2248767-78.patch  16.9 KB  effulgentsia
#72  interdiff.txt  5.69 KB  effulgentsia
#72  2248767-72.patch  16.77 KB  effulgentsia
#67  xhprof runs.zip  301.01 KB  Wim Leers
#64  patched-2248767.png  48.9 KB  Anonymous (not verified)
#64  head-2248767.png  49.04 KB  Anonymous (not verified)
#56  interdiff-2248767.txt  1.41 KB  Anonymous (not verified)
#56  2248767-55.patch  13.53 KB  Anonymous (not verified)
#53  2248767-53.patch  12.37 KB  Anonymous (not verified)
#53  interdiff-2248767.txt  1.51 KB  Anonymous (not verified)
#50  interdiff.txt  598 bytes  effulgentsia
#50  2248767-50.patch  11.11 KB  effulgentsia
#49  interdiff.txt  878 bytes  effulgentsia
#49  2248767-49.patch  11.11 KB  effulgentsia
#47  interdiff.txt  552 bytes  effulgentsia
#47  2248767-47.patch  10.1 KB  effulgentsia
#43  interdiff.txt  598 bytes  effulgentsia
#43  2248767-43-baseline.patch  9.46 KB  effulgentsia
#42  interdiff.txt  944 bytes  effulgentsia
#42  2248767-42.patch  9.46 KB  effulgentsia
#41  interdiff.txt  556 bytes  effulgentsia
#41  2248767-41.patch  9.47 KB  effulgentsia
#36  interdiff.txt  995 bytes  effulgentsia
#36  2248767-36.patch  8.82 KB  effulgentsia
#35  interdiff.txt  406 bytes  effulgentsia
#35  2248767-35.patch  8.05 KB  effulgentsia
#33  2248767-33.patch  8.05 KB  effulgentsia
#28  2248767-28.patch  1.27 KB  Anonymous (not verified)
#13  2248767-13.patch  1.45 KB  Anonymous (not verified)

Comments

catch’s picture

Title: decide on default local cache in CacheFactory » Decide on default local cache in CacheFactory
Priority: Normal » Major
Issue tags: +Fast By Default

Bumping to major and tagging 'Fast by default'.

The two bins we want this for are config and discovery. We could hard-code those bins in the factory, or somehow indicate per-bin whether it can use a local cache or not. If we want the latter, it could be yet another follow-up.

In IRC, beejeebus mentioned an intelligent fallback based on environment (i.e. check for APC support and use that, fall back to phpstorage if not). That seems OK. If you configure a backend explicitly, that code won't kick in at all.

Berdir’s picture

The problem with using APC(u) by default might be that the default APC size of 32 MB could be too small, though that's likely less of a problem with PHP 5.5 and APCu.

catch’s picture

We could potentially even check the size?

Berdir’s picture

Yeah, maybe, possibly as a separate hook_requirements() warning or so? It's not specific to caching: using APC as an opcode cache for the files alone is going to fill up those 32 MB *very* quickly...
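A size check like the one floated here could be sketched as follows (a hypothetical helper, not core code; the 64 MB threshold is an arbitrary illustration). It reads the real apc.shm_size ini setting, which accepts shorthand like "32M":

```php
<?php

// Hypothetical sketch of a hook_requirements()-style size check: parse the
// apc.shm_size ini value and warn when it is below some threshold. The
// helper name and the 64 MB threshold are illustrative only.
function shm_size_to_bytes(string $size): int {
  // apc.shm_size accepts shorthand suffixes like "32M" or "1G".
  $unit = strtoupper(substr($size, -1));
  $value = (int) $size;
  $multipliers = ['K' => 1024, 'M' => 1024 * 1024, 'G' => 1024 * 1024 * 1024];
  return isset($multipliers[$unit]) ? $value * $multipliers[$unit] : $value;
}

// ini_get() returns FALSE when the extension is not loaded; fall back to
// the historical 32M default for the sake of the example.
$configured = ini_get('apc.shm_size') ?: '32M';
if (shm_size_to_bytes($configured) < 64 * 1024 * 1024) {
  echo "apc.shm_size is only $configured; consider raising it.\n";
}
```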

Wim Leers’s picture

Title: Decide on default local cache in CacheFactory » Decide on default inconsistent cache in CacheFactory
Anonymous’s picture

Title: Decide on default inconsistent cache in CacheFactory » Decide on default fast, local cache in CacheFactory
Wim Leers’s picture

Title: Decide on default fast, local cache in CacheFactory » Decide on default fast, local cache in CacheFactory (i.e. default to APCu if available)
Issue tags: +sprint

#1807662: Built-in APCu support in core (PHP 5.5 only) just landed, now let's get this done!

Berdir’s picture

Something else to consider: this could have unexpected side effects, because it would be one more thing you have to empty when you want to re-install, for example. Very bad things will happen if you start a new installation and then run into existing APC caches :) Good luck explaining to someone who has never heard of APC that they have to restart Apache when they want to re-install Drupal... and on shared hosting, they might not even be able to do so.

We don't enable the APC class loader by default either, because that can be very nasty too: you run into weird behavior if it tries to load classes that no longer exist, and so on.

I'm considering that it might be a better option to provide a production.settings.local.php (and the existing one should be development.settings.local.php ;)) with explanations of things that should/can be enabled on a production site. This would also give us a nice place to document currently undocumented settings like page_cache_without_db/config.

I'm all for fast (by default), but we need to consider the costs of it; in some cases, faster-with-three-mouseclicks might be preferable :)

sun’s picture

To address #8, I'd suggest simply making the very first installer screen force-empty the cache.

The new APCu cache backend already uses the (hashed) site directory/path as a global key prefix, so that prefix will remain the same when re-installing (into the same conf/site path).

Anonymous’s picture

i could live with force-clearing in the installer.

some possibly dumb question:

- if we know we're in the installer, can we just not use APCu there?

- does D8 have a notion of 'i am now installed', that the cache factory could use?

ideally, if APCu will cause problems in the installer, we should just avoid using it there.

sun’s picture

Ideally, we'd have an algorithm like CoreServiceProvider::registerUuid() in CacheFactory (or a more adequate place, if there is one) — i.e., if there is no manually configured settings.php setting, then we automatically figure out the best / most performant implementation that is available.

I think we're definitely able to resolve the case of clearing external caches upon re-installation, as long as such external storages use a reliable, site-specific prefix based on the conf/site directory (as opposed to hash_salt or similar, which would change with every installation and thus be unknown upon re-installation). I'm even confident enough to recommend deferring/postponing that topic to a later, separate issue.

catch’s picture

The rebuild script could (and probably has to) also clear the entire APC cache for a site; that way, any stale APC issue is recoverable.

Anonymous’s picture

Assigned: Unassigned »
Status: Active » Needs review
FileSize
1.45 KB

so here is the dumbest possible patch that detects if apc functions are available, and sets the apcu backend as the default if they are. let's see what the bot thinks.

UPDATE - this includes the patch over at #2269453: apc_fetch() expects a string or array of strings to fix installer warnings.

sun’s picture

+1 to that "dumb" (== KISS) solution, in case it works.

Originally thought of putting that logic into CoreServiceProvider, but embedding the default/fallback negotiation directly into CacheFactory makes some good level of sense, too.

Berdir’s picture

We should limit this to the discovery and maybe bootstrap cache bins (which are currently almost empty), I think? We don't want to default to APC for things like the render cache, because that can get huge.

The test run is already at 1h25 on testbot, and that testbot usually finishes in 52-53 minutes... so that's not exactly good. Either there are an undecillion test fails/exceptions or this is not actually making anything faster. One downside of this is that test methods and web requests have to maintain their own separate caches, so all the plugin discovery stuff now happens separately for web requests...

damiankloip’s picture

I was going to suggest we add this to a settings file, but after reading, it looks like Berdir already did that in #8. To me that seems more sensible; it seems the main reason to enable this by default is to try to earn the accolade 'faster than D7 out of the box'? If we have to do stuff like this to get there, it seems a bit false.

I think there would be more random issues than we can foresee now if we try to enable this by default. That is not counting the need to restrict this to particular bins, which adds more unnecessary hard-coded complexity to a cache class. Settings can already handle this per-bin configuration.

Berdir’s picture

Status: Needs work » Needs review

13: 2248767-13.patch queued for re-testing.

Berdir’s picture

Looks like the test bot had some trouble; we're not sure if it's the patch or the testbot.

I applied the patch locally and ran the tests through the CLI. It doesn't halt like on testbot, but I'm seeing a lot of test fails; it seems to start failing at some point, especially for tests with multiple test methods. Tested with Drupal\block\Tests\BlockTest, for example: works fine in the browser, 30 fails in the CLI.

Important: if you run tests, make sure to set apc.enable_cli = 1 in php.ini/apc.ini, otherwise it fails with a fatal error, class APCIterator not found (on 5.5 APCu).

Will try one more time to see if it was the bot (both runs were on the same bot).

Berdir’s picture

Status: Needs work » Needs review

13: 2248767-13.patch queued for re-testing.

catch’s picture

IIRC we have some tests that do things in the UI and then check the cache API directly; I can't see those passing with APCu without #2231595: Add a cache backend that checks an inconsistent cache, then falls back to a consistent cache backend.

Anonymous’s picture

the more i think about this, the more i'm not sure how we can sanely test it.

the cache for web requests and cli scripts will not cross over, which is going to cause headaches.

when it's just a cache miss because an item was written in a web request, things might be a bit slower, but shouldn't be broken.

when a cache item is invalidated in one but not the other, i'm pretty sure we're going to see undefined behaviour, and i'm not sure of a simple way to fix that.

moshe weitzman’s picture

Status: Needs work » Closed (won't fix)

Yeah, invalidation headaches between web and cli look insurmountable when not using #2231595: Add a cache backend that checks an inconsistent cache, then falls back to a consistent cache backend. Sadly, I think this is a Won't Fix. I'd be happy if someone can propose a way forward. In that case, please reopen this.

This issue started off as a plea to default to the APC classloader when APC is available. Shall we open a new issue for that, in the spirit of Fast by Default?

catch’s picture

Status: Closed (won't fix) » Postponed

#2231595: Add a cache backend that checks an inconsistent cache, then falls back to a consistent cache backend was opened partly to enable this one. I don't see how this is insurmountable once that's in, so re-opening but postponing.

moshe weitzman’s picture

Postponed is fine.

My thinking is that CacheChain will only be used on multi-webhead sites and those are sufficiently complex that auto-detection is not that important. I guess one could use CacheChain on single webhead sites to get past the CLI/Web problem mentioned here. Not sure the complexity tradeoff is worth it.

Anonymous’s picture

Status: Postponed » Needs review
FileSize
1.27 KB

here's a first cut patch to get some more conversation going.

catch’s picture

I guess one could use CacheChain on single webhead sites to get past the CLI/Web problem mentioned here.

Well if we default to it for configuration and discovery, then single webhead sites will be able to default to reading from APC, rather than the database, while nothing will break on multiple web heads.

Anonymous’s picture

i'd like to also add the configuration cache to this, but we don't use CachedStorage for config anymore, so that would make this a more complicated patch.

also, berdir and sun each suggested a different mechanism.

berdir suggested writing this out to settings.php during install.

sun suggested something about putting this in the container during build.

i don't really care which mechanism we use, as long as whatever we do is not much more complex than the patch at #28.

moshe weitzman’s picture

I somewhat opposed this in #27 but I support it now. I see the consistency and speed benefits of using cache chain by default, even on single webserver sites. This approach is also compatible with Drush even though Drush has to maintain a separate APC cache from the webserver.

This is all overridable from Settings which is good. We don't do much writing to settings.php from installer if we can help it. I don't see much benefit to adding that here.

+1 from me. Let's fix those test fails.

effulgentsia’s picture

Status: Needs work » Needs review
FileSize
8.05 KB

berdir suggested writing this out to settings.php during install. sun suggested something about putting this in the container during build.

I prefer sun's idea. Just as Moshe does in #32. How's this for an implementation?

effulgentsia’s picture

Status: Needs work » Needs review
FileSize
8.05 KB
406 bytes

Right. Makes sense. Service references need only a leading @, but parameter references need both leading and trailing %. Oh, Symfony!
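To illustrate the distinction (service and parameter names below are made up, matching the services.yml style used in the summary):

```yaml
# Illustrative services.yml fragment: a service reference takes only a
# leading "@", while a parameter reference is wrapped in "%...%".
parameters:
  example.fast_backend: cache.backend.apcu

services:
  example.cache_consumer:
    class: Drupal\example\CacheConsumer
    arguments: ['@cache_factory', '%example.fast_backend%']
```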

effulgentsia’s picture

FileSize
8.82 KB
995 bytes

i'd like to also add the configuration cache to this, but we don't use CachedStorage for config anymore, so that would make this a more complicated patch.

Possibly not very complicated. Let's see if this works.

effulgentsia’s picture

Status: Needs work » Needs review
FileSize
9.47 KB
556 bytes
effulgentsia’s picture

FileSize
9.46 KB
944 bytes
effulgentsia’s picture

FileSize
9.46 KB
598 bytes

This one is just to compare which failures are due to APCu and which are due to something else.

effulgentsia’s picture

Status: Needs work » Needs review
FileSize
10.1 KB
552 bytes

Status: Needs review » Needs work

The last submitted patch, 47: 2248767-47.patch, failed testing.

effulgentsia’s picture

Status: Needs work » Needs review
FileSize
11.11 KB
878 bytes

I'll need to open a separate issue for this language fix, but for now, just wanting to make some testbot progress.

effulgentsia’s picture

FileSize
11.11 KB
598 bytes

Now that those failures are fixed, back to APCu.

Status: Needs review » Needs work

The last submitted patch, 50: 2248767-50.patch, failed testing.

Anonymous’s picture

Assigned: » Unassigned
Anonymous’s picture

Status: Needs work » Needs review
FileSize
1.51 KB
12.37 KB

so this turns out to be a cool story.

the cli proc running the test fetches the last write timestamp, then never fetches it again.

in the meantime, the web php proc clears caches and updates the timestamp.

then the cli proc gets local cache hits, but doesn't invalidate the items because it has an old timestamp.

attached patch introduces a ttl on the last write timestamp, and makes the tests pass locally for me.
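The race and the TTL fix described here can be sketched in plain PHP (class and method names are hypothetical, not the patch's actual code): the last-write timestamp read from the consistent backend is only trusted for a short TTL, so a long-running CLI process eventually re-reads it instead of serving stale local hits forever.

```php
<?php

// Pure-PHP sketch of the TTL fix: cache the consistent backend's last-write
// timestamp, but only trust the cached copy for $ttl seconds.
class LastWriteTimestamp {
  private $fetch;
  private $ttl;
  private $cached = NULL;
  private $cachedAt = 0.0;

  public function __construct(callable $fetch, float $ttl) {
    $this->fetch = $fetch;
    $this->ttl = $ttl;
  }

  public function get(float $now) {
    if ($this->cached === NULL || $now - $this->cachedAt > $this->ttl) {
      // Hit the consistent backend again once the TTL has expired.
      $this->cached = call_user_func($this->fetch);
      $this->cachedAt = $now;
    }
    return $this->cached;
  }
}

// A "web process" bumps the timestamp; with a 0.3 s TTL the "CLI process"
// notices on its first read after the TTL expires.
$backendTimestamp = 1.0;
$ts = new LastWriteTimestamp(function () use (&$backendTimestamp) {
  return $backendTimestamp;
}, 0.3);
$ts->get(2.0);            // caches 1.0
$backendTimestamp = 5.0;  // another process clears caches and writes
var_dump($ts->get(2.1));  // still 1.0: within the TTL
var_dump($ts->get(2.5));  // 5.0: TTL expired, re-fetched
```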

Status: Needs review » Needs work

The last submitted patch, 53: 2248767-53.patch, failed testing.

effulgentsia’s picture

Thanks! I'll take a look to see if I can make sense of the remaining failures, but meanwhile: #2311945: Add ResettableCacheBackendInterface, which despite the title, is related to the timestamp clearing during tests.

Anonymous’s picture

Status: Needs work » Needs review
Related issues: -#2311945: Add ResettableCacheBackendInterface
FileSize
13.53 KB
1.41 KB

ooooookay that was fun.

so it turns out that Drupal\field\Entity\FieldInstanceConfig->__wakeup() gets really sad when the state of the fields world has changed between when it was written to cache and when it was read from cache.

at least some of the responsibility for that lies with the ChainedFastBackend, so #55 adds a try/catch around the fast backend's getMultiple(). an exception is treated as 'we didn't find any valid items'.

sun’s picture

Hm. I don't think that any cache backend implementation should catch exceptions; it's going to hide real application errors caused by objects that cannot be unserialized. In addition, PHP Exceptions are very slow; regular runtime code is not supposed to throw exceptions.

+      catch (\Exception $e) {
+        // It's possible for cache items stored in the fast cache to implement
+        // PHP's object __wakeup() method, and throw exceptions. Treat an
+        // exception as 'we didn't find anything in the fast cache'.
+        $cids = $cids_copy;
+      }
effulgentsia’s picture

@sun: What's the alternative? The challenge with ChainedFastBackend is that we *know* that the fast backend might return an invalid entry. Once we retrieve it, we check its timestamp and throw it out if it's invalid (doesn't match the timestamp of the slow backend), but we can only do that check after retrieving it. So we have a situation where an unserialize() is getting called (in this case by apc_fetch() internally, but with a different backend, could be with an explicit call to unserialize()) on invalid data. We can't make our __wakeup() implementations not throw any exceptions on all possible states of invalid data, can we? As far as performance, if the cost is only when one is thrown, then it's not "regular" runtime code; it's runtime code that happens after a cache has been invalidated in one place, but the invalidation hasn't propagated to all the web nodes yet (or in the case of tests, from the CLI process to the web process).

Also, re-adding related issue that appears to have been removed in #56 due to an xpost.

effulgentsia’s picture

Also:

it's going to hide real application errors caused by objects that cannot be unserialized

No, because the catch is only for the fast backend. So throwing that one out, we proceed to check the consistent backend, which is the only authoritative one, so if there's something there that can't be unserialized, there's nothing in this patch that masks it.
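The asymmetry being argued here can be sketched as follows (an illustrative function, not the actual ChainedFastBackend code): an exception from the fast backend is swallowed and treated as a miss, while an exception from the consistent, authoritative backend still propagates.

```php
<?php

// Sketch of the catch-only-the-fast-backend behavior: a stale fast-cache
// entry that blows up in __wakeup() falls through to the consistent
// backend, whose own failures are NOT masked.
function chained_get(string $cid, callable $fast, callable $consistent) {
  try {
    $item = $fast($cid);
    if ($item !== NULL) {
      return $item;
    }
  }
  catch (\Exception $e) {
    // A possibly-stale fast-cache entry failed to unserialize; treat it as
    // 'we didn't find anything in the fast cache'.
  }
  // The consistent backend is authoritative; anything it throws propagates.
  return $consistent($cid);
}

$fast = function ($cid) { throw new \Exception('stale __wakeup()'); };
$consistent = function ($cid) { return "value for $cid"; };
echo chained_get('config:system.site', $fast, $consistent), "\n";
```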

effulgentsia’s picture

Now that we have a green patch here, here's the spin-off I promised in #49: #2312189: Harmonize configuration collection names and language codes.

Also, it's probably time to post some profiling numbers to justify this patch. Anyone up for that?

Anonymous’s picture

i'll post some xhprof run data tomorrow.

catch’s picture

Agreed on the exception catching, although it might be worth putting part of the answer in #59/#60 into the inline comment.

Discussed the timestamp TTL with beejeebus in IRC and that works for me. Also, 300ms should mean that decently optimized requests finish before they get there anyway; once you're over that, an extra call to the consistent backend is not your main problem.

Berdir’s picture

Anonymous’s picture

FileSize
49.04 KB
48.9 KB

did some Xhprof runs today.

anonymous user, front page with two articles, one comment on each. Search, recent content, recent comments, who's new blocks enabled.

average of five runs each, HEAD 755ms, patched 634ms. i put the data here:

https://docs.google.com/a/acquia.com/spreadsheets/d/1lws4xHUtAu9UDCl45Yn...

attached a couple of screenshots of the xhprof runs, sorted by exclusive time, you can see a big drop in db queries.

how HEAD looks: [screenshot "not patched" attached]

how the patch looks: [screenshot "patched" attached]

EDIT: anonymous user for both, not logged in.

Anonymous’s picture

if anyone else wants to do some profiling, just set up the env you want to test with the patch applied, then do runs with and without this in your settings.php:

$settings['cache']['default'] = 'cache.backend.database';
Anonymous’s picture

Issue summary: View changes
Issue tags: -sprint, -needs profiling
Wim Leers’s picture

FileSize
301.01 KB

Recreated "anonymous user, front page with two articles, one comment on each; Search, recent content, recent comments, who's new blocks enabled". Then did profiling.

HEAD:
- Lowest median out of 5 runs of ab -n 1 -c 100 http://domain: 228 ms/request.
- Fastest out of 5 XHProf profiles: 329 ms

Patch:
- Lowest median out of 5 runs of ab -n 1 -c 100 http://domain: 204 ms/request. (I have to add that the standard deviation with this patch applied is significantly lower; I can consistently reproduce this number, which is not the case for HEAD.)
- Fastest out of 5 XHProf profiles: 314 ms

Metric                      HEAD        Patch       Diff     Diff%
Number of Function Calls    98,447      97,295      -1,152   -1.2%
Incl. Wall Time (microsec)  329,120     314,421     -14,699  -4.5%
Incl. MemUse (bytes)        25,643,672  25,710,936  67,264   0.3%
Incl. PeakMemUse (bytes)    25,724,488  25,789,032  64,544   0.3%

So ab shows it's ~10% faster (in the given scenario), XHProf shows it's ~5% faster.

Wim Leers’s picture

Status: Needs review » Reviewed & tested by the community

AFAICT profiling was the last thing before this could be RTBC. I was able to confirm the performance improvement.

(My numbers are very different from beejeebus', but that can most likely be attributed to his environment; his responses take roughly twice as long. What matters most is that this is a repeatably measurable (and significant) performance improvement.)

pounard’s picture

Numbers by themselves don't mean anything, but the before/after comparison does. By reducing the number of remote backend queries, it will reduce concurrency problems in highly stressed environments, and that cannot be measured by either XHProf or ApacheBench. But my guess is that it will definitely be a huge benefit when used for caches that are not written too often.

Off-topic but still related: I'm really curious to see how Drupal 8 will behave on very high-volume sites, or with a lot of logged-in users. My guess is not that well, so this kind of patch is really necessary IMHO.

I think this is RTBC.

Berdir’s picture

It's actually possible that XHProf makes the APC function calls look slower than they actually are, so that might explain some of the difference.

Also, testing like this is not too useful here. What you are testing is the MySQL query cache, likely over a socket, and if MySQL doesn't have anything else to do, that can be very fast. (You are testing APC vs. MySQL, so APC is faster, but memcache/redis vs. MySQL will not show a big difference when profiling like this.) That will change a lot on a site under load with large cache tables.

So, what's more interesting IMHO to compare is how many (cache) queries we save because of this, so query count before/after.

ab: Are you sure you did -n 1 -c 100 and not the other way around? What might be useful to test is something like -n 500+ -c 10-20; then you'll actually have some parallel load. But you're mostly testing your specific environment at that point.

effulgentsia’s picture

Assigned: Unassigned » effulgentsia
Status: Reviewed & tested by the community » Needs work

Thanks for the numbers and the RTBC! I'd like to add in a few more code comments and @todo links to followups before this lands though. Will do so today.

effulgentsia’s picture

Assigned: effulgentsia » Unassigned
Status: Needs work » Needs review
FileSize
16.77 KB
5.69 KB

Ok, done adding all the comments I wanted to. Hopefully, they're straightforward and this can be re-RTBC'd quickly.

Anonymous’s picture

Status: Needs review » Reviewed & tested by the community

even though i wrote a small amount of the code, i think i can put this back to RTBC, as the changes in #72 are documentation only. effulgentsia++

EDIT: i am on smack.

chx’s picture

+ $this->fastService = function_exists('apc_fetch') ? 'cache.backend.apcu' : 'cache.backend.null';

D8.

catch’s picture

Status: Reviewed & tested by the community » Needs work

I thought we were going to conditionally set the cache backend to chained based on whether APCu is available. Whereas the patch always uses the chained backend, falling back to the null service. This means we'll be doing things like getting the timestamp on sites that don't have APCu available, which will actually make things worse?

Anonymous’s picture

dammit, i think catch is right.

Wim Leers’s picture

Yes, he is. Sorry for the premature RTBC.

effulgentsia’s picture

Status: Needs work » Needs review
FileSize
16.9 KB
1.38 KB

Fixes #74/#75.

effulgentsia’s picture

Oh, I should point out that there is still one way in which #78 is worse than HEAD for people without APCu: it turns on caching for config, so on cache hits it's neutral (a db query either way), but on cache misses it's 2 queries (more if you also count the set()s) instead of 1. However, cache misses on config should be rare.

andypost’s picture

Seems there's still no clean way to use other cache backends (say, XCache) without overriding a set of files :(

effulgentsia’s picture

@andypost: this patch doesn't change that; everything is still overridable via settings.php. I.e., you can do:

$settings['cache']['default'] = 'cache.backend.xcache';

if you want all cache bins using that (assuming you also have a module that implements that service, since core doesn't ship with an xcache backend implementation).

Or, if you want xcache as your fast backend in a chained implementation, you can do:

$settings['cache']['chained_fast_cache']['fast'] = 'cache.backend.xcache';

I don't think we have that latter one documented yet, and we may want to move that variable out of $settings['cache'], since 'chained_fast_cache' is not a bin, but those are pre-existing issues in HEAD not affected by this patch.

damiankloip’s picture

I was going to mirror what catch pointed out but he beat me to it, and it's already been fixed :) Using the null backend is...ugh.

  1. +++ b/core/lib/Drupal/Core/Cache/CacheFactory.php
    @@ -27,13 +27,30 @@ class CacheFactory implements CacheFactoryInterface,  ContainerAwareInterface {
    +   * @param array $bin_default_backends
    

    Would $default_bin_backends be a better name?

  2. +++ b/core/lib/Drupal/Core/Cache/ChainedFastBackend.php
    @@ -102,14 +130,47 @@ public function get($cid, $allow_invalid = FALSE) {
    +      catch (\Exception $e) {
    

    We want to catch ANY exception here? Seems like a potential black hole.

  3. +++ b/core/lib/Drupal/Core/Cache/ChainedFastBackendFactory.php
    @@ -6,11 +6,57 @@
    +  function __construct(Settings $settings) {
    

    public

  4. +++ b/core/lib/Drupal/Core/Cache/ChainedFastBackendFactory.php
    @@ -6,11 +6,57 @@
    +    if (isset($cache_settings['chained_fast_cache']) && is_array($cache_settings['chained_fast_cache'])) {
    

    Why do we need to check is_array() here? I would rather see a notice if someone tried to configure this setting incorrectly (Like using a string there) rather than silent failure.

  5. +++ b/core/lib/Drupal/Core/Cache/ChainedFastBackendFactory.php
    @@ -22,24 +68,19 @@ class ChainedFastBackendFactory extends CacheFactory {
    +      $backend = new ChainedFastBackend(
    +        $backend,
    

    Overriding the same variable gets a bit messy in situations like this IMO. How about just return ChainedFastBackend(..) directly?

Anonymous’s picture

Assigned: Unassigned »

working on this at TCDrupal today.

Anonymous’s picture

FileSize
17.15 KB
3.16 KB

#82.1 - fixed.

#82.2 - i think we need to catch anything. if the error is caused by the layer using the cache system, and is not related to us fetching stale items, then it will blow up when they get items from the consistent backend. if it is caused by us fetching a stale item, there's nothing to fix: that's on the backend, not the caller. damiankloip asked for some logging, but i'm not sure i agree, so i left it out for now.

#82.3 - fixed.

#82.4 - fixed.

#82.5 - fixed.

Anonymous’s picture

woops - d.o. did a booboo.

penyaskito’s picture

Minor typo, nice to fix if it needs another iteration.

+++ b/core/lib/Drupal/Core/Cache/ChainedFastBackend.php
@@ -19,9 +19,11 @@
+ * Singe-node configurations that don't have that limitation can just use the

*Single

effulgentsia’s picture

Assigned: » Unassigned
FileSize
17.16 KB
3.32 KB

This fixes #82.1 and #82.3 in other places in the patch that weren't explicitly called out. It also fixes #86 and another miscellaneous typo.

I agree with #84's answer to #82.2, and this is commented at length (perhaps too much length?) in the patch. Suggestions on how to improve that comment are welcome.

Other than us possibly wanting to improve the long comments in ChainedFastBackend::getMultiple(), I think this is RTBC, but since I submitted some of the patches since #75, it's not for me to mark it as such.

effulgentsia’s picture

damiankloip asked for some logging, but i'm not sure i agree, so i left it out for now.

I don't see the point of logging either, but perhaps I'm missing something? Per #84, if the item is stale, then the only reason it's still in the fast cache is that a delete hasn't propagated yet, and that's by design, so why would we be interested in a log of that condition?

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

I looked over the code and it looks swell to me.

alexpott’s picture

So this patch will make sites without APCu slower with respect to the config system, because we're maintaining both the {config} and {cache_config} tables; and since we get a cache table for each config collection, multilingual will be even worse. I'm +1 on using an APCu cache for config, but I don't think it should be chained while config is in the database. So this brings us to an interesting question: can we use APCu to cache but back onto a non-cache table? I.e., how do we cope with invalidation and using a timestamp to check for consistency? I'm not sure. Perhaps the best solution is to duplicate all the data in database cache bins, as the current patch does, and accept that. Hmmm.

I'm not setting this back to needs work for now - leaving this comment to explain why I have not committed it yet.

Anonymous’s picture

APCu is unusable on its own, pretty much always, if a site does anything outside of the web context, which is a huge percentage of sites (drush and/or cron).

so #90 is effectively saying 'let's not do this issue'. i don't see any way forward from here.

yched’s picture

Also, as stated in its doc, the chained/APCu backend is great for cache bins with a stable set of entries (e.g. config, plugin discovery data), but not so much for bins with ever-growing entries (e.g. entity cache or render cache), where new entries constantly invalidate the local caches (and also possibly fill up memory, discarding data for other bins?).

Thus, is it a good idea to make this backend the default for all bins ?

Anonymous’s picture

this patch does not make the chained backend the default for all bins.

catch’s picture

I'm fine with the extra database writes on sites that are using the database backend with no APCu. The config and discovery cache bins should be very, very low-write (less than any 7.x cache table since all of those mixed at least two of config, discovery, state or content), and we already have caches which are a 1-1 swap for a database query to make those easier to offload when you want to.

Wim Leers’s picture

Title: Decide on default fast, local cache in CacheFactory (i.e. default to APCu if available) » Use fast, local cache back-end (APCu, if available) for low-write caches (bootstrap, discovery, and config)

#93 answers #92, but the title is misleading and probably led yched to write what he wrote in #92. Updating the title.

yched’s picture

Yup, my bad, but new title helps :-)
Thanks @beejeebus & @Wim

alexpott’s picture

Issue summary: View changes
Status: Reviewed & tested by the community » Needs review
FileSize
5.88 KB
16.11 KB

Whilst updating the issue summary I realised the following code in \Drupal\Core\Config\CachedStorage::__construct is going to be problematic.

    if ($collection == StorageInterface::DEFAULT_COLLECTION) {
      $bin = 'config';
    }
    else {
      $bin = 'config_' . str_replace('.', '_', $collection);
    }
    $this->cache = $this->cacheFactory->get($bin);

The bin name of the configuration cache depends on the config collection (in core, collections are used to handle language overrides for config), so we get a dynamically named bin per collection.
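To make the problem concrete, here is a self-contained sketch that mirrors the bin-naming logic quoted above (StorageInterface::DEFAULT_COLLECTION is the empty string in core): every non-default collection produces its own dynamically named cache bin, which a static service tag can't cover.

```php
<?php
// Mirrors the bin-naming logic from the CachedStorage constructor quoted
// above. The default collection maps to the 'config' bin; every other
// collection (e.g. a language override collection) gets its own bin.

const DEFAULT_COLLECTION = '';

function config_bin_name(string $collection): string {
  if ($collection === DEFAULT_COLLECTION) {
    return 'config';
  }
  return 'config_' . str_replace('.', '_', $collection);
}

// A site with French config overrides ends up with a second cache bin:
echo config_bin_name('') . "\n";            // config
echo config_bin_name('language.fr') . "\n"; // config_language_fr
```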

I wonder if the best approach here is not to use tags for this, but to change the signature of CacheFactory::get($bin) to CacheFactory::get($bin, $default_backend = 'cache.backend.database');

And then in \Drupal\Core\Config\CachedStorage::__construct

$this->cache = $this->cacheFactory->get($bin, 'cache.backend.chainedfast');

Sorry if everyone feels like I'm derailing the issue. If the attached solution is not right, feel free to RTBC #87 again.
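A hypothetical sketch of the signature change proposed above (illustration only - the tag-based default_backend approach from #87 is what was re-uploaded below; backend construction is stubbed out so get() just resolves and returns the service name it would instantiate):

```php
<?php
// Sketch of the proposed CacheFactory::get($bin, $default_backend) change.
// Site-specific per-bin overrides (from settings.php) still win; otherwise
// the caller's default is used, so CachedStorage could ask for the
// chained-fast backend for any dynamically named config bin.

class CacheFactorySketch {

  /** @var string[] Per-bin backend overrides, keyed by bin name. */
  protected $settings;

  public function __construct(array $settings = []) {
    $this->settings = $settings;
  }

  public function get(string $bin, string $default_backend = 'cache.backend.database'): string {
    return $this->settings[$bin] ?? $default_backend;
  }

}

$factory = new CacheFactorySketch();
echo $factory->get('config_language_fr', 'cache.backend.chainedfast') . "\n"; // cache.backend.chainedfast
echo $factory->get('render') . "\n";                                          // cache.backend.database
```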

Anonymous’s picture

thanks Wim for updating the summary.

alexpott - sorry i was so abrupt in #91.

#97 seems ok to me, but i'd defer to effulgentsia on that. if he's ok with it, then i think #97 is ready to go.

effulgentsia’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
17.16 KB

Sorry if everyone feels like I'm derailing the issue. If the attached solution is not right, feel free to RTBC #87 again.

I'd prefer to do #87, so reuploading it and RTBCing it per the above quote.

I actually think we should change CachedStorage to not create a separate cache bin per collection, but put everything into the config bin, and put the collection name into the cache key instead. Because we have code like Cache::getBins() which assumes that all bins are known by the service container, so CachedStorage creating dynamic bins on the fly seems problematic. But we probably need to discuss that more in a separate issue.

alexpott’s picture

Well, we can't put the collection into the cache key because of cache key length restrictions.

Berdir’s picture

We no longer have cache key length restrictions; the cache backend automatically hashes the key if it is too long.
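A rough, self-contained sketch of the normalization Berdir is referring to: the database cache backend keeps cache IDs at or under the column length by hashing over-long IDs, retaining as much of the readable prefix as fits. The exact core implementation (DatabaseBackend::normalizeCid(), which also handles non-ASCII IDs) differs in details; this just illustrates why key length is no longer a blocker.

```php
<?php
// Over-long cache IDs are shortened by appending a URL-safe, unpadded
// base64-encoded sha256 hash (like Crypt::hashBase64() in core) to a
// truncated prefix, so the result is always unique and <= $max chars.

function normalize_cid(string $cid, int $max = 255): string {
  if (strlen($cid) <= $max) {
    return $cid;
  }
  $hash = str_replace(['+', '/', '='], ['-', '_', ''], base64_encode(hash('sha256', $cid, TRUE)));
  return substr($cid, 0, $max - strlen($hash)) . $hash;
}

$long_cid = 'language.fr:' . str_repeat('some.config.name.', 30);
echo strlen(normalize_cid($long_cid)) . "\n"; // 255
echo normalize_cid('short') . "\n";           // short
```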

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 99: 2248767-87.patch, failed testing.

Status: Needs work » Needs review

beejeebus queued 99: 2248767-87.patch for re-testing.

Status: Needs review » Needs work

The last submitted patch, 99: 2248767-87.patch, failed testing.

Status: Needs work » Needs review

effulgentsia queued 99: 2248767-87.patch for re-testing.

effulgentsia’s picture

Status: Needs review » Reviewed & tested by the community

#99 is green again, and #101 answers #100, so back to RTBC.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 99: 2248767-87.patch, failed testing.

Anonymous’s picture

Status: Needs work » Needs review

bot seems to be sad, going to retest again.

beejeebus queued 99: 2248767-87.patch for re-testing.

Anonymous’s picture

Status: Needs review » Reviewed & tested by the community

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 99: 2248767-87.patch, failed testing.

Anonymous’s picture

this is a legit fail. if i add a sleep() or reduce the ttl to 0.15 it passes.

Anonymous’s picture

Status: Needs work » Needs review
FileSize
17.41 KB

ooookay, here's a fix.

basically, don't use the chained fast backend by default if this is a CLI proc.

if the chained fast backend is explicitly specified, use it. interdiff is:

diff --git a/core/lib/Drupal/Core/Cache/CacheFactory.php b/core/lib/Drupal/Core/Cache/CacheFactory.php
index 3345367..5e8348e 100644
--- a/core/lib/Drupal/Core/Cache/CacheFactory.php
+++ b/core/lib/Drupal/Core/Cache/CacheFactory.php
@@ -78,6 +78,11 @@ public function get($bin) {
     }
     elseif (isset($this->defaultBinBackends[$bin])) {
       $service_name = $this->defaultBinBackends[$bin];
+      // The ChainedFast backend is not a good fit for CLI processes, so don't
+      // use it by default.
+      if (php_sapi_name() === 'cli' && $service_name == 'cache.backend.chainedfast') {
+        $service_name = 'cache.backend.database';
+      }
     }
     else {
       $service_name = 'cache.backend.database';

Status: Needs review » Needs work

The last submitted patch, 113: 2248767-113.patch, failed testing.

effulgentsia’s picture

FileSize
17.83 KB
1.88 KB

As an alternative to #113, how about this instead? Interdiff is relative to #87/#99.

effulgentsia’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 115: 2248767-115.patch, failed testing.

effulgentsia’s picture

Status: Needs work » Needs review
FileSize
17.83 KB
571 bytes

Typo fix.

Status: Needs review » Needs work

The last submitted patch, 118: 2248767-118.patch, failed testing.

alexpott’s picture

Created #2326203: Config's cached storage should only use one bin to deal with the additional cache bins being created by CachedStorage.

effulgentsia’s picture

Status: Needs work » Postponed

I think #120 will help us resolve the last test failures here (because that issue makes CachedStorage use the actual cache.config service, rather than some anonymous object that isn't accessible for WebTestBase to reset), so postponing this on that.

sun’s picture

Status: Postponed » Needs work
effulgentsia’s picture

Assigned: Unassigned » effulgentsia
Status: Needs work » Needs review
FileSize
17.79 KB

For starters, just a reroll, but assigning to myself because I want to make a couple other changes as a result of #122.

effulgentsia’s picture

Assigned: effulgentsia » Unassigned
FileSize
15.04 KB
2.82 KB

Because of #122, we no longer generate a new cache table for each collection, which makes the LanguageConfigFactoryOverride changes no longer strictly necessary for this issue, so I pulled them out of here. They're still in #2312189: Harmonize configuration collection names and language codes, where we can do it completely.

Also:

if i add a sleep() or reduce the ttl to 0.15 it passes.

I'm not happy with failures popping up whenever testbot or our tests get faster, so I opened #2327013: Add a TTL to ChainedFastBackend to prevent long running requests from using data that is too stale and removed those changes from here, to verify that the reset() in WebTestBase works in every case that it needs to.

Status: Needs review » Needs work

The last submitted patch, 124: 2248767-124.patch, failed testing.

effulgentsia’s picture

Excellent! The 1 failure in #124 was being masked in #123 by the TTL, but is now uncovered. It's a strange one though: it happens right after $this->moduleHandler->uninstall(array('config_test')), which ends with a drupal_flush_all_caches(), but somehow, right after that, we still have a ChainedFastBackend object hanging around with an outdated $lastWriteTimestamp, despite deleteAll() having been called on all of the bin services. I don't know, but one guess is that it's the result of a container rebuild and some split-brain between an old and a new cache.config service. It can be fixed by making ChainedFastBackend::deleteAll() invoke deleteAll() on the fast backend, but that would just be another mask.

effulgentsia’s picture

Status: Needs work » Needs review
FileSize
15.25 KB
403 bytes

It can be fixed by making ChainedFastBackend::deleteAll() invoke deleteAll() on the fast backend, but that would just be another mask.

On second thought, here's that fix. While it shouldn't be necessary - the $lastWriteTimestamp should already make later gets discard stale fast-backend data - I also don't think it's a bad idea to delete things from the fast backend immediately.
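For anyone following along, here's a minimal sketch of the consistency mechanism being discussed, with plain arrays standing in for the APCu and database backends (this is an illustration of the idea, not core's actual ChainedFastBackend code): any write to the consistent backend bumps a last-write timestamp, and fast-backend entries created before that moment are treated as stale. The split-brain in #126 happened because an old backend object kept an outdated timestamp after the bin had been cleared elsewhere.

```php
<?php
// Minimal sketch of a chained-fast cache: reads are served from the fast
// (APCu-like) store only when the cached copy is newer than the last write
// to the consistent (database-like) store; otherwise they fall through and
// repopulate the fast copy.

class ChainedFastSketch {

  /** @var array[] Fast store: [cid => ['data' => mixed, 'created' => float]]. */
  private $fast = [];

  /** @var array Consistent store: [cid => data]. */
  private $consistent = [];

  /** @var float Timestamp of the last write to the consistent store. */
  private $lastWriteTimestamp = 0.0;

  public function set(string $cid, $data): void {
    $this->consistent[$cid] = $data;
    $this->lastWriteTimestamp = microtime(TRUE);
    // The fast copy is (re)populated lazily on the next get().
    unset($this->fast[$cid]);
  }

  public function get(string $cid) {
    // Trust the fast copy only if it postdates the last consistent write.
    if (isset($this->fast[$cid]) && $this->fast[$cid]['created'] > $this->lastWriteTimestamp) {
      return $this->fast[$cid]['data'];
    }
    if (!array_key_exists($cid, $this->consistent)) {
      return FALSE;
    }
    $data = $this->consistent[$cid];
    $this->fast[$cid] = ['data' => $data, 'created' => microtime(TRUE)];
    return $data;
  }

  public function deleteAll(): void {
    $this->consistent = [];
    // Bumping the timestamp alone invalidates all fast copies; the #127
    // patch additionally cleared the fast store here as a belt-and-braces.
    $this->lastWriteTimestamp = microtime(TRUE);
    $this->fast = [];
  }

}
```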

effulgentsia’s picture

FileSize
20.77 KB
6 KB

Found the source of the split-brain (the test itself was holding on to an old container), so fixing it that way. We can have a followup to discuss whether to make ChainedFastBackend propagate deletes to the fast backend immediately or not; that's now out of scope for this issue.

Wim Leers’s picture

Status: Needs review » Needs work
Issue tags: +Needs change record

Great clean-up/split in #124, and great detective work in #126/#127/#128!


Very close to RTBC; I only found nitpicks. But this will also need a change record.

  1. +++ b/core/lib/Drupal/Core/Cache/CacheFactory.php
    @@ -27,13 +27,30 @@ class CacheFactory implements CacheFactoryInterface,  ContainerAwareInterface {
    +   * to fallback to the global default of 'cache.backend.database'.
    

    s/fallback/fall back/

  2. +++ b/core/lib/Drupal/Core/Cache/ChainedFastBackend.php
    @@ -68,7 +70,7 @@ class ChainedFastBackend implements CacheBackendInterface {
    -   * @var int
    +   * @var float
    

    With the changes in #124, this is now an int again.

  3. +++ b/core/lib/Drupal/Core/Cache/ChainedFastBackendFactory.php
    @@ -6,40 +6,80 @@
    +   * @param \Drupal\Core\Site\Settings $settings
    +   *   The settings array.
    

    s/array/object/

effulgentsia’s picture

Status: Needs work » Needs review
FileSize
20.77 KB
1.15 KB

Fixed #129.1 and #129.3.

Re #129.2: nope, it's a float (see markAsOutdated()). HEAD's doc is wrong.

As far as change record, probably makes sense to document the new service tag.

effulgentsia’s picture

FileSize
20.9 KB
3.86 KB

In #2231595-96: Add a cache backend that checks an inconsistent cache, then falls back to a consistent cache backend, catch said he considered a change record optional. However, I'm fine with writing one, now that we're turning this on by default and it's more worthwhile to inform people about.

But, if I'm going to write it, would it be okay to fix the $settings thing (per the attached patch) that's been bugging me since #2231595-67: Add a cache backend that checks an inconsistent cache, then falls back to a consistent cache backend? Because I don't want to add those settings that I disagree with to the change record :)

effulgentsia’s picture

Issue tags: -Needs change record

Added draft change record: https://www.drupal.org/node/2327507.

Wim Leers’s picture

Wonderful! The changes in #131 make a lot of sense; they make it significantly easier to understand. And one of the best change records I've seen yet!

I would love to RTBC, but #131 has been stuck on testbot for 19 hours and 40 minutes :(

Status: Needs review » Needs work

The last submitted patch, 131: 2248767-131.patch, failed testing.

Wim Leers queued 131: 2248767-131.patch for re-testing.

Wim Leers’s picture

Status: Needs work » Reviewed & tested by the community

#131 was canceled on testbot at my request (since it was stuck for >20 hours). Now re-tested, came back green. RTBC per #133.

webchick’s picture

Assigned: Unassigned » catch

This one seems catch-ish.

catch’s picture

Status: Reviewed & tested by the community » Fixed

This looks great to me now, OK with #2327013: Add a TTL to ChainedFastBackend to prevent long running requests from using data that is too stale being in a follow-up.

Committed/pushed to 8.0.x, thanks!

  • catch committed c1f444b on 8.0.x
    Issue #2248767 by effulgentsia, beejeebus, alexpott: Use fast, local...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.