Optimize cacheability bubbling (Cache::mergeTags(), ::mergeContexts(), BubbleableMetadata) [#2454643]

In D8 HEAD, a significant portion of rendering the page is spent in bubbling cacheability metadata. There is at least some room for optimization still.

For example: we currently validate cache tags and cache contexts every time we bubble. That's done to give good stack traces to developers. But it's not free. We could choose to only do this when "finalizing": when writing to the cache or when generating the X-Drupal-Cache-(Tags|Contexts) headers.
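A minimal sketch of the "validate only when finalizing" idea, assuming PHP 7+. The function names merge_tags_unvalidated() and finalize_tags() are hypothetical and are not part of Drupal's Cache class; the string check only stands in for whatever validation the real code performs.

```php
<?php
// Hypothetical sketch, not Drupal's Cache class.

/**
 * Merges two lists of cache tags without validating, de-duplicating or sorting.
 */
function merge_tags_unvalidated(array $a, array $b): array {
  return array_merge($a, $b);
}

/**
 * Validates, de-duplicates and sorts tags once, right before they are used.
 *
 * @throws \InvalidArgumentException
 *   If a tag is not a string.
 */
function finalize_tags(array $tags): array {
  foreach ($tags as $tag) {
    if (!is_string($tag)) {
      throw new \InvalidArgumentException('Cache tags must be strings.');
    }
  }
  $tags = array_unique($tags);
  sort($tags);
  return $tags;
}

// Bubbling: cheap merges, repeated many times per page.
$tags = merge_tags_unvalidated(['node:1'], ['node_list', 'node:1']);
$tags = merge_tags_unvalidated($tags, ['user:7']);

// Finalizing: a single validation pass, e.g. before a cache write or before
// building a header value.
$header = implode(' ', finalize_tags($tags));
echo $header, PHP_EOL; // prints "node:1 node_list user:7"
```

The cost of this approach is the one named above: an invalid tag is reported at finalize time, so the stack trace no longer points at the code that added it.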

@dawehner had some profiling (can't find it ATM) that showed this quite clearly.

Comment	File	Size	Author
#11	bubbleable_merge_before.png	147.32 KB	berdir
#11	bubbleable_merge_after.png	147.97 KB	berdir
#11	bubbleable_merge_diff.png	226.5 KB	berdir
#11	mergetags_after.png	142.72 KB	berdir
#11	mergetags_diff.png	204.88 KB	berdir
#11	cache-merge-2454643-11.patch	821 bytes	berdir

Comments

Comment #1

berdir

commented 19 March 2015 at 00:27

Yes, I can definitely see this too, although I'm not quite sure how much of that is overhead. Especially if you look at a page with a cold render cache, there are hundreds to thousands of those method calls.

+1 to not sorting/validating during merge.

I'm also seeing BubbleableMetadata::createFromRenderArray() and BubbleableMetadata::merge() quite high in the list, but that was done on a slightly older core version, so we might have already refactored/optimized some of that.

Comment #2

dawehner

commented 26 March 2015 at 12:19

https://www.drupal.org/node/2342045#comment-9707699 contains the profiling where I saw those methods.

Comment #3

wim leers

commented 26 March 2015 at 12:24

So that's #2342045-192: Standard views base fields need to use same rendering as Field UI fields, for formatting, access checking, and translation consistency, relevant data:

HEAD: https://blackfire.io/profiles/6b8b8432-fbf4-4688-8490-b43f30d20b9b/graph
PATCH: https://blackfire.io/profiles/06de03e4-0d7b-4a8b-839a-5dae6207fddc/graph
Comparison: https://blackfire.io/profiles/compare/91160890-7703-4dd5-ab04-3405908b0c...

In the first link: 22 ms (3.22% of total page generation time) spent in Cache::mergeTags().

Comment #4

wim leers

commented 26 March 2015 at 15:15

Comment #5

wim leers

commented 26 March 2015 at 18:14

Brought from #2458349-23: Route's access result's cacheability not applied to the response's cacheability:

19:03:45 WimLeers: dawehner: ping — "I simply think you don't understand me at all." :( — sorry. I'm really trying to understand what you mean. Could you please explain?
19:04:15 dawehner: WimLeers: Cache::mergeTags() is the function signature
19:04:21 dawehner: vs. Cache::mergeTags($a, $b);
19:04:33 dawehner: ... IMHO the merging of more than 2 is the special case
19:04:39 dawehner: so that could be moved into its own method
19:04:41 WimLeers: aaah
19:04:45 WimLeers: dawehner: yes, fair
19:05:02 WimLeers: dawehner: I'd be fine with that. merging 2 is indeed the >=95% case
19:05:05 WimLeers: (if not 99)

Comment #6

berdir

commented 29 March 2015 at 20:07

Hm, so now we also have a service call to validate contexts in mergeContexts().

I'm not sure I understand why we bother with that there. It just means we validate them over and over again. Why not just validate where we actually use them, which is somewhere in Renderer?

Comment #7

dawehner

commented 29 March 2015 at 20:21

I think the point was that the error should happen as early as possible, so it's easier to find the code causing it.

Comment #8

wim leers

commented 30 March 2015 at 11:18

What #7 says.

Comment #9

berdir

commented 30 March 2015 at 12:43

Yes, I can see that, but that only works partially. When you merge with something else, it might also be what has already been set by someone else.

AFAIK you're adding a new API that allows setting those things through the renderer service. Maybe we could additionally add it there instead?

The question IMHO is: is the improved DX worth that additional performance cost? I'll try to do some tests over the next few days. In recent profiling runs that I did, several of those methods were listed high when sorted by exclusive run time. And IIRC, a large number of the calls were caused by the renderer itself, due to the stacked rendering. Maybe we can introduce a mergeNoValidation() or similar variant of those and call them there, because it won't really help anyway when an exception is thrown there.

Comment #10

wim leers

commented 30 March 2015 at 12:51

+1 to everything in #9.

#7 + #8 just explain the rationale for why it works the way it does today. Obviously, if that's too costly for performance, we need to re-evaluate that. It's basically DX versus Performance: more of the former or more of the latter, not both. This is why assertions are so interesting for this: we need the DX only while developing, and we want the performance especially when in production. Assertions may thus enable us to have both. See #2454649: Cache Optimization and hardening -- [PP-1] Use assert() instead of exceptions in Cache::merge(Tags|Contexts). To move forward with that, we need a PoC patch and profiling to see if that's viable and if it's worth it.
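A rough sketch of what assertion-based validation could look like, assuming PHP 7+ where assert() can be compiled out via the zend.assertions setting. The helper names tags_are_valid() and merge_tags_asserted() are made up for illustration, and the string check only stands in for the real validation.

```php
<?php
// Hypothetical sketch: validation as an assertion instead of an exception.

/**
 * Returns TRUE if every tag is a string.
 */
function tags_are_valid(array $tags): bool {
  foreach ($tags as $tag) {
    if (!is_string($tag)) {
      return FALSE;
    }
  }
  return TRUE;
}

/**
 * Merges two tag lists, validating them only when assertions are enabled.
 */
function merge_tags_asserted(array $a, array $b): array {
  // With zend.assertions=1 (development) this check runs on every merge.
  // With zend.assertions=-1 (production) PHP 7+ does not compile the call,
  // so tags_are_valid() is never invoked.
  assert(tags_are_valid($a) && tags_are_valid($b), 'Cache tags must be strings.');
  $tags = array_unique(array_merge($a, $b));
  sort($tags);
  return $tags;
}

$merged = merge_tags_asserted(['node:1'], ['node_list', 'node:1']);
print_r($merged);
```

Whether this is viable still depends on the PoC patch and profiling mentioned above; the sketch only shows the shape of the trade-off.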

Comment #11

berdir

commented 30 March 2015 at 23:04

Status	File	Size
new	cache-merge-2454643-11.patch	821 bytes
new	mergetags_diff.png	204.88 KB
new	mergetags_after.png	142.72 KB
new	bubbleable_merge_diff.png	226.5 KB
new	bubbleable_merge_after.png	147.97 KB
new	bubbleable_merge_before.png	147.32 KB

Ok, so here are a few numbers. The profiling overhead is definitely high because those methods are called a lot, but still, there's a lot going on.

I've tested this with a disabled render cache to simulate a cold cache. Of course, when that is enabled, there are a lot fewer calls. Comparison numbers for that are below. Tested on the front page of my installation profile, with beta8, so without the cache context validation.

HEAD:
Total wall time 1.6s
90% of that is in doRender(), which is both good and bad.
BubbleableMetadata::merge() is 5.2%, 2.3k calls
Cache::mergeTags() is 3.4%, 6k calls
Cache::mergeContexts() is 1%, 2.3k calls
Cache::mergeMaxAges() is 2.1%, 5.9k calls

Those calls overlap, of course: a large part of the Cache::merge* calls come from merge() (40-50%; most of the other half is AccessResult).

With the attached patch, we save a bit (around 1% of the total wall time). And maybe other improvements are possible too. mergeMaxAges() for example, I think we can make it a bit faster by doing a simple loop instead of fancy array_filter() functions. Again, the numbers come with considerable overhead, but it should still be worth doing some micro-optimization here.
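For reference, a sketch of the two styles being compared for merging max-ages, assuming PHP 7+. The constant PERMANENT = -1 stands in for Cache::PERMANENT, and both function names are hypothetical; neither is claimed to match the code in core.

```php
<?php
// Assumption: "permanent" is encoded as -1, standing in for Cache::PERMANENT.
const PERMANENT = -1;

/**
 * Variant using array_filter() and min().
 */
function merge_max_ages_filter(int ...$max_ages): int {
  // Drop permanent values, then take the smallest remaining max-age.
  $finite = array_filter($max_ages, function (int $max_age): bool {
    return $max_age !== PERMANENT;
  });
  return empty($finite) ? PERMANENT : min($finite);
}

/**
 * Variant using a plain loop.
 */
function merge_max_ages_loop(int ...$max_ages): int {
  $result = PERMANENT;
  foreach ($max_ages as $max_age) {
    if ($max_age === PERMANENT) {
      continue;
    }
    // The first finite value replaces PERMANENT; later ones only if smaller.
    if ($result === PERMANENT || $max_age < $result) {
      $result = $max_age;
    }
  }
  return $result;
}

var_dump(merge_max_ages_filter(PERMANENT, 3600, 60)); // int(60)
var_dump(merge_max_ages_loop(PERMANENT, 3600, 60));   // int(60)
```

Both variants return the same values; any speed difference would have to be measured rather than assumed.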

With enabled render cache, I'm down to 600ms on that page. And there are still 700 calls to BubbleableMetadata::merge(), still 5%.

And comparing the same when the profiler is disabled, numbers based on ab -n 50 -c 1

Render cache fully enabled: 140ms
Completely disabled: 700ms
(that's an even bigger difference than I expected; also, the results I saw in the browser were a bit higher, 960ms and 200ms, and ab also went up to those numbers)

Given those numbers, I'm not sure how much we can actually trust those improvements. I tried to compare it with ab, but for some reason, it behaved very strangely.

I would also love to compare that with PHP 7, but that will probably have to wait until I can update to beta9.

Comment #12

wim leers

commented 31 March 2015 at 08:32

Thanks for kicking this off, Berdir!

"I think we can make it a bit faster by doing a simple loop instead of fancy array_filter() functions"

catch tried that, didn't notice a difference: #2443073-39: Add #cache[max-age] to disable caching and bubble the max-age.

I think @dawehner made a good point: all of these functions accept N arguments. But typically, we only call them with 2 arguments. If we'd only deal with 2 arguments, we could potentially make more aggressive optimizations.
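To illustrate the split, here is a sketch with a fixed two-argument merge for the common case and a separate helper for the rare N-argument case, assuming PHP 7+. The names merge_two() and merge_many() are hypothetical and are not the signatures core ended up with.

```php
<?php
// Hypothetical names, for illustration only.

/**
 * Common case: merges exactly two tag lists.
 */
function merge_two(array $a, array $b): array {
  // A fixed signature needs no variable-argument handling.
  $tags = array_unique(array_merge($a, $b));
  sort($tags);
  return $tags;
}

/**
 * Special case: merges any number of tag lists by folding merge_two().
 */
function merge_many(array ...$lists): array {
  return array_reduce($lists, 'merge_two', []);
}

print_r(merge_two(['node:1'], ['node_list', 'node:1']));
print_r(merge_many(['b'], ['a'], ['a', 'c']));
```

Callers that really need more than two lists pay for the extra loop; everyone else gets the simpler function.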

Comment #13

berdir

commented 3 April 2015 at 10:48

Ok, different approach for testing this.

xhprof and xdebug disabled.

use Drupal\Core\Cache\Cache;

$before = microtime(TRUE);
for ($i = 0; $i < 5000; $i++) {
  Cache::mergeTags(['node:1'], ['node_list', 'node:1']);
}

$after = microtime(TRUE);

var_dump(($after - $before) * 1000);

PHP 5.5: 16-17ms
PHP 7: 4-5ms (yeah. Can I haz php7 today plz?)

Without validate + unique + sort:
PHP 5.5: 8ms
PHP 7: 2ms
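For anyone who wants to repeat this kind of measurement outside a Drupal bootstrap, a standalone sketch follows, assuming PHP 7+. merge_full() and merge_bare() are hypothetical stand-ins for "with" and "without validate + unique + sort"; they are not Cache::mergeTags(), so absolute numbers will differ from the ones above.

```php
<?php
// Hypothetical stand-ins for the two variants being timed.

/**
 * Merge with validation, de-duplication and sorting.
 */
function merge_full(array $a, array $b): array {
  $tags = array_merge($a, $b);
  foreach ($tags as $tag) {
    if (!is_string($tag)) {
      throw new \InvalidArgumentException('Cache tags must be strings.');
    }
  }
  $tags = array_unique($tags);
  sort($tags);
  return $tags;
}

/**
 * Merge only: no validation, no unique, no sort.
 */
function merge_bare(array $a, array $b): array {
  return array_merge($a, $b);
}

/**
 * Times $iterations calls of $merge and returns the elapsed milliseconds.
 */
function time_ms(callable $merge, int $iterations): float {
  $before = microtime(TRUE);
  for ($i = 0; $i < $iterations; $i++) {
    $merge(['node:1'], ['node_list', 'node:1']);
  }
  return (microtime(TRUE) - $before) * 1000;
}

printf("full: %.2f ms\n", time_ms('merge_full', 5000));
printf("bare: %.2f ms\n", time_ms('merge_bare', 5000));
```

As in the run above, profilers and debuggers should be disabled, since their per-call overhead dominates for functions this small.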

Comment #14

wim leers

commented 3 April 2015 at 10:56

Nice! :)

A 4x speed-up makes sense. This is all super simple code. It was mostly slow because of PHP overhead. Less overhead = huge speedup!

Comment #15

borisson_

commented 3 August 2015 at 19:14

Status: Active » Closed (duplicate)

This was fixed in #2471232: Optimize Cache::merge*(), by only accepting 2 instead of N arguments.
