Problem/Motivation
We run a performance test that uses the standard install profile. The test is this:
- Logs in as user 1
- Visits node/add/article
- Sets a title
- Press save
- Asserts 'has been created' appears on the page.
The test runs against the same database so the node table grows over time.
Here's a graph where you can see just how much of an outlier D9 is... compared to other branches. I've already tested reverting #2986452: Database reserved keywords need to be quoted as per the ANSI standard as this is an extremely low level change that's in D9 and not in D8. But unfortunately it doesn't make a difference.

Note that the tests are being run using nightwatch so the regression could well be on the frontend.
Proposed resolution
Investigate. Profile 8.9.x using blackfire.
Remaining tasks
User interface changes
API changes
Data model changes
Release notes snippet
| Comment | File | Size | Author |
|---|---|---|---|
| #29 | clean_cache_tag_queries.txt | 11.2 KB | neclimdul |
| #29 | reverted_cache_tag_queries.txt | 10.93 KB | neclimdul |
| #28 | Screenshot 2020-05-13 at 13.08.08.png | 1007.44 KB | wim leers |
| #21 | Screenshot 2020-05-04 at 16.14.36.png | 407.73 KB | alexpott |
| #14 | Screenshot 2020-04-30 at 17.05.10.png | 211.25 KB | alexpott |
Comments
Comment #2
alexpottComment #3
berdirHm, my behat tests are running in ~78min, the D9 branch is 82m, so a little bit slower, not quite sure how high the variation between test runs is.
Comment #4
neclimdulYikes... Also that graph is awesome! Is this in a framework we could make public like https://arewefastyet.com/ ?
Comment #5
alexpottSo this performance regression also affects Drupal 8.9.x as well. Here's why - it looks like we have to rebuild way way more on the node get after a node save. See https://blackfire.io/profiles/compare/f2835fb1-234f-42a8-bdf5-8c6ee39d30...() for a comparison.Whoops we're not comparing apples to apples in the Blackfire comparison. I forgot to remove my dev settings from settings.php for the 8.9.x stats so the comparison is all wrong. I'll rerun the 8.9.x profiling later as the performance regression is real but I need to profile it properly. Sorry.
@neclimdul I'm going to try to work out a way to make that performance graph public.
Comment #6
alexpottI thought it might be #2145751: Introduce ENTITY_TYPE_list:BUNDLE cache tag and add it to single bundle listing - but changing \Drupal\Core\Entity\EntityBase::getListCacheTagsToInvalidate() to return an empty array makes no difference :(
Comment #7
alexpottComment #8
alexpottComment #9
alexpottSo here's a better 8.9.x profile:
Comment #10
alexpott@lauriii asked me to profile the GET node/add/article on stark. Here's a comparison -https://blackfire.io/profiles/compare/c65a73e6-2230-4bc8-90d7-8d6a3e4ee2... - I'm concerned about my test because of the change in the number of queries. I don;t really understand that. But the important thing is that the additional calls to \Drupal\Core\Asset\LibraryDiscovery::getLibraryByName() in 8.9.x occur on stark too. Here's two files will all calls on 8.8.5 and 8.9.x.
Comment #11
alexpottI got the files in #10 by adding
file_put_contents('/tmp/8.9.x.txt', "$extension $name\n", FILE_APPEND | LOCK_EX);to the \Drupal\Core\Asset\LibraryDiscovery::getLibraryByName() method.Comment #12
catchI tried to reproduce #10 - using the standard profile, user/2 (with admin role), and a warm cache, Seven theme.
I get the following:
So an extra ~200 calls.
Comment #13
catchManually reviewing the 8.8 and 8.9 diffs, the datepicker library doesn't appear in the 8.8 dump, and it does in 8.9 - however, I then debugged this locally and datepicker showed up for both. So not convinced that's an accurate list.
Comment #14
alexpottSo @lauriii and I have tracked the bulk of the performance regression to #2914110: Shortcut hook_toolbar implementation makes all pages uncacheable see:
So something that is supposed to have made performance better has resulted in the specific test case of making performance worse.
As the patch reverts cleanly we could choose to revert from Drupal 8.9.x and Drupal 9.x.x and re-open the issue with this information. Or we could do more investigation to find out why the patches approach has caused this.
Comment #15
catchSo for every no-js bigpipe request, we re-do all the logic in system_js_settings_alter() - this seems to be responsible for the additional function calls between 8.8.x and 8.9.x because #2914110: Shortcut hook_toolbar implementation makes all pages uncacheable is an extra bigpipe placeholder.
The same request without bigpipe enabled has about 300 calls to \Drupal\Core\Asset\LibraryDiscovery::getLibraryByName() so bigpipe adds nearly 2000 calls to that method.
This suggests there is likely somewhere we could static cache - possibly even in system_js_settings_alter(), although it will likely require hashing information from the Assets class to get a decent cache key if we do so there.
Comment #17
catchJust checked this following a comment from @Berdir in slack. Comment links use a placeholder, but looks like there's no extra calls to system_js_settings_alter() because no assets are added to the placeholder. So it's only placeholders with libraries that are triggering this specific extra work.
Comment #18
berdirI'm kinda confused about this. The shortcut lazy builder doesn't have assets either, they are outside?
I can verify that there are indeed 5 calls to system_js_settings_alter() on node/add/article for me on a D9 installation, but in xhprof, they merely make up 3% of the total time for me. And there are almost 6k calls to getLibraryByName(), but that is only 1.5%.
And these calls don't exist on an article with 10 comments. but as mentioned, the difference isn't the assets, so I looked closer. One thing that confused me is that the backtrace is actually coming from the loop in sendNoJsPlaceholders. But I do have JS enabled, it's the same browser. That led me to \Drupal\big_pipe\Render\Placeholder\BigPipeStrategy::doProcessPlaceholders. Apparently the problem is that some placeholders, like form actions are attributes, so they can't be placeholdered with big pipe. And that causes that split and repeated processing. The two existing ones are the form action and token, and the new one isn't the shortcut area itself, it's the add shortcut link. And that doesn't exist in the frontend theme. This is what actually makes the difference here. And the difference to before is that we actually can cache the page, so we try to placeholder it, or something like that. That part I don't 100% understand because there is no relevant change in shortcut_preprocess_page_title() IMHO, not when it's actually visible. We just improved it to execute less code if the theme setting isn't enabled.
So I think there are two things to look into as ways to improve this:
a) Placeholder the whole add inline link in that function, so it doesn't cause an extra no-js split. But that doesn't fix the actual issue, just avoids this specific regression. But we are already doing things 3x or so, this just adds a 4th.
b) Improve \Drupal\big_pipe\Render\BigPipe::sendNoJsPlaceholders(). Honestly, I'm wondering if that optimization of displaying assets early is really worth it (that's what it is about, if I understood correctly). Maybe we should drop that and just print the parts and then add the complete assets. Because it's possible that this is costing us more than we gain from it. I don't think optimizing any other part of the attachment processing is going to be doing much. Because for me, the whole of sendNoJsPlaceholders() is still only ~6% of the response, so I don't quite see how this could cause such a visible and consistent regression. Maybe that splitting business is actually slowing down processing in the browser as well because some optimization breaks?
Time to summon @WimLeers here I think.
Comment #19
alexpottOne thing that is super interesting (at least to me) is that this cost is incurred even if you don't have big_pipe installed. I suspect that all administration pages with the shortcuts + toolbar are now more expensive even they have been visited before.
Comment #20
berdirAre you sure about that? the loop is definitely in big_pipe that triggers this, I can't imagine this happening when it's not enabled.
Comment #21
alexpottI'm very sure about it. Here's a graph of running the test on 8.8.5. The following scenarios are profiled
I still don't understand this though.
Comment #22
alexpottSo here's what I'm seeing locally.... comparing:
8.8.5 standard install without automated_cron, update and big_pipe modules
TO
8.8.5 standard install without automated_cron, update and big_pipe modules but with #2914110 applied...
https://blackfire.io/profiles/compare/ed87b977-d349-42b5-bb5e-da0a687c62... - for some reason a tonne more queries!!?!!?
Comment #23
neclimdulThose extra queries seems to be tied up in the cache tags logic and the increase in calculateChecksum calls.
CacheTagsChecksumTrait::isValid
-> CacheTagsChecksumTrait::calculateChecksum
-> DatabaseCacheTagsChecksum::getTagInvalidationCounts
getTagInvalidationCounts has an (uncached) query to the cachetags table. We can probably store the results of that query on the service and only calculate it once per call saving quite a few queries :-/
Comment #24
neclimdulTaking a second look calculateChecksum looks like it already tries to do this so its not that easy :(
Comment #25
berdirThe query differences are strange, but ther's only one extra cache tags query, and yeah, the cache tags storage is pretty well optimized. One cache tags query is changed, has fewer arguments and a new one is added, so apparently it results in checking two cache tags in a different query.
But the whole thing is a difference of 0.7%, and the query execution time is just a tiny bit different. As you said, that's a *very* different picture from what you see in the browser tests.
No idea whats going on but something is very strange :)
Comment #26
alexpottSo here's a comparison of the node GET following the POST - https://blackfire.io/profiles/compare/af5b69f8-1367-410b-bd29-5a6e4ca735... the thing that's interesting is the two new inserts into the dynamic page cache.
Comment #27
berdirWhich was the main point of the shortcut issue, to enable dynamic page cache for users with shortcut access. So that migh be a bit slower but follow-up request should be *way* faster then. And the two inserts are probably due to cache redirects.
Comment #28
wim leers(Sorry for taking more than a week to respond here 😅)
#14: woah!
#15:
😯
Indeed, that's also what I thought immediately. But we'll need to carefully investigate whether this is truly correct in all scenarios.
#18:
That's not what it's about. It's about assets associated with that particular placeholder's markup (i.e. CSS/JS that that markup needs). Those assets can only be known by rendering the render array into markup, and so they simply will not exist on the main page (unless something else on the page happens to require the same assets of course, but we cannot assume that).
If by "splitting" you mean sending chunks of HTML to the browser: all browsers are optimized to process chunks of HTML as they arrive in TCP/IP packets, because A) high latency, B) lossy connections may cause a lot of time to pass between (in-order) packets, and if the browser were to wait until all packets in response to a request for HTML were to arrive, then browsing the web would be noticeably more painful. So this is not the BigPipe doing something edge casey; it's relying on a browser optimization that all browsers have had for decades out of sheer necessity. There is a tiny probability that the browser used in
nightwatchtesting regressed in this, but if that truly were the case then we'd need to see the test duration go up for all Drupal core versions.#19:
That is frankly a relief, if true :)
That feels more plausible.
This all looks sane.
tour/tour) that are 100% certainly in the Toolbar (and hence correlated to #2914110: Shortcut hook_toolbar implementation makes all pages uncacheable), but also a lot that are not (for example,ckeditor/drupal.ckeditor.plugins.drupalimagecaption). It almost seems as if the same placeholder is being rendered twice.So I dug deeper.
I'm now confident that @catch's analysis in #15 is accurate. BigPipe builds as much as possible on existing abstraction layers. So it renders placeholders (which correspond to render arrays) into a HTML response, and hence executes the whole
HtmlAttachmentsResponseProcessorlogic. That computes the final values for thestyles,scriptsandscripts_bottomplaceholders for a particular HTML response. But whatever gets computed forscripts_bottomgets dropped, because:For (BigPipe) placeholders, usually
scripts_bottomcontains onlydrupalSettingswhich containsajaxPageState, which explains all those additional::getLibraryByName()calls.And in fact, all that work is unnecessary. Because BigPipe specifically only requests the

stylesandscriptsplaceholders to be replaced in the embedded HTML response:… but
\Drupal\Core\Asset\AssetResolver::getJsAssets((AttachedAssetsInterface $assets, $optimize)always computes bothscriptsandscripts_bottom. On top of that, it always computes$settingsand hence always invokeshook_js_settings_alter().If we were to make those two aspects smarter, that would significantly reduce the number of
::getLibraryByName()calls.But … like @Berdir pointed out in #18, even though there is a big number of additional function calls, they are cheap function calls and don't explain a ~6% total load time performance regression (in issue summary) or even ~15% performance regression (in #21).
This makes me suspect the regression is not in more asset library lookups/BigPipe's placeholder rendering, but rather in the top-level response rendering logic. #22 seems to point in the direction of more discovery and more config reads.
Looking at the various profiles posted here:
cache_discovery, 2 forcache_config, 1 forcache_renderand 1 forcachetags. The latter two make sense. The first six do not. This is what I'd like to understand better. My only hunch for now is that moving the call totheme_get_setting('third_party_settings.shortcut.module_link')inshortcut_preprocess_page_title()to an earlier place may be causing this.Comment #29
neclimdulNice!
On this last point, I agree with focusing on those queries. That feels like the smoke from the gun... just haven't figured out which gun... or something. However, the only thing that changed was moving
theme_get_setting('third_party_settings.shortcut.module_link')before!\Drupal::request()->attributes->has('exception'). I wouldn't think that attribute is set in this case or any case we're supper concerned about performance so I'm not convinced that would make a difference.What I was trying to figure out and didn't quite prove or quantify in #23 was that we where breaking an optimization in
CacheTagsChecksumTrait::isValid
-> CacheTagsChecksumTrait::calculateChecksum
-> DatabaseCacheTagsChecksum::getTagInvalidationCounts
because there is an increase in calls to that chain that seems to correlates to the increase in queries. Logically since that's the only place we query the cachetags table. The static caching in calculateChecksums only prevents calls for tags it knows about so there is a worse case situation where we decrease the efficiency of this where previously because a large tag group could "warm" the cache. Improving the local efficiency of that render ends up decreasing the overall caching efficiency by spreading the population over more smaller calls to the table.
To test this I added a line to dump the tags being queried out of the database into a file and went through the steps in the summary. We can see something like that happening but also some other interesting things. Here's a simplified version of the differences for one of the requests and you can see the files in the attachment.
on /node/1 after submit with the shortcut patch reverted we see.
With the patch we see.
It looks like my gut was sort of wrong because with I think a little bit of luck and the fact that some tags probably consistently group together we see something similar to what I was expecting with the
config:user.role.administratorandconfig:user.role.administratortags dropping of a lookup but they land on another existing lookup so no net change.This did expose a noticeable change though with the expected net increase of calls. We loose the
shortcut_valueslookup and replace it with aconfig:shortcut.set.defaultand then we also see what _looks_ like the page title block turning cacheable and making an additional hit to the tag table. That tracks with the intent of the shortcut patch so that's validating but also why does it hurt performance?I'm not sure but my hunch is that the title block normally renders without database hits or we're swapping a database hit for a render cache lookup. Or maybe both, it previously used already loaded data and now its hitting the render cache which is is why we see another hit to cache_render.
More research to be done but hope that helps.
Comment #30
berdir> I'm not sure but my hunch is that the title block normally renders without database hits or we're swapping a database hit for a render cache lookup. Or maybe both, it previously used already loaded data and now its hitting the render cache which is is why we see another hit to cache_render.
Yeah, that's touching a long-standing render cache issue, that some parts are simply not *worth* caching, because loading them from cache is as slow or even slower than just rendering them. That's not the same as being uncacheable. We're still happy to cache it one layer up, in DPC or so, but it's just not worth caching on its own. Related to #2935804: Add cache context exclusion list to RenderCache::set(), that one is about deciding that it's not worth to cache e.g. local tasks because it varies by path and/or by user or something. Could add a Cache::UNWORTHY = -2 or something to flag that.
Also, what we save on the shortcut table vs cache lookups might not be worth much with the database backend, but it's different with redis/memcache.
And still, a handful of extra queries don't explain the kind of regression that we're seeing in those graphs and we also don't see that with profiling. Wim also confirmed in Slack that he agrees that the data that we have so far doesn't explain such a massive regression. It's possible that it makes a bigger difference on that AWS server, especially if the database is remote or so or badly optimized, but still, seems far higher than we can explain so far.
IMHO, it might be worth more to focus on the 8.8 regressions first, like the entity info key value lookups and try to improve that.
Comment #34
andypostLooks it's better title now
Comment #35
stephencamilo commentedComment #36
andypostComment #40
steinmb commentedDo we have any updated performance tests that also show number from our current 10.x?
Comment #41
joseph.olstadIt was mentioned by @alexpott in comment #14
#3131589-14: [meta] Investigate performance regression in Drupal 8.9.x and 9.x and 10.x
that the following issue began the spike in queries and reduced performance
#2914110: Shortcut hook_toolbar implementation makes all pages uncacheable
was there any action done to mitigate this or resolve it?
Comment #42
catch@joseph.olstad no if you read the comments following #14, while it was possible to find extra database queries due to extra cache tags/cache gets, those were an expected result of the change that was made (which should have other positive side effects, just not in that specific scenario), and they don't account for the timing regression, which no-one can reproduce outside of the original test case where it was seen. #2935804: Add cache context exclusion list to RenderCache::set() is the issue for fixing those extra cache gets/sets by allowing the render cache to skip certain items without them being uncacheable.
Comment #43
catch#3414398: AssetResolver::getJsAssets() and AssetResolver::getCssAssets() can repeatedly try to calculate the same set of assets fixes part of the issue discussed in #28 - optimizes ::getJsAssets() to avoid running the same logic when the same arguments are passed in. This doesn't only save some js settings alter logic but also removes some cache gets for the same thing too.
Comment #44
catch#3414398: AssetResolver::getJsAssets() and AssetResolver::getCssAssets() can repeatedly try to calculate the same set of assets is in!
I realised while working on it that we can improve things a bit more there, so opened #3443108: Optimize AssetResolver caching
#3436146: Introduce a list of "frequent cache tags" to reduce lookup query amount is also open now to try to reduce queries from cache tag lookups further.
I think with these, we can probably close this issue since everything has either been documented in separate issues now, or turned out not to be reproducible?
Comment #45
catchChanging status to reflect the above.
Comment #46
alexpottYep I agree.