Problem/Motivation

We run a performance test that uses the standard install profile. The test is this:

  • Logs in as user 1
  • Visits node/add/article
  • Sets a title
  • Press save
  • Asserts 'has been created' appears on the page.

The test runs against the same database so the node table grows over time.

Here's a graph where you can see just how much of an outlier D9 is... compared to other branches. I've already tested reverting #2986452: Database reserved keywords need to be quoted as per the ANSI standard as this is an extremely low level change that's in D9 and not in D8. But unfortunately it doesn't make a difference.

Graph of Drupal performance by version

Note that the tests are being run using nightwatch so the regression could well be on the frontend.

Proposed resolution

Investigate. Profile 8.9.x using blackfire.

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Comments

alexpott created an issue. See original summary.

alexpott’s picture

Issue summary: View changes
StatusFileSize
new462.19 KB
berdir’s picture

Hm, my behat tests are running in ~78min, the D9 branch is 82m, so a little bit slower, not quite sure how high the variation between test runs is.

neclimdul’s picture

Yikes... Also that graph is awesome! Is this in a framework we could make public like https://arewefastyet.com/ ?

alexpott’s picture

Version: 9.0.x-dev » 8.9.x-dev

So this performance regression also affects Drupal 8.9.x as well. Here's why - it looks like we have to rebuild way way more on the node get after a node save. See https://blackfire.io/profiles/compare/f2835fb1-234f-42a8-bdf5-8c6ee39d30...() for a comparison.

Whoops we're not comparing apples to apples in the Blackfire comparison. I forgot to remove my dev settings from settings.php for the 8.9.x stats so the comparison is all wrong. I'll rerun the 8.9.x profiling later as the performance regression is real but I need to profile it properly. Sorry.

@neclimdul I'm going to try to work out a way to make that performance graph public.

alexpott’s picture

I thought it might be #2145751: Introduce ENTITY_TYPE_list:BUNDLE cache tag and add it to single bundle listing - but changing \Drupal\Core\Entity\EntityBase::getListCacheTagsToInvalidate() to return an empty array makes no difference :(

alexpott’s picture

Title: Investigate performance regression in Drupal 9.x » Investigate performance regression in Drupal 8.9.x and 9.x
alexpott’s picture

Issue summary: View changes
alexpott’s picture

Issue summary: View changes

So here's a better 8.9.x profile:

alexpott’s picture

StatusFileSize
new33.52 KB
new35.29 KB

@lauriii asked me to profile the GET node/add/article on stark. Here's a comparison -https://blackfire.io/profiles/compare/c65a73e6-2230-4bc8-90d7-8d6a3e4ee2... - I'm concerned about my test because of the change in the number of queries. I don;t really understand that. But the important thing is that the additional calls to \Drupal\Core\Asset\LibraryDiscovery::getLibraryByName() in 8.9.x occur on stark too. Here's two files will all calls on 8.8.5 and 8.9.x.

alexpott’s picture

I got the files in #10 by adding file_put_contents('/tmp/8.9.x.txt', "$extension $name\n", FILE_APPEND | LOCK_EX); to the \Drupal\Core\Asset\LibraryDiscovery::getLibraryByName() method.

catch’s picture

StatusFileSize
new39.88 KB
new44.03 KB

I tried to reproduce #10 - using the standard profile, user/2 (with admin role), and a warm cache, Seven theme.

I get the following:

wc -l library88x.txt 
2135 library88x.txt

wc -l library89x.txt 
2349 library89x.txt

So an extra ~200 calls.

catch’s picture

Manually reviewing the 8.8 and 8.9 diffs, the datepicker library doesn't appear in the 8.8 dump, and it does in 8.9 - however, I then debugged this locally and datepicker showed up for both. So not convinced that's an accurate list.

alexpott’s picture

So @lauriii and I have tracked the bulk of the performance regression to #2914110: Shortcut hook_toolbar implementation makes all pages uncacheable see:

So something that is supposed to have made performance better has resulted in the specific test case of making performance worse.

As the patch reverts cleanly we could choose to revert from Drupal 8.9.x and Drupal 9.x.x and re-open the issue with this information. Or we could do more investigation to find out why the patches approach has caused this.

catch’s picture

So for every no-js bigpipe request, we re-do all the logic in system_js_settings_alter() - this seems to be responsible for the additional function calls between 8.8.x and 8.9.x because #2914110: Shortcut hook_toolbar implementation makes all pages uncacheable is an extra bigpipe placeholder.

The same request without bigpipe enabled has about 300 calls to \Drupal\Core\Asset\LibraryDiscovery::getLibraryByName() so bigpipe adds nearly 2000 calls to that method.

This suggests there is likely somewhere we could static cache - possibly even in system_js_settings_alter(), although it will likely require hashing information from the Assets class to get a decent cache key if we do so there.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

catch’s picture

So for every no-js bigpipe request,

Just checked this following a comment from @Berdir in slack. Comment links use a placeholder, but looks like there's no extra calls to system_js_settings_alter() because no assets are added to the placeholder. So it's only placeholders with libraries that are triggering this specific extra work.

berdir’s picture

I'm kinda confused about this. The shortcut lazy builder doesn't have assets either, they are outside?

I can verify that there are indeed 5 calls to system_js_settings_alter() on node/add/article for me on a D9 installation, but in xhprof, they merely make up 3% of the total time for me. And there are almost 6k calls to getLibraryByName(), but that is only 1.5%.

And these calls don't exist on an article with 10 comments. but as mentioned, the difference isn't the assets, so I looked closer. One thing that confused me is that the backtrace is actually coming from the loop in sendNoJsPlaceholders. But I do have JS enabled, it's the same browser. That led me to \Drupal\big_pipe\Render\Placeholder\BigPipeStrategy::doProcessPlaceholders. Apparently the problem is that some placeholders, like form actions are attributes, so they can't be placeholdered with big pipe. And that causes that split and repeated processing. The two existing ones are the form action and token, and the new one isn't the shortcut area itself, it's the add shortcut link. And that doesn't exist in the frontend theme. This is what actually makes the difference here. And the difference to before is that we actually can cache the page, so we try to placeholder it, or something like that. That part I don't 100% understand because there is no relevant change in shortcut_preprocess_page_title() IMHO, not when it's actually visible. We just improved it to execute less code if the theme setting isn't enabled.

So I think there are two things to look into as ways to improve this:

a) Placeholder the whole add inline link in that function, so it doesn't cause an extra no-js split. But that doesn't fix the actual issue, just avoids this specific regression. But we are already doing things 3x or so, this just adds a 4th.

b) Improve \Drupal\big_pipe\Render\BigPipe::sendNoJsPlaceholders(). Honestly, I'm wondering if that optimization of displaying assets early is really worth it (that's what it is about, if I understood correctly). Maybe we should drop that and just print the parts and then add the complete assets. Because it's possible that this is costing us more than we gain from it. I don't think optimizing any other part of the attachment processing is going to be doing much. Because for me, the whole of sendNoJsPlaceholders() is still only ~6% of the response, so I don't quite see how this could cause such a visible and consistent regression. Maybe that splitting business is actually slowing down processing in the browser as well because some optimization breaks?

Time to summon @WimLeers here I think.

alexpott’s picture

One thing that is super interesting (at least to me) is that this cost is incurred even if you don't have big_pipe installed. I suspect that all administration pages with the shortcuts + toolbar are now more expensive even they have been visited before.

berdir’s picture

Are you sure about that? the loop is definitely in big_pipe that triggers this, I can't imagine this happening when it's not enabled.

alexpott’s picture

Issue summary: View changes
StatusFileSize
new407.73 KB

I'm very sure about it. Here's a graph of running the test on 8.8.5. The following scenarios are profiled

  • 8.8.5
  • 8.8.5 without big_pipe installed
  • 8.8.5 + the patch from #2914110
  • 8.8.5 + the patch from #2914110 without big_pipe installed.

I still don't understand this though.

alexpott’s picture

So here's what I'm seeing locally.... comparing:
8.8.5 standard install without automated_cron, update and big_pipe modules
TO
8.8.5 standard install without automated_cron, update and big_pipe modules but with #2914110 applied...

https://blackfire.io/profiles/compare/ed87b977-d349-42b5-bb5e-da0a687c62... - for some reason a tonne more queries!!?!!?

neclimdul’s picture

Those extra queries seems to be tied up in the cache tags logic and the increase in calculateChecksum calls.

CacheTagsChecksumTrait::isValid
-> CacheTagsChecksumTrait::calculateChecksum
-> DatabaseCacheTagsChecksum::getTagInvalidationCounts

getTagInvalidationCounts has an (uncached) query to the cachetags table. We can probably store the results of that query on the service and only calculate it once per call saving quite a few queries :-/

neclimdul’s picture

Taking a second look calculateChecksum looks like it already tries to do this so its not that easy :(

berdir’s picture

The query differences are strange, but ther's only one extra cache tags query, and yeah, the cache tags storage is pretty well optimized. One cache tags query is changed, has fewer arguments and a new one is added, so apparently it results in checking two cache tags in a different query.

But the whole thing is a difference of 0.7%, and the query execution time is just a tiny bit different. As you said, that's a *very* different picture from what you see in the browser tests.

No idea whats going on but something is very strange :)

alexpott’s picture

So here's a comparison of the node GET following the POST - https://blackfire.io/profiles/compare/af5b69f8-1367-410b-bd29-5a6e4ca735... the thing that's interesting is the two new inserts into the dynamic page cache.

berdir’s picture

Which was the main point of the shortcut issue, to enable dynamic page cache for users with shortcut access. So that migh be a bit slower but follow-up request should be *way* faster then. And the two inserts are probably due to cache redirects.

wim leers’s picture

StatusFileSize
new1007.44 KB

(Sorry for taking more than a week to respond here 😅)

#14: woah!


#15:

The same request without bigpipe enabled has about 300 calls to \Drupal\Core\Asset\LibraryDiscovery::getLibraryByName() so bigpipe adds nearly 2000 calls to that method.

😯

This suggests there is likely somewhere we could static cache

Indeed, that's also what I thought immediately. But we'll need to carefully investigate whether this is truly correct in all scenarios.


#18:

b) Improve \Drupal\big_pipe\Render\BigPipe::sendNoJsPlaceholders(). Honestly, I'm wondering if that optimization of displaying assets early is really worth it (that's what it is about, if I understood correctly).

That's not what it's about. It's about assets associated with that particular placeholder's markup (i.e. CSS/JS that that markup needs). Those assets can only be known by rendering the render array into markup, and so they simply will not exist on the main page (unless something else on the page happens to require the same assets of course, but we cannot assume that).

Maybe that splitting business is actually slowing down processing in the browser as well because some optimization breaks?

If by "splitting" you mean sending chunks of HTML to the browser: all browsers are optimized to process chunks of HTML as they arrive in TCP/IP packets, because A) high latency, B) lossy connections may cause a lot of time to pass between (in-order) packets, and if the browser were to wait until all packets in response to a request for HTML were to arrive, then browsing the web would be noticeably more painful. So this is not the BigPipe doing something edge casey; it's relying on a browser optimization that all browsers have had for decades out of sheer necessity. There is a tiny probability that the browser used in nightwatch testing regressed in this, but if that truly were the case then we'd need to see the test duration go up for all Drupal core versions.


#19:

One thing that is super interesting (at least to me) is that this cost is incurred even if you don't have big_pipe installed.

That is frankly a relief, if true :)

I suspect that all administration pages with the shortcuts + toolbar are now more expensive even they have been visited before.

That feels more plausible.


  • The chart in the issue summary clearly shows that Drupal 8.9.x & 9.0.x have a regression in saving an article.
  • There are correlations/investigations from #2914110: Shortcut hook_toolbar implementation makes all pages uncacheable to cache tags, asset libraries and BigPipe.
  • Cache tags
    • Saving an article causes cache tags to be invalidated. But cache tag invalidation merely causes counters to be incremented. The invalidation itself cannot be causing this. Only the consequences of checking whether a cache item has been invalidated (which happens on every cache read, but with smart static caching) could cause a regression; either in the cache tags invalidation check query or in the rebuilding of caches. As @Berdir points out: that query (and storage) has been very much optimized already, and there's only a 0.7% delta (#25).
    • As a consequence of #2914110: Shortcut hook_toolbar implementation makes all pages uncacheable there are new writes to the Dynamic Page Cache cache bin (#26 + #27).

    This all looks sane.

  • #9 had a hunch about asset libraries, #10 dug deeper. #12 confirmed the problem in #10. If I look at the diff between the 8.8.5 and 8.9 data dumps in #10, I spot some things (for example, tour/tour) that are 100% certainly in the Toolbar (and hence correlated to #2914110: Shortcut hook_toolbar implementation makes all pages uncacheable), but also a lot that are not (for example, ckeditor/drupal.ckeditor.plugins.drupalimagecaption). It almost seems as if the same placeholder is being rendered twice.

    So I dug deeper.

    I'm now confident that @catch's analysis in #15 is accurate. BigPipe builds as much as possible on existing abstraction layers. So it renders placeholders (which correspond to render arrays) into a HTML response, and hence executes the whole HtmlAttachmentsResponseProcessor logic. That computes the final values for the styles, scripts and scripts_bottom placeholders for a particular HTML response. But whatever gets computed for scripts_bottom gets dropped, because:

     *   - So, for every no-JS BigPipe placeholder, we send its associated CSS and
     *     header JS that has not already been sent (the bottom JS is not yet sent,
     *     so we can accumulate all of it and send it together at the end). This
     *     ensures that the markup is rendered as it was originally intended: its
     *     CSS and JS used to be blocking, and it still is. Let's call it an
     *     Embedded HTML response. Each of those Embedded HTML Responses must also
     *     take into account the cumulative AJAX page state of the HTML document and
     *     all preceding Embedded HTML responses.
    

    For (BigPipe) placeholders, usually scripts_bottom contains only drupalSettings which contains ajaxPageState, which explains all those additional ::getLibraryByName() calls.

    And in fact, all that work is unnecessary. Because BigPipe specifically only requests the styles and scripts placeholders to be replaced in the embedded HTML response:

    … but \Drupal\Core\Asset\AssetResolver::getJsAssets((AttachedAssetsInterface $assets, $optimize) always computes both scripts and scripts_bottom. On top of that, it always computes $settings and hence always invokes hook_js_settings_alter().

    If we were to make those two aspects smarter, that would significantly reduce the number of ::getLibraryByName() calls.

    But … like @Berdir pointed out in #18, even though there is a big number of additional function calls, they are cheap function calls and don't explain a ~6% total load time performance regression (in issue summary) or even ~15% performance regression (in #21).


This makes me suspect the regression is not in more asset library lookups/BigPipe's placeholder rendering, but rather in the top-level response rendering logic. #22 seems to point in the direction of more discovery and more config reads.


Looking at the various profiles posted here:

  • #9-first-link: not showing a perf regression — explanation for extra asset library work provided above
  • #9-second-link: 30% performance decrease supposedly, but this is completely attributable to highly volatile file I/O wait times (+40% for only a tiny difference in function calls). Also 34% increase in DB wait times, despite only 2 extra queries — pretty much all queries are slower. It looks like the "after" test here was performed on a system under high load.
  • #9-third-link: 53% performance decrease supposedly, but similar problems: only a single extra class being loaded is causing a 148% I/O wait time increase and 13% increase in DB wait time despite the same query count. Again, system under high load would explain this.
  • #10: link is dead
  • #22: unchanged performance but indeed 8 additional queries. 4 for cache_discovery, 2 for cache_config, 1 for cache_render and 1 for cachetags. The latter two make sense. The first six do not. This is what I'd like to understand better. My only hunch for now is that moving the call to theme_get_setting('third_party_settings.shortcut.module_link') in shortcut_preprocess_page_title() to an earlier place may be causing this.
neclimdul’s picture

StatusFileSize
new10.93 KB
new11.2 KB

Nice!

On this last point, I agree with focusing on those queries. That feels like the smoke from the gun... just haven't figured out which gun... or something. However, the only thing that changed was moving theme_get_setting('third_party_settings.shortcut.module_link') before !\Drupal::request()->attributes->has('exception'). I wouldn't think that attribute is set in this case or any case we're supper concerned about performance so I'm not convinced that would make a difference.

What I was trying to figure out and didn't quite prove or quantify in #23 was that we where breaking an optimization in

CacheTagsChecksumTrait::isValid
-> CacheTagsChecksumTrait::calculateChecksum
-> DatabaseCacheTagsChecksum::getTagInvalidationCounts

because there is an increase in calls to that chain that seems to correlates to the increase in queries. Logically since that's the only place we query the cachetags table. The static caching in calculateChecksums only prevents calls for tags it knows about so there is a worse case situation where we decrease the efficiency of this where previously because a large tag group could "warm" the cache. Improving the local efficiency of that render ends up decreasing the overall caching efficiency by spreading the population over more smaller calls to the table.

To test this I added a line to dump the tags being queried out of the database into a file and went through the steps in the summary. We can see something like that happening but also some other interesting things. Here's a simplified version of the differences for one of the requests and you can see the files in the attachment.

on /node/1 after submit with the shortcut patch reverted we see.

array (
  0 => 'config:shortcut.set.default',
  1 => 'config:system.menu.admin',
  2 => 'config:user.role.administrator',
  3 => 'config:user.role.authenticated',
)
array (
  1 => 'shortcut_values',
)
array (
  1 => 'config:block.block.bartik_local_tasks',
  4 => 'local_task',
)

With the patch we see.

array (
  0 => 'config:system.menu.admin',
)
array (
  1 => 'config:block.block.bartik_local_tasks',
  2 => 'config:user.role.administrator',
  3 => 'config:user.role.authenticated',
  4 => 'local_task',
)
array (
  4 => 'config:block.block.bartik_content',
  11 => 'config:block.block.bartik_page_title',
  15 => 'config:block_list',
)
array (
  0 => 'config:shortcut.set.default',
)

It looks like my gut was sort of wrong because with I think a little bit of luck and the fact that some tags probably consistently group together we see something similar to what I was expecting with the config:user.role.administrator and config:user.role.administrator tags dropping of a lookup but they land on another existing lookup so no net change.

This did expose a noticeable change though with the expected net increase of calls. We loose the shortcut_values lookup and replace it with a config:shortcut.set.default and then we also see what _looks_ like the page title block turning cacheable and making an additional hit to the tag table. That tracks with the intent of the shortcut patch so that's validating but also why does it hurt performance?

I'm not sure but my hunch is that the title block normally renders without database hits or we're swapping a database hit for a render cache lookup. Or maybe both, it previously used already loaded data and now its hitting the render cache which is is why we see another hit to cache_render.

More research to be done but hope that helps.

berdir’s picture

> I'm not sure but my hunch is that the title block normally renders without database hits or we're swapping a database hit for a render cache lookup. Or maybe both, it previously used already loaded data and now its hitting the render cache which is is why we see another hit to cache_render.

Yeah, that's touching a long-standing render cache issue, that some parts are simply not *worth* caching, because loading them from cache is as slow or even slower than just rendering them. That's not the same as being uncacheable. We're still happy to cache it one layer up, in DPC or so, but it's just not worth caching on its own. Related to #2935804: Add cache context exclusion list to RenderCache::set(), that one is about deciding that it's not worth to cache e.g. local tasks because it varies by path and/or by user or something. Could add a Cache::UNWORTHY = -2 or something to flag that.

Also, what we save on the shortcut table vs cache lookups might not be worth much with the database backend, but it's different with redis/memcache.

And still, a handful of extra queries don't explain the kind of regression that we're seeing in those graphs and we also don't see that with profiling. Wim also confirmed in Slack that he agrees that the data that we have so far doesn't explain such a massive regression. It's possible that it makes a bigger difference on that AWS server, especially if the database is remote or so or badly optimized, but still, seems far higher than we can explain so far.

IMHO, it might be worth more to focus on the 8.8 regressions first, like the entity info key value lookups and try to improve that.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

andypost’s picture

Title: Investigate performance regression in Drupal 8.9.x and 9.x » Investigate performance regression in Drupal 8.9.x and 9.x and 10.x

Looks it's better title now

stephencamilo’s picture

Status: Active » Closed (won't fix)
andypost’s picture

Status: Closed (won't fix) » Active

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

steinmb’s picture

Issue tags: +Performance

Do we have any updated performance tests that also show number from our current 10.x?

joseph.olstad’s picture

It was mentioned by @alexpott in comment #14
#3131589-14: [meta] Investigate performance regression in Drupal 8.9.x and 9.x and 10.x
that the following issue began the spike in queries and reduced performance
#2914110: Shortcut hook_toolbar implementation makes all pages uncacheable

was there any action done to mitigate this or resolve it?

catch’s picture

@joseph.olstad no if you read the comments following #14, while it was possible to find extra database queries due to extra cache tags/cache gets, those were an expected result of the change that was made (which should have other positive side effects, just not in that specific scenario), and they don't account for the timing regression, which no-one can reproduce outside of the original test case where it was seen. #2935804: Add cache context exclusion list to RenderCache::set() is the issue for fixing those extra cache gets/sets by allowing the render cache to skip certain items without them being uncacheable.

catch’s picture

#3414398: AssetResolver::getJsAssets() and AssetResolver::getCssAssets() can repeatedly try to calculate the same set of assets fixes part of the issue discussed in #28 - optimizes ::getJsAssets() to avoid running the same logic when the same arguments are passed in. This doesn't only save some js settings alter logic but also removes some cache gets for the same thing too.

catch’s picture

#3414398: AssetResolver::getJsAssets() and AssetResolver::getCssAssets() can repeatedly try to calculate the same set of assets is in!

I realised while working on it that we can improve things a bit more there, so opened #3443108: Optimize AssetResolver caching

#3436146: Introduce a list of "frequent cache tags" to reduce lookup query amount is also open now to try to reduce queries from cache tag lookups further.

I think with these, we can probably close this issue since everything has either been documented in separate issues now, or turned out not to be reproducible?

catch’s picture

Title: Investigate performance regression in Drupal 8.9.x and 9.x and 10.x » [meta] Investigate performance regression in Drupal 8.9.x and 9.x and 10.x
Category: Task » Plan
Status: Active » Reviewed & tested by the community

Changing status to reflect the above.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

I think with these, we can probably close this issue since everything has either been documented in separate issues now, or turned out not to be reproducible?

Yep I agree.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.