Problem/Motivation

We are seeing on a number of live production sites that \Drupal\Core\Access\RouteProcessorCsrf::renderPlaceholderCsrfToken is a major bottleneck.

New relic screenshot

Proposed resolution

Profile
Wondering if this is made worse by admin_toolbar (https://www.drupal.org/project/admin_toolbar)

Remaining tasks

Decide what we can do

User interface changes

API changes

Data model changes

Comments

larowlan created an issue. See original summary.

wim leers’s picture

Hm.

Why is it very slow? It calls the CSRF generator service. That's it.

The only way we can make this faster is by making the CSRF generator service faster, AFAICT. Or the dependencies of that service, of course.

It's absolutely necessary to generate those CSRF tokens. You need them. There are only two ways you can avoid that cost:

  1. start doing per-session caching, which has as a consequence that you'll end up caching many, many variations of the same data, just with a different CSRF token -> different trade-off
  2. avoid rendering links that need CSRF tokens at all :)

P.S.: I'm surprised this shows up as such a big cost. Is it because the rest of the site is so very fast? Or is it because the CSRF token generator service (or its dependencies) is so slow?
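
To make the per-session trade-off concrete, here is a minimal sketch of how such a token can be derived. It is not Drupal's PHP implementation: it assumes the token is an HMAC-SHA256 of the link's path, keyed with a site-wide secret plus a per-session seed. The function name, key layout, and all values are placeholders for illustration.

```python
import base64
import hashlib
import hmac


def compute_token(site_secret: bytes, session_seed: bytes, value: str) -> str:
    """Return a URL-safe HMAC-SHA256 token for `value` (e.g. a route path)."""
    # Assumed key layout for this sketch: per-session seed + site-wide secret.
    key = session_seed + site_secret
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    # Base64-encode for use in a query string and drop the "=" padding.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


site_secret = b"site-wide-private-key"  # placeholder value
path = "admin/flush"                    # placeholder route path

token_a = compute_token(site_secret, b"seed-of-session-a", path)
token_b = compute_token(site_secret, b"seed-of-session-b", path)

# Same path, different sessions: the tokens differ, so any cached markup
# containing the token would have to vary per session.
print(token_a != token_b)
```

Because the seed differs per session, the same link renders differently for every session, which is why option 1 above means caching many variations of otherwise identical data.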

larowlan’s picture

Should point out this is happening on two distinct production sites.

Both are running admin_toolbar.

I'll try and debug some more.

wim leers’s picture

Cool, thanks :)

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

bmarti44’s picture

I'm seeing this on our large site as well. Did you ever find out what the cause was?

Version: 8.3.x-dev » 8.4.x-dev

blanca.esqueda’s picture

+1 Having the exact same issue.

jazzslider’s picture

I'm experiencing this as well. In my case, the New Relic transaction trace showed an 11.3s transaction running through this processor. That time apparently breaks down as follows:

* Drupal\Core\Extension\ExtensionDiscovery::scanDirectory was called twice, totalling 3.4s.
* Drupal\user\PermissionHandler::sortPermissions was called 111 times, totalling 1.1s.
* Drupal\Core\Extension\Discovery\RecursiveExtensionFilterIterator::getChildren was called 52 times, totalling 0.8s.
* And of course a bunch more things happened that took less time individually, but added up to quite a bit.

This was on the /admin/modules page, so it makes sense that ExtensionDiscovery was involved. I have other example transactions that show a completely different set of slowest functions, so …results unclear.

Version: 8.4.x-dev » 8.5.x-dev

Version: 8.5.x-dev » 8.6.x-dev

chris.smith’s picture

I'd like to confirm that renderPlaceholderCsrfToken can be a bottleneck in the context of an intranet.

When an authenticated user loads a page within an intranet, the CSRF generator service produces unique sha256 hashes for all routes on the page. Secure hashes are designed to be slow; the faster you can calculate a hash, the more viable it is to use brute force. The renderPlaceholderCsrfToken method has a high throughput and is slow by design; the perfect bottleneck.

dawehner’s picture

@chris.smith
It would be interesting to see the result when you swap it for a faster hash alternative. How much does it improve your entire intranet?
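
One hedged way to get a first number before touching core is a micro-benchmark of the hashing step on its own. The sketch below assumes a page with 200 token-protected links and compares HMAC over a few digests from Python's standard library. The key, the paths, and the digest list are placeholders chosen for measurement only; this is not a recommendation to switch algorithms.

```python
import hmac
import timeit

KEY = b"placeholder-secret"
# Hypothetical page with 200 links that each need a token.
PATHS = [f"flag/flag/{i}".encode("utf-8") for i in range(200)]


def tokens_for_page(digest_name: str) -> list:
    """Compute one HMAC token per link on the simulated page."""
    return [hmac.new(KEY, p, digest_name).hexdigest() for p in PATHS]


def time_digest(digest_name: str, repeats: int = 50) -> float:
    """Return the average seconds spent hashing one simulated page."""
    total = timeit.timeit(lambda: tokens_for_page(digest_name), number=repeats)
    return total / repeats


results = {name: time_digest(name) for name in ("sha256", "sha512", "sha1")}
for name, seconds in results.items():
    print(f"{name}: {seconds * 1e6:.1f} microseconds per page of {len(PATHS)} tokens")
```

If the per-page cost measured this way is tiny compared to the response times reported above, the hash function itself is unlikely to be the bottleneck, and the time is probably spent elsewhere in the request.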

chris.smith’s picture

@dawehner
Our initial thought was to remove the hashing rather than replace it. We generated a token, but left it unhashed. We saw the performance improve: with 200 concurrent authenticated users, renderPlaceholderCsrfToken ran nearly 50% faster and we saw significantly less CPU usage.

The bottleneck was not entirely removed until we disabled the RouteProcessorCsrf::processOutbound method. This method is responsible for adding the token to the query string. Once disabled, we had completely removed the CsrfToken process and performance gains were significant.

With CSRF enabled, we could scale our intranet to 250 concurrent authenticated users. With CSRF disabled and all else remaining the same, we can now scale our intranet to 500 concurrent users. The average response time at 250 concurrent users has dropped from 2.5 seconds to 0.5 seconds.

dawehner’s picture

Issue tags: +Needs security review

I always understood CSRF token protection as being secure primarily because of an unknown secret, not because of a particular secure hash function. Maybe we could use sha256 or something faster. It would be great if someone from the security team could give a response.
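
A small sketch of that point, under the assumption that the token is a keyed hash (HMAC) of the path: validation succeeds only for someone who knows the key, regardless of how expensive the digest is. The function names and values are hypothetical and not taken from core.

```python
import hashlib
import hmac


def make_token(secret: bytes, value: str) -> str:
    """Keyed hash of `value`; only holders of `secret` can reproduce it."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


def validate_token(secret: bytes, value: str, token: str) -> bool:
    """Recompute the token and compare in constant time."""
    expected = make_token(secret, value)
    return hmac.compare_digest(expected, token)


real_secret = b"known-only-to-the-server"  # placeholder value
path = "user/logout"                       # placeholder route path

good = make_token(real_secret, path)
forged = make_token(b"attacker-guess", path)  # attacker lacks the real secret

print(validate_token(real_secret, path, good))
print(validate_token(real_secret, path, forged))
```

The attacker can run the very same hash function, but without the key the forged token does not validate. Whether a faster digest would be acceptable is still a question for the security team.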

chris.smith’s picture

@dawehner
Agreed. Advice from the security team would be very helpful. Also, the computeToken function does use sha256 hashing.

handkerchief’s picture

We have the exact same problem.

This website is in development mode, so the caches for templates etc. are disabled. admin_toolbar is also installed.

[Screenshot: slow_website]

handkerchief’s picture

Priority: Normal » Major

We have checked this problem on multiple instances. We can confirm that this problem occurs on every website, whether caching is switched off or on.

On that particular website, the page load time is normal if the page (node) is already in the cache. If not, the page load time increases to between half a minute and one minute.

Drupal Core 8.4.4

handkerchief’s picture

What the hell can that be? It also happens as an anonymous user without admin_toolbar access.

[Screenshot: slow_website]

handkerchief’s picture

In my case, the problem was a misconfiguration in Redis, specifically the TTL setting. So it's all about caching.

dawehner’s picture

@handkerchief
Do you mind sharing a few more details about this?

handkerchief’s picture

Of course, what exactly do you want to know?

The cause is found but not fixed yet. Because of https://www.drupal.org/project/redis/issues/2944938

I think with that setting, my problem won't occur.

Ps: The correct statistics of the problem can be found from this comment onwards: https://www.drupal.org/project/drupal/issues/2941542#comment-12462942

berdir’s picture

Priority: Major » Normal
Status: Active » Postponed (maintainer needs more info)

I'm pretty sure these New Relic reports are bogus, because New Relic is supposed to report requests, not single functions. I think it just incorrectly reports some routes as that function, but that makes no sense.

If you see this, try to install https://www.drupal.org/project/new_relic_rpm, that will make sure that requests are reported as the route name and you will be able to see what it actually is.

> When an authenticated user loads a page within an intranet, the CSRF generator service produces unique sha256 hashes for all routes on the page.

Why would it do that? It should only generate a token for routes that specifically need a CSRF token, only a few do that. I guess there are cases when there can be a lot, like a long list of flag links or so, but that's still far from "all routes".
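
As a rough illustration of that rule, here is a sketch of an outbound route processor that only touches routes carrying a `_csrf_token` requirement. It models the idea only, not the actual RouteProcessorCsrf code; the `Route` class, the example paths, and `fake_token` are made up for the example.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class Route:
    path: str
    requirements: dict = field(default_factory=dict)


def process_outbound(route: Route, query: dict, make_token) -> dict:
    """Add a token to the query string only if the route asks for one."""
    if "_csrf_token" in route.requirements:
        query = {**query, "token": make_token(route.path)}
    return query


routes = [
    Route("node/1"),
    Route("admin/content"),
    Route("flag/flag/bookmark/1", {"_csrf_token": "TRUE"}),
]

calls = []


def fake_token(path: str) -> str:
    """Stand-in token generator that records which paths were hashed."""
    calls.append(path)
    return "TOKEN"


for route in routes:
    query = process_outbound(route, {}, fake_token)
    suffix = f"?{urlencode(query)}" if query else ""
    print(f"/{route.path}{suffix}")

print(f"token generator called {len(calls)} time(s)")
```

In this model the generator runs once for three links. A page would need many token-protected links (a long list of flag links, for instance) before the hashing alone could add up.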

I'd like to see some actual profiling here that confirms that it is a performance issue. This never showed up for me in profiling.

Anonymous’s picture

Same problem on a live website: very slow only in the production environment, identified with New Relic monitoring.

Solved by installing php-apcu (PHP 7.1), which seems necessary for Drupal 8.

Version: 8.6.x-dev » 8.7.x-dev

Version: 8.7.x-dev » 8.8.x-dev

Version: 8.8.x-dev » 8.9.x-dev

Version: 8.9.x-dev » 9.1.x-dev

Version: 9.1.x-dev » 9.2.x-dev

Version: 9.2.x-dev » 9.3.x-dev

Version: 9.3.x-dev » 9.4.x-dev

larowlan’s picture

Status: Postponed (maintainer needs more info) » Closed (duplicate)
Issue tags: +Bug Smash Initiative

Thanks for reporting this issue. We rely on issue reports like this one to resolve bugs and improve Drupal core.

As part of the Bug Smash Initiative, we are triaging issues that are marked "Postponed (maintainer needs more info)". This issue was marked "Postponed (maintainer needs more info)" 4 years ago.

However, since then we have identified the underlying issue as #2949457: Enhance Toolbar's subtree caching so that menu links with CSRF token do not need one subtree cache item per session, occurring in combination with the admin toolbar module.

As a result, this can be closed as a duplicate.
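
For readers landing here, the difference between one cache item per session and a single shared item can be sketched as follows. This only illustrates the general idea named in that issue's title, not the actual change made there; the placeholder marker, the cache key, and the helper names are all hypothetical.

```python
PLACEHOLDER = "CSRF_TOKEN_PLACEHOLDER"  # hypothetical marker string


def render_subtree(token: str) -> str:
    """Stand-in for the expensive rendering of a toolbar subtree."""
    return f'<a href="/admin/flush?token={token}">Flush caches</a>'


def token_for(session_id: str) -> str:
    """Stand-in for the per-session CSRF token generator."""
    return f"token-for-{session_id}"


sessions = [f"session-{i}" for i in range(100)]

# Strategy 1: cache the fully rendered subtree, one item per session.
per_session_cache = {}
for sid in sessions:
    per_session_cache[("toolbar_subtree", sid)] = render_subtree(token_for(sid))

# Strategy 2: cache the subtree once with a placeholder and substitute
# the session's token when the cached markup is served.
shared_cache = {}


def subtree_for(sid: str) -> str:
    if "toolbar_subtree" not in shared_cache:
        shared_cache["toolbar_subtree"] = render_subtree(PLACEHOLDER)
    return shared_cache["toolbar_subtree"].replace(PLACEHOLDER, token_for(sid))


pages = [subtree_for(sid) for sid in sessions]

print(len(per_session_cache), len(shared_cache))
```

Both strategies produce the same markup for a given session, but the first stores and rebuilds the subtree once per session while the second renders it once and only swaps in the token.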