Problem/Motivation
Seeing on a number of live production sites that \Drupal\Core\Access\RouteProcessorCsrf::renderPlaceholderCsrfToken is a major bottleneck.
New relic screenshot

Proposed resolution
Profile
Wondering if this is made worse by admin_toolbar (https://www.drupal.org/project/admin_toolbar)
Remaining tasks
Decide what we can do
User interface changes
API changes
Data model changes
| Comment | File | Size | Author |
|---|---|---|---|
| #20 | slow_website_3.png | 48.38 KB | handkerchief |
| #19 | slow_website_2.png | 32.52 KB | handkerchief |
| #17 | slow_website.png | 37.17 KB | handkerchief |
| Screenshot 2016-07-01 14.26.29.png | 44.67 KB | larowlan |
Comments
Comment #2
wim leersHm.
Why is it very slow? It calls the CSRF generator service. That's it.
The only way we can make this faster, is by making the CSRF generator service faster AFAICT. Or, the dependencies of that service of course.
It's absolutely necessary to generate those CSRF tokens. You need them. There are only two ways you can avoid that cost:
P.S.: I'm surprised this shows up as such a big cost. Is it because the rest of the site is so very fast? Or is it because the (dependencies of) the CSRF token generator service is so slow?
Comment #3
larowlanShould point out this is happening on two distinct production sites.
Both are running admin_toolbar.
I'll try and debug some more.
Comment #4
wim leersCool, thanks :)
Comment #6
bmarti44 commentedI'm seeing this on our large site as well, I'd love to know if you ever found out what the cause was?
Comment #8
blanca.esqueda commented+1 Having the exact same issue.
Comment #9
jazzslider commentedI'm experiencing this as well. In my case, the New Relic transaction trace showed an 11.3s transaction running through this processor. That time apparently breaks down as follows:
* Drupal\Core\Extension\ExtensionDiscovery::scanDirectory was called twice, totalling 3.4s
* Drupal\user\PermissionHandler::sortPermissions was called 111 times, totalling 1.1s.
* Drupal\Core\Extension\Discovery\RecursiveExtensionFilterIterator::getChildren was called 52 times, totalling 0.8s.
* And of course a bunch more things happened that took less time individually, but added up to quite a bit.
This was on the /admin/modules page, so it makes sense that ExtensionDiscovery was involved. I have other example transactions that show a completely different set of slowest functions, so …results unclear.
Comment #12
chris.smith commentedI'd like to confirm that renderPlaceholderCsrfToken can be a bottleneck in the context of an intranet.
When an authenticated user loads a page within an intranet, the CSRF generator service produces unique sha256 hashes for all routes on the page. Secure hashes are designed to be slow; the faster you can calculate a hash, the more viable it is to use brute force. The renderPlaceholderCsrfToken method has a high throughput and is slow by design; the perfect bottleneck.
Comment #13
dawehner@chris.smith
It would be interesting to see the result when you change it against a faster hash alternative. How much does it improve your entire intranet?
Comment #14
chris.smith commented@dawehnerff
Our initial thoughts were to remove the encryption rather than replace it. We generated a token, but unencrypted. We saw the performance improve; with 200 concurrent authenticated users, the renderPlaceholderCsrfToken ran nearly 50% faster and we saw significantly less CPU usage.
The bottleneck was not entirely removed until we disabled the RouteProcessorCsrf::processOutbound method. This method is responsible for adding the token to the query string. Once disabled, we had completely removed the CsrfToken process and performance gains were significant.
With CSRF, we could scale our intranet to 250 concurrent authenticated users. With the CSRF disabled and all else remaining the same, we can now scale our intranet to 500 concurrent users. Average response times at 250 concurrent users has dropped from 2.5 seconds to 0.5 seconds.
Comment #15
dawehnerI always understood CSRF token protection of being primarily secure by having an unknown secret, not by using a particular secure hash function. Maybe we could use sha256 or something faster. It would be great if someone from the security team could give a response.
Comment #16
chris.smith commented@dawehner
Agreed. Advice from the security team would be very helpful. Also, the computeToken function does use sha256 encryption.
Comment #17
handkerchiefWe have the exact same problem.
This website is in development mode. So the cache for templates etc. are disabled. admin_toolbar is also installed.
Comment #18
handkerchiefWe have checked this problem on multiple instances. We can confirm that this problem is on every website, wether caching is switched off or on.
On that particular website, the page loading is normal, if the page (node) is already in the cache. If not, the page load increases until half or one minute.
Drupal Core 8.4.4
Comment #19
handkerchiefWhat the hell can that be? It happen also as anonymous user without admin_toolbar access.
Comment #20
handkerchiefother debug infos:
https://www.drupal.org/project/drupal/issues/2941542
Comment #21
handkerchiefIn my case, a misconfiguration in redis was the problem. In fact the TTL setting. So it's all about caching.
Comment #22
dawehner@handkerchief
Do you mind sharing a few more details about this?
Comment #23
handkerchiefOf course, what exactly do you want to know?
The cause is found but not fixed yet. Because of https://www.drupal.org/project/redis/issues/2944938
I think with that setting, my problem won't occur.
Ps: The correct statistics of the problem can be found from this comment onwards: https://www.drupal.org/project/drupal/issues/2941542#comment-12462942
Comment #24
berdirI'm pretty sure these new relic reports are bogus because new relic is supposed to report requests, not single functions. I think it just incorrectly reports some routes as that function, but that makes no sense.
If you see this, try to install https://www.drupal.org/project/new_relic_rpm, that will make sure that requests are reported as the route name and you will be able to see what it actually is.
> When an authenticated user loads a page within an intranet, the CSRF generator service produces unique sha256 hashes for all routes on the page.
Why would it do that? It should only generate a token for routes that specifically need a CSRF token, only a few do that. I guess there are cases when there can be a lot, like a long list of flag links or so, but that's still far from "all routes".
I'd like to see some actual profiling here that confirms that it is a performance issue, this never showed up for my in profiling.
Comment #25
Anonymous (not verified) commentedSame problem on live website, very slow only in the production environment, identified with a New Relic Monitoring.
Solved by the install of php-apcu (PHP 7.1), which seems necessary for Drupal 8.
Comment #33
larowlanThanks for reporting this issue. We rely on issue reports like this one to resolve bugs and improve Drupal core.
As part of the Bug Smash Initiative, we are triaging issues that are marked "Postponed (maintainer needs more info)". This issue was marked "Postponed (maintainer needs more info)" 4 years ago
However since then we identified the issue was #2949457: Enhance Toolbar's subtree caching so that menu links with CSRF token do not need one subtree cache item per session in combination with admin toolbar module.
As a result this can be closed as a duplicate