Problem/Motivation

There was a report on reddit about a significant drop in page requests per second on 11.1 It seems to be specific to cached pages. Initial investigation pointed to a combination of #2909185: Replace ContainerAwareEventDispatcher with Symfony EventDispatcher and a conversion to OOP hooks #3442009: OOP hooks using attributes and event dispatcher. However, cached pages will be addressed by #3538294: Regression: All stack middlewares are constructed at the same time even for cached pages in almost all cases when there is no event dispatcher dependent middleware firing before page cache. On the other hand, profiling the admin permissions page has shown reverting to the in house written event dispatcher results in a 10% speedup.

Steps to reproduce

Work in progress:
Using ddev
ddev get fullfatthings/ddev-spx
ddev restart
Checkout 11.x
ddev drush si
Create article content
Enable caching
ddev spx on
Visit https://drupal.ddev.site/?SPX_UI_URI=/
Check enable for current browser session
Add user function count
Refresh logged out version of node/1 twice
Refresh SPX UI URI page
select request

ddev spx off
ab -c1 -n500 https://drupal.ddev.site/node/1
ab -c50 -n5000 https://drupal.ddev.site/node/1
ab -c100 -n50000 https://drupal.ddev.site/node/1

Checkout branch remove event from hooks
Checkout branch reverted symfony event dispatcher

Use xhprof or https://github.com/fullfatthings/ddev-spx
Profile current 11.x
resolveServicesAndParameters > 1100 calls
1:500 Requests per second: 15.98 [#/sec] (mean)
50:5000 Requests per second: 409.91 [#/sec] (mean)
100:50000 Requests per second: 366.23 [#/sec] (mean)

permissions page (refreshed once) wall time 478ms 9.01Mb 326k total calls

Profile 11.x with hook event dispatcher removed in #3506930: Separate hooks from events.
resolveServicesAndParameters > 274
1:500 Requests per second: 17.32 [#/sec] (mean)
50:5000 Requests per second: 488.49 [#/sec] (mean)
100:50000 Requests per second: 358.26 [#/sec] (mean)

permissions page (refreshed once) wall time 460ms 8.46Mb 315k total calls

Profile 11.x with event dispatcher issue reverted
resolveServicesAndParameters > 31
1:500 Requests per second: 18.06 [#/sec] (mean)
50:5000 Requests per second: 546.62 [#/sec] (mean)
100:50000 Requests per second: 393.44 [#/sec] (mean)

permissions page (refreshed once) wall time 422ms 8.59Mb 325k total calls

Proposed resolution

Reintroduce the container aware dispatcher but with a much better tie in: instead of maintaining a service pass, use RegisterEventSubscribersPass to convert the method calls added by Symfony RegisterListenersPass. Also compared to the version which got removed last year, the test was a copy of a Symfony 3.x test, upgrade it to a copy of the Symfony 7.3 test.

Remaining tasks

User interface changes

N/A

Introduced terminology

API changes

Data model changes

Release notes snippet

Issue fork drupal-3538212

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

nicxvan created an issue. See original summary.

nicxvan’s picture

Issue summary: View changes
nicxvan’s picture

Issue summary: View changes
nicxvan’s picture

Issue summary: View changes

Ok I got spx running, I think we might need to determine what exactly to benchmark and how to report it.

For now I'm confirming some information from the reddit thread, for example resolveServicesAndParameters is called 1100 times on 11.x with the steps in the IS.

nicxvan’s picture

Issue summary: View changes
nicxvan’s picture

Took a quick pass at reverting #2909185: Replace ContainerAwareEventDispatcher with Symfony EventDispatcher to try benchmarking and it didn't work, I'll take another look unless someone gets to it before me.

nicxvan’s picture

Issue summary: View changes
catch’s picture

It looks from the reddit thread that the resolveServicesAndParameters calls take about 2-3ms - the problem with this on page cache hits is that a page cache hit only takes 2-3ms at most, so it's doubling the request time.

(edit: I'm seeing a lot more than this locally, more like 10-20ms, but some of that might be xhprof overhead).

Also 11.1 only has core hooks converted, once you have a site with 2-300 modules, all of which have OOP hooks, that 2ms could turn into 5-10ms or more, which will be noticeable on dynamic page cache hits too (dynamic page cache hits are usually more like 60-70ms, so increasing those by 10-20%).

One issue I'm seeing profiling is this is running for discovered procedural non-hooks still - uploading some dump output from having a look around. And because those go from collection to service closure, it looks like it results in two calls each.

catch’s picture

StatusFileSize
new3.53 KB

Here's a quick go at a revert, it also doesn't work. I get "Unable to find the controller for path "/". The route is wrongly configured." after a drush cr.

catch’s picture

StatusFileSize
new104.32 KB

To describe the basic issue:

We add hooks to the event dispatcher via the 'calls' parameter to the event dispatcher, which calls ::addListener(), this happens every time event dispatcher is instantiated, e.g. on every single request.

So even though we don't instantiate the individual hook services, we iterate over every hook implementation in core to create a service closure. This involves many recursive calls to ::resolveServicesAndParameters().

ContainerAwareEventDispatcher on the other hand only creates the service definition lazily - e.g. when a specific event is dispatched.

Even though the original reason we added ContainerAwareEventDispatcher is no longer an issue (e.g. services are not actually constructed for every listener), the sheer number of event listeners in Drupal core now after OOP hooks means that even creating the service closures on every request is causing a measurable performance regression.

catch’s picture

Here's a before/after xhprof screenshot with HEAD vs. #3506930: Separate hooks from events.

xhprof screenshot with 5800 function calls and 53ms

xhprof screenshot with 3700 function calls and 24ms

Note that profiling differs whether you have dev dependencies installed or not, at least partly due to OpenTelemetry SDK, I think this was with dev dependencies installed, but the reduction is about the same even if the totals are different.

So #3506930: Separate hooks from events is removing about 2/5ths of the function calls and about half the wall time.

If #2909185: Replace ContainerAwareEventDispatcher with Symfony EventDispatcher is going to be non-trivial to revert, then we could potentially profile 10.3.x just before/after that commit to see what the impact on cached pages is there. That would give us an idea whether it really is worth reverting or not.

One thing I noticed during debugging that a lot of event listeners we have in core (that aren't hooks) are core event listeners for core events, we could also reduce that list by converting some core events to hooks 😇.

longwave’s picture

nicxvan’s picture

Thank you!

A couple of quick notes in passing.

I got past those errors locally but didn't save them cause it still felt wrong i had to clear cache a dozen times.

I only tested 11.x and the remove events from hooks.
11.1 will be far worse because the skip flag isn't set and preprocess is now picked up.

11.1 might not be able to backport the event extraction for hooks so we might have to backport the event change.

nicxvan’s picture

I've been discussing this in slack with @ghostofdrupalpast since he alerted me too the issue.

#1972300-37: Write a more scalable dispatcher I think shows the dispatcher will be problematic on any sites with many events.

I think this is more evidence that the root cause to tackle is the dispatcher change linked in the issue summary.

Reverting that should resolve the issue.

longwave’s picture

This MR combines the calls to addListener() which removes ~670 method calls locally, but it doesn't really make that much difference.

There is nothing stopping us having multiple event dispatchers; I am wondering if we should put hooks on their own, separate dispatcher.

longwave’s picture

MR!12850 is promising locally. Testing on the Umami homepage, warm cache:

11.x:

Wall Time 11,910 µs
CPU Time 9,726 µs
Memory Usage 3,249,088 bytes
Peak Memory Usage 4,462,840 bytes

670 calls to EventDispatcher::addListener()

MR!12850:

Wall Time 8,083 µs
CPU Time 6,179 µs
Memory Usage 2,082,648 bytes
Peak Memory Usage 2,481,720 bytes

151 calls to EventDispatcher::addListener()

edit: just saw #3506930: Separate hooks from events, is this what that is trying to do, but a different/simpler implementation?

nicxvan’s picture

Separate hooks from events was more about the signature being problematic.

However, I discussed that issue with someone last week who wants to remain anonymous and they convinced me that the signature mismatch is not problem from a practical standpoint.

But for performance reasons shown here it is crucial.

The problem with that issue and the MRs here is they are both just treating the symptom not the cause.

Symfonies event dispatcher just doesn't scale to hundreds of events.

ContainerAwareEventDispatcher on the other hand scales to thousands.

The comment I linked in 15 shows that the changes here or separating hooks from events only kicks the can down the road a bit.

Larger and more complex sites with hundreds of event listeners will still hit a wall.

The issue removing it clearly calls or the performance concerns which were not tested.

I think the solution is a revert.

donquixote’s picture

The page that you are testing, are any hooks called on a warm cache?
And is ModuleHandler instantiated at all in these requests?

If ::addListener() calls are what slows down these requests, maybe we could go even further and have a separate event dispatcher with only those events that are commonly invoked in requests with warm cache?

longwave’s picture

It's the Umami homepage as anonymous, so on a warm cache no hooks should be invoked, hence the drop in calls to addListener. As I understand it from the Reddit thread and the issue summary it's cached pages that are the problem here?

As soon as a hook is invoked I think this will make no difference, because instantiating the hook event dispatcher goes back to being expensive.

longwave’s picture

Status: Active » Needs review

longwave changed the visibility of the branch 3538212-combine-calls to hidden.

longwave changed the visibility of the branch 3538212-investigate-reports-of to hidden.

nicxvan’s picture

Yeah that's why I think switching this in practice doesn't help us.

Either we revert to container aware or we pull hooks out of event dispatcher (or both).

If we don't revert we should go upstream and try to get a fix.

I think the right thing to do is both.

longwave’s picture

The problem is when can we make these changes. The bigger the change the more likely we have to wait for 11.3 but given this is a regression maybe we want a stopgap fix in 11.2?

ghost of drupal past made their first commit to this issue’s fork.

catch’s picture

I'm not too worried about an 11.2 backport, this is a regression for cached pages, but it's a regression from very fast to still quite fast. It'll be barely measurable for requests that don't hit the page cache (partly why we haven't noticed it until now) and it's all CPU rather than i/o. If we end up with something backportable that's great but more interested in the longer term solutions.

Either we revert to container aware or we pull hooks out of event dispatcher (or both).

So I think if we revert to container aware event dispatcher, then we should have almost zero overhead from having hooks in event dispatcher, but we could still move them out for other reasons.

If we move hooks out first, then the regression from not using container aware will be whatever it was in 10.3 when the initial issue to not use it was committed, which seems relatively small but also worth fixing. While I might have predicted the future I also wasn't thinking about cached pages in that comment.

The problem with that issue and the MRs here is they are both just treating the symptom not the cause.

This is true but also I think if we know it doesn't scale to thousands of events and we already have a plan to stop putting thousands of events in it, we should definitely do that.

ghost of drupal past’s picture

There. I reintroduced the container aware dispatcher but with much better tie-in via a new ContainerAwareEventDispatcherDefinition which YamlFileLoader uses. In the original version a new pass was needed instead of Symfony's. No longer. The very definition of the service captures the addMethodCall calls from the Symfony RegisterListenersPass and changes them into the argument structure required by ContainerAwareEventDispatcher. This allows keeping all the functionality of RegisterListenersPass including multiple event dispatchers with very little overhead. Tests could be resurrected of course and/or tests could be written for ContainerAwareEventDispatcherDefinition but since all of core flows through it, I haven't felt the need.

The performance regression was introduced in D10.3, this needs to be backported widely.

nicxvan’s picture

I will try to do some performance testing against both branches.

I'll try to find some more intense pages to test too so it's hitting hooks.

I'll be honest, I think while @longwave's approach is much simpler I think it only helps in one very specific case while @ghostofdrupalpast's seems to be attacking the root cause.

It still needs validation of course.

Also while it's more complex, it is more likely able to be cleanly backported whereas putting hooks in their own event dispatcher can only possibly help 11.1+.

nicxvan’s picture

Issue summary: View changes
Status: Needs review » Active
nicxvan’s picture

Status: Active » Needs review

I didn't mean to delete that file or change status, it was an old tab.

nicxvan’s picture

Issue summary: View changes
nicxvan’s picture

Issue summary: View changes

I also am adding a permissions page view logged in. I cleared cache then refreshed twice for each.

Still kind of preliminary, but I think there is a leader.

longwave’s picture

If we inject the container and read from it at runtime I think this means we will never be able to make event listener services private by default. I am not sure if this is truly important, although @alexpott and I have discussed it as a long term aim in the past.

ghost of drupal past’s picture

Note the changes do not invalidate the performance work, they are cosmetic / build time. But now the code doesn't look like something the cat barfed up. That ladder of ifs was hard to comprehend, this is linear.

znerol’s picture

ghost of drupal past’s picture

That's even better but a performance regression is present since 10.3 for any site with a large number of listeners regardless of cached pages. It's possible this issue needs to be retitled for that and the issue znerol linked is the urgent page cache performance regression which is independent after all.

znerol’s picture

a performance regression is present since 10.3 for any site with a large number of listeners regardless of cached pages.

The performance regression was reported for 11.1 and is clearly measurable on 11.1 on cached responses. Is there any evidence that 10.3 suffered a performance regression on cached pages of the same scale?

ghost of drupal past’s picture

Not on cached pages. The performance regression exists on every page if you have enough listeners.

znerol’s picture

Not on cached pages. The performance regression exists on every page if you have enough listeners.

Then please submit your PR as a separate issue.

nicxvan’s picture

Then please submit your PR as a separate issue.

I'm not sure what you mean by this, the root cause was the change in 10.3 linked in the issue summary but was only exposed in 11.1 due to the oop conversion adding so many listeners.

Whether this gets backported is a separate discussion, but ghost already submitted a branch that I did some preliminary testing on.

ghost of drupal past’s picture

@znerol yes, it's possible this issue should be retitled to "revert the event dispatcher" and the issue title from here needs to move to yours. Sorry, we didn't realize the event dispatcher shouldn't even be there in the first place. However, fixing middlewares to be constructed on demand only avoids constructing the event dispatcher in core, the moment anyone adds a middleware firing before page cache which has the event dispatcher as an argument will be hit by the issue here. I believe both issues need to be fixed and quite likely backported widely.

needs-review-queue-bot’s picture

Status: Needs review » Needs work
StatusFileSize
new4.07 KB

The Needs Review Queue Bot tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

catch’s picture

Title: Investigate reports of performance drop in 11.1 » Event dispatcher scaling issues with hundreds of event listeners

Trying a re-title.

ghost of drupal past’s picture

Based on the performance benchmark in the issue summary there is a 10% hit on the admin performance page from the Symfony event dispatcher just with standard without hooks. So I am not sure whether "hundreds" are even needed.

nicxvan’s picture

Priority: Normal » Major
Status: Needs work » Needs review

This is at least major i think.

ghost of drupal past’s picture

Title: Event dispatcher scaling issues with hundreds of event listeners » Reintroduce the container aware dispatcher for a 10% speedup
Issue summary: View changes

ghost of drupal past changed the visibility of the branch 3538212-separate-dispatcher to hidden.

ghost of drupal past’s picture

Issue summary: View changes
godotislate’s picture

A few small comments, mostly about typing.

needs-review-queue-bot’s picture

Status: Needs review » Needs work
StatusFileSize
new4.83 KB

The Needs Review Queue Bot tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

nicxvan’s picture

Status: Needs work » Needs review
ghost of drupal past’s picture

Issue summary: View changes
ghost of drupal past’s picture

Issue summary: View changes
needs-review-queue-bot’s picture

Status: Needs review » Needs work
StatusFileSize
new91 bytes

The Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

nicxvan’s picture

Status: Needs work » Needs review
Issue tags: +no-needs-review-bot

I think the bot is confused.

nicxvan changed the visibility of the branch 11.x to hidden.

nicxvan’s picture

Rebased this, I'll give it another review in a bit

nicxvan’s picture

Status: Needs review » Needs work

There are a couple of comments on the MR.

nicxvan’s picture

berdir’s picture

I can't imagine still seeing a regression now between #3538294: Regression: All stack middlewares are constructed at the same time even for cached pages and #3506930: Separate hooks from events. If anything, thanks to the first issue on page cache scenario, it should actually be considerably faster than 11.2. You should also no longer see a difference in this scenario between 11.x and this MR because event dispatcher is no longer initialized. You would need a different scenario, where event subscriber is invoked, for example a dynamic page cache hit, with a considerable amount of regular events. So I have my doubts this is still a release priority.

catch’s picture

Yeah I think we can untag it.

nicxvan’s picture

I think that makes sense.

This could do with a fresh benchmark at some point though.

berdir’s picture

Some early numbers from our project, comparing 11.2 against 11.x, as well as #3551605: Injecting ConfigFactory into BanIpManager is expensive, use a closure as that's necessary to get the full benefit here.

It's quite a difference. As usual, times vary a lot per request, but here's what I'm seeing:

Time: -6.71 ms (-53%)
12.8 ms → 6.05 ms

I/O Wait: -159 µs (-12%)
1.31 ms → 1.15 ms

Peak Memory: -7.51 MB (-76%)
9.84 MB → 2.34 MB

Network-68 kB (-35%)
194 kB → 126 kB

This is with redis, but with ban enabled, so it opens the database question and runs one query

This was on ddev, which unfortunately doesn't create php.access.log files by default which contain time and memory usage, so I'll see if I can either enable this or run it somewhere where I can that information without that.

It was also withou_ optimized autoloader. Using that I can push the absolute times even lower

andypost’s picture

Looking at numbers it looks bad idea to reintroduce and maintain custom dispatcher

ghost of drupal past’s picture

@andypost @berdir do I misunderstand #66 because it seems the speedup is substantial?

catch’s picture

I think #66 is 11.2.x vs 11.x, not vs this issue.

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.

nicxvan’s picture

Just trying to revive this again, I suspect the 10% number isn't as dramatic now that we don't use EventDispatcher for hooks, but this is probably still worth doing.

nicxvan’s picture

Duplicate

andypost’s picture

10% is viable so ++

longwave’s picture

We should re-measure the change here after #3583505: Use Symfony PhpDumper instead of a serialized array container structure as that will likely have an impact, and I have no idea in which direction it will be.