Problem/Motivation

@fgm discovered that all stack middlewares, including their dependencies, are instantiated upon construction of the http_kernel service. As a result, even when a response is delivered from the cache, low-priority middlewares like http_kernel.basic, http_middleware.session and http_middleware.kernel_pre_handle are instantiated.

Preventing premature service initialization was one of the reasons for rewriting the EventDispatcher (#1972300: Write a more scalable dispatcher). The stacked kernel suffers the same problem, though it is unclear at the moment how significant this issue is.

Delivery of cached pages improves significantly when the lazy flag is set on late middlewares, such that they are only instantiated on a cache miss.

Proposed resolution

@dawehner proposed to make middlewares lazy.

Remaining tasks

Profiling.
Review / Commit.

User interface changes

None.

API changes

Move late middlewares to a priority < 0 and mark them lazy; move the page cache middleware to priority 0.
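As a rough sketch of what this would look like in core.services.yml (the priorities shown and the exact set of affected services are illustrative, to be settled by the patch):

```yaml
http_middleware.page_cache:
  class: Drupal\page_cache\StackMiddleware\PageCache
  tags:
    - { name: http_middleware, priority: 0 }

http_middleware.session:
  class: Drupal\Core\StackMiddleware\Session
  lazy: true
  tags:
    - { name: http_middleware, priority: -50 }
```

The negative priority then doubles as the indicator that the service must carry the lazy flag.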

Beta phase evaluation

Reference: https://www.drupal.org/core/beta-changes
Issue category: Task
Issue priority: Major
Prioritized changes: The main goal of this issue is page cache performance

Comments

znerol’s picture

According to my profiling, we could potentially shave off 1.5 ms for cached responses if we do not construct low-priority middlewares. In order to restrict the profiling exclusively to http_kernel construction, I hacked the necessary code directly into the application kernel:

diff --git a/core/lib/Drupal/Core/DrupalKernel.php b/core/lib/Drupal/Core/DrupalKernel.php
index 8592402..83db918 100644
--- a/core/lib/Drupal/Core/DrupalKernel.php
+++ b/core/lib/Drupal/Core/DrupalKernel.php
@@ -576,7 +576,16 @@ public function terminate(Request $request, Response $response) {
    */
   public function handle(Request $request, $type = self::MASTER_REQUEST, $catch = TRUE) {
     $this->boot();
-    return $this->getHttpKernel()->handle($request, $type, $catch);
+    xhprof_enable(XHPROF_FLAGS_CPU + XHPROF_FLAGS_MEMORY);
+    $this->getHttpKernel();
+    $xhprof_data = xhprof_disable();
+    require_once "/home/lo/src/xhprof/xhprof_lib/utils/xhprof_lib.php";
+    require_once "/home/lo/src/xhprof/xhprof_lib/utils/xhprof_runs.php";
+
+    $xhprof_runs = new \XHProfRuns_Default();
+    $run_id = $xhprof_runs->save_run($xhprof_data, "d8");
+    error_log("http://localhost:3001/index.php?run=$run_id&source=d8");
+    return new Response('hi!');
   }
 
   /**
Fabianx’s picture

I think it is worth a try to use:

lazy: true

for the low-priority middlewares for now; the proxy should add only 2-3 function calls of overhead per middleware, which should be acceptable.

Wim Leers’s picture

Priority: Normal » Major
Issue tags: +D8 Accelerate Dev Days

Hrm, you said "1.5 ms for cached responses". Making responses served by the Page Cache 1.5 ms faster would be *huge*.

But when you talked to me in person just now, it seemed you said that it doesn't seem to be worth it. So now I'm confused :)

Raising to major if the 1.5 ms win is correct.

Wim Leers’s picture

To clarify: saving 1.5 ms would be a 22% gain. If it's just 1 ms on my machine, then it'd be a 15% gain. Both would be great wins.

Fabianx’s picture

#4: It is especially HUGE if you use a non-DB backend (e.g. pure APC), because then you also save the 1.6 ms of PDO construction, it seems ...

Wim Leers’s picture

Correct! That time would be saved even if only the page cache's cache back-end were not in the DB.

dawehner’s picture

for the low-priority middlewares for now; the proxy should add only 2-3 function calls of overhead per middleware, which should be acceptable.

Well, for debuggability I'd propose not to mark the ones before the page cache as lazy, as otherwise you just get one more level in all of the call stacks.

Wim Leers’s picture

#7: Good point!

znerol’s picture

xhprof-kit-page-cache.txt

drush en page_cache dummycache

=== SUM: 2473113-lazy-middlewares-summary..8_0_x-summary compared (5531036063c19..55310360e761b):

ct  : 210,600|233,500|22,900|10.9%
wt  : 952,642|1,158,435|205,793|21.6%
mu  : 163,615,192|179,015,280|15,400,088|9.4%
pmu : 164,050,488|179,433,200|15,382,712|9.4%

xhprof-kit-no-page-cache.txt

drush pm-uninstall page_cache dummycache

=== SUM: 2473113-lazy-middlewares-summary..8_0_x-summary compared (55310482292f4..553104874a837):

ct  : 5,351,500|5,348,900|-2,600|-0.0%
wt  : 16,953,651|16,999,313|45,662|0.3%
mu  : 1,847,605,784|1,844,377,152|-3,228,632|-0.2%
pmu : 1,864,027,280|1,861,036,320|-2,990,960|-0.2%

I propose to move the page cache middleware to priority 0 and push low-priority ones to negative values. That way we'd have an easy indicator whether or not lazy should be present on the service definition.

znerol’s picture

Status: Active » Needs review
Fabianx’s picture

Status: Needs review » Needs work

For those not used to reading xhprof-kit reports: the SUM is usually a sum of 100 runs (so divide by 100), and znerol put the patch first, then core, so we need to reverse the reading. The above means we save:

10.9% function calls == 229 function calls
21.6% wt == 2 ms, on average

We only have 26 function calls more due to the proxying, which is 0.45 ms on average or 0.3% of 169 ms that a full bootstrap takes.

That both sounds acceptable.

CNW for moving the priorities.

I agree with < 0 ==> lazy as a guideline; >= 0 is not lazy.

Berdir’s picture

One downside of making them lazy I guess is that it makes the stacktrace a bit deeper. I think it adds at least one or two nested methods per middleware, so that's a bit unfortunate but probably acceptable.

dpovshed’s picture

In my patch I declared as lazy only the two services which have a chance to be used after the page cache.

The result of the profiling is attached as well; we're getting an 8% average speed improvement. According to xhprof, we have reduced the number of class instantiations via Composer\Autoload\ClassLoader::loadClass by exactly 2, from 59 to 57.

Wim Leers’s picture

@znerol: please assign the issue to yourself next time. @dpovshed was doing the same work as you! (Only necessary during sprints.)


I propose to move the page cache middleware to priority 0 and push low-priority ones to negative values. That way we'd have an easy indicator whether or not lazy should be present on the service definition.

+1

We only have 26 function calls more due to the proxying, which is 0.45 ms on average or 0.3% of 169 ms that a full bootstrap takes.

"Only 26" and "only 0.45 ms"? That seems like a lot?

One downside of making them lazy I guess is that it makes the stacktrace a bit deeper. I think it adds at least one or two nested methods per middleware, so that's a bit unfortunate but probably acceptable.

That's why @dawehner requested to not make the ones before the page cache middleware lazy in #7.

znerol’s picture

Status: Needs work » Needs review
FileSize
1.16 KB
1.56 KB

@dpovshed: The difference between #9 and #13 is that the former also makes http_kernel.basic lazy. Did you leave that out intentionally? If you look at the stacked kernel, that service is the innermost; it runs after the page cache.

Reassigned priorities for page cache and late middlewares, interdiff is against #9.

Status: Needs review » Needs work

The last submitted patch, 15: 2473113-lazy-middlewares-14.diff, failed testing.

Status: Needs work » Needs review
jibran’s picture

Issue tags: +Needs tests

We need tests for that as well.

Fabianx’s picture

Status: Needs review » Needs work
+++ b/core/core.services.yml
@@ -567,6 +567,7 @@ services:
   http_kernel.basic:
     class: Symfony\Component\HttpKernel\HttpKernel
     arguments: ['@event_dispatcher', '@controller_resolver', '@request_stack']
+    lazy: true

Should we not use -50 for that one?

--

Yes, tests should check that any middleware with a priority < 0 is lazy.

znerol’s picture

Issue summary: View changes
Status: Needs work » Needs review
FileSize
2.91 KB
1.35 KB

Should we not use -50 for that one?

No, because http_kernel.basic is not a middleware; it is always the innermost kernel.

effulgentsia’s picture

Priority: Major » Critical

Tentatively raising this to Critical, because it meets the ">1 ms savings for page cache" threshold in #1744302: [meta] Resolve known performance regressions in Drupal 8. I don't know that it meets the "would need to be deferred to 9.x" threshold: I don't think we should change service priorities in patch-level releases, but I don't think we've had a discussion yet on whether we can in minor releases.

effulgentsia’s picture

We only have 26 function calls more due to the proxying, which is 0.45 ms on average or 0.3% of 169 ms that a full bootstrap takes.

It's a shame that making 3 services lazy adds that much overhead when internal page cache is not used. And if that's really what it is, it means that for sites using Varnish or other external proxies, overhead is added to cache misses with no compensating benefit on cache hits. And in http://wimleers.com/blog/drupal-8-page-caching-enabled-by-default, Wim mentions that even some cheap hosting plans offer Varnish, and that might increase during the lifetime of 8.x.

So, what if instead of adding lazy: true in the YAML files, we make page_cache module implement a compiler pass to add the lazy flag to all <0 priority middlewares? In other words, only add the proxy overhead when the module that benefits from it is enabled.
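A minimal sketch of that compiler-pass idea, with plain PHP arrays standing in for Symfony service definitions (a real implementation would iterate the ContainerBuilder's tagged services instead; the function name is made up for this sketch):

```php
<?php

/**
 * Marks every http_middleware service with a negative priority as lazy.
 *
 * Illustrative only: definitions are plain arrays here, not Symfony
 * Definition objects.
 */
function mark_low_priority_middlewares_lazy(array $definitions): array {
  foreach ($definitions as $id => &$definition) {
    $priority = $definition['tags']['http_middleware']['priority'] ?? NULL;
    if ($priority !== NULL && $priority < 0) {
      // Only middlewares below the page cache's priority become proxies.
      $definition['lazy'] = TRUE;
    }
  }
  unset($definition);
  return $definitions;
}

$services = mark_low_priority_middlewares_lazy([
  'http_middleware.page_cache' => ['tags' => ['http_middleware' => ['priority' => 0]]],
  'http_middleware.session' => ['tags' => ['http_middleware' => ['priority' => -50]]],
]);
// The session middleware is now lazy; the page cache middleware is not.
```

Hooking this into page_cache module would mean the proxy overhead only exists when the module that benefits from it is installed.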

Wim Leers’s picture

So, what if instead of adding lazy: true in the YAML files, we make page_cache module implement a compiler pass to add the lazy flag to all <0 priority middlewares? In other words, only add the proxy overhead when the module that benefits from it is enabled.

I like that idea a lot. Also removes the burden of having to remember to set lazy: true in the first place. We should apply that strategy also for paramconverters etc.

znerol’s picture

How about just adding a new optional parameter to the http_middleware tag? A middleware which potentially short-circuits subsequent middlewares or the kernel could declare that by setting the parameter on its tag. If it is present, the compiler pass would add the lazy flag to all middlewares with a lower priority than the first one carrying the parameter. This would also free us from introducing priorities with magic meanings.

Having a hard time coming up with an appropriate name for the parameter, though.

znerol’s picture

I'd document the flag something like this:

If it can be expected that many or most requests will be handled directly by a middleware instead of being passed to the underlying kernel, then add XXX: true to the http_middleware tag of the service definition. This prevents premature construction of services which are not needed when the request can be handled directly by the middleware.

znerol’s picture

Something like this probably. No interdiff because this is an alternative approach.

znerol’s picture

Adding a test.

effulgentsia’s picture

Priority: Critical » Major

Just brainstorming a name for the flag: commonly_skips_delegation? Reprioritizing back to Major if this approach can be successful without changing existing priorities, since that means this could be solved in a minor or patch release if necessary.

Fabianx’s picture

What about:

lazy_load_remaining: true

Or:

lazy_load_delegates: true

Or:

lazy_load_successors: true

Or:

make_delegates_lazy: true

etc.

cpj’s picture

+1 for lazy_load_delegates: true

znerol’s picture

Perhaps we could come up with a term which defines the behavior.

forwarder (TRUE by default), specify FALSE if it can be expected that many or most requests will be handled directly by the middleware instead of being passed to the underlying kernel.

[first_|early_|quick_]responder (FALSE by default), specify TRUE if it can be expected that many or most requests will be handled directly by the middleware instead of being passed to the underlying kernel.

fgm’s picture

I think a term along the lines suggested by znerol would be more explicit, because it implies a purpose, not just a technical behavior.

In this line of thinking, from a DX PoV, terms in the (early|first|quick)_responder vein seem more appropriate, because an active behavior (intercepting the request) is associated with a boolean TRUE value, whereas a term like forwarder associates the active behavior with a boolean FALSE, which is not consistent cognitively.

And to complete the selection: first is an absolute, which may be wrong since several middlewares may have this behavior and only one of them can be first, while quick introduces an element of performance relative to the middleware itself, which is not necessarily true.

In that light, early_responder seems the best choice.

/me exits bikeshedding mode

znerol’s picture

What about just responder? If that is too cryptic, maybe response_generator?

dawehner’s picture

I'm fine with responder, its at least clear what it does.
It would be though great in some docs to explain what a responder does different compared to other middlewares ... and doesn't that mean we should mark the main http kernel as responder?

Otherwise this looks pretty nice!
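To make this concrete, the tag parameter would sit on the short-circuiting middleware's definition, along these lines (the page cache service is shown; the priority value is illustrative and arguments are omitted):

```yaml
http_middleware.page_cache:
  class: Drupal\page_cache\StackMiddleware\PageCache
  tags:
    - { name: http_middleware, priority: 200, responder: true }
```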

znerol’s picture

Status: Needs review » Needs work

The main http kernel is not a middleware. It does not have the http_middleware tag.

Where do we document service tags?

dawehner’s picture

Oops, you are right.

One place where we could put that documentation is the top of \Drupal\Core\DependencyInjection\Compiler\StackedKernelPass,
but #2264047: [meta] Document service definition tags seems to suggest maybe a different form.

@znerol
I guess we don't need tests anymore, right?

znerol’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests
FileSize
2.03 KB
6.28 KB

Adds documentation.

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

RTBC! Docs look great!

tstoeckler’s picture

Status: Reviewed & tested by the community » Needs review

The variable is called $first_responder, even though the tag is just responder. It would be great if we could quick-fix that, thanks.

znerol’s picture

Re #41: it is possible to add the tag to more than one middleware. The lazy flag is added to all middleware definitions beyond the first one that has the responder tag.

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

Yes, $first_responder is right as we explicitly search for the first responder in the chain.
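For readers following along, the logic that ended up in \Drupal\Core\DependencyInjection\Compiler\StackedKernelPass can be sketched roughly like this (plain arrays stand in for Symfony service definitions; the priorities and the helper name are illustrative):

```php
<?php

/**
 * Marks every middleware below the first responder in the stack as lazy.
 *
 * Illustrative sketch only: the real compiler pass operates on Symfony
 * Definition objects, not arrays.
 */
function mark_lazy_after_first_responder(array $middlewares): array {
  // Order by priority, highest (outermost) first.
  uasort($middlewares, function (array $a, array $b) {
    return $b['priority'] <=> $a['priority'];
  });
  $first_responder = FALSE;
  foreach ($middlewares as $id => &$definition) {
    if ($first_responder) {
      // Everything after the first responder only runs on e.g. a cache miss.
      $definition['lazy'] = TRUE;
    }
    if (!empty($definition['responder'])) {
      $first_responder = TRUE;
    }
  }
  unset($definition);
  return $middlewares;
}

$chain = mark_lazy_after_first_responder([
  'http_middleware.session' => ['priority' => 50],
  'http_middleware.page_cache' => ['priority' => 200, 'responder' => TRUE],
  'http_middleware.kernel_pre_handle' => ['priority' => 100],
]);
// Only the middlewares below the page cache in the stack are now lazy.
```

Because the search is for the first responder in the chain, additional responders further down simply stay lazy themselves.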

Wim Leers’s picture

This looks awesome!

According to my profiling, we could potentially shave off 1.5 ms for cached responses if we do not construct low-priority middlewares.

Is this number still accurate? If nobody else does, I'll profile it.

Wim Leers’s picture

Issue summary: View changes
FileSize
6.07 KB
5.46 KB
3.09 KB

Benchmarked, 1000 requests ($ ab -n 1000 -c 1 -g ab.tsv http://tres/).

Before
Requests per second:    146.47 [#/sec] (mean)
Time per request:       6.827 [ms] (mean)
Time per request:       6.827 [ms] (mean, across all concurrent requests)
Transfer rate:          1731.07 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     5    7   0.4      7      10
Waiting:        5    6   0.4      6       9
Total:          5    7   0.4      7      10
After
Requests per second:    157.93 [#/sec] (mean)
Time per request:       6.332 [ms] (mean)
Time per request:       6.332 [ms] (mean, across all concurrent requests)
Transfer rate:          1866.52 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     5    6   0.4      6       9
Waiting:        5    6   0.4      6       8
Total:          5    6   0.4      6       9
Conclusion
Median decreased by 1 ms
0.5 ms/request faster on average
Max is 1 ms lower

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed 9166525 and pushed to 8.0.x. Thanks!

Thanks for adding the beta evaluation to the issue summary.

  • alexpott committed 9166525 on 8.0.x
    Issue #2473113 by znerol, dpovshed, Wim Leers, Fabianx, dawehner,...
tstoeckler’s picture

Ahh yes, thanks for the clarification!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.