Problem/Motivation

@fgm discovered that all stack middlewares, including their dependencies, are instantiated upon construction of the http_kernel service. As a result, even when a response is delivered from the cache, low-priority middlewares like http_kernel.basic, http_middleware.session and http_middleware.kernel_pre_handle are instantiated.

Preventing premature service initialization was one of the reasons for rewriting the EventDispatcher (#1972300: Write a more scalable dispatcher). The stacked kernel suffers the same problem, though it is unclear at the moment how significant this issue is.

Delivery of cached pages improves significantly when the lazy flag is set on late middlewares, such that they are only instantiated on a cache miss.

Proposed resolution

@dawehner proposed to make middlewares lazy.

Remaining tasks

Profiling.
Review / Commit.

User interface changes

None.

API changes

Move late middlewares to a priority < 0 and mark them lazy; move the page cache middleware to priority 0.
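As a rough sketch of what this would look like in core.services.yml (the priorities shown and the exact set of affected services are illustrative, to be settled by the patch):

```yaml
http_middleware.page_cache:
  class: Drupal\page_cache\StackMiddleware\PageCache
  tags:
    - { name: http_middleware, priority: 0 }

http_middleware.session:
  class: Drupal\Core\StackMiddleware\Session
  lazy: true
  tags:
    - { name: http_middleware, priority: -50 }
```

The negative priority then doubles as the indicator that the service must carry the lazy flag.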

Beta phase evaluation

Reference: https://www.drupal.org/core/beta-changes
Issue category: Task
Issue priority: Major
Prioritized changes: The main goal of this issue is page cache performance

Comments

znerol’s picture

According to my profiling, we could potentially shave off 1.5 ms for cached responses if we do not construct low-priority middlewares. In order to restrict the profiling exclusively to http_kernel construction, I hacked the necessary code directly into the application kernel:

diff --git a/core/lib/Drupal/Core/DrupalKernel.php b/core/lib/Drupal/Core/DrupalKernel.php
index 8592402..83db918 100644
--- a/core/lib/Drupal/Core/DrupalKernel.php
+++ b/core/lib/Drupal/Core/DrupalKernel.php
@@ -576,7 +576,16 @@ public function terminate(Request $request, Response $response) {
    */
   public function handle(Request $request, $type = self::MASTER_REQUEST, $catch = TRUE) {
     $this->boot();
-    return $this->getHttpKernel()->handle($request, $type, $catch);
+    xhprof_enable(XHPROF_FLAGS_CPU + XHPROF_FLAGS_MEMORY);
+    $this->getHttpKernel();
+    $xhprof_data = xhprof_disable();
+    require_once "/home/lo/src/xhprof/xhprof_lib/utils/xhprof_lib.php";
+    require_once "/home/lo/src/xhprof/xhprof_lib/utils/xhprof_runs.php";
+
+    $xhprof_runs = new \XHProfRuns_Default();
+    $run_id = $xhprof_runs->save_run($xhprof_data, "d8");
+    error_log("http://localhost:3001/index.php?run=$run_id&source=d8");
+    return new Response('hi!');
   }
 
   /**
Fabianx’s picture

I think it is worth a try to use:

lazy: true

for the low-priority middlewares for now; the proxy should add only 2-3 function calls of overhead per middleware, which should be acceptable.

Wim Leers’s picture

Priority: Normal » Major
Issue tags: +D8 Accelerate Dev Days

Hrm, you said "1.5 ms for cached responses". Making responses served by the Page Cache 1.5 ms faster would be *huge*.

But when you talked to me in person just now, it seemed you said that it doesn't seem to be worth it. So now I'm confused :)

Raising to major if the 1.5 ms win is correct.

Wim Leers’s picture

To clarify: saving 1.5 ms would be a 22% gain. If it's just 1 ms on my machine, then it'd be a 15% gain. Both would be great wins.

Fabianx’s picture

#4: It is especially HUGE if you use a non-DB backend (e.g. pure APC), because then you also save the 1.6 ms of PDO construction, it seems ...

Wim Leers’s picture

Correct! That time would be saved even if only the page cache's cache back-end were not in the DB.

dawehner’s picture

for the low-priority middlewares for now; the proxy should add only 2-3 function calls of overhead per middleware, which should be acceptable.

Well, for debuggability I'd propose not to mark the ones before the page cache as lazy, as otherwise you just get one more level in all of the call stacks.

Wim Leers’s picture

#7: Good point!

znerol’s picture

xhprof-kit-page-cache.txt

drush en page_cache dummycache

=== SUM: 2473113-lazy-middlewares-summary..8_0_x-summary compared (5531036063c19..55310360e761b):

ct  : 210,600|233,500|22,900|10.9%
wt  : 952,642|1,158,435|205,793|21.6%
mu  : 163,615,192|179,015,280|15,400,088|9.4%
pmu : 164,050,488|179,433,200|15,382,712|9.4%

xhprof-kit-no-page-cache.txt

drush pm-uninstall page_cache dummycache

=== SUM: 2473113-lazy-middlewares-summary..8_0_x-summary compared (55310482292f4..553104874a837):

ct  : 5,351,500|5,348,900|-2,600|-0.0%
wt  : 16,953,651|16,999,313|45,662|0.3%
mu  : 1,847,605,784|1,844,377,152|-3,228,632|-0.2%
pmu : 1,864,027,280|1,861,036,320|-2,990,960|-0.2%

I propose to move the page cache middleware to priority 0 and push low-priority ones to negative values. That way we'd have an easy indicator whether or not lazy should be present on the service definition.

znerol’s picture

Status: Active » Needs review
Fabianx’s picture

Status: Needs review » Needs work

For those not used to reading xhprof-kit reports: the SUM is usually a sum of 100 runs (so divide by 100), and znerol put the patch first, then core, so we need to reverse the reading. The above means we save:

10.9% function calls == 229 function calls
21.6% wt == 2 ms, on average

We only have 26 function calls more due to the proxying, which is 0.45 ms on average or 0.3% of 169 ms that a full bootstrap takes.

That both sounds acceptable.

CNW for moving the priorities.

I agree with < 0 ==> lazy as a guideline; >= 0 is not lazy.

Berdir’s picture

One downside of making them lazy I guess is that it makes the stacktrace a bit deeper. I think it adds at least one or two nested methods per middleware, so that's a bit unfortunate but probably acceptable.

dpovshed’s picture

In my patch I declared as lazy only the two services which have a chance to be used after the page cache.

The result of the profiling is attached as well; we're getting an 8% average speed improvement. According to xhprof, we have reduced the number of class instantiations via Composer\Autoload\ClassLoader::loadClass by exactly 2, from 59 to 57.

Wim Leers’s picture

@znerol: please assign the issue to yourself next time. @dpovshed was doing the same work as you! (Only necessary during sprints.)


I propose to move the page cache middleware to priority 0 and push low-priority ones to negative values. That way we'd have an easy indicator whether or not lazy should be present on the service definition.

+1

We only have 26 function calls more due to the proxying, which is 0.45 ms on average or 0.3% of 169 ms that a full bootstrap takes.

"Only 26" and "only 0.45 ms"? That seems like a lot?

One downside of making them lazy I guess is that it makes the stacktrace a bit deeper. I think it adds at least one or two nested methods per middleware, so that's a bit unfortunate but probably acceptable.

That's why @dawehner requested to not make the ones before the page cache middleware lazy in #7.

znerol’s picture

Status: Needs work » Needs review
FileSize
1.16 KB
1.56 KB

@dpovshed: The difference between #9 and #13 is that the former also makes http_kernel.basic lazy. Did you leave that out intentionally? If you look at the stacked kernel, that service is the innermost; it runs after the page cache.

Reassigned priorities for page cache and late middlewares, interdiff is against #9.

Status: Needs review » Needs work

The last submitted patch, 15: 2473113-lazy-middlewares-14.diff, failed testing.

Status: Needs work » Needs review
jibran’s picture

Issue tags: +Needs tests

We need tests for that as well.

Fabianx’s picture

Status: Needs review » Needs work
+++ b/core/core.services.yml
@@ -567,6 +567,7 @@ services:
   http_kernel.basic:
     class: Symfony\Component\HttpKernel\HttpKernel
     arguments: ['@event_dispatcher', '@controller_resolver', '@request_stack']
+    lazy: true

Should we not use -50 for that one?

--

Yes, tests should check that any middleware with a priority < 0 is lazy.

znerol’s picture

Issue summary: View changes
Status: Needs work » Needs review
FileSize
2.91 KB
1.35 KB

Should we not use -50 for that one?

No, because http_kernel.basic is not a middleware; it is always the innermost kernel.

effulgentsia’s picture

Priority: Major » Critical

Tentatively raising this to Critical, because it meets the ">1 ms savings for page cache" threshold in #1744302: [meta] Resolve known performance regressions in Drupal 8. I don't know that it meets the "would need to be deferred to 9.x" threshold: I don't think we should change service priorities in patch-level releases, but I don't think we've had a discussion yet on whether we can in minor releases.

effulgentsia’s picture

We only have 26 function calls more due to the proxying, which is 0.45 ms on average or 0.3% of 169 ms that a full bootstrap takes.

It's a shame that making 3 services lazy adds that much overhead when internal page cache is not used. And if that's really what it is, it means that for sites using Varnish or other external proxies, overhead is added to cache misses with no compensating benefit on cache hits. And in http://wimleers.com/blog/drupal-8-page-caching-enabled-by-default, Wim mentions that even some cheap hosting plans offer Varnish, and that might increase during the lifetime of 8.x.

So, what if instead of adding lazy: true in the YAML files, we make page_cache module implement a compiler pass to add the lazy flag to all <0 priority middlewares? In other words, only add the proxy overhead when the module that benefits from it is enabled.
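A minimal sketch of that compiler-pass idea, with plain PHP arrays standing in for Symfony service definitions (a real implementation would iterate the ContainerBuilder's tagged services instead; the function name is made up for this sketch):

```php
<?php

/**
 * Marks every http_middleware service with a negative priority as lazy.
 *
 * Illustrative only: definitions are plain arrays here, not Symfony
 * Definition objects.
 */
function mark_low_priority_middlewares_lazy(array $definitions): array {
  foreach ($definitions as $id => &$definition) {
    $priority = $definition['tags']['http_middleware']['priority'] ?? NULL;
    if ($priority !== NULL && $priority < 0) {
      // Only middlewares below the page cache's priority become proxies.
      $definition['lazy'] = TRUE;
    }
  }
  unset($definition);
  return $definitions;
}

$services = mark_low_priority_middlewares_lazy([
  'http_middleware.page_cache' => ['tags' => ['http_middleware' => ['priority' => 0]]],
  'http_middleware.session' => ['tags' => ['http_middleware' => ['priority' => -50]]],
]);
// The session middleware is now lazy; the page cache middleware is not.
```

Hooking this into page_cache module would mean the proxy overhead only exists when the module that benefits from it is installed.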

Wim Leers’s picture

So, what if instead of adding lazy: true in the YAML files, we make page_cache module implement a compiler pass to add the lazy flag to all <0 priority middlewares? In other words, only add the proxy overhead when the module that benefits from it is enabled.

I like that idea a lot. Also removes the burden of having to remember to set lazy: true in the first place. We should apply that strategy also for paramconverters etc.

znerol’s picture

How about just adding a new optional parameter to the http_middleware tag? A middleware which potentially short-circuits subsequent middlewares or the kernel could declare that by setting the parameter on its tag. If it is present, the compiler pass would add the lazy flag to all middlewares with a lower priority than the first one carrying the parameter. This would also free us from introducing priorities with magic meanings.

Having a hard time coming up with an appropriate name for the parameter, though.

znerol’s picture

I'd document the flag something like this:

If it can be expected that many or most requests will be handled directly by a middleware instead of being passed to the underlying kernel, then add XXX: true to the http_middleware tag of the service definition. This prevents premature construction of services which are not needed when the request can be handled directly by the middleware.

znerol’s picture

Something like this probably. No interdiff because this is an alternative approach.

znerol’s picture

Adding a test.

effulgentsia’s picture

Priority: Critical » Major

Just brainstorming a name for the flag: commonly_skips_delegation? Reprioritizing back to Major if this approach can be successful without changing existing priorities, since that means this could be solved in a minor or patch release if necessary.

Fabianx’s picture

What about:

lazy_load_remaining: true

Or:

lazy_load_delegates: true

Or:

lazy_load_successors: true

Or:

make_delegates_lazy: true

etc.

cpj’s picture

+1 for lazy_load_delegates: true

znerol’s picture

Perhaps we could come up with a term which defines the behavior.

forwarder (TRUE by default), specify FALSE if it can be expected that many or most requests will be handled directly by the middleware instead of being passed to the underlying kernel.

[first_|early_|quick_]responder (FALSE by default), specify TRUE if it can be expected that many or most requests will be handled directly by the middleware instead of being passed to the underlying kernel.

fgm’s picture

I think a term along the lines suggested by znerol would be more explicit, because it implies a purpose, not just a technical behavior.

In this line of thinking, from a DX PoV, terms in the (early|first|quick)_responder vein seem more appropriate, because an active behavior (intercepting the request) is associated with a boolean TRUE value, whereas a term like forwarder associates the active behavior with a boolean FALSE, which is not consistent cognitively.

And to complete the selection: first is an absolute, which may be wrong since several middlewares may have this behavior and only one of them can be first, while quick introduces an element of performance relative to the middleware itself, which is not necessarily true.

In that light, early_responder seems the best choice.

/me exits bikeshedding mode

znerol’s picture

What about just responder? If that is too cryptic, maybe response_generator?

dawehner’s picture

I'm fine with responder, its at least clear what it does.
It would be though great in some docs to explain what a responder does different compared to other middlewares ... and doesn't that mean we should mark the main http kernel as responder?

Otherwise this looks pretty nice!
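To make this concrete, the tag parameter would sit on the short-circuiting middleware's definition, along these lines (the page cache service is shown; the priority value is illustrative and arguments are omitted):

```yaml
http_middleware.page_cache:
  class: Drupal\page_cache\StackMiddleware\PageCache
  tags:
    - { name: http_middleware, priority: 200, responder: true }
```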

znerol’s picture

Status: Needs review » Needs work

The main http kernel is not a middleware. It does not have the http_middleware tag.

Where do we document service tags?

dawehner’s picture

Oops, you are right.

One place where we could put that documentation is the top of \Drupal\Core\DependencyInjection\Compiler\StackedKernelPass,
but #2264047: [meta] Document service definition tags seems to suggest maybe a different form.

@znerol
I guess we don't need tests anymore, right?

znerol’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests
FileSize
2.03 KB
6.28 KB

Adds documentation.

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

RTBC! Docs look great!

tstoeckler’s picture

Status: Reviewed & tested by the community » Needs review

The variable is called $first_responder, even though the tag is just responder. It would be great if we could quick-fix that, thanks.

znerol’s picture

Re #41: it is possible to add the tag to more than one middleware. The lazy flag is added to all middleware definitions beyond the first one that has the responder tag.

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

Yes, $first_responder is right as we explicitly search for the first responder in the chain.
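For readers following along, the logic that ended up in \Drupal\Core\DependencyInjection\Compiler\StackedKernelPass can be sketched roughly like this (plain arrays stand in for Symfony service definitions; the priorities and the helper name are illustrative):

```php
<?php

/**
 * Marks every middleware below the first responder in the stack as lazy.
 *
 * Illustrative sketch only: the real compiler pass operates on Symfony
 * Definition objects, not arrays.
 */
function mark_lazy_after_first_responder(array $middlewares): array {
  // Order by priority, highest (outermost) first.
  uasort($middlewares, function (array $a, array $b) {
    return $b['priority'] <=> $a['priority'];
  });
  $first_responder = FALSE;
  foreach ($middlewares as $id => &$definition) {
    if ($first_responder) {
      // Everything after the first responder only runs on e.g. a cache miss.
      $definition['lazy'] = TRUE;
    }
    if (!empty($definition['responder'])) {
      $first_responder = TRUE;
    }
  }
  unset($definition);
  return $middlewares;
}

$chain = mark_lazy_after_first_responder([
  'http_middleware.session' => ['priority' => 50],
  'http_middleware.page_cache' => ['priority' => 200, 'responder' => TRUE],
  'http_middleware.kernel_pre_handle' => ['priority' => 100],
]);
// Only the middlewares below the page cache in the stack are now lazy.
```

Because the search is for the first responder in the chain, additional responders further down simply stay lazy themselves.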

Wim Leers’s picture

This looks awesome!

According to my profiling, we could potentially shave off 1.5 ms for cached responses if we do not construct low-priority middlewares.

Is this number still accurate? If nobody else does, I'll profile it.

Wim Leers’s picture

Issue summary: View changes
FileSize
6.07 KB
5.46 KB
3.09 KB

Benchmarked, 1000 requests ($ ab -n 1000 -c 1 -g ab.tsv http://tres/).

Before
Requests per second:    146.47 [#/sec] (mean)
Time per request:       6.827 [ms] (mean)
Time per request:       6.827 [ms] (mean, across all concurrent requests)
Transfer rate:          1731.07 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     5    7   0.4      7      10
Waiting:        5    6   0.4      6       9
Total:          5    7   0.4      7      10
After
Requests per second:    157.93 [#/sec] (mean)
Time per request:       6.332 [ms] (mean)
Time per request:       6.332 [ms] (mean, across all concurrent requests)
Transfer rate:          1866.52 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     5    6   0.4      6       9
Waiting:        5    6   0.4      6       8
Total:          5    6   0.4      6       9
Conclusion
Median decreased by 1 ms
0.5 ms/request faster on average
Max is 1 ms lower

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed 9166525 and pushed to 8.0.x. Thanks!

Thanks for adding the beta evaluation to the issue summary.

  • alexpott committed 9166525 on 8.0.x
    Issue #2473113 by znerol, dpovshed, Wim Leers, Fabianx, dawehner,...
tstoeckler’s picture

Ahh yes, thanks for the clarification!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.