Problem/Motivation
@fgm discoverd that all stack middlewares including their dependencies are instantiated upon construction of the http_kernel
service. As a result even when a response is delivered from the cache, low priority middlewares like http_kernel.basic
, http_middleware.session
, http_middleware.kernel_pre_handle
are instantiated.
Preventing premature service initialization was one of the reasons for rewriting the EventDispatcher
(#1972300: Write a more scalable dispatcher). The stacked kernel suffers the same problem, though it is unclear at the moment how significant this issue is.
Delivery of cached pages is improved significantly by setting the lazy
flag on late middlewares such that those are only instantiated on cache miss.
Proposed resolution
@dawehner proposed to make middlewares lazy.
Remaining tasks
Profiling.
Review / Commit.
User interface changes
None.
API changes
Move late middlewares to a priority < 0 and mark them lazy, move the page cache middleware to priority 0.
Beta phase evaluation
Issue category | Task |
---|---|
Issue priority | Major |
Prioritized changes | The main goal of this issue is page cache performance |
Comment | File | Size | Author |
---|---|---|---|
#45 | ab runs.zip | 3.09 KB | Wim Leers |
#45 | histogram_interleaved.png | 5.46 KB | Wim Leers |
#45 | histogram_facet.png | 6.07 KB | Wim Leers |
#39 | all_stack_middlewares-2473113-39.patch | 6.28 KB | znerol |
#39 | interdiff.txt | 2.03 KB | znerol |
Comments
Comment #1
znerol CreditAttribution: znerol commentedAccording to my profiling we could potentially could shave off 1.5 ms for cached responses if we do not construct low priority middlewares. In order to restrict the profiling to http kernel construction exclusively, I hacked the necessary code directly into the application kernel:
Comment #2
Fabianx CreditAttribution: Fabianx for Acquia commentedI think it is worth a try to use:
lazy: true
for the low priority middleware's for now, the proxy should have only 2-3 function calls overhead per middleware, which should be acceptable.
Comment #3
Wim LeersHrm, you said "1.5 ms for cached responses". Making responses served by the Page Cache 1.5 ms faster would be *huge*.
But when you talked to me in person just now, it seemed you said that it doesn't seem to be worth it. So now I'm confused :)
Raising to major if the 1.5 ms win is correct.
Comment #4
Wim LeersTo clarify: saving 1.5 ms would be a 22% gain. If it's just 1 ms on my machine, then it'd be a 15% gain. Both would be great wins.
Comment #5
Fabianx CreditAttribution: Fabianx for Acquia commented#4: It is especially HUGE if you use a non-DB backend (e.g. pure APC), because you then save the 1.6 ms of PDO construction, too it seems ...
Comment #6
Wim LeersCorrect! That time would be saved even when only the page cache's cache back-end would not be in the DB.
Comment #7
dawehnerWell, for debugability I'd propose to mark the one before page cache not as lazy, as otherwise you just have one more level in all of the call stacks.
Comment #8
Wim Leers#7: Good point!
Comment #9
znerol CreditAttribution: znerol commentedxhprof-kit-page-cache.txt
drush en page_cache dummycache
xhprof-kit-no-page-cache.txt
drush pm-uninstall page_cache dummycache
I propose to move the page cache middleware to priority 0 and push low-priority ones to negative values. That way we'd have an easy indicator whether or not
lazy
should be present on the service definition.Comment #10
znerol CreditAttribution: znerol commentedComment #11
Fabianx CreditAttribution: Fabianx for Acquia commentedFor those not used to reading xhprof-kit reports. The SUM usually is a sum of 100 (so divide by 100) and znerol did put the patch first then the core, so we need to reverse the reading, so the above means we save:
10.9% function calls == 229 function calls
21.6% wt == 2 ms, on average
We only have 26 function calls more due to the proxying, which is 0.45 ms on average or 0.3% of 169 ms that a full bootstrap takes.
That both sounds acceptable.
CNW for moving the priorities.
I agree < 0 ==> lazy as guideline, >= 0 is not lazy.
Comment #12
BerdirOne downside of making them lazy I guess is that it makes the stacktrace a bit deeper. I think it adds at least one or two nested methods per middleware, so that's a bit unfortunate but probably acceptable.
Comment #13
dpovshed CreditAttribution: dpovshed commentedIn my patch I declared as lazy only these 2 services which have chance to be used after page cache.
Result of profiling is attached as well, we're getting 8% in average in speed. According to xhprof we have reduced number of class instantiation via Composer\Autoload\ClassLoader::loadClass exactly by 2, from 59 to 57
Comment #14
Wim Leers@znerol: please assign the issue to yourself next time. @dpovshed was doing the same work as you! (Only necessary during sprints.)
+1
"Only 26" and "only 0.45 ms"? That seems a lot?
That's why @dawehner requested to not make the ones before the page cache middleware lazy in #7.
Comment #15
znerol CreditAttribution: znerol commented@dpovshed: The difference between #9 and #13 is that the former also makes the
http_kernel.basic
lazy
. Did you left that out intentionally? If you look at the stacked kernel, that service is the innermost; it runs after the page cache.Reassigned priorities for page cache and late middlewares, interdiff is against #9.
Comment #18
jibranWe need tests for that as well.
Comment #19
Fabianx CreditAttribution: Fabianx for Acquia commentedShould we not use -50 for that one?
--
Yes, tests should check that any middleware marked with a priority of < 0 is lazy?
Comment #20
znerol CreditAttribution: znerol commentedNo, because the
http_kernel.basic
is not a middleware, it is always the innermost kernel.Comment #21
effulgentsia CreditAttribution: effulgentsia at Acquia commentedTentatively raising this to Critical, because it meets the >1ms savings for page cache threshold in #1744302: [meta] Resolve known performance regressions in Drupal 8. I don't know that it meets the would need to be deferred to 9.x threshold: I don't think we should change service priorities in patch-level releases, but I don't think we've had a discussion yet on whether we can in minor releases.
Comment #22
effulgentsia CreditAttribution: effulgentsia at Acquia commentedIt's a shame that making 3 services lazy adds that much overhead when internal page cache is not used. And if that's really what it is, it means that for sites using Varnish or other external proxies, overhead is added to cache misses with no compensating benefit on cache hits. And in http://wimleers.com/blog/drupal-8-page-caching-enabled-by-default, Wim mentions that even some cheap hosting plans offer Varnish, and that might increase during the lifetime of 8.x.
So, what if instead of adding
lazy: true
in the YAML files, we make page_cache module implement a compiler pass to add the lazy flag to all <0 priority middlewares? In other words, only add the proxy overhead when the module that benefits from it is enabled.Comment #23
Wim LeersI like that idea a lot. Also removes the burden of having to remember to set
lazy: true
in the first place. We should apply that strategy also for paramconverters etc.Comment #24
Wim LeersI like that idea a lot. Also removes the burden of having to remember to set
lazy: true
in the first place. We should apply that strategy also for paramconverters etc.Comment #25
znerol CreditAttribution: znerol commentedHow about just adding a new optional parameter to the
http_middleware
tag. A middleware which potentially short-circuits subsequent middlewares or the kernel just could declare that by adding the tag. If it is present, the compiler pass would just add the lazy flag to all middlewares with a lower priority than the first one with the tag-parameter. This also would free us from introducing priorities with magic meaning etc.Having a hard time to come up with an appropriate name for the parameter though.
Comment #26
znerol CreditAttribution: znerol commentedI'd document the flag something like this:
If it can be expected that many or most requests will be handled directly by a middleware instead of being passed to the underlying kernel, then add
XXX: true
to thehttp_middleware
tag of the service definition. This prevents premature construction of services which are not needed when the request can be handled directly by the middleware.Comment #27
znerol CreditAttribution: znerol commentedSomething like this probably. No interdiff because this is an alternative approach.
Comment #28
znerol CreditAttribution: znerol commentedAdding a test.
Comment #29
effulgentsia CreditAttribution: effulgentsia at Acquia commentedJust brainstorming a name for the flag:
commonly_skips_delegation
? Reprioritizing back to Major if this approach can be successful without changing existing priorities, since that means this could be solved in a minor or patch release if necessary.Comment #30
Fabianx CreditAttribution: Fabianx commentedWhat about:
lazy_load_remaining: true
Or:
lazy_load_delegates: true
Or:
lazy_load_successors: true
Or:
make_delegates_lazy: true
etc.
Comment #31
cpj CreditAttribution: cpj commented+1 for
lazy_load_delegates: true
Comment #32
znerol CreditAttribution: znerol commentedPerhaps we could come up with a term which defines the behavior.
forwarder
(TRUE
by default), specifyFALSE
if it can be expected that many or most requests will be handled directly by the middleware instead of being passed to the underlying kernel.[first_|early_|quick_]responder
(FALSE
by default), specifyTRUE
if it can be expected that many or most requests will be handled directly by the middleware instead of being passed to the underlying kernel.Comment #33
fgmI think a term in the line suggested by znerol will be more explicit because it implies a purpose, not just a technical behavior.
In this line of thinking, from a DX PoV, terms in the
(early|first|quick)_responder
vein seem more appropriate, because an active behavior (intercepting the request) is associated with a boolean TRUE value, whereas a term likeforwarder
associates the active behavior with a boolean FALSE, which is not consistent cognitively.And to complete selection,
first
is an absolute, which may be wrong since several middlewares may have this behavior and only one of them can be first, whilequick
introduces an element of performance relative to the middleware itself, which is not necessarily true.In that light,
early_responder
seems the best choice./me exits bikeshedding mode
Comment #34
znerol CreditAttribution: znerol commentedWhat about just
responder
? If that is too cryptic, mayresponse_generator
?Comment #35
dawehnerI'm fine with responder, its at least clear what it does.
It would be though great in some docs to explain what a responder does different compared to other middlewares ... and doesn't that mean we should mark the main http kernel as responder?
Otherwise this looks pretty nice!
Comment #36
dawehnerI'm fine with responder, its at least clear what it does.
It would be though great in some docs to explain what a responder does different compared to other middlewares ... and doesn't that mean we should mark the main http kernel as responder?
Otherwise this looks pretty nice!
Comment #37
znerol CreditAttribution: znerol commentedThe main http kernel is not a middleware. It does not have the
http_middleware
tag.Where do we document service tags?
Comment #38
dawehnerUps, you are right.
One please where we could place that documentation could be the top of
\Drupal\Core\DependencyInjection\Compiler\StackedKernelPass
but #2264047: [meta] Document service definition tags seems to suggest maybe a different form.
@znerol
I guess we don't need tests anymore, right?
Comment #39
znerol CreditAttribution: znerol commentedAdds documentation.
Comment #40
Fabianx CreditAttribution: Fabianx commentedRTBC! Docs looks great!
Comment #41
tstoecklerThe variable is called
$first_responder
, even though the tag is justresponder
. Would be great if we can quick-fix that, thanks.Comment #42
znerol CreditAttribution: znerol commentedRe: #41 it is possible to add the tag to more than one middleware. The
lazy
flag is added to all middleware definitions beyond the first one having theresponder
tag.Comment #43
Fabianx CreditAttribution: Fabianx commentedYes, $first_responder is right as we explicitly search for the first responder in the chain.
Comment #44
Wim LeersThis looks awesome!
Is this number still accurate? If nobody else does, I'll profile it.
Comment #45
Wim LeersBenchmarked, 1000 requests (
$ ab -n 1000 -c 1 -g ab.tsv http://tres/
).Comment #46
alexpottCommitted 9166525 and pushed to 8.0.x. Thanks!
Thanks for adding the beta evaluation to the issue summary.
Comment #48
tstoecklerAhh yes, thanks for the clarification!