Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Problem/Motivation
Route preloading helps to avoid DB queries, but it comes with the cost to unserialize more than actually needed.
Proposed resolution
Still load the preloaded routes, but just unserialize on the fly.
Remaining tasks
User interface changes
API changes
API addition - new interface + method
Beta phase evaluation
Issue category | Task because it's refactoring functions for performance |
---|---|
Issue priority | Major because it improves performance for non-cached pages. Not critical because it's a modest improvement and not a release blocker since the changes are non-breaking. |
Prioritized changes | The main goal of this issue is performance |
Comment | File | Size | Author |
---|---|---|---|
#106 | crappy-interdiff.txt | 751 bytes | aspilicious |
#104 | 2381505-104.patch | 13.63 KB | aspilicious |
#99 | 2381505-98.patch | 13.57 KB | dawehner |
#99 | interdiff.txt | 960 bytes | dawehner |
#94 | increment.txt | 1.37 KB | pwolanin |
Comments
Comment #1
dawehnerLet's upload the existing patch.
Comment #3
dawehnerLet's fix it.
Comment #4
dawehnerGood profiling criterias would be a similar page like #2380417: Bootstrap flame graph
Comment #5
Damien Tournoud CreditAttribution: Damien Tournoud commentedIt's worse than you think: given the way
Drupal\Core\Routing\CompiledRoute
is currently written, each route is deserialized twice: once inSymfony\Component\Routing\CompiledRoute::unserialize
, once inDrupal\Core\Routing\CompiledRoute::unserialize
.And on serialization, it is serialized in the parent, deserialized and re-serialized...
Comment #6
Damien Tournoud CreditAttribution: Damien Tournoud commentedAlso, I don't get why the preloader is storing the list of non-admin route in the state? Why not just adding a database column on the
{router}
table a populate it inMatcherDumper::dump()
?Comment #7
Damien Tournoud CreditAttribution: Damien Tournoud commentedAs much as possible, you should call the provider here. Or else you risk deserializing the same route multiple time.
(And it's really sad that you have to override all the other methods, but that's
Symfony\Component\Routing\RouteCollection
's fault.)Comment #8
dawehnerThank you for your great comments tonight!
Okay, that is a clear bug, let's fix it.
Right, we can fix that as well, but its makes the code a bit more ugly, if you ask me.
I don't care, using state seemed to be the most flexible solution in case contrib has better ideas.
At least for that usecase I realized we don't need to subclass the RouteCollection BS, let's just implement that on the
RouteProvider
level and be done.Comment #10
catchYes this was the original reason and it still seems sensible to me, if someone comes up with a better contrib optimization they should be able to completely remove this one from running at all.
Comment #11
Damien Tournoud CreditAttribution: Damien Tournoud commented@catch: but really they should do that by swapping out the route preloading service, no?
Having the default
MatcherDumper
add this information to the database doesn't really hurt.Comment #12
Damien Tournoud CreditAttribution: Damien Tournoud commentedThe current implementation will not scale, and from the look of it, it is already pretty inefficient even with core alone.
Comment #13
catch@Damien I'm also thinking of things like altering all HTML admin routes away completely then using a PHP dumper for everything else. So no pre-loading at all then. In that case the route pre-loading could be replaced with a null implementation and the dumper would be different anyway, so even for this keeping it in the db seems OK indeed.
Another thing that would help with simple routes that don't need to generate links, or links that do but where there's a 100% render cache hit, would be to only pre-load the routes when generating a link for the first time. No point pre-loading something that'll never get accessed. Probably separate issue for that.
Comment #14
moshe weitzman CreditAttribution: moshe weitzman commentedUsing a db column instead of state would not help with our goal of a speedup via "unserialize on the fly". Lets consider that out of scope for this issue.
Hopefully someone can debug the test fail on the most recent patch.
Comment #15
moshe weitzman CreditAttribution: moshe weitzman commentedUsing a db column instead of state would not help with our goal of a speedup via "unserialize on the fly". Lets consider that out of scope for this issue.
Hopefully someone can debug the test fail on the most recent patch.
Comment #16
dawehnerAlright, so this time it looks a bit better.
Comment #18
dawehnerMaybe that is all, what is needed.
Comment #19
moshe weitzman CreditAttribution: moshe weitzman commentedAre we really unserializing on the fly here? I see that loadNonAdminRoutes() passes TRUE as second argument to getRoutesByNames().
I can do a flame graph with and without this change if you think that would be helpful.
Comment #20
dawehnerEhem, you are absolute right, my bad.
Comment #22
dawehnerMaybe the filewidget test was just a random failure.
Comment #24
dawehnerMaybe, just maybe, this helps.
Comment #26
dawehnerFirst just a reroll.
Comment #28
pwolanin CreditAttribution: pwolanin commentedPatch still applies cleanly, but I'm not seeing the fail locally.
Comment #31
pwolanin CreditAttribution: pwolanin commentedI'm not sure it makes sense to modify the interface - why not just add a method that does the desired pre-loading?
It also looks like there is a bug in the case you needed to rebuild routes.
Really not able to reproduce the test bot fail locally, even after discussion with dawehner.
I get fails when putting index.php into the Drupal URL, but I get the same fails on 8.0.x.
Comment #33
pwolanin CreditAttribution: pwolanin commentedI spun up a testbot myself - looks like a PHP segfault
Very weird, but let's try changing the code a little.
Comment #34
pwolanin CreditAttribution: pwolanin commentedWow - this is just horrible.
Even with my own testbot, debugging this was just trial and error.
If I disable APC, instead of a segfault I get a SQL fatal due to duplicate uuid generation. WTF
It seems all is solved by putting back the call to
parent::serialize();
in CompiledRoute::serialize(). So clearly some weird PHP bug in that version of 5.4 that expects you to call serialize up the full chain of inheritance.Comment #35
pwolanin CreditAttribution: pwolanin commentedAnd being PHP - it seems in other cases calling
parent::serialize()
causes the segfault: https://bugs.php.net/bug.php?id=65591Comment #36
pwolanin CreditAttribution: pwolanin commentedOops the last interdiff was wrong - here's the correct one.
Comment #37
pwolanin CreditAttribution: pwolanin commentedUpdated comment
Comment #39
BerdirFYI: Field API had segfaults with the Serializable interface for quite some time and switched to use __sleep()/__wakeup(): #2074253: Fatal error when using Serializable interface on PHP 5.4
Comment #40
pwolanin CreditAttribution: pwolanin commentedSince the upstream symfony class implements the serialize method, and I don't think __sleep() actually allows you to decide which data to serialize, I' not sure moving to __sleep() is a viable option
Comment #41
dawehnerGreat work @pwolanin!!
I think the problem mentioned in #2074253: Fatal error when using Serializable interface on PHP 5.4 is not a problem here, we don't have objects referencing each other etc.
The Route object is some kind of simple object, just containing data.
We don't call
serialize()
often anyway, so there is actually no point in optimizing that call to parent.Comment #42
pwolanin CreditAttribution: pwolanin commentedYes, but in any case - clearly serialize is dangerous territory
Comment #43
dawehner@pwolanin
Do we need something else on this issue?
Comment #44
pwolanin CreditAttribution: pwolanin commentedI think it's RTBC. The only bit where I'm not 100% sure about the tradeoff is:
But as long as the reset() method is used everything works as intended.
Comment #47
rteijeiro CreditAttribution: rteijeiro commentedRe-rolled!
Comment #51
dawehnerJust fixing some stuff.
Comment #53
dawehnerThere we go.
Comment #54
Crell CreditAttribution: Crell at Palantir.net commentedSo wait, so this will segfault in some versions, either way? It's just a question of which version we segfault on? That... sucks.
This method would need to be added to an interface. Remember, alternate implementations of RouteProvider already exist.
I'm open to this if it's a significant performance boost, but I am wary of moving preload awareness into the provider. The current preloader is a completely transparent zero-API add-on. This moves it partially into the Provider, which feels like mixing concerns. Could we perhaps better abstract the operations involved, eg, so that the Provider doesn't need to know that cache preloading is the task at hand? Something like fetchRoute() (gets serialized version), decodeRoute() (turns serialized version into loaded version) and loadRoute() (gets serialized version and decodes it)? (I'm just making up names but hopefully you get the idea.)
Comment #55
dawehnerYeah its certainly not a bad idea to give these things a better name.
Comment #56
pwolanin CreditAttribution: pwolanin commented@Crell - it seems calling
parent::serialize()
is the less risky of the 2 choices here. The bug example had to set up a complex nesting to get that to cause the segfault.Comment #57
pwolanin CreditAttribution: pwolanin commentedIn either case we are assuming that the route provider has an internal cache. Is that a useful assumption?
We could do something like this, though I'm not sure it makes senes to extend the interface another level.
We could also just check that it's a Drupal\Core\Routing\RoutProviderInterface
Comment #58
dawehnerWhy do you not just use
$this->routeProvider instanceof PreloadRouteProviderInterface
Comment #59
pwolanin CreditAttribution: pwolanin commentedoh, I guess that works too - I just recall using is_subclass_of() for some of the plugin code.
Comment #60
pwolanin CreditAttribution: pwolanin commentedComment #61
dawehnerYeah, you need that if you don't have an actual instance of those objects.
Comment #62
Crell CreditAttribution: Crell at Palantir.net commentedThis comment still confuses me. It reads as "this will sometimes segfault, sometimes not", which is not confidence-inducing. WHY did we decide that doing this was the better option? (Don't tell me; update the patch.)
PreloadableRouteProviderInterface?
Unnecessary. It's always RouteProviderInterface. "Or a subclass thereof" is implicit in class/interface type checks.
This I like.
Much better this time around!
Comment #63
dawehnerWell, that is just reality, PHP is interesting, see http://3v4l.org/QFDDc especially. We don't change behaviour here, but its good that we document it, for now.
Note: In order to solve it in a sane way, we would need upstream to use protected variables, but they don't, which well, some of us strongly disagree:
https://github.com/symfony/symfony/pull/14282
Well, I think adding the two possible candidates indicates clearly that some other parts of the code has different behaviour depending on the more specific interface.
Comment #65
dawehnerLet's fix it.
Comment #66
Crell CreditAttribution: Crell at Palantir.net commentedThe comment is still unacceptably vague. Looking at the code, I don't know why we're doing something that is called out as dangerous.
Comment #67
dawehner... because its PHP, well, what about just leaving out the entire documentation, given that its all what we are doing there.
Comment #68
pwolanin CreditAttribution: pwolanin commentedOk, how about just a little note so people don't try to optimize it out? Also fixes doxygen.
Comment #69
Fabianx CreditAttribution: Fabianx for Acquia commentedIs there a reason to leave the variable set?
--
Except this question, the patch looks RTBC for me.
Can we get some profiling numbers, please?
Comment #70
pwolanin CreditAttribution: pwolanin commented@Fabianx - yes, there is an isset() check to avoid loading it again.
Comment #71
Fabianx CreditAttribution: Fabianx for Acquia commentedI see that isset() and assume it is this one, but won't always $this->routes[$name] be set then and we can never reach that condition?
Comment #72
klausi@Fabianx: the check is happening in preloadRoutes() with the array_diff(), so we need to keep it set. Maybe the comment should point that out.
If this is just a list of strings it should use "string[]" instead of "array" as type.
Same here, should be something like "string[]" or whatever the data type of the array values is.
Comment #73
Fabianx CreditAttribution: Fabianx for Acquia commentedThanks, klausi. That makes more sense now :).
Comment #74
pwolanin CreditAttribution: pwolanin commentedok, I'll fix up the comments in a bit
Comment #75
pwolanin CreditAttribution: pwolanin commentedActually, I think the code changed from the earlier version such that we don't need to keep it set.
Comment #77
pwolanin CreditAttribution: pwolanin commentedLooks like it might be a testbot issue: "File /tmp/some-temp-file.jpg could not be copied to temporary://some-temp-file.jpg."
Comment #79
dawehnerOh nice, I wasn't aware that this is possible.
Comment #80
Fabianx CreditAttribution: Fabianx for Acquia commentedRTBC + 1
Comment #81
pwolanin CreditAttribution: pwolanin commentedComment #82
alexpottThis issue is a normal task so we need to outline how it fits within the allowable Drupal 8 beta criteria. Can someone add Drupal 8 beta phase evaluation template to the issue summary.
Comment #83
alexpottAlso do we have any numbers on the perf benefits?
Comment #84
pwolanin CreditAttribution: pwolanin commentedComment #85
pwolanin CreditAttribution: pwolanin commentedComment #86
Fabianx CreditAttribution: Fabianx for Acquia commentedStill needs profiling, what would be a good test case?
Comment #87
Crell CreditAttribution: Crell at Palantir.net commentedFind/make a non-admin page that has no links on it, or extremely few. That would be the upper-bound; we'd still pre-fetch all non-admin routes, but now only deserialize one or two. Benchmark the before/after with this patch.
If it looks to make a difference, then this is a performance improvement done in an API-safe way, IMO. (That is my stance as routing maintainer.)
Comment #88
dawehnerHere are the steps I've done:
Comment #89
catchThis bit makes we wonder a bit.
We added this for pre-loading. So why not call it preloadNonAdminRoutes (or just update the docs) and skip the additional interface and code path?
There's no return from there, so it shouldn't make any practical difference.
Comment #90
Crell CreditAttribution: Crell at Palantir.net commentedcatch: As currently written in this patch, if the RouteProvider supports preloading we preload (but don't deserialize) non-admin routes. If it does not, then we still pre-load (but spend the time deserializing) non-admin routes. That's still faster than doing nothing but not as fast as delaying the deserialization, too. That gives us options.
The main reason for the separate interface is noted in #54: We don't want to load too much awareness of preloading into all route providers. That's the wrong place for that knowledge to reside. Making it a separate, opt-in interface follows the Interface Segregation Principle (I in SOLID), and keeps RouteProviders conceptually independent of preloading. (Strictly speaking, the fact the preloaded routes are kept in serialized form is an internal implementation of the default RouteProvider, as it should be; alternate implementations may have better options they can leverage and we should let them.)
Comment #91
Fabianx CreditAttribution: Fabianx for Acquia commentedRTBC + 1, I agree with #90 the Interface gives us flexibility
Comment #92
catchNo this really isn't right:
(emphasis added).
The interface isn't PreLoadAndUnserialize. If we don't pre-load we shouldn't pre-load at all, if we do pre-load we should pre-load. Whether we unserialize or not is not the distinction here - that's 100% an internal optimization and depends on the storage.
Comment #93
Crell CreditAttribution: Crell at Palantir.net commentedThe alternative then wouldn't be to merge the interfaces, but to simply remove the else clause entirely. Ie, if a RouteProvider doesn't have the preload interface we don't do anything *at all*. I would accept that if catch would.
Comment #94
pwolanin CreditAttribution: pwolanin commentedOk, simplified.
Comment #95
Fabianx CreditAttribution: Fabianx for Acquia commentedRTBC - if tests pass
Comment #96
alexpottAssigning to catch since he's been involved with this one.
Comment #97
catchWould save unnecessarily loading the state entry and simplify things a bit more.
Comment #98
Wim LeersComment #99
dawehnerHa, @crell mentioned the same point.
Comment #100
webchickBack to catch
Comment #101
Fabianx CreditAttribution: Fabianx for Acquia commentedAs this is called on every request and hence is in the critical path I take the liberty to nit on this:
Should be:
so that we don't do anything if the interface is wrong.
Comment #102
catchYes that also saves having to worry about precedence here if we move the check above.
Comment #103
aspilicious CreditAttribution: aspilicious commentedI'm going to reroll this.
Comment #104
aspilicious CreditAttribution: aspilicious commentedComment #105
Wim Leers#104: could you please provide an interdiff? Thanks!
Comment #106
aspilicious CreditAttribution: aspilicious commentedNo I forgot, so I did a crappy file interdiff :(
Comment #107
Crell CreditAttribution: Crell at Palantir.net commentedIf we really really want the best performance, we can write a compiler pass that detects the interface of the route provider service and only registers the pre-loader listener if it has the right interface. That may be excessive, though. :-)
Comment #108
aspilicious CreditAttribution: aspilicious commentedI'm assigning back to catch, so he can decide
Comment #109
Fabianx CreditAttribution: Fabianx for Acquia commentedRTBC + 1
Comment #110
catchThe instanceof check is extremely cheap if we're actually implementing it, and the nitpicking here is as much style (i.e. not having assignments in the second part of an if statement) as optimization. So yes compiler pass seems excessive. There's something to be said for not loading the interface if no runtime code uses it, but since we think it'll be implemented 99.9% of the time that's not worth worrying about.
Committed/pushed to 8.0.x, thanks!
Comment #112
aspilicious CreditAttribution: aspilicious commented