Problem/Motivation
Route matching takes around 4ms on the minimal profile on my machine.
It also includes queries that are not suited to the MySQL query cache - a query to resolve path alias to path, then a further query to resolve path to route + parameters. These queries are obviously per-path and we've seen at least one real client site almost go down due to query cache locking in 6.x (but the same would apply for 7.x) - see #1234830-40: Revert cache_menu patch removal, add a $conf setting instead (was: cache_menu: huge table size).
Proposed resolution
Add caching to Drupal\Core\Routing\RouteProvider::getCollectionForRequest(). This will both allow the database query to be offloaded to a cache backend, and also reduce the number of queries from two to one, as well as cut out a fair bit of PHP logic.
Remaining tasks
Bear #1234830: Revert cache_menu patch removal, add a $conf setting instead (was: cache_menu: huge table size) in mind. This was originally rolled back due to explosion in cache_menu size in 7.x. We may have to restrict this cache to LRU caches (which would require a new interface for cache backends to indicate they support LRU).
API changes
Comment | File | Size | Author |
---|---|---|---|
#83 | 2480811-83-interdiff.txt | 1.36 KB | Berdir |
#83 | 2480811-83.patch | 27.06 KB | Berdir |
#75 | Screen Shot 2015-06-18 at 2.28.25 PM.png | 198.45 KB | catch |
#73 | interdiff.txt | 2.17 KB | catch |
#73 | 2480811-73.patch | 27.02 KB | catch |
Comments
Comment #1
Wim LeersComment #2
Fabianx CreditAttribution: Fabianx commentedComment #3
Fabianx CreditAttribution: Fabianx commentedComment #4
catchComment #5
dawehnerHow much we can do also depends on #2364011: [meta] External caches mix up response formats on URLs where content negotiation is in use
Comment #6
catchHere's a rough patch. Only lightly tested manually for profiling.
No query reduction with database backend (with the LRU check hacked out) - we drop two queries but add two back (cache get + cache tags).
Comment #8
dawehnerThis might be problematic because we have
\Drupal\Core\Routing\ContentTypeHeaderMatcher
Also i'm curious whether its really a good idea to inject the cache at this place. For upcasted entities for example we would store the entire entity in there. ... This would also make life harder for accept header based routing, or rather remove support for it for now.Comment #9
catchRight I started with the most high-level place I could find, but it might be a bit too high level. It would be good to bypass both the path alias and route lookup here if we can though.
Comment #10
Crell CreditAttribution: Crell at Palantir.net commentedArchitecturally, I'd prefer to see the caching done in a wrapping object, rather than injected directly. It keeps the concerns separated better.
That said, swapping 2 queries for 2 queries doesn't seem like a win. For this to really have an impact, I'd suspect we need to bypass the path alias look up as well, or work that into a single query or something. At that point, though, we lose a lot of the modular flexibility of the system. I think the approach in #2471234: Create a ChainedRouteProvider which subclasses RouteProvider but could return early for most common routes is going to have more promise, but we'll of course need benchmarks to be sure.
Comment #11
catch@Crell please read the issue summary and the linked comment to explain why this doesn't boil down to 'swapping two queries for two queries'.
Comment #12
Fabianx CreditAttribution: Fabianx commented#11: Use chained fast backend for now, that will give the same performance as having memcache.
--
More precise: Use chained fast backend for now _just for benchmarking_, that will give the same performance as having memcache.
Comment #13
dawehnerThis issue is somehow coupled to what we do in #2364011: [meta] External caches mix up response formats on URLs where content negotiation is in use
Comment #14
catchOK here's a rough draft of an updated approach, don't love it, but for performance it should be better and no parameter caching.
There are two queries we want to avoid when possible - because both are terrible for the MySQL query cache - the inbound path alias lookup, and route matching.
Ideally we want to replace those with a single cache lookup - because what we really want to get to is from request path to route. However currently path alias matching is triggered via a kernel event and route lookup is in the router.
However we never use the current path until... the router lookup.
So the not-very-pleasant bit - I made CurrentPathStack fire a new event when there's no paths in the stack and one is requested, this allows the path lookup to run at that point and set it correctly.
Then in RouteProvider the path alias and route lookup both happen in an easily cacheable spot.
Front page renders and should be enough to profile with. I'm seeing shortcuts in the xhprof diff which shouldn't be different, probably indicates a bug. Tests will fail. Will look again next time I have time.
Comment #16
catchHelps when you add new files to patches.
Comment #18
catchShould fix a lot of test failures this time..
Comment #20
catchAnd some more.
Comment #22
catchNeed to set the current path on cache hits - prevents it being looked up again anyway later in the request.
Test failures are now as follows:
1. The installer - because the event wasn't previously fired there and now it is - needs a null service in the installer.
2. Some tests need updating.
This is functioning properly now for runtime code though, so profiling next.
Comment #23
catchHere's an xhprof diff.
With ChainedFastBackend used for profiling, this saves the two database queries as expected (the path alias and route lookup). It also saved 6ms in my diff but that'll be variable (and it's out of 500ms...).
Kicking off test run just to get a baseline with the latest patch.
Comment #25
catchMoved the caching out of CurrentPath into RouteProvider - should fix all/most of those install failures.
Comment #27
catchJust fixing RouteProviderTest.
Comment #29
dawehner: Class 'Drupal\Core\Path\InboundPathEvent' not found in d8/core/lib/Drupal/Core/Routing/RouteProvider.php on line 143
Comment #30
catchOh dear. This time with the missing file.
Comment #32
catchWhich was an older version.
Comment #34
catchI can't reproduce those fails locally either via run-tests.sh or the browser runner, trying a retest just to make sure it's the same set. (132 fails, 5 exceptions last run).
Comment #37
dawehner+1 to have a dedicated event for that!
Let's see why it fails
Comment #38
catchFixed PathTaxonomyTermTest and possibly one or two others.
Still can't reproduce the breadcrumb failures locally at all.
Comment #39
dawehnerI had a look at one of the test failures: DownloadTest, that one was reproducable on the local system.
We support "clean" URLs for accessing private files. As the routing system don't support arbitrary placeholders, we rewrite /system/files/foo/bar/baz to
/system/files?file=foo/bar/baz
using inbound path processing, see\Drupal\system\PathProcessor\PathProcessorFiles
This itself works fine, but in case path processing is not executed, we loose the added query parameter in the cache HIT.
Comment #40
dawehnermeh, you forget the new file again ...
Comment #41
dawehnerHere is a fix for the breadcrumb case + the missing file
Comment #44
catchThanks for the breadcrumb fix - were you able to reproduce locally yourself?
This should fix DownloadTest which also fails locally for me - saves and restores the query string as discussed with dawehner in irc. I also now use the query string as part of the cid, which is necessary for #2490920: Add back query parameter support for path aliases. anyway.
Also it should not have the new file missing this time.
Comment #46
dawehnerYes, used a subdir installation for that. Its so valuable that I have d8.dev and localhost/d8 pointing to the same installation.
@catch
Do you think we need some dedicated test coverage, or is the passive test coverage good enough?
Comment #47
catchThose test fails are the same as https://www.drupal.org/node/2498849#comment-9990855 so looks like a bot issue.
Sending for re-test.
@dawehner not sure on dedicated test coverage. The passive test coverage is clearly enough to catch bugs, but a unit test would catch the bugs we're slowly fixing here quicker (at least the query parameters one).
Comment #51
catchconfig.system.site saving doesn't invalidate cache tags - this is a broader bug than this issue (i.e. site name in the page cache), but added the fix for now.
Didn't figure out
MenuLinkContentDeriverTest
yet (adding route_match cache tags to MenuTreeStorage cache item did not help) but that should be the single remaining fail.Comment #53
Crell CreditAttribution: Crell at Palantir.net commentedIf we're going to make a new event for this, it should be at least namespaced, preferably its own constant (to be consistent with elsewhere).
Talking with dawehner a bit in IRC, I'm not convinced an event is the right connection point here. Recall that
1) Events are the most expensive extension point tool we have.
2) There are other route providers already implemented, and we want to make it easy for people to experiment. That means as little implicit logic as possible.
If we want to move alias lookup inside the provider, it should be an explicit action. Either a direct dependency, or a direct dependency with another interface (to indicate that the provider will handle aliasing, etc.), or something like that.
That also begs the question of whether all inbound path processing should move inside the provider, and what the implications are for other pre-routing request listeners; they will not have the unaliased route available.
Thoughts?
Is this just a Git quirk?
Comment #54
catchI kept the event here originally because PathSubscriber does two things:
1. Runs inbound path processors and updates CurrentPathStack with the result
2. Sets the cache key for alias pre-loading.
We don't want to do anything specific to aliases inside the route provider - because those are just an implementation of inbound path processing (and can theoretically be completely disabled).
However I'd argue routing is already tightly coupled to inbound path processing (albeit implicitly) via the current path stack - since it has to go current path -> route, not request path -> route.
So the answer here is to drop the new event, inject the PathProcesser into route provider, but keep PathSubscriber handling the alias caching. New patch updated for that.
This has an interesting side-effect in that previously RouteProvider depended on the current path stack to get the current path for routing. Now it goes directly from request path to route (which is what a normal Symfony router would also do). The current path stack dependency is now only to set the processed path on the stack. This this makes the current path a bit less useful overall - since after routing you can get an unprocessed path from RouteMatch potentially.
I think that's fine, they don't have the route available now, just the unaliased path which we discourage relying on outside routing anyway. So they either need to listen to KernelEvents::CONTROLLER if they need the current path (as PathSubscriber now does in the updated patch), or not rely on it being normalized by this point.
PathAliasTest passed locally but didn't run many others, see how this goes.
Comment #56
Crell CreditAttribution: Crell at Palantir.net commentedWhy CONTROLLER, rather than just a lower-priority REQUEST listener? We don't need the controller here; just a post-routing request.
Moving all of inbound path processing inside the routing system makes sense to me. Especially if at some point RouteProvider started getting its runtime input through the constructor (WAT?), fixing that seems like a good thing all on its own, caching aside.
Question: If we do move the logic inside RouteProvider, would it be possible to make the caching a wrapping class around the provider rather than integrated into it? That would be preferable as it is more loosely coupled, easier to test, and easier for other RouteProvider implementations to reuse.
Comment #57
catchFrom Symfony's own documentation:
http://symfony.com/doc/current/components/http_kernel/introduction.html
Seems like the right place to me. If there's an event that's suitable, then why rely on priority order? We shouldn't be doing any outbound path processing between getting the route and executing the controller. Really the AliasManager should figure out the cache key the first time an alias is looked up, I want to open a follow-up issue to look into this.
I'm not convinced about a wrapping class, the actual caching logic is < 6 lines of very simple code. If it makes the LruCacheBackend logic simpler then it could be worth doing for that though (i.e. if we wanted to conditionally switch between the services based on whether the cache bin is using an LRU-capable backend).
Comment #58
Crell CreditAttribution: Crell at Palantir.net commentedHm. OK, I've not seen the controller event used as a "post-routing event", but if Symfony's docs say that's kosher I'll run with it.
Keeping the cache in a wrapping object gives us the flexibility to try logic like that. (LRU, switching, etc.) At that point it just becomes a matter of twiddling the container and adding another interstitial object if we feel like it. There's a lot of flexibility there we can experiment with without API impact, assuming the wrapping cache isn't that complicated to implement. (It seems like it shouldn't be. Let's try.)
Comment #59
catchThis should get tests to green - or at least if no new fails are introduced elsewhere.
Comment #61
catchThat test doesn't fail locally, but also I haven't seen it anywhere else. Re-uploading patch.
Comment #63
amateescu CreditAttribution: amateescu as a volunteer commentedThere you go :) The fix is stolen from
Drupal\ckeditor\Tests\CKEditorTest::assertCKEditorLanguage()
.Edit: btw, I could reproduce the fails when running the test in CLI.
Comment #64
dawehnerWhen we talked about that on the WSCCI meeting crell favoured to put the caching into its own class wrapping the RouteProvider itself.
I'm personally not convinced that adding another layer makes things easier to understand
Comment #65
catchYes it makes the class hierarchy deeper, to encapsulate a few lines of code, so I'm also not keen.
Comment #66
catchThere's an API change here, and while this doesn't meet the definition of a performance critical in terms of raw milliseconds, there are two reasons I think it does more generally:
For high traffic sites, these two queries bring sites down due to MySQL query cache contention (see issue summary and especially the linked D7 issue).
For all sites with at least some traffic, offloading queries from the database to memcache or a similar backend helps both performance and scalability since it reduces the queries per second required from the database.
Comment #67
catchWhen I opened this, I wanted to restrict it to cache backends implementing LruCacheBackendInterface, however I'm second-guessing this now.
- the patch now caches both route and path alias matching (and all other inbound path processing) - so it's a useful cache even with the database backend. Only just but not useless.
- the cache items should be a bit smaller than the 7.x menu_get_item() cache.
- the path alias pre-load cache is also per-page, and it hasn't been a problem for the entire lifetime of Drupal 7
- hosts generally offer much more disk space than they did 5 years ago - this is something that has increased massively compared to CPU or memory.
- If you run into problems and are on a space-constrained host with no chance of memcache, you could set @cache.data to the memory backend. Only the path alias pre-load cache and this patch use @cache.data - the whole point of adding it was for high-cardinality/large cache items.
Comment #68
dawehnerChanged some minor bits + added more specific test coverage ...
Gladly I'm not a maintainer of the routing system to decide whether we want to go with the wrapper approach or not, see #2480811-58: Cache incoming path processing and route matching
Comment #69
Crell CreditAttribution: Crell as a volunteer commentedFrom #58:
So I tried. Turns out, it is more complicated. Because the cache key depends on the path, and specifically the post-path-processing path, in order to form the cid in a wrapping cache class we would need to inject the path processor there as well, and run through it again. We'd also need to inject the current path service to the caching wrapper, too. At that point we don't seem to be gaining much anymore in terms of code cleanliness. (I might argue that means we need to think about it harder and refactor a bit more deeply, but given that we're in late beta I'll pass on that for now.)
However, while I was at it I realized that the cache structure is using meaningful positional keys. That totally doesn't fly from a DX perspective. So let's turn that into a named array at the very least so the code is more self-documenting.
It also looks like we expect path processors to be using paths without a leading /, although that's not documented. We should finally get around to standardizing those on a leading /, which would let us remove various trim and concat calls. Off topic for this issue, though, so I've filed a new one: #2508037: Clarify PathProcessor path format
Comment #70
dawehner+1 to use proper keys, it makes things at least a bit easier to grok.
Let me RTBC that, catch worked out most of it anyway
Yeah and let's be fair, this is a true internal optimization, something you should actually not even care about, which is then fine if its not exposed as part of the interface / public classes.
Well right, you wanted to store paths with '/' in front, which I think is the way to go, but yeah we wanted to avoid changing too many APIs at once.
Comment #71
Wim LeersNit: s/THe/The/
This thoroughly confuses me:
AccessAwareRouter
is called during the REQUEST event.Also, since
AccessAwareRouter
isn't touched at all, the IS seems outdated?Do we still call this "system path"?
"may mean a change to the front page route" is ambiguous. What about "may cause a different route to become the front page route"?
Nit: 2 newlines, should be 1.
Comment #72
catchSo this subscriber doesn't do anything about incoming paths.
What it does is set the cache key for the AliasManager - which once it has a cache key set, will preload internal paths for bulk alias lookup, and/or record looked up paths if there's no cache item yet.
That cache key is based on the internal path for the request, so it has to run after routing/inbound path processing. I moved it to this event, because Symfony recommends that for 'post-routing/pre-controller' stuff, which is exactly what this is, and because using a different event felt more robust than relying on priorities.
On the other hand though, I think we should open a follow-up to rip out this subscriber, and use current route match in AliasManager - and base the cache key on route + params, which is all we need. I think this was forgotten in the current_path()->route match conversions. Here's that follow-up #2508217: AliasManager should use the current route match for outbound alias pre-loading cache keys.
I'll do some profiling - expecting this to be neutral with the database cache with standard install though, but it will likely kick in once there are a higher number of path aliases or more inbound path processing. With something like memcache we save two queries though.
Comment #73
catch#1. Fixed:
#3. Grep says we do:
#4: didn't quite use that wording, but good suggestion and updated the docs.
#5 fixed.
Profiling coming up.
Comment #74
Wim Leers#73.4: much more understandable :) Thanks!
Comment #75
catchHere's an xhprof diff:
With the database cache, as mentioned above we don't save in number of raw queries:
-1 route match, -1 path alias lookup
+1 cache get +1 cache tag lookup.
However:
- The cache tag is the same query on every single page, so it can be usefully cached in the query cache, unlike path or route lookup.
- For the cache get we exchange a dynamic query from route lookup for a static one - so we skip query compiling.
- About 12 less classes loaded
- once you add memcache you get a lot more scalability benefit.
Comment #76
Wim LeersConvinced.
Comment #77
dawehnerWow that is quite an improvement. Was this the frontpage with cached data?
Comment #78
catchI didn't take lots of profiling runs here, so I would not at all stick by the wall time or CPU numbers, they should just be ignored (although it'd be lovely if this wasn't the case).
However I did review the function calls to make sure only what I expected to be saved was there, so they should be good.
This was node/1 with warm caches iirc.
Comment #79
Crell CreditAttribution: Crell at Palantir.net commentedConcur on RTBC.
Comment #80
alexpottNice patch only a couple of minor nits.
Removed the @param docs - but we still have a parameter.
Not used
Comment #81
dawehnerWrote the CR
Comment #82
dawehnerComment #83
BerdirAddressed the nitpicks.
The CR from @dawehner is here: https://www.drupal.org/node/2509216. What seems to be missing is the second part.. how that affects event listeners exactly. I have some module that might be affected by this.. e.g. globalredirect and redirect, wondering how they will be affected (Especially globalredirect.. that's doing things like comparing the path with the alias and redirects to the alias). So some more information on that would be very helpful.
Comment #84
BerdirCan also confirm what @catch said in #78. The wall time difference in that profiling isn't real. I'm also seeing fewer saved function calls, not sure why, likely different setup (standard with 2 languages and negotiation but I'd expect actually more savings there, not less)
In my profile run, we're saving ~1.3ms by not calling PathProcessorManager::processInbound() anymore, that includes language negotiation and 2 config loads. getRouteCollectionForRequest() is exactly as fast as before, even though the calls are completely different and as @catch said, it's a cache query that can be moved of to memcache/redis way more easily.
Comment #85
Wim LeersSo given #83 and #84, I think this can move back to RTBC.
Comment #86
dawehner@berdir
Is https://www.drupal.org/node/2509216/revisions/view/8584240/8586258 what you wanted?
In my profile run, we're saving ~1.3ms by not calling PathProcessorManager::processInbound() anymore
I agree those numbers have been quite high, but you afaik also have a much lower baseline.
Comment #87
catch#86 looks like the right change to me.
Just to triple-confirm, the wall time in #75 should be a complete fluke, that's not what this change is about.
Comment #89
alexpottCommitted 7145644 and pushed to 8.0.x. Thanks!
From the CR
I've been concerned for a while about the clustering of events with priorities around this number. I think currently there events where it would impossible to add an event in-between. I think it will be wroth opening an issue to assess all of core's events and their priorities.
Comment #90
Fabianx CreditAttribution: Fabianx as a volunteer commented#89: I agree, I am also concerned on the priorities being scattered throughout so many files.
With D7 module weight was at least pretty easy to understand, with events we probably need a handbook page that has all the priorities and order listed ...
---
Also: Yay!
Comment #91
Wim Leers+1 to that concern. At the very least, we should create more room between the weights. SmartCache in fact runs into this problem: it needs to sit at a place where there is no gap whatsoever!
Comment #92
Wim LeersPublished the CR.
Comment #93
catchWe could add a note about FilterControllerEvent to the change record in this case - this is where I moved PathSubscriber to.
In general doing a big event / subscriber audit would be worthwhile.
Comment #94
Crell CreditAttribution: Crell at Palantir.net commented+1 on doing an audit of the kernel event listeners. I have opened an issue here: #2510738: Audit (and reorganize?) Kernel event listener priorities
Comment #95
Wim Leers@Crell++ for opening that follow-up!