Follow-up from #2224777: Unlimited allowed length of path in menu router allows DoS attack on Drupal 8.
Problem/Motivation
Route of all fitness values are processed for regex match and passed through for final matching. In Drupal 7 the code ordered by fitness and had a limit 1, so we only got the optimal route.
In Drupal 8, we are trying to use a 2 element index, but it can never be used by MySQL because of the DESC ordering:
mysql> EXPLAIN SELECT name, route, fit FROM router WHERE pattern_outline IN('admin/structure', 'admin') ORDER BY fit DESC, name ASC;
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-----------------------------+
| 1 | SIMPLE | router | range | pattern_outline_fit | pattern_outline_fit | 767 | NULL | 2 | Using where; Using filesort |
+----+-------------+--------+-------+---------------------+---------------------+---------+------+------+-----------------------------+
1 row in set (0.00 sec)
Even changing to fit ASC and removing name we get a filesort - so we would be safer sorting in PHP.
Also, we don't have a web test to check that a request with many path parts doesn't cause a 503.
Proposed resolution
Fix Route Provider to do the query based on the number of parts, which is a condition we can add to the where clause and use an index
add a test.
Remaining tasks
review patch
User interface changes
N/A
API changes
none. This change is internal to the router and should not be disruptive to any other existing code.
Beta phase evaluation
Issue category | Task because .it's a follow-up to a prior bug |
---|---|
Issue priority | Major because it's in the critical path for every request |
Prioritized changes | The main goal of this issue is performance |
Disruption | Not disruptive for core/contributed and custom modules/themes |
Comment | File | Size | Author |
---|---|---|---|
#93 | mergediff.txt | 1.34 KB | neclimdul |
#91 | further_optimize-2228217-91.patch | 9.64 KB | josephdpurcell |
#55 | microbench.php_.txt | 1.98 KB | neclimdul |
#51 | increment.txt | 1.01 KB | pwolanin |
#51 | 2228217-51.patch | 7.67 KB | pwolanin |
Comments
Comment #1
dawehnerremove tag.
Comment #2
pwolanin CreditAttribution: pwolanin commentedComment #3
pwolanin CreditAttribution: pwolanin commentedFirst patch, shifts route sorting to PHP so we never have a filesort, and reject any routes with less than maximal fitness.
Since having fit in the index provides on benefit, let's just remove it.
Comment #5
pwolanin CreditAttribution: pwolanin commentedFixes the test case text and adds another.
Discussing with tim.plunkett, it seems that based on the interface and the regex match, we should not actually limit to max fitness, though that seems a bit weird to me.
So, the optimization (of sorts) that remains is doing the sort in PHP so that we avoid a filesort in mysql.
Remaining work is to add a web test for a very long path. I'll get to that shortly.
Comment #6
pwolanin CreditAttribution: pwolanin commentedThis moves code around in the RouterTest test- no actual changes, so you can see the additions in the next patch.
Turns a hard-to-read function call with closure to a simple foreach.
Alters some logic in the dumper and the route provider so we can skip the regex while getting basically the same 1st pass filtering of routes.
Comment #8
pwolanin CreditAttribution: pwolanin commentedoops, diffed against 8.x instead of origin/8.x - ignore #6 since it won't apply
Comment #9
pwolanin CreditAttribution: pwolanin commentedAdds test for long paths.
Comment #10
klausiShouldn't we also optimize getCandidateOutlines() now? Because it returns ancestors with different length, which are excluded by :count_parts anyway and can never match? So in getCandidateOutlines() we should check the length of the binary digits and if it does not match we can exclude the mask. We are already doing that for masks that are longer than the path, so I think we can now also do that for masks that are shorter than the current path?
Why? What is bad about an SQL filesort? Please expand the comment.
Bad naming. The method name should tell us what it does, not where it is used. Suggestions: sortRoutesByFitness() or compareRoutesByFitnees().
Same for the comment, suggestion: "Comparison callback for sorting routes by fitness."
Why do we have to cast to "int" here? We can always be sure that we have numbers as fitness, right? Is this for robustness? Please add a comment.
Comment #13
pwolanin CreditAttribution: pwolanin commentedpostponed on https://drupal.org/node/1366020 which should fix the search module test fails by moving keywords out of the path to a query string.
Comment #14
pwolanin CreditAttribution: pwolanin commentedThe cast to int for $a['fit'] is because all values that come from the database are actually PHP strings.
We cannot optimize the candidate outlines now, since we need to match routes with trailing optional path parts - this is why we have the >= check in the query.
Comment #15
dawehnerI somehow doubt that we actually still need the order by name. This was introduced when we had no proper router filter mechanism for accept headers etc. In the longrun it feels OK to skip that ORDER.
Comment #16
pwolanin CreditAttribution: pwolanin commented@Dawehner - the result must be deterministic, so I think it's important to sort by name also.
Comment #17
klausi@pwolanin: but PHP converts strings containing numbers for us automatically when using ">" operators, so the cast should not be necessary?
Comment #18
pwolanin CreditAttribution: pwolanin commented@klausi. Hmm, ok. Shows how little I trust PHP is be sane :)
http://www.php.net/manual/en/language.operators.comparison.php
However, it seems like there is no guarantee what it will do with the route names.
Comment #19
pwolanin CreditAttribution: pwolanin commentedComment #20
jhodgdon#1366020: Overhaul SearchQuery; make search redirects use GET query params for keywords is in (!!), so un-postponing, see comment #13 here.
Comment #21
dawehnerThis is an unneeded change for this issue as well as micro optimization. There will be people which actually want to use anonymous functions because they think it is more readable.
Can't we at least keep some doTestDefaultController methods? Keeping them separated by method makes it much better to read.
Comment #24
ecrown CreditAttribution: ecrown commentedi am working on this issue
Comment #25
ecrown CreditAttribution: ecrown commentedThis is a re-roll of the patch at #19
Comment #26
ecrown CreditAttribution: ecrown commentedComment #32
mgiffordJust unassigning issues that haven't been developed for a bit in the D8 queue.
Comment #33
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedre-roll, but need an update function for index change and to address dawehner's comments.
Comment #35
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedComment #36
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedComment #37
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedLooks great to me, not able to RTBC, but I like the approach to remove the not-useful SQL sort here and add another condition.
I assume index already takes into account number_parts?
Comment #38
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commented@Fabianx - the patch adds the number of parts to the index, so that's why an update function is needed
Comment #39
pwolanin CreditAttribution: pwolanin as a volunteer commentedfixed test for changed in class constructor, add update hook, roll out test changes not essential for the issue.
Comment #40
pwolanin CreditAttribution: pwolanin as a volunteer commentedComment #41
pwolanin CreditAttribution: pwolanin as a volunteer commentedA couple code comment fixes.
Comment #42
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedCould we fill out the disruption section of the beta evaluation.
From the parts I do understand this is RTBC and test coverage looks great, but I can neither confirm nor deny if this needs a change record and how contrib is or is not affected.
Thanks!
Comment #43
neclimdulNeat!
This looks like maybe a bug fix? Not really clear what this change is accomplishing.
Didn't dawehner ask for this change to be removed in #21.1? I find the closure easier to read personally so unless there's a appreciable performance gain can we do that?
Comment #44
pwolanin CreditAttribution: pwolanin as a volunteer commentedThe change from pattern outline to number of parts is deliberate so we include the count of optional parts. Otherwise, if you have a path with 2 optional trailing parts, and you supply the 1st but not the second the route wouldn't match.
I personally find the closure + filter not readable, so I think the foreach is much more readable and self-evident in what's happening in that we are assembling an array of the non-empty parts. So, I do not think it should be changes as in the patch. I don't think clever code is necessarily clear or better code.
Comment #45
pwolanin CreditAttribution: pwolanin as a volunteer commentedComment #46
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedThanks for filling out disruption section, I am also +1 to remove the closure. Is is not only more readable, but also slightly more performant and this is in the critical path.
Comment #47
neclimdulWell, not warm cache critical path. This only gets hit when caches are cold and the path hasn't been routed yet.
If we're going to optimize it, why not just handle it with string operations?
Comments and code are both clearer IMHO.
Quick benchmark script with a largish path. You can adjust it with a smaller or larger path.
Comment #48
neclimdulPHP 7 is a bit more dramatic getting close to twice as fast. Previous benchmark was Ubuntu's build of PHP 5.6.4.
Comment #49
pwolanin CreditAttribution: pwolanin as a volunteer commented@neclimdul. The string operation fails for
foo//bar
, so let's leave it as the loop.To do it correctly as string operations you need to use
preg_replace()
I think.Comment #50
neclimdultypo in the str_replace but foo///bar would still have failed.
Did some more benches, peter suggested preg_split, we bounced some tests around. Look preg_split is a pretty solid winner when sending abusive paths. The closure is miserable in those cases too so I'd say its actually good we replace that with something.
Comment #51
pwolanin CreditAttribution: pwolanin as a volunteer commentedHere's with preg_split
Comment #52
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedBack to RTBC if tests pass.
Comment #53
neclimdul+1 RTBC
Comment #54
Crell CreditAttribution: Crell at Palantir.net commentedConcur with RTBC, as there's even some new tests to ensure nothing breaks.
That said, re #47/ #48, if you're seeing that dramatic a difference between a loop and a closure it probably means you have xdebug on. XDebug does very unpleasant things to the performance of stack calls. At least on 5.4, the last time I benchmarked it, the difference without XDebug was measurable but not enough to care about. Order of magnitude difference means XDebug is on, so benchmarks are not reliable at all.
Comment #55
neclimdulBetter data:
Comment #57
neclimdulLooks random:
Comment #59
alexpottI thought about asking for an upgrade path test but considering it is so simple I think it's not worth it.
Committed 7d5e65d and pushed to 8.0.x. Thanks!
Thanks for adding the beta evaluation to the issue summary.
Comment #61
alexpottComment #62
alexpottShould have asked for that update test.
reverting.
Comment #64
lauriiiThis is how I fixed this earlier when I was working on other upgrade path and this was blocking me to proceed.
Comment #65
alexpottNeed to fix this too
Comment #66
pwolanin CreditAttribution: pwolanin as a volunteer commentedDoe we support UTF-8 patterns or do routes, or could we make that an ASCII column?
Comment #67
Crell CreditAttribution: Crell at Palantir.net commentedI'm not sure. Sounds like something we need a test for. I don't think we can require ASCII-only paths, though, as that would be rather rude to UTF-8 path using international users. (UTF-8 domain names and paths are a legit thing according to IETF and ICANN, even if they're rarely used outside of China.)
Comment #68
pwolanin CreditAttribution: pwolanin as a volunteer commentedDomain names don't really matter here, and in paths the request should be URL encoded if UTF-8, but I suspect PHP gets the decoded strong which would have the UTF-8 chars.
Comment #69
pwolanin CreditAttribution: pwolanin as a volunteer commentedHere's the schema fix
Comment #70
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedBack to RTBC
Comment #71
Crell CreditAttribution: Crell at Palantir.net commented191 is a magic number. In context I have NFI what it means or why it is. It needs either a constant or a comment. (Or Constant Comment, if you're a tea drinker.)
Comment #72
neclimdulbe prepared to see it a lot. mysql max index length is 767 bytes. 767/4 = 191. Which means the max varchar you can index is 191 characters long or you need to do this hack.
I think the better question is why isn't this working and we need to hard code it?
http://cgit.drupalcode.org/drupal/tree/core/lib/Drupal/Core/Database/Dri...
Comment #73
Crell CreditAttribution: Crell at Palantir.net commentedA comment in the code to that effect with a @see would be sufficient. Character encoding bites (pun intended), but we need to at least document with a reference where there's a magic number. (A constant is the other option if it's logical and reasonable to do, as then the constant can have the documentation.)
Comment #74
neclimdulMy point was I'm not sure the changes in #64and #69 are actually not correct. #51 seems like it was actually fine because we should not be hard coding assumptions about mysql into our database schema. e.g. I don't think postgres has this limitation and could index the full field.
There seems to be a bug in the Database API that would be a blocker for this issue. The linked code is from #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols) and was clearly written to handle this case, it just seems from alex's comment that it isn't. Likely since testbot didn't catch this, db_add_index() doesn't trigger the code and so is bugged.
edit: typo
Comment #75
neclimdulThat's exactly what it is. #2538844: \Drupal\Core\Database\Driver\mysql\Schema::addIndex doesn't normalize index length
Comment #76
neclimdulor rather, the issue already existed. #2537928: MySQL index normalization only works on table create
Comment #77
neclimdulUnblocked. Doing a quick reroll of the previously committed patch.
Comment #78
neclimdulSomething like this maybe. I'm not sure how to assert that the index was applied without Database API supporting schema introspection which I'm under the impression it currently doesn't (hence the delay on the patch that blocked this).
Comment #79
neclimdulSorry for the weirdness in the interdiff. its a mergediff (update function name conflict) + spec addition to index add + the test kinda all in one.
Comment #80
Crell CreditAttribution: Crell at Palantir.net commentedWasn't Noin one of the Dwarfs from The Hobbit?
Comment #81
catchIt does for indexes, just not for anything else.
https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21Database%...
Also a few minor things:
Why a public method and not a closure for the sort?
heh
Per-above this could check for the index.
Both functions are deprecated.
Comment #82
neclimdul1) seems likely. This has the possibility of being called a lot so this seems a reasonable micro-optimization.
2) :-D
3) Done
4) Sure, we don't provide a connection to updates so the alternative isn't a lot better though.
Note, I don't know if this will work. I'm getting failures locally on any update tests using
drupal-8.bare.standard.php.gz
.Comment #83
neclimdulman I've been having trouble scrolling back up to change the status recently. sorry guys.
Comment #85
neclimdulI need a break from this issue. Me and simpletest are not on good terms right now.
Comment #86
neclimdulComment #87
Crell CreditAttribution: Crell at Palantir.net commentedComment #88
alexpottThis update exists - sorry.
Comment #89
dawehner.
Comment #90
josephdpurcell CreditAttribution: josephdpurcell commentedWorking on an update now.
Comment #91
josephdpurcell CreditAttribution: josephdpurcell commentedRerolling.
Conflict was:
Resolution is:
Comment #92
Crell CreditAttribution: Crell at Palantir.net commentedAn interdiff would have been sufficient. :-) Thanks.
Comment #93
neclimdulAwesome thanks. Interdiff's are tricky when rerolling conflicts because the previous patch doesn't apply.
Tip: Since it looks like you where doing a merge which is also how I handle these as well, you can resolve the conflict and then do the diff. Git will give a mergediff which is pretty clear at showing both the conflict and resolution in one file. Example output for #91 attached.
Derp, I didn't rename the second function when making the mergediff but you get the point. #91 is correct.
Comment #94
alexpottSecond time lucky! Committed 7730866 and pushed to 8.0.x. Thanks!
Thanks for adding the beta evaluation to the issue summary.