Problem/Motivation
Whilst working on #1987898: Convert user_view_page() to a new style controller we've discovered that the 8.0.x routing system is case sensitive.
In drupal 7 /node
and /NODE
would respond with the same page. In Drupal 8 /NODE
will get a 404.
In addition path and alias should be consistent.
for example create content with path node/1 and set path alias as /path/alias
@webchick, @alexpott and @pwolanin did a quick survey of several big websites that also use dynamically-generated pages, both PHP and not, for example:
* WordPress (25% of the web): http://ma.tt/2015/05/wearable-gadgets/ vs. http://ma.tt/2015/05/wearable-Gadgets/ - case insensitive.
* Microsoft (ASP.NET): http://www.microsoft.com/en-us/default.aspx vs. http://www.microsoft.com/en-us/DeFault.aspx - case insensitive.
* Wikipedia (#7 website in the world): http://en.wikipedia.org/wiki/Main_Page vs. http://en.wikipedia.org/wiki/main_page - case insensitive.
Mind you, lots of other ones are not: Facebook, Google, etc. but they also tend to write their own stuff vs. use a generated dynamic system.
Additionally, no matter what we do, GOOGLE.COM and google.com both take you to the same place in your browser. So the expectation from non-technical users is likely to be that URLs are case insensitive; it's very difficult to explain that "if it comes after the slash then case matters, else..."
And finally, the Drupal <= 7 case-insensitive behavior is not hurting anyone (unlike the ability to put infinite /something/made/up/here paths), and breaking this behavior has a lot of ripple effects. For example, sites having to put manual redirects in, sites having to contact 3rd party websites/customers/etc. and ask them to change links, etc. Lots of disruption for not a lot of gain.
Proposed resolution
we agreed to fix this by keeping most of the existing behavior from Drupal 7.
/Node/1 - not 404. Same as /node/1
Drupal 8 should handling incoming paths consistently, and independent of the database back-end.
The patch stores the pattern outline as lower-case in the database and changes the query to lower-case the incoming path.
However, the route path is left untouched (and will render as entered by the developer or in the UI), so we also modify the compiled route regex to be case-insensitive.
Split aliases into a sub issue: #2584243: Make Drupal handle path aliases in a consistent and case-insensitive fashion on all database drivers
Remaining tasks
final review
User interface changes
none
API changes
none
Data model changes
none
Beta phase evaluation
Issue category | Bug because links in D6/D7 sites can break during an upgrade to D8. |
---|---|
Issue priority | Major because previously working links can break during a D8 upgrade. Not critical because:
See #23 for more details. |
Disruption | Disruptive for sites running on beta versions that added case-sensitive aliases in Drupal 6-7 or in Drupal 8. |
Comment | File | Size | Author |
---|---|---|---|
#367 | interdiff.txt | 649 bytes | himanshu-dixit |
#367 | 2075889-367.patch | 27.93 KB | himanshu-dixit |
#364 | increment-2075889-364.txt | 1.49 KB | pwolanin |
#364 | 2075889-364.patch | 27.93 KB | pwolanin |
#358 | interdiff.txt | 749 bytes | himanshu-dixit |
Comments
Comment #1
pwolanin CreditAttribution: pwolanin commentedMost CMSs are case sensitive so I'm not sure if I would consider this a bug or an unexpected feature of using symfony.
If we want to maintain case insensitivity, seems like we'd need to intervene very early? Maybe overriding the request class or manipulating the globals.
Comment #2
xjmSee previous issue: #368093: Block visibility settings are case sensitive ?
Comment #3
alexpottAlso see https://github.com/symfony/symfony/pull/297 for why Symfony core is case sensitive.
Comment #4
oadaeh CreditAttribution: oadaeh commentedAccording to the W3C (http://www.w3.org/TR/1999/REC-html401-19991224/types.html#h-6.4):
"URIs in general are case-sensitive. There may be URIs, or parts of URIs, where case doesn't matter (e.g., machine names), but identifying these may not be easy. Users should always consider that URIs are case-sensitive (to be on the safe side)."
Comment #5
catchSee also #333054: Make all varchar columns case sensitive (regression) (was make page cache case sensitive) and before that #99970: Page cache matching is case insensitive.
I'm not really that bothered whether URLs are case sensitive or not, but we have various assumptions in core / pending core issues which want URLs to be case sensitive or not, and that's been changed without any of those cases being considered at all.
Comment #6
Crell CreditAttribution: Crell commentedI'll be honest, I never knew we were doing anything case-insensitive. Paths are case-sensitive. Strings are by default case-sensitive in PHP. So routing is case-sensitive. It never occurred to me to consider that anything should be case-insensitive.
Yes, anything relating to paths should be case-sensitive, across the board, everywhere. If that's not the case, it's a bug.
Comment #8
dawehnerThe new routing system also changed the behavior or being able to append any kind of argument which are just ignored,
so things like
node/123/foobar
just works.If people want to have insensitivity, what about make it possible and provide a potential contrib module?
Comment #9
Crell CreditAttribution: Crell commentedSome features just shouldn't be. We shouldn't concern ourselves with it. The entire system is swappable to a ridiculous degree by virtue of using the DIC for everything, so you could do whatever wacky stuff you want from contrib.
I'm thinking this is a won't fix.
Comment #10
twistor CreditAttribution: twistor commentedYou can also do this with mod_spelling in Apache. It comes by default, but is not usually enabled.
RE #8, That would be a bug-fix. Responding to arbitrary URLs makes it easy for someone to SEO DoS your site.
+1 to letting this "feature" die.
Comment #11
catchThis is a critical bug not because of the behaviour change but because there's no upgrade path for it. Upgrade paths are not a won't fix and I wish people would read the issue before posting.
I don't know what exactly might need to be done for the upgrade path, but again that's because it's not been discussed yet. For example do we need to go in and lowercase every system path in the URL alias table? What about block visibility rules?
Comment #12
pwolanin CreditAttribution: pwolanin commented@catch - it sounds like the path rules already lowercase - we should check.
Are we converting path aliases to be route based or always path? I guess the way it's built now it has to be path, though that leave a lot of fragility if route path patterns are changed?
Comment #13
catch@pwolanin, no-one's really looking at that or discussed it etc.. There's #1553358: Figure out a new implementation for url aliases which is looking at the CMI implications which also hasn't been updated in a long time.
Re-titling so it's more obvious what the bug is.
Comment #14
Crell CreditAttribution: Crell commentedI don't know what an upgrade path here would even look like... What's the task to be done here? What data do we care about? (We don't care about paths in content blobs because, well, I don't think we ever really have.)
Comment #15
catch@Crell - yes that's the point. The change was made by accident, and no-one has yet done the research to see exactly which subsystems were affected, hence why it's a critical upgrade path issue because it's not even known yet how much may be broken.
However, two examples were given in comment #11 which you appear to be ignoring. Perhaps you could have checked those instead of posting another contentless follow-up?
I spent a couple of minutes testing path aliases to confirm the bug.
An alias pointing to a system path of /User/Register works just fine on Drupal 7. Upgrade to Drupal 8 and it's a 404.
While testing I also discovered an unrelated bug, but that'll affect anyone trying to re-save an alias with an incorrect path (assuming they don't fix it). Opened #2086559: Adding an invalid path alias shows an exception in the UI for that.
On system paths in blobs, we'll need to decide about that (as well as incoming links which may break), but it's not the case that we've never cared about those, given the critical regression in Drupal 7 that's been open for months dealing with a similar issue #1934568: Allow sites using the 'image_allow_insecure_derivatives' variable to have partial protection from the Drupal 7.20 security issue. A commented out .htaccess rule that checks for a lower case version of incoming paths and redirects to that if necessary might be enough for external links and blobby content.
As well as path aliases,
{menu_links}
also stores system paths as a string and we have interfaces that let you enter that manually. I haven't tested what happens on the upgrade path for those yet, given that's tied up with access checks I'd expect it to be worse than a 404.This is further complicated by the fact that we don't enforce lower case router paths in Drupal 7, so it's valid to have route names like admin/Content - meaning that simply lowercasing all router paths stored everywhere could potentially break legitimate aliases/links.
Comment #16
catchHaving a look through D8 upgrade path issues now we have agreement on moving to migrate. Unfortunately this one is still going to be an issue since the old data might need transformation.
Comment #17
catchNo longer blocks 8.0 due since the migration path can go in later, but it'll still need massaging of data during the migration.
Comment #19
alexpottI really think that we should not do the massaging of data either - this will be very messy - and does not fix the fact that you might have important links to your site that no longer work. Personally I think we should offer a compatibility mode here or something like that to allow sites that are upgrading from D7 or earlier the ability to use case insensitive paths OR inform people about mod_spelling - I guess the problem with the latter solution might be shared hosting.
Comment #20
pwolanin CreditAttribution: pwolanin at Acquia commentedDiscussing in person with Crell and dawehner at Drupalcon LA. We were thinking that this is a doc issue and not something core should directly handle. There can be a contrib module to make path handling case-insensitive if some site needs that.
In D8 aliased and non-alised handling shoudl be consistent, so we should file a bug if the handling of aliased paths is not case sensitive also.
Comment #21
webchickI spoke with this at DrupalCon LA with @alexpott and @pwolanin.
We did a quick survey of several big websites that also use dynamically-generated pages, both PHP and not, for example:
* WordPress (25% of the web): http://ma.tt/2015/05/wearable-gadgets/ vs. http://ma.tt/2015/05/wearable-Gadgets/ - case insensitive.
* Microsoft (ASP.NET): http://www.microsoft.com/en-us/default.aspx vs. http://www.microsoft.com/en-us/DeFault.aspx - case insensitive.
* Wikipedia (#7 website in the world): http://en.wikipedia.org/wiki/Main_Page vs. http://en.wikipedia.org/wiki/main_page - case insensitive.
Mind you, lots of other ones are not: Facebook, Google, etc. but they also tend to write their own stuff vs. use a generated dynamic system.
Additionally, no matter what we do, GOOGLE.COM and google.com both take you to the same place in your browser. So the expectation from non-technical users is likely to be that URLs are case insensitive; it's very difficult to explain that "if it comes after the slash then case matters, else..."
And finally, the current behaviour is not hurting anyone (unlike the ability to put infinite /something/made/up/here paths), and breaking this behaviour has a lot of ripple effects. For example, sites having to put manual redirects in, sites having to contact 3rd party websites/customers/etc. and ask them to change links, etc. Lots of disruption for not a lot of gain.
Therefore, we agreed to fix this by keeping the existing behaviour from Drupal 7.
Comment #23
johnshortessThe comments above happened as I was performing triage of this issue at DrupalCon Los Angeles. I've updated the issue summary and added a beta evaluation.
While I did not perform a D6/D7 to D8 upgrade to reproduce the issue, I did confirm that handling for node paths is case sensitive, and that node paths such as NODE/1 or Node/1 404 now when they didn't previously. I also noticed that this behavior is not consistent with how path aliases are handled, and kgoel and I opened #2489484: Path and alias case sensitivity should be consistent in core.
kgoel and I discussed this with alexpott, pwolanin, crell, and dawehner, who all agree this issue is still relevant and still major, and alexpott discussed the issue with webchick, as she outlined in #21.
Comment #25
pwolanin CreditAttribution: pwolanin at Acquia commentedNote that there are still some edge cases we need to document. "mbstring is a non-default extension." So, for example if PHP has mbstring enabled then the Drupal Unicode::strtolower() uses: http://php.net/manual/en/function.mb-strtolower.php
In this case Ä -> ä
But in the fallback in Unicode::strtolower() it only maps accented characters, so it seems like Ä is left untouched, as well as many others (Cyrillic, etc).
Comment #26
johnshortessAs pwolanin noted during the sprint, string comparisons are case-insensitive in MySQL, but are not in other databases. D8 does not currently convert URL aliases to lowercase before saving them in the database -- if we change this behavior, do we want to note it in the help text below the URL alias field on the node page?
Comment #27
johnshortessHere's a preliminary patch. It runs Unicode::strtolower() on incoming paths, and on URL aliases before they're saved. It doesn't make any changes to help text.
Comment #29
johnshortessThe testbot revealed that some data in the URL needs to be case-sensitive, e.g. the Base64-encoded data in /admin/config/system/actions/add/RlNfIL8R9iZFfGM45liyQlw3nkJlc3vvhbAGQN5bYUc. Per pwolanin, it looks like we need to convert to lowercase in getRoutesByPath() in RouteProvider, and also in RouteCompiler, as well as deeper in the alias manager than the first patch. PathProcessor needs to be reverted because routing is happening after path processing.
Comment #30
kgoel CreditAttribution: kgoel at Forum One commentedComment #32
ohthehugemanatee CreditAttribution: ohthehugemanatee as a volunteer commentedtested locally:
* patch applied cleanly
* /USER gives a 404, /user works normally
* /NODe/1 gives a 404, /node works normally
* I created a URL alias to node/1 called "Case-Sensitive-Alias". It points to node/1 when entered as /case-sensitive-alias and Case-SENSITIVE-alias.
* one-time login link token gives an "invalid token" error when I muck with the case, and works fine when I leave it as drush generated it.
Looks like the Path tests need to be updated, though.
Comment #33
mradcliffeWe should run the tests through PostgreSQL and SQLite too.
Edit: specifically ohthehugemanatee's fourth test regarding URL aliases.
Comment #34
xjm@marcvangend asked on the closed duplicate #2489484: Path and alias case sensitivity should be consistent in core whether this issue should be critical, as that one was filed as critical. Per discussion in #23, I agree with webchick and alexpott that this is not critical because:
.htaccess
rules).That said, it's definitely a gross major that we should fix sooner rather than later. :P
Thank you to the major triage sprint participants for digging this out again and to everyone who discussed it at DrupalCon.
Comment #37
johnshortess@ohthehugemanatee - Now that I'm back from my short post-DrupalCon vacation, I can take a crack at updating tests. My experience with tests is pretty limited, though, so it might be several days. Is there a documented process somewhere for modifying existing tests, so that regressions aren't introduced?
@xjm - I defintely agree that this is not critical, but it's definitely going to be a big gotcha for site owners. If this doesn't make it into 8.0.0, there should definitely be some sort of documentation of the issue and possible workarounds. An .htaccess rule will work for Apache, but Nginx sites (including notably anyone on Pantheon) would need an alternate method -- maybe a regex match and 301 at the top of settings.php? I'm thinking there should probably be a short acknowledgement of the issue in UPGRADE.txt (I don't see a "Known Issues" section there now -- will there be?), linking to a new D.o documentation page with more detail and sample .htaccess/settings.php code. I'm happy to write a draft for this, but not sure about process. Would that be a child issue of this, a totally separate issue, and would it be created now, or when/if it's determined that this isn't going into 8.0.0?
Comment #38
kgoel CreditAttribution: kgoel at Forum One commentedI have converted path into lowercase in RouteCompiler.php but I think path needs to be converted into lowercase somewhere else beside RouteProvider and RouteCompiler since node/1 works but NoDe/1 or NODE/1 returns 404. I need to debug more.
Comment #40
Crell CreditAttribution: Crell at Palantir.net commentedThis is strtolower-ing on save, but I don't see anywhere that's strtolower-ing on lookup. Unless that's already in the code base somewhere? Might this be what's causing some of the test fails?
What's the performance impact of a case-insensitive regex? It may be faster to instead:
1) strtolower() the route path when dumping the routes.
2) strtolower the incoming path in the route provider.
3) Do the normal case-sensitive regex.
That way the regex does less work but the result is the same. It's probably simple enough that we can try both approaches and benchmark.
Comment #41
pwolanin CreditAttribution: pwolanin at Acquia commented@Crell - you want to do strtolower in #2 just for finding the routes? We have to be careful to preserve case for the portions that are slugs unless we want to prohibit base64 password reset URLs and the like.
However, that seems like it defeats the point of the issue - so maybe tokens or anything else case sensitive needs to move to a query string?
Comment #42
Crell CreditAttribution: Crell as a volunteer commentedUgh. You're right, some tokens could break that way. My concern is the performance of a case-insensitive regex. Some google spelunking led me to this post by Tim Bray. See the section "on case and diacritics":
http://www.tbray.org/ongoing/When/200x/2003/10/11/SearchI18n
So IMO we cannot switch to a case-insensitive regex without benchmarks saying the cost is acceptable. If the cost is not acceptable, we either need a plan B or to stick with case-sensitivity. Moving tokens to the query string would work for things like cron or password resets that we control, but it sounds like something that may bite contrib badly if they don't know about it.
Anyone want to run a benchmark and prove me wrong to worry about performance here? I'd be very happy if you did. :-)
Comment #43
pwolanin CreditAttribution: pwolanin at Acquia commentedHow much overhead is the regex change? Or we just need to benchmark?
Comment #44
Crell CreditAttribution: Crell as a volunteer commentedpwolanin: The article I linked says a lot, but I think we just need benchmarks to say. It shouldn't be too difficult to do both and see. If the performance difference is large then that makes the decision for us.
Comment #45
kgoel CreditAttribution: kgoel at Forum One commentedThis needed reroll.
Comment #47
kgoel CreditAttribution: kgoel at Forum One commentedpwolanin and I tested node with path node/1 and NoDE/1 where node/1 works but NoDE/1 returns 404 with the #45 patch. We debugged and found where it's failing....
We think the problem is in matchCollection method in core/vendor/symfony/routing/Matcher/UrlMatcher.php class and it's where route path is alway prefix with node/
*Symfony\Component\Routing\CompiledRoute*staticPrefix = "/node"
Comment #48
Crell CreditAttribution: Crell at Palantir.net commentedI talked this over with KGoel and dug through this at MWDS. Here's what's going on:
Symfony's RouteCompiler (and Drupal's subclass thereof) creates a regex to use for matching the path. However, as a performance optimization it also stores the initial state part of the route (staticPrefix). That is, for a path of /node/{node} it will store "/node", so that it can then do a cheaper strpos() check on the path to filter out unmatching cases (say, /admin/extend) easily. That, however, is still case-sensitive, even if we make the regex case insensitive.
That match is in Symfony's UrlMatcher, so to fix that we would need to subclass UrlMatcher and change that line to a stripos() to make it case-insensitive. That's an annoying amount of work to do, but it should work.
That said, if we're going to do that it would be even more performant to remove that check entirely. In Drupal's case, we have matching effectively spread out across a couple of classes. The path-matching would have mostly happened already back in the RouteProvider. In fact, in the typical case by the time we get to UrlMatcher there will only be one route anyway. Exceptions would be if http and https were routed separately, or REST cases where different methods are routed separately. In those cases, though, the prefix check is still useless because the paths are the same.
Therefore, if we're going to have to subclass UrlMatcher anyway, let's just eliminate that line as redundant. That will resolve the insensitivity issue as well as be a small performance micro-op on its own.
Comment #49
kgoel CreditAttribution: kgoel at Forum One commentedReroll. Working on addressing #48
Comment #50
mradcliffeThis might be confused as a typo so maybe a comment?
Or RouteCompiler could override that method and also a comment in that method? I think that it would need to basically do the same thing since $regex is private.
Comment #52
kgoel CreditAttribution: kgoel at Forum One commentedReroll
Comment #53
kgoel CreditAttribution: kgoel at Forum One commentedThis is just a quick patch to address #48. Still need to address #50
Comment #56
Crell CreditAttribution: Crell at Palantir.net commentedThis is the problematic line. It collides with the class name here. It's not needed, though, since we're already inheriting from that class indirectly.
Comment #57
kgoel CreditAttribution: kgoel at Forum One commented@crell - thank you for the pointer!
Comment #59
mpdonadioNot sure if this is the right direction for the test changes. Since Random::name() doesn't guarantee an uppercase character, this issue may have random fails.
Comment #61
kgoel CreditAttribution: kgoel at Forum One commentedComment #65
kgoel CreditAttribution: kgoel at Forum One commentedComment #66
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedGood progress, but a little more to do I think
Add a code comment here, and in the methods docs to say that it's lower-cased
If possible, we should not need to do this if we lowercase the input?
Needs docs especially to explain why it differs from the parent method
Comment #67
dawehnerWell, this would be just the case, in case routes are defined to be lowercase, right? Nothing ensures that at the moment.
This change is not needed at all
What would be also nice is a dedicated test which passes in a non lowercase value to ensure that this usecase still continues to work.
Comment #68
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commented@dawehner - I thought the dumper was lower-casing the route path - if not, that's a bug.
Comment #69
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedYes, we should have some test routes with mixed-case paths and make sure they get lowered as expected.
Comment #70
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedOk, after discussion with tstockler, dawehner, and kgoel, this changes the RouteBuilder to force all route paths to be lower case. This means that the CompiledRoute regex is already lower case, all patterns are lower case, etc.
Also forcing lower case everywhere in the AliasStorage for consistency, and other fixes to try to get consistency including in the caching.
Need to ask WimLeers about page cache.
Comment #71
Wim LeersFrom the POV of Page Cache, this is fine. It's just possible that multiple cache entries exist for the same lowercased URL. So the cache hit ratio may then be slightly lower than it would otherwise be.
So, it could be argued this is a potential DoS attack, for those sites that have Page Cache enabled. But… we already have that problem, for example random query strings.
Comment #73
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedForgot about the path slug problem. :-(
We need to leave src paths alone in aliases and use the case-insensitive regex because we cannot lower-case bas64 path portions matched as slugs.
Tempted to "won't fix" it. At the very least, it won't actually be case insensitive in all cases.
Comment #75
mradcliffeDoes this patch need an update hook with the storage changes so that any alias that could have been saved with upper case (or migrated in with upper case) is converted to lower case?
Opinions:
* No: Anyone creating alias will find it not working anyway.
* No: Migrate is critical for release, but not beta.
* Yes: It is technically a data model change and being technically correct is the best kind of correct.
* Yes: If the site still has case sensitive aliases stored, then it won't be possible to delete or load the bad data.
I updated the issue summary to use the latest template with data model changes, and uncommented the disruption line based on the current patch.
Comment #76
mradcliffeI also added sqlite and pgsql testbots to run the patch. It looks like SQLite is going to fail on PathAliasTest.
Comment #78
kgoel CreditAttribution: kgoel at Forum One commentedFixed fail for PostgreSQL and sqlite
Comment #79
kgoel CreditAttribution: kgoel at Forum One commentedwith some of route path case insensitivity test
Comment #80
mradcliffeFirst patch:
I looked at this specific test in core currently, and it does not do any assertions. Are we just asserting that the node was saved?
The assertion was removed in commit e044adb014315580372f8d21a0ed4867ff07842f in #2509300: Path alias UI allows node/1 and /node/1 as system path then fatals. It was removed in patch #27. There wasn't any specific mention either by @dawehner or any reviewers about why that assertion was removed. The assertion comment was "Confirm that the alias was converted to a relative path."
Maybe this best relegated to a follow-up to discuss whether the "create absolute path alias" should have its assertions added back, or maybe it should be removed now if it has no value?
Next patch:
Should have a full stop. Not critical to the patch.
Comment #81
kgoel CreditAttribution: kgoel at Forum One commentedHaven't addressed #801.
Comment #82
dawehnerDo we need both?
What about renaming the method to something like applyStrToLowerToCollection or something like that. It would be nice if the method name makes it clear what is done here.
You can use $collection->getAll(), then you don't need to typehint $route any longer
is differe a proper english word?
Comment #84
dawehnerWell, the idea of that other issue was that path aliases/sources now starts with a slash, so that particular assertion simply doesn't make any sense anymore. I think given the further test its okay to assume here that the tests got saved.
Comment #85
kgoel CreditAttribution: kgoel at Forum One commentedAddressed #82and some other small fixes.
Comment #88
kgoel CreditAttribution: kgoel at Forum One commentedComment #89
dawehnerWe have tests, we have the documentation that we are case insensitive, though I think the better place to document that would have been
routing.api.php
.I think expanding that a bit would be great!
Comment #90
jhodgdonAdding something to the Router topic in routing.api.php sounds like a good idea to me.
Also... just a question about the current patch. If you use the path UI to add an alias to a path, does it check to see if the alias already exists, and not allow creation if so? I think it does... If so, does it check in a case-insensitive manner? Seems like after this patch, it must, and I am not seeing anything in this patch that changes it (but I may have missed it).
And we should also document this in the help docs and/or page help for the UI for aliases.
Comment #91
jhodgdonIn other words...
For developers, I think adding docs to routing.api.php saying that paths and aliases need to be unique, where unique ignores case, is a good idea.
But we also need to consider the Drupal user, and document this for them too. It would affect:
- creating aliases from the Aliases management page
- creating aliases with the alias field or whatever it is for entity aliases (like creating an alias for a node while editing the node)
- places where you actually create paths directly, like in Views... not sure what other modules in Core do the same?
And then my other question is whether this patch also makes sure that when a user is creating an alias or path in the UI via any of these places, is it checked for being unique in a case-insensitive way? This patch needs to make sure that is happening.
Comment #92
dawehnerThis is a good question.
\Drupal\path\Form\PathFormBase::validateForm
is using$aliasStorage->aliasExists()
and$this->aliasManager->getPathByAlias()
which we change in the patch:Comment #93
kgoel CreditAttribution: kgoel at Forum One commentedIs there a place to document the above?
Comment #94
jhodgdonFor the end user, we use hook_help() to document stuff like this [either in the main help topic for a module or on page-specific help], and/or user interface text (field descriptions and field labels). Also, error messages may need to be updated, such as the one saying a path or an alias already exists when it may look to the user as if this is not the case.
So... I am not sure why it was decided that Drupal 8 URL handling needed to be case-insensitive, other than "Drupal 7 was", but what about accents/diacriticals?
Comment #95
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commented@jhodgdon - sadly, we'd need to change a lot more code to allow us to uniformly use the unicode strlower() method for consistency.
However, I think the case-insenstivie preg_match() and that should be similar in their support of utf-8.
See #21 for discussion of CMS comparisons on this behavior. Maybe that should be copied to the issue summary? See also comment #15
Comment #96
kgoel CreditAttribution: kgoel at Forum One commentedAdded some doc
Comment #100
dawehnerWell, we deal with that automatically, don't we? I think we should instead document what actually happens.
Comment #101
kgoel CreditAttribution: kgoel at Forum One commentedDo you mean explain where we are converting path into lower-case? I wasn't sure if I need to document RouteProvider::getRouteCollectionForRequest in api and explaining that bit would result into complication. I can certainly try if that's what you mean.
Comment #104
kgoel CreditAttribution: kgoel at Forum One commentedwith path slug test coverage
Comment #105
jhodgdonThis is coming along well! I reviewed the patch and had some mostly docs/coding standards comments:
Nitpick: you need a blank line here between the two methods.
This is the documentation for the applyLowerCaseCollection() method. It seems like the docs are too non-specific. It's not "apply additional processing", it's "lower-case each path".
If you want to have the method be more generic, I think it needs a more generic name, like applyProcessingCollection(), and then this docs line can stay.
If you want the method to be as it's named, then I think the docs need to be made more specific.
Um. Is Symfony already using the "u" modifier to indicate it could be Unicode?
What's a path slug? I don't like seeing terms in documentation that I don't know, and I've never seen this one in Drupal docs before.
nitpick: the second line here should not be indented.
Also... I am a bit confused as to why you require the input to this method to be lower-case. Why not just lower-case it in the method, as is being done in other methods?
In fact, here it looks like you *are* lower-casing $path...
info => information
But... can this be more specific? What exactly is "path information" and how is it parsed?
The docs ("An array ...") need to be on the next line, indented 2 spaces, ending in .
But... it says it returns "an array of parameters" -- what does this mean? Oh I guess it's the route information. Let's say that.
nitpick: move { to the previous line
nitpick: extra space before "It".
What's a path slug and how does it "receive" anything?
Can you put
$method = ...
in parens? To me, a line of code like this with === and = in the same line, without () to group the assignment, is hard to read.
Actually, better yet make the assignment a separate line before the if().
Um... Maybe reword this to say:
Paths are case insensitive, and need to be unique.
Are these changes to the tests necessary? It seems like we would want people to be able to enter their preferred case for paths, so that their URLs can have upper-case letters in them, and just make sure that *matching* is considered in a case-insensitive way, right? So the tests should not require that the input is all lower-case. Right?
Here I think I would leave the first sentence as it was, and add a new sentence just afterwards saying something like:
Aliases are case insensitive, and need to be unique.
I'd just leave this as it was.
Change that added sentence to say:
Aliases are case insensitive, and need to be unique.
Again, can't the user enter upper-case? This test should allow that too?
What's a slug (again)?
- Changes to error messages they would get when they try to make an alias that isn't unique
- Changes to the #description on Path fields in the UI explaining they are case insensitive and need to be unique
- Similar changes to Views UI, where it lets people create paths for pages and feeds
Comment #106
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedLet me work on fixing those points.
re:
#105.3 The regex in the Symfony route compiler is created as
So it (oddly) uses the s modifier but not u. I'm not sure if u should be used or not - probably need to add some test coverage.
I can already make utf-8 path aliases like:
/content/右blah
Which comes in the URL as:
/content/%E5%8F%B3blah
Comment #107
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedOk, so I bit the bullet and made the code consistently process and match paths using Unicode::strtolower(). The tradeoff is that we need to recover the original variables, so I added some tweaks for that. Really this whole code path needs to be optimized/simplified for the things as a follow-up (maybe 8.1.x).
Changes in response to #105
1. fixed
2. fixed
3. Symfony is not using "u". Maybe an upstream bug or follow-up. Removed "i" now.
4. reworded
5. fixed spacing and wording
6. fixed wording
7. This doxygen was just copied from the parent implementation. Removed the word "info" as it is meaningless here.
8. fixed spacing. The words are copied from the parent class docs. Added the word "route" to clarify.
9. fixed, though in many cases we are leaving it like the parent for later comparison.
10. changed the code and this comment
11. reworded
12. fixed, though often we are leaving these matching the parent method despite violating Drupal code style.
13. This is actually wrong. Paths are NOT unique.
14. This is the easiest way to fix the tests since we need to have a lower case value to compare to the output, and since we have a bug that randomMAchineName() returns mixed case. There are tests verifying that mixed case becomes lower case.
15. changed
16. changed
17. changed
18. Other test cases verify that they are converted
19. reworded and fixed test
20. Added some description text. Views doesn't force paths to be unique afaik - Last one may win.
21. As above, route paths are not unique. This was one of the (dubious) advantages of adopting Symfony routing.
Comment #108
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedAdding "rc deadline" tag
Comment #111
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedSilly problem with unserialize - easy to fix.
Comment #112
YesCT CreditAttribution: YesCT commented#108 and #34 "We can fix it in a patch release safely.", are opposite in terms of if this is rc deadline or not.
Comment #113
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedok, let's call it "rc target".
There might be some data changes needed for path aliases so better to do it before.
Comment #114
YesCT CreditAttribution: YesCT commentedtook out the original report from the summary (we can use revisions to see it, and bits were already integrated into the summary)
Comment #115
YesCT CreditAttribution: YesCT commentedI will fix these now.
nit. indent should be 2 spaces.
Lower cases
(can we third person verb that?)
^support^supports
and
what does "complete part" mean?
full namespace paths in comments.
https://www.drupal.org/node/1354#classes
"If you use a namespace on a class anywhere in documentation, always make sure it is a fully-qualified namespace (beginning with a backslash)."
period at end of sentence.
what kind of array?
array of strings?
no blank line at beginning of methods.
and...
is there test coverage for all the paths through this method?
it looks really complicated. maybe breaking this up with some helper methods would make it easier to test and understand.
Case and punctuation for a sentences.
period at end of sentence.
extra newline.
Comment #116
YesCT CreditAttribution: YesCT commenteda.
oh, UrlMatcher::matchCollection() was a copy and paste from symfony, and modified. mmm.
if this was drupal code, the "local" variables would be cased_like_this and not lowerCamelCase
... maybe we want to keep it as close as possible to the parent, so it can be easily diff'ed?
but we did other standard changes... hm.
b.
looks like it is lowercase (not lower case)
http://english.stackexchange.com/questions/59409/lowercase-lower-case-or...
grepping core agrees:
ag "lowercase" core/ | wc -l
274
ag "lower case" core/ | wc -l
8
I will do b. next (so it has it's own interdiff)
Comment #117
YesCT CreditAttribution: YesCT commentedjust the "lower case" -> "lowercase" (with a refactor/rename of the method to go along with it)
Comment #118
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedre: #115.3
"what does "complete part" mean?" - a path part is basically the string between a pair of slashes (or what you get by exploding on '/'). So a complete part is tying to emphasize that Drupal only supports a path like
/my/{foo}/edit
Where the{foo}
wildcard/variable is the complete thing between slashes, but Drupal doesn't support/my/{foo}{bar}/edit
or/my/{foo}-yow/edit
even though it is supported by Symfony.This does point out a basic mismatch between what we are doing and the Symfony code. Ideally we should stop using a lot of that code and optimize for what we actually want, but that's a bit of a project.
Comment #119
kgoel CreditAttribution: kgoel at Forum One commentedComment #120
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedWith some test additions to verify that routes added and altered in the dynamic and later events have paths converted to lower case.
Comment #121
dawehnerAPI wise I would have expected that the source would be lowercased as well given that its the actual internal path.
Comment #122
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commented@dawehner - the source of a path alias may include a case-sensitive variable value (e.g. the action ID in an action edit link) so we cannot change those to lower case.
However, it doesn't matter, since if the path processor replaces the incoming aliases with a mixed-case link, the case-insensitive code in the router still resolves it correctly.
Comment #123
dawehnerAlright. This particular issue doesn't cause the failure sin sqlite/pgsql so I consider this as done.
Comment #124
dawehnerI think we need an update path to update at least the path aliases.
Given that its not system critical we could use a post_update and just resave all the aliases and be done.
Comment #125
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedPlus update function and update test.
Comment #126
dawehnerI think we can use a post_update in which firing hooks is not a problem.
Comment #127
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedok
Comment #128
dawehnersad world that there is no API for that
Do you mind putting that into a dump file? Its not like we came up with that for fun :)
Comment #130
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedok, also fixed Drupal\system\Tests\Update\UpdatePostUpdateTest
Comment #131
dawehnerThanks
Comment #132
catchTagging rc deadline because I'm not entirely confident this can go in during RC. Just because it can go into a patch release doesn't always mean it is RC-able.
Comment #133
catchDoes the test database have more than 50 aliases in it? If not I think we should add some.
Additionally we should check that something at the beginning and end of the batch has been updated.
Comment #135
catchI ask this because the last time we had a batched update in core it only updated the first 50 and left the rest.
Comment #137
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedAdded more aliases - now it has 101 to update, and we check each one.
Comment #139
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedSimple re-roll needed
Comment #140
dawehnerIts kinda crazy how much we test to be honest.
Comment #142
effulgentsia CreditAttribution: effulgentsia at Acquia commentedPersonally, I'm +1 to this, but I'd rather @catch make the call on if and when this goes in, so assigning to him. Also raising to Critical, because this affects data within Drupal enough that I want to make sure it gets explicitly evaluated prior to RC than accidentally forgotten.
Given the code comment, why this change?
Comment #143
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commented@effulgentsia - I made the code change there since Symfony indexes them sequentially, and the array_values() call gives you sequential positions.
I didn't understand the views code well enough to tell why the views argument handlers are numbered sequentially rather than by path position, but given that this is the only existing use of that compiled route method in core, this seemed like the easiest fix.
Comment #144
dawehnerAs alternative you could have introduced an additional method.
Comment #145
tstoecklerThis part I don't understand. We set $pathVariables to $this->mappedPathVariables, but the parent::__construct() call sets it to $this->pathVariables as well, which is returned by getPathVariables() by default. So there really is no need to override the method, because it's the same thing. Which means in turn it's completely unnecessary to have the $this->mappedPathVariables variable because it too is completely duplicated. What am I missing?
Would be nice to add a @see to Symfony's RouteCompiler::compilePattern() which - unless I'm mistaken - seems to be where Symfony does it's variable mapping without this requirement.
This does not seem to be entirely correct. $pathinfo is only converted to lowercase in a single check, not entirely. It is then passed on as-is afterwards. Can we clarify this?
Here a @see to Drupal's RouteCompiler::getMappedPathVariables() would be nice, which is where the reverse is happening, unless I'm mistaken.
Minor, but these assertion messages are not very helpful, nor really accurate. I.e. what does it mean for a route to be successful?
Not really important, but this seems unnecessary? Seems totally fine to override getSubscribedEvents() and then just add RoutingEvents::DYNAMIC, but then we still have alterRoutes(), which is a bit nicer.
I don't understand this, as $parameters comes from $route->compile()->getPathVariables() so it seems the key would be important? Also I guess this underlines the fact that ::getMappedPathVariables() is in fact not necessary?!
Comment #146
tstoecklerEdit, the previous one was a *massive* crosspost with #120 (which is also the patch I reviewed)
Comment #147
catchThis is not critical, I do think it blocks a meaningful migration path from Drupal 6 per #2075889-11: Make Drupal handle incoming paths in a case-insensitive fashion for routing so tagging for that. With hindsight we should have rolled back the patch that introduced the bug at the time.
I'll try to discuss this issue with the other committers today as to whether we commit it before or during RC, which I think was the intent of effulgentsia bumping it to critical.
It would also be useful to know whether this really is viable for a patch release or not.
#1 and #7 both look substantive, so bumping back to CNR.
Comment #148
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedre: #145
.1) Yes, this is unnecessary in terms of actual functionality, it's just serving to document the behavior of the keys. I could also just call parent:: to do that
.7) Before this path it happens that Symfony simply makes an incrementing numeric array. There is actually no guarantee of that, so the Views change makes that core more robust/correct in any case.
Comment #149
tstoecklerRe 1) Then please let's just call parent:: and remove the additional variable. It makes sense to keep the function for the docs, but the additional variable is unnecessarily confusing IMO.
Don't really grok the reply to 7) but this shouldn't be held up on me understanding something. I guess an inline comment wouldn't hurt but I won't be a stickler about that.
Comment #150
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedre: #145
.1) changed
.2) added
.3) changed
.4) added
.5) changed messages
.6) left as-is, since the base class really provides no value here.
.7) Added comment as minor code re-organization.
Comment #151
swentel CreditAttribution: swentel commentedsmall nitpick - can be fixed on commit
'from from'
Comment #152
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commenteddoh. fixed.
Comment #153
tstoeckler@pwolanin thanks a lot. Also grok the Views thing now. Thanks for the comment! It's not for me to RTBC this, but I have no further comments. Looks good to me.
Comment #158
dawehnerPretty easy.
Comment #159
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedOh, bah - I didn't see that the
$route
variable gets re-assigned within there. That code needs some cleanup, but out of scope for this issue.Comment #160
dawehnerYeah that is not that nice to be honest. The bug fix I applied is minimal (just move around some code) so I set this back to RTBC.
Comment #162
catchWhy not make this a LIKE condition which is case insensitive anyway?
LIKE is already case insensitive so why the strtolower() here?
Don't we normalize this on save?
Comment #163
mradcliffeIt's not necessarily true that LIKE is case insensitive, but it's hacked that way in the driver Condition class.
That said I think I prefer doing the explicit non case sensitivity (application) rather than implicit non case sensitivity (database).
Comment #164
jhodgdonDoesn't the case sensitivity of database searches depend on the collation settings for the database anyway?
Comment #165
catchIt does but we use that everywhere else in core including critical places like usernames, and had about 5 issues spanning multiple years on how to handle it.
I'd generally have expected a few queries here to be converted to using LIKE() and not many other changes in general.
Comment #166
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commented@catch - I'm confused. I thought from our discussion in IRC you agreed that using the database case insensitivity or LOWER() was not the right thing to do?
We could possibly handle the path alias matching using LIKE instead of forcing to lower case, but for the final match in the routing we are using a regular expression. Fixing this final match behavior was the trickiest part and it happens all in PHP. Also, for the SQL part of routing we use an IN query, so there is no way to make that case insensitive in general.
The cases in tests where we do things like:
*are* exactly because we lower case on save, it's easier to start with a lower case value for later comparison rather than lower casing it later for each comparison.
There are some tests that verify the case change, so we don't need to do it every time.
Comment #167
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedHere's a tweak to switch to using LIKE in the alias storage.
Comment #168
catchWe were discussing the update - where we might want to make sure the updated data is exactly the same as it would have been if it was entered directly. I hadn't realised we were lowercasing everything in PHP everywhere else too.
In 7.x it's completely valid to have a path alias of 'Happy' for node/1 and when the URL is generated it will show 'Happy'.
Case insensitive matching for incoming paths doesn't require we force everything to lowercase in storage. This would also mean we don't need an update at all, or any of the string changes.
I'm not sure about routes though.
Comment #169
Wim Leers(I'm sorry if I'm stating things already discussed, I did not try to completely understand all 167 comments, even though I did read through all of them, but I did try to look up things to write a review that's as constructive as possible.)
It's strange to see both explicit lowercasing when storing and still doing case-insensitive queries when loading.
I'd expect either:
I guess this goes back to #162.1 and the subsequent comments up to this one.
Given #168, it seems it may be best to still store whatever alias the user enters, still generate URLs with those potentially uppercase aliases, but just lowercase them when doing actual routing?
Currently, this patch does more than solving the problem at hand:
. Rather than only turning those 400s into 200s (which this does), it also prevents uppercased aliases altogether. Which may be sensible, but feels out of scope.This forces our route-related events to use lowercase paths, sounds sane.
Not yet lowercasing here makes sense; we basically need to lowercase as late as possible. It is kinda confusing though (
), but I guess there's no other way without a huge BC break.This seems like a separate bugfix that needs test coverage. Add test in a follow-up?
So this is the "as late as possible", where lowercasing is appropriate.
I wonder if this is not an oversimplification then, given that we still support casing for e.g. base64 tokens?
So IMO, per my first point, none of this would/should be necessary.
Comment #170
Wim LeersLet me explicitly state what #169 implies: without the alias changes, this only breaks BC in a very minimal way (route paths can no longer use uppercase characters; if they did, they'd be lowercased automatically), and an upgrade path becomes unnecessary.
Comment #171
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedOk, so we will remove the "rc deadline" and consider this for sometime in 8.0.x with no changes to alias storage other than making sure we have case insensitive matching?
Comment #172
catchYes I think with no migration path this gets much less risky. Given it's an indirect migrate issue, making it RC target.
Comment #173
Wim Leers#171: I personally think that's the least disruptive solution, assuming I'm not overlooking a strong reason to lowercase aliases.
Comment #174
dawehnerYeah especially if we can get rid of some upgrade paths this would be wonderful, on less source of failure.
Comment #175
jhodgdonOK I'm a bit confused too about the case sensitivity issue. Here is what I think should happen, for optimal niceness and usability and kittens:
a) URL resolution should be case insensitive. So for example, if someone goes to URLs like:
example.com/fOo
example.com/FoO
These URLs should work the same, and both should match an alias in the database "foo" or "Foo" or whatever, or a path of "foo" or "Foo" or whatever.
b) We should allow users to enter paths (such as on Views UI when creating a feed or page) and aliases (on the aliases config page or when creating content) that contain upper-case, and we should preserve that when making URLs. So if they alias node/5 as "About", its URL should be example.com/About. And if they create a view display page and want its URL to be "AllTheVideos", they should be able to do that.
c) When someone is creating a path or alias in the UI, we should validate that there is not an existing path or alias that matches what they entered, and when checking this we should use case-insensitive matching. I think whether they're entering a path or an alias, we should check against both (e.g. they shouldn't be able to create an alias of "Admin" since path "admin" exists).
Is that the plan? If not, why not? And then what is the plan?
Comment #176
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commented@jhogdon, I mostly agree except saving and rendering the path they created as mixed case may be problematic if we want to do a case insensitive match. Let me see if I can make it work. But on that point we may still need to force lower case. I don't think in Drupal 7 you would generally be able to create mixed case paths (as opposed to aliases)?
Comment #177
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedSo basically I think to do what jhogdon thinks is correct we'd need to rewrite the route compiler to be our own custom code, maybe documenting that we don't support host patterns or any partial variables.
Is that worth the effort?
Comment #178
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedHere's a patch starting to convert alias storage to use LIKE queries. We have to use a select object to force the ILIKE in pgsql, I think.
Comment #179
pwolanin CreditAttribution: pwolanin as a volunteer commentedRemove upgrade-related test code and revert lowercase calls in other tests.
If we go for rewriting our route compiler we'll need to significantly expand \Drupal\Tests\Core\Routing\RouteCompilerTest
Comment #184
jhodgdonI tested in Drupal 7:
- In Views you can definitely assign a mixed-case path to a page view display. For instance I made one with path "TeStInG".
- If you also in the same view add a menu item to the display, the menu item URL points to the mixed-case URL you designated.
- You can access the URL via paths testing, TESTING, etc. --- all go to the same page.
- After doing this, the alias page let me create an alias of 'testing'. I put one in with a path of 'node' and an alias of 'testing', and now both TeStInG and testing go to node instead of my view, so that isn't great.
- On the aliases page, I tried to add 2 aliases, one with alias "test1" and one with alias "TEST1". When I tried to save the second one, it said "TEST1" was in use.
- On the aliases page, I made an alias "AbCdE" for a node I had on the site. The "view" link for that node is now "AbCdE" so it respects my mixed-case alias, but "abcde" also will get me there.
So. I think D7 basically works like I suggested in #175.
The only difference is that Views doesn't seem to care/notice if I assign a page view a path that already exists, or that matches an alias that already exists. And alias creation checks for existing aliases that match (using case-insensitive match), but not for existing paths.
Comment #185
catchThere's an issue for this somewhere, will try to find later. Shouldn' affect anything here though.
Comment #186
pwolanin CreditAttribution: pwolanin as a volunteer commentedThere is an issue about taking over paths with aliases - that's a different problem.
Basically - I don't have much time for this now to release, so we might be able to fix up the code with forced lower-case paths to rtbc again, but supporting mixed case paths would need a couple other people to join in on working on the code and rewriting the RouteCompiler logic.
Comment #187
attiks CreditAttribution: attiks at Attiks commented#184
If I understand correctly this is by design at least for views, it allows you to replace core paths with views, might be less of an issue in D8
Comment #188
jhodgdonTaking all paths and aliases that people enter and saving them as lower-case just seems wrong... especially if D7 didn't do this.
Comment #189
attiks CreditAttribution: attiks at Attiks commented#188 I agree a 100%, we should not change it, it will break contrib modules like for instance shurly
Comment #190
pwolanin CreditAttribution: pwolanin as a volunteer commented@jhogdon - please look at the last patch. It's now leaving aliases alone, but forcing actual route paths to be lower case.
Comment #191
kgoel CreditAttribution: kgoel at Forum One commentedComment #192
pwolanin CreditAttribution: pwolanin as a volunteer commented@kgoel - this doesn't seem like the right fix - we are trying to make path aliases work in a case insensitive way at the DB query level.
Comment #193
pwolanin CreditAttribution: pwolanin as a volunteer commentedThe real problem was the changes in the base test code in core/modules/simpletest/src/AssertContentTrait.php where the path was forced to lower case. Not sure that should ever have been in the patch.
This interdiff is WRT #179
Comment #194
Crell CreditAttribution: Crell at Palantir.net commentedFrom #177:
Ak, no. :-(
Didn't we move away from LIKE in queries because they're unindexed?
Comment #196
pwolanin CreditAttribution: pwolanin as a volunteer commented@Crell - not sure what you are saying "no" to. We already don't support partial path part variables, and I don't see much value to host tokens
Comment #197
pwolanin CreditAttribution: pwolanin as a volunteer commentedFix that failure by reverting the change to that test w.r.t. 8.0.x
Comment #198
Crell CreditAttribution: Crell as a volunteer commentedPeter and I discussed further in IRC. To expand on my previous comment:
AFAIK we do support host-based routing. We got that as part of the Symfony router import. If we somehow managed to break it along the way, that's a problem with our test coverage. We should add test coverage to verify, and if it's not working now that's a bug we should fix. The same is true of scheme routing. The whole point of adopting a Symfony-based router was to support routing by arbitrary parts of the request, not just the path. Regressing on that now seems short-sighted, wasteful, and ill-advised.
If we're going to explicitly say that paths and aliases are all case-insensitive, then I don't see why it's a problem to fold their case. The net result is the same. Going to foo.com/About will still work as expected, but generated links to it will lower-case, which is more consistent anyway.
I don't see how modules like Shurly would be impacted. If path routing is case insensitive, then such modules could not have both /AbC and /abc anyway. And such modules would be issuing redirects, probably directly from the routing system (just create a route and wire it straight to a redirecting controller), so... I don't see why that's even relevant.
Comment #199
catchWe moved away from LOWER() queries because they're unindexed, to LIKE.
Comment #200
attiks CreditAttribution: attiks at Attiks commented##19 I took shurly as an example, there might be other modules, honestly don't know.
As you said, shurly in Drupal 8 should indeed work directly on the router level, but they also need to be able to output a case sensitive URL, otherwise it will no longer work.
I checked the D6 code and it uses a binary comparison to resolve the short url, so definitely case sensitive, it might be that it has changed in the D7 version, but not sure. I know shurly is an extreme example, because it uses hook_boot to bail out asap.
#194
+1 if we can avoid the use of LIKE
It might have been addressed but the password reset links are case sensitive as well in D7.
Comment #201
pwolanin CreditAttribution: pwolanin as a volunteer commented@Crell - matching HTTP verb and scheme (http vs https) are not part of the hostname match, they are handled independently.
It doesn't seem like Symfony really handles query string variables, so I'm not sure following their whole flow is really giving us what we want in the end.
@attiks please read the thread - a lot of the trouble here is that variables in the path (like for password reset links) are case sensitive, even if we don't want the path itself to be.
Also - path aliases as of the latest patches continue to be generated with mixed (original) case, so you can alias your lower-case route path to a mixed case one if needed.
Comment #202
attiks CreditAttribution: attiks at Attiks commented#201 I knew I should have read everything first, sorry for the confusion.
If I understand correctly the path you provide for a view will always be lowercase, so you need to create an alias if you want casing in the path?
Comment #203
pwolanin CreditAttribution: pwolanin as a volunteer commented@attiks - yes with the current patch, which is less work that rewriting more of the internals at this point.
Comment #204
attiks CreditAttribution: attiks at Attiks commented#203 Thanks, works for me if it makes life easier.
I had a quick look at the patch, small remark/question
respects -> aspects
should this be $compiledRoute?
Comment #206
pwolanin CreditAttribution: pwolanin as a volunteer commentedSplit aliases into a sub issue: #2584243: Make Drupal handle path aliases in a consistent and case-insensitive fashion on all database drivers
Here's hopefully just the routing parts and related test fixes. Might still be some unneeded changes left.
Comment #207
catchExcuse the db_select() in here (both its use and the resulting query), but wanted to see what happened combining making the route lookup use LIKE alongside #57.
admin/confiG loads for me, see what the bot says.
Don't love using an OR query, but this is a small change if it actually works.
Comment #208
catchNow that path aliases are split out, retitling.
Comment #209
catchSlightly cleaned up.
Comment #212
catchs/collection/connection/g catch--
Comment #213
pwolanin CreditAttribution: pwolanin as a volunteer commentedI think this is logically wrong:
There is no guarantee that the case insensitive match is the same as unicode lower case, which is why I tried to take it out of the patch,
Comment #214
Crell CreditAttribution: Crell as a volunteer commentedcatch: Huh. If I follow correctly, you're proposing "SELECT ... WHERE case-sensitive-check OR case-insensitive-check", for performance? I'm open to that if it works/is fast.
Comment #216
catchNot quite, for admin/config it looks like:
WHERE foo LIKE 'admin/config' OR foo LIKE 'admin/%' or foo LIKE 'admin' or whatever the candidates are.
It's just a way to get a db-agnostic case-insensitive equivalent of an IN() query.
@pwolanin patch is not remotely passing so something is wrong, but it looks like overriding matchCollection() ought to give us enough control here?
I was trying to figure out what taking over RouteCompiler would entail, got to the query, then to matchCollection(), then to kgoel's patch.
Comment #217
pwolanin CreditAttribution: pwolanin as a volunteer commented@cath - there is absolutely no reason to change the query. You can lower-case the string going into the query. That's already in the patch. you could also lower-case the pattern in the DB you are matching against without changing the route path that's rendered. That's also easy.
The hard part remains correctly doing the final UrlMatch part.
Comment #218
Wim LeersI just had an idea that I have not at all fully thought through. But it might make things a lot simpler, so I just wanted to mention it.
Can we use an inbound path processor to solve this?
Comment #219
pwolanin CreditAttribution: pwolanin as a volunteer commented@Wim Leers - no, because we need to preserve the case of any path portions matching variable.
Comment #220
Wim LeersRight!
Ah well :) It was worth calling out — I totally forgot we had inbound path processors :)
Comment #221
pwolanin CreditAttribution: pwolanin as a volunteer commentedOk, starting over - here are parts just related to making the DB part of the match case insensitive. The main change from before is the strtolower in \Drupal\Core\Routing\RouteCompiler::getPatternOutline() instead of actually forcing the route paths to lowercase when discovering them.
At this point this shouldn't have any obvious effect since the regex match in UrlMatcher is still case sensitive. so "needs review" mostly is for people interested to look at this subset of changes to understand them.
As above, while using the 'i' modifier on the regex might usually work for the remaining part, I think it's technically incorrect as a fix for that part.
Comment #222
dawehnerWhat about adding just the documentation for now, so people at least not rely on the behaviour?
Comment #223
Wim Leers#222++
Should we RTBC here and mark it rc eligible, then go back to the old goal? Or should we move #222 into a separate issue?
Comment #224
Wim Leers11:05:35 <dawehner> WimLeers: regarding https://www.drupal.org/node/2075889#comment-10580284 I don't care at all, nor do I think anyone does?
Okay then. Let's do this. Changed the tags around temporarily.
Comment #228
catch@pwolanin I'm not sure #213 is enough of a problem to make that approach unviable - we can have test coverage for several different languages, and if we eventually found out one particular route definition that's not ascii doesn't lowercase the same then that's very minor compared to the status quo.
Comment #232
pwolanin CreditAttribution: pwolanin as a volunteer commented@catch - I really prefer to have code that's correct as opposed to code that might usually work, and other times give you weird bugs.
Comment #233
catchThe problem with the current patch though is that rather than just case-insensitivity, it's removing case-awareness altogether - that's not necessarily 'correct' - pretty sure Drupal 7 routes are case-aware and case-insensitive.
Asking for a second opinion.
Comment #240
pwolanin CreditAttribution: pwolanin as a volunteer commentedNo, the earlier patch was doing that, but the direction I want to go in now would leave case awareness (when rendered)
Comment #241
dawehnerWhy did we not just committed the documentation?`
Comment #242
catchCommitted/pushed the docs change only to 8.1.x and cherry-picked to 8.0.x
Comment #245
andypostAccording https://tools.ietf.org/html/rfc3986 6.2.2.1. Case Normalization
So would be great to apply normalization when URLs are rendered
Comment #247
mikeryanGiven that there appears to be no actionable migration system task here, does this still need to be tagged "Migrate critical"?
Comment #248
catchNo I think D8 upgrade path is enough here.
Comment #252
dkre CreditAttribution: dkre commentedWhat's the current status of this issue?
I'm seeing 404s on views urls in Drupal 8.2.3.
Eg. With a views page url of: /visit/things-to-do ~ "/Visit/Things-to-do" will return a 404.
Node urls seem to be uneffected.
The pickle of this issue is that it's really only really discoverable once you're in production. Old urls or external links start cluttering up your 404 reports with no way practical way to resolve it.
Comment #253
drupalninja99 CreditAttribution: drupalninja99 commentedI am confused as well, did this get backed out in another ticket?
Comment #254
DamienMcKennaSo what I think a consensus is that.. we need more tests to fully document what functionality we want to support? :)
Comment #255
DamienMcKennaSplitting this into a separate issue would be completely redundant because this issue included commits that specifically reference this issue, so then we'd need to do another commit that changed the URL... so lets write some tests that clearly define what we want Drupal core to support, and then fix any bugs that prevent it from working that way.
Comment #256
dkre CreditAttribution: dkre commentedI'm just looking for the same expected behavior from Drupal 7, this seems to be the description of the issue. This is a clear regression.
As a site builder or developer should I need to be subscribed to changelogs to catch these instances in core? This isn't noted as a known issue in the release notes.
Comment #257
DamienMcKenna@dkre: Change notices are meant for intentional changes, this is a bug thus doesn't get a change notice record.
Comment #258
dkre CreditAttribution: dkre commentedI do understand that, I was speaking broadly about how as a site builder or developer familiar with D7 would discover these sorts of changes. I started working with 7 not long after it's release so I'm unfamiliar with what happened during the D6 - D7 transition but I'm just surprised these issues are documented in a way which is practical and highly visible given the technical debt they create.
Comment #259
mpdonadioI think the general problem is that the patch we committed on this issue was a doc-only one.
Here is a simple, functional test that shows the problem, both with normal paths, and ones from views. I suspect other patches on this issue have better testing, but it will take a while to go through them.
Comment #261
pwolanin CreditAttribution: pwolanin as a volunteer commentedHere's a straight re-roll of #221 against 8.3.x
Comment #262
mpdonadioLet's see if anything catches on fire.
Comment #263
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedOk, so this merges back in some of the previous work I did. A big difference is that the route path is not modified in the database, but the pattern outline is.
This means a route defined with a mixed-case path will render that way. I think that was a key sticking point for catch earlier. It also means we do not depend in any way on the string lowering properties of the SQL (or other) back-end, but just on their ability to match unicode strings 1:1.
Needs to be combined with tests, but first seeing if it broke something else.
Overall, there is a small amount of useful code we are getting out of \Symfony\Component\Routing\RouteCompiler, but on set it feels like we are jumping hoops just to say we are using those components. I am especially not sure I believe Crell's assertion that we do (or want to) support host-based routing.
Comment #264
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedAdds test from #259 (passes locally)
We should test more paths per method there, and/or supplement this with a kernel test since we don't necessarily need a full browser test.
Comment #267
dawehnerWhy is this needed to solve this issue? For me this seems to be a different issue and has nothing to do with the lowercase support
Mh, so this method is called also by
getRoutesByPatternma
, which could then cause issues with base64 encoded information?I'm not sure this is the right place for doing those ... Shouldn't we change the path somehow also for all the route filters? They are executed before this method is called ...
For even better test coverage it would be great to have a path with a placeholder, ideally even with a uppercase parameter passed in
Comment #268
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedOk, digging into this - it turns out the tokens from the compiled route are used by \Drupal\Core\Routing\UrlGenerator::doGenerate to render the path, so the trick of compiling the route with a lower-cased path won't simply work.
Probably should just rewrite the rest of the route compiler to utilize the simpler Drupal routing rules and build the token with the original path and regex with lower-case path.
@dawehner - the purpose of
public static function getMappedPathVariables()
is to restore the original raw (mixed case) incoming path for any portions of the path corresponding to variables.Comment #269
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedSpent a lot of time this evening rewriting the route compiler to handle the path part in a simplified Drupal way while preserving the original case in the tokens and producing a lower case regex. Made progress, but not passing obvious tests still locally.
A less pretty alternative would be to lowercase the regex and then fix the broken regex that result.
Basically the regex looks like:
'#^/foo/(?P<bar>[^/]++)$#s'
so lowercasing it is broken - you'd just have to replace(?p<
with(?P<
Comment #270
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commented@dawehner - maintaining a mapping of path variables with path part as index is needed to get the correct raw variable later. That's happening here just inside the compiler.
This patch seems to be passing tests now and has a unit test starting to verify the we get the expected compiled result for the paths we want to handle.
Also removes the lowercasing in a place where as you note it might break views.
Still more work to do if we go this direction (how much support for host matching and variables?)
Putting to NR to run the tests.
I can see some possible ways to simplify the code.
Comment #271
dawehner@pwolanin
Do you want to comment your result of your hard work, or basically why you think its not worth to do it? I think if needed we could just mark this issue as closed (won't fix) or so and bite the bullet.
Comment #272
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commented@dawehhner - I can pull together pieces here that will fix the issue 99% of the time - the fix just won't be "correct" in the sense that for some PHP configurations it may not work for all non-ASCII paths correctly.
It's better do do that than nothing, just frustrating that the correct fix requires rewriting symfony code (in part because it's marked private).
Comment #273
dawehnerIt's better do do that than nothing, just frustrating that the correct fix requires rewriting symfony code (in part because it's marked private).
Well, the question is whether we could improve some of that stuff upstream somehow. Maybe make it case insensitive optionally or so.
Comment #274
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedok, so this is the imperfect but working option. It combines the patches in #259 (test only) and #261 plus some additional fixes. Those fixes are what's in the interdiff.
Comment #275
mpdonadioWorking on some additional functional test coverage. Will post what I have at end of night.
Comment #276
mpdonadioFew more functional tests.
Comment #277
mpdonadioThese should really be $this->assertSession()->pageTextMatches and pageTextNotMatches. The regex is b/c pageTextContains() and pageTextNotContains() are case insensitive.
Comment #279
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedThe test fails seem sporadic and unrelated (alternating between a page cache test fail and OutsideInBlockFormTest fail), but maybe there is a subtle bug.
This patch adds to the test to check more examples including UTF-8 in the route path.
I combined the initial very short test methods so it runs faster for local iteration (~1 min instead of 2+), also fixed missing content type so the node can be viewed.
Also updated the issue summary to reflect the current patch.
QUESTION:
looking at https://github.com/symfony/symfony/pull/297/commits is it worth having an option to flag case sensitivity on? I'm not sure I see a use case, but it's a trivial feature to add.
Comment #280
mpdonadioRead #279 in place. Nice test changes. Think we have good coverage now.
While an upstream fix would be ideal, I think this option will address immediate needs, help anyone who has been holding back on Drupal 8, and is ready for final review.
Comment #284
mpdonadioI only added some test coverage, and not any of the actual change. I think this is ready.
Comment #285
dawehnerTo be honest I would prefer to see the test coverage inside
\Drupal\KernelTests\Core\Routing\RouteProviderTest
as this is the main point where we have our routing test coverage.Comment #286
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commented@dawehner - we could add more tests in kernel test, but mpdonadio and I both think a real end-to-end test including all the path processors is needed.
Comment #287
dawehnerWell, then we should have both.
Comment #288
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedMakes a small code comment change requested in person by xjm and adds a kernel test method reviewed in person by mpdonadio
Comment #289
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commented:( #2849669: Fix \Drupal\Component\Utility\Unicode() because of the Symfony mbstring polyfill
Comment #290
alexpottThese changes look good but out-of-scope as they are not necessary to make the fix. Also it results in us running the same preg_split() here and in ::gretRoutesByPath() - I think we should file a followup and see if we can make it more performant.
Good to see test coverage the slugs case is not affected - so if the controller is case-sensitive everything will work.
Comment #291
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedok, this rips out the extra bug fix to normalize repeated slashes.
spin that off to issue: #2852361: Ignore repeated slashes in the incoming path like Drupal <= 7
Comment #292
dawehnerI'm quite pleased to see how few lines of actual bug fixing is involved.
Comment #293
alexpottFixing the issue title to conform to the scope of the fix.
Comment #295
xjmJust to follow up on #294, the 'rc eligible' tag was removed on purpose; it is only for:
This is not one of those things. Reference: https://www.drupal.org/core/d8-allowed-changes#rc
Comment #296
alexpottDiscussed with @xjm and @catch.
@xjm raised the point about existing sites where they have made
/MYPATH
and/mypath
different. Is this possible on MySQL - what happens and what is the impact of this patch? Is there existing test coverage? Should we have an update function that detects duplicate routes due to case sensitivity and does something - or a warning on the status report?Also @catch pointed out that @dkre was correct in #256 - we should have had a CR for this for D7 to D8. Now we need a change record to tell developers and site owners about the changed behaviour.
Comment #297
xjmWe also support PostgreSQL which I think would respect the case-sensitivity, so that should also be tested.
Comment #298
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedback to NR for people to check this change record draft: https://www.drupal.org/node/2852554
If it's satisfactory, please remove "Needs change record" tag
@mpdonadio and I discussed the duplicate route issue in person at NJ camp - the possibility of having multiple routes with the same path was supposed to be an important feature we got from Symfony routing, so it's something that already could happen. Adding test coverage around that might be a good idea, but seemed out of scope for this issue.
Comment #299
mpdonadioRead the CR, started to make some edits, and then decided that it is good.
The patch is green against all supported databases. I also looked at the test coverage again, and think we are OK.
I think we are back to RTBC with this unless we explicitly want to do the ideas from #296:
- Existing sites where they have made /MYPATH and /mypath different. Is this possible on MySQL - what happens and what is the impact of this patch? Is there existing test coverage?
- Should we have an update function that detects duplicate routes due to case sensitivity and does something - or a warning on the status report
?
Comment #300
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedSite owners could already have created odd problems by having several routes with the exact same path, so I am not in favor of trying to babysit this situation or adding a warning.
We don't, for example, warn you if you create a path alias that will block an existing route path from being accessed.
Comment #301
alexpottAnd so we allow people to break their sites when they least expect it.
Is there explicit test coverage of routes with the same path (other than case) in core. If not we should add some because having test coverage of this means that we are finally proving our routing system is database agnostic which is an improvement on D7 and HEAD. Given that we don't have any test fails on Postgres I doubt this coverage exists.
Comment #302
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commented@alexpott - there is not any test coverage of routes with duplicate path that I could find, and I'm a little frustrated since that seems out of scope for this fix. But, sure.
This adds a couple test cases to verify that we get a non-error response even when there are multiple route with the same path (modulo case), and that all are found by the route provider.
Comment #303
mpdonadioI read increment-2075889-302.txt several times.
I think RouterTest::testDuplicateRoutePaths() addresses the concerns end-to-end with different routes with the same path.
I also think that RouteProviderTest:: testDuplicateRoutePaths() addresses the concerns about building the collection when there are different routes with the same path.
I think we are back to RTBC unless I am not interpreting what was requested in #301.
Comment #304
alexpottThe test added in #302 is not actually proving or testing which route is responding nor is it testing the change brought by making paths case insensitive for duplicate routes. Patch attached adds more commentary to the test and importantly asserts against the actual route doing the responding.
Should be the actual route - missing the number.
This path needs to match exactly the path case of router_test.three_duplicate3 otherwise we're just testing case insensitivity again. The purpose of the test is to test how the router system responds now that paths that were once considered different and it had different matches for are now not.
Comment #306
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedAfter discussing with alexpott in IRC, he pointed out that we are sorting the route collection in PHP in \Drupal\Core\Routing\RouteProvider::routeProviderRouteCompare().
I and mpdonadio didn't see that yesterday, so in fact we should have a deterministic ordering of the route collection regardless of the database back-end.
This makes a small test tweak and expands several code comments to make that fact more clear.
Note that the route name column in the {router} table is ascii not utf-8, so using strcmp() to compare the route names is correct.
Comment #307
YesCT CreditAttribution: YesCT as a volunteer commentedAdding test runs for different databases is a good idea. Thanks.
A (final?) tests only patch might help assure us about the test coverage.
I looked at the most recent feedback, and the two interdiffs since the last rtbc. And I think the changes address that concern. Back to RTBC.
Comment #308
alexpottCommitted 77a2431 and pushed to 8.4.x. Thanks!
I've run the tests a few times locally and have seen them fail plenty. Nice to see this finally resolved and in the best way possible.
Comment #310
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedThanks! I think this was targeted to also go in 8.3.x also as a major bug fix. The Change Record draft indicates 8.3.x, and should be published once it's there.
Comment #312
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedNot sure why the bot is confused - maybe trying to apply the already committed patch?
The last patch applies cleanly to 8.3.x (and so should be able to be cherry-picked). Just to verify I've re-rolled by applying #306 to 8.3.x and made a patch file from that.
Comment #313
xjmThis is not the question. The question is not whether they could have odd problems. The question is whether existing sites that have already such duplicate-but-different-case paths will have a change in behavior as a result of updating from 8.2.x. Whatever their odd problems might be, they have presumably set things up with the workarounds they need. If upgrading changes the content that is served on a path that would be really bad. So that's what we need confirmation of for this issue. And I think @alexpott's test-only patch up there may demonstrate that there is a behavior change for such sites -- precisely because the test-only patch fails and the combined patch passes.
So we need confirmation of what happens when upgrading from case-sensitive to case-insensitive, and if it's changing what content is served at some path alias, then we need a solution for that and definitely can't put it into 8.3.x in that scenario.
Comment #314
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commented@xjm - many more sites migrating from D7 will be broken the longer we wait to get this out.
Do you want an update hook that emits warnings? A requirements warning?
Any of these things are possible if you think they are important enough, or we can do them as a follow-up after this goes in. I prefer the latter, since I think the major bug fix is many times more important to fix than these edge cases.
Comment #315
xjmThanks @pwolanin.
We don't support BC with D7, but we do with 8.2.x. It's unfortunately much worse to break compatibility between stable D8 minors than to let an already-existing problem for the alpha-stability D7 to D8 migration. You're much more likely to miss this in testing a minor update than a major one, since minor updates are supposed to be BC and minimally disruptive.
For the same reason, we can't defer addressing BC/minor upgrade path issues to followups.
I totally agree that the behavior in HEAD is a problem/antipatten and that we should do what we can to fix it as soon as we can. But not disrupting existing sites that are already on 8.2.x has to come first.
I'd really like to see some manual testing of the actual upgrade path for a site that has managed to have both /MYPATH and /mypath defined to different routes because of this bug in 8.0 and 8.1 and 8.2. We could start with alexpott's updated test scenario and try it all ways around with the alphabetical order of route names and so forth. We also need to make sure that testing happens on a case-sensitive environment. Basically we should try as hard as possible to BREAK a site with this update, because then that allows us the mitigate such breakage and ship mitigations with the fix.
Even if sites that managed to exploit this "feature"/bug in previous releases have some edge case weird stuff going on, unexpectedly changing what is served at a given path, even only in some edgecase, is a really bad regression.
Also (I just thought of this) what about a site/code that is restricting access to some path based on the path? They shouldn't do that; it's super fragile. But what if they did?
Comment #316
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commented@xjm - I think a requirement warning or error might be the best path forward, since I don't think we can fix the problem without human judgement, just alert if it exists (or is created in the future)
So it seems there are a few cases to warn or error on:
Comment #317
mpdonadioFor 8.3.x, can we split the difference and
- Create a config variable system.routing.bc_path_sensitive in system.schema.yml
- Default it to FALSE (meaning a new file system.settings.yml ?)
- Set it to TRUE in a system_update_83XX
- Use this to define the functionality in the router
This way we preserve case sensitivity in existing installs, but new installs get the expected behavior. Existing users who want case insensitivity can do a config override in settings.php
That should be *way* less work than other options.
?
Comment #318
alexpottTesting some interesting configurations leads me to suspect that this fix might be the start of fixing a bigger bug. Chrome on OS X has some really really weird behaviour with case sensitive paths and access content.
Here's my steps-to-reproduce using the standard install profile:
1. Create a view on nodes using the views wizard with the path
admin/contENT
2. Whilst logged in visit admin/content - I see the admin node listing
3. Try to visit admin/contENT by just changing the url in bar. Chrome will autocomplete and match to admin/content so pressing return will visit admin/content
4. If you paste the full url in the address (in my case
http://drupal8alt.dev/admin/contENT
) and hit return you'll get to the expected page5. Now for magic... put the carat at the end of the address bar and press return -> I'm taken to admin/content (wtf)
My point being is that if you are really on paths of different cases doing different things you'll probably be having quite few problems already.
FWIW Firefox does not behave like this at all - ie. it doesn't futz with what the user enters in the URL bar.
Comment #319
xjmThanks @alexpott. What about a scenario where the site has defined both
/mypath
and/MYPATH
(i.e. nothing that core already defines), possibly on purpose?One can stop Chrome from autocompleting by typing over and pressing delete or escape, but the autocomplete behavior is annoying even for non-case-magic aliases. So while that's further evidence that the HEAD behavior is abnormal for the internet, I don't think it's enough to say "don't worry about this".
That is... pretty special.
Comment #320
xjmOh re: #317:
Unfortunately I think that runs into a lot of the exact same problems as #2751325: All serialized values are strings, should be integers/booleans when appropriate is discussing in terms of long-term compatibility. (These two issues are actually kind of similar as they both deal with changes to the valid data type/format for an external request. I think this issue is much WTFier and so less likely to be leveraged as a "feature" out there on existing sites, but on the other hand it affects a far wider audience; thence care is needed in both cases.)
Comment #321
alexpott@xjm the /mypath and /MYPATH would have exactly the same problems in chrome and whilst it is possible that some has used the API this way. It is totally not an intended API and given the weight of the 1 million Drupal 7 sites that we want to upgrade to Drupal 8 with the least possible problems far outweighs unintended use of the API that we have no evidence for. Whilst we have plenty of evidence that people will link using whatever case they happen to have had caps lock stuck on. It's not that we shouldn't worry about this it is that we should worry about the Drupal 7 to 8 migration more.
All that said, we could implement a more complex solution. We could prioritise exact matching over case insensitive matching. This will add complexity and brittleness.
Also all my testing has revealed an interesting side effect of this change which deserves some thought. The page cache CID is still case sensitive and it caches by the user entered path so if you have a route that responds to /test and you make requests for /test and /TEST you'll get:
In the 8.3.x the /TEST would be a 404 but now in 8.4.x it is a 200 and so would tEst, tesT etc... in the past these would have been 400s and clearable separately. This means that potentially this opens 8.4.x up to worse cache filling attacks. We have this with the 404s atm but it is tricker to manage with 200s.
Comment #322
BerdirA bit weird that it is cached separately but I don't see this as a problem.
The problem with 4xx is that we don't know what will be there, but in your case, both TEST and test will have the same cache tags and will be guaranteed to be invalidated properly and consistently. We have no per-url or prefix or anything like that invalidation API for page cache.
And if you want to fill up the page cache, then all you need is ?i=$i, there is no limit there and this is not making it worse.
Comment #323
xjmAgain, I don't really care about Chrome, I mean what happens on update for those sites?
Comment #324
xjmIndeed.
Comment #325
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commented@alexpott - yes, the CID is still case sensitive. The problem is that we don't know at that point which parts of the path are e.g. base64 encoded data that needs to be preserved and which parts are not, so we'd have to revive earlier and more complex solutions in this thread if we want to reconstruct a path where the variable pats are untouched but the slugs are left in the incoming case.
You seemed to prefer splitting off things like that to other issues (like the duplicate slashes) in any case.
@xjm if you previously defined both /mypath and /MYPATH as different routes - before upgrade they'd resolve to the 2 different routes, after they would both resolve to one of the two (the one with the 1st sorting route name) assuming that they don't have some requirement that distinguishes them (for example a regex requiring the path of /MYPATH to be upper case).
This might be worth a requirements warning, I agree with alexpott that providing the expected behavior for upgrade D7 sites is a much more urgent priority.
Comment #326
xjmThanks @pwolanin. This is the disruption I want to mitigate, even if sites that got there are goofy anyway.
I don't agree with that though. As I said, providing as much BC as possible with 8.2.x is a higher priority. People upgrading from D7 are still blocked on migration and contrib and everything else, whereas this fix can also disrupt both people who already fixed it after their D7 upgrade and had capslock-related regressions in the other direction, or people who started from D8. Edit: Also as I said they're likely to test their major update much more thoroughly than the minor one, because they already have to build a fresh site to migrate into, test the migration which has other big gaps than this, rewrite their theme, etc.
Thanks @pwolanin and @alexpott for continuing to discuss this. This is definitely a major-major bug; I just want to make sure we balance fixing the bug/restoring continuity with D7 with making our minor updates as safe as possible.
Comment #327
xjmAlso re: requirements warning that's a tricky solution, because (as @catch and @alexpott and I have discussed in the course of several issues both now and for experimental modules, etc.) we don't have very good possibility for "choose-your-own-adventure" upgrade paths in patch/minor/etc. releases.
Comment #328
alexpottHere's an approach that is BC compatible and does not lose the static prefix optimisation either. There's one fail in the RouterTest that I'm not sure of why.
The approach basically tries case sensitive routing and if that does not work tries doing a case insensitive match. This means that if a site does have two routes with the same path except for case we still respect this. We could remove this BC layer in Drupal 9 and only support case insensitive matching.
Also looking at the code even closer I think:
This change has performance implications that should have been measured. Changing this to empty string makes routing skip an optimisation added by Symfony.
Given the situation this issue is in is quite rare, I'm going to attach the patch here and set the branch back to 8.4.x to get opinions on this approach. Obviously this patch builds upon the tests and work of #312.
I think the new approach might give us the best of all worlds :)
Comment #329
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedInteresting idea. We should not check the static prefix since that's duplicating the SQL match. See earlier patches and comments.
I think the change to getRoutesByPath() is wrong and will break the case-insensitive behavior on sqlite and pgsql.
I think this change is also wrong:
Those 2 pieces are matched to make the pattern outline be consistently lower case.
Comment #330
mpdonadioThis should show what #328 would look like if we didn't commit #306.
Comment #331
alexpottThanks @mpdonadio - that's really helpful - so people can see the test coverage.
I see the point about getRoutesByPath() we need this to produce the a case insensitive full list before doing the case sensitive match (and then case insensitive match if required). Also yep the static prefix is covered by the query. I've added more commentary. The patch attached should fix the postgres issues and have the full BC layer.
Comment #332
alexpottI considered using SQL's
LOWER()
in the query in\Drupal\Core\Routing\RouteProvider::getRoutesByPath()
. @pwolanin was very against this because he asserted that it made the routing system not database agnostic. All the tests passed on both postgres and mysql locally - so it wouldn't affect supported databases. The advantage of doing this is that all the case changing occurs in the same method. The theoretical disadvantage of this is potential differences in how PHP and different SQLs implementing lowering for UTF8 characters. (I believe we'd have quite a few other problems if this was really true for the out-of-the-box supported databases.) Given this discussion, I moved the lower casing from the RouteCompiler to CompiledRoute just in case anyone is mad enough to use CompiledRoute outside of the RouteCompiler. At least it'll be somewhat consistent.Comment #333
alexpottIn #328 I said I had a fail in RouterTest - that was because of #2854817: Duplicate X-Content-Type-Options headers both with the value nosniff which is not related to this.
Comment #334
catchWe can use LIKE for case-insensitive cross-database queries since it's mapped to ILIKE on postgres. LOWER() on the other hand can't use indexes and we spent literally years deciding on a way to stop using it for that reason. I'm not sure if this is a better solution than what's in the patch, but it's an option.
Comment #335
alexpottIndexing - that's a great point. At the moment we take advantage of the
KEY `pattern_outline_parts` (`pattern_outline`(191),`number_parts`)
on the router table by using the IN. Therefore the solution to not use LOWER or LIKEs with ORs is the best. Thanks @catch and @pwolanin for convincing me :)Comment #336
catchOK so that makes sense, but I'm not sure the complexity of the bc layer for duplicate routes is worth it considering as far as we know it's a very theoretical break. Type http://GoOgLe.com into a browser and it works fine, path aliases are already case insensitive and have been for years - all these examples and more have been given in this issue, which should have been a quick fix for a regression (or really, a revert of the original patch that broke this) at the time.
Comment #337
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commented@catch - I was fine with the theoretical break, but xjm made the argument that if on upgrade to 8.3 your path or aliases started delivering different content you'd have some right to be upset and it's not something we can readily fix. So, I think this is a reasonable compromise since if you normally make everything lower case, the overhead basically nothing, and the fallback costs you an extra preg_match or a few at most.
Some nits:
Please put back this indentation fix
Please put back this doc fix
I don't think we usually use underscore method naming now in core? I see a lot more that are named like
doMatchCollection()
(maybe a dawehner pattern)Comment #338
alexpottWe can do a little less process the next time around if routes are rejected from reasons other than case sensitivity if they match the regex.
fixed all the nits from #337.
Comment #339
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedLooks like good changes.
We could probably omit the
$routes->count() > 0
check since the foreach is basically doing the same, but it's not too critical either way.Comment #340
xjmFew minor docs notes, one question, one proposed outfollowup.
case-sensitive.
No hyphen.
Optimize. Also, remove also.
Whether, case-sensitive.
These comments don't follow our coding standards, but they're copied from the Symfony class. Not sure how we handle that in general, but we do update the docblock at least.
So, I can see that these routes are already defined for the router test, but maybe a followup to give them meaningful names, as it's difficult to tell what we are actually trying to test otherwise.
Huh, so this is interesting. I did not realize this was in HEAD. So the route compiler was trying and failing before to support case-insensitive matching?
Comment #341
xjmIt looks like there was some documentation added in a previous commit on this issue that claims the behavior is case-insensitive already, followed by "@todo make this actually true":
From
core/lib/Drupal/Core/Routing/routing.api.php
.Comment #342
xjmAh, I see that #341 was actually the result of the past commit on this issue which is still in HEAD and not reverted, so this will be a third commit on a 300+ comment issue. Carrying on.
Comment #343
alexpott@xjm re #340.7 - we never reverted the commit in #309 - so atm 8.4.x is case sensitive.
Looking #341 - makes me think hmmm about all the work to provide a BC layer since the commit. We did a commit back in #243explicitly to say the path should be treated as case-insensitive. I'd forgotten about that. Also we really ought to have created a new issue after that. This said we did make that commit during the release of 8.1.0 not 8.0.0 so there is still an argument that the bc layer is valid.
Too many commits on this issue - ho hum.
Patch attached addresses #340 and #341. I've also change case insensitive because https://en.wikipedia.org/wiki/Case_sensitivity says
(Amusingly the article is also inconsistent about case sensitive or case-sensitive. Our code base is too and so is PHPs probably everyone's is)
Re #340.5 Fixed to obey our coding standards.
Re #340.6 I've changed the route names and paths to include the words "case sensitive"
Also attaching a complete patch so it is easy to see the new work plus #309.
Comment #344
xjmI tested installing 8.3.x and upgrading (with #338, not #341). Based on that testing, I do still think the BC layer is a good idea. While this text is indeed found in one bit of disconnected documentation buried within a long topic in core, it can have actual disruptive consequences without the latest patch.
Comment #345
xjm+1 for this added documentation. I have a suggestion to clarify it a little; will try to post a patch shortly.
Comment #346
xjmShould we deprecate the case-sensitive fallback for Drupal 9? Let's discuss that in a followup maybe?
Comment #347
xjmAttached clarifies the order for how route paths are matched, and explains in more detail why they should be avoided.
Comment #348
dawehnerAs another followup I'm wondering whether the parent class is actually still needed
I had no idea that we also support this feature, but yeah I guess we also cannot drop that ever.
Also another follow up, these kind of stuff should be already checked by route filters, isn't it?
Here is a question: Should we somehow test the potential problem that the order of entries in the routing.yml files matter?
Comment #349
alexpottre #348.1 - there's 8 methods not overridden so I'm not sure that we should.
re #348.2 and 3 - indeed there's probably a few things like this in the routing system.
re #348.4...
Fortunately I don't think that they do - see
\Drupal\Core\Routing\RouteProvider::getRoutesByPath()
And then the sort does:
And routes can't have the same name - that's a bug and not supported. And also not part of this issue - this is about routes having the same path.
Comment #350
xjmExtra space at the beginning of this line somehow; can be fixed on commit.
Comment #351
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedNice addition to the api docs - should probably copy that to the change record also.
Comment #352
xjmFor #348 1-3, I was also struggling with how much to preserve the integrity of the original method, minus what we're overriding. I'm not sure how to handle that problem.
This also reminded me to check whether we ever actually tried to discuss this upstream. In #3 alexpott referenced this previous Symfony PR about it from 2011:
https://github.com/symfony/symfony/pull/297
A few solutions similar to ours, but different, are also proposed at:
http://stackoverflow.com/questions/17538890/how-can-i-make-routing-case-...
And we've done lots of research on this issue that shows that both web clients and the internet generally (in addition to people) do not reliably behave in a case-sensitive way for paths. So I wonder if it's worth reopening the discussion with Symfony six years after the original PR. The original PR was a behavior change, but maybe an API addition to opt in to case-insensitive routing for paths would be considered instead. Because surely we're not the only project to run into this issue (and the comments spanning 5 years on those links also suggest that).
We should not block this issue on any of that; I think the current approach is what we should have done in the first place and backportable to 8.3.x given how problematic this issue is for users and implementations (as well as D6/7 migrations).
For #348.4. @dawehner wrote:
@alexpott replied:
I did not read @dawehner's comment the same way. The order of the route definitions does matter, because the previous solution and 8.4.x HEAD picks the first case-insensitive match, for whatever "first" is. That's what I thought @dawehner was getting at, at least, and maybe proposing we extend the test coverage for it. See illustration in #344. Even if we read code and think "looks like we don't need to worry about that potential problem," we should still also test those potential problems explicitly. For me, the test coverage looked sufficient to this bug, but maybe @dawehner can clarify.
Sounds also though like we have requested a few followups:
Finally, I really think we should have reverted the 8.4.x patch and then worked on a combined patch for this followup for my BC concern. (Actually, I should have reverted the patch, but I did not do so because I wanted to avoid an argument with other committers and was stressed about the pressure to commit and backport the previous patch, so that's partly my fault for not just overcoming that and doing my job.) However, in order to backport this, I'd like to revert the previous 8.4.x commit and recommit a combined patch as I feel should have happened to begin with, rather than cherry-picking two separate commits a month apart. Would others be okay with that now?
Comment #353
xjmOh and #351 also proposes updating the change record. I think that's a good idea; the CR needs to be updated for the current approach anyway, and its title could be better (we don't need to be defensive in the title about the fact that we're treating it as a bug; the title should just communicate succinctly that we are restoring support for the D7 default case-insensitive behavior, and now while also retaining BC for the 8.0.x-8.2.x behavior).
It's actually NW for CR updates I think. Thanks!
Comment #355
alexpottRe performance testing: given there are no additional file reads or database queries or cache gets I agree with #337
+1 for reverting. It'll make the complete solution easier to grok. Given I committed the patch to be reverted I guess I should do that. Done. New full patch + fix #350
Comment #356
alexpottUpdated the CR using @xjm's text and suggestions and fixed a typo in the docs.
Comment #357
xjmThanks @alexpott. I made a few small edits to the CR. It looks good to me now.
Sneaky space is still there, still fixable on commit.
Comment #358
himanshu-dixit CreditAttribution: himanshu-dixit as a volunteer commentedHere is the patch.
Comment #359
xjmSince this has BC and resolves the minor-version-update-changes-my-content bug/BC break I was worried about, I'm also comfortable backporting this now in order to minimize future disruptions. I know @catch is as well. So explicitly moving to 8.3.x, and tagging RC deadline for visibility.
Comment #360
catchYeah the fallback only happens in the case of what would be a 404 now, so I don't think it's worth testing. I'd be concerned if we were doing two queries (query cache issues etc.) but we're not.
Comment #361
xjmFor the other followups in #352, now excluding the first one about profiling given #360 and #337.
Comment #363
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedThis method doesn't actually throw exceptions, so these @throws should be removed
This method doesn't actually throw exceptions, so these @throws should be removed
Comment #364
pwolanin CreditAttribution: pwolanin as a volunteer and at SciShield commentedWith doxygen fixes
Comment #365
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedPretty sure there's an extra 'w' in there :)
Comment #366
xjmSo I think we just need to address #352 (edit: and that extra 'w'), and then do a final review. Today is the last day to backport this fix for RC.
Comment #367
himanshu-dixit CreditAttribution: himanshu-dixit as a volunteer commentedFixing #365 in the new patch,
Comment #368
dawehnerGiven that the order of stuff in yml files is a potential pre-existing problem, I think this totally doesn't belong here.
Comment #369
catchI opened #2856571: Decide whether to deprecate case sensitive route matching for 9.x.
Is .routing.yml anything specific to routing or an issue with YAML/the YAML parser itself? Not sure really what to write for that or the other follow-ups.
Comment #372
catchCommitted/pushed to 8.4.x and cherry-picked to 8.3.x. Thanks!
Marking this fixed, I'll try to speak to xjm later about any further follow-ups that are necessary. For me I personally do not have the energy for an upstream discussion about this, if someone else does please open an issue and link from here though.
Comment #373
xjm#2856571: Decide whether to deprecate case sensitive route matching for 9.x is the key one I think; it'd make sense to discuss whether to open upstream discussion there as well, since that will affect our D9 implementation.
Comment #375
xjmAlso tagging for the release notes as agreed. :) And I added my thoughts about upstream to #2856571: Decide whether to deprecate case sensitive route matching for 9.x.
For the remaining followups, anyone who's interested in filing an issue for #352 points 2 and 3 is encouraged to, but those two are not in scope for the definition of "done" here. :)
Thanks everyone!
Comment #377
himanshu-dixit CreditAttribution: himanshu-dixit as a volunteer and at Google Summer of Code commentedAttributing the contribution to GSoC.