Splitting off part of the work from #2075889: Make Drupal handle incoming paths in a case-insensitive fashion for routing

Problem/Motivation

Whilst working on #1987898: Convert user_view_page() to a new style controller and #2075889: Make Drupal handle incoming paths in a case-insensitive fashion for routing we note that path aliases handling will be case insensitive on mysql but case sensitive on pgsql and others.

So on some sites:

/Path/Alias and /path/alias will resolve the same, on other sites one will be a 404

Proposed resolution

we agreed to fix this by keeping most of the existing behavior from Drupal 7.

/Path/Alias- not 404. Same as /path/alias on all supported databases

Remaining tasks

User interface changes

none

API changes

Change the query to the select builder so we take advantage of the ILIKE on pgsql

Data model changes

none

Beta phase evaluation

Reference: https://www.drupal.org/core/beta-changes
Issue category Bug because links in D6/D7 sites can break during an upgrade to D8.
Issue priority Major because previously working links can break during a D8 upgrade. Not critical because:
  • There are workarounds (e.g. .htaccess rules).
  • It can't actually result in any data integrity or security issues, just a whole lot of embarrassing broken links.
  • We might be able to fix it in a patch release safely (but better before release).

See #23 for more details.

Disruption Disruptive for sites running on beta versions that added case-sensitive aliases in Drupal 6-7 or in Drupal 8.
CommentFileSizeAuthor
#53 2584243-53.patch20.59 KBstefan.r
#53 interdiff-50-53.patch2.92 KBstefan.r
#50 increment.txt4.24 KBpwolanin
#50 2584243-50.patch21.04 KBpwolanin
#42 2584243-42.patch21.33 KBstefan.r
#42 interdiff-38-42.txt3.49 KBstefan.r
#38 2584243-38.patch20.83 KBstefan.r
#38 interdiff-35-38.txt10.05 KBstefan.r
#36 interdiff.txt14.68 KBstefan.r
#36 2584243-35.patch18.67 KBstefan.r
#35 2584243-35.patch18.67 KBstefan.r
#35 interdiff.txt14.68 KBstefan.r
#33 interdiff.txt958 bytesdawehner
#33 2584243-33.patch13.79 KBdawehner
#30 2584243-30.patch13.79 KBdawehner
#30 interdiff.txt6.49 KBdawehner
#16 interdiff.txt999 byteskgoel
#16 2584243-16.patch7.77 KBkgoel
#12 increment.txt6.86 KBpwolanin
#12 2584243-12.patch7.04 KBpwolanin
#4 2584243-increment.txt479 bytespwolanin
#4 2584243-4.patch8.15 KBpwolanin
#3 2584243-2-test-only.patch1.01 KBpwolanin
#2 2584243-2.patch8.39 KBpwolanin
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

pwolanin created an issue. See original summary.

pwolanin’s picture

This has just has the 3 hunks I think are needed from the other patch https://www.drupal.org/files/issues/2075889-197_0.patch

pwolanin’s picture

pwolanin’s picture

YesCT’s picture

+++ b/core/lib/Drupal/Core/Path/AliasStorageInterface.php
@@ -135,8 +135,9 @@ public function languageAliasExists();
-   * @param string[]|null $keys
-   *   (optional) Search keys.
+   * @param string|null $keys
+   *   (optional) Search keyword that may include one or more '*' as a wildcard
+   *   value.

is this correcting incorrect docs, or is it changing the param?

pwolanin’s picture

@YesCt - it's fixing the docs since the code actually just takes a string, not an array of strings.

dawehner’s picture

+++ b/core/lib/Drupal/Core/Path/AliasStorage.php
@@ -140,19 +141,22 @@ public function preloadPathAlias($preloaded, $langcode) {
+    $select = $this->connection->select('url_alias');
+    $select->fields('url_alias', ['source', 'alias']);
+    $select->condition('source', $preloaded, 'IN');
     if ($langcode == LanguageInterface::LANGCODE_NOT_SPECIFIED) {
-      // Prevent PDO from complaining about a token the query doesn't use.
+      // Don't put the same value in the IN query twice.
       unset($args[':langcode']);
-      $result = $this->connection->query('SELECT source, alias FROM {url_alias} WHERE source IN ( :system[] ) AND langcode = :langcode_undetermined ORDER BY pid ASC', $args);
     }
     elseif ($langcode < LanguageInterface::LANGCODE_NOT_SPECIFIED) {
-      $result = $this->connection->query('SELECT source, alias FROM {url_alias} WHERE source IN ( :system[] ) AND langcode IN (:langcode, :langcode_undetermined) ORDER BY langcode ASC, pid ASC', $args);
+      $select->orderBy('langcode', 'ASC');
     }
     else {
-      $result = $this->connection->query('SELECT source, alias FROM {url_alias} WHERE source IN ( :system[] ) AND langcode IN (:langcode, :langcode_undetermined) ORDER BY langcode DESC, pid ASC', $args);
+      $select->orderBy('langcode', 'DESC');
     }
-
-    return $result->fetchAllKeyed();
+    $select->condition('langcode', $args, 'IN');
+    $select->orderBy('pid', 'ASC');
+    return $select->execute()->fetchAllKeyed();

@@ -160,23 +164,26 @@ public function preloadPathAlias($preloaded, $langcode) {
     );
     // See the queries above.
+    $select = $this->connection->select('url_alias');
+    $select->fields('url_alias', ['alias']);
     if ($langcode == LanguageInterface::LANGCODE_NOT_SPECIFIED) {

So i'm wondering whether we really need to switch from simple queries to an object. Is this a requirement to fix the issue?

Crell’s picture

Another alternative is to just have separate MySQL, Postgres, and SQLite implementations of the AliasStorage class, each optimized accordingly. That's fully supported in core and something we should take more advantage of. :-)

kgoel’s picture

Another alternative is to just have separate MySQL, Postgres, and SQLite implementations of the AliasStorage class, each optimized accordingly. That's fully supported in core and something we should take more advantage of. :-)

+1
I like the idea of separate implementation. It's sounds very straight forward but don't know how easy/difficult it would be to implement this.

pwolanin’s picture

Status: Needs review » Needs work

@dawehner - I thought we needed to, but maybe not - it seems the pgsql driver also replaces queries in strings.

\Drupal\Core\Database\Driver\pgsql\Connection::prepareQuery()

    return parent::prepareQuery(preg_replace('/ ([^ ]+) +(I*LIKE|NOT +I*LIKE) /i', ' ${1}::text ${2} ', $query));

So you are right, we could just make them LIKE instead of select objects

Crell’s picture

kgoel: Quite simple: https://www.palantir.net/blog/d8ftw-customizing-your-back-end

(That's assuming the challenge is cross-DB SQL. If that's not the issue that won't help.)

pwolanin’s picture

I don't think it's worth the complexity when we already have a way to make it DB agnostic using LIKE and it's already got the backend_overridable tag.

Should probably include an alias in the test that has _ and/or % and one that does not that would match?

This puts pack the simple query() calls but escapes all the values for LIKE which was missing before.

catch’s picture

Status: Needs review » Needs work

Looks like real fails on postgres and sqlite.

pwolanin’s picture

Hmm, should I not have escaped the values?

effulgentsia’s picture

Priority: Major » Critical

I discussed this with the other committers, and we decided to raise this to Critical for now.

Because, the upgrade path might require data loss on case-sensitive databases such as Sqlite. For example, on Sqlite, you can in both 7.x and 8.0.0-rc2 create "foo" and "Foo" as aliases to different paths. If we fix this issue in a way that starts disallowing that, then figuring out how to migrate such 7.x databases will be Migrate's responsibility, which is still flagged as an experimental module, so that's ok. But updating existing 8.x databases will be hook_update_N()'s responsibility. And making such an update function delete semi-duplicate-but-not-really aliases might be more palatable during RC than after people deploy sites on 8.0.0. For example, a similar kind of deletion of same-except-for-case records in #1260938: d6 to d7 update fails on file duplicates '#7061 Integrity constraint violation' ended up taking a very long time to land.

That said, if we don't solve this relatively quickly, we may choose to downgrade it rather than hold up 8.0.0's release, but that's a discussion/decision for a later date. Hopefully, it won't come to that.

kgoel’s picture

dawehner’s picture

+++ b/core/modules/path/src/Tests/PathAliasTest.php
@@ -82,6 +83,13 @@ function testAdminAlias() {
+    // Confirm that the alias works in a case-insensitive way.
+    $this->drupalGet(Unicode::strtolower($edit['alias']));
+    $this->assertText($node1->label(), 'Alias works lower case.');
+    $this->assertResponse(200);
+    $this->drupalGet(Unicode::strtoupper($edit['alias']));
+    $this->assertText($node1->label(), 'Alias works upper case.');
+    $this->assertResponse(200);

@@ -93,7 +101,7 @@ function testAdminAlias() {
     // Confirm that the alias works.
-    $this->drupalGet($edit['alias']);
+    $this->drupalGet(Unicode::strtolower($edit['alias']));
     $this->assertText($node1->label(), 'Changed alias works.');
     $this->assertResponse(200);
 
@@ -225,7 +233,7 @@ function testNodeAlias() {

@@ -225,7 +233,7 @@ function testNodeAlias() {
     $this->drupalPostForm('node/' . $node1->id() . '/edit', $edit, t('Save'));
 
     // Confirm that the alias works.
-    $this->drupalGet($edit['path[0][alias]']);
+    $this->drupalGet(Unicode::strtolower($edit['path[0][alias]']));
     $this->assertText($node1->label(), 'Changed alias works.');

Just a general comment, it would be nice to also test all the various changed methods like save() / delete() etc ... lookupPathSource and aliasExists

catch’s picture

Assigned: catch » Unassigned

Doesn't need to be assigned to me.

dawehner’s picture

Just to clarify my previous comment, the amount of additional tests does not really match yet the amount of changed functionality.

catch’s picture

Do we actually need the case-insensitivity when looking up alias from source? That's not used for incoming paths.

dawehner’s picture

This is a valid question.

public function aliasExists($alias, $langcode, $source = NULL) {
+ // Use LIKE and NOT LIKE for case-insensitive matching.
$query = $this->connection->select('url_alias')
- ->condition('alias', $alias)
+ ->condition('alias', $this->connection->escapeLike($alias), 'LIKE')
->condition('langcode', $langcode);
if (!empty($source)) {
- $query->condition('source', $source, '<>');
+ $query->condition('source', $this->connection->escapeLike($source), 'NOT LIKE');
}

So yeah that example is certainly not needed ... ideally we would distinct between what you use on a more storage API level
and what you use on runtime, which should have the case insensitive behaviour.

pwolanin’s picture

Status: Needs review » Needs work

I'm not sure I understand. IF you don't make aliasExists behave consistently with the other function, that will lead to inconsistent behavior in the UI. That function is used in form validation.

Needs work sicne clearly it's still failing on postgres and sqlite

catch’s picture

The alias lookup needs to be case insensitive, but the source lookup implies we're storing case insensitive source. Do we actually allow people to enter the source path in random case in 7.x? If we do then maybe that's needed.

pwolanin’s picture

Oh, hmm - well we plan to handle routing in a case-insensitive way, then I think source lookup should be case insensitive?

mradcliffe’s picture

We allow to add aliases in mix case in 6.x.

catch’s picture

catch’s picture

Issue tags: +minor version target

Tagging minor version target again, just to make it clear we can't fix this in a patch release.

pwolanin’s picture

@catch - we are not talking to the routing system at all here.

dawehner’s picture

Issue tags: +D8 Accelerate

Looking into the issue for a while

dawehner’s picture

Status: Needs work » Needs review
FileSize
6.49 KB
13.79 KB

I think its fine to convert all API functions, its more consistent, though a better help text for the validation would be nice
Also adding documentation about the behaviour isn't a bad idea, as well as some general tests.

Status: Needs review » Needs work

The last submitted patch, 30: 2584243-30.patch, failed testing.

swentel’s picture

+++ b/core/modules/path/src/Form/PathFormBase.php
@@ -180,8 +180,16 @@ public function validateForm(array &$form, FormStateInterface $form_state) {
+        $form_state->setErrorByName('alias', t('There is an alias with other case-sensitivity: %actual_alias is already in use in this language.', ['%actual_alias' => $stored_alias['alias']]));

Hmm, this sounds a bit verbose, and maybe a bit unclear for some people ? Don't have a better proposal atm though.

dawehner’s picture

Hmm, this sounds a bit verbose, and maybe a bit unclear for some people ? Don't have a better proposal atm though.

Yeah I certainly don't think its at all a good sentence, maybe someone has a better idea?

Fixed the failure.

stefan.r’s picture

Reviewing this and having a look at the sqlite/pgsql test failures

stefan.r’s picture

stefan.r’s picture

Some small changes, test coverage and an attempt at fixing the SQLite/PostgreSQL test failure.

I changed the db_query into a dynamic query as SQLite requires an ESCAPE '\' after the LIKE, which dynamic queries do automatically. I tried to hardcode the the ESCAPE '/' into the db_query but didn't succeed as I kept getting an unknown token error. We don't do LIKE x ESCAPE y anywhere else in core either so the dynamic query may be the best way to go?

I also suspect SQLite has a collation issue as it can't find the upper-case version of some non-ASCII characters (#1518506: Normalize how case sensitivity is handled across database engines) so skipping those in the test when using SQLite for now...

Also what about preloadPathAlias, we skip the case insensitivity there?

swentel’s picture

+++ b/core/lib/Drupal/Core/Path/AliasStorage.php
@@ -15,8 +15,9 @@
+ * fields, so the the aliases '/test-alias' and '/test-Alias' are considered

the the

The form error sounds a lot clearer to me already!

stefan.r’s picture

This adds case insensitivity to preloadPathAlias (do we need it though?), and adds a todo pointing to a followup I just opened at #2607432: SQLite driver does not allow for case insensitive LIKE comparisons on non-ASCII characters. It also cleans up the dynamic queries a bit.

I just noticed we already tried and then reverted using the query builder in #12 - does the approach in this patch pose a significant performance slowdown? We could go back to hardcoded SQL queries but SQLite has a problem with wildcards in the LIKE there (% and _ need to be escaped, which might require unnecessary changes), and in #12 we said doing database-specific implementations was not worth the complexity.

mradcliffe’s picture

+++ b/core/lib/Drupal/Core/Path/AliasStorage.php
@@ -128,90 +144,103 @@ public function delete($conditions) {
     if ($langcode == LanguageInterface::LANGCODE_NOT_SPECIFIED) {
...
+      $select->condition('langcode', LanguageInterface::LANGCODE_NOT_SPECIFIED);
...
     elseif ($langcode < LanguageInterface::LANGCODE_NOT_SPECIFIED) {
...
     if ($langcode == LanguageInterface::LANGCODE_NOT_SPECIFIED) {
...
     elseif ($langcode > LanguageInterface::LANGCODE_NOT_SPECIFIED) {
...
     if ($langcode == LanguageInterface::LANGCODE_NOT_SPECIFIED) {
...
     elseif ($langcode > LanguageInterface::LANGCODE_NOT_SPECIFIED) {

Should we use $langcode_undetermined instead of LanguageInterface::LANGCODE_NOT_SPECIFIED for consistency?

stefan.r’s picture

@mradcliffe the original code didn't do so either despite having the variable available and I wanted to avoid unrelated changes, but other than that I don't see why not...

Status: Needs review » Needs work

The last submitted patch, 38: 2584243-38.patch, failed testing.

stefan.r’s picture

Status: Needs review » Needs work

The last submitted patch, 42: 2584243-42.patch, failed testing.

stefan.r’s picture

Status: Needs work » Needs review

The last submitted patch, 30: 2584243-30.patch, failed testing.

The last submitted patch, 38: 2584243-38.patch, failed testing.

dawehner’s picture

+++ b/core/lib/Drupal/Core/Path/AliasStorage.php
@@ -128,90 +145,106 @@ public function delete($conditions) {
    */
   public function preloadPathAlias($preloaded, $langcode) {
...
    */
   public function lookupPathSource($path, $langcode) {

So yeah just those 2 queries potentially runs often, on the other hand due to our caching, I guess we need to care less about it

stefan.r’s picture

@dawehner the difference between the dynamic query in lookupPathSource vs the db_query is about .3 ms on average on my slow laptop.

For the routing we only do the database query on the first page visit, right? Most pages have a few calls to Url::fromUri() as well, which goes through PathValidator::getPathAttributes() and then does a PathProcessorAlias::getPathByAlias(). So altogether I see some 3-4 calls to these newly introduced dynamic queries per page on average.

As to how to proceed here while still fixing SQLite, we could either use this patch, take the ~1ms performance hit and optimize in another release, or alternatively we could find a way to make LIKE x ESCAPE y work in a db_query.

pwolanin’s picture

If we cared most about performance, we'd have used the original approach of forcing aliases to lower case.

I had a version of this earlier on using dynamic queries that radically cut the different, confusing, conditions and repeated code. See patch at #4. Can we go back to something like that plus the missing LIKE escaping?

pwolanin’s picture

Ok, applying that pattern. Also, the fixes all select objects to be $select, which is important to reduce confusion when looking at those methods.

stefan.r’s picture

Interdiff looks great, does anything still needs to happen to the current patch?

chx’s picture

Why on earth did we add local variables for constants??

    $langcode_undetermined = LanguageInterface::LANGCODE_NOT_SPECIFIED;

is this expected to change? Is there a tax on the number of characters? Is there some incredible PHP bug we are coding around? I am so baffled.

stefan.r’s picture

catch’s picture

This looks good to me. If we're really worried about the query building, then switching to database-specific implementations would do it, but then we should reconsider the LIKE/ILIKE optimization we have in the database layer, which is used for a lot of things other than path aliases like user names.

If I mark this RTBC I can't really commit it, but no complaints.

plach’s picture

Status: Needs review » Reviewed & tested by the community

If I mark this RTBC I can't really commit it, but no complaints.

Easily solved :)

As a side note, it seems we are applying the same logic multiple times, what about creating a follow-up to factor it out to an helper method and reuse it?

dawehner’s picture

Thank you peter and stefan

The last submitted patch, 53: interdiff-50-53.patch, failed testing.

The last submitted patch, 30: 2584243-30.patch, failed testing.

The last submitted patch, 38: 2584243-38.patch, failed testing.

The last submitted patch, 53: interdiff-50-53.patch, failed testing.

catch’s picture

Removing tags now this is both critical and RTBC.

catch’s picture

Status: Reviewed & tested by the community » Fixed

Committed/pushed to 8.0.x, thanks!

  • catch committed f6052ec on 8.0.x
    Issue #2584243 by stefan.r, pwolanin, dawehner, kgoel: Make Drupal...

  • catch committed f6052ec on 8.1.x
    Issue #2584243 by stefan.r, pwolanin, dawehner, kgoel: Make Drupal...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.