Currently in pathauto's alias cleaner it looks to avoid url paths with the function

public function cleanTokenValues(&$replacements, $data = array(), $options = array()) {
    foreach ($replacements as $token => $value) {
      // Only clean non-path tokens.
      if (!preg_match('/(path|alias|url|url-brief)\]$/', $token)) {
        $replacements[$token] = $this->cleanString($value, $options);
      }
    }
  }

however this preg_match is way too broad, if you create your own field named, for example, field_url, then cleanString will not be passed over it. If you create your own field called field_curling (and so on....) then it will also not work because 'curling' contains 'url'.

What are the actual fields that this preg_match is trying to prevent being cleaned? Would a pattern like '/(^path|^alias|^url|^url-brief)\]$/' be more appropriate?

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Tom.W created an issue. See original summary.

cburschka’s picture

Note: The "curling" field wouldn't break because the pattern looks for the \] at the end.

After looking at the other available tokens, it seems that the targeted tokens are ones like site:url, node:url, site:url-brief and site:login-url. The latter seems to be the only token matching url] but not :url], so it could be added explicitly.

Based on this and the fact that field machine names cannot contain :, I would suggest the following patch.

Status: Needs review » Needs work
cburschka’s picture

If the unit test is correct in wanting [array:join-path] to bypass cleaning, then we could look for [:-] instead of just :. This is still safe because field names also can't contain -.

However, imho this is pretty vague, and it would probably be a better idea to explicitly list the non-transliterable tokens instead of relying on magic strings.

Berdir’s picture

Yes, this is tricky. Thanks for working on this.

I've also seen problems related to doing something like ...:url:relative] when url isn't the last part but the thing that comes later should inherit this thing. So if we explicitly switch to :url, we could maybe look for :url at the end or :url: anywhere in the token?

cburschka’s picture

Status: Needs work » Needs review
FileSize
635 bytes

Would /:(alias|path|login-url|url|url-brief)(:|\]$)/ be a good idea? This matches tokens that have alias,path,login-url,url,url-brief anywhere in the name, delimited by :

Status: Needs review » Needs work

The last submitted patch, 6: 2773573-6.patch, failed testing.

cburschka’s picture

And this one updates the unit test to check :path instead of :join-path.

(It might be worth making this configurable - if not as a full regex, then at least as a list of names X,Y,Z that are matched as :(X|Y|Z): or :(X|Y|Z)\]$. But that's a bigger feature addition.)

cburschka’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 8: 2773573-8.patch, failed testing.

cburschka’s picture

Status: Needs work » Needs review
FileSize
645 bytes

Derp - I missed the fact that the join-path token isn't arbitrary but from tokens.inc. Including it in the regex now.

Berdir’s picture

Status: Needs review » Needs work
Issue tags: +Needs tests

Nice. It's good that the existing tests are passing but it would be really important to have some additional tests to ensure that additioanal things that we are now supporting also work.

We don't actually need to set up things like nodes to have real token replacements working, we can just create a fake $replacements array like in \Drupal\Tests\pathauto\Kernel\PathautoTokenTest::testPathautoTokens() with various examples, for example [node:field_url], [node:url], [node:url:relative] and so on and pass them to that function to make sure they work as expected.

Berdir’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests
FileSize
5.75 KB

Turns out we could use this to be configurable in a project, so using this to also add a setting as well as tests.

Status: Needs review » Needs work

The last submitted patch, 13: pathauto-path-safe-tokens-2773573-13.patch, failed testing. View results

Berdir’s picture

Status: Needs work » Needs review
FileSize
5.73 KB

Bad patch.

Berdir’s picture

another typo in update function name fixed.

The last submitted patch, 15: pathauto-path-safe-tokens-2773573-15.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Status: Needs review » Needs work

The last submitted patch, 16: pathauto-path-safe-tokens-2773573-16.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Berdir’s picture

Adding a space in UI and removing it again on save. We also have no tests yet for the UI, but tested in our project and it works.

Status: Needs review » Needs work

The last submitted patch, 19: pathauto-path-safe-tokens-2773573-17.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Primsi’s picture

Primsi’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 21: pathauto-path-safe-tokens-2773573-21.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Berdir’s picture

Fixed that new test.

  • Berdir committed 2705954 on 8.x-1.x
    Issue #2773573 by Berdir, cburschka, Primsi: AliasCleaner preg_match for...
Berdir’s picture

Status: Needs review » Fixed

Committed.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

karenann’s picture

I recently updated from pathauto_8.x-1.2 to 8.x-1.3.

I have a content type that uses the path pattern [node:menu-link:parents:join-path]/[node:title]

With this update, I noticed that my URLs were updating from, e.g., /community/campus-map/parking to /communitycampus-map/parking.

I tested with other pages and found the same thing - the slash between the first and second argument was being removed.

I went through the repository and found this commit:
https://cgit.drupalcode.org/pathauto/commit/src/AliasCleaner.php?id=2705...

diff --git a/src/AliasCleaner.php b/src/AliasCleaner.php
index 323a99a..376e22f 100644
--- a/src/AliasCleaner.php
+++ b/src/AliasCleaner.php
@@ -335,7 +335,9 @@ class AliasCleaner implements AliasCleanerInterface {
   public function cleanTokenValues(&$replacements, $data = array(), $options = array()) {
     foreach ($replacements as $token => $value) {
       // Only clean non-path tokens.
-      if (!preg_match('/(path|alias|url|url-brief)\]$/', $token)) {
+      $config = $this->configFactory->get('pathauto.settings');
+      $safe_tokens = implode('|', (array) $config->get('safe_tokens'));
+      if (!preg_match('/:(' . $safe_tokens . ')(:|\]$)/', $token)) {
         $replacements[$token] = $this->cleanString($value, $options);
       }
     }

When I reverted this single commit, my problem disappeared.

I was unsure if this would be best suited to be a part of this issue or if I should create a new issue. I figured that I'd start here and go from there.

Thanks! Karen

Berdir’s picture

See #3006242-4: First slash in token [term:parents:join-path] gets stripped for example. This added an update function to update configuration, you probably reverted that with a config import.

karenann’s picture

Thanks, Berdir. Following the instructions in the comment you linked me to solved my issue.

sker101’s picture

After updating the module using the update hook "pathauto_update_8107" provided in the patch, I'm seeing a duplicate value `alias` in the safe_token array.
Is the duplicated `alias` expected to be in the configuration?