Problem/Motivation

Now, the punctuation action just have remove, replace by separator and do nothing.
But we have a case about '&' to 'and', means Replace by custom string, not separator.

Steps to reproduce

1) go to /admin/config/search/path/settings;

2) uncollapse "Punctuation" section and set:
- Ampersand (&) for "Replace by custom string" with "e";
- Less-than sign (<) for "Replace by custom string" with "lt";
- Greater-than sign (>) for "Replace by custom string" with "rt";

3) add a new pattern in /admin/config/search/path/patterns/add:
- Pattern type: Content
- Path pattern: articles/[node:title]
- Content type: Article

4) create the articles and check its paths:

article name: bom dia & cia
generated path: /articles/bom-dia-e-cia

article name: <span> yay > nay </span>
generated path: /articles/yay-rt-nay

article name: foo<input>< bar
generated path: /articles/foolt-bar

article name: foo<input><bar
generated path: /articles/fooltbar

Proposed resolution

Add a new action about 'Replace by custom string' to support it.

Remaining tasks

- Review.

User interface changes

Before:

Existing UI

After:

Proposed UI

API changes

Data model changes

Issue fork pathauto-3256303

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

Oscaner created an issue. See original summary.

Oscaner’s picture

StatusFileSize
new3.05 KB
Oscaner’s picture

StatusFileSize
new3.48 KB
new438 bytes

Fix schema errors.

andregp’s picture

I applied the patch #3 and ran some tests on my Drupal 9.4.x site and everything went fine.

Before patch:

After patch:

As you can see bellow the aliases were correctly created:
Path #3 working
Path #3 working
Path #3 working

+1 RTBC

andregp’s picture

Status: Active » Needs review
andregp’s picture

Status: Needs review » Reviewed & tested by the community

Sorry, changing to correct status

berdir’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs tests

Seems like a useful feature, but to be able to ensure this continues to work in the future, it needs tests.

aspilicious’s picture

Status: Needs work » Needs review
StatusFileSize
new3.49 KB

Reroll, needs this for a client who wants to replace '+' with 'plus'.

aspilicious’s picture

Status: Needs review » Needs work

Still needs work for the tests

jobsons’s picture

Assigned: Unassigned » jobsons

Hi, I will try to work on the tests!

jobsons’s picture

Assigned: jobsons » Unassigned
Issue tags: +Needs work
StatusFileSize
new26.5 KB

I worked on the tests, all the punctuation test are working fine but the less than (<) sign test. I also made a manual test of the custom replacement of the less than sign (<) and it's not working as it should, this is the reason I changed the issue to needs work. Here's the patch with the tests.

Wagnerarao made their first commit to this issue’s fork.

beatrizramos’s picture

Assigned: Unassigned » beatrizramos

I'll try to work on this issue

paulocs’s picture

Assigned: beatrizramos » Unassigned
NewZeal’s picture

This patch no longer works due to new 'replace with underscore' option in pathauto. When applied, the options that are meant for custom punctuation are displayed as 'replace with underscore'.

ngkoutsaik’s picture

I tested this patch and the code. The solution still works. If you set the separator to empty it works without any issues.

ngkoutsaik’s picture

Assigned: Unassigned » ngkoutsaik

I have found why the test fails. I am going to see if I can find a workaround to it

ngkoutsaik’s picture

Assigned: ngkoutsaik » Unassigned

I, unfortunately, don't have any more time to work on this. I have moved the patches to a branch. I refactored the tests to be in a loop, so it can be easier/automatic to test new symbols.

Regarding the failing tests. The issue is the last 4 elements. If you remove those the test passes. I checked with the debugger. It seems after less_than character the page is not found. That is because the alias is not generated.
The things I tried were cleaning the cache at every checkAlias. After that, I tried deleting the newly created node every time.
Last was add special character escaping to each and every character but that also failed.

Last note, the provided solution works well (without the tests in mind). Maybe someone with a better knowledge of the module will be luckier!

If I missed anything let me know.
Thanks.

pflora’s picture

Assigned: Unassigned » pflora

I'll see if i can come up with a fix for the tests!

pflora’s picture

Assigned: pflora » Unassigned
Status: Needs work » Needs review

I found the problem that caused the tests to fail. Inside the AliasCleaner.php, at line 237, the PlainTextOutput::renderFromHtml() method was removing HTML tags from the string, but not only the tags. Even if there is only a single "<", it was being removed from the original string (that is due to the strip_tags() method). Since the tests used only a single character ("<"), it was being removed and turned into "", and an empty string couldn't be used to generate the alias. I added the following check to ensure that there aren't HTML tags on the URL, and if that's the case, the module will replace the special characters inside the string. If there are HTML tags, then the mentioned method will remove them.

    // Remove all HTML tags from the string.
    $output = Html::decodeEntities($string);

    if (!preg_match('/[<].+?[>]/', $output)) {
      $output = strtr($output, $this->cleanStringCache['punctuation']);
    }
    $output = PlainTextOutput::renderFromHtml($output);

I have ran all the tests again and every test seems to be passing. I don't know if the regex that i came up with is the best solution for this problem, so feel free to give any feedback on how to improve it!

Status: Needs review » Needs work

The last submitted patch, 11: 3256303_11.patch, failed testing. View results

ngkoutsaik’s picture

Status: Needs work » Needs review

Hi,
I tested your solution and the tests pass. Additionally, I created some nodes and I tried some combinations like <span> yay > nay </span> for title. The alias was generated correctly.

I will move this to needs review, as I committed some work and cannot also review it.

mpaulo’s picture

Assigned: Unassigned » mpaulo

Reviewing this.

mpaulo’s picture

Status: Needs review » Needs work
Issue tags: -Needs work

Having an HTML tag on the node title won't make it fall into the special case (#20) to early replace the $output before calling PlainTextOutput::renderFromHtml($output);, which doesn't handle replacing partial or broken HTML code correctly.

Consider the following case, of enabling the replacement of the '<' char to 'lt', for a node with the title foo<input><bar:

- Expected: "fooltbar"
- Actual: "foo"

I think it's agreeable, that <bar shouldn't be identified as an HTML tag and be removed.

Also, on a sidenote, PlainTextOutput::renderFromHtml seems to doesn't consider a tag opener part of a tag if it precedes a space (e.g. foo<input>< bar results in "foolt-bar")

mpaulo’s picture

Assigned: mpaulo » Unassigned
mpaulo’s picture

Here is the issue that considered this sanitization in the first place. #167786

Maybe we can come up with a less strict HTML parser, with a regex (e.g. /<[^ ].*?>/), and ditch PlainTextOutput::renderFromHtml altogether, so it still solves the original problem, which seemed to be to increase the readability of URLs generated from tokens containing HTML.

gquisini’s picture

Assigned: Unassigned » gquisini

I'll try to work on this

gquisini’s picture

Assigned: gquisini » Unassigned
Status: Needs work » Needs review
StatusFileSize
new551 bytes
new119.27 KB
new114.57 KB
new115.62 KB
new144.83 KB

Using part of the suggestion made in comment #27, I have made a change that does not use PlainTextOutput::renderFromHtml to clear the tags.
Since I don't know if this is a "good" thing to do, I generated a patch so that it doesn't cause a problem in MR.

Steps:
- Install pathauto
- Switch to MR
- Apply the patch
- Go to /admin/config/search/path/patterns/add and create a path with the value "articles/[node:title]".
- Go to /admin/config/search/path/settings#edit-punctuation and replace "less-than/greater-than" with a custom string

Use cases:
- Create an article with title <span> yay > nay </span>
- Create an article with title foo<input>< bar
- Create an article with title foo<input><bar

I'll attach some screenshots with the manual tests I did.
Again, I don't know if this is a good thing to do, it's just a try.

sophiavs’s picture

Assigned: Unassigned » sophiavs

I'll do this review

sophiavs’s picture

Assigned: sophiavs » Unassigned

I think the method is ok and it worked fine on my local. For me is ok to pass to RTBC.

lucassc’s picture

Assigned: Unassigned » lucassc
lucassc’s picture

Assigned: lucassc » Unassigned

I think we can move forward with this approach since preg_replace with /<[^>]*>/ fits well in this case and works as expected, other than PlainTextOutput::renderFromHtml method which removes not only HTML tags but also anything that looks like this (e.g. <bar).

I applied @gquisini patch in the MR's branch, removed the checker in #20 and fix some CS problems introduced by the issue.

Automated tests are passing now and I tested it manually once more and everything works well. Steps to reproduce:
1) go to /admin/config/search/path/settings;

2) uncollapse "Punctuation" section and set:
- Ampersand (&) for "Replace by custom string" with "e";
- Less-than sign (<) for "Replace by custom string" with "lt";
- Greater-than sign (>) for "Replace by custom string" with "rt";

3) add a new pattern in /admin/config/search/path/patterns/add:
- Pattern type: Content
- Path pattern: articles/[node:title]
- Content type: Article

4) create the articles and check its paths:

article name: bom dia & cia
generated path: /articles/bom-dia-e-cia

article name: <span> yay > nay </span>
generated path: /articles/yay-rt-nay

article name: foo<input>< bar
generated path: /articles/foolt-bar

article name: foo<input><bar
generated path: /articles/fooltbar

Please review this properly, as I've committed some work and can't review.

lucassc’s picture

Issue summary: View changes

Added "Steps to reproduce" and "UI changes" in the IS.

lucassc’s picture

Issue summary: View changes
Issue tags: -Needs tests +Needs review

Formatting adjusted.

sophiavs’s picture

Assigned: Unassigned » sophiavs

I'll review

sophiavs’s picture

Assigned: sophiavs » Unassigned
Status: Needs review » Reviewed & tested by the community

I reproduce both tests commented on #29 and #33 and the path changed just as expected. I also tried with others pontuations and worked fine. Passing to rtbc.

mably made their first commit to this issue’s fork.

mably’s picture

Assigned: Unassigned » berdir
Issue tags: -

Code review of MR #3256303

The four commits add support for replacing punctuation characters with custom strings in URL aliases, instead of only being able to remove them, replace them with the separator, or leave them as-is.

Changes

PathautoGeneratorInterface — New constant PUNCTUATION_REPLACE_CUSTOM = 3.

AliasCleaner — Three changes:

  • New PUNCTUATION_REPLACE_CUSTOM case in cleanString() that looks up the custom replacement string from the punctuation_replace_custom.<name> config key.
  • The punctuation array is extracted into a reusable getPunctuationArray() method, used by both the cleaner and the tests/settings form.
  • HTML tag removal changed from PlainTextOutput::renderFromHtml() to a regex-based preg_replace('/<[^>]*>/', '', $output). This is necessary because the previous approach uses strip_tags() which consumes < and > characters even when they are not part of real HTML tags. The regex only strips complete tags, leaving standalone < or > characters intact for the custom replacement via strtr() that runs next.

PathautoSettingsForm — Adds a "Replace by custom string" option to each punctuation select, with a conditionally visible textfield (via #states) for entering the custom replacement. Empty values are filtered on save.

Config schema — New punctuation_replace_custom sequence of strings.

Tests:

  • Kernel test (testCleanStringPunctuationReplaceCustom): tests each punctuation character individually with a custom replacement string and verifies cleanString() applies it correctly.
  • Functional test (testPunctuationReplaceSettings): verifies the settings form saves and persists custom replacement values.