Steps to reproduce:

  1. Configure the Pathauto pattern for nodes to use the node title as the slug ([node:title]).
  2. Create a new node with "<" in the title (test<nodetitle).
  3. Save the node

Expected result:

The "<" symbol in the title shouldn't cut off all the text that follows it in the slug. The symbol itself should only be excluded (replaced with the delimiter, etc.).
The main condition to reproduce is that '<' must be followed directly by letters, with no other symbols or spaces.

Actual result:

The "<" symbol in the title cuts off all the text that follows it in the path.

Comments

dstorozhuk created an issue. See original summary.

dstorozhuk’s picture

Wow. Something is wrong with Drupal itself. Compare this issue's input and output.
Drupal probably has the same issue, or Pathauto uses Drupal functionality that cuts off the text after "<".

dstorozhuk’s picture

Issue summary: View changes
lpsolit’s picture

Priority: Major » Normal
Status: Needs work » Closed (cannot reproduce)

I cannot reproduce your issue. Are you sure you configured Pathauto to remove < or to replace it with a separator? Check the list of special characters at admin/config/search/path/settings. The "<" character is near the end of the list.

dstorozhuk’s picture

@lpsolit, if you compare how the ticket description is displayed with the description text in the edit form, you will see that they differ.
See #2 in the steps to reproduce.
The text there is also cut after '<'.
The main condition to reproduce is that '<' must be followed directly by letters, with no other symbols or spaces.

It looks like an issue in a Drupal core text function which is also used in the pathauto module.

dstorozhuk’s picture

Issue summary: View changes
Status: Closed (cannot reproduce) » Needs work
dstorozhuk’s picture

@lpsolit, I reopened the issue, but if you still can't reproduce it, that is OK.

lpsolit’s picture

Ah, I added a space after '<' (foo < bar) which is why I couldn't reproduce. Sorry! If I type "foo<bar", only "foo" is returned.

The reason is that AliasCleaner::cleanString() wants to remove HTML tags from strings:

// Remove all HTML tags from the string.
$output = Html::decodeEntities($string);
$output = PlainTextOutput::renderFromHtml($output);

PlainTextOutput::renderFromHtml() is the one removing HTML tags so that you can safely use $output without risking XSS vulnerabilities. Imagine if someone types:

Foo <script>alert('bar')</script>

Then, without PlainTextOutput::renderFromHtml(), and if the admin didn't configure special characters to be removed, JS code would be injected into your page, which could lead to security issues. I'm not sure what to do in your specific case, but I would say that security matters more than the few cases where someone types <bar>. @Berdir: any idea?

lpsolit’s picture

And of course, the end of my first line was removed. :) I typed:

Sorry! If I type "foo<bar", only "foo" is returned.
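
For anyone who wants to see the truncation outside Drupal, here is a minimal plain-PHP sketch. It assumes the decode, strip_tags(), decode sequence described in this thread. The function name decode_strip_decode() is hypothetical, and html_entity_decode() is only a rough stand-in for Html::decodeEntities(), so treat it as an approximation rather than the module's actual code path.

```php
<?php
// Hypothetical stand-in for the cleaning step discussed in this issue.
// Assumption: html_entity_decode() approximates Html::decodeEntities().
function decode_strip_decode(string $string): string {
  $flags = ENT_QUOTES | ENT_HTML5;
  // First decode, as AliasCleaner::cleanString() does before rendering.
  $output = html_entity_decode($string, $flags, 'UTF-8');
  // strip_tags() treats "<" followed by a non-whitespace character as the
  // start of a tag and drops everything up to the matching ">", or up to
  // the end of the string when no ">" follows.
  $output = strip_tags($output);
  // Second decode, mirroring what renderFromHtml() is described as doing.
  return html_entity_decode($output, $flags, 'UTF-8');
}

$titles = [
  'foo<bar',
  'foo < bar',
  'test<nodetitle',
  "Foo <script>alert('bar')</script>",
];

foreach ($titles as $title) {
  // Print each input next to its cleaned form.
  echo var_export($title, TRUE), ' => ', var_export(decode_strip_decode($title), TRUE), "\n";
}
```

With a space after "<" the character survives this step, which matches why "foo < bar" did not reproduce the problem while "foo<bar" did.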

berdir’s picture

Wondering if that really makes sense, though. An alias is not HTML and it will never be executed as HTML. And we actually remove < characters anyway.

This is a direct port of the 7.x code:

// Remove all HTML tags from the string.
$output = strip_tags(decode_entities($string));

And it was added *a long* time ago in #167786: Strip HTML tags from raw tokens.

Happy to try and remove that and see what happens. Who knows how pathauto actually worked back then.

I guess one argument for this is that when a token is used that actually contains HTML, like rendered fields or so, then we want to strip that. Try with a [node:some_field] token, for example, pretty sure that will be a mess then.

berdir’s picture

Issue summary: View changes
dstorozhuk’s picture

Issue summary: View changes

@Berdir, are you talking about this piece of code in src/AliasCleaner.php:228?

    $output = Html::decodeEntities($string);
    $output = PlainTextOutput::renderFromHtml($output);

?

pflora’s picture

I've come across this problem while working on this issue.

My understanding of the logic used in AliasCleaner is that we call Html::decodeEntities($string) and then call PlainTextOutput::renderFromHtml(). But renderFromHtml() just calls Html::decodeEntities() again, passing "strip_tags((string) $string)" as the argument. So the problem here is with the strip_tags() function, which removes anything after a "<" character, preventing strings like "this

Regarding what @Berdir said in #10, I would like to avoid using PlainTextOutput::renderFromHtml(), or at least avoid using strip_tags(). Maybe we could use Html::escape()?
As for what @LpSolit mentioned in #8, are we only worried about the

tags? Because I think we could handle that with a simple regex (or maybe there is another simpler way).
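
A rough sketch of the regex idea, in plain PHP. The helper name remove_complete_tags() and the pattern are assumptions made for illustration only. This is not what Pathauto does and it is not a security filter. It just shows that a pattern can drop complete "<...>" spans while leaving a lone "<" in place for the punctuation settings to handle.

```php
<?php
/**
 * Hypothetical helper: removes only complete "<...>" tag-like spans.
 *
 * A "<" that is never closed by ">" is left untouched, so the text after it
 * is kept instead of being cut off.
 */
function remove_complete_tags(string $text): string {
  // Match "<", then any run of characters other than "<" or ">", then ">".
  // Fall back to the original text if preg_replace() reports an error.
  return preg_replace('/<[^<>]*>/', '', $text) ?? $text;
}

foreach (['foo<bar', "Foo <script>alert('bar')</script>"] as $title) {
  // Print each input next to the result of the hypothetical helper.
  echo var_export($title, TRUE), ' => ', var_export(remove_complete_tags($title), TRUE), "\n";
}
```

One caveat: a title such as "1 < 2 > 0" would still lose the "< 2 >" part, because the pattern cannot tell a comparison from a tag.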
mably’s picture

Status: Needs work » Closed (duplicate)

Closing as duplicate of #3256303.
