Follow-up to #2558791: "!"-prefixed tokens should Xss::filterAdmin() but not affect safeness

Problem/Motivation

A large part of the usage of !placeholders (and @placeholders) is for href attributes. This is not appropriate as we still want to filter bad protocols.

Proposed resolution

Introduction of a new ":placeholder" that always escapes and filters for bad protocols as well, for URL attributes such as "href" and "src". We will address non-URL attributes in #2570431: Document that certain (non-"href") attribute values in t() and SafeMarkup::format() are not supported and may be insecure

Remaining tasks

Review patch
Commit patch

User interface changes

N/A

API changes

Introduction of a new ":placeholder" specifically for URL attributes

CommentFileSizeAuthor
#150 add_a_new_placeholder-2565895-150.patch16.21 KBjcnventura
#150 interdiff-145-150.txt757 bytesjcnventura
#145 2565895-145.patch15.61 KBstefan.r
#145 interdiff-142-145.txt6.07 KBstefan.r
#144 2565895-142.patch15.29 KBstefan.r
#144 interdiff-138-142.txt2.04 KBstefan.r
#138 interdiff-135-138.txt763 bytesstefan.r
#138 add_a_new_placeholder-2565895-138.patch15.21 KBstefan.r
#137 interdiff-92-135.txt7.89 KBstefan.r
#135 interdiff.txt2.18 KBlauriii
#135 add_a_new_placeholder-2565895-135.patch15.21 KBlauriii
#134 add_a_new_placeholder-2565895-134.patch15.28 KBlauriii
#130 2565895-130.patch14.94 KBstefan.r
#130 interdiff-126-130.txt2.13 KBstefan.r
#126 2565895-126.patch14.99 KBstefan.r
#126 interdiff-109-126.txt3.99 KBstefan.r
#110 2565895-109.patch13.4 KBstefan.r
#110 interdiff-92-109.patch5.7 KBstefan.r
#92 interdiff.txt2.67 KBeffulgentsia
#92 2565895-92.patch11.72 KBeffulgentsia
#91 interdiff.txt3.02 KBdawehner
#91 2565895-91.patch12.53 KBdawehner
#89 interdiff.txt933 bytesdawehner
#89 2565895-89.patch12.43 KBdawehner
#86 interdiff.txt4.24 KBeffulgentsia
#86 2565895-86.patch11.52 KBeffulgentsia
#84 interdiff.txt4.67 KBeffulgentsia
#84 2565895-84.patch9.36 KBeffulgentsia
#80 interdiff.txt5.59 KBdawehner
#80 2565895-80.patch7.99 KBdawehner
#73 2565895-73.patch8.83 KBdawehner
#73 interdiff.txt2.3 KBdawehner
#69 interdiff.txt4.8 KBdawehner
#69 2565895-68.patch8.72 KBdawehner
#66 interdiff.txt1.52 KBgoogletorp
#66 2565895-66.patch6.83 KBgoogletorp
#64 interdiff.txt2.82 KBgoogletorp
#64 2565895-65.patch7.01 KBgoogletorp
#61 interdiff.txt987 bytesgoogletorp
#61 2565895-61.patch6.18 KBgoogletorp
#60 interdiff.txt1.52 KBdawehner
#60 2565895-60.patch6.17 KBdawehner
#54 interdiff.txt3.16 KBdawehner
#54 2565895-54.patch6.22 KBdawehner
#41 interdiff.txt2.25 KBlauriii
#41 add_a_new_placeholder-2565895-41.patch4.34 KBlauriii
#38 2565895-38.patch3.06 KBgoogletorp
#35 interdiff-23.txt3.5 KBgoogletorp
#35 2565895-35.patch3.5 KBgoogletorp
#29 2565895-29.patch75.45 KBgoogletorp
#26 interdiff.txt69.06 KBgoogletorp
#26 2565895-26.patch69.06 KBgoogletorp
#23 interdiff.txt69.06 KBgoogletorp
#23 2565895-23.patch75.45 KBgoogletorp
#22 interdiff.txt3.74 KBdawehner
#22 2565895-22.patch6.56 KBdawehner
#13 2565895-13.patch4.46 KBstefan.r
#13 interdiff-6-13.txt1.36 KBstefan.r
#12 2565895-12.patch1.43 KBstefan.r
#12 interdiff-6-12.txt4.04 KBstefan.r
#6 interdiff.txt1.61 KBdawehner
#6 2565895-6.patch4.45 KBdawehner
#4 2565895-4.patch2.84 KBdawehner
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

stefan.r created an issue. See original summary.

dawehner’s picture

If it is a different behaviour it should be named different, so let's try out ":"

dawehner’s picture

Assigned: Unassigned » dawehner

.

dawehner’s picture

Assigned: dawehner » Unassigned
Status: Active » Needs review
FileSize
2.84 KB

So something like this?

dawehner’s picture

Converted one usecase, could not find one which does pass user supplied content to t().

xjm’s picture

So these cover strings of URLs, but not objects.

Where does the URL generator fit in? Many of the usecases in core for this are (e.g.) URLs generated from routes. I'm concerned about further increasing the API surface area of URL handling as well as the surface area for t(). (Not saying we shouldn't go forward with this, but we should discuss.)

catch’s picture

Title: Change !placeholder to support URLs specifically, introduce a new placeholder or change behavior of @placeholders » Add a new :placeholder to SafeMarkup::format() for URLs that handles bad protocols

@dawehner install.core.inc has a couple of examples like this for user-supplied URLs.

if (!$translation_available) {
      $requirements['translation available'] = array(
        'title'       => t('Translation'),
        'value'       => t('The %language translation is not available.', array('%language' => $language)),
        'severity'    => REQUIREMENT_ERROR,
        'description' => t('The %language translation file is not available at the translation server. <a href="@url">Choose a different language</a> or select English and translate your website later.', array('%language' => $language, '@url' => UrlHelper::stripDangerousProtocols($_SERVER['SCRIPT_NAME']))),
      );
    }

@xjm we could choose to indicate for generated URLs somehow (similar to how we do with @) that the URL doesn't need protocol filtering, but otherwise I don't think there's much to do there? Also the protocol filtering can't do any harm there even if it runs all the time, it'd just be a no-op.

dawehner’s picture

So these cover strings of URLs, but not objects.

Where does the URL generator fit in? Many of the usecases in core for this are (e.g.) URLs generated from routes. I'm concerned about further increasing the API surface area of URL handling as well as the surface area for t(). (Not saying we shouldn't go forward with this, but we should discuss.)

While I totally agree it would be nice, I think expanding the API for URL objects feels out of scope of this particular issue. This issue is about taking care of the stripping,
not about making it easier to pass in URLs, not provided as strings into SafeMark::format(). This at least for me is an additional discussion on top of the placeholder.

dawehner’s picture

IMHO we want steps forward and having the placeholder stops the string changing. Anything afterwards can be done

stefan.r’s picture

#9 makes sense.. I had a look at Url object support, but that seems like a separate issue - opened #2567237: Url object support for SafeMarkup::format() placeholders in case we want to discuss that.

stefan.r’s picture

Merging in the non-Url object related changes, mostly we didn't need the extra Html:escape()

stefan.r’s picture

joelpittet’s picture

Converted one usecase, could not find one which does pass user supplied content to t().

It's quite common to put hard coded/developer supplied @!urls. How often are we going to run into user supplied values for URLs?

I'd bet most cases for this are very rare and the developer that does use it should sanitize their own variables for that rare use-case. As was done in the example @catch pointed out in #8

The vast majority of !url/@url placeholders we have in core are hardcoded URLs to doc references. This new API for :url on SafeMarkup::format() does not seem to do us any favours for those which don't need the help(the developer hardcoded URLs).

IMO, we don't need to do this and we shouldn't add features where the benefit is doesn't outweigh the cost. It may be worth doing a bit of an audit on D7 contrib and core to see how frequently this used for user-supplied placeholders before continuing driving this issue. @stefan.r gathered some here #2558791-89: "!"-prefixed tokens should Xss::filterAdmin() but not affect safeness so I'm going to crunch those and maybe eat my words later;)

The last submitted patch, 12: 2565895-12.patch, failed testing.

cilefen’s picture

Nice work!

  1. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -215,6 +220,11 @@ public static function format($string, array $args) {
    +        case ':':
    +          // Strip dangerous protocols here.
    +          $args[$key] = UrlHelper::filterBadProtocol($value);
    

    I think this comment is unnecessary. I realize it is in parallel with the other case statements but frankly we don't need those comments either. That said, the comment is accurate.

  2. +++ b/core/modules/views/views.module
    @@ -36,7 +36,7 @@ function views_help($route_name, RouteMatchInterface $route_match) {
    -      $output .= '<p>' . t('For more information, see the <a href="!views">online documentation for the Views module</a>.', array('!views' => 'https://www.drupal.org/documentation/modules/views')) . '</p>';
    +      $output .= '<p>' . t('For more information, see the <a href=":views">online documentation for the Views module</a>.', array(':views' => 'https://www.drupal.org/documentation/modules/views')) . '</p>';
    

    I thought it would be good to find some kind of dynamic, user-entered example in core but I am not sure there is such a thing.

  3. +++ b/core/tests/Drupal/Tests/Component/Utility/SafeMarkupTest.php
    @@ -221,6 +221,10 @@ function providerFormat() {
    +    $tests['javascript-protocol-url'] = ['Simple text <a href=":url">giraffe</a>', [':url' => 'javascript://example.com?foo&bar'], 'Simple text <a href="//example.com?foo&amp;bar">giraffe</a>', 'Support for filtering bad protocols', TRUE];
    +    $tests['external-url'] = ['Simple text <a href=":url">giraffe</a>', [':url' => 'http://example.com?foo&bar'], 'Simple text <a href="http://example.com?foo&amp;bar">giraffe</a>', 'Support for filtering bad protocols', TRUE];
    +    $tests['relative-url'] = ['Simple text <a href=":url">giraffe</a>', [':url' => '/node/1?foo&bar'], 'Simple text <a href="/node/1?foo&amp;bar">giraffe</a>', 'Support for filtering bad protocols', TRUE];
    

    Might we try sending some non-url junk and a script to a ':' replacement token to see the reaction we get?

cilefen’s picture

The reason for #17-3 is that we are filtering user input, so we may as well not assume a URL-like thing was entered.

joelpittet’s picture

Actually going to get a larger dataset;)
"Gotta downloading them all"
https://www.drupal.org/sandbox/greggles/1481160

effulgentsia’s picture

IMO, we don't need to do this and we shouldn't add features where the benefit is doesn't outweigh the cost.

In Drupal 7, it is far more common to use @url instead of !url within t() strings. From what I could track down, in Drupal 8, this was all converted to !url due to the comment in #1876906-21: Implement hook_help() for views_ui.module which was then followed with updating the hook_help() standard (see comment #25 of that issue), and a meta issue to convert all of D8's hook_help() to comply with that new standard.

In my opinion, that comment that started that train rolling was wrong. Even URLs generated from routes still go through outbound route processors. You can easily have processors that add query parameters. Which then means you get & into your URLs. http://www.htmlhelp.com/tools/validator/problems.html#amp is clear about that needing to get escaped when put into the value of an HTML attribute such as href.

Therefore, at a minimum, I believe we need to revert all those !url to @url, to match D7. But, if we're going to go through the work of doing that and break all those translation strings in the process, then why not go to :url and get the added benefit of protocol safety? Even if there are very few contrib uses of inserting user-entered URLs into translated strings, why not provide the extra security hardening for those cases? And I think it's better DX to say always use :url within t() strings than it is to say use @url for non-user-entered-urls and :url for user-entered-urls. Especially since sometimes it's not even always clear whether the URL has a user-entered component. For example, $_SERVER['SCRIPT_NAME'] might not seem to be user-entered, but actually might be, so install.core.inc calls UrlHelper::stripDangerousProtocols() on those placeholders, but why make calling code like that have to think about stuff like that?

+++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
@@ -215,6 +220,11 @@ public static function format($string, array $args) {
+        case ':':
+          // Strip dangerous protocols here.
+          $args[$key] = UrlHelper::filterBadProtocol($value);
+          break;

Per above, we also need to Html::escape() after filtering protocols.

effulgentsia’s picture

Per above, we also need to Html::escape() after filtering protocols.

Oh wait, I see you're using filterBadProtocol() instead of stripDangerousProtocols(). Since the former does the escaping, we don't need to redo it. But, I do wonder whether we should do stripDangerousProtocols() followed by Html::escape() instead. Since filterBadProtocol() also does a decodeEntities(), but I don't think the URLs that we pass in are html encoded to begin with. Not sure if there is any downside to decodeEntities() running on a plain-text URL. The docs of check_url() says "UrlHelper::filterBadProtocol() is functionality equivalent to check_url() apart from the fact it is protected from double escaping bugs", which implies no downside, but I don't know whether to believe that.

effulgentsia’s picture

The docs of check_url() says "UrlHelper::filterBadProtocol() is functionality equivalent to check_url() apart from the fact it is protected from double escaping bugs", which implies no downside, but I don't know whether to believe that.

Based on preliminary research, I think the comment is wrong and we should not use UrlHelper::filterBadProtocol(). Because according to http://blog.lunatech.com/2009/02/03/what-every-web-developer-must-know-a... (maybe not an authoritative source, but I didn't find anything contradicting it yet), both "&" and ";" are allowed in fragments, which means http://example.com/#&lt; is a valid plain-text (i.e., the &lt; in the fragment is literal, not a code for <) URL. Which means :url should turn that into http://example.com/#&amp;lt;, but filterBadProtocol() fails to do that because it does an initial decodeEntities(), which is incorrect for this case.

dawehner’s picture

Therefore, at a minimum, I believe we need to revert all those !url to @url, to match D7. But, if we're going to go through the work of doing that and break all those translation strings in the process, then why not go to :url and get the added benefit of protocol safety?

What I kinda think is that the process of converting it to :url and introducing the :url placeholder in its final form could be separate. We want to break the strings for the translators as early as possible. So we could make :url behave like !url temporarily (as this is what all those hook_help() implementations use at the moment) and then switch over later.

And I think it's better DX to say always use :url within t() strings than it is to say use @url for non-user-entered-urls and :url for user-entered-urls.

Yeah I agree, the escaping never was a big performance problem to be honest. Having a clear pattern for users is more important, and well, if they know what they do, they can still use @url, but its their own thing.

Oh wait, I see you're using filterBadProtocol() instead of stripDangerousProtocols(). Since the former does the escaping, we don't need to redo it. But, I do wonder whether we should do stripDangerousProtocols() followed by Html::escape() instead

Yeah in earlier matches, we escaped, and then called stripDangerousProtocols(), which could be simplified to your suggested code indeed.

http://blog.lunatech.com/2009/02/03/what-every-web-developer-must-know-a...

Interesting blog post! At least in the URL generator we use http_build_query() for example, which URL encodes the generated URL.
In total this would then result into the following executed code in Safemarkup::format():

>>> \Drupal\Component\Utility\Html::escape('http://example.com?' . http_build_query(['foo' => 'baz&baz#giraffe']))
=> "http://example.com?foo=baz%26baz%23giraffe"
googletorp’s picture

I've added the usage for this all of the obvious places (placesholders starting with @url), this is provide a good broad testing that everything still works with the new placeholder.

Do we need manual testing for JavaScript strings, to ensure that they still work as expected?

Status: Needs review » Needs work

The last submitted patch, 23: 2565895-23.patch, failed testing.

The last submitted patch, 23: 2565895-23.patch, failed testing.

googletorp’s picture

I made a silly mistake with the places, replaing @ with ! when it should be :

Based my patch of #22

Also unsure why tests are failing, I have weird failing tests on local HEAD as well, so don't know if my setup is broken or something odd is happening.

googletorp queued 22: 2565895-22.patch for re-testing.

googletorp’s picture

Status: Needs work » Needs review
googletorp’s picture

FileSize
75.45 KB

Ok, I need to stop working on core. Making a lot of silly mistakes, patch form 26 was actual interdiff, so uploading the correct one here

The last submitted patch, 26: 2565895-26.patch, failed testing.

The last submitted patch, 26: 2565895-26.patch, failed testing.

xjm’s picture

believe that.
Based on preliminary research, I think the comment is wrong and we should not use UrlHelper::filterBadProtocol(). Because according to http://blog.lunatech.com/2009/02/03/what-every-web-developer-must-know-a... (maybe not an authoritative source, but I didn't find anything contradicting it yet), both "&" and ";" are allowed in fragments, which means http://example.com/#< is a valid plain-text (i.e., the < in the fragment is literal, not a code for <) URL. Which means :url should turn that into http://example.com/#&lt;, but filterBadProtocol() fails to do that because it does an initial decodeEntities(), which is incorrect for this case

I think this comment is the wrong way around. We must do the decode entities first, and then re-escape them -- otherwise, this placeholder is incompatible with the URL generator, because special characters will be double escaped. (@alexpott and I discussed this at length a couple weeks ago.)

xjm’s picture

Some test cases of using the placheolder together with various URLs containing special characters and using the URL generator in addition to those with simple strings would be helpful.

Also, can I suggest we put the conversions in a second issue rather than in this patch?

dawehner’s picture

Also, can I suggest we put the conversions in a second issue rather than in this patch?

+136*2^256

googletorp’s picture

FileSize
3.5 KB
3.5 KB

I guess the patch becomes to masive if we start doing conversion in the patch. So I made a patch from #23 but removed the conversions that was already in the patch, to make the patch only add the tests and the new placeholder token type.

Status: Needs review » Needs work

The last submitted patch, 35: 2565895-35.patch, failed testing.

The last submitted patch, 35: 2565895-35.patch, failed testing.

googletorp’s picture

Status: Needs work » Needs review
FileSize
3.06 KB

This should be the actual correct patch

xjm’s picture

Status: Needs review » Needs work

NW for at least the tests for #33 (interaction with the URL generator).

I still think we do need the decodeEntities() (or filterWhateverProtocol() instead of stripWhatever()). (Also, somewhere I think there is an issue to rename those methods and deprecate the old confusing names? It'd be worth referencing here.) But the tests will help illustrate.

catch’s picture

Fwiw I'd be fine with adding the placeholder then doing the protocol handling in a followup. There's no translation or Api
break adding protocol handling and the main thing here is cutting down the uses of. !placeholder.

lauriii’s picture

Status: Needs work » Needs review
FileSize
4.34 KB
2.25 KB

Added some test coverage

Status: Needs review » Needs work

The last submitted patch, 41: add_a_new_placeholder-2565895-41.patch, failed testing.

The last submitted patch, 41: add_a_new_placeholder-2565895-41.patch, failed testing.

The last submitted patch, 41: add_a_new_placeholder-2565895-41.patch, failed testing.

joelpittet’s picture

In a follow-up to #18.

Ok I did some crunching to figure out how much impact would providing a safe attribute :placeholder help us with t() using real data from Drupal 7 contrib modules and themes. Keep in mind, developers are likely to use :placeholder wrong just as they have with with !placeholder and the other two existing placeholders and it's less explicit and trickier to notice it's incorrect.

Funny fact: I bet @xjm before crunching all D7 data I'd find some silly module using <a href="%url"> in t() and sure enough I found one:)

Downloaded all the themes and modules using @greggles module.
https://www.drupal.org/project/1481160/git-instructions

Then while that was downloading, I wrote a script to parse all the t() functions (kinda like potx but with arguments as my context, and likely too much regex than sane).

Using ag (like grep/ack) to get all the t() lines.
ag --nonumbers --nofilename -C0 '[^\w](format_string|t)\(.+=\\?"[!@%][^"\W]+".*, array\([^\$]+\$' > ../modules.txt

It just gets all the t/format_string() functions that have an attribute with one of our placeholders inside it and checks for a variable some time after for more context in the grep.

Then drove all those lines into a script to parse further:
cat ../modules.txt | ./test.php

Test script here: (could use a bunch of love and would love to do potx file parsing for better accuracy than regex but whatever, it does the job)
https://gist.github.com/7bf0190a678a691fe188

tl;dr;

Results Summary

~All D7 Modules

Suspects: 3557
--------
Safe: 3381               95.1%
Safeish: 5               0.1%;
Potentially Unsafe: 86   2.4%
Likely Unsafe: 50        1.4%
Eh?: 29                  0.8%
%WTF: 5                  0.1%
Likely Safe: 1           0%

~All D7 Themes

Suspects: 118
--------
Safe: 114                 96.6%
Likely Unsafe: 3          2.5%
Potentially Unsafe: 1     0.8%

Note: the ones that are likely unsafe/potentially unsafe. Are just because I didn't bother to look if those variables were hardcoded strings in the same function, which I would bet dollars to donuts half of them are.

Conclusion

Adding an extra placeholder, would cause a lot of work for people, with very little gain. I suggest running this kind of script, issue an SA on the projects who are doing it incorrect.

Make developers use either !placeholder with UrlHelper::filterBadProtocol()/check_url() OR
@placeholder with UrlHelper::stripDangerousProtocols()/drupal_strip_dangerous_protocols().

Pick one to be the recommended, avoid both to alleviate confusion, and say no to extra work for little benefit and more confusion.

*hops off soap box*

joelpittet’s picture

@effulgentsia thank you very much for #21. I'd say maybe that deserves it's own place in the issue summary so it's kept in mind if we continue this issue.

dawehner’s picture

dawehner’s picture

Make developers use either !placeholder with UrlHelper::filterBadProtocol()/check_url() OR
@placeholder with UrlHelper::stripDangerousProtocols()/drupal_strip_dangerous_protocols().

Well, this is the thing. It is not entirely obvious what to choose. Having a more semantic placeholder would improve the DX here, IMHO, but for sure, for most t() calls (as we have gazillons)
we don't have to deal with that problem. Just imagine how big the patch actually would have to be.

joelpittet’s picture

Well, this is the thing. It is not entirely obvious what to choose. Having a more semantic placeholder would improve the DX here, IMHO

:placeholder is not semantic as ":" doesn't = URL, and it conflicts mentally for me with :placeholder which are used quite frequently for SQL placeholders.

Just imagine how big the patch actually would have to be.

No need to imagine we've been working on massive find and replaces for these placeholders. #2506445-98: Replace !placeholder with @placeholder in t() and format_string() for non-URLs in tests which was a mega patch of around ~500K :P

effulgentsia’s picture

At least in the URL generator we use http_build_query() for example, which URL encodes the generated URL

Yes, but it does not do so for $options['fragment'] nor should it.

We must do the decode entities first, and then re-escape them -- otherwise, this placeholder is incompatible with the URL generator, because special characters will be double escaped.

I'm still not seeing where the URL generator does any escaping for HTML. And if it does, that would seem like a bug to me, since there's plenty of use cases for returning URLs in non-HTML responses, including in HTTP headers for redirects. URL encoding and HTML encoding are two completely orthogonal operations/formats, so I'm not clear which "special characters" you're referring to. In any case, happy to punt this discussion to a different issue. I added tests to #2567795: Introduce a : placeholder which works like ! for now to cover the case of #21, so we can discuss the right approach there or in whatever issue wants to change those tests.

:placeholder is not semantic as ":" doesn't = URL, and it conflicts mentally for me with :placeholder which are used quite frequently for SQL placeholders.

Fair point. I thought I was clever when suggesting ":" as a mnemonic for "something related to ':' needs attention", but the overlap with the SQL placeholders might make that a bad prefix. Let's see if we can come to a consensus on the best prefix in #2567795: Introduce a : placeholder which works like ! for now.

dawehner’s picture

+++ b/core/tests/Drupal/Tests/Component/Utility/SafeMarkupTest.php
@@ -225,6 +227,21 @@
+    $container = new ContainerBuilder();
+    $container->set('router.no_access_checks', $router);
+    $container->set('url_generator', $url_generator);
+    \Drupal::setContainer($container);
+    $tests['url-generator'] = ['Simple text <a href=":url">kitten</a>', [':url' => Url::fromRoute('test_route')->toString()], 'Simple text <a href="/kittens/are/cute">kitten</a>', 'Support for using URL object', TRUE];

I think this test coverage is pointless ... we want to have a KernelTest in order to know that the url generator works together with the placeholder properly. This test just does the same as the string

dawehner’s picture

Working on proper test coverage, sorry.

dawehner’s picture

Status: Needs work » Needs review
FileSize
6.22 KB
3.16 KB

This uses the actual url generator mechanisms in tests, but it fails at the moment.

catch’s picture

Priority: Major » Critical

We discussed this on a hangout with, er, lots of people, and was consensus was to bump it to critical:

- it makes explicit what to do with URLs in t()
- it goes a long way towards removing !placeholder
- it makes it harder to introduce double escaping bugs by for example XSS::filterBadProtocol() and @ (you'd have to use UrlHelper::stripDangerousProtocols() with @).

Also while there's a minor caveat about :placeholder and database placeholders, given the different context and the fact that interaction with db_query() is much less in 8.x, we thought the various benefits of : outweighed this.

Might have missed something, but bumping for now.

Status: Needs review » Needs work

The last submitted patch, 54: 2565895-54.patch, failed testing.

xjm’s picture

  1. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -186,6 +186,10 @@ public static function checkPlain($text) {
    +   *   - :variable: URL placeholder, escaped to HTML but also stripped of bad
    

    So, our documentation standard here for lists in docblocks makes this confusing... the :variable: kinda makes it look like you need colons on both sides of it. I guess there probably isn't anything we can do about that.

  2. +++ b/core/tests/Drupal/KernelTests/Component/Utility/SafeMarkupKernelTest.php
    @@ -0,0 +1,79 @@
    +      if ($arg instanceof Url) {
    +        return $arg->toString();
    +      }
    

    So this is part of what I was getting at with the URL generator comment... I think it will confuse people that they have to do

    t('..<a href=":url">...', [':url' => {Url::fromUri($var)}->toString()]);
    

    or

    t('..<a href=":url">...', [':url' => $this->urlGenerator->generateFromRoute($var)]);
    

    or whatever... if the idea is that :url placeholder is for URLs, but they need to be a string, but the string should be generated from a URL object or the URL generator, which is in turn generated from another string... that's not easy to grok. I wonder if in a followup it could accept a Url object?

    @catch pointed out that the cost of doing protocol filtering twice probably isn't that bad, but it does add yet another thing to keep track of in the problem space of URL and link generation, and another thing for people to conflate.

xjm’s picture

stefan.r’s picture

@xjm regarding point 1: I still think that is worth addressing, either through deviating from the standards or explicitly pointing out that it has _one_ colon in the description.

Regarding point 2: we have a followup at #2567237: Url object support for SafeMarkup::format() placeholders

dawehner’s picture

Status: Needs work » Needs review
FileSize
6.17 KB
1.52 KB

or whatever... if the idea is that :url placeholder is for URLs, but they need to be a string, but the string should be generated from a URL object or the URL generator, which is in turn generated from another string... that's not easy to grok. I wonder if in a followup it could accept a Url object?

Yeah I totally think we should talk about doing that, but as we all know, it is out of the scope of this particular issue, but also in the case here, it could safe us some time, if we don't actually safe some more time, potentially.

Regarding the documentation, what about putting it in quotes?

Fixed the test coverage ... anyone know why 'Hey giraffe <a href=":url">MUUUH</a>', [':url' => Url::fromUri('base://foo?bar#baz&;')]
would actually result in Hey giraffe <a href="/foo%3Fbar%23baz%26%3B">MUUUH</a> and not 'Hey giraffe <a href="/foo?bar=&amp;">MUUUH</a>'

googletorp’s picture

Fixed test.

#60 This is to be expected, since ::fromUri takes fragments and url queries in an additional option arg. What you are trying to create is and url that contains ?#&; There chars has to be encoded sinse they are used to specify URL fragment and query.

You can test it out at http://meyerweb.com/eric/tools/dencoder/ or just use php.

The last submitted patch, 60: 2565895-60.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 61: 2565895-61.patch, failed testing.

googletorp’s picture

Status: Needs work » Needs review
FileSize
7.01 KB
2.82 KB

I fixed the fail on the \InvalidException test and added some more tests for weird fragment/query combinations, seems like some of them matched what dawehner tried to do.

effulgentsia’s picture

+++ b/core/tests/Drupal/KernelTests/Component/Utility/SafeMarkupKernelTest.php
@@ -0,0 +1,96 @@
+    $data[] = ['Hey giraffe <a href=":url">MUUUH</a>', [':url' => Url::fromUri('base://foo?bar#baz&;')], 'Hey giraffe <a href="/foo%3Fbar%23baz%26%3B">MUUUH</a>'];
+    $data[] = ['Hey giraffe <a href=":url">MUUUH</a>', [':url' => Url::fromUri('base://foo#bar', ['query' => ['baz' => 'pony#']])], 'Hey giraffe <a href="/foo%23bar?baz=pony%23">MUUUH</a>'];
+    $data[] = ['Hey giraffe <a href=":url">MUUUH</a>', [':url' => Url::fromUri('base://foo&bar', ['fragment' => 'baz&'])], 'Hey giraffe <a href="/foo%26bar#baz&amp;">MUUUH</a>'];

The bug is in where Url::fromUri() does this: $url = new static($uri, array(), $options);. Above that, it already moved fragment and query into $uri_options and out of $uri_parts, but then discards that work by reverting to the original $uri and $options, and code running after that doesn't handle $uri containing queries and fragments. Fixing that is out of scope for this issue, so instead, maybe we should remove that first assertion from above, and rely only on the other 2?

googletorp’s picture

Thanks @effulgentsia

Makes a lot of sense. I've edited the tests, so that we don't have any conflicts with the fromUri bug.

joelpittet’s picture

+++ b/core/tests/Drupal/Tests/Component/Utility/SafeMarkupTest.php
@@ -221,6 +221,11 @@ function providerFormat() {
+    $tests['relative-url'] = ['Simple text <a href=":url">giraffe</a>', [':url' => '/node/1?foo&bar'], 'Simple text <a href="/node/1?foo&amp;bar">giraffe</a>', 'Support for filtering bad protocols', TRUE];
+    $tests['fragment-with-special-chars'] = ['Simple text <a href=":url">giraffe</a>', [':url' => 'http://example.com/#&lt;'], 'Simple text <a href="http://example.com/#&amp;lt;">giraffe</a>', 'Support for filtering bad protocols', TRUE];

Can we add a mailto: protocol test in this and one for javascript:alert(String.fromCharCode(88,83,83))
To ensure we have support for these?

#2567743-48: Add protocol filtering to Attribute

The last submitted patch, 54: 2565895-54.patch, failed testing.

dawehner’s picture

The bug is in where Url::fromUri() does this: $url = new static($uri, array(), $options);. Above that, it already moved fragment and query into $uri_options and out of $uri_parts, but then discards that work by reverting to the original $uri and $options, and code running after that doesn't handle $uri containing queries and fragments. Fixing that is out of scope for this issue, so instead, maybe we should remove that first assertion from above, and rely only on the other 2?

Yeah I was sure that the protocol stripping code isn't the problem, well, at least we know about the problem now.

Sure, more test coverage is really never a bad idea.

The last submitted patch, 60: 2565895-60.patch, failed testing.

The last submitted patch, 61: 2565895-61.patch, failed testing.

larowlan’s picture

This looks ready, just a couple of minor nits that can be fixed on commit.

  1. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -186,6 +186,10 @@ public static function checkPlain($text) {
    +   *   - ':variable': URL placeholder, escaped to HTML but also stripped of bad
    

    how about 'HTML is escaped and bad protocols like "javascript:" are stripped'?

  2. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -186,6 +186,10 @@ public static function checkPlain($text) {
    +   *     secure when placing a URL into an HTML attribute (such as "src" or
    

    nit: into a instead of into an (I think)

  3. +++ b/core/lib/Drupal/Core/DependencyInjection/DependencySerializationTrait.php
    @@ -49,6 +49,9 @@ public function __sleep() {
    +    if (!\Drupal::hasContainer()) {
    

    Should we document when this could happen?

  4. +++ b/core/tests/Drupal/KernelTests/Component/Utility/SafeMarkupKernelTest.php
    @@ -0,0 +1,97 @@
    + * Providers an integration test coverage for SafeMarkup with other systems.
    

    Provides a test covering integration of SafeMarkup with other systems?

dawehner’s picture

Thank you @larowlan

larowlan’s picture

Status: Needs review » Reviewed & tested by the community

looks good to me

effulgentsia’s picture

Status: Reviewed & tested by the community » Needs review
+++ b/core/tests/Drupal/KernelTests/Component/Utility/SafeMarkupKernelTest.php
@@ -0,0 +1,97 @@
+  /**
+   * @dataProvider providerTestSafeMarkupWithException
+   * @expectedException \InvalidArgumentException
+   */
+  public function testSafeMarkupWithException($string, array $args) {
+    $args = self::prepareSafeMarkupArgs($args);
+
+    // Should throw an \InvalidArgumentException.
+    SafeMarkup::format($string, $args);
+  }
+
+  public function providerTestSafeMarkupWithException() {
+    $data = [];
+    $data['js-protocol'] = ['Hey giraffe <a href=":url">MUUUH</a>', ['@url' => Url::fromUri("javascript:alert('xss')")]];
+    $data['js-with-fromCharCode'] = ['Hey giraffe <a href=":url">MUUUH</a>', ['@url' => Url::fromUri("javascript:alert(String.fromCharCode(88,83,83))")]];
+
+    return $data;
+  }

I'm not clear on what the intent of this test is. Those last couple lines are setting ':url' in the string, but '@url' in the arguments. But also, the InvalidArgumentException is thrown by the fromUri()->toString(), which is happening in prepareSafeMarkupArgs(), not where the "Should throw" comment is. And SafeMarkup::format() itself should not throw an exception, it should just filter the protocol away. And if we're wanting to test that a Url::fromUri() throws an exception during its toString(), why are we testing that in a class called SafeMarkupKernelTest?

alexpott’s picture

Status: Needs review » Needs work
+++ b/core/lib/Drupal/Core/DependencyInjection/DependencySerializationTrait.php
@@ -49,6 +49,11 @@ public function __sleep() {
+    // In PHPUnit tests we might want to use objects like \Drupal\Core\Url in
+    // data providers, which are executed before any setup code.
+    if (!\Drupal::hasContainer()) {
+      return;
+    }

Do we have to do this? it kind of feels wrong. If we created the urls in the tests rather the providers it would unnecessary.

effulgentsia’s picture

So you mean, have the providers provide the URIs / route names, and have the test do the Url::from*() part? I'd be fine with that. Frankly, I'd also be fine without the SafeMarkupKernelTest at all. I'm not really sure why we need to test the integration of SafeMarkup::format() and $url->toString(). The format of URLs is defined by an RFC, so IMO, unit tests that test that $url->toString() produces valid URL strings and unit tests that test that SafeMarkup::format() works with valid URL strings should be sufficient. But I'm also fine with the integration test if others see value in it.

xjm’s picture

  1. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -186,6 +186,9 @@ public static function checkPlain($text) {
    +   *   - ':variable': HTML is escaped and bad protocols like "javascript:" are
    

    So the single quotes do violate our docs standards, which at least at one point specifically forbade putting single quotes around such values in lists, and also when the other placeholders don't have them it still confuses. IMO it's probably better to just use the standard format, despite the weirdness.

    Also, maybe we should say "dangerous" instead of "bad"? They're totally legitimate HTML, just not safe for user input.

  2. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -201,6 +204,7 @@ public static function checkPlain($text) {
    +   * @see \Drupal\Component\Utility\UrlHelper::filterBadProtocol
    

    Minor: needs parens.

  3. +++ b/core/tests/Drupal/KernelTests/Component/Utility/SafeMarkupKernelTest.php
    @@ -0,0 +1,97 @@
    +      if ($arg instanceof Url) {
    +        return $arg->toString();
    +      }
    +      return $arg;
    

    I also feel still this is kind of not right. It obscures the fact that Url objects and strings are not actually equivalent in this context. If anything, I'd rather put the toString() in the provider. Then we could remove it if #2567237: Url object support for SafeMarkup::format() placeholders ends up going forward.

  4. Also, I apparently never clicked "submit" on this earlier (in reply to @googletorp):

    Aha! So, yes, the entity encoding. Is also part of the expected escaping for a URL -- but for making the URL itself, not for XSS-proofing the HTML source.

    How do we reconcile that for the developer's understanding? If this placeholder does perform the minimum sanitization needed, but does not do everything to generate a correct URL other than that. Those are two separate domains, but they are both impacting the same string placeholder when the developer writes this simple string. It would confuse me.

    Should we mention specifically in the docs that it only sanitizes and that it should often be used with the URL generation APIs? and reference them?

xjm’s picture

Also, I disagree with the last bit of #77 -- I do think an integration test is valuable, for the reasons stated above. This is how this API is going to be used 90%+ of the time, so it deserves an integration test.

dawehner’s picture

Status: Needs work » Needs review
FileSize
7.99 KB
5.59 KB

Also, I disagree with the last bit of #77 -- I do think an integration test is valuable, for the reasons stated above. This is how this API is going to be used 90%+ of the time, so it deserves an integration test.

Yeah I have to agree with that. I wasn't aware of the fact anymore that the unrouted URL assembler actually doesn't render those URLs, which is nice. Whether
the test has to be part of a thing called SafeMarkup, I don't know, but I think as a documentation kinda only test, its worth keeping.

Also, maybe we should say "dangerous" instead of "bad"? They're totally legitimate HTML, just not safe for user input.

Yeah its also the word which is used in the actual function name called in that particular case.

I'd rather put the toString() in the provider.

Well, yeah this is also not possible, given the context in which the data provider is executed (before any setup method). I changed the test code a bit.

googletorp’s picture

+++ b/core/tests/Drupal/KernelTests/Component/Utility/SafeMarkupKernelTest.php
@@ -0,0 +1,94 @@
+   *   Array containing SafeMarkup keys with a value of type string or Url.

Nit: Remove "string or" since we expect the value to always be of type/instance Url.

effulgentsia’s picture

Re #79: ok.

Re #78.3: that conflicts with #76. Not sure how to reconcile the two desires. Should we create a separate Url class just for this test that extends Drupal\Core\Url and just overrides the __wakeup() method?

Re #78.4: adding docs that :placeholder doesn't validate that what is passed in is a RFC 3986 formatted URL probably makes sense. We might want to add some tests, such as llamas: they are cute and <span>not a url</span>. By the way, even PHP's parse_url() parses both of those as though they're URLs. Apparently neither of those are seriously malformed enough. Or, if we do decide we want to validate that it's a URL, we'll need a validation function that's less permissive than parse_url(), but more permissive than UrlHelper::isValid(), because that only considers "http", "https", and "feed" as valid schemes.

dawehner’s picture

Re #78.3: that conflicts with #76. Not sure how to reconcile the two desires. Should we create a separate Url class just for this test that extends Drupal\Core\Url and just overrides the __wakeup() method?

We now create the url objects later, so we don't need to do that at all.

@effulgentsia
So what can we do about point 78.4?

effulgentsia’s picture

So what can we do about point 78.4?

Let's start with this. Any feedback on these tests or additional ones anyone would like?

I'll also add some docs later tonight.

Status: Needs review » Needs work

The last submitted patch, 84: 2565895-84.patch, failed testing.

effulgentsia’s picture

Status: Needs work » Needs review
FileSize
11.52 KB
4.24 KB

I think this addresses all remaining feedback.

Status: Needs review » Needs work

The last submitted patch, 86: 2565895-86.patch, failed testing.

The last submitted patch, 86: 2565895-86.patch, failed testing.

dawehner’s picture

Status: Needs work » Needs review
FileSize
12.43 KB
933 bytes
+++ b/core/tests/Drupal/Tests/Component/Utility/SafeMarkupTest.php
@@ -238,9 +238,16 @@
-    $tests['mailto-protocol'] = ['Hey giraffe <a href=":url">MUUUH</a>', [':url' => 'mailto://test@example.com'], 'Hey giraffe <a href="mailto://test@example.com">MUUUH</a>', '', TRUE];
+    $tests['mailto-protocol'] = ['Hey giraffe <a href=":url">MUUUH</a>', [':url' => 'mailto:test@example.com'], 'Hey giraffe <a href="mailto:test@example.com">MUUUH</a>', '', TRUE];

ups

Here is one way to fix the failure. The ComponentTest ensures that no references to Drupal\Core exists in a file. We could limit that by just code references and exclude the rule for comments, which should be fine IMHO, as it helps everyone to better understand the problem. On the longrun SafeMarkup / the functionality itself, will be somewhere else probably anyway.

plach’s picture

This looks great, I found only minor things :)

  1. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -179,13 +179,36 @@ public static function checkPlain($text) {
    +   *   - :variable: Sanitizes a URL for use in an HTML attribute (such as "src"
    +   *     or "href"). This is the only placeholder type that is secure for use
    +   *     within attributes. The exact behavior depends on the value:
    +   *     - If the value is a well-formed (per RFC 3986) relative URL or
    

    Can we add a note that this should be always used, even for generated URLs, given that we need also plain output format escaping aside from sanitization? #19 has a very good point about this.

  2. +++ b/core/tests/Drupal/KernelTests/Component/Utility/SafeMarkupKernelTest.php
    @@ -0,0 +1,101 @@
    +    // @todo Extra hack to avoid test fails, remove this once
    +    // https://www.drupal.org/node/2553661 is fixed.
    

    Missing indentation in the second line.

  3. +++ b/core/tests/Drupal/KernelTests/Component/Utility/SafeMarkupKernelTest.php
    @@ -0,0 +1,101 @@
    +   * @param array $args
    +   *   Array containing:
    ...
    +   *
    +   * @return array
    +   *   Array containing:
    +   *   - ':url': A URL string.
    +   */
    +  protected static function prepareSafeMarkupArgs($args) {
    +    $args[':url'] += [1 => []];
    +    $args[':url'] = Url::fromUri($args[':url'][0], $args[':url'][1])->toString();
    +    return $args;
    +  }
    

    This is a bit convolute and hard to parse: can we pass the arguments currently passed in $args[':url'] directly and keep the current return value as-is? This would also simplify the data provider.

  4. +++ b/core/tests/Drupal/KernelTests/Component/Utility/SafeMarkupKernelTest.php
    @@ -0,0 +1,101 @@
    +  public function testSafeMarkup($string, array $args, $expected) {
    ...
    +  public function testSafeMarkupWithException($string, array $args) {
    

    Missing PHP docs

dawehner’s picture

Thank you for your review plach :)

Missing PHP docs

Let's add some pointless documentation :)

This is a bit convolute and hard to parse: can we pass the arguments currently passed in $args[':url'] directly and keep the current return value as-is? This would also simplify the data provider.

Not really sure how you think it it should look like. The solution I came up with is IMHO not nicer at all.

Good point about always using :placeholder.

effulgentsia’s picture

+++ b/core/tests/Drupal/Tests/Component/DrupalComponentTest.php
@@ -68,6 +68,13 @@ protected function findPhpClasses($dir) {
+    // Remove multiline comments.
+    $contents = preg_replace('@/\*.*?\*/@s', '', $contents);
+    $contents = preg_replace('/\n\s*\n/', "\n", $contents);
+    // Removes single line '//' comments.
+    $contents = preg_replace('@[ \t]*//.*[ \t]*[\r\n]@', '', $contents);

Thanks for this fix, but seems like that should be discussed in its own issue. So this patch removes that and works around the constraint.

The improvements in #91 look great to me.

can we pass the arguments currently passed in $args[':url'] directly and keep the current return value as-is?

For passing a URL object to SafeMarkup::format() , we have #2567237: Url object support for SafeMarkup::format() placeholders. Constructing a URL object in the data provider is problematic due to #76. So I think with the improvement in #91, this is as good as we can get.

@plach: are all your concerns addressed?

dawehner’s picture

/me sighs about the incredible slow progress

#2570225: Consider allowing \Drupal\Core\... in documentation for components is the follow up

dawehner’s picture

double comment

effulgentsia’s picture

Assigned: Unassigned » xjm
Status: Needs review » Reviewed & tested by the community
Issue tags: +blocker

Agreed. Because this blocks #2560783: Replace !placeholder with :placeholder for URLs in hook_help() implementations and related work, I'll go ahead and move it to RTBC now, and assign to @xjm, who I know wants to review this one more time before this lands. @plach or anyone else: please still feel free to knock it back in the meantime if you have remaining concerns.

dawehner’s picture

Thank you @effulgentsia!!

plach’s picture

It's good to go on my end, this can be fixed in a follow-up I guess, if we need to do it at all:

+++ b/core/tests/Drupal/KernelTests/Component/Utility/SafeMarkupKernelTest.php
@@ -0,0 +1,106 @@
+  public function testSafeMarkup($string, array $args, $expected) {
...
+  public function testSafeMarkupWithException($string, array $args) {

I actually meant that the test methods have no description and @param docs, which are definitely more useful than data provider PHP docs, which I agree could be skipped altogether.

David_Rothstein’s picture

Status: Reviewed & tested by the community » Needs review
  1. This needs followup issues to convert all instances in Drupal core to the new placeholder, right? I found #2560783: Replace !placeholder with :placeholder for URLs in hook_help() implementations but that only covers a relatively small subset (doesn't cover anything outside of hook_help; doesn't cover <a href="@placeholder">). Are there other issues or do they still need to be filed?

    If we don't convert all of Drupal core to use this then I don't see the point of this issue (i.e. I agree with @joelpittet's concerns). But if all of core uses it, then there is some value because it means anyone looking at examples of how URLs are handled in t() by Drupal core gets an example that will be secure regardless of what URL they throw at it.

  2. +        case ':':
    +          $args[$key] = Html::escape(UrlHelper::stripDangerousProtocols($value));
    +          break;
    

    Why isn't this checking SafeMarkup::isSafe() before calling Html::escape() (like the other placeholders do)?

  3. +   *   - @variable:..... Use this as the default choice for anything
    +   *     displayed on a page on the site, but not within HTML attributes.
    ....
    +   *   - %variable: ....  As with @variable, do not use this within HTML attributes.
    ....
    +   *   - :variable: Use this placeholder if you pass in a URL. The URL is
    +   *     sanitized for use in an HTML attribute (such as "src" or "href"). This
    +   *     is the only placeholder type that is secure for use within attributes.
    

    So what happens if I want to pass in an HTML attribute that doesn't contain a URL? This documentation tells me I have no option...

    I think in reality @variable is fine (and in fact desirable) to use for many attributes, just not all of them. For example, Xss::attributes() skips protocol-filtering for title, alt, and data-*.

    I am not sure the best way to handle this. We could just say for @variable:

    Use this as the default choice for anything displayed on a page on the site, except within HTML attributes. For HTML attributes, only use @variable for cases like 'title' and 'alt' that are designed to have arbitrary content within them. For all other HTML attributes, use :variable instead.

    Perhaps it could use some examples too: <img src=":variable" alt="@variable" /> or <a href=":variable">@variable</a>

  4. +   *   - :variable: Use this placeholder if you pass in a URL. The URL is
    +   *     sanitized for use in an HTML attribute (such as "src" or "href"). This
    +   *     is the only placeholder type that is secure for use within attributes.
    +   *     The exact behavior depends on the value:
    +   *     - If the value is a well-formed (per RFC 3986) relative URL or
    +   *       absolute URL that does not use a dangerous protocol (like
    +   *       "javascript:"), then the URL is escaped to HTML. This includes all
    +   *       URLs generated via Url::toString() and UrlGeneratorTrait::url()
    +   *       (see below for the fully qualified names of these classes).
    +   *     - If the value is a well-formed absolute URL with a dangerous
    +   *       protocol, the protocol is stripped. This process is repeated on the
    +   *       remaining URL until it is stripped down to a safe protocol. Then the
    +   *       remaining URL is escaped to HTML.
    +   *     - If the value is not a well-formed URL, the sanitization behavior is
    +   *       undefined. Currently, it invokes the same logic as for well-formed
    +   *       URLs, which strips most substrings that precede a ":" and escapes
    +   *       for HTML, but that may change at any time, including in a patch
    +   *       release of Drupal. The result is secure to use within attributes,
    +   *       but may not produce valid HTML (e.g., malformed URLs within "href"
    +   *       attributes fail HTML validation). This can be avoided by using
    +   *       Url::fromUri($possibly_not_a_url)->toString(), which either throws
    +   *       an exception or returns a well-formed URL.
    

    I think this is too long, and most developers won't read or fully understand it. It should be shorter (like the other examples). Plus, given that this is for attributes in general (not just URLs) the last point seems like it might be a bad idea.

    Could this just be replaced with the following, which is a bit more inline with how the other placeholders are documented?

    :variable: Escaped to HTML using \Drupal\Component\Utility\Html::escape(), and filtered for dangerous protocols using \Drupal\Component\Utility\UrlHelper::stripDangerousProtocols(). Use this as the default choice for anything displayed in an HTML attribute, for example URLs displayed within a "src" or "href" attribute such as <a href=":variable">@variable</a>. When :variable comes from arbitrary user input, the result is secure, but not guaranteed to be a valid URL (which means the resulting output could fail HTML validation). To guarantee a valid URL, use \Drupal\Core\Url::fromUri($user_input)->toString() (which either throws an exception or returns a well-formed URL) before passing the result into a :variable placeholder.

    I think if any other details are needed, they could go on the documentation of UrlHelper::stripDangerousProtocols() rather than here.

justAChris’s picture

catch’s picture

Why isn't this checking SafeMarkup::isSafe() before calling Html::escape() (like the other placeholders do)?

SafeMarkup::isSafe() does not mean 'safe for an HTML attribute', only for an HTML fragment/node see #2569485: Add AttributeSafeStringInterface and UriAttributeSafeStringInterface.

stefan.r’s picture

So what happens if I want to pass in an HTML attribute that doesn't contain a URL? This documentation tells me I have no option...

This would probably need yet another different placeholder...

effulgentsia’s picture

I'll open an issue for #101.

David_Rothstein’s picture

Re #101 and #102, I don't understand why we would want separate placeholders for URLs and attributes? How are they different? A URL in 'href' or 'src' is just a special case of an HTML attribute.

SafeMarkup::isSafe() does not mean 'safe for an HTML attribute', only for an HTML fragment/node see #2569485: Add AttributeSafeStringInterface and UriAttributeSafeStringInterface.

Sure, I wasn't suggesting to wrap the whole thing in an isSafe() check, just the HTML::escape() part (to avoid double-escaping like all the others do). I agree UrlHelper::stripDangerousProtocols() needs to run no matter what.

David_Rothstein’s picture

SafeMarkup::isSafe() does not mean 'safe for an HTML attribute', only for an HTML fragment/node see #2569485: Add AttributeSafeStringInterface and UriAttributeSafeStringInterface.

Sure, I wasn't suggesting to wrap the whole thing in an isSafe() check, just the HTML::escape() part (to avoid double-escaping like all the others do). I agree UrlHelper::stripDangerousProtocols() needs to run no matter what.

Oh, I just read this more carefully and now I think I see your point. XSS filtering doesn't make it safe for an attribute even when combined with UrlHelper::stripDangerousProtocols().

I think there should be a code comment explaining this, plus maybe a @todo to change later? - since there really should be separate isSafe() methods someday for figuring out whether something is XSS-filtered (i.e. made into safe HTML) vs. escaped (i.e. made into safe plain text).

stefan.r’s picture

As currently we don't have a way to tell if something has already been escaped, @dawehner mentioned maybe we could use the $double_encode flag on htmlspecialchars(). Considering it would only be used on URLs, might that actually be fine?

effulgentsia’s picture

I don't think so, not for URLs. #& and #&amp; are semantically different URLs, and functions that operate on URLs should not treat the latter as an HTML-encoding of the former. Functions that want to take an HTML-encoded URL as their argument, such as UrlHelper::filterBadProtocol(), should be clear that that's what they expect, and only be called from places that know that that's what they're passing (such as from Xss::attributes()).

For the general problem of wanting a double-escape-proof attribute filter, there's #2569041: Figure out what to do about attribute filtering in Twig and its related issues.

catch’s picture

Status: Needs review » Needs work

I think it's worth moving some of the documentation as #94.4 suggests to stripDangerousProtocols() and this seems the right place to do it - since we're going to expect people to use this as soon as this patch lands, and they might read those long docs. CNW for that.

The rest I think we have plenty of major and/or critical issues open at the moment actively discussing how to handle properly, and I'd rather continue the discussion in those issues, so for me this is RTBC again with that docs move-around.

David_Rothstein’s picture

So #2570431: Document that certain (non-"href") attribute values in t() and SafeMarkup::format() are not supported and may be insecure points out that in addition to t('...<a href="@url">....'), examples like t('...<a title="@title">....') aren't actually guaranteed to be safe in Drupal 8 either. The latter is a regression from Drupal 7 (and a critical issue in its own right), but I can see it being a separate issue.

My main point regarding this issue stands, though - we shouldn't try to separate out "URLs" from "HTML attributes" in the documentation, since that distinction isn't consistent with the rest of Drupal. There are 3 types of HTML attributes handled by Xss::attributes(), currently as follows:

  1. The crazy ones like "style" and "on*", which are never safe for user input.
  2. Special whitelisted ones like "title", "alt", and "data-", which require user input to be escaped to plain text but do not need to be filtered for dangerous protocols.
  3. Everything else, which require user input to be escaped to plain text and also filtered for dangerous protocols.

URLs as found in 'href' and 'src' are a special case of #3, not their own category. Certainly 'href' and 'src' are the most common cases (and maybe even the only ones we want to encourage) but the documentation here should still make clear that ":variable" must be used if you have something that is going into an HTML attribute, regardless of whether it's expected to be a URL or not. Hence my suggestion in #98.4.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
5.7 KB
13.4 KB

Moving around those docs

The last submitted patch, 110: interdiff-92-109.patch, failed testing.

The last submitted patch, 110: interdiff-92-109.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 110: 2565895-109.patch, failed testing.

The last submitted patch, 110: 2565895-109.patch, failed testing.

David_Rothstein’s picture

Looks pretty good to me - test failure seems probably bogus.

  1. +   * @see \Drupal\Component\Html::escape()
    

    Should be \Drupal\Component\Utility\Html::escape()?

  2. +        case ':':
    +          $args[$key] = Html::escape(UrlHelper::stripDangerousProtocols($value));
    +          break;
    

    I think we should still add a code comment here per #104 (maybe no @todo though - I'm not really sure what issue the @todo would point to anyway :)

    Something like: "HTML attributes must be escaped unconditionally (even if they were already marked safe) since content that has been filtered for XSS can still contain characters that are unsafe for use in attributes."

  3. +   * - If the value is not a well-formed URL, the sanitization behavior is
    +   *   undefined. Currently, it invokes the same logic as for well-formed
    +   *   URLs, which strips most substrings that precede a ":", but that may
    +   *   change at any time, including in a patch release of Drupal. The result is
    +   *   secure to use within attributes, but may not produce valid HTML (e.g.,
    +   *   malformed URLs within "href" attributes fail HTML validation). This can
    +   *   be avoided by using Url::fromUri($possibly_not_a_url)->toString(), which
    +   *   either throws an exception or returns a well-formed URL.
    ...
       public static function stripDangerousProtocols($uri) {
    

    I'm pretty sure we don't want this paragraph on stripDangerousProtocols() either.

  4. +   * @see \Drupal\Core\UrlGeneratorTrait::url()
    

    Should be \Drupal\Core\Routing\UrlGeneratorTrait::url()?

stefan.r’s picture

":variable" must be used if you have something that is going into an HTML attribute, regardless of whether it's expected to be a URL or not.

Which other attributes need stripping bad protocols? Certainly not all of them, as on some of them the stripping would actually be problematic?

stefan.r’s picture

And I actually just read David_Rothstein's writeup that I copy-pasted into the patch more closely, not too sure about this bit here as the issue was supposed to be about URL attributes: "use this as the default choice for anything displayed in an HTML attribute"

#115.2 seems like a useful clarification.

David_Rothstein’s picture

Which other attributes need stripping bad protocols? Certainly not all of them, as on some of them the stripping would actually be problematic?

There are some examples at https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet (search the page for "javascript:"). To be honest, I'm not sure if any besides "href" actually lead to a problem in modern browsers (and I'm 100% sure that the current whitelist in Xss::attributes() is more restrictive than it needs to be) but my point is that we should always be recommending that developers do the same thing that Drupal core does. Whatever attributes Drupal itself filters dangerous protocols for in Xss::attributes() - that (at a minimum) is what we should be telling people to filter via this new API. Otherwise it's inconsistent, and possibly insecure. If we change Xss::attributes() someday to relax the behavior, then the documentation here could certainly be changed too.

stefan.r’s picture

href and src seems to cover most of it still.

https://stackoverflow.com/questions/2725156/complete-list-of-html-tag-at... lists a few more.

But I still don't see how this means we need to run bad protocol filtering on attributes where we aren't expecting user input to be a URL?

RainbowArray’s picture

I'm not up to speed on all the details of this issue, but just a note that srcset also uses URLs in the Responsive Image module, if that's relevant.

David_Rothstein’s picture

But I still don't see how this means we need to run bad protocol filtering on attributes where we aren't expecting user input to be a URL?

If it's user input it doesn't matter what we expect - it can be whatever it wants to be :) If there are browsers that can get tricked into interpreting it as a URL with e.g. a javascript: protocol, that's when the problems arise.

Reading through #2105841: Xss filter() mangles image captions and title/alt/data attributes it seems like the discussion there was pretty happy to have a much larger whitelist of do-not-need-to-be-protocol-filtered attributes than wound up in the final patch, but it's still a whitelist - the fallback behavior is to run the protocol-filtering (which for most attributes won't hurt anyway). So I really just want to avoid adding documentation here that implies ":variable" is the wrong choice for non-URL attributes, when in fact it's doing the exact sanitization we generally recommend for them. This was confusing and really sends people down the wrong path.

If it helps, I'd be happy with having the documentation also explicitly discourage passing in arbitrary user input to most attributes in the first place -- for most attributes it's not likely what the developer really wanted to do (even though it's safe with ":variable") and for "style" and "on*" it's of course unsafe entirely and never what the developer really wanted to do.

stefan.r’s picture

If it's user input it doesn't matter what we expect - it can be whatever it wants to be :) If there are browsers that can get tricked into interpreting it as a URL with e.g. a javascript: protocol, that's when the problems arise.

Not really following the point here, which non-URL attributes would be tricked into interpreting input as a URL? src and href are out (those are always URLs/relative paths), any of the others in the list in #119?

We would have to be expecting a URL/path necessarily right? If we do src="http://www.example.com/@color.png", or href="#@anchor", we shouldn't need protocol filtering as the protocol is already defined through the code.

If it helps, I'd be happy with having the documentation also explicitly discourage passing in arbitrary user input to most attributes in the first place

Yes, until we solve #2570431: Document that certain (non-"href") attribute values in t() and SafeMarkup::format() are not supported and may be insecure that would make more sense...

catch’s picture

URLs as found in 'href' and 'src' are a special case of #3, not their own category. Certainly 'href' and 'src' are the most common cases (and maybe even the only ones we want to encourage) but the documentation here should still make clear that ":variable" must be used if you have something that is going into an HTML attribute, regardless of whether it's expected to be a URL or not. Hence my suggestion in #98.4.

We can point to the whitelist, but using this for 'tite' or 'alt' attributes will completely break valid strings in unexpected ways, so it needs to be clear not to use it for that.

Copying my example from another issue reply to David:

\Drupal\Component\Utility\Urlhelper::stripDangerousProtocols("Llamas: they're cute");
" they're cute"
catch’s picture

I'd be happy with having the documentation also explicitly discourage passing in arbitrary user input to most attributes in the first place

Yes let's just do that here. As far as I know core only uses href so we won't be contradicting our own docs.

fgm’s picture

srcset is another major attribute in the same line as href and src, albeit with different validity rules. Let's not forget it.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
3.99 KB
14.99 KB
stefan.r’s picture

srcset would be nice but not sure if it's blocking the critical as it is not used in core yet and we haven't looked into how to sanitize that so I don't think we can just add that in the ":" docs?

Status: Needs review » Needs work

The last submitted patch, 126: 2565895-126.patch, failed testing.

The last submitted patch, 126: 2565895-126.patch, failed testing.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
2.13 KB
14.94 KB

Fixing the test fails by using the same solution used elsewhere in Component (leaving the FQN with Drupal\Core for the @see only).

RainbowArray’s picture

#127: srcset is indeed used in core, within the Responsive Image module.

David_Rothstein’s picture

which non-URL attributes would be tricked into interpreting input as a URL? src and href are out (those are always URLs/relative paths), any of the others in the list in #119?

Not sure if there are actually any, but the current behavior of Drupal's XSS filtering is that if we don't know the attribute is specifically safe, we must protocol-filter it (regardless of what type of attribute it is). There are some pretty wacky examples at https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet (e.g. <TABLE BACKGROUND="javascript:alert('XSS')"> and <IMG LOWSRC="javascript:alert('XSS')">) although I guess those are considered URL attributes too.

Anyway, I'm happy with the wording in the current patch - I think it now does a good job explaining the situation and leaves the rest to #2570431: Document that certain (non-"href") attribute values in t() and SafeMarkup::format() are not supported and may be insecure.

+   * - If the value is not a well-formed URL, the sanitization behavior is
+   *   undefined. Currently, it invokes the same logic as for well-formed
+   *   URLs, which strips most substrings that precede a ":", but that may
+   *   change at any time, including in a patch release of Drupal. The result is
+   *   secure to use within attributes, but may not produce valid HTML (e.g.,
+   *   malformed URLs within "href" attributes fail HTML validation). This can
+   *   be avoided by using Url::fromUri($possibly_not_a_url)->toString(), which
+   *   either throws an exception or returns a well-formed URL.
...
   public static function stripDangerousProtocols($uri) {

I still don't understand why this paragraph is here (or at least the first half of the paragraph) - can someone explain the motivation for including it? Why would we want to change stripDangerousProtocols() to do something other than strip off protocols that it doesn't know are safe?

+   *     attributes, ensuring the value is always wrapped in quotes:
+   *     - Secure: <a href=":variable">@variable</a>
+   *     - Insecure: <a href=:variable>@variable</a>

Very good point.

stefan.r’s picture

David_Rothstein see #82/#86 for explanation about the malformed bit.

You made a good point about how people will ignore the docs anyway... and as apparently it's too difficult/expensive to check for some of this on run-time, it could be interesting to look into doing automated checks on drupal.org contrib module source code at release time.

And I may be wrong but all the wacky and non-wacky examples would still be specifically for URLs as described in the docs, so I think we're good here :)

lauriii’s picture

lauriii’s picture

Some docs improvements

joelpittet’s picture

Assigned: xjm » Unassigned
Status: Needs review » Needs work

Documentation review

  1. +++ b/core/lib/Drupal/Component/Utility/PlaceholderTrait.php
    @@ -48,6 +48,14 @@ protected static function placeholderFormat($string, array $args, &$safe = TRUE)
    +          // @todo decide what to do about non-URL attribute values (#2570431)
    

    Do we really need this @todo? I think we can just remove it. It's not the responsibility of the :placeholder to care about non-URL attributes.

  2. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -162,12 +162,19 @@ public static function checkPlain($text) {
    +   * HTML attribute value, as only URL attributes such as "src" and "href" are
    

    Maybe we could add a reference to find these attributes out? Maybe
    http://www.w3.org/html/wg/drafts/html/master/index.html#attributes-1

  3. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -180,13 +187,27 @@ public static function checkPlain($text) {
    +   *     displayed on a page on the site, but not within HTML attributes.
    

    This is not true, this is good for non-URL attribtues like title=""

  4. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -180,13 +187,27 @@ public static function checkPlain($text) {
    +   *   - :variable: Escaped to HTML using Html::escape(), and filtered for
    

    nit: no comma needed.

Unassigned @xjm because we can fix these things before she reviews again.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
7.89 KB

Thanks @joelpittet, not sure about 1-3 but will fix that comma :)

  1. I do think the @todo linking to the critical is helpful here, it was added by @catch's suggestion and hopefully we'll be able to remove it soon :)
  2. It is pretty clear what is a URL attribute or not, right? We only use href in core yet and we're not sure that all of them work correctly either (ie srcset might have different rules) and they may vary between HTML versions as well, so wouldn't want to hold up the issue over this :/
  3. Actually the @ placeholder may be problematic on some cases for title or alt as well, so this may have to wait until #2570431: Document that certain (non-"href") attribute values in t() and SafeMarkup::format() are not supported and may be insecure is sorted out:
    +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -162,12 +162,19 @@ public static function checkPlain($text) {
    +   * This method is not intended for passing arbitrary user input into any
    +   * HTML attribute value, as only URL attributes such as "src" and "href" are
    +   * supported (using ":variable"). Never use this method on unsafe HTML
    +   * attributes such as "on*" and "style" and take care when using this with
    +   * unsupported attributes such as "title" or "alt" as this can lead to
    +   * unexpected output.
    
  4. Will reroll
stefan.r’s picture

I think this is RTBC by the way - leaving NR for @joelpittet to check

David_Rothstein’s picture

Looks pretty good to me now - my concern above regarding the stripDangerousProtocols() documentation was addressed in one of the rerolls above. (I guess the original idea there, when that documentation was on SafeMarkup::format() instead, is that maybe someday the :placeholder could do some kind of URL validation also, but I think that's not a good idea.)

One thing:

+   *   precede a ":". The result is secure to use within a URL accepting
+   *   attributes but may not produce valid HTML (e.g., malformed URLs with

What does that mean? - did it mean to say "an attribute accepting URLs"?

And I'm glad I'm not the only one who made the mistake to think <a title="@title"> is safe in D8 like it is in D7 :) I think that definitely justifies #2570431: Document that certain (non-"href") attribute values in t() and SafeMarkup::format() are not supported and may be insecure being critical.

David_Rothstein’s picture

Status: Needs review » Needs work

Oh wait, this is really wrong for another reason too:

+   *   precede a ":". The result is secure to use within a URL accepting
+   *   attributes but may not produce valid HTML (e.g., malformed URLs within

The result of stripDangerousProtocols() is not actually secure to use in an HTML attribute - in fact it says so later on in the @return documentation of that method ("As with all plain-text strings, this return value must not be output to an HTML page without being sanitized first"). So I think we have to change the above to not imply that it's secure to stick directly into an attribute.

stefan.r’s picture

Hmm the bit from #139 seems to have gotten introduced in #135 only but the suggestion it's secure is very wrong indeed :)

pwolanin’s picture

Assigned: Unassigned » pwolanin

Taking a look at this

pwolanin’s picture

This looks weird:

+  protected static function prepareSafeMarkupArgs($args) {
+    $args[':url'] = call_user_func_array([Url::class, 'fromUri'], $args[':url'])->toString();
+    return $args;
+  }

It feels like this patch is doing rather more than I expected.

stefan.r’s picture

Assigned: pwolanin » Unassigned
Status: Needs work » Needs review
FileSize
2.04 KB
15.29 KB

about to clarify #143 further

stefan.r’s picture

Addressing feedback from @pwolanin in #143. Discussed this here in BCN and this was more about the test being a bit confusing rather than functionality of the patch.

pwolanin’s picture

Looks better, but we have the same weirdness in core/tests/Drupal/Tests/Component/Utility/SafeMarkupTest.php

The last submitted patch, 144: 2565895-142.patch, failed testing.

stefan.r’s picture

@pwolanin I don't see that we do, the test seems fine there? We don't do any argument value conversion there...

alexpott’s picture

Status: Needs review » Needs work

This is looking really good to me. The documentation reads well and is clear about the use-cases for :placeholder and the differences between it and @placeholder.

+++ b/core/tests/Drupal/KernelTests/Component/Utility/SafeMarkupKernelTest.php
@@ -0,0 +1,157 @@
+    // @todo Extra hack to avoid test fails, remove this once
+    //   https://www.drupal.org/node/2553661 is fixed.
+    FileCacheFactory::setPrefix(Settings::getApcuPrefix('file_cache', $this->root));

This issue has landed. You can remove the hack.

jcnventura’s picture

Status: Needs work » Needs review
FileSize
757 bytes
16.21 KB

Removed the hack. Let's see if the test bots like it.

pwolanin’s picture

Minor - this looks a little confusing since there is a : at the beginning and end:

+ * - :variable: Escaped to HTML using Html::escape() and filtered for

Not sure I have any solution.

jcnventura’s picture

Indeed, even though my brain filtered that trailing colon automatically. I'm not if there's anything we can do that wouldn't make it worse.

I suggest to leave it as is.

stefan.r’s picture

@pwolanin see #78, maybe we can go with this anyway despite the confusing-ness... we repeat the proper format often enough elsewhere in the docs and code examples

alexpott’s picture

Has @David_Rothstein's comment in #140 been addressed?

dawehner’s picture

Not sure I have any solution.

Sighs, it is not worth to discuss about that IMHO. Earlier versions of the patches had used ':variable':

stefan.r’s picture

@alexpott, yes, the comment has been changed to:

+   * - If the value is not a well-formed URL, the same sanitization behavior as
+   *   for well-formed URLs will be invoked, which strips most substrings that
+   *   precede a ":". The result can be used in URL attributes such as "href"
+   *   or "src" (only after calling Html::escape() separately), but this may not
+   *   produce valid HTML (e.g., malformed URLs within "href" attributes fail
+   *   HTML validation). This can be avoided by using
+   *   Url::fromUri($possibly_not_a_url)->toString(), which either throws an
+   *   exception or returns a well-formed URL.
catch’s picture

Status: Needs review » Reviewed & tested by the community
+++ b/core/lib/Drupal/Component/Utility/UrlHelper.php
@@ -304,7 +304,22 @@ public static function setAllowedProtocols(array $protocols = array()) {
+   *   URLs generated via Url::toString() and UrlGeneratorTrait::url().

I needed to double check this was true for a URL like http://example.com/:javascript - and it is, stripDangerousProtocols explicitly handles that.

Apart from that this is RTBC for me. The only slightly confusing bit is the @todo for non-attribute support, but we have that critical issue open at the moment to discuss/document not supporting/add support for those.

Marking RTBC.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed c5d96cb and pushed to 8.0.x. Thanks!

  • alexpott committed c5d96cb on 8.0.x
    Issue #2565895 by dawehner, stefan.r, googletorp, effulgentsia, lauriii...
googletorp’s picture

Awesome, great work all.

stefan.r’s picture

CR added at https://www.drupal.org/node/2571689, can anyone review and publish?

pfrenssen’s picture

CR reviewed and published, thanks!

hass’s picture

Most usage for URLs was @url and not !url.

How will you solve the problem that this change will invalidate all current translations? Translators have really enough to do and cannot waste their time with translating all again. Many of them are not searching for old strings with @ chars inside.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

hass’s picture

Where is the case that changes all url placeholders in translatable strings?

This will invalidate all translated strings and has RC deadline as I know. Or do we only have this placeholder, but it is not used?

Gábor Hojtsy’s picture

Looks like this issue missed adding Twig translatable string support for this placeholder. It is possible to reproduce the % placeholder in twig, but not :. Let's not make Twig a second class citizen when it comes to translatability. #2592573: The :placeholder translation placeholder type not supported in Twig.

larowlan’s picture