Follow-up to #2558791: "!"-prefixed tokens should Xss::filterAdmin() but not affect safeness
Problem/Motivation
A large part of the usage of !placeholders (and @placeholders) is for href attributes. This is not appropriate as we still want to filter bad protocols.
Proposed resolution
Introduction of a new ":placeholder" that always escapes and filters for bad protocols as well, for URL attributes such as "href" and "src". We will address non-URL attributes in #2570431: Document that certain (non-"href") attribute values in t() and SafeMarkup::format() are not supported and may be insecure
Remaining tasks
Review patch
Commit patch
User interface changes
N/A
API changes
Introduction of a new ":placeholder" specifically for URL attributes
Comment | File | Size | Author |
---|---|---|---|
#150 | add_a_new_placeholder-2565895-150.patch | 16.21 KB | jcnventura |
#150 | interdiff-145-150.txt | 757 bytes | jcnventura |
#22 | interdiff.txt | 3.74 KB | dawehner |
#22 | 2565895-22.patch | 6.56 KB | dawehner |
#13 | 2565895-13.patch | 4.46 KB | stefan.r |
Comments
Comment #2
dawehnerIf it is a different behaviour it should be named different, so let's try out ":"
Comment #3
dawehner.
Comment #4
dawehnerSo something like this?
Comment #6
dawehnerConverted one usecase, could not find one which does pass user supplied content to t().
Comment #7
xjmSo these cover strings of URLs, but not objects.
Where does the URL generator fit in? Many of the usecases in core for this are (e.g.) URLs generated from routes. I'm concerned about further increasing the API surface area of URL handling as well as the surface area for t(). (Not saying we shouldn't go forward with this, but we should discuss.)
Comment #8
catch@dawehner install.core.inc has a couple of examples like this for user-supplied URLs.
@xjm we could choose to indicate for generated URLs somehow (similar to how we do with @) that the URL doesn't need protocol filtering, but otherwise I don't think there's much to do there? Also the protocol filtering can't do any harm there even if it runs all the time, it'd just be a no-op.
Comment #9
dawehnerWhile I totally agree it would be nice, I think expanding the API for URL objects feels out of scope of this particular issue. This issue is about taking care of the stripping,
not about making it easier to pass in URLs, not provided as strings into SafeMark::format(). This at least for me is an additional discussion on top of the placeholder.
Comment #10
dawehnerIMHO we want steps forward and having the placeholder stops the string changing. Anything afterwards can be done
Comment #11
stefan.r CreditAttribution: stefan.r commented#9 makes sense.. I had a look at Url object support, but that seems like a separate issue - opened #2567237: Url object support for SafeMarkup::format() placeholders in case we want to discuss that.
Comment #12
stefan.r CreditAttribution: stefan.r commentedMerging in the non-Url object related changes, mostly we didn't need the extra Html:escape()
Comment #13
stefan.r CreditAttribution: stefan.r commentedoops
Comment #14
joelpittetIt's quite common to put hard coded/developer supplied
@!url
s. How often are we going to run into user supplied values for URLs?I'd bet most cases for this are very rare and the developer that does use it should sanitize their own variables for that rare use-case. As was done in the example @catch pointed out in #8
The vast majority of
!url/@url
placeholders we have in core are hardcoded URLs to doc references. This new API for:url
on SafeMarkup::format() does not seem to do us any favours for those which don't need the help(the developer hardcoded URLs).IMO, we don't need to do this and we shouldn't add features where the benefit is doesn't outweigh the cost. It may be worth doing a bit of an audit on D7 contrib and core to see how frequently this used for user-supplied placeholders before continuing driving this issue. @stefan.r gathered some here #2558791-89: "!"-prefixed tokens should Xss::filterAdmin() but not affect safeness so I'm going to crunch those and maybe eat my words later;)
Comment #16
cilefen CreditAttribution: cilefen commentedNice work!
I think this comment is unnecessary. I realize it is in parallel with the other case statements but frankly we don't need those comments either. That said, the comment is accurate.
I thought it would be good to find some kind of dynamic, user-entered example in core but I am not sure there is such a thing.
Might we try sending some non-url junk and a script to a ':' replacement token to see the reaction we get?
Comment #17
cilefen CreditAttribution: cilefen commentedThe reason for #17-3 is that we are filtering user input, so we may as well not assume a URL-like thing was entered.
Comment #18
joelpittetActually going to get a larger dataset;)
"Gotta downloading them all"
https://www.drupal.org/sandbox/greggles/1481160
Comment #19
effulgentsia CreditAttribution: effulgentsia at Acquia commentedIn Drupal 7, it is far more common to use
@url
instead of!url
within t() strings. From what I could track down, in Drupal 8, this was all converted to!url
due to the comment in #1876906-21: Implement hook_help() for views_ui.module which was then followed with updating the hook_help() standard (see comment #25 of that issue), and a meta issue to convert all of D8's hook_help() to comply with that new standard.In my opinion, that comment that started that train rolling was wrong. Even URLs generated from routes still go through outbound route processors. You can easily have processors that add query parameters. Which then means you get
&
into your URLs. http://www.htmlhelp.com/tools/validator/problems.html#amp is clear about that needing to get escaped when put into the value of an HTML attribute such ashref
.Therefore, at a minimum, I believe we need to revert all those
!url
to@url
, to match D7. But, if we're going to go through the work of doing that and break all those translation strings in the process, then why not go to:url
and get the added benefit of protocol safety? Even if there are very few contrib uses of inserting user-entered URLs into translated strings, why not provide the extra security hardening for those cases? And I think it's better DX to say always use:url
within t() strings than it is to say use @url for non-user-entered-urls and :url for user-entered-urls. Especially since sometimes it's not even always clear whether the URL has a user-entered component. For example,$_SERVER['SCRIPT_NAME']
might not seem to be user-entered, but actually might be, so install.core.inc calls UrlHelper::stripDangerousProtocols() on those placeholders, but why make calling code like that have to think about stuff like that?Per above, we also need to Html::escape() after filtering protocols.
Comment #20
effulgentsia CreditAttribution: effulgentsia at Acquia commentedOh wait, I see you're using filterBadProtocol() instead of stripDangerousProtocols(). Since the former does the escaping, we don't need to redo it. But, I do wonder whether we should do stripDangerousProtocols() followed by Html::escape() instead. Since filterBadProtocol() also does a decodeEntities(), but I don't think the URLs that we pass in are html encoded to begin with. Not sure if there is any downside to decodeEntities() running on a plain-text URL. The docs of check_url() says "UrlHelper::filterBadProtocol() is functionality equivalent to check_url() apart from the fact it is protected from double escaping bugs", which implies no downside, but I don't know whether to believe that.
Comment #21
effulgentsia CreditAttribution: effulgentsia at Acquia commentedBased on preliminary research, I think the comment is wrong and we should not use UrlHelper::filterBadProtocol(). Because according to http://blog.lunatech.com/2009/02/03/what-every-web-developer-must-know-a... (maybe not an authoritative source, but I didn't find anything contradicting it yet), both "&" and ";" are allowed in fragments, which means
http://example.com/#<
is a valid plain-text (i.e., the<
in the fragment is literal, not a code for<
) URL. Which means:url
should turn that intohttp://example.com/#&lt;
, but filterBadProtocol() fails to do that because it does an initial decodeEntities(), which is incorrect for this case.Comment #22
dawehnerWhat I kinda think is that the process of converting it to :url and introducing the :url placeholder in its final form could be separate. We want to break the strings for the translators as early as possible. So we could make :url behave like !url temporarily (as this is what all those hook_help() implementations use at the moment) and then switch over later.
Yeah I agree, the escaping never was a big performance problem to be honest. Having a clear pattern for users is more important, and well, if they know what they do, they can still use @url, but its their own thing.
Yeah in earlier matches, we escaped, and then called stripDangerousProtocols(), which could be simplified to your suggested code indeed.
Interesting blog post! At least in the URL generator we use http_build_query() for example, which URL encodes the generated URL.
In total this would then result into the following executed code in Safemarkup::format():
Comment #23
googletorp CreditAttribution: googletorp as a volunteer and at Reveal IT commentedI've added the usage for this all of the obvious places (placesholders starting with @url), this is provide a good broad testing that everything still works with the new placeholder.
Do we need manual testing for JavaScript strings, to ensure that they still work as expected?
Comment #26
googletorp CreditAttribution: googletorp as a volunteer and at Reveal IT commentedI made a silly mistake with the places, replaing @ with ! when it should be :
Based my patch of #22
Also unsure why tests are failing, I have weird failing tests on local HEAD as well, so don't know if my setup is broken or something odd is happening.
Comment #28
googletorp CreditAttribution: googletorp as a volunteer and at Reveal IT commentedComment #29
googletorp CreditAttribution: googletorp as a volunteer and at Reveal IT commentedOk, I need to stop working on core. Making a lot of silly mistakes, patch form 26 was actual interdiff, so uploading the correct one here
Comment #32
xjmI think this comment is the wrong way around. We must do the decode entities first, and then re-escape them -- otherwise, this placeholder is incompatible with the URL generator, because special characters will be double escaped. (@alexpott and I discussed this at length a couple weeks ago.)
Comment #33
xjmSome test cases of using the placheolder together with various URLs containing special characters and using the URL generator in addition to those with simple strings would be helpful.
Also, can I suggest we put the conversions in a second issue rather than in this patch?
Comment #34
dawehner+136*2^256
Comment #35
googletorp CreditAttribution: googletorp as a volunteer and at Reveal IT commentedI guess the patch becomes to masive if we start doing conversion in the patch. So I made a patch from #23 but removed the conversions that was already in the patch, to make the patch only add the tests and the new placeholder token type.
Comment #38
googletorp CreditAttribution: googletorp as a volunteer and at Reveal IT commentedThis should be the actual correct patch
Comment #39
xjmNW for at least the tests for #33 (interaction with the URL generator).
I still think we do need the
decodeEntities()
(orfilterWhateverProtocol()
instead ofstripWhatever()
). (Also, somewhere I think there is an issue to rename those methods and deprecate the old confusing names? It'd be worth referencing here.) But the tests will help illustrate.Comment #40
catchFwiw I'd be fine with adding the placeholder then doing the protocol handling in a followup. There's no translation or Api
break adding protocol handling and the main thing here is cutting down the uses of. !placeholder.
Comment #41
lauriiiAdded some test coverage
Comment #46
joelpittetIn a follow-up to #18.
Ok I did some crunching to figure out how much impact would providing a safe attribute
:placeholder
help us witht()
using real data from Drupal 7 contrib modules and themes. Keep in mind, developers are likely to use:placeholder
wrong just as they have with with!placeholder
and the other two existing placeholders and it's less explicit and trickier to notice it's incorrect.Funny fact: I bet @xjm before crunching all D7 data I'd find some silly module using
<a href="%url">
int()
and sure enough I found one:)Downloaded all the themes and modules using @greggles module.
https://www.drupal.org/project/1481160/git-instructions
Then while that was downloading, I wrote a script to parse all the
t()
functions (kinda like potx but with arguments as my context, and likely too much regex than sane).Using ag (like grep/ack) to get all the t() lines.
ag --nonumbers --nofilename -C0 '[^\w](format_string|t)\(.+=\\?"[!@%][^"\W]+".*, array\([^\$]+\$' > ../modules.txt
It just gets all the
t/format_string()
functions that have an attribute with one of our placeholders inside it and checks for a variable some time after for more context in the grep.Then drove all those lines into a script to parse further:
cat ../modules.txt | ./test.php
Test script here: (could use a bunch of love and would love to do potx file parsing for better accuracy than regex but whatever, it does the job)
https://gist.github.com/7bf0190a678a691fe188
tl;dr;
Results Summary
~All D7 Modules
~All D7 Themes
Note: the ones that are likely unsafe/potentially unsafe. Are just because I didn't bother to look if those variables were hardcoded strings in the same function, which I would bet dollars to donuts half of them are.
Conclusion
Adding an extra placeholder, would cause a lot of work for people, with very little gain. I suggest running this kind of script, issue an SA on the projects who are doing it incorrect.
Make developers use either
!placeholder
withUrlHelper::filterBadProtocol()/check_url()
OR@placeholder
withUrlHelper::stripDangerousProtocols()/drupal_strip_dangerous_protocols()
.Pick one to be the recommended, avoid both to alleviate confusion, and say no to extra work for little benefit and more confusion.
*hops off soap box*
Comment #47
joelpittet@effulgentsia thank you very much for #21. I'd say maybe that deserves it's own place in the issue summary so it's kept in mind if we continue this issue.
Comment #48
dawehner@catch
Added a new issue for just that: #2567795: Introduce a : placeholder which works like ! for now
Comment #49
dawehnerWell, this is the thing. It is not entirely obvious what to choose. Having a more semantic placeholder would improve the DX here, IMHO, but for sure, for most t() calls (as we have gazillons)
we don't have to deal with that problem. Just imagine how big the patch actually would have to be.
Comment #50
joelpittet:placeholder is not semantic as ":" doesn't = URL, and it conflicts mentally for me with
:placeholder
which are used quite frequently forSQL placeholders
.No need to imagine we've been working on massive find and replaces for these placeholders. #2506445-98: Replace !placeholder with @placeholder in t() and format_string() for non-URLs in tests which was a mega patch of around ~500K :P
Comment #51
effulgentsia CreditAttribution: effulgentsia at Acquia commentedYes, but it does not do so for $options['fragment'] nor should it.
I'm still not seeing where the URL generator does any escaping for HTML. And if it does, that would seem like a bug to me, since there's plenty of use cases for returning URLs in non-HTML responses, including in HTTP headers for redirects. URL encoding and HTML encoding are two completely orthogonal operations/formats, so I'm not clear which "special characters" you're referring to. In any case, happy to punt this discussion to a different issue. I added tests to #2567795: Introduce a : placeholder which works like ! for now to cover the case of #21, so we can discuss the right approach there or in whatever issue wants to change those tests.
Fair point. I thought I was clever when suggesting ":" as a mnemonic for "something related to ':' needs attention", but the overlap with the SQL placeholders might make that a bad prefix. Let's see if we can come to a consensus on the best prefix in #2567795: Introduce a : placeholder which works like ! for now.
Comment #52
dawehnerI think this test coverage is pointless ... we want to have a KernelTest in order to know that the url generator works together with the placeholder properly. This test just does the same as the string
Comment #53
dawehnerWorking on proper test coverage, sorry.
Comment #54
dawehnerThis uses the actual url generator mechanisms in tests, but it fails at the moment.
Comment #55
catchWe discussed this on a hangout with, er, lots of people, and was consensus was to bump it to critical:
- it makes explicit what to do with URLs in t()
- it goes a long way towards removing !placeholder
- it makes it harder to introduce double escaping bugs by for example XSS::filterBadProtocol() and @ (you'd have to use UrlHelper::stripDangerousProtocols() with @).
Also while there's a minor caveat about :placeholder and database placeholders, given the different context and the fact that interaction with db_query() is much less in 8.x, we thought the various benefits of : outweighed this.
Might have missed something, but bumping for now.
Comment #57
xjmSo, our documentation standard here for lists in docblocks makes this confusing... the
:variable:
kinda makes it look like you need colons on both sides of it. I guess there probably isn't anything we can do about that.So this is part of what I was getting at with the URL generator comment... I think it will confuse people that they have to do
or
or whatever... if the idea is that
:url
placeholder is for URLs, but they need to be a string, but the string should be generated from a URL object or the URL generator, which is in turn generated from another string... that's not easy to grok. I wonder if in a followup it could accept a Url object?@catch pointed out that the cost of doing protocol filtering twice probably isn't that bad, but it does add yet another thing to keep track of in the problem space of URL and link generation, and another thing for people to conflate.
Comment #58
xjmComment #59
stefan.r CreditAttribution: stefan.r commented@xjm regarding point 1: I still think that is worth addressing, either through deviating from the standards or explicitly pointing out that it has _one_ colon in the description.
Regarding point 2: we have a followup at #2567237: Url object support for SafeMarkup::format() placeholders
Comment #60
dawehnerYeah I totally think we should talk about doing that, but as we all know, it is out of the scope of this particular issue, but also in the case here, it could safe us some time, if we don't actually safe some more time, potentially.
Regarding the documentation, what about putting it in quotes?
Fixed the test coverage ... anyone know why
'Hey giraffe <a href=":url">MUUUH</a>', [':url' => Url::fromUri('base://foo?bar#baz&;')]
would actually result in
Hey giraffe <a href="/foo%3Fbar%23baz%26%3B">MUUUH</a>
and not'Hey giraffe <a href="/foo?bar=&">MUUUH</a>'
Comment #61
googletorp CreditAttribution: googletorp as a volunteer and at Reveal IT commentedFixed test.
#60 This is to be expected, since ::fromUri takes fragments and url queries in an additional option arg. What you are trying to create is and url that contains ?#&; There chars has to be encoded sinse they are used to specify URL fragment and query.
You can test it out at http://meyerweb.com/eric/tools/dencoder/ or just use php.
Comment #64
googletorp CreditAttribution: googletorp as a volunteer and at Reveal IT commentedI fixed the fail on the \InvalidException test and added some more tests for weird fragment/query combinations, seems like some of them matched what dawehner tried to do.
Comment #65
effulgentsia CreditAttribution: effulgentsia at Acquia commentedThe bug is in where Url::fromUri() does this:
$url = new static($uri, array(), $options);
. Above that, it already moved fragment and query into $uri_options and out of $uri_parts, but then discards that work by reverting to the original $uri and $options, and code running after that doesn't handle $uri containing queries and fragments. Fixing that is out of scope for this issue, so instead, maybe we should remove that first assertion from above, and rely only on the other 2?Comment #66
googletorp CreditAttribution: googletorp as a volunteer and at Reveal IT commentedThanks @effulgentsia
Makes a lot of sense. I've edited the tests, so that we don't have any conflicts with the fromUri bug.
Comment #67
joelpittetCan we add a
mailto:
protocol test in this and one forjavascript:alert(String.fromCharCode(88,83,83))
To ensure we have support for these?
#2567743-48: Add protocol filtering to Attribute
Comment #69
dawehnerYeah I was sure that the protocol stripping code isn't the problem, well, at least we know about the problem now.
Sure, more test coverage is really never a bad idea.
Comment #72
larowlanThis looks ready, just a couple of minor nits that can be fixed on commit.
how about 'HTML is escaped and bad protocols like "javascript:" are stripped'?
nit: into a instead of into an (I think)
Should we document when this could happen?
Provides a test covering integration of SafeMarkup with other systems?
Comment #73
dawehnerThank you @larowlan
Comment #74
larowlanlooks good to me
Comment #75
effulgentsia CreditAttribution: effulgentsia at Acquia commentedI'm not clear on what the intent of this test is. Those last couple lines are setting ':url' in the string, but '@url' in the arguments. But also, the InvalidArgumentException is thrown by the fromUri()->toString(), which is happening in
prepareSafeMarkupArgs()
, not where the "Should throw" comment is. And SafeMarkup::format() itself should not throw an exception, it should just filter the protocol away. And if we're wanting to test that a Url::fromUri() throws an exception during its toString(), why are we testing that in a class calledSafeMarkupKernelTest
?Comment #76
alexpottDo we have to do this? it kind of feels wrong. If we created the urls in the tests rather the providers it would unnecessary.
Comment #77
effulgentsia CreditAttribution: effulgentsia at Acquia commentedSo you mean, have the providers provide the URIs / route names, and have the test do the Url::from*() part? I'd be fine with that. Frankly, I'd also be fine without the SafeMarkupKernelTest at all. I'm not really sure why we need to test the integration of SafeMarkup::format() and $url->toString(). The format of URLs is defined by an RFC, so IMO, unit tests that test that $url->toString() produces valid URL strings and unit tests that test that SafeMarkup::format() works with valid URL strings should be sufficient. But I'm also fine with the integration test if others see value in it.
Comment #78
xjmSo the single quotes do violate our docs standards, which at least at one point specifically forbade putting single quotes around such values in lists, and also when the other placeholders don't have them it still confuses. IMO it's probably better to just use the standard format, despite the weirdness.
Also, maybe we should say "dangerous" instead of "bad"? They're totally legitimate HTML, just not safe for user input.
Minor: needs parens.
I also feel still this is kind of not right. It obscures the fact that Url objects and strings are not actually equivalent in this context. If anything, I'd rather put the toString() in the provider. Then we could remove it if #2567237: Url object support for SafeMarkup::format() placeholders ends up going forward.
Also, I apparently never clicked "submit" on this earlier (in reply to @googletorp):
Comment #79
xjmAlso, I disagree with the last bit of #77 -- I do think an integration test is valuable, for the reasons stated above. This is how this API is going to be used 90%+ of the time, so it deserves an integration test.
Comment #80
dawehnerYeah I have to agree with that. I wasn't aware of the fact anymore that the unrouted URL assembler actually doesn't render those URLs, which is nice. Whether
the test has to be part of a thing called SafeMarkup, I don't know, but I think as a documentation kinda only test, its worth keeping.
Yeah its also the word which is used in the actual function name called in that particular case.
Well, yeah this is also not possible, given the context in which the data provider is executed (before any setup method). I changed the test code a bit.
Comment #81
googletorp CreditAttribution: googletorp as a volunteer and at Reveal IT commentedNit: Remove "string or" since we expect the value to always be of type/instance Url.
Comment #82
effulgentsia CreditAttribution: effulgentsia at Acquia commentedRe #79: ok.
Re #78.3: that conflicts with #76. Not sure how to reconcile the two desires. Should we create a separate Url class just for this test that extends Drupal\Core\Url and just overrides the __wakeup() method?
Re #78.4: adding docs that
:placeholder
doesn't validate that what is passed in is a RFC 3986 formatted URL probably makes sense. We might want to add some tests, such asllamas: they are cute
and<span>not a url</span>
. By the way, even PHP's parse_url() parses both of those as though they're URLs. Apparently neither of those are seriously malformed enough. Or, if we do decide we want to validate that it's a URL, we'll need a validation function that's less permissive than parse_url(), but more permissive than UrlHelper::isValid(), because that only considers "http", "https", and "feed" as valid schemes.Comment #83
dawehnerWe now create the url objects later, so we don't need to do that at all.
@effulgentsia
So what can we do about point 78.4?
Comment #84
effulgentsia CreditAttribution: effulgentsia at Acquia commentedLet's start with this. Any feedback on these tests or additional ones anyone would like?
I'll also add some docs later tonight.
Comment #86
effulgentsia CreditAttribution: effulgentsia at Acquia commentedI think this addresses all remaining feedback.
Comment #89
dawehnerups
Here is one way to fix the failure. The ComponentTest ensures that no references to Drupal\Core exists in a file. We could limit that by just code references and exclude the rule for comments, which should be fine IMHO, as it helps everyone to better understand the problem. On the longrun SafeMarkup / the functionality itself, will be somewhere else probably anyway.
Comment #90
plachThis looks great, I found only minor things :)
Can we add a note that this should be always used, even for generated URLs, given that we need also plain output format escaping aside from sanitization? #19 has a very good point about this.
Missing indentation in the second line.
This is a bit convolute and hard to parse: can we pass the arguments currently passed in
$args[':url']
directly and keep the current return value as-is? This would also simplify the data provider.Missing PHP docs
Comment #91
dawehnerThank you for your review plach :)
Let's add some pointless documentation :)
Not really sure how you think it it should look like. The solution I came up with is IMHO not nicer at all.
Good point about always using :placeholder.
Comment #92
effulgentsia CreditAttribution: effulgentsia at Acquia commentedThanks for this fix, but seems like that should be discussed in its own issue. So this patch removes that and works around the constraint.
The improvements in #91 look great to me.
For passing a URL object to SafeMarkup::format() , we have #2567237: Url object support for SafeMarkup::format() placeholders. Constructing a URL object in the data provider is problematic due to #76. So I think with the improvement in #91, this is as good as we can get.
@plach: are all your concerns addressed?
Comment #93
dawehner/me sighs about the incredible slow progress
#2570225: Consider allowing \Drupal\Core\... in documentation for components is the follow up
Comment #94
dawehnerdouble comment
Comment #95
effulgentsia CreditAttribution: effulgentsia at Acquia commentedAgreed. Because this blocks #2560783: Replace !placeholder with :placeholder for URLs in hook_help() implementations and related work, I'll go ahead and move it to RTBC now, and assign to @xjm, who I know wants to review this one more time before this lands. @plach or anyone else: please still feel free to knock it back in the meantime if you have remaining concerns.
Comment #96
dawehnerThank you @effulgentsia!!
Comment #97
plachIt's good to go on my end, this can be fixed in a follow-up I guess, if we need to do it at all:
I actually meant that the test methods have no description and @param docs, which are definitely more useful than data provider PHP docs, which I agree could be skipped altogether.
Comment #98
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented<a href="@placeholder">
). Are there other issues or do they still need to be filed?If we don't convert all of Drupal core to use this then I don't see the point of this issue (i.e. I agree with @joelpittet's concerns). But if all of core uses it, then there is some value because it means anyone looking at examples of how URLs are handled in t() by Drupal core gets an example that will be secure regardless of what URL they throw at it.
Why isn't this checking SafeMarkup::isSafe() before calling Html::escape() (like the other placeholders do)?
So what happens if I want to pass in an HTML attribute that doesn't contain a URL? This documentation tells me I have no option...
I think in reality @variable is fine (and in fact desirable) to use for many attributes, just not all of them. For example, Xss::attributes() skips protocol-filtering for title, alt, and data-*.
I am not sure the best way to handle this. We could just say for @variable:
Perhaps it could use some examples too:
<img src=":variable" alt="@variable" />
or<a href=":variable">@variable</a>
I think this is too long, and most developers won't read or fully understand it. It should be shorter (like the other examples). Plus, given that this is for attributes in general (not just URLs) the last point seems like it might be a bad idea.
Could this just be replaced with the following, which is a bit more inline with how the other placeholders are documented?
I think if any other details are needed, they could go on the documentation of UrlHelper::stripDangerousProtocols() rather than here.
Comment #99
justAChris CreditAttribution: justAChris as a volunteer commented#98.1: Just added #2570355: Replace remaining !placeholder and @placeholder with :placeholder for URLs
Comment #100
catchSafeMarkup::isSafe() does not mean 'safe for an HTML attribute', only for an HTML fragment/node see #2569485: Add AttributeSafeStringInterface and UriAttributeSafeStringInterface.
Comment #101
stefan.r CreditAttribution: stefan.r commentedThis would probably need yet another different placeholder...
Comment #102
effulgentsia CreditAttribution: effulgentsia at Acquia commentedI'll open an issue for #101.
Comment #103
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedRe #101 and #102, I don't understand why we would want separate placeholders for URLs and attributes? How are they different? A URL in 'href' or 'src' is just a special case of an HTML attribute.
Sure, I wasn't suggesting to wrap the whole thing in an isSafe() check, just the HTML::escape() part (to avoid double-escaping like all the others do). I agree UrlHelper::stripDangerousProtocols() needs to run no matter what.
Comment #104
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedOh, I just read this more carefully and now I think I see your point. XSS filtering doesn't make it safe for an attribute even when combined with UrlHelper::stripDangerousProtocols().
I think there should be a code comment explaining this, plus maybe a @todo to change later? - since there really should be separate isSafe() methods someday for figuring out whether something is XSS-filtered (i.e. made into safe HTML) vs. escaped (i.e. made into safe plain text).
Comment #105
effulgentsia CreditAttribution: effulgentsia at Acquia commentedOpened #2570431: Document that certain (non-"href") attribute values in t() and SafeMarkup::format() are not supported and may be insecure.
Comment #106
stefan.r CreditAttribution: stefan.r commentedAs currently we don't have a way to tell if something has already been escaped, @dawehner mentioned maybe we could use the $double_encode flag on htmlspecialchars(). Considering it would only be used on URLs, might that actually be fine?
Comment #107
effulgentsia CreditAttribution: effulgentsia at Acquia commentedI don't think so, not for URLs.
#&
and#&
are semantically different URLs, and functions that operate on URLs should not treat the latter as an HTML-encoding of the former. Functions that want to take an HTML-encoded URL as their argument, such as UrlHelper::filterBadProtocol(), should be clear that that's what they expect, and only be called from places that know that that's what they're passing (such as from Xss::attributes()).For the general problem of wanting a double-escape-proof attribute filter, there's #2569041: Figure out what to do about attribute filtering in Twig and its related issues.
Comment #108
catchI think it's worth moving some of the documentation as #94.4 suggests to stripDangerousProtocols() and this seems the right place to do it - since we're going to expect people to use this as soon as this patch lands, and they might read those long docs. CNW for that.
The rest I think we have plenty of major and/or critical issues open at the moment actively discussing how to handle properly, and I'd rather continue the discussion in those issues, so for me this is RTBC again with that docs move-around.
Comment #109
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedSo #2570431: Document that certain (non-"href") attribute values in t() and SafeMarkup::format() are not supported and may be insecure points out that in addition to
t('...<a href="@url">....')
, examples liket('...<a title="@title">....')
aren't actually guaranteed to be safe in Drupal 8 either. The latter is a regression from Drupal 7 (and a critical issue in its own right), but I can see it being a separate issue.My main point regarding this issue stands, though - we shouldn't try to separate out "URLs" from "HTML attributes" in the documentation, since that distinction isn't consistent with the rest of Drupal. There are 3 types of HTML attributes handled by Xss::attributes(), currently as follows:
URLs as found in 'href' and 'src' are a special case of #3, not their own category. Certainly 'href' and 'src' are the most common cases (and maybe even the only ones we want to encourage) but the documentation here should still make clear that ":variable" must be used if you have something that is going into an HTML attribute, regardless of whether it's expected to be a URL or not. Hence my suggestion in #98.4.
Comment #110
stefan.r CreditAttribution: stefan.r commentedMoving around those docs
Comment #115
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedLooks pretty good to me - test failure seems probably bogus.
Should be \Drupal\Component\Utility\Html::escape()?
I think we should still add a code comment here per #104 (maybe no @todo though - I'm not really sure what issue the @todo would point to anyway :)
Something like: "HTML attributes must be escaped unconditionally (even if they were already marked safe) since content that has been filtered for XSS can still contain characters that are unsafe for use in attributes."
I'm pretty sure we don't want this paragraph on stripDangerousProtocols() either.
Should be \Drupal\Core\Routing\UrlGeneratorTrait::url()?
Comment #116
stefan.r CreditAttribution: stefan.r commentedWhich other attributes need stripping bad protocols? Certainly not all of them, as on some of them the stripping would actually be problematic?
Comment #117
stefan.r CreditAttribution: stefan.r commentedAnd I actually just read David_Rothstein's writeup that I copy-pasted into the patch more closely, not too sure about this bit here as the issue was supposed to be about URL attributes: "use this as the default choice for anything displayed in an HTML attribute"
#115.2 seems like a useful clarification.
Comment #118
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedThere are some examples at https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet (search the page for "javascript:"). To be honest, I'm not sure if any besides "href" actually lead to a problem in modern browsers (and I'm 100% sure that the current whitelist in Xss::attributes() is more restrictive than it needs to be) but my point is that we should always be recommending that developers do the same thing that Drupal core does. Whatever attributes Drupal itself filters dangerous protocols for in Xss::attributes() - that (at a minimum) is what we should be telling people to filter via this new API. Otherwise it's inconsistent, and possibly insecure. If we change Xss::attributes() someday to relax the behavior, then the documentation here could certainly be changed too.
Comment #119
stefan.r CreditAttribution: stefan.r commentedhref and src seems to cover most of it still.
https://stackoverflow.com/questions/2725156/complete-list-of-html-tag-at... lists a few more.
But I still don't see how this means we need to run bad protocol filtering on attributes where we aren't expecting user input to be a URL?
Comment #120
RainbowArrayI'm not up to speed on all the details of this issue, but just a note that srcset also uses URLs in the Responsive Image module, if that's relevant.
Comment #121
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedIf it's user input it doesn't matter what we expect - it can be whatever it wants to be :) If there are browsers that can get tricked into interpreting it as a URL with e.g. a
javascript:
protocol, that's when the problems arise.Reading through #2105841: Xss filter() mangles image captions and title/alt/data attributes it seems like the discussion there was pretty happy to have a much larger whitelist of do-not-need-to-be-protocol-filtered attributes than wound up in the final patch, but it's still a whitelist - the fallback behavior is to run the protocol-filtering (which for most attributes won't hurt anyway). So I really just want to avoid adding documentation here that implies ":variable" is the wrong choice for non-URL attributes, when in fact it's doing the exact sanitization we generally recommend for them. This was confusing and really sends people down the wrong path.
If it helps, I'd be happy with having the documentation also explicitly discourage passing in arbitrary user input to most attributes in the first place -- for most attributes it's not likely what the developer really wanted to do (even though it's safe with ":variable") and for "style" and "on*" it's of course unsafe entirely and never what the developer really wanted to do.
Comment #122
stefan.r CreditAttribution: stefan.r commentedNot really following the point here, which non-URL attributes would be tricked into interpreting input as a URL? src and href are out (those are always URLs/relative paths), any of the others in the list in #119?
We would have to be expecting a URL/path necessarily right? If we do src="http://www.example.com/@color.png", or href="#@anchor", we shouldn't need protocol filtering as the protocol is already defined through the code.
Yes, until we solve #2570431: Document that certain (non-"href") attribute values in t() and SafeMarkup::format() are not supported and may be insecure that would make more sense...
Comment #123
catchWe can point to the whitelist, but using this for 'tite' or 'alt' attributes will completely break valid strings in unexpected ways, so it needs to be clear not to use it for that.
Copying my example from another issue reply to David:
Comment #124
catchYes let's just do that here. As far as I know core only uses href so we won't be contradicting our own docs.
Comment #125
fgmsrcset
is another major attribute in the same line ashref
andsrc
, albeit with different validity rules. Let's not forget it.Comment #126
stefan.r CreditAttribution: stefan.r commentedComment #127
stefan.r CreditAttribution: stefan.r commentedsrcset would be nice but not sure if it's blocking the critical as it is not used in core yet and we haven't looked into how to sanitize that so I don't think we can just add that in the ":" docs?
Comment #130
stefan.r CreditAttribution: stefan.r commentedFixing the test fails by using the same solution used elsewhere in Component (leaving the FQN with Drupal\Core for the @see only).
Comment #131
RainbowArray#127: srcset is indeed used in core, within the Responsive Image module.
Comment #132
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedNot sure if there are actually any, but the current behavior of Drupal's XSS filtering is that if we don't know the attribute is specifically safe, we must protocol-filter it (regardless of what type of attribute it is). There are some pretty wacky examples at https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet (e.g.
<TABLE BACKGROUND="javascript:alert('XSS')">
and<IMG LOWSRC="javascript:alert('XSS')">
) although I guess those are considered URL attributes too.Anyway, I'm happy with the wording in the current patch - I think it now does a good job explaining the situation and leaves the rest to #2570431: Document that certain (non-"href") attribute values in t() and SafeMarkup::format() are not supported and may be insecure.
I still don't understand why this paragraph is here (or at least the first half of the paragraph) - can someone explain the motivation for including it? Why would we want to change stripDangerousProtocols() to do something other than strip off protocols that it doesn't know are safe?
Very good point.
Comment #133
stefan.r CreditAttribution: stefan.r commentedDavid_Rothstein see #82/#86 for explanation about the malformed bit.
You made a good point about how people will ignore the docs anyway... and as apparently it's too difficult/expensive to check for some of this on run-time, it could be interesting to look into doing automated checks on drupal.org contrib module source code at release time.
And I may be wrong but all the wacky and non-wacky examples would still be specifically for URLs as described in the docs, so I think we're good here :)
Comment #134
lauriiiJust a reroll after #2557113: Make t() return a TranslationWrapper object to remove reliance on a static, unpredictable safe list
Comment #135
lauriiiSome docs improvements
Comment #136
joelpittetDocumentation review
Do we really need this @todo? I think we can just remove it. It's not the responsibility of the
:placeholder
to care about non-URL attributes.Maybe we could add a reference to find these attributes out? Maybe
http://www.w3.org/html/wg/drafts/html/master/index.html#attributes-1
This is not true, this is good for non-URL attribtues like
title=""
nit: no comma needed.
Unassigned @xjm because we can fix these things before she reviews again.
Comment #137
stefan.r CreditAttribution: stefan.r commentedThanks @joelpittet, not sure about 1-3 but will fix that comma :)
Comment #138
stefan.r CreditAttribution: stefan.r commentedI think this is RTBC by the way - leaving NR for @joelpittet to check
Comment #139
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedLooks pretty good to me now - my concern above regarding the stripDangerousProtocols() documentation was addressed in one of the rerolls above. (I guess the original idea there, when that documentation was on SafeMarkup::format() instead, is that maybe someday the :placeholder could do some kind of URL validation also, but I think that's not a good idea.)
One thing:
What does that mean? - did it mean to say "an attribute accepting URLs"?
And I'm glad I'm not the only one who made the mistake to think
<a title="@title">
is safe in D8 like it is in D7 :) I think that definitely justifies #2570431: Document that certain (non-"href") attribute values in t() and SafeMarkup::format() are not supported and may be insecure being critical.Comment #140
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedOh wait, this is really wrong for another reason too:
The result of stripDangerousProtocols() is not actually secure to use in an HTML attribute - in fact it says so later on in the @return documentation of that method ("As with all plain-text strings, this return value must not be output to an HTML page without being sanitized first"). So I think we have to change the above to not imply that it's secure to stick directly into an attribute.
Comment #141
stefan.r CreditAttribution: stefan.r commentedHmm the bit from #139 seems to have gotten introduced in #135 only but the suggestion it's secure is very wrong indeed :)
Comment #142
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedTaking a look at this
Comment #143
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedThis looks weird:
It feels like this patch is doing rather more than I expected.
Comment #144
stefan.r CreditAttribution: stefan.r commentedabout to clarify #143 further
Comment #145
stefan.r CreditAttribution: stefan.r commentedAddressing feedback from @pwolanin in #143. Discussed this here in BCN and this was more about the test being a bit confusing rather than functionality of the patch.
Comment #146
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedLooks better, but we have the same weirdness in core/tests/Drupal/Tests/Component/Utility/SafeMarkupTest.php
Comment #148
stefan.r CreditAttribution: stefan.r commented@pwolanin I don't see that we do, the test seems fine there? We don't do any argument value conversion there...
Comment #149
alexpottThis is looking really good to me. The documentation reads well and is clear about the use-cases for
:placeholder
and the differences between it and@placeholder
.This issue has landed. You can remove the hack.
Comment #150
jcnventura CreditAttribution: jcnventura at Wunder commentedRemoved the hack. Let's see if the test bots like it.
Comment #151
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedMinor - this looks a little confusing since there is a
:
at the beginning and end:+ * - :variable: Escaped to HTML using Html::escape() and filtered for
Not sure I have any solution.
Comment #152
jcnventura CreditAttribution: jcnventura at Wunder commentedIndeed, even though my brain filtered that trailing colon automatically. I'm not if there's anything we can do that wouldn't make it worse.
I suggest to leave it as is.
Comment #153
stefan.r CreditAttribution: stefan.r commented@pwolanin see #78, maybe we can go with this anyway despite the confusing-ness... we repeat the proper format often enough elsewhere in the docs and code examples
Comment #154
alexpottHas @David_Rothstein's comment in #140 been addressed?
Comment #155
dawehnerSighs, it is not worth to discuss about that IMHO. Earlier versions of the patches had used ':variable':
Comment #156
stefan.r CreditAttribution: stefan.r commented@alexpott, yes, the comment has been changed to:
Comment #157
catchI needed to double check this was true for a URL like http://example.com/:javascript - and it is, stripDangerousProtocols explicitly handles that.
Apart from that this is RTBC for me. The only slightly confusing bit is the @todo for non-attribute support, but we have that critical issue open at the moment to discuss/document not supporting/add support for those.
Marking RTBC.
Comment #158
alexpottCommitted c5d96cb and pushed to 8.0.x. Thanks!
Comment #160
googletorp CreditAttribution: googletorp as a volunteer and at Reveal IT commentedAwesome, great work all.
Comment #161
stefan.r CreditAttribution: stefan.r commentedCR added at https://www.drupal.org/node/2571689, can anyone review and publish?
Comment #162
pfrenssenCR reviewed and published, thanks!
Comment #163
hass CreditAttribution: hass commentedMost usage for URLs was @url and not !url.
How will you solve the problem that this change will invalidate all current translations? Translators have really enough to do and cannot waste their time with translating all again. Many of them are not searching for old strings with @ chars inside.
Comment #165
hass CreditAttribution: hass commentedWhere is the case that changes all url placeholders in translatable strings?
This will invalidate all translated strings and has RC deadline as I know. Or do we only have this placeholder, but it is not used?
Comment #166
Gábor HojtsyLooks like this issue missed adding Twig translatable string support for this placeholder. It is possible to reproduce the % placeholder in twig, but not :. Let's not make Twig a second class citizen when it comes to translatability. #2592573: The :placeholder translation placeholder type not supported in Twig.
Comment #167
larowlanMinor follow-up #2690843: \Drupal\Core\Logger\LogMessageParser::parseMessagePlaceholders doesn't support :variable placeholder