In #2545972-96: Remove all code usages SafeMarkup::checkPlain() and rely more on Twig autoescaping @pwolanin pointed out the #plain_text
might be preferred to #safe_strategy
. In the same issue @Damien Tournoud says that
'#safe_strategy' => Markup::SAFE_STRATEGY_ESCAPE is just plain wrong, as it mixes two radically different issues: the issue of the format (plain text vs. html), and the issue of the sanitization.
Whilst this mixing of "radically different" has been present in HEAD for a long time with ['#markup' => SafeMarkup::checkPlain($stuff)]
however we're trying to remove SafeMarkup::checkPlain()
so perhaps we should implement this.
Comment | File | Size | Author |
---|---|---|---|
#30 | 2555931-2.30.patch | 21.19 KB | alexpott |
| |||
#30 | 28-30-interdiff.txt | 2.71 KB | alexpott |
#28 | 22-27-interdiff.txt | 574 bytes | alexpott |
#28 | 2555931-2.27.patch | 20.35 KB | alexpott |
#22 | 2555931-2.22.patch | 20.29 KB | alexpott |
Comments
Comment #2
alexpottGiven that
#safe_strategy
has only been around for a week perhaps it is acceptable to make this change.Comment #3
joelpittetHmm, I must have missed when '#safe_strategy' went in. But reading over the patch a couple of times the '#safe_strategy' seems clear in how the text will be treated. The DX looks to improve slightly, but having less keys to add will do that for anything.
#plain_text
as a key doesn't tell me that it will be escaped (except if I remember that check_plain(), escaped things, which I'd rather forget ;-) ).Comment #4
Damien Tournoud CreditAttribution: Damien Tournoud commented@alexpott: #2 looks wrong to me: if it is plain-text, it is plain-text and it must be escaped. It is absolutely incorrect to call
SafeMarkup::isSafe($elements['#markup']
in that case.This relates to a misunderstanding you had in the other issue (2545972#104):
This is just not possible. If something is plain-text, it is plain-text. If something is HTML, it is HTML. There is no in-between, and you cannot just imagine that you can pull a switch and magically start accepting a subset of HTML in a field that was previously plain-text. In an HTML text, some key characters have special meaning (notably
&
,<
and>
), and you cannot just change their meaning mid-flight.On a blog about HTML, for example (it's a very contrived example, but it gets the point across very well), all those titles would be affected by a sudden switch from plain-text to HTML:
Comment #5
pwolanin CreditAttribution: pwolanin as a volunteer commentedI agree with Damien - you should be setting a string there and it should always be escaped.
Here's a patch that changes the logic a little to match that notion.
Comment #6
catch@Damien that's true in terms of the lifetime of that site - the HTML blog can't change strategy mid-stream (or not without a database update to escape all the existing titles). Also the example isn't that contrived, there are d.o issues with HTML tags in the titles.
However another site might want
<em>
tags in node titles to work and choose filtering instead. I've seen that implemented on at least a couple of sites in the past, although it's never pretty.If both sites from the start decide on escaping vs. filtering then there is no 'mid-flight' change at all.
Comment #7
alexpottHere is a non-contrived usage of HTML in a title taken from https://www.newscientist.com/:
<i>American Zoo</i> shows animal conservation some tough love
https://www.newscientist.com/And of course we have a few thousand users of https://www.drupal.org/project/html_title and a not inconsiderate amount of help articles and questions on the topic https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF...
The current approach of escape means that people have no alternative but to enable a module or hack their theme because we escape. If we filtered then at least people could enter the the escaped html in the title. However, I'm not suggesting that we change this here I just think we need to make it as easy as possible for a module to exist that changes this behaviour.
Wrt to the current patch - I ummed and ahhed about the making it always escape. I think we might have problems with translation since if a translator adds html, for example: http://www.w3schools.com/tags/tag_bdi.asp
Why unset? We don't do this for other render array properties.
Comment #8
pwolanin CreditAttribution: pwolanin as a volunteer commentedThe unset isn't really needed.
Translations shouldn't be a problem if they go to #markup. Really anything that's already escaped and/or marked safe should also be using that.
Comment #9
pwolanin CreditAttribution: pwolanin at Acquia commentedI think having the node title in a render array and being able to change the render array (moving text from #text_plain to #markup doesn't seem a lot harder than changing the value of another key) is a lot better than the case now of just having text.
Comment #10
Wim LeersI expected a
'#type' => 'plain_text'
, so aPlainText
render element type. I think semantically that'd make more sense.#text_plain
? Shouldn't this be#plain_text
?Comment #11
joelpittet@pwolanin my comment in #3 is still my concern.
#plain_text
to me doesn't mean "escape", but neither does#markup
mean filter some of the markup... So I don't think I have much to contribute to this issue.Needs work on #10
Comment #12
joelpittetThen again, this patch doesn't make things worse... and for people familiar with the field options of 'Plain Text' (and not oddly named
check_plain()
function). And #markup is a bit clearer as you don't have to know the strategy it's using on top of all the other implicit things that are happening.I can't think of a better name, so I change my mind.
Comment #13
joelpittetFixed comment typo from #10
Comment #14
Wim LeersI'm not yet fully convinced this is actually better.
I don't disagree at all with Damien's explanation/rationale in #4. But, the fact is, the end result/goal of Drupal's render arrays always is… markup. It is from that POV that I always interpreted
#markup
: .I'm not saying I think
#plain_text
is a bad idea per se, but I do think it needs further consideration. Hence, moving back to NR. I think this at least needs a from Alex Pott.Finally, #10 is also not yet answered ( ).
Comment #15
joelpittetre #10 I'm ok with it being not being a white elephant like
#markup
and creating#plain_text
as a properElement
Comment #16
effulgentsia CreditAttribution: effulgentsia at Acquia commentedSimilar to #14, I'm also wondering what DX benefits:
has over:
Per #4, in both cases, the developer needs to know that $plain_text is plain text and not an HTML string, and then convey that information to the render array (which per #14 is an HTML concatenation machine). To me, the two approaches above seem like equally direct and straightforward ways of conveying that, but the latter seems more future compatible with #2554073: Allow #markup to be an array of strings and join them safely.
Comment #17
alexpottWell one big issue with
is that currently that unnecessarily filters the #markup even though it has been escaped and that is a waste of resources.
Comment #18
joelpittetre #17 would we be open to the idea of reverting that
#markup ~= Xss Admin Filter
so that#markup => "<script>alert('What you put into it');</script>"
?Comment #19
effulgentsia CreditAttribution: effulgentsia at Acquia commentedIs #17 significant enough to worry about? How common is it to put plain text strings into a render array? Most text string literals should go through t(), which can be assigned to #markup directly. Things like node title (and all other content entity fields) should be output with field formatters, which would allow the appropriate ability for a site to configure whether those should be treated as HTML or text. Sure, maybe there are some content/configuration strings which would go through this Html::escape() followed by Xss::filterAdmin() sequence, but especially if these are short strings, how much overhead are we really talking about?
Comment #20
alexpottUnfortunately I don;t think it's working that way - the title is not a field in the way that you can just go and edit the field settings.
The patch attached contains a conversion of the MessageForm to use #plain_text for the item render elements it puts on the page. This nicely illustrates the problem with the Markup render element - the support for #markup (and with this patch #plain_text) is baked into the render. The lines that do this are:
Also testing this revealed a double escaping bug in the dblog messages list :(
Comment #22
alexpottLol... so we have a test asserting on the double escape bug in dblog...
This code in link generator is supposed to the strip tags from titles... even though they are later escaped hence the test change might not be quite what you where expecting. fun.
Comment #23
joelpittet#22 That reminds me of #2531824: Attribute class to check safe strings before escaping (has tests) I wonder if there is a correlation, maybe not...
Comment #24
joelpittetShould we add
assertNoRaw()
to check for that script tag?Comment #25
Damien Tournoud CreditAttribution: Damien Tournoud commentedFor the record, I agree with #16. I am coming here just for one reason which is that
'#safe_strategy' => Markup::SAFE_STRATEGY_ESCAPE
was just wrong, because it was confusing encoding and sanitization.Comment #26
pwolanin CreditAttribution: pwolanin at Acquia commented@Damien Tournoud - I think #16 is not as good since it's not possible to change the already-escaped text.
Obviously it would also work fine, but I think it's helpful to have a "native" option in render arrays
Comment #27
andypost+1 rtbc, just nits to clean use statements
+1, yay!
please clean-up use of class
Comment #28
alexpottI think the filtering already escaped text is problematic because of performance and just general sense. What is the point?
I agree that
'#safe_strategy'
was a misstep so I think we have four options here:SafeMarkup::checkPlain()
. I think this is a bad idea - we the safe string list is difficult to work with and has very hard to predict side effects. We should be doing everything we can to reduce it for D8 and definitely plan for it's removal.#markup
and not do the filtering. This leads to more object creation and does not have the advantage of just scanning a render array and knowing what will be escaped or filteredI think the fourth approach is preferable since it allows us to alter render arrays to change the behaviour without adding or removing calls to
Html::escape()
and it gives render arrays a way of working very similar to Twig's auto-escape.Here's a patch that adds the additional assertion requested by @joelpittet.
Comment #29
Wim Leerss/#text_plain/#plain_text/
I think this means we want to add documentation to
Html::escape()
, to inform the developer thatHtml::escape()
should generally not be used when constructing render arrays:#plain_text
should be used instead."Attribute will escape" sounds strange.
Yay for these tests!
Comment #30
alexpott@Wim Leers thanks.
1. Fixed
2. Fixed
3. I don't think the comments saying how many characters truncate is truncating too is useful - refactor the comments to hopefully fix this.
4. :)
Comment #31
Wim LeersMuch clearer, thanks!
80 cols violation.
So… given that, shouldn't we just stop reviewing/rerolling this patch? This is a bit confusing :P
Comment #32
alexpottOops I realised that there was another option of not filtering #markup whilst writing the comment.
That's not an 80 cols violation cause of
@link theme_render render arrays @endlink
- see https://www.drupal.org/coding-standards/docs#linkComment #33
Wim Leers#32: So… you are still +1 for this patch?
Comment #34
alexpottYes I think it is the way to go given that we want #markup to auto filter and we using Twig's auto-escape.
Comment #35
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedhey ho, let's go.
Comment #36
lauriiiI created small CR for this.
Comment #37
alexpottThere is another reason why this is necessary and the following code does not work...
Xss:adminFilter() is not idempotent wrt strings it does not filter :( So
:(
Comment #38
alexpottThank @lauriii for the CR - we need to update https://www.drupal.org/node/2549395 once this is committed too.
Comment #39
alexpottActaully #37 is fortunately wrong... yay...
The code in
Xss::filter()
is interesting.Comment #41
catchOK so I liked using the element on the other issue, but it clearly does not work with #item_list.
Adding #plain_text and ditching #safe_strategy is great.
Committed/pushed to 8.0.x, thanks!
Comment #42
stefan.r CreditAttribution: stefan.r commentedMaybe "s/and the/so (that) the/" to clarify that it's usage of that key that causes the auto-escaping?
s/filtered to protect against XSS using the admin tag list/filtered using the the admin tag list to protect against XSS/
s/using the/by using the/
s/tags that Xss::filter() would accept/the tags that Xss::filter() must allow/
s/instead of #markup/instead of the #markup property/
with #markup and/or #plain_text set (though I think it we should only allow "or")
Are we missing a [] here?
We escape if it's #plain_text, otherwise we filter so let's say "sanitized" instead of "escaped"?
The second sentence here is a bit confusing and I don't know that it even belongs in the @return, should we just drop it?
Should we also explain that under which circumstances this will return a regular string? I.e. when the contents of #markup are already in the safe list or we pass in an empty string?
Xss::filterAdmin()
These not only specify what the array provide, should we make explicit that they also provide the actual value, even if it's obvious from the code example?
I.e.
"which tags are using" is missing a "we", but maybe "which tags to use for filtering the markup"?
Also, "s/tags that Xss::filter() would accept/the tags that Xss::filter() must allow/"
As to the last sentence, I wonder if we should have an assertion disallowing setting of both #allowed_tags and #markup at the same time, instead of spelling out these edge cases?
So this would give an assertion error if we did
assert(!(isset('#plain_text') && isset('#markup')))
Comment #43
stefan.r CreditAttribution: stefan.r commentedblegh, crossposted with a commit :)
Comment #44
stefan.r CreditAttribution: stefan.r commentedThis is blocking deprecation of SafeMarkup::checkPlain() / SafeMarkup::escap so no reason to revert, let's open a follow-up: #2559577: Follow up to #2555931: Fix comments
Comment #45
stefan.r CreditAttribution: stefan.r commentedComment #46
star-szrSmall follow-up: #2559597: Clean up uses of Drupal\Core\Render\Element\Markup