Closed (fixed)
Project:
Drupal core
Version:
8.0.x-dev
Component:
render system
Priority:
Major
Category:
Task
Assigned:
Unassigned
Issue tags:
Reporter:
Created:
22 Aug 2015 at 23:31 UTC
Updated:
12 Sep 2015 at 01:24 UTC
Jump to comment: Most recent, Most recent file
Comments
Comment #2
alexpottGiven that
#safe_strategyhas only been around for a week perhaps it is acceptable to make this change.Comment #3
joelpittetHmm, I must have missed when '#safe_strategy' went in. But reading over the patch a couple of times the '#safe_strategy' seems clear in how the text will be treated. The DX looks to improve slightly, but having less keys to add will do that for anything.
#plain_textas a key doesn't tell me that it will be escaped (except if I remember that check_plain(), escaped things, which I'd rather forget ;-) ).Comment #4
damien tournoud commented@alexpott: #2 looks wrong to me: if it is plain-text, it is plain-text and it must be escaped. It is absolutely incorrect to call
SafeMarkup::isSafe($elements['#markup']in that case.This relates to a misunderstanding you had in the other issue (2545972#104):
This is just not possible. If something is plain-text, it is plain-text. If something is HTML, it is HTML. There is no in-between, and you cannot just imagine that you can pull a switch and magically start accepting a subset of HTML in a field that was previously plain-text. In an HTML text, some key characters have special meaning (notably
&,<and>), and you cannot just change their meaning mid-flight.On a blog about HTML, for example (it's a very contrived example, but it gets the point across very well), all those titles would be affected by a sudden switch from plain-text to HTML:
Comment #5
pwolanin commentedI agree with Damien - you should be setting a string there and it should always be escaped.
Here's a patch that changes the logic a little to match that notion.
Comment #6
catch@Damien that's true in terms of the lifetime of that site - the HTML blog can't change strategy mid-stream (or not without a database update to escape all the existing titles). Also the example isn't that contrived, there are d.o issues with HTML tags in the titles.
However another site might want
<em>tags in node titles to work and choose filtering instead. I've seen that implemented on at least a couple of sites in the past, although it's never pretty.If both sites from the start decide on escaping vs. filtering then there is no 'mid-flight' change at all.
Comment #7
alexpottHere is a non-contrived usage of HTML in a title taken from https://www.newscientist.com/:
<i>American Zoo</i> shows animal conservation some tough lovehttps://www.newscientist.com/And of course we have a few thousand users of https://www.drupal.org/project/html_title and a not inconsiderate amount of help articles and questions on the topic https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF...
The current approach of escape means that people have no alternative but to enable a module or hack their theme because we escape. If we filtered then at least people could enter the the escaped html in the title. However, I'm not suggesting that we change this here I just think we need to make it as easy as possible for a module to exist that changes this behaviour.
Wrt to the current patch - I ummed and ahhed about the making it always escape. I think we might have problems with translation since if a translator adds html, for example: http://www.w3schools.com/tags/tag_bdi.asp
Why unset? We don't do this for other render array properties.
Comment #8
pwolanin commentedThe unset isn't really needed.
Translations shouldn't be a problem if they go to #markup. Really anything that's already escaped and/or marked safe should also be using that.
Comment #9
pwolanin commentedI think having the node title in a render array and being able to change the render array (moving text from #text_plain to #markup doesn't seem a lot harder than changing the value of another key) is a lot better than the case now of just having text.
Comment #10
wim leersI expected a
'#type' => 'plain_text', so aPlainTextrender element type. I think semantically that'd make more sense.#text_plain? Shouldn't this be#plain_text?Comment #11
joelpittet@pwolanin my comment in #3 is still my concern.
#plain_textto me doesn't mean "escape", but neither does#markupmean filter some of the markup... So I don't think I have much to contribute to this issue.Needs work on #10
Comment #12
joelpittetThen again, this patch doesn't make things worse... and for people familiar with the field options of 'Plain Text' (and not oddly named
check_plain()function). And #markup is a bit clearer as you don't have to know the strategy it's using on top of all the other implicit things that are happening.I can't think of a better name, so I change my mind.
Comment #13
joelpittetFixed comment typo from #10
Comment #14
wim leersI'm not yet fully convinced this is actually better.
I don't disagree at all with Damien's explanation/rationale in #4. But, the fact is, the end result/goal of Drupal's render arrays always is… markup. It is from that POV that I always interpreted
#markup: .I'm not saying I think
#plain_textis a bad idea per se, but I do think it needs further consideration. Hence, moving back to NR. I think this at least needs a from Alex Pott.Finally, #10 is also not yet answered ().
Comment #15
joelpittetre #10 I'm ok with it being not being a white elephant like
#markupand creating#plain_textas a properElementComment #16
effulgentsia commentedSimilar to #14, I'm also wondering what DX benefits:
has over:
Per #4, in both cases, the developer needs to know that $plain_text is plain text and not an HTML string, and then convey that information to the render array (which per #14 is an HTML concatenation machine). To me, the two approaches above seem like equally direct and straightforward ways of conveying that, but the latter seems more future compatible with #2554073: Allow #markup to be an array of strings and join them safely.
Comment #17
alexpottWell one big issue with
is that currently that unnecessarily filters the #markup even though it has been escaped and that is a waste of resources.
Comment #18
joelpittetre #17 would we be open to the idea of reverting that
#markup ~= Xss Admin Filterso that#markup => "<script>alert('What you put into it');</script>"?Comment #19
effulgentsia commentedIs #17 significant enough to worry about? How common is it to put plain text strings into a render array? Most text string literals should go through t(), which can be assigned to #markup directly. Things like node title (and all other content entity fields) should be output with field formatters, which would allow the appropriate ability for a site to configure whether those should be treated as HTML or text. Sure, maybe there are some content/configuration strings which would go through this Html::escape() followed by Xss::filterAdmin() sequence, but especially if these are short strings, how much overhead are we really talking about?
Comment #20
alexpottUnfortunately I don;t think it's working that way - the title is not a field in the way that you can just go and edit the field settings.
The patch attached contains a conversion of the MessageForm to use #plain_text for the item render elements it puts on the page. This nicely illustrates the problem with the Markup render element - the support for #markup (and with this patch #plain_text) is baked into the render. The lines that do this are:
Also testing this revealed a double escaping bug in the dblog messages list :(
Comment #22
alexpottLol... so we have a test asserting on the double escape bug in dblog...
This code in link generator is supposed to the strip tags from titles... even though they are later escaped hence the test change might not be quite what you where expecting. fun.
Comment #23
joelpittet#22 That reminds me of #2531824: Attribute class to check safe strings before escaping (has tests) I wonder if there is a correlation, maybe not...
Comment #24
joelpittetShould we add
assertNoRaw()to check for that script tag?Comment #25
damien tournoud commentedFor the record, I agree with #16. I am coming here just for one reason which is that
'#safe_strategy' => Markup::SAFE_STRATEGY_ESCAPEwas just wrong, because it was confusing encoding and sanitization.Comment #26
pwolanin commented@Damien Tournoud - I think #16 is not as good since it's not possible to change the already-escaped text.
Obviously it would also work fine, but I think it's helpful to have a "native" option in render arrays
Comment #27
andypost+1 rtbc, just nits to clean use statements
+1, yay!
please clean-up use of class
Comment #28
alexpottI think the filtering already escaped text is problematic because of performance and just general sense. What is the point?
I agree that
'#safe_strategy'was a misstep so I think we have four options here:SafeMarkup::checkPlain(). I think this is a bad idea - we the safe string list is difficult to work with and has very hard to predict side effects. We should be doing everything we can to reduce it for D8 and definitely plan for it's removal.#markupand not do the filtering. This leads to more object creation and does not have the advantage of just scanning a render array and knowing what will be escaped or filteredI think the fourth approach is preferable since it allows us to alter render arrays to change the behaviour without adding or removing calls to
Html::escape()and it gives render arrays a way of working very similar to Twig's auto-escape.Here's a patch that adds the additional assertion requested by @joelpittet.
Comment #29
wim leerss/#text_plain/#plain_text/
I think this means we want to add documentation to
Html::escape(), to inform the developer thatHtml::escape()should generally not be used when constructing render arrays:#plain_textshould be used instead."Attribute will escape" sounds strange.
Yay for these tests!
Comment #30
alexpott@Wim Leers thanks.
1. Fixed
2. Fixed
3. I don't think the comments saying how many characters truncate is truncating too is useful - refactor the comments to hopefully fix this.
4. :)
Comment #31
wim leersMuch clearer, thanks!
80 cols violation.
So… given that, shouldn't we just stop reviewing/rerolling this patch? This is a bit confusing :P
Comment #32
alexpottOops I realised that there was another option of not filtering #markup whilst writing the comment.
That's not an 80 cols violation cause of
@link theme_render render arrays @endlink- see https://www.drupal.org/coding-standards/docs#linkComment #33
wim leers#32: So… you are still +1 for this patch?
Comment #34
alexpottYes I think it is the way to go given that we want #markup to auto filter and we using Twig's auto-escape.
Comment #35
pwolanin commentedhey ho, let's go.
Comment #36
lauriiiI created small CR for this.
Comment #37
alexpottThere is another reason why this is necessary and the following code does not work...
Xss:adminFilter() is not idempotent wrt strings it does not filter :( So
:(
Comment #38
alexpottThank @lauriii for the CR - we need to update https://www.drupal.org/node/2549395 once this is committed too.
Comment #39
alexpottActaully #37 is fortunately wrong... yay...
The code in
Xss::filter()is interesting.Comment #41
catchOK so I liked using the element on the other issue, but it clearly does not work with #item_list.
Adding #plain_text and ditching #safe_strategy is great.
Committed/pushed to 8.0.x, thanks!
Comment #42
stefan.r commentedMaybe "s/and the/so (that) the/" to clarify that it's usage of that key that causes the auto-escaping?
s/filtered to protect against XSS using the admin tag list/filtered using the the admin tag list to protect against XSS/
s/using the/by using the/
s/tags that Xss::filter() would accept/the tags that Xss::filter() must allow/
s/instead of #markup/instead of the #markup property/
with #markup and/or #plain_text set (though I think it we should only allow "or")
Are we missing a [] here?
We escape if it's #plain_text, otherwise we filter so let's say "sanitized" instead of "escaped"?
The second sentence here is a bit confusing and I don't know that it even belongs in the @return, should we just drop it?
Should we also explain that under which circumstances this will return a regular string? I.e. when the contents of #markup are already in the safe list or we pass in an empty string?
Xss::filterAdmin()
These not only specify what the array provide, should we make explicit that they also provide the actual value, even if it's obvious from the code example?
I.e.
"which tags are using" is missing a "we", but maybe "which tags to use for filtering the markup"?
Also, "s/tags that Xss::filter() would accept/the tags that Xss::filter() must allow/"
As to the last sentence, I wonder if we should have an assertion disallowing setting of both #allowed_tags and #markup at the same time, instead of spelling out these edge cases?
So this would give an assertion error if we did
assert(!(isset('#plain_text') && isset('#markup')))Comment #43
stefan.r commentedblegh, crossposted with a commit :)
Comment #44
stefan.r commentedThis is blocking deprecation of SafeMarkup::checkPlain() / SafeMarkup::escap so no reason to revert, let's open a follow-up: #2559577: Follow up to #2555931: Fix comments
Comment #45
stefan.r commentedComment #46
star-szrSmall follow-up: #2559597: Clean up uses of Drupal\Core\Render\Element\Markup