Problem/Motivation
Follow-up to #2527360: Review all usages of Xss::filter(), Xss::filterAdmin(), and Html::escape()
The safe string list in SafeMarkup is difficult to control and has potential side effects. We should not use it if we don't have to.
Proposed resolution
Add two new render array keys #markup_xss_tags
and #markup_safe_strategy
. #markup_xss_tags
can take an array of tags to change how #markup
is filtered. In HEAD #markup
is filtered using the admin tag list. #markup_safe_strategy
allows us to switch from a xss filtering strategy to an escaping strategy using htmlspecialchars()
.
This patch also changes #title to be able to be a render array since different implementations have different strategies (it's a bit of mess). Other functions, for example, search_excerpt() are changed to return a render array instead of a safe marked string to allow for late filtering/escaping by the render system.
Remaining tasks
- Agree approach
- Write and review patch
User interface changes
None
API changes
#title
can be a render arraysearch_excerpt()
returns a render arraySafeMarkup::xssFilter()
is removedaggregator_filter_xss()
is removedaggregator_xss_tags()
is addedDrupal\aggregator\Controller\AggregatorController::feedTitle()
returns a render arrayDrupal\menu_ui\Controller\MenuController
returns a render arrayDrupal\taxonomy\Controller::vocabularyTitle
returns a render arrayDrupal\taxonomy\Controller::termTitle
returns a render array
Data model changes
None
Comment | File | Size | Author |
---|---|---|---|
#41 | 2549791-2.40.patch | 54.65 KB | alexpott |
#41 | 35-40-interdiff.txt | 4.94 KB | alexpott |
#36 | 2549791-2.35.patch | 53.23 KB | alexpott |
#36 | 31-35-interdiff.txt | 5.38 KB | alexpott |
#31 | interdiff.txt | 15.51 KB | Wim Leers |
Comments
Comment #2
alexpottComment #4
alexpottSo we really need #2501931: Remove SafeMarkup::set in twig_render_template() and ThemeManager and FieldPluginBase:advancedRender in since aggregator is relying on SafeMarkup set side effects to traverse the views render pipeline without side effects. Patch attached includes the latest path from that issue.
Also I think we should fix search_excerpt to not do the early render - just need to fix the test. And potentially add a template for head title and put all the logic from html process in there so that we don't have to do the early render.
Comment #5
alexpottImproved documentation in the patch and the issue summary.
Comment #7
dawehnerI can haz documentation?
Would it make sense to do an early return given that we already know its safeness?
opinions on $this vs. static:: ?
I'm not entirely sure whether making the markup_safe_strategy implicit is the right thing to do here
Do you mind putting an inline doc about the type here?
Comment #8
alexpottRe #7 - thanks @dawehner
Comment #9
Wim LeersI have some doubts/concerns about this patch.
s/html/HTML/
This is not yet documenting, not even mentioning
#markup_safe_strategy
and the possible values for that."safeMarkup" is a strange method name IMO, because it doesn't have a verb.
It very much concerns me that we're now adding even escaping/filtering logic to
Renderer
.Isn't
Renderer
more complex than we'd like already?Is there really no other way?
I think these could be converted to
#type => 'processed_text'
, and use a text format that usesfilter_html
to only allow the specified tags.That's really what the aggregator is doing here: consuming content from elsewhere (the content creator is not a person, but an external site), whose output needs to be filtered through the filter system, just like for any node that any content creator creates.
It seems wrong to use the render system to do filtering in a filter.
(This uses the render system to render some markup. Very different to the above.)
Comment #10
alexpottRe #9
Changed markup_xss_tags to markup_allowed_tags as @Wim Leers advice on IRC plus added more docs (again with the help of Wim)
Comment #11
Wim LeersInterdiff review:
s/html/HTML/
"renderer system" should be either "Renderer" or "render system" :)
"filtered for XSS using" -> "filtered for XSS protection using" or maybe "filtered to protect against XSS using"
s/tags to filter/tags allowed by the filtering/
?
s/key/property/
s/tags are using/tags are allowed when filtering/
s/string/markup/ ?
Based on the above changes/discussions, more remarks on other parts of the patch:
Why no @see to the other constant?
I think
MARKUP_SAFE_STRATEGY_something
would be clearer, because it'd closely match#markup-safe_strategy
.Again only an @see to one of the two constants.
s/permitted/allowed/
s/xss/allowed/
Comment #12
alexpottAddressed all the points of the review... went for
MARKUP_SAFE_STRATEGY_FILTER
andMARKUP_SAFE_STRATEGY_ESCAPE
. Thanks @Wim Leers.I think that
SafeMarkup::xssFilter()
is okay to remove because it was only introduced in #2506195: Remove SafeMarkup::set() from Xss::filter() and therefore has only existed for 1 month.Comment #13
alexpottHere's the patch :) - Also updated the CR
Comment #14
Wim LeersI have no more remarks, besides one silly nit below. But I think this needs additional review by somebody else before it can be RTBC'd.
Should we prefix this with an underscore, to indicate it's not a public API?
Or at least add
@internal
?Comment #15
alexpott@Wim Leers I think marking it with @internal and an underscore makes sense.
Comment #16
xjm@Wim Leers pinged me this issue. I think removing this method from SafeMarkup is an excellent idea, and it probably makes sense for render arrays to be able to control the escaping strategy of
#markup
.I'm somewhat nervous about adding even more magic keys to the Render ArrayPI, but it helps to conceptualize a render array as an object-in-waiting.
I'm also nervous about anything that requires us to add even more
renderPlain()
calls (basically just using render arrays as "renderable markup objects" locally when the context doesn't support them).I pinged @catch and @Fabianx for their feedback too.
Comment #17
alexpottThis patch only adds one renderPlain call - and that is to a test.
Comment #18
alexpottUsing Html::escape().
Comment #19
Wim LeersShouldn't these now also point to
Html::escape()
?Comment #20
alexpottGood point @Wim Leers. Fixed.
Comment #21
dawehnerI'm wondering whether the constants should reflect that we deal with HTML for those cases ...
Comment #22
catchDiscussed this in irc with alexpott and Wim Leers.
So the main concern I have with this is that search excepts, aggregator, contact messages are all extremely special cases - similar to Wim's point in #9. #markup isn't a special case, but we're adding a capability to it to support things that are.
However page titles, while also a special case aren't exactly esoteric.
So it really comes down to opening follow-ups to look at the places that require this, and see if we can do them differently (like using the filter system for aggregator and possibly search excerpts). Then maybe we can deprecate this in a minor release and/or remove in 9.x.
We also talked about adding a new #type, but that would then conflict with #markup.
In terms of the patch itself, I think it's fine.
Comment #23
alexpottThanks @catch and @Wim Leers. Opened the followups:
Re #21 in the discussion with @Wim Leers and @catch, @Wim Leers said:
The renderer is about HTML at the moment. If we change that we need to do more work that just these constants.
Comment #24
Wim LeersGiven #22 and #23, I think this is
RTBC again.now RTBC!Comment #25
joelpittetJust throwing in a little request: Any chance the key could just be
#allowed_tags
instead of#markup_allowed_tags
? or is the idea to discourage it's use? then I'm fine with it as is.Comment #26
alexpott@joelpittet key name is because it only affects
#markup
it does not affect all the other things we admin filter by default. Of course, if there are better key names let's make the change.Comment #27
joelpittetWe don't do that with other types so I don't think it's a good to start that pattern here if that is the reason.
eg.
#items
isn't#item_list_items
because it only works with#type => item_list
.Comment #28
Wim Leers#27: the difference is that
#items
is specific to#type => 'item_list'
, i.e. you have to already specify that specific#type
. But#markup
is special: it's something thatRenderer
itself directly supports. If you don't specify a#type
, you can still use#markup
. Therefore I think it's okay to be more explicit and different compared to the common case.Comment #29
catchSo I was about to commit this, but then I read Wim's comment and thought of #2012818: Remove #type 'markup'.
A new #type would conflict with #markup, but if we moved #markup back to an element, then the new key support and accompanying logic can live in the render element. Also the original motivation for removing the element was because there was no distinction at all except more typing, but given the extra logic around #markup there's something to use there now. Not 100% sure it's better than the patch, but Wim Leers said in irc he'll try it.
Comment #30
Wim LeersAnd done, as described in #29. This also moves and renames the constants from
RendererInterface::MARKUP_SAFE_STRATEGY_something
toMarkup::SAFE_STRATEGY_something
.And because of this, I think #27 now does make sense. Doing that next.
Comment #31
Wim Leers#markup_allowed_tags
→#allowed_tags
#markup_safe_strategy
→#safe_strategy
Comment #32
Wim LeersThis would effectively revert the CR at https://www.drupal.org/node/2036237.
Comment #33
dawehnerNice!
You can use ->willReturnCallback if you like
Comment #34
Wim Leers2. I had that at one point, but lost it in my many attempts to minimize the changes in this area. Let's see if there are other nitpicks Alex or I can fix in one go :)
Comment #35
dawehnerYeah its still RTBC
Comment #36
alexpottJust noticed some unused use statements :) also took care of #33.2
Comment #37
Wim LeersRTBC++
(I just moved code.)
Comment #38
catchOne issue - these docs are for a completely different class.
Should we just say "Provides a render element for HTML as a string, with sanitization." or something?
Comment #39
alexpottOn it.
Comment #40
Wim LeersUgh, sorry, I forgot to update that :(
Comment #41
alexpottImproved docs.
Comment #42
alexpottComment #43
Wim LeersExcellent! :)
EDIT: dreditor WTF.
Comment #44
alexpottRTBC++
(I just moved documentation.)
#37 ;)
Comment #45
catchThanks for the docs update.
Committed/pushed to 8.0.x, thanks!
Comment #47
Wim LeersWoot! Next up: #2545972: Remove all code usages SafeMarkup::checkPlain() and rely more on Twig autoescaping and #2549393: Remove SafeMarkup::escape() and rely more on Twig autoescaping.
Comment #48
cilefen CreditAttribution: cilefen commentedIn the CR, is it #markup_allowed_tags or #allowed_tags? It seems to contradict itself.
Comment #49
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedI fixed the change notice now so it uses #allowed_tags consistently.
However the change notice does a good job explaining how to do this with #markup but does not really explain much about how to replace SafeMarkup::xssFilter() elsewhere in a render array.
For example if I have this:
and if the #theme callback is generic enough that it does not impose restrictions on what can be passed in (e.g. plain text vs HTML) then before this issue the calling code could just call SafeMarkup::xssFilter($description) to have it be treated as HTML.
Now with that no longer an option, I'm not sure what you're supposed to do. This might work:
although the nested properties look a little strange.
However, if you do that, doesn't that mean the theme preprocess function (and anything else that alters or interacts with the render array along the way) now needs to consider the possibility that #description could be either a string or an array?
Or is the idea that it should always be an array, and you're not supposed to use strings in render API properties at all anymore?
I'm confused about how all this works and I think more info is needed in the change notice.
Comment #50
jhodgdonThis patch that was committed in modules/search changed the return value of the search_excerpt() function from a string to a render array.
This is actually a fairly large API change. There is no change notice. There are contrib modules that use this function, such as Display Suite, so it absolutely needs a change notice. Please write one. The patch may also have changed other API functions, I haven't looked, but this change (a) should have been at least passed by the Search module maintainers and (b) doesn't have a change notice.
Comment #51
jhodgdonI took a look through the patch. aggregator_filter_xss() is apparently removed, and I cannot see a change notice for this either.
Comment #52
alexpottre #49 - the render array still supports string properties. These generally will be auto escaped by Twig and this is not changed in this patch. If you want them to include markup then either you'll be using
SafeMarkup::format()
,t()
or one of the "special" render properties i.e.#markup
,#prefix
or#suffix
.re #50/#51 Created https://www.drupal.org/node/2564261 and https://www.drupal.org/node/2564263
Comment #53
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented@alexpott, I don't see how those would work with my example in #49, where $description comes from an unknown source (e.g. user input) and is expected to contain HTML.
I guess it's technically possible to do something like this:
and it would work... but I can't imagine this is what we want to recommend.
Comment #54
alexpott@David_Rothstein I think that if the description contains markup we should be using a render array. For the FieldApi is uses a special FieldFilterString object to do this - which whilst not ideal - is one way of going. In the example you give the $description would be escaped because Xss::filterAdmin() does not put it in the safe list.
Comment #55
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedMy example has SafeMarkup::format() which would put it in the safe list.
So you're basically suggesting what I had in #49, then?:
But if so, see the questions I had there. Basically we are saying that preprocess functions/etc are going to need to be very aware of this. If they just take a variable and treat it as a string, then the calling code can't really decide to send in HTML (at least not using the recommended, non-contortionist method).
Comment #56
alexpott@David_Rothstein the
@placeholder
would cause it to be escaped. And yes preprocess functions should be aware of this. Because as you can see from the patch we didn't ever have to actually do this.Comment #57
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedOh, yikes - I meant
!description
but in fact that doesn't treat it as safe either now. Anyway, we all agree it wasn't a good workaround anyway :)We at least need something in the change notice about this since it's a common use case, but I'm not quite sure what to recommend. I think it basically means that if you're defining a theme implementation that you expect other modules to call, you need to consider that the calling code might pass in an array rather than a string and handle both cases (or require/document that they only pass in an array). Or decide on your level that this should never have HTML, in which case you can still decide to treat it as a string only.... Still a little confused how to do this in practice.
Comment #58
alexpottI think that the additional documentation has been added in #2559971: Make SafeMarkup::format() return a safe string object to remove reliance on a static, unpredictable safe list and #2565895: Add a new :placeholder to SafeMarkup::format() for URLs that handles bad protocols and is continuing to be improved in #2570431: Document that certain (non-"href") attribute values in t() and SafeMarkup::format() are not supported and may be insecure
Comment #59
Fabianx CreditAttribution: Fabianx as a volunteer commentedThere are 2 unpublished change record drafts on this issue.
Are they correct?
Comment #61
Fabianx CreditAttribution: Fabianx as a volunteer commentedThere are still 2 unpublished change record drafts on this issue.
Are they correct?
Comment #62
alexpottThey are correct - fwiw I've published them.