Early Bird Registration for DrupalCon Portland 2024 is open! Register by 23:59 PST on 31 March 2024, to get $100 off your ticket.
Follow-up to #2501639: Remove SafeMarkup::set in drupal_check_module()
Problem/Motivation
rdf_preprocess_comment() calls SafeMarkup::set() which is meant to be for internal use only.
Proposed resolution
Move markup created inside PHP into template.
Remaining tasks
- ✔ Evaluate whether the string can be refactored to one of the formats outlined in this change record: https://www.drupal.org/node/2311123
- ✔ Manual testing and screenshots
- ✔ Identify whether there is existing automated test coverage for the sanitization of the string. If there is, list the test in the issue summary. If there isn't, add an automated test for it.
Manual testing steps (for XSS and double escaping)
Do these steps both with HEAD and with the patch applied:
- Clean install of Drupal 8.
- Enable RDF and comment modules (already enabled in standard install)
- Add a comment to a node (article)
- Find the RDF markup next to author and submitted
- make sure it's in the output from comment.html.twig, twig_debug is helpful for verifying
- Compare the output above in HEAD and with the patch applied. Confirm that there is no double-escaping.
Screenshot before patch:
Screenshot after patch:
User interface changes
N/A
API changes
N/A
Comment | File | Size | Author |
---|---|---|---|
#109 | remove_safemarkup_set-2501697-109.patch | 4.61 KB | joelpittet |
#109 | interdiff.txt | 1.03 KB | joelpittet |
#108 | remove_safemarkup_set-2501697-107.patch | 4.62 KB | ZenDoodles |
#106 | interdiff-2501697-106.txt | 1.14 KB | ZenDoodles |
#106 | remove_safemarkup_set-2501697-106.patch | 4.65 KB | ZenDoodles |
Comments
Comment #1
leslieg CreditAttribution: leslieg as a volunteer commentedworking on this as part of NHDevDays2 sprint
Comment #2
leslieg CreditAttribution: leslieg as a volunteer commentedComment #3
leslieg CreditAttribution: leslieg as a volunteer commentedComment #4
leslieg CreditAttribution: leslieg as a volunteer commentedpatch to change safemarkup::set to safemarkup::format attached
Comment #5
leslieg CreditAttribution: leslieg as a volunteer commentedfixed formatting issues with patch to change safemarkup::set to safemarkup::fromat
Comment #7
leslieg CreditAttribution: leslieg as a volunteer commentedComment #8
leslieg CreditAttribution: leslieg as a volunteer commentedComment #9
lokapujyaGood to go. Expecting this to be green.
Comment #10
xjmThanks @lokapujya and @leslieg!
On a skim this seems to make sense, but can we document some steps for manual testing and have someone run through them before/after?
Comment #11
leslieg CreditAttribution: leslieg as a volunteer commentedAdded manaul test steps
Comment #12
leslieg CreditAttribution: leslieg as a volunteer commentedComment #13
star-szrThese two lines both need a space after the final comma in the array per https://www.drupal.org/coding-standards#array.
Comment #14
leslieg CreditAttribution: leslieg as a volunteer commentedMade changes suggested by @cottser to comply with coding standards
Comment #15
leslieg CreditAttribution: leslieg as a volunteer commentedComment #17
star-szrI manually tested #7, @leslieg is uploading a new patch presently that fixes the coding standards but there are no other changes.
This patch actually improves things, I'm updating the issue summary with new screenshots showing this in action and updated the manual testing steps to be more specific.
Comment #18
leslieg CreditAttribution: leslieg as a volunteer commentedrecreated patch - fixed issue with diff not using 8.0.x
Comment #20
leslieg CreditAttribution: leslieg as a volunteer commentedInadvertantly includes another file's changes
Comment #21
star-szrMuch better! This is ready to go.
Comment #24
cilefen CreditAttribution: cilefen commentedI think there was a random test failure. I am putting this back to RTBC based on #21.
Comment #27
akalata CreditAttribution: akalata commentedOne of the testbots got twitchy this morning. Setting back to RTBC.
Comment #28
xjmAssigning to myself for additional feedback, but leaving at RTBC for now.
Comment #31
joelpittetFailed to checkout bogus. back to RTBC
Comment #32
alexpottGiven that we're building render arrays are this point we could just use the
#type => 'html_tag'
.So something like
Comment #33
alexpottComment #34
xjmAgreed, thanks @alexpott.
Comment #35
pfrenssenGoing to address the remarks as part of the D8 Accelerate sprint.
Comment #36
pfrenssenUpdated the patch, this is untested, letting the bot take care of it :)
Comment #37
pfrenssenComment #39
pfrenssenThanks bot for reinforcing the notion that manual testing is essential :)
Comment #40
pfrenssenFixed failures, and as a bonus we now got rid of a deprecated call to
drupal_render()
.Comment #41
lauriiiTested this manually on the default frontpage and node page and it seems to be working. Patch looks also sane.
Comment #42
star-szrThere was another issue to remove html_tag (I think it was a @catch issue) but probably much too late by this point. Only thing is I think the #value line is missing a trailing comma.
Comment #43
xjmNice, this is looking much improved.
I think we should also have an automated test here to assert the variables are sanitized on render for XSS and not double-escaped (since they can contain markup I think). Particularly since this is being output to RDF and not HTML. (Sorry for not mentioning this before.)
If anyone can find a link to the issue @Cottser is mentioning that would also be helpful to review/reference at least here in the issue.
Comment #44
lauriiiIt was at the IS but someone else has marked that we don't need it so didn't spend time on reviewing that.
Comment #45
star-szrHmm maybe it was just an issue comment somewhere. I can't find it. May also have dreamt it.
Comment #46
star-szrI may be thinking of this from the issue summary on #1825952: Turn on twig autoescape by default, but I think these are @chx's words above the "Original report by @catch" :)
Comment #47
chx CreditAttribution: chx commentedEven if I didn't write that I strongly agree with it :D
Comment #48
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedSo, this is to add RDF tags into the HTML so the output is HTML.
given the new filtering in #markup, probably we could just convert these to be of type #markup as an easy fix?
Comment #49
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedComment #50
hestenetI am going to try @pwolanin's suggestion in comment #48
Comment #51
hestenetTried my rather naive edit based on comment #48. Seems to apply cleanly at least - but someone with much more experience should determine whether this is safe and probably pick it up from here:
Source from HEAD:
Source with patch:
Comment #52
star-szrTo follow up #46 I think this is the comment from @catch I was thinking of:
#2296101-7: Remove SafeMarkup::set() use in \Drupal\Core\Render\Element\HtmlTag::preRenderHtmlTag()
Comment #53
joelpittet@hestenet thanks I think that looks pretty good at first glance. A couple of things.
Also setting status to needs review so testbot can take a crack at it.
I think it's 'type' => 'html_tag'. But close.
nit: Needs a comma at the end of all elements listed on a new line.
This looks like it should work, just needs to be tested to confirm.
Comment #55
Les Lim#51 isn't quite the way to use
#markup
-- attaching a new patch based off of #40.Re #53:
This patch implements @pwolanin's suggestion in #48 to specifically not use 'html_tag', opting instead for '#markup' which applies
Xss::filterAdmin()
automatically.Comment #57
Les LimShoot, using
#markup
as in #55 isn't going to work.Xss:filterAdmin()
doesn't like attributes with ":" characters in them, so:<span rel="schema:author"></span>
is being stripped down to:
<span rel="author"></span>
Comment #58
YesCT CreditAttribution: YesCT commentedcan we whitelist the rdf keywords?
Comment #59
Les Lim@YesCT: maybe? It would probably be easier and more future-proof to whitelist the "rel" attribute, to exempt it from protocol stripping. Patch coming.
Comment #60
Les LimWhitelisting "rel" in
Xss::attributes().
Comment #61
joelpittetGo go gadget tests!
Comment #63
dsnopekDo we know for sure that
$variables['author']
and$variables['submitted']
are safe? Maybe it would be better to use an inline template so that those will get filtered if they aren't already safe?Comment #64
Les LimThe failures in #60 are because we also need to whitelist the "typeof" attribute.or maybe not. Hm.Working on a new patch to use inline template per #63.
Comment #65
Les LimI'm not really comfortable with the patch in #60 touching
Xss::filter()
internals, especially since it would have to whitelist several attributes to cover the full set used for RDF. I've opened #2544110: XSS attribute filtering is inconsistent and strips valid attributes to talk about the XSS filter separately.This patch follows dsnopek's suggestion in #63 to use inline templates (instead of #markup with its implicit XSS filtering).
The inline template presumes that
new Attribute($author_attributes)
is already sanitized. TheAttribute
class is supposed to take care of that, but we've seen that it does not strip potentially dangerous protocols likejavascript:
, which is why it allows for valid RDF values like "schema:author". But I also can't think of a way to exploit that in this case.Comment #66
YesCT CreditAttribution: YesCT commented@Les Lim thanks for implementing the inline templates.
sounds like we should have some code comments explaining
what you said:
The inline template presumes that new Attribute($author_attributes) is already sanitized.
and any implications of that? Like how there is no way to exploit it...
Do we still need tests here? per #43
" to assert the variables are sanitized on render for XSS and not double-escaped (since they can contain markup I think). Particularly since this is being output to RDF and not HTML."
probably yes, asserting we are santitizing (where we are) and not (where we are not... and explain why it is ok to let things through)
Comment #67
dsnopekI think this is because it's concatenating the attributes directly into the template, rather than using Twig to expand them!
So, rather than:
... this should do something like:
However, this might not have the nice property of not filtering out those RDF properties. :-/ But it's the right thing to do from a security and Twig perspective! Concatenating data directly into the inline template is definitely wrong - it's as if that data were part of the code in the template written by the themer in a non-inline template.
Comment #68
xjmThe purpose of Attribute() is to do sanitization of attributes, so it always makes safe attributes when we use it HTML tags for attributes. So that's fine either way. We have used this in previous issues, for example, in #2501705: Remove SafeMarkup::set() in LinkGenerator and document SafeMarkup::set() in LinkGeneratorTest. (Using inline templates in this issue is advantageous because it allows us to do late rendering instead of the early rendering needed for links.)
What @dsnopek suggests is probably fine too if we use an inline template here. However, the code flow is very hard to follow now with the foreach. There are three variables:
'author'
,'created'
, and'submitted'
.'author'
should only be processed when$author_mapping
is set.'created'
should only be processed when$created_mapping
is set.'submitted'
should be processed when either one or the other is set. I think we could make the code flow easier to understand by handling each case separately.I will say that I'm a bit sad this issue got reworked to avoid
html_tag
based on a third-hand comment (like is there even an issue for that? and it certainly can't happen in D8 AFAIK... that'd be an enormous BC break that's hard to justify now and I think not at all feasible after RC). On the other hand, the inline template is more relevant for this issue than other issues where there has been pushback about its use, because the goal of this code is specifically to assemble markup. (It is a lot of boilerplate though still.)Comment #69
xjmOn the other other hand, @alexpott and I also discussed that it might just be good to make this an actual template. We have simpler templates than this in core too.
Comment #70
dsnopek@xjm: Ah, ok, thanks! I still don't think we shouldn't concatenate directly into an inline template, if this stays being an inline template, though.
There's at least one issue about removing html_tag: #2544318: Remove #type => html_tag usages
However, it's newer than the comment in #42
Comment #71
joelpittetI hope I didn't misconstrued anything in #68 or #69 but here's how I think that would be implemented.
No interdiff because it's quite different approach.
Comment #72
dsnopekNot that this issue needs any more uncertainty, suggestions for approach or off-hand comments, but... :-)
I think what @xjm said in #68 about using
html_tag
should be considered. Given the simplicity of the template or inline template it would seem like a good match? And while there is at least one issue for removinghtml_tag
, like @xjm said:However, I don't want to derail current work even further than I have already, so feel free to ignore this. :-)
But I think it might be good to have a sub-system maintainer or committer review the issue and at least dictate the approach (is it #type => markup, inline template, real template, or html_tag?) so that work can go forward and prevent more changes of direction.
Comment #73
joelpittet@dsnopek I'm for anything that get this past the core gates:) I have no problem with using
html_tag
to solve the problem.And I am one of the sub-system maintainers for the theme system;) and @xjm is a committer, but maybe we should get another set of eyes?
We are trying to go to templates first, second inline-templates/renderable arrays, and
#markup
should be avoided if something is more appropriate. So to dictate a bit:#type => html_tag
would be preferable over#markup.
Comment #74
lauriiiI think between using
#type => html_tag
and templates I would prefer using real template whenever it makes any sense. Moving all the markup from PHP to templates is quite ideal because then we don't have to deal with autoescaping. Also in this case I wonder why this markup even was in the PHP because it might be something that people want to override easily and after moving it to template its trivial.Not much to do with this patch but should we open a follow-up to start using the template created here in other places creating RDF spans?
I did also test the latest patch manually and I saw the RDF spans showing up normally in comments :)
Patch is RTBC for me but the tests needs still some figuring out.
Comment #75
lauriiiAnd because this template doesn't have any classes inside it, it doesn't have to be copied inside the module. Also the attributes variable and span is needed for functional reasons so they shouldn't be removed from the module.
Comment #76
joelpittetAssigning to myself for writing tests. (discussed on twig call)
Comment #77
acouch CreditAttribution: acouch at NuCivic (formerly Nuams) commentedWorking with jday at Madison hackathon and cat story sharathon
Comment #78
jday CreditAttribution: jday commentedDeleting this comment, error I was reporting was in fact due to a poorly applied patch, i.e. user error
Comment #79
xjmRe: #78, I wasn't able to reproduce this, but @jday and I figured out it was an issue with the way the patch was applied, so @jday will retest using git.
Comment #80
acouch CreditAttribution: acouch at NuCivic (formerly Nuams) commentedPer chat I will work on this this weekend and 1) work on trying to break the current test 2) add assertLink and check the output for the author and submitted
Comment #81
akalata CreditAttribution: akalata commentedChiming in with some backstory on the approach to avoid
#type => html_tag
. dawehner had suggested this in #2296101-4: Remove SafeMarkup::set() use in \Drupal\Core\Render\Element\HtmlTag::preRenderHtmlTag(), with Catch agreeing.HtmlTag::preRenderHtmlTag()
has lead to long discussions about what input is sanitized where; using a template makes that much more clear.#type => html_tag
is used primarily for html<head>
elements, or in tests. The other uses in core could easily be converted into templates. So even if it is not removed completely, we can scale back its usage, at least in core. Discussion at #2544318: Remove #type => html_tag usages welcome!Comment #82
andypostWhy we add extra span for rdf attributes? suppose parent element (field template) should be re-used
Comment #83
joelpittet@andypost I believe the answer to that is related to not knowing if the parent element is being used in the template but that would be a good question for @scor to answer because he'd know better than I.
Comment #84
Wim LeersThis patch looks beautiful.
Only nits:
This is not really appending data to "elements" (I am interpreting "elements" as "render elements"), it's just creating a pair of
[$render_array, $rdf_metadata]
. That's not appending.Not sure what the best comment here is, but this is IMO fairly confusing.
s/element./, including RDF attributes./
Comment #85
ZenDoodles CreditAttribution: ZenDoodles at Palantir.net commentedMine!
Comment #86
ZenDoodles CreditAttribution: ZenDoodles at Palantir.net commentedUpdated based on feedback in #84.
Comment #87
Wim LeersLooks great! But this still needs tests AFAICT. I think somebody who was more involved in this issue would be better able to RTBC this.
Comment #88
ZenDoodles CreditAttribution: ZenDoodles at Palantir.net commentedI'll write tests.
Comment #89
ZenDoodles CreditAttribution: ZenDoodles at Palantir.net commentedActually... SimpleTests already exist.
Attached patch is how I broke it to demonstrate. I wrapped everything in check_plain() to improperly escape. Tests will fail.
Comment #90
ZenDoodles CreditAttribution: ZenDoodles at Palantir.net commentedHah... always read the patch with your eyeballs first. This one is the real one.
Comment #93
ZenDoodles CreditAttribution: ZenDoodles at Palantir.net commentedSince tests fail as expected when the elements are improperly escaped, the patch above demonstrates there are already tests.
The patch in #86 is ready for review again.
Comment #94
ZenDoodles CreditAttribution: ZenDoodles at Palantir.net commented@andypost, Re: #82... I believe the your concern is out-of-scope. This patch does not add a wrapper. It just moves it to a template where it belongs.
Comment #95
scor CreditAttribution: scor as a volunteer commented@joelpittet is right of course. at the time when this code was written, there was no way to add attribute to the parent element from within the preprocess function. Now this has changed, and this markup will improve with #2214241: Field default markup - removing the divitis.
Re-uploading latest working patch to avoid having to scroll up (no credit please).
Comment #96
joelpittetComment #97
YesCT CreditAttribution: YesCT commentedI'm going to review this.
Comment #98
YesCT CreditAttribution: YesCT commentedMake render arrays that wrap the 'author' and 'submitted' RDF metadata. They will be available variables in comment.html.twig.
I think might be more accurate.
Make the render arrays for RDF metadata available in comment.html.twig.
other than those clarifications, I think this looks ready.
It has been a while since we have done manual testing. From reading the patch, the markup *should* be exactly the same. I'll just confirm that next.
Comment #99
YesCT CreditAttribution: YesCT commented1. Discussed this with @ZenDoodles in person, and after learning a bit more about this, I think those comments are fine. No need to change them.
2. manual testing
before
after
there are two differences,
a. the double space in the span for rel in head is fixed in the patch to just be one space.
b. there is a newline after the patch before the comment__author p close tag
--
also confirmed the new template core/modules/rdf/templates/rdf-wrapper.html.twig is being used, for example:
3. updating the remaining tasks in the summary (to note they are done)
====
aside from the extra newline, I think this is rtbc.
Comment #100
ZenDoodles CreditAttribution: ZenDoodles at Palantir.net commentedYou're killing me Smalls!
Comment #101
joelpittetWe follow git settings. Normally we don't care about whitespaces, but we do care for inline elements (which this using). BUT this inline element is for robots, not people, it doesn't display to anybody.
Here's where we are discussing this.
#2245901: Trim trailing EOF whitespace from templates
So don't do this removal because of robots:)
Comment #102
YesCT CreditAttribution: YesCT commentedok. that makes sense. so that makes 85_94 the one to commit. reuploading it so it is the most recent patch on the issue (again).
Comment #104
xjmReading this, I questioned what happened if
$variables['created']
or$variables['submitted']
weren't set at all, but I confirmed they are both always set intemplate_preprocess_comment()
.Also, as far as I can tell, in HEAD,
$variables['created']
would never be a render array; it is only ever set to a string. So the!is_array()
check is not actually necessary. However, it makes sense to support it being a render array going forward for consistency, so I think it makes sense to do this.I also read the great saga of the final template newline and agree with retaining it here since it doesn't actually negatively affect the presentation in this case. (However, YesCT++ for the careful testing to identify the potential issue in the first place.)
So all the above things are fine. However, after carefully inspecting @ZenDoodles' test results in #100, I realized it isn't actually failing any assertions for the correct markup contents of the span tag. So I think we need to add one short additional test method to
CommentsAttributesTest
that saves a comment using the existing save method in the test and then asserts that the author name is a link for the authenticated user. Should only be a couple lines long.Thanks everyone for your careful review of this issue!
Comment #105
ZenDoodles CreditAttribution: ZenDoodles at Palantir.net commentedAh. Okay... I got it.
Comment #106
ZenDoodles CreditAttribution: ZenDoodles at Palantir.net commentedAdded a test for @xjm
Comment #107
YesCT CreditAttribution: YesCT commentedI'm not sure we do function calls like that.
the place that says is probably https://www.drupal.org/coding-standards#functcall
Comment #108
ZenDoodles CreditAttribution: ZenDoodles at Palantir.net commentedOops... Put the newline back in this one.
Comment #109
joelpittetClarified the comment above the user_role_grant_permissions and fixed #107
Comment #110
kgoel CreditAttribution: kgoel at Forum One commentedSpoke with @joelpittet - comment is improved in CommentAttributesTest.php for testCommentRdfAuthorMarkup() and so it's clear which link show up when registered user posts comment and also #107 has been addressed. RTBC pending testbot.
Comment #112
xjmGreat work on this everyone! This patch is a great example for other issues as well.
Committed and pushed to 8.0.x. I added the following inline comment on commit:
Comment #113
dsnopekWoohoo! Congrats everyone, this issue was epic :-)