Suggested commit message:
Issue #1825952 by Fabianx, joelpittet, bdragon, heddn, chx, xjm, pwolanin, mikey_p, ti2m, bfr, dags, cilefen, scor: Turn on twig autoescape by default
Note: The approach has changed several times in this issue, so anything before comment #139 may not be relevant.
Work on this patch is done in a sandbox:
https://www.drupal.org/node/1857558
git clone --branch autoescape2--xjm http://git.drupal.org/sandbox/chx/1857558.git d8_autoescape
Contact chx for access.
Problem/Motivation
No one can write XSS-safe code. Not core contributors, nor contrib developers, no one.
Proposed resolution
- Turn on Twig autoescaping by default.
- Add a new SafeMarkup class. Sanitized markup strings can be flagged (with caution) as safe with SafeMarkup::set() and thoroughly documented (see #2280965: [meta] Remove every SafeMarkup::set() call). These will be rendered as-is by Twig; everything else will be escaped automatically.
Remaining tasks
Task | Novice task? | Contributor instructions | Complete? |
---|---|---|---|
Add unit tests for SafeMarkup | Instructions | Done (#205) | |
Add detailed documentation for SafeMarkup | |||
Draft a change record for the API changes | Instructions | Done (needs review) |
User interface changes
If we do this right, then none. If we don't then you will see a new kind of bug: double escaping.
API changes
New SafeMarkup class and SafeMarkup::implode() helper.
You are not allowed to put unsafe user data in #attached. This can be relaxed in a followup but it truly gets gnarly. You are advised to not use #type => html_tag
if at all possible or at least not with unsafe user data. This is not something I want to waste an effort on making it work.
Original report by @catch
Part of meta issue
follow-up from #1696786: Integrate Twig into core: Implementation issue
Twig as it stands introduces a fair bit of overhead into the theme system. Fabianx indicated that a lot of this is from marking $variables as secure so they're not double escaped later.
Ideally, if Twig autoescape is going to be enabled, then we should just pass raw variables to it and let it do the work. This way, if a template doesn't print the date, or a link, or whatever might currently be check_plain()ed in preprocess, we're not spending all this time creating it for it to be never used. In general, we should be able to remove a large chunk of preprocess work, and just let Twig sort out variables on demand.
Doing this means that a PHPTemplate engine in contrib is going to have to add back a way to securely format variables, but I don't see a way around this if we don't want a serious performance regression.
Comment | File | Size | Author |
---|---|---|---|
#280 | interdiff.txt | 837 bytes | scor |
#280 | twig-autoescape-1825952-280.patch | 102.09 KB | scor |
#275 | interdiff.txt | 637 bytes | scor |
#275 | twig-autoescape-1825952-275.patch | 102.02 KB | scor |
#271 | interdiff.txt | 1.26 KB | scor |
Comments
Comment #3
jenlamptonPostponing this until #1757550: [Meta] Convert core theme functions to Twig templates is resolved.
Once the above is done, we will be able to create a single patch that removes all instances of check_plain, check_markup, etc, from content passed to Twig. We can't do that until Twig is actually rendering all of our output or it would be a major security regression.
Oh, but I did add this as one of the major steps in our roadmap so it won't get overlooked or forgotten :)
Comment #4
webchickAlso raising to at least major, since this was a big win we were counting on with Twig, so makes sense to make use of it. :)
Comment #5
David_Rothstein CreditAttribution: David_Rothstein commentedIsn't this dependent on #1818266: [meta] A secure theme system (with twig) too? Turning on auto-escape is not going to be simple, as far as I know, but there was some progress in that issue.
Comment #6
webchickOops. I think I meant to mark the other issue as major. Thanks, David!
Comment #7
catchSome of the drupal_render() stuff elsewhere is handling the "don't prepare wasted variables in preprocess". I need to try to talk to Fabian about what other savings we could make with preprocess for things like format_date() which is a perennial killer for content listings.
Comment #7.1
star-szrFabian little x. Silly case sensitive usernames.
Comment #8
star-szrI'm going to try and pull a patch out based on the work done in @Fabianx's sandbox: https://drupal.org/sandbox/Fabianx/1819382
The patch will be based on a diff of the d8-core-1712444 and d8-core-1712444-v1 branches.
Comment #9
star-szrHere is a rough initial version, brought over almost everything except for one unrelated inline comment and one change that seemed unrelated/unnecessary:
The only things that I changed from the sandbox were some tiny little coding standards things that I spotted along the way.
This also changes twig_escape_filter in vendor which we can't do obviously.
Putting -do-not-test.patch because this is quite broken - for example #type => html_tag gets escaped. Going to see if I can push this a bit further though.
Comment #10
Fabianx CreditAttribution: Fabianx commentedI had inlined those in the optimized version and we probably need to do that for the final and to be profiled version.
I wish for a PHP-preprocessor here :-D
Yes, here I already inlined it.
This change is left-over cruft.
Again - inlined for speed.
Probably should inline them all, it is quite easy to see.
Could be wrapped in:
if $GLOBALS['use_safe_strings']
and use the same pattern everywhere if we decide autoescape is optional to keep same performance as before.
Should probably inline here.
Inline, use just one pattern, Fabianx!
I am not totally sure this is still needed - I think it was left-over.
It should be removed.
The check should be like for RenderWrapper in the twig_render_template.
This should be removed.
Drupal should not rely on twig as engine (even if its the default one).
Please remove.
No longer needed.
This should be:
'twig_raw' => new \Twig_Filter_Function('twig_raw'),
Probably _very_ confusing now.
Inline it!
And another one to inline ...
That is all :-D.
Thanks for working on this!
Comment #11
star-szrThanks @Fabianx, that's great :)
First the fix for output coming from RenderWrapper, cleanup from #10 coming in the next patch. Blocks are still busted (encoded when they shouldn't be) currently.
Comment #12
star-szrBlocks are still broken but cleaned up as per #10 (other than the if $GLOBALS suggestion) and things are still working as they were before from what I can see. Smaller patch file too :)
@Fabianx or anyone else - I'm not clear why some strings are wrapped in parentheses, would love to know why that is. Should we be doing this for all of them or is there a rule/reason to this? Examples:
Comment #13
star-szrCorrect interdiff. Since I forgot to name the last patch -do-not-test.patch, temporarily setting to needs review so I can cancel that test :)
Comment #15
joelpittet@Fabianx Would there be possible name collisions with the $GLOBALS suggestion?
Comment #16
Fabianx CreditAttribution: Fabianx commented#12: No, the parentheses are a left-over from drupal_mark_safe($output) -> inlining conversion and yes I worked on it in a hurry ...
#15: Yes, indeed, should probably use $_GLOBALS['drupal_safe_strings'] instead.
Comment #17
Crell CreditAttribution: Crell commentedAdding a new global to track rendering state is absolutely not OK. We've been working hard to remove globals from Drupal. Adding in new ones is not cool.
Comment #18
David_Rothstein CreditAttribution: David_Rothstein commentedShould this patch be moved to #1818266: [meta] A secure theme system (with twig)? Otherwise we are starting to repeat discussion that already happened there.
(That's why this issue was marked postponed on that one. The fact that #1818266: [meta] A secure theme system (with twig) has "meta" in the issue title is arguably a bit misleading...)
Comment #19
Fabianx CreditAttribution: Fabianx commentedAs I said already, it would be good to have the same "pattern" (including variable name) to be able to do MASS Search+Replace.
Globals are really only used for performance, we could also use the advanced drupal_static_fast pattern, which should be similar to the Globals.
( Just [n] function calls for [n] functions using this technique.)
Comment #20
star-szrI won't have time to look at this for a little while, unassigning for now.
Comment #22
joelpittetA bit unclear on why we have to mark strings as 'safe'? Can't we escape everything and {{ safe_string|raw }}? I'm probably being naive but thought I'd ask.
Comment #23
joelpittetOk in a crazy attempt to move this forward a bit I started writing some regular expressions... instead of a big patch.
This is what I came up with so far:
Tools: curl, ack, perl and
git co
=git checkout
I'm trying to ignore the idea of "safe" strings... though it will probably come up again.
Wondering if anybody can push this a bit further... the obvious catch is regions are getting escaped
{{ page_top }}
FYI, the nasty looking perl regular expression just does some nested brace matching.
Comment #26
joelpittetHmmm, the more I think about this issue the more like it seems we are trying to rob peter to pay paul.
We turn this auto-escape on and get:
checkPlain
's with addition of all themark_safe
's.All renderable arrays normally producing markup when printed would need to be "safe"/raw. This leaves all it's variables raw unless there is a twig template that they are being rendered into.
Sorry if this sounds :-(, trying to build a case in my head to make this work... is there any case where a Renderable Array would need to be escaped or can we handle that it it's template/theme function/post and render hooks etc? Maybe we can mark renderable arrays as unsafe in their hook_element_info/hook_theme definitions and assume the rest as safe?
Comment #27
joelpittetAny direct call to theme() will generate a string that is unsafe. We've been trying to remove those through the conversions but there are still a number of them. So we'd need a way to mark those.
So far I've found the following need to be marked as safe:
Those are all the targets to be 'safe'/raw that I can think of.
Comment #28
Crell CreditAttribution: Crell commentedEven if performance is a wash, IMO relying on Twig to handle escaping is a DX win as then Twig is behaving more like it does in every other system that uses it, which is a growing number. That makes Drupal theme adoption easier for developers used to Twig from some other CMS or framework.
Comment #29
joelpittet@Crell, probably not a discussion for this thread on which frameworks do and don't have auto escaping on as a cursory search showed two that have it off due to reasons similar to Drupal and one has it off. Though if you see me on IRC ping me with a few examples of CMS's that do that because I'd like to read through how they go about it.
What I'd love to know from you and others, does that $GLOBALS['safe_strings'] look like a viable solution? And if not, maybe some other bright ideas? I imagine if strings were objects we could tag them as safe, but PHP doesn't do that...
One big blocker on this issue is the direct calls to theme(), render() and drupal_render() that we've being trying in twig conversion issues to convert to render arrays as much as possible but there are still cases where those happen in core. That would be an solveable problem though the other edge cases may need something like this $GLOBALS['safe_strings'] idea to get this issue moving, so if that's kosher I'll keep moving with that?
Irrelevant cursory search:
ProcessWire recommends auto-escape off because they too do the processing before:
http://modules.processwire.com/modules/template-twig-replace/
Kohana has auto-escape on because they leave everything to the developer:
https://github.com/jonathangeiger/kohana-twig/blob/master/config/twig.php
Contao auto-escape off:
https://contao.org/en/extension-list/view/twig.10020019.de.html
Comment #30
joelpittetSo here's a drastic approach. If I'm lucky it will pass install.
The plan here:
Completely remove TwigReference and it's cohorts so that |raw works.
Kill show and change hide to a filter help with that using #2114563: Remove TwigReference with the help of 'without' filter replacing 'show'/'hide' functions.
Rewrite TwigNodeVisitor to just for Arrays that look like renderable arrays to pass through twig_render_var.
Ideally, I'd just wrap all arrays in a RenderableArray object with a __toString() and drop the twig_render_var function too...
May have gone crazy... but at least I found what was hurting
|raw
, the node visitor converting too many things to Twig_Node_Print as referenced.Comment #34
joelpittetIf i'm reading that correctly, that means I win:) It installed just couldn't login which is progress:)
Comment #35
Crell CreditAttribution: Crell commentedjoel: Huh. I'm a bit surprised at that. Given what a big deal Twig and Fabien make about auto-escaping in Twig I wouldn't have expected many projects to bypass it. Although admittedly I don't know how many projects have the Russian-doll templating model we do.
Comment #36
joelpittetAfter a couple more tests, seems that TwigReference is preventing
{{ dump(_context|keys) }}
from producing any output. So the patch in #30 with that print statement will return results in any twig file that gets rendered but without that patch it will print an emptyarray()
That along with
{{ var|raw }}
not working with TwigReference make that wrapper a blocker for this issue and I'll continue to keep it out. I may even postpone this issue on #2114563: Remove TwigReference with the help of 'without' filter replacing 'show'/'hide' functions. as it helps remove TwigReference.Comment #37
joelpittetPostponing on #2114563: Remove TwigReference with the help of 'without' filter replacing 'show'/'hide' functions.
Comment #38
joelpittetSeems the node visitor coming before auto-escape is causing
raw
filter not to work still. Not exactly sure why, but may be how the Auto-escape node visitor visits the nodes and looks for the raw filter. Could be a bug in the way it goes about finding theraw
filter. Though if you set the priority from -1 to 1,raw
seems to work as expected but all you get in the compile is this difference:vs
Which indicates to me the raw filter is just a flag dictates when to escape or not during the 'node visitor' phase.
Would be nice to combine those two functions, though they do specific jobs so that would likely shoot ourselves in the foot.
Comment #39
joelpittetI'm always keeping in the back of my mind, that maybe auto-escape on will not be a huge win.
Any markup we send to a variable in the template will need to be either explicitly marked as Safe or automaticcly done so. The automatic way may open up potential for security holes, the manual/explicit will be a PITA.
That being said, here's some automagic that does some of the variables. Also note, #38 is not resolved or tackled.
I made a subclass of Twig_Markup so I didn't have to type in the charset each time.
Comment #41
mgiffordreroll.
Comment #43
joelpittet@mgifford thanks for the re-roll. Here's some help that will remove a few fails, I like that it passes install that's a good sign:)
Comment #45
joelpittetRe-roll and title change.
Comment #47
joelpittetFound bits that were double escaping and marked them as raw/markup. Biggest change is that all _theme output is returned as safe Markup because it's synonymous with including a template in a template.
This is not done still, but brings things a bit closer yet I believe.
Comment #49
joelpittetA few more...
Comment #51
joelpittetNot sure what's up with the contextual links but they aren't liking something here... likely because they are Markup objects being passed to jsonresponse.
Comment #53
nod_Only thing I can say is that it's not a JS issue:
This code gives:
Which is useless. instead of
{}
we should have a HTML string. Don't know why it's broken.Comment #54
joelpittetThanks for looking at that and giving me a hint @_nod
Here's a few more fixes, hopefully another reduction in exceptions and fails.
Comment #56
joelpittetFew more...
Comment #57
joelpittetwith interdiff.
Comment #59
xjmComment #60
joelpittetThanks for the bump @xjm.
I'd really like someone to review what I've got so far to make sure I'm taking a viable path to this holy grail. I likely will eventually get to 0 fails, but not sure if I'm on the right path to this goal... or just opening up security holes or system weirdness in the process.
So anybody is welcome to come ping me on IRC or post in this issue their concerns or suggestions and I'll keep trotting along...
Comment #61
David_Rothstein CreditAttribution: David_Rothstein commentedMy concern with this approach is the issue discussed in #1818266: [meta] A secure theme system (with twig) (which you also touched on a bit yourself in the comments above, such as #39). If we rely on manually marking strings as safe/raw in the template, then we're effectively shifting responsibility for security from the "developer" to "themer", which doesn't sound good.
Here's an example from the patch:
Before: All the themer had to do was print the variables that were prepared for them. Unless they were doing something non-standard, they didn't have to worry about security at all.
After: The themer needs to understand that "requirement.description|raw" is correct and safe, but "requirement.title|raw" is incorrect and a security hole. (In this specific case I'm not sure it actually is, but the point is that this would be true anywhere where user input is involved and some of it is expected to have HTML and has been pre-filtered with a function like filterXss(), but some of it isn't and is therefore left for the template to auto-escape.) It's a lot of security-related decisions that the person writing the HTML is being asked to make.
Comment #62
joelpittet@David_Rothstein thanks, that a very good way to put that and for reviewing the patch!
Though if you look at D7's use of
check_plain()
. You'll notice it lives in a number oftheme_*
functions (theme_links, theme_menu_local_task, theme_aggregator_page_opml, theme_file_link, etc) as well a few templates(mostly contrib templates mind you). Preprocess straddles that divide but it's more of a themer's tool than a developers tool as it's just massaging (mostly) and manipulating(rarely) data to be useful in the template and you see a fewcheck_plain()
s there too.So regardless of this proposal/patch. Themer's need to take some part in escaping/unescaping to some degree.
With this approach of auto-escaping on by default, the only time a themer needs to use
{{ variable|raw }}
is when{{ variable }}
inadvertently escapes their markup passed through which is not intended to be escaped. And would provide more an incentive to review their security implications than currently with everything printing raw defaults unless explicitly escaped.My biggest concern with this patch is that I may be opening up new security holes inadvertently with my approach to it.
Comment #63
joelpittetRe-rolled
Comment #64
sunHm. Wrapping every string literal into a
Markup
object looks like a performance problem.It also hard-codes an assumption on the output being HTML throughout the entire system - i.e., theme template preprocessing starts way before the actual theme template system is even initialized + triggered.
That doesn't look right to me.
Hm. All of these changes look very dangerous to me.
Dangerous, because they are not consistent.
I guess I share @David_Rothstein's concerns.
That said, the fundamental idea here appears to be that all variables are escaped by default, so the worst that can happen is that stuff gets double-escaped.
Sans this change proposal, the worst that can happen is that stuff is not escaped at all.
While that idea makes sense to me, the inconsistency does not. I wonder whether we can find a more creative solution to address that?
E.g., a trivial pattern, like moving all template variables that contain raw HTML into a separate top-level variable à la html.description?
Or a more sophisticated approach of explicitly wrapping such variables into a
RawHtmlWrapper
object in preprocess functions?I think we need to explore some more ideas to complement and make sense of the main change proposal.
Comment #66
joelpittet@sun Markup == RawHtmlWrapper, which is what I've been doing. It's extending Twig_Markup class which is marked as safe markup for twig. @see http://twig.sensiolabs.org/api/master/Twig_Markup.html
I may have been hasty on a few of those
new Markup
's and|raw
. And the one offs are fairly easy to change/remove/replace but the ones around _theme() and drupal_render() and in l() are the more overarching ones.Thanks for having a look at this too! Glad to see some people taking notice!
Comment #67
chx CreditAttribution: chx commentedA new approach without raw is being worked on by @joelpittet and me. Stay tuned. Sneak peek: https://drupal.org/node/2208061#comment-8791489
Comment #68
chx CreditAttribution: chx commentedAnd if anything then this MUST be a beta blocker.
Comment #72
chx CreditAttribution: chx commentedComment #76
chx CreditAttribution: chx commentedA bazillion of those are from assertRaw failing which was tried to be patched individually; I have instead patched assertRaw and assertNoRaw and assert itself.
Comment #77
joelpittetOk here's a few more items and comments to push this, in hopefully the right directions.
There are still many spots to address. Any time we use a
SafeMarkup
object (previously above calledMarkup
) we have to justify it as preserving what is already safe (when concatinating two SafeMarkup objects for example). Or document why we are marking it as safe.We want to as chx made clear last night (avoid creating "safe", but do try to preserve "safe")
Comment #78
joelpittetWhoops, sorry @chx, cross posted and somehow deleted your patch? Strange...
Comment #79
joelpittetInterdiff from @chx #76.
Comment #83
catchDiscussed with Dries, Angie and Alex - decided this should block the beta.
Comment #84
pwolanin CreditAttribution: pwolanin commentedDiscussed a bit with ircmaxwell in IRC (of course)- his reaction would support this being a beta blocker: "switching to Twig, with auto-escaping off, is worse than just using PHP directly". In other words, while Twig is great, using it without auto-escaping gives a false sense of security that will lead to mistakes.
Though he also thought we were hobbling ourselves by the output of t() being safe markup, among other things, but they were beyond the range of fixable by beta.
Comment #85
chx CreditAttribution: chx commentedComment #91
ParisLiakos CreditAttribution: ParisLiakos commentedUsing stuff from
\Drupal\Core
in\Drupal\Component
is a no-no. (we just closed a critical to make our components not Drupal aware)If we really want to go this way, then those 2 classes should move in the
\Drupal\Core\Utility
namespaceComment #92
pwolanin CreditAttribution: pwolanin commentedAny reason SafeMarkup cannot be under component?
Comment #93
ParisLiakos CreditAttribution: ParisLiakos commentedaccording to the README there, because it depends on Twig.
core/lib/Drupal/Component/README.txt
:Twig is neither PHP nor Drupal Component
Comment #94
joelpittetRe: #93 To be clear Twig is PHP. It depends on a vendor class called \Twig_Markup. That object instance is looked for by twig to see if the markup is safe. We've extended it to make it easier to wrap without having to pass in the charset to the constructor and add some helpers.
So for this
. Do you have a proposed solution or are we just throwing a wrench in the tires? Sound like an 'ideal world' problem in an imperfect world, though I'm open to ideas...
Comment #95
joelpittetOh I see what you mean by PHP, like \ ... read that wrong, sorry. Nevertheless.
Comment #96
chx CreditAttribution: chx commentedWe just move things into Core from Component, not a biggie. Will make the patch a bit bigger by the move but we will do that at the very end to avoid reroll hell.
Comment #97
xjm@chx, can the moving-of-existing-stuff go into HEAD in a separate issue now?
Comment #98
chx CreditAttribution: chx commentedthat will be fun to merge but: yes.
Comment #99
chx CreditAttribution: chx commentedComment #100
dixon_Only made it through 1/3 of the patch. But here's some initial feedback.
Architecturally I don't have much to comment on, generally the approach makes a lot of sense. @chx was very kind to walk me through the changes in the coder lounge. The only problem I have is that PHP makes us go through a lot of hoops to make this work which is quite annoying :) So the DX is taking a small hit with this patch, but I guess we just need to make sure the reasoning behind this approach is properly documented.
Shouldn't
t()
returnSafeMarkup
itself?@chx Referring to our discussion in the code lounge just now, could you please elaborate a bit more on why you want to remove
SafeMarkup::implode()
etc. Personally I think they make sense in places like this.Nitpick: I think we need some comments explaining the reason behind the
chr(0)
technique here.Nitpick: Unnecessary newline.
Can we not use
SafeMarkup::implode()
here?I'm not sure I understand why
raw
is needed here?Use
SafeMarkup::implode()
instead?Comment #101
chx CreditAttribution: chx commented1. it does
2. only safemarkup::implode will stay, the concat will be removed. strReplace is already removed.
3. commented
4. removed
5, 7. done
6. because otherwise Twig double escapes.
Comment #103
chx CreditAttribution: chx commentedSo fixing some of #100 broke the patch; strReplace removal is postponed and one of the implodes is back...
Comment #104
chx CreditAttribution: chx commentedSome
eightnine followup issues have been filed. If your review finds issues please consult the Child issues in the sidebar before reporting them. This issue is past overdue by 1.5 years at least, getting in quick and then farming out tidying seems better than making a 250kb patch even bigger.Comment #105
moshe weitzman CreditAttribution: moshe weitzman commentedI reviewed this patch and don't have much good to say about it. I mean, it is great that auto-escape is turned on. To me, that's the only thing that could make the twig tradeoff worthwhile. Twig already added deployment complexity as we now compile .twig to php in the shared filesystem. And the autoescape proposed by this patch degrades our DX substantially. I'm not going to block this patch, but I'm not going to cheerlead for it either. Below are a couple of DX degradation:
Comment #106
pwolanin CreditAttribution: pwolanin commentedI'm a bit confused about the code in drupal_set_message(). I assume the session data is serialized - why can't the SafeMarkup be serialized and unserialized instead of using the chr(0) hack. That smells fishy to me in any case.
Comment #107
joelpittet@moshe weitzman just because that snippet was a bit confusing for me and maybe others, here is the whole context re #105
Because concatenation flattens the safe object back to an unsafe string it needs to wrap it back up as safe. t() returns SafeMarkup so without the concatination that that SafeMarkup::create() wrap wouldn't be necessary and if you put that $pifr_assertion in a !passthrough or @escaped token it wouldn't be needed for that example.
Comment #108
chx CreditAttribution: chx commentedRe #106 session is not serialized but http://www.php.net/manual/en/function.session-encode.php session encoded; any better solution is warmly welcome. Also? That cleanup can be a followup. Regarding #105, Joel mostly answered it all, I just want to note that in all the codebase we have only 100 SafeMarkup:: calls of any sort and a significant portion of that is in theme functions which will go away and/or caused by hardwiring HTML, small snippets of HTML but still in module code. Very important:
If you are solely using render arrays with your HTML in Twig templates you never need to create SafeMarkup manually.
Comment #109
chx CreditAttribution: chx commentedRerolled after the TranslationWrapper patch and removed the chr(0) hack from drupal_[sg]et_message functions. Hopefully this passes.
Comment #110
David_Rothstein CreditAttribution: David_Rothstein commentedI can imagine many of the SafeMarkup calls going away, but the strings masquerading as objects wouldn't, right? Moshe had one example and here are a couple more:
Kind of ugly.
This patch is basically implementing option #1 from #1818266: [meta] A secure theme system (with twig) (as written in that issue's issue summary), but the discussion there quickly moved away from the strings-as-objects approach (presumably because of the DX implications) and instead talked more about registering the safe strings some other way. What's the reason for going back?
The performance implications of that approach weren't great either (latest benchmarks here) and this looks like it could be the same or worse (however, the memory usage should be better which is one advantage of doing it as objects).
I don't understand examples like this... can't it just be $item->description and the idea is that Twig takes care of the rest?
Also, at this point should #1818266: [meta] A secure theme system (with twig) be closed as a duplicate? I don't get the difference between the two issues anymore. If so, Fabianx should get major commit credit in anything that happens here.
Comment #111
David_Rothstein CreditAttribution: David_Rothstein commentedHaven't fully thought this through but is there a way to have the marking-strings-as-safe happen only in the preprocess layer (and afterwards) rather than throughout Drupal? Then if they have to be treated as objects the impact is more limited.
Preprocess functions are already responsible for figuring out what's safe anyway.
Pseudo-code:
Comment #112
chx CreditAttribution: chx commentedI have implemented this from ground up and there is no other way to do this; in that meta whatever is suggested is 'less secure'. Here's the train of thought of why this alone works:
I am sure there are vague ideas out there but there's nothing that actually works.
In general, I would like to say this patch and the understandable reaction is the classic case of atomic plant and the bike shed. Noone understood the routing issues where the title XSS was conceived, reviewed and committed. Now you understand this issue because it's very easy. I would like to only see the objections of those who objected changing Drupal from secure by default to insecure by default! For the same reason, while I will benchmark this, I completely refuse to accept any objections on performance grounds. You all wanted this default-by-insecure system, it is not unintentional because the title change notice contains this behavior change -- and the exploit code for the security hole as well. Which proves that the escape-by-default behavior of drupal_set_title in Drupal 7, may it rest in peace, was the right way. But you were having none of that and removed one of the very few mechanisms that actually protected a variable from XSS -- and put the exploit code in the change notice while at it. And to be clear, the page title is just the symptom, the changing from secure by default to insecure by default is a mindset, a mindset dangerous beyond imagination.
You wanted this music, now pay the piper.
Also, in closing, already D7 had XSS in contrib commonly and we protect from that too.
Comment #113
chx CreditAttribution: chx commentedIn specifics, #111 is acting too late -- you want to mark strings as secure when they become secure and not do some praying that whatever is passed to a template preprocess function happens to be secure. Yes, there is some of that praying in this patch but we will see that it is gone and we will go back to the root of every string. You can't do that if you slap $html_generator->alreadySafeHTML($some_html); in the preprocess -- how the heck do you know $some_html is secure? You can't.
For a concrete example, in
TitleResolver::getTitle
this$route_title = $this->t($title, $args, $options);
is secure but$route_title = call_user_func_array($callable, $arguments);
this might or might not be insecure. Refer to the page title change notice for a callable that is insecure and watch how this patch automatically fixes it.Comment #114
pwolanin CreditAttribution: pwolanin commentedHaving spent some time discussing with chx, I think this is the right approach given the limitations of PHP, and we have to accept some rough edges initially, but with continued Twig conversion, the existing follow-up issues, etc, you will almost never need to think about the SafeMarkup class when all is said and done.
Twig having auto-escape on will be an unbelievably huge win for D8 security in practice, and will allow us to close up automatically a lot of otherwise gaping security holes currently in 8.x.
Comment #115
chx CreditAttribution: chx commentedTons of pretty doc fixes from pwolanin; made Moshe's example more readable.
Comment #116
chx CreditAttribution: chx commentedI have found the changes to AjaxResponseRenderer are not necessary -- they were added before we discovered the useful JsonSerializable interface.
Comment #119
moshe weitzman CreditAttribution: moshe weitzman commentedDoesn't this pretty much assumes that the site is using the Twig theme engine? If we intend to cripple all other theme engines, we should postpone this issue on #1537050: [meta] Should we keep / improve multiple theme engine functionality?.
@oelpittet - thanks for explaining that one instance of drupal_set_message(). Makes sense.
@chx - I don't understand what you are trying to say in #110 about title XSS and insecure by default. Could you link to an issue or two? Is i still possible rollback other changes as an alternative to this patch? Just curious as having another option here would be helpful.
Comment #120
chx CreditAttribution: chx commentedGood luck getting every issue leading to #2264041: Add a test to ensure title callbacks are not vulnerable to XSS rolled back but that still won't protect contrib like this patch would.
This doesn't really bind you to Twig; merely you will be using an object that Twig provides; if you read Twig_Markup it's a very simple class independent from the rest of Twig.
Comment #121
pwolanin CreditAttribution: pwolanin commentedSo, I'm not sure what remains to remove the RenderWrapper, but addressing that so that twig core doesn't need to be patched seems like the main remaining task here.
Comment #122
joelpittetRenderWrapper removal is assigned to @catch to review.
Comment #123
chx CreditAttribution: chx commentedWe can unhack Twig once RenderWrapper is gone. We know there's a plan and it'll be. I really would like to see this in so that we can start on the many (and some of them are quite big) followups.
Comment #126
markhalliwellWhere I see a lot of "hesitation" is around the [backend] DX. Yes, this does make the DX a little more difficult on the back-end, but as it's been explained to me, this is just because of limitations in PHP. That's unfortunate, but is not a valid excuse for blocking this issue. We must make core safe.
IMO though, this actually enforces/encourages markup to be put in templates.. where it should be in the first place. In my eyes, this is really just a paradigm shift in the preprocess layer:
Before:
"I have to escape my data because it may be vulnerable to an XSS attack."
After:
"Why is my template escaping my HTML? Oh I have to explicitly define that my markup is safe."
I'm not going to RTBC this just yet (due to the recent discussion and patches), even though I think it really should be. We can follow-up with child issues if/when it is necessary.
Comment #127
pwolanin CreditAttribution: pwolanin commented2 items that seem a little funny:
Returns an empty string instead of an object if the input is an empty string. Why is that preferred? It would seem to lead to inconsistency?
Could at least do with a code comment to explain what's happening.
Comment #128
ParisLiakos CreditAttribution: ParisLiakos commentedWhy dont we create our own
Markup
class underDrupal\Component
then and haveSafeMarkup
extend it in the same namespcae?a) It removes an unnecessary dependency to the whole Twig from a *lot* of stuff
b) i won't bump #2280963: Refactor use of SafeMarkup in HWLDFWordAccumulator to major (because its
Drupal\Component
and uses a class fromDrupal\Core
)c) we dont have to do #2280959: Untangle the Core-Component tangle of Twig autoescape
Yes we might be duplicating a 20 line class, but i it saves us a ton of troubles, i really think it is worthy
Comment #129
pwolanin CreditAttribution: pwolanin commented@ParisLiakos - because then Twig itself won't respect the content as being safe and will double-escape it.
That's the whole reason SafeMarkup extends Twig_Markup.
Comment #130
chx CreditAttribution: chx commented> Returns an empty string instead of an object if the input is an empty string. Why is that preferred? It would seem to lead to inconsistency?
{% if foo %}
When foo is an object that always passes. Even if holds the empty string. You are bumping into the twin of #953034: [meta] Themes improperly check renderable arrays when determining visibility .
class HeadElement extends SafeMarkup {
Well, if you look at the class hiearchy, everything produced is indeed safe markup and I do not really what comment are you looking for.
Comment #131
ParisLiakos CreditAttribution: ParisLiakos commentedre #129:
well, then we could have
nvm this wont work..Drupal\Core\Template\SafeMarkup
with extends\Twig_Markup
and proxies everything toDrupal\Component\Template\SafeMarkup
but dunno how much ugly is it:)
trying to avoid the PITA of moving String class
Comment #132
chx CreditAttribution: chx commentedThe comment pwolanin wanted is already there, added a @see to be found. Joelpittet rerolled against HEAD. So, new patch. The decrease in size is due to #2138073: Remove module_load_include() call from NodeController::revisionOverview().
Comment #133
David_Rothstein CreditAttribution: David_Rothstein commentedI'm still interested in this:
There was working code in that issue too (or at least working enough to benchmark) that did exactly that. It's a similar approach as this patch, but the DX was better since it never treated strings as objects (though at the cost of some uglier internal code, and possibly a larger memory hit).
---
Regarding #111, I don't think it's a problem for PHP developers to continue having to know where the strings they're working with came from (and if they're safe). Although it would definitely be nice if they didn't have to worry about it at all either :)
Comment #134
chx CreditAttribution: chx commentedThe meta contained no patch and I honestly forgot what and how it tried to achieve; it was woefully incomplete; however there's now an autocomplete2 branch which explores the possibility of keeping strings scalars and storing all the strings that go through
SafeMarkup::create
in a static property on that object. I have reverted all the tests to see how badly we are hurt but all the work on this issue was definitely not in vain because other parts of core; all theSafeMarkup::
calls are definitely necessary. If it works out then I will post it in here in perhaps two days. Stay tuned.David Rothstein, thanks so much!
Edit: the first attempts come up at 109 fail(s), and 42 exception(s) proving this is a very viable approach.
Edit2: Down to 58 fail(s), and 0 exception(s). New patch and issue summary coming soon.
Comment #135
Fabianx CreditAttribution: Fabianx commentedI think the combination of #132 and https://qa.drupal.org/pifr/test/805208 is a viable one:
a) Through the usage of a factory class, the DX is very nice and very D8-drupaly. +1 to the SafeMarkup::create() and SafeMarkup::is() methods.
b) The biggest concerns around using an isset() array as a safe string registry had been that we store that as a global. The current patch uses a static class, if that is a problem we however could still easily transform that into some kind of SafeString service that is injected. In any case without having to change any code outside of SafeString class, we can have safe strings with nice DX, too. +1 for that, too.
Unrelated:
I don't think we need a Twig Vendor Patch, as we can easily check for "$obj instanceof RenderWrapper" within twig_render unless I am mistaken totally.
Also: yay!
Thanks so much to chx and joelpittet to making this a test-passing reality!
Comment #137
Crell CreditAttribution: Crell commentedI'm generally +1 on designing APIs that are only ugly if you're using them wrong, as it helps encourage people to use them right. (ie, put markup in templates where it belongs.) PHP developers already need to know where their strings come from and if they're safe if they want to avoid sec holes. They're just currently on their own to keep track of it rather than having code to help them do it.
That said:
This doesn't make sense to me. Wouldn't we want the __toString() to return a SafeMarkup object rather than adding it to the class hierarchy? I'd rather avoid adding even more parent classes if we can avoid it, and adding a dependency on the Twig namespace to the Page objects.
Comment #138
Fabianx CreditAttribution: Fabianx commented#137 I believe with the new approach in #134 indeed the __toString method would instead call SafeMarkup::create() on its return.
Comment #139
chx CreditAttribution: chx commentedNew approach: we store strings in SafeMarkup. Strings are strings.
Comment #141
Fabianx CreditAttribution: Fabianx commentedToo excited to not fix the small typo ...
Comment #142
Fabianx CreditAttribution: Fabianx commentedHere are the next steps:
Besides some nits like using SafeMarkup::implode directly and figuring out why we suddenly need to checkPlain strings within ->assertText, the biggest culprit is the amount of |safe that is needed. Oh and we could also turn the static class into an injected service, that is just a search and replace mainly and chx had a POC patch for that, but could also be a safe follow-up.
Luckily things like
'<span' | safe
are things that Twigs own autoescape already takes care of automatically during compile time and after researching the background of the twig autoescape (found that for the first time) things got clear to me, how to leverage both twig and our own approach together.Background links:
* https://github.com/fabpot/Twig/issues/4
* https://github.com/fabpot/Twig/pull/158/files
which explain a lot why twig uses what its uses.
Approach
The trick is so simple and obvious that I did not see it:
- Replace the twig_escape_filter with our own by using a NodeVisitor with a very heigh weight so that it runs after all others.
We can then choose to only support HTML format for our own safe strings - or extend SafeMarkup::create to have an optional 'html' parameter, but again could be follow-up.
So the deal is:
- Format == 'html' - check SafeMarkup::is() and checkPlain if not, also check Twig_Markup
- Format != 'html' - call original twig_escape_filter
This will also automagically get the 'raw' filter to work - without having to convert into an Object (twig does not add the escape filter if a raw filter is present in the chain during compile time.)
Oh and maybe add a print_secure PHP function that can be used from non-twig engines and just does a SafeMarkup::is() check and checkPlains if not.
I am leaving this patch for some review by others.
Future steps
In a more advanced implementation, the escape filter could do a lot of what twig_render_var is already doing, so we would be removing our custom print function - if the escape filter is present in the chain to avoid the double function call - but thats kinda micro-optimization, so really a future step.
Comment #143
Fabianx CreditAttribution: Fabianx commentedTurn on twig autoescape by default and it all works!
Next major steps:
a) See if this passes tests - it should
b) Remove as much |safe calls as possible or replace with |raw filter that now works!
c) Fix assertText weirdness
d) cleanup code to use ::implode directly where possible and ::create where its not needed.
e) Optimize out our twig_render_var as drupal_escape_filter comes immediately before it and does the same, so is aware of render_arrays, etc.!
Next normal steps:
f) Add a print_secure / p function for phptemplate
g) Maybe Dependency inject the SafeMarkup class
h) Support multiple escaping strategies in SafeMarkup with core only using 'html' to future-proof it; support 'all'
Comment #145
chx CreditAttribution: chx commentedThe check_plain => checkPlain patch broke a few use statements. The interdiff is between #141 and #143 ; this is just a reroll.
Comment #147
Fabianx CreditAttribution: Fabianx commentedNew patch attached, which corrects test failures and also adds a |safe_join filter, which uses SafeMarkup::implode() instead.
Comment #148
Fabianx CreditAttribution: Fabianx commentedAnd another patch:
- Remove |safe filter completely and use |safe_join or |raw instead.
- Optimize template PHP code: Either add the twig_drupal_escape_filter or twig_render_var
Comment #150
Fabianx CreditAttribution: Fabianx commented- Remove @todo that is resolved
- Add two helper functions: print_safe and p for compatibility with phptemplate and other engines
- Add $strategy parameter to SafeMarkup
- Fixed TokenTrans parser to also work with Drupal's TwigNodeVisitor off.
- Mark trans filters as safe within twig
Comment #151
joelpittetThis is looking very good so far. Thanks @chx and @Fabianx for looking into another approach. I'm ok with both approaches btw. The DX on our first approach was not ideal to say the least but there are some trade offs to consider in any approach we take...
My biggest concern and I think may be the big difference here is that this latest approach may not scale well with storing strings in a global array. I'll see if I can run a profiling with a 100 nodes on the homepage and see what I get with xhprof_kit between the two approaches(will likely need to reroll the first one due to the volume of core commits in the last week! :-) ).
re #150
@Fabianx I don't think we need to add those two helper functions to drupal core of print_safe() and p(). They add another drupalism like the l() function that is though easy to type is not intuitive by looking at it what it does. Nor do I think we need to support phptemplate to that extent. To avoid any contention, if you think we should have those two functions, post them in a follow up and help avoid this issue from any unnessasary bikeshedding and feature creep.
Thanks for marking those filters as safe, I meant to do that in the fixes for TwigTrans!
And woo hoo for this patch being 90K vs the 230K+!
And lastly, I agree we should likely just replace the |join filter with ours to keep things seamless. And maybe even call twig's join function to keep it in line with upstream? I'm fine with either way just thought I'd
|up
and speak my mind there because I noticed the @todo/comment.Comment #152
Fabianx CreditAttribution: Fabianx commented> My biggest concern and I think may be the big difference here is that this latest approach may not scale well with storing strings in a global array. I'll see if I can run a
> profiling with a 100 nodes on the homepage and see what I get with xhprof_kit between the two approaches(will likely need to reroll the first one due to the volume of
> core commits in the last week! :-) ).
== Memory wise
The maximum I measured was 1.7 MB for 42 nodes so far, with the minimum being 700kB.
With the formula:
strlen(all_strings) < page_size
as an upper bound, this cannot be worse than the overhead of our arrays of doom - anyway :-D.
And if it was worse, then we have a problem in Drupal anyway, because that means we create lots of strings we don't need.
== Performance wise:
It should be faster as object creation is quite expensive (as shown by removing the TwigReference objects)
chx said it already in #1818266: [meta] A secure theme system (with twig) you cannot beat a hash table and storing in an array and using isset() is using an efficient access to a hash table (isset does not have function overhead).
The last I measured way back when Twig got in was 3.5% overhead in terms of time and 2.9% in function calls, which is minimal compared to all the overhead added within core elsewhere. Lets see where this stands :).
But yes, looking forward to some benchmarks.
> re #150
> @Fabianx I don't think we need to add those two helper functions to drupal core of print_safe() and p(). They add another drupalism like the l() function that is though
> easy to type is not intuitive by looking at it what it does. Nor do I think we need to support phptemplate to that extent. To avoid any contention, if you think we should
> have those two functions, post them in a follow up and help avoid this issue from any unnessasary bikeshedding and feature creep.
Okay, I'll create a follow-up, but core maintainers, webchick in particular, had stated that if we support autoescape by _default_ on in core, that we then need to support a way in templates. p is a function that was used in rails 2.3 (actually its called 'h' though - http://www.railsdispatch.com/posts/security).
catch in particular wanted auto-escape especially so we could remove lots of checkPlain calls that are then done automatically - so auto-escape on by default is
important to be able to reap potential performance benefits even.
So I can revert that commit, its GIT, its no problem, but we will need a helper - if we are serious about it being on by default and don't throw away phptemplate at the same time. As that means Drupal 8 supports phptemplate themes in a way (we have a test for that), so we need to ensure they can be used securely.
>Thanks for marking those filters as safe, I meant to do that in the fixes for TwigTrans!
:)
> And woo hoo for this patch being 90K vs the 230K+!
Yes, and it will get even smaller.
> And lastly, I agree we should likely just replace the |join filter with ours to keep things seamless. And maybe even call twig's join function to keep it in line with
> upstream? I'm fine with either way just thought I'd |up and speak my mind there because I noticed the @todo/comment.
My biggest concern is that someone does something stupid like:
[1,2,3]|join(user_xss_string)
which you probably won't do if you need to do:
[1,2,3]|safe_join(user_xss_string)
and check documentation of safe_join before. What did @fabpot say about it? Do you have the upstream link? He is usually quite sensible to security related issues.
== Main todos left here:
- check assertText weirdness
- remove concat (performance hog without benefit)
- clean up more
Comment #153
joelpittetHere's that upstream join issue.
https://github.com/fabpot/Twig/issues/1420
Regarding the helper functions, we seem to be deprecating them, for instance check_plain() in favour String::checkPlain() so I don't see why we wouldn't use a helper method of the domain object (SafeMarkup). But yeah would rather have that discussion in a follow up and like I said not bikeshead this issue if I can help it. Thanks ahead for the revert.
Great point on the string array memory vs arrays of doom, I'll still likely do a profile but that is good to keep in mind.
Comment #154
Fabianx CreditAttribution: Fabianx commented- Removed helper functions per #153
- Fix bug with missing translation t() call
== Main todos left here:
- check assertText weirdness: DONE for 1, TODO for 1
This is a drupal_set_message, which is triggered originally from ConfigImporter::checkOp().
- remove concat (performance hog without benefit) - concat is needed as one string is safe, the other one not - DONE
- clean up more - DONE so far I can see.
==
Remaining work:
Fix properly by finding out why that string done deep in configImporter is escaped on output even its wrapped by $this->t().
I don't know how to fix it, as I don't even know how to get to this code path.
==
Next steps:
- This is ready for some serious review in terms of reviewing the patch and the security.
- This needs some heavy manual testing - as all admin screens should be checked for double-escaped output not caught by tests.
-- An alternative would be for someone to 'record' all output of content retrieved during test runs, name it after the test and compare two automated test runs.
Comment #156
dawehnerI am pretty sure timm will help you on that, see https://drupal.org/node/2229187
Comment #157
ti2m CreditAttribution: ti2m commentedI can't promise anything, but in theory I should be able to detect any changes by running regression tests with siteeffect on all the unique backend urls. I'm not able to check any intermediate pages like the install process, node deletion, etc... though. But I'll give the "direct" backend urls a try and let you know about the results.
Comment #158
joelpittet@ti2m you may not be able to get them all, though it could save us a good chunk in the regression(escaped HTML manual testing) that we'd otherwise leave up to people to find after the patch was committed. So this would help tons, IMO! Thanks @dawehner for linking these up and @ti2m for your work on that cool set of regression tests!
Comment #159
Fabianx CreditAttribution: Fabianx commentedChangelog
- Add functions to set / get the SafeMarkup strings to persist them across page requests.
- Persist SafeMarkup state within $form_state['build_info']['safe_strings'] and $batch['safe_strings'].
- Remove previous work arounds.
Interdiff:
http://cgit.drupalcode.org/sandbox-chx-1857558/diff/?h=autoescape2--fabi...
Explanation
As both $form_state and $batch are per user or at least per role, this should be safe to use for this cases.
Next steps
- Await report from @t2im for how many pages we break
- Manual Testing
- Serious security and patch reviews of the patch
Comment #160
ti2m CreditAttribution: ti2m commentedQuick feedback, this seems to work, at least it will catch some of the double-escaped markup. Ran the first set of urls and already found two examples on
/admin/config/regional/date-time/formats/manage/long and /admin/structure/block/list/seven Will take a closer look at it tomorrow and try to run all the urls.
Comment #161
ti2m CreditAttribution: ti2m commentedUPDATE: I ran all unique (by router item) urls of a vanilla D8 install (including the patch in #159) that the build in siteeffect crawler found. From the 200+ urls, the following had escaped output:
Is that what you are looking for? Then I would proceed and enable all core modules, so far I only used the standard profile.
Comment #162
Fabianx CreditAttribution: Fabianx commented@ti2m: This is a great start!
Comment #163
joelpittetXhprof kit Profiling Scenario
http://www.lionsad.de/xhprof-kit/xhprof/xhprof_html/?run1=539fd0a78f1b7&...
Run 539fd0a78f1b7 uploaded successfully for drupal-perf-joelpittet.
Run 539fd2b68bc13 uploaded successfully for drupal-perf-joelpittet.
http://www.lionsad.de/xhprof-kit/xhprof/xhprof_html/?run1=539fd0a78f1b7&...
ct = function calls, wt = wall time, cpu = cpu time used, mu = memory usage, pmu = peak memory usage
Comment #164
Fabianx CreditAttribution: Fabianx commented#163: @joelpittet: That looks wonderful!
Comment #165
catchfwiw I'd be fine opening a separate follow-up with any outstanding double-escaping issues on it - those are annoying bugs but not security issues and shouldn't hold this particular issue up.
Comment #166
xjmAgreed. So long as we document as many as possible here (thanks @ti2m!) I think those are acceptable small regressions/followups for getting this issue in.
Comment #167
ti2m CreditAttribution: ti2m commentedAnother (and the last) update on the crawled urls. I enabled all modules on a fresh install and crawled the site as user 1. I only found two more urls with double escaped strings (first two in the list below). But the general problem is, that e.g. node edit forms aren't covered at all as no node exists on a vanilla install. I could post a file with all covered urls, roughly 300, if anyone is interested.
The total list of urls with double escaped strings that I found:
Let me know when I should test another version of the patch at some point (or another patch in general).
Comment #168
Fabianx CreditAttribution: Fabianx commented@ti2m:
Thanks for the very nice list!
I have a working implementation now to visit all URLs using the test suite, but would like to follow up on that here: #2229187: SiteEffect: Automated frontend regression testing to not distract this issue further. It would be great if you could respond there.
Thanks!
For this issue:
- We need reviews and security reviews of the patch!
Comment #169
xjmLooking at this today.
I think the API changes section is not up to date? The first thing it mentions is one of the followup issues. Also, I think the proposed commit message could probably be updated as well.Maybe it could also mention ti2m for the thorough testing.
Comment #170
xjmStarted by just reviewing
SafeMarkup
. More to follow.So, I think maybe this would be the place to add more detailed documentation of how SafeMarkup should be used, and what happens during rendering and in the theme layer?
Technically, the docblock should start with a verb. Maybe: "Marks strings that are already escaped for XSS purposes."
"Adds a string to a list of strings marked as secure."
This parameter needs more detailed explanation as to how it should be used. Are there other possible values in HEAD for the escaping strategy than
'html'
? (Presumably contrib/users can also define their own escaping strategies?) And what does'html'
mean exactly in this case?We only need to add a detailed explanation of what the escaping strategy for is in one place, and then we can add @see to the detailed explanation elsewhere.
is()
is a weird name. Yes, the string exists. Did we not name itisSafe()
on purpose? I guess it's static so it's always going to be SafeMarkup::is()... still a bit yoda. :)We are marking every single string in the passed-in list as TRUE -- regardless of what the value is! So if a string were marked as FALSE in the previous request, it would be switched to TRUE here. The assumption appears to be that the
$safeStrings
array is protected and that any string/strategy pair being set at all indicates it's safe, since in this class the strings only get set inset()
orcreate()
. However, what if someone were to extend this class and think that it could also mark unsafe strings? Or the list is polluted in some other way? It seems like it would be more robust/better hardened to only treat the string/strategy pair as safe if it's exactly TRUE. Same goes for in theis()
method above.This is missing parameter documentation. Also, since the way that this is used has security implications, a more specific explanation of "in a safe manner" would not go amiss. It looks like what it does is check plain any string that is not already considered safe? So maybe:
Also, maybe it's worth noting that the delimiter is not checked nor escaped, but assumed to be safe? Is that sound from a security perspective?
What's the deal with the
'all'
escaping strategy here? So far that's at least two string keys that have special meaning (the other being the default of'html'
) but are not documented anywhere that I've yet found.Maybe:
Also, it strikes me that it's more of a setMultiple(). Finally, is there no array typehint in the method on purpose?
Maybe it would be good to add here that the strings are merged into the existing list of safe strings? I.e. the list of safe strings passed in isn't exclusive; it's added on to the current list.
$safe_strings
is missing its$
.Also, the parameter documentation is a little confusing/feels inaccurate. Is this more correct?
As above. What it actually returns is a list of all known strings and whether or not they were marked as safe.
Comment #171
xjmOh, one more thing. This also needs better docs of how it's used. It takes any number of (presumed string) arguments, and concatenates them, checking them by string-casting them and then check-plaining them if the string isn't found. However, I don't know from the method names or docs that I can't just pass it arrays or whatever. Maybe we should be doing some argument validation and/or allow an array as an argument?
Comment #172
xjmWe maybe need to document this new ArrayPI key in the batch topic docs:
https://api.drupal.org/api/drupal/core%21includes%21form.inc/group/batch/8
So more ArrayPI action. I think we need to update the docs of
drupal_set_message()
anddrupal_get_messages()
.Looking at format_xml_elements() closely. The element contents are check-plained. The attributes go through Attribute and it's not clear to me whether those are escaped/rendered safe in any way. And the keys are definitely concatenated straight into the output with no validation. So, I think we need to at a minimum warn not to pass user input, or be a little more careful about validating keys and attributes?
So, in general, I think we need to do a better job elsewhere of being explicit like we are here about inputs that are treated as safe (and there are possibly opportunities for further hardening when we do have to make such a statement, as it appears there is in this case).
Also, it looks like the warning about user input only applies to
#tag
(and the other top-level #keys could be unsanitized and that's okay); is that correct?Finally, a nitpick: this is a comma splice. Better:
Whoa, this comment is confusing. What's second? Is it just a run-on sentence? It seems to be saying #attributes is safe, correct? Is this because Attribute does sanitization? (I need to check whether that's the case; see remarks above.)
I think this is what the comment should be if the answers to the above are "yes":
And then follow that up with the @todo to the followup.
And all that said... is this really a good idea? Can't we do some sort of validation or escaping on the contents of
#tag
? Is this okay to punt on like this?Is there a followup already for this @todo? If so we can add the link directly. If not, let's file it (if we do agree it's okay to punt to a followup).
Similar to the Xss case; maybe worth mentioning that check-plained strings are (obviously) automatically marked as safe HTML in the docs, with a reference to SafeMarkup where we're going to add the nice detailed docs? :)
So, I think we can update the docs for Xss::filter() to indicate that it marks the strings as safe HTML, with an @see to SafeMarkup.
This made me arch an eyebrow. Is it correct to automatically mark the exception message is safe and blindly trust that whatever code set the exception message did the right thing? Would it be too ornerous/risk too much double-escaping to check-plain the message if it's not already marked as a safe string? Or is that a terrible idea? (Obviously exception messages are probably the least of our concerns, but I wanted to raise the question.)
http://www.myinstants.com/instant/oh-yeah/
Could use an inline comment, maybe:
And this is the first place that does make me think my questions above about
isset()
vs.TRUE
expectations for safe strings. It's very common to monkey with$form_state
at all levels, and here it's a way of getting around the explicit contract otherwise set by SafeMarkup.safeStrings isn't a thing (in this context anyway). Also, this is pedantry, but is it really a "cache"? Maybe:
This strikes me as a little weird. We're adding the
HEAD
tag to the list of safe HTML strings? I mean, it's safe once. It's wrong every other time. I'm not sure this is a problem, it just seems weird for the HTML head element to be in the big generic bucket of HTML-safe strings every single time I look at them, no matter where I am on the page.I just got to
drupal_render()
(eek) :) so going to step back a bit and hopefully give folks a chance to answer some of my questions above, since I think they're also relevant to other parts of the patch.Comment #173
xjmIn general, I'm concerned by how many times we're calling
SafeMarkup::create($some_variable_from_somewhere_else)
. Having to trace the variable up the function and then up the call chain makes it less clear whether the code is actually secure. I like the pattern used in some places of adding an inline comment justifying the use of SafeMarkup, although we could do a better job of not only stating that certain inputs are safe, but stating why they are.That said, though, explicitly evaluating each case where we do add a SafeMarkup is still a much easier security check than trying to comb through the entire render and theme layers. So I'm overall very +1 to the solution we've come up with. :)
Comment #174
xjmSpoke to chx about a few of the points in IRC. He pointed me to #2280965: [meta] Remove every SafeMarkup::set() call. I'm not sure about doing that as a followup and not in this patch.
chx has given me sandbox access so I'm going to start help cleaning up some of my points mentioned above.
Comment #176
Fabianx CreditAttribution: Fabianx commentedGeneral things
Escaping Strategy
This was added to me as I converted our drupal_escape_filter to replace the twig autoescape filter. So this is mainly to be compatible with Twig:
- http://twig.sensiolabs.org/doc/tags/autoescape.html
- http://twig.sensiolabs.org/doc/filters/escape.html
- https://api.drupal.org/api/drupal/core%21vendor%21twig%21twig%21lib%21Tw...
CSS, JS and html attributes all need different escaping strategies.
This was added with the idea that core would only support 'html'.
'all' is special cased by twig to being safe for everything, this is mainly the 'raw' filter that is safe for all cases.
https://api.drupal.org/api/drupal/core!vendor!twig!twig!lib!Twig!Extensi...
Detailed comments
#170:
1. I agree the class is a great way to document how SafeMarkup works and what it means to use.
2. agree
3. escaping strategy answered above
4. is() is really weird, I agree.
5. I thought we were escaping any string that was not already safe? We don't use comparison as isset() is fast and we never put a FALSE there, would unset in that case or set to NULL - @todo check more
6. agree; yes need for sure document for this and safe_join that the parameter is not secure.
7. A string is safe if its safe for the given strategy or all strategies.
8. array typehint is oversight on me; yes this is setMultiple rather, indeed.
9. agree, I used adding in as there could be new strings marked safe already.
10. typo yes
11. As we don't use FALSE, the description is correct even if a little misleading.
Comment #177
xjmStarting work in:
http://cgit.drupalcode.org/sandbox-chx-1857558/log/?h=autoescape2--xjm
Comment #178
Fabianx CreditAttribution: Fabianx commented#171: Agree, 100%. Docs for concat() are needed. We once thought about removing it, but it is way better to have an API that concats and escapes than an API where you just mark it safe.
#172:
1. Good catch, agree.
2. agree
3. @todo format_xml_elements - revisit this, agree
4. agree, marking safe should be explicit and we have the follow-up in any case, too - it is also way simpler with this new patch (just 90k)
5. Yes, I think we could escape the tag element in a way, but its kinda interesting as you would still want to allow script or such ... @todo html_tag
6. no idea, ask chx @todo drupal_pre_render_html_tag
7. agree
8. agree
9. agree @todo on500Html should be marked safe by the caller.
10. ROTFL :-D
11. agree, could use a different key and remove before retrieval. - this is just set before form saving and restored during retrieval so is safe, but obviously leaks the data in the $form_state. Good catch!
12. agree
13. Well, as that function is called directly - I think it might be needed, the alternative is a |raw in the template, which is more eek, but yes would be nicer to escape at the source.
To the drupal_render one:
I think we could even remove the output of that from the list of safe strings - at least from a twig perspective as the drupal_twig_escape_filter does consider drupal_render() render arrays already secure.
In that case only code that does $output = drupal_render(y); (which is bad bad bad for caching anyway) would need to do:
$output = SafeMarkup::create(drupal_render(y));
I have not thought about it from an AJAX perspective or such, yet - though. Just a pre-thought to drupal_render().
Comment #179
Fabianx CreditAttribution: Fabianx commented#173:
I do agree that SafeMarkup use cases should be justified.
I wondered if we should allow simple twig inline templates:
before:
After (mocked up):
That gives us autoescape for free and is compiled and cached to disk like normal templates, so pretty performant and not more difficult to use than t().
As twig supports doing dynamic templates, this might be a 10 line code change or so to support that ...
Moved to: #2289999: Add an easy way to create HTML on the fly without having to create a theme function / template
>That said, though, explicitly evaluating each case where we do add a SafeMarkup is still a much
> easier security check than trying to comb through the entire render and theme layers.
> So I'm overall very +1 to the solution we've come up with. :)
Absolutely agree!!!
Comment #180
xjmSo, after looking at this a bit more, I'd like to propose changing the names of several methods on
SafeMarkup
. @chx and/or @joelpittet, could you give feedback on these suggested method names? (It might be there are reasons for the current names that I've overlooked.)(Still working through a number of fixes in the sandbox; just wanted to raise this question in the meanwhile.)
Comment #181
joelpittetI'm fine with those changes, thanks @xjm The create() was a factory method before but it is no longer, just got repurposed from what I recall. I think the same is for the other methods but @Fabianx or @chx may have another opinion as they created those method names.
Comment #182
xjmThanks @joelpittet! @chx also confirmed in IRC he's okay with the renames. Adding that change.
Comment #183
xjmIt took me a bit to confirm that everything that goes into Attribute is for darn sure absolutely check-plained properly for use as an attribute string, so I added #2290143: Improve the class documentation of \Drupal\Core\Template\Attribute.
Comment #184
xjmAlright, I've addressed everything in my own reviews either by fixing it or adding an @todo, except #172 points 9 and 13 which my brain to even process right now. Also merged 8.x and pushed everything up in my branch in the sandbox.
The interdiff is nearly as big as the patch. ;) So adding myself to the proposed commit message.
Comment #185
xjmThese are the @todo I added for further discussion or improvement. Some of it might be followup material since (e.g.) format_xml_whatsit() is no more unsafe than it was previously.
And particularly to the point about isSafe(), it still feels ick to me to not check for the TRUE value we set, but I've left the @todo instead of changing it outright because (I imagine) it's very much in the critical path.
So I've tried to reduce the leakage with $form_state here by unsetting the value after we retrieve it. Is that sufficient or should we do more?
I'm also having flashbacks to monstrous form cache entries bringing down sites. I imagine this safe string list will get freaking enormous. Is this a concern?
Note that I still haven't reviewed the rest of the patch, but I need to go to sleep now. :)
Comment #186
xjmSo that I can sleep. This issue will absolutely need a change record. It's like the hugest huge thing.
Comment #188
Fabianx CreditAttribution: Fabianx commented@todo:
- This is missing is -> isSafe conversion, such drupal installation fails.
- check $autoescape parameter here so that explicit escaping is still possible:
This should no longer be needed with strings being stored in $form_state['build'].
These 5 lines in core/modules/system/src/Tests/System/DateFormatsMachineNameTest.php can be removed safely. chx reverted them early in his branch, but its still correct in mine.
Comment #189
xjmWhoops, just fixing some missed renames. Didn't have PHPStorm picking up
.engine
or.install
files and missed the comments.I haven't addressed the other points of #188.
Comment #191
xjmRemoving
SafeMarkup::concat()
and fixing the date test.Comment #192
Fabianx CreditAttribution: Fabianx commentedJust a quick re-roll against HEAD.
Comment #193
xjmDiscussed the
$form_state
question with @catch. His suggestion was to not worry about it for this patch, because there are these other issues already addressing the root causes of form cache issues:So I've filed: #2295823: Ensure that we don't store excessive lists of safe strings
Comment #194
xjmRerolled against HEAD.
Comment #196
mikey_p CreditAttribution: mikey_p commentedSomething in the latest head broke a few tags in the header which are now escaped.
Comment #197
xjmFiled #2296101: Remove SafeMarkup::set() use in \Drupal\Core\Render\Element\HtmlTag::preRenderHtmlTag() for
drupal_pre_render_html_tag()
.Comment #198
mikey_p CreditAttribution: mikey_p commentedI was able to resolve the head tags issue with this change:
But I'm guessing that its considered bad code style?
Comment #201
pwolanin CreditAttribution: pwolanin commentedChanges based on @mikey_p's fix and some cleanup and added comment.
Comment #205
pwolanin CreditAttribution: pwolanin commentedMake the implode delimiter safe and add a unit test.
Comment #208
pwolanin CreditAttribution: pwolanin commentedEasy fix, plus some doxygen cleanup in the unit test and a code comment fix from xjm.
Comment #209
pwolanin CreditAttribution: pwolanin commentedComment #210
xjmDone for tonight. I just added a few more cleanups. Things we still need to do (notes for tomorrow):
format_xml_elements() @todo needs a followup issue filed and referenced issueFixedcore/modules/system/system.install @todo Huh?Fixed@todo in update.report.inc Huh?FixedAdd a @todo referencing the setMultiple() followup, especially for $form_state.DoneAlso from #188:
DoneAdd followup(s) for the double-escaped strings that need fixing (see #167).FixedFinally, a high-level question. In @catch's original summary:
Are we still accomplishing this goal? If I read the recent profiling correctly, there's actually a 1-2% regression for the profiled case. If that's true, we presumably would need critical followups to remove whatever now-redundant work we're doing, right?
Comment #211
chx CreditAttribution: chx commentedEdit: eh, nevermind. Deleted my comment. Drupal 8 is really hard to work on, tho.
Comment #212
chx CreditAttribution: chx commented> 4. core/modules/system/system.install @todo Huh?
Well, that's a botched attempt to document it's safe: it is concat'd from two t() and a fixed br tag.
> 5. @todo in update.report.inc Huh?
> // @todo when converting to Twig, $data might get double-escaped, so
Update functions will get converted to twig templates. This is a warning to avoid double escape problems after that.
> 6. // @todo Add an optional strategy parameter to SafeMarkup function calls,
Not that I know of. (This is Fabianx's code who made an exception to work on this. Otherwise, AFAIK he is fed up with the core queue even worse than me. I can guess why.)
> Is the following text from the issue summary still true? What does "not allowed" mean in this context? Where is this documented if so?
In order: Yes. That I will find you and beat you with a clue-by-four with a blazing security sign on it if you do it. Nowhere ATM? I do not know where to put #attached documentation.
Comment #213
catchThis patch goes some way to enabling that, but it won't fix it. The idea is to stop preparing sanitized variables in preprocess, and just print things when we need them via the templates. The issues dealing with that overall goal are #2060783: Remove the preprocess layer. and #2035055: Introduce hook_theme_prepare[_alter]() and remove hook_preprocess_HOOK() but they're stalled/9.x at this point.
We could open specific issues to audit what's happening in preprocess and try to do less, for example https://api.drupal.org/api/drupal/core%21modules%21node%21node.module/fu... is not a happy function, but that's orthogonal to this issue. Same with any remaining double-escaping issues.
Additionally, when we added Twig to core, we didn't have entity render caching at all, the original Twig regression likely looks a lot different if you don't have field and entity templates rendered every request, and there are much worse issues on cache misses than this.
Comment #214
Fabianx CreditAttribution: Fabianx commentedI will try to take care of #8 and #9, but will take a moment to get to it.
Comment #215
Dries CreditAttribution: Dries commentedJust wanted to chime in and say I'm generally +1 to this. While DX matters a lot, security matters more. I'm happy to see we already improved the DX a lot. If we can further improve DX in the patch, that would be great, but that shouldn't hold up getting this great security feature committed. Further DX improvements can happen in follow-ups.
Comment #216
xjmFor #210.3, filed: #2296885: Remove format_xml_elements() (plus pushed a reference to it to the sandbox).
Comment #217
xjmSo I figured out what the @todo in system.install (#210.4) was about. As a best practice, rather than marking
SafeMarkup::set($some_string)
when$some_string
is composed of miscellaneoust()
calls and whitespace, we should do doSafeMarkup::implode('', $array_of_pieces)
. Attached tries to implement that in that case. (Note that I think things like this are actually okay to put in followups; I just wanted to work through this instance.)Comment #218
xjmHmm, reverting that actually. It's scope-creep-y; we pretty much know that what we assemble in the
system_requirements()
is safe. :P I've filed a followup issue for that instead: #2296929: Remove system_requirements() SafeMarkup::set() usePseudo-interdiff just shows the updated @todo comment.
Comment #219
xjmSo here's the zinger that's been in my mind for days, that I was reminded of when looking at #217: How do we keep people from doing something like this (but with the bad less obvious):
...which would be like "Hey entire string list! Script tags for everyone!"
Comment #220
joelpittet@xjm I don't think we can prevent that... but side benefit of the DX being a bit harder is people will likely tend away from that I'd hope.
Likely if they want to do that they could also just use whatever the equivalent to
drupal_add_js('jQuery(function () { alert("Hello!"); });', 'inline');
is now...Comment #221
chx CreditAttribution: chx commentedYup. We make it harder to shoot yourself in the foot but if you insist... it's your foot. The CR already contains but perhaps not strong enough that SafeMarkup::set is bad practice in and itself. Perhaps put it in the doxygen?
Comment #222
xjmAbsolutely. Other than minor cleanups and final review, super thorough documentation for the class is the last thing blocking this.
The differences though are that:
_drupal_add_js()
is now intended for internal use. (Though the patch that made that change apparently didn't update the docblock.)SafeMarkup::set('<script>')
marks the string safe for the whole page. So all it takes is one module using the wrong API to do something safe within that module, and they've circumvented Twig's autoescaping for that string for any other place someone manages to try to inject something.So I just want to make sure it's darn clear that
SafeMarkup::set('<bit of generic html>')
is doing it wrong. :) This is also what makes me uneasy with the bit where we areSafeMarkup::set()
ing the<head>
tag. It's safe once. It's not safe ever again, but Twig doesn't care. HEAD is not regressing, but if our goal is to get rid of some of our own escaping and let Twig do the work, we need to be careful of cases like this.Comment #223
pwolanin CreditAttribution: pwolanin commentedPerhaps there should be a way to remove a string from the safe list? e.g. if I want to use a certain tag as a delimiter in implode() but it's not generally safe?
Comment #224
chx CreditAttribution: chx commentedNope. Do not complicate the system with an unset; make the core implementations less stupid and mark SafeMarkup::set() as bad practice and move on? BTW I still do not understand how can you create an XSS hole but w/e.
Comment #225
mikey_p CreditAttribution: mikey_p commentedMaking an XSS with it would be pretty hard, but I'm sure possible somewhere on some site. You'd have to have a case where you have multiple fields that are joined together, where you can insert the starting and closing script tags along with the payload and trust that they are combined without any other text in the middle to prevent parse errors. While extremely unlikely, it's not impossible either.
Comment #226
xjmThanks @mikey_p for the clear explanation. That's the edgecase I was trying to get at.
It's not something that should hold up this patch, as we are going to be vastly reducing the number of XSS entry points in the wild and the opportunity for exploit is... let's say academic. But conceptually, for this patch, I want to get what the correct way to use SafeMarkup is (and isn't) so it doesn't get abused. A SafeMarkup::set() call with a variable in it is a red flag. A SafeMarkup::set() call with a very short markup string (should we even be doing
SafeMarkup::set('<br />')
?) is a red flag. Especially incomplete markup is a red flag. And as chx points out, the first rule of SafeMarkup is: you do not use SafeMarkup.Comment #227
xjmTo that end: #2297703: [meta] Refactor and remove as many SafeMarkup::set() calls as possible
Comment #228
xjmJust noticed. This isn't a functional bug really. It's a maintainer-approved release-and-beta-blocking priority task.
Comment #229
xjmAnd the double-escaping followup: #2297711: Fix HTML escaping due to Twig autoescape
Comment #230
xjmSo, I started adding strongly worded documentation to
SafeMarkup::set()
this morning. But what I realized? Is that all the concerns about usingSafeMarkup::set()
inappropriately also apply to the sanitization functions that use it. Like. If you callt('<')
... If you're thinking "Why would anyone ever do that?" you haven't spent enough of your life cleaning up the bat-guano crazy things that get done on some sites.Comment #231
xjmIt's even broader than that; it's not just a coding best practices problem. Since our text filters call
Xss::filter()
which in turn marks the string as safe, that means all you need is someone with sufficient permissions to create an entity with a filtered text field that contains a single angle bracket, and that field to be used somewhere on the page (or processed somewhere during the page render). It'd be a very roundabout and unlikely thing to exploit, but as long as we use this mechanism for opting strings out of Twig's autoescaping, we need to be careful about what responsibility we shift solely to Twig for sanitization.Comment #232
xjmAttached fixes #210.7, tests reverting the change indicated in #210.9 and attempts to improve the documentation for
SafeMarkup::set()
.Comment #233
xjmBTW my first instinct to address #231 was that we should consider separate buckets of string safeness, i.e. code-defined strings from
t()
orString::format()
are different from things that go throughString::checkPlain()
, which are also different from a delimiter you want to useSafeMarkup::implode()
on or concatenate two translated strings with, which is also different from something that came to us from an input filter. @alexpott also mentioned the idea of scopes when I mentioned this issue to him. But I'm loathe to add any further complexity to this issue; its goal is to make it easier for themers to theme safely, and it accomplishes that. It just gives our theme-layer autoescaping a back door that might be more easily opened than we realize. It might merit a followup discussion, at the least.Comment #234
pwolanin CreditAttribution: pwolanin commented@xjm - I'm not seeing how you could know which bucket to find the string in at render time?
Maybe we should revert the delimiter change?
Comment #235
xjm@pwolanin, yeah, I've been debating that all morning, because it encourages people to do a more bad thing than the bad thing they're already doing. I've also been thinking that the
implode()
method encourages bad practice to begin with. For example, these usages:seem questionable to me. And making lists of things should be done though whatever mechanism we use for
theme_item_list()
now. Edit: The last one is especially egregious, it's doing exactly what we are saying in theset()
docs it shouldn't: Adding any (!) of the allowed HTML tags, both their opening and closing versions, to the safe markup list everywhere. This would be necessary to keep the markup from being escaped with or without SafeMarkup::implode(). What we should be doing instead is not setting the tags as safe, checking whether$field_output
is already safe and check-plaining it if not, and then marking the whole string as safe without the implode. The DX is worse. But the DX should be worse, because we're doing it wrong.Comment #236
xjmA better method to have would be something like
makeSafe()escapeIfUnsafe()
or something. It could accept a single string or flat array of strings, and check-plain them only if they're not already set. Then the calling code could do whatever concatenation or implosion it wants, with whatever naughty markup it wants, and then mark only the completed string as safe.Comment #237
pwolanin CreditAttribution: pwolanin commentedPart of the follow-up was to have checkPlain itself do a bunch of this work so it's more transparent to the developer.
Comment #238
bfr CreditAttribution: bfr commentedJust few typos i bumped into.
Comment #239
xjmThanks @bfr! Can you provide interdiffs when creating new patches? The patch is developed in a sandbox so we have no way at the moment to incorporate whatever changes you made.
@pwolanin, where is the followup for that?
Comment #240
pwolanin CreditAttribution: pwolanin commented@xjm - I think it's part of #2273923: Remove html => TRUE option from l() and link generator.
Comment #241
xjmAh. So I think that changing
checkPlain()
to check whether a string is in the safe list before escaping could be part of this patch... but on the other hand, it would also totally change the expectation of whatcheckPlain()
does.checkPlain()
should always be the way that people can escape anything, regardless as to whether it's already been marked safe as markup somewhere else. I'll indicate that on that issue.Comment #242
xjmAfter protracted IRC discussion about this, we agreed that we probably should remove
SafeMarkup::implode()
:Comment #243
xjmAttached includes cleanups from @joelpittet and @bfr, rerolled against HEAD. Also updated in the sandbox.
Comment #244
joelpittet@chx and I will do a
implode
implosion run through tomorrow (Tuesday) evening.Comment #245
Fabianx CreditAttribution: Fabianx commentedSafeMarkup::implode is still needed for safe joining from twig itself though (see safe_join in the patch) - and there it is good to use. There is also upstream discussion on it.
I don't think the delimiter itself should be needed to be safe though, just marked clearly that you don't put user provided input into the delimiter.
SafeMarkup::implode in that fashion is still useful, though I do agree that its better to split it up into a SafeMarkup::escape function, which checks if its already escaped.
What currently is
SafeMarkup::implode('', SafeMarkup::set('<div>'), $possibly_unsafe_var, SafeMarkup::set('</div>'))
is then just:
SafeMarkup::set('<div>' . SafeMarkup::escape($possible_unsafe_var) . '</div>');
which is indeed better for the strings that are safe.
I still think ultimately this should be:
to prevent all this HTML string binding in core and at least in D8 render arrays are the way to go ...
@xjm: My own instinct also was for different buckets with comparable strings and that is definitely possible using different escaping strategies - but boundaries would need to be clearly defined and in the end you have to decide if its already escaped or not, so while intermediately possible, in the end you have to say yes|no.
Comment #246
xjmTo clarify, this is what @Fabianx is referring to for Twig needing the implode method.
I still would be inclined to suggest that this function should simply do what implode() was doing internally before (adding back the documentation about the delimiter needing to be safe edit and obviously using our new escape() method instead of the protected properties) but replacing, and then all other usages in Drupal code should be replaced to only SafeMarkup::set() whole strings.
Comment #247
chx CreditAttribution: chx commentedHere. implode() removed. For those following at home: we didn't remove implode() because it was problematic in this patch but because it was encouraging behavior that could possibly lead to problems. The following pattern is used a lot:
to avoid a conditional within the loop.
Comment #248
Fabianx CreditAttribution: Fabianx commentedI would strongly suggest to keep code duplication to a minimum
What about:
It is the same as the duplicated lines, but makes it more clear that the array is safely escaped before it is passed to ::set.
It also ensures escape can deal with arrays - which I think is important.
In twig_drupal_join_filter can then also just do:
I like the DX of that instead of having to traverse the list again and again ...
Comment #249
Fabianx CreditAttribution: Fabianx commented#247 also removes some code for Traversable that is still needed from twig_drupal_join_filter ...
That will need to be brought back, please (as its also in default twig_join_filter).
EDIT: As the array is traversed now, this is not possible.
Comment #250
chx CreditAttribution: chx commentedIn my opinion, everything related to this is transitional, ugly and discouraged (even if don't quite manage to clean all of them up by 8.0.0). I wouldn't waste effort on making a smoother road to a destination (SafeMarkup::set) that we tell people not to go to. Just use a render array, 'mkay?
Comment #251
Fabianx CreditAttribution: Fabianx commentedOkay, deal.
Lets keep ::escape as is and try to avoid ::set calls as the plague and remove as many as possible and replace with for example the render arrays with twig_template #type.
I still think escape on arrays is useful as escape never sets any safe strings, but fine with a follow-up.
Comment #252
xjmYay! Thanks @chx!
/me cracks knuckles and digs back in.
Comment #253
dags CreditAttribution: dags commentedPicking up at Jersey Shore Drupal 8 sprint to work on docs.
Comment #254
dags CreditAttribution: dags commentedPHPDocs for the SafeMarkup class.
Comment #256
xjmComment #257
xjmHere's the patch rerolled against HEAD, with @dags' and @cilefen's docs.
Comment #258
xjmAnd expanding those docs a bit.
Comment #259
cilefen CreditAttribution: cilefen commentedTypo on the last commit.
Comment #260
cilefen CreditAttribution: cilefen commentedI updated the change record to match the new SafeMarkup class comments.
Comment #261
steveoliver CreditAttribution: steveoliver commentedWhile I still wish the implode() codeblocks looked more along the lines of #248 (requiring support for arrays in SafeMarkup::escape()), given #250, I only have these nitpicks of the current patch in #258:
The returned array is still indexed by type, is it not? I'd rather keep the specific description of the multidimensional array here than replace it with a "see".
- add " for Drupal" in:
SafeMarkup provides a store for known safe strings and methods for Drupal to manage them ...
?
- typo in 'interal'. should be 'internal'.
Preventing a few calls to drupal_render()... nice.
needs one more \n after }
1. Replacement function for Twig's escape filter (Twig should be capitalized).
2. @param string $string--actual arg name is "$arg" not "$string".
3. @param string $charset. "The charset." ==> "The ASCII character set to use for encoding."
4. @param Boolean should be @param bool $autoescape
5. @param bool $autoescape description: clearer as: "Whether this function should perform autoescaping (TRUE) or if the argument has already been escaped by the developer (FALSE)."?
Comment #262
star-szrComment #263
xjmThanks @steveoliver! I think all those are good changes, except:
I disagree; it makes no sense to maintain identical documentation in two places, and there is a detailed example in drupal_set_message().
Comment #264
chx CreditAttribution: chx commentedDo not change the doxygen on twig_drupal_escape_filter beyond the absolute minimum; it's a copypaste from twig original and it shouldn't be more. In particular "The ASCII character set to use for encoding." is incorrect, such a thing does not even exist, ASCII itself is a character set (but htmlspecialchars doesn't accept ASCII) but so is utf-8 and so on.
Comment #265
xjmFixing those minor points.
Comment #266
xjmAddresses #261.
$string
because that's whattwig_escape_filter()
class it (to @chx's point in #264).(string) $string
and such feels a little silly but that's whattwig_escape_filter()
does, so...Comment #267
steveoliver CreditAttribution: steveoliver commented@xjm awesome!
1. Manual testing of several standard Drupal pages looks good.
2. I just double-checked that we have follow up issues created, updated, and linked for every @todo.
3. Super-minor, missed last nits:
Helpful comment? Unconventional format? ...Otherwise +1 RTBC
Comment #268
steveoliver CreditAttribution: steveoliver commented@chx reminded me 3. is for IDEs. Change record looks good. RTBC! :)
Comment #269
tim.plunkettI was about ~60% of the way through the patch, and I didn't see anything that should hold this up.
+1 for RTBC.
Comment #270
Fabianx CreditAttribution: Fabianx commentedI am late back to the party, but this part needs a little more work unfortunately and not due to the @todo, but due to the fact that we change the meaning of:
|escape in a template.
Currently we don't escape even if someone wants to explicitly escape - which is a major change to how twig autoescape works.
Therefore, sorry for setting CNW so close to merging.
The code should actually be:
That also solves the @todo. I knew I was missing something before, but just by re-reviewing now, I finally found it.
Can someone put that change into a patch, thanks?
With this change it is RTBC + 1 from me, too.
Comment #271
scor CreditAttribution: scor commentedThe patch needed a reroll (271-reroll.patch). Addressing @Fabianx comment from #270 in twig-autoescape-1825952-271.patch (which is the one to be committed).
Comment #273
cosmicdreams CreditAttribution: cosmicdreams commentedSyntax Error here, forgot closing ")"
Comment #274
Fabianx CreditAttribution: Fabianx commentedUh, yes. Right #273.
Sorry for that.
Comment #275
scor CreditAttribution: scor commentedComment #276
star-szrComment #277
Fabianx CreditAttribution: Fabianx commentedThere are strong arguments from our code to remove making $glue safe and instead document in the function definition that glue is expected to be a string safe for output and user provided data should never be passed as a glue.
/me sings the code needs work song ... (sorry for that!)
EDIT:
To be precise what I mean that needs change is:
Comment #278
Fabianx CreditAttribution: Fabianx commentedComment #280
scor CreditAttribution: scor commentedclarified the usage of $glue per #277.
Comment #283
chx CreditAttribution: chx commentedBot fluke.
Comment #284
alexpottCommitted 87e675f and pushed to 8.x. Thanks!
Can we get a followup to add explicit testing of
twig_drupal_join_filter()
andtwig_drupal_escape_filter()
- something along the lines ofTwigTransTest
.Fixed on commit.
Comment #286
Fabianx CreditAttribution: Fabianx commentedOMG, Yes! That is wonderful!
/me sings the squash-the-beta-blocker-song "Another one bites the dust ..." :-D
Comment #287
pfrenssenJust a note, for inline variable declarations, please use the standard format, and not the proprietary PHPStorm-only format.
Comment #288
chx CreditAttribution: chx commentedpfrenssen thanks I never knew that! Please file a followup.
Comment #289
webchickAwesome work getting this in, all!! :)
Reviewing the change notice though, I don't see any before/after D7 vs. D8 code for module devs and especially themers on how they're supposed to upgrade their existing code. Could this please be added? And the cautionary tales about SafeMarkup::set() that are there seem like they probably belong somewhere less transient than a change notice that will only be reviewed by people upgrading code from D7 to D8 (which will eventually reach 0 people).
Comment #290
pfrenssenMade a followup for the @var declarations #2305641: Use the "standard" format for @var inline variable type declarations as well as a suggestion to put this in the comment standards #2305593: [policy] Set a standard for @var inline variable type declarations
Comment #291
xjmThe cautionary remarks are also all over the SafeMarkup class in the API documentation itself. :)
To the point about D7-to-D8 code... in some ways, it's out of scope. SafeMarkup is an internal use API. The scope of using the render and theme layers properly is way, way beyond this issue. Maybe one of the theme system maintainers could add links to other resources about themable code in D8?
Comment #292
jbrown CreditAttribution: jbrown commentedThere are double-escaped strings on the module uninstall page:
Should I make a separate issue?
Comment #293
heddn@jbrown, yes please. We knew going into this patch that we'd have a few stray things slip through the cracks. For this, and all other similar findings please open their own separate follow-ups.
Comment #294
xjmYep, please file double-escaping bug reports as children of #2297711: Fix HTML escaping due to Twig autoescape.
Comment #295
thedavidmeister CreditAttribution: thedavidmeister commentedyay, this getting in is great news!
Comment #296
yched CreditAttribution: yched commentedFeedback welcome in #2199637-32: Replace "required" flag of Field module with proper dependencies :
In short, this patch did:
But it also made Xss::filter() itself call SafeMarkup::set(), so we end up calling it twice here. Not sure why the wrapping SafeMarkup::set() was added ?
Comment #297
chx CreditAttribution: chx commentedYou are feeding a safe string to Html::normalize so the result will be safe too but the output is not necessarily the same as the input so the new string needs to be marked as safe too.
Comment #299
hass CreditAttribution: hass commentedIs this change the reason for #2345903: Form descriptions are checkplained?
Comment #300
xjmComment #301
kay_v CreditAttribution: kay_v as a volunteer commented