Suggested commit message:

Issue #1825952 by Fabianx, joelpittet, bdragon, heddn, chx, xjm, pwolanin, mikey_p, ti2m, bfr, dags, cilefen, scor: Turn on twig autoescape by default

Note: The approach has changed several times in this issue, so anything before comment #139 may not be relevant.

Work on this patch is done in a sandbox:
https://www.drupal.org/node/1857558

git clone --branch autoescape2--xjm http://git.drupal.org/sandbox/chx/1857558.git d8_autoescape

Contact chx for access.

Problem/Motivation

No one can write XSS-safe code. Not core contributors, nor contrib developers, no one.

Proposed resolution

  • Turn on Twig autoescaping by default.
  • Add a new SafeMarkup class. Sanitized markup strings can be flagged (with caution) as safe with SafeMarkup::set() and thoroughly documented (see #2280965: [meta] Remove every SafeMarkup::set() call). These will be rendered as-is by Twig; everything else will be escaped automatically.

Remaining tasks

Contributor tasks needed
Task Novice task? Contributor instructions Complete?
Add unit tests for SafeMarkup Instructions Done (#205)
Add detailed documentation for SafeMarkup
Draft a change record for the API changes Instructions Done (needs review)

User interface changes

If we do this right, then none. If we don't then you will see a new kind of bug: double escaping.

API changes

New SafeMarkup class and SafeMarkup::implode() helper.

You are not allowed to put unsafe user data in #attached. This can be relaxed in a followup but it truly gets gnarly. You are advised to not use #type => html_tag if at all possible or at least not with unsafe user data. This is not something I want to waste an effort on making it work.

Original report by @catch

Part of meta issue

follow-up from #1696786: Integrate Twig into core: Implementation issue

Twig as it stands introduces a fair bit of overhead into the theme system. Fabianx indicated that a lot of this is from marking $variables as secure so they're not double escaped later.

Ideally, if Twig autoescape is going to be enabled, then we should just pass raw variables to it and let it do the work. This way, if a template doesn't print the date, or a link, or whatever might currently be check_plain()ed in preprocess, we're not spending all this time creating it for it to be never used. In general, we should be able to remove a large chunk of preprocess work, and just let Twig sort out variables on demand.

Doing this means that a PHPTemplate engine in contrib is going to have to add back a way to securely format variables, but I don't see a way around this if we don't want a serious performance regression.

CommentFileSizeAuthor
#280 interdiff.txt837 bytesscor
#280 twig-autoescape-1825952-280.patch102.09 KBscor
#275 interdiff.txt637 bytesscor
#275 twig-autoescape-1825952-275.patch102.02 KBscor
#271 interdiff.txt1.26 KBscor
#271 twig-autoescape-1825952-271.patch102.02 KBscor
#271 twig-autoescape-1825952-271-reroll.patch102.14 KBscor
#266 interdiff-261-265.txt4.73 KBxjm
#266 twig-autoescape-1825952-266.patch102.17 KBxjm
#259 1825952-259-do-not-test.patch699 bytescilefen
#258 docs-polish-interdiff.txt1.71 KBxjm
#258 twig-autoescape-1825952-258.patch101.91 KBxjm
#257 twig-autoescape-257.patch101.51 KBxjm
#254 intediff-247-254.txt86.5 KBdags
#254 1825952-254.patch1.01 KBdags
#247 interdiff.txt21.39 KBchx
#247 1825952_247.patch101.25 KBchx
#243 twig-interdiff-232-242.txt23.97 KBxjm
#243 twig-autoescape-1825952-1825952-243.patch102.46 KBxjm
#238 twig-autoescape-1825952-238.patch105.3 KBbfr
#232 interdiff-218-232.txt5.39 KBxjm
#232 twig-autoescape-1825952-232.patch105.3 KBxjm
#218 interdiff.txt980 bytesxjm
#218 twig-autoescape-1825952-218.patch105.22 KBxjm
#217 interdiff-210-217.txt11.83 KBxjm
#217 twig-autoescape-1825952-217.patch114.65 KBxjm
#210 interdiff.txt4.48 KBxjm
#210 twig-autoescape-1825952-209.patch104.52 KBxjm
#208 1825952-208.patch103.71 KBpwolanin
#208 increment.txt3.23 KBpwolanin
#205 1825952-205.patch103.52 KBpwolanin
#205 increment.txt8.94 KBpwolanin
#201 1825952-201.patch97.03 KBpwolanin
#201 increment.txt944 bytespwolanin
#194 twig-autoescape-1825952-194.patch95.74 KBxjm
#192 1825952_192.patch96.38 KBFabianx
#191 interdiff-189-191.txt6.2 KBxjm
#191 twig-autoescape-1825952-191.patch96.36 KBxjm
#189 interdiff-184-189.txt4.35 KBxjm
#189 twig-autoescape-1825952-189.patch98.76 KBxjm
#184 interdiff-184.txt50.86 KBxjm
#184 twig-autoescape-1825952-184.patch98.77 KBxjm
#159 1825952_159.patch90.71 KBFabianx
#154 1825952_154.patch88.18 KBFabianx
#150 1825952_151.patch90 KBFabianx
#148 1825952_149.patch86.95 KBFabianx
#147 1825952_145.patch87.2 KBFabianx
#145 1825952_145.patch84.96 KBchx
#143 1825952_143.patch84.83 KBFabianx
#141 1825952_139--fix-typo.patch83.67 KBFabianx
#139 1825952_139.patch83.67 KBchx
#132 1825952_131.patch234.93 KBchx
#116 1825952_116.patch237.15 KBchx
#115 1825952_115.patch238.34 KBchx
#109 1825952_109.patch240.12 KBchx
#103 interdiff.txt2.56 KBchx
#103 1825952_103.patch241.41 KBchx
#101 1825952_101.patch240.28 KBchx
#99 1857558.patch205.51 KBchx
#85 1825952_84.patch62.46 KBchx
#79 1825952-twig-autoescape-79.patch48.88 KBjoelpittet
#79 interdiff.txt15.91 KBjoelpittet
#77 1825952-twig-autoescape-76.patch57.22 KBjoelpittet
#77 interdiff.txt16.74 KBjoelpittet
#72 1825952_72.patch48.11 KBchx
#63 1825952-twig-autoescape-63.patch56.03 KBjoelpittet
#57 interdiff.txt27.59 KBjoelpittet
#56 1825952-twig-autoescape-56.patch56.54 KBjoelpittet
#54 interdiff.txt10.75 KBjoelpittet
#54 1825952-twig-autoescape-54.patch30.04 KBjoelpittet
#51 interdiff.txt6.79 KBjoelpittet
#51 1825952-twig-autoescape-51.patch22.82 KBjoelpittet
#49 interdiff.txt3.46 KBjoelpittet
#49 1825952-twig-autoescape-49.patch18.07 KBjoelpittet
#47 interdiff.txt7.25 KBjoelpittet
#47 1825952-twig-autoescape-47.patch14.76 KBjoelpittet
#45 1825952-twig-autoescape-45-reroll.patch7.98 KBjoelpittet
#43 interdiff.txt2.16 KBjoelpittet
#43 1825952-43-twig-autoescape.patch7.96 KBjoelpittet
#41 1825952-twig-autoescape-41.patch6.68 KBmgifford
#39 1825952-twig-autoescape-39.patch6.69 KBjoelpittet
#30 1825952-twig-autoescape-30.patch39.89 KBjoelpittet
#23 autoescape-on.patch1.72 KBjoelpittet
#13 interdiff.txt7.48 KBstar-szr
#12 1825952-12.patch9.32 KBstar-szr
#12 interdiff.txt5.37 KBstar-szr
#11 1825952-11-do-not-test.patch11.63 KBstar-szr
#11 interdiff.txt606 bytesstar-szr
#9 1825952-9-do-not-test.patch11.43 KBstar-szr
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

jenlampton’s picture

Status: Active » Postponed

Postponing this until #1757550: [Meta] Convert core theme functions to Twig templates is resolved.

Once the above is done, we will be able to create a single patch that removes all instances of check_plain, check_markup, etc, from content passed to Twig. We can't do that until Twig is actually rendering all of our output or it would be a major security regression.

Oh, but I did add this as one of the major steps in our roadmap so it won't get overlooked or forgotten :)

webchick’s picture

Priority: Normal » Major

Also raising to at least major, since this was a big win we were counting on with Twig, so makes sense to make use of it. :)

David_Rothstein’s picture

Isn't this dependent on #1818266: [meta] A secure theme system (with twig) too? Turning on auto-escape is not going to be simple, as far as I know, but there was some progress in that issue.

webchick’s picture

Priority: Major » Normal

Oops. I think I meant to mark the other issue as major. Thanks, David!

catch’s picture

Some of the drupal_render() stuff elsewhere is handling the "don't prepare wasted variables in preprocess". I need to try to talk to Fabian about what other savings we could make with preprocess for things like format_date() which is a perennial killer for content listings.

star-szr’s picture

Issue summary: View changes

Fabian little x. Silly case sensitive usernames.

star-szr’s picture

Assigned: Unassigned » star-szr
Status: Postponed » Active

I'm going to try and pull a patch out based on the work done in @Fabianx's sandbox: https://drupal.org/sandbox/Fabianx/1819382

The patch will be based on a diff of the d8-core-1712444 and d8-core-1712444-v1 branches.

star-szr’s picture

Status: Active » Needs work
FileSize
11.43 KB

Here is a rough initial version, brought over almost everything except for one unrelated inline comment and one change that seemed unrelated/unnecessary:

diff --git a/core/includes/form.inc b/core/includes/form.inc
index 5d6b32e..e84ff49 100644
--- a/core/includes/form.inc
+++ b/core/includes/form.inc
@@ -3426,7 +3426,7 @@ function theme_tableselect($variables) {
           // A header can span over multiple cells and in this case the cells
           // are passed in an array. The order of this array determines the
           // order in which they are added.
-          if (!isset($element['#options'][$key][$fieldname]['data']) && is_array($element['#options'][$key][$fieldname])) {
+          if (is_array($element['#options'][$key][$fieldname]) && !isset($element['#options'][$key][$fieldname]['data'])) {
             foreach ($element['#options'][$key][$fieldname] as $cell) {
               $row['data'][] = $cell;
             }

The only things that I changed from the sandbox were some tiny little coding standards things that I spotted along the way.

This also changes twig_escape_filter in vendor which we can't do obviously.

Putting -do-not-test.patch because this is quite broken - for example #type => html_tag gets escaped. Going to see if I can push this a bit further though.

Fabianx’s picture

+++ b/core/includes/bootstrap.incundefined
@@ -1406,6 +1406,17 @@ function check_plain($text) {
+function drupal_mark_safe($string) {
+  global $safe_strings;
+  $safe_strings[$string] = TRUE;
+  return $string;
+}
+
+function drupal_is_safe($string) {
+  global $safe_strings;
+  return isset($safe_strings[$string]);
+}

I had inlined those in the optimized version and we probably need to do that for the final and to be profiled version.

I wish for a PHP-preprocessor here :-D

+++ b/core/includes/common.incundefined
@@ -1365,7 +1365,9 @@ function l($text, $path, array $options = array()) {
+  $string = ('<a href="' . $url . '"' . $attributes . '>' . $text . '</a>');
+  $GLOBALS['safe_strings'][$string] = TRUE;

Yes, here I already inlined it.

+++ b/core/includes/common.incundefined
@@ -3873,7 +3875,7 @@ function drupal_render(&$elements) {
-  $output = $prefix . $elements['#children'] . $suffix;
+  $output = ($prefix) . $elements['#children'] . ($suffix);

This change is left-over cruft.

+++ b/core/includes/theme.incundefined
@@ -1122,7 +1122,8 @@ function theme($hook, $variables = array()) {
-  return $output;
+  $GLOBALS['safe_strings'][$output] = TRUE;
+  return ($output);

Again - inlined for speed.

+++ b/core/lib/Drupal/Component/Utility/String.phpundefined
@@ -29,7 +29,9 @@ class String {
+    $GLOBALS['safe_strings'][$string] = TRUE;
+    return $string;

Probably should inline them all, it is quite easy to see.

Could be wrapped in:

if $GLOBALS['use_safe_strings']

and use the same pattern everywhere if we decide autoescape is optional to keep same performance as before.

+++ b/core/lib/Drupal/Component/Utility/String.phpundefined
@@ -105,7 +107,7 @@ public static function format($string, array $args = array()) {
-    return strtr($string, $args);
+    return drupal_mark_safe(strtr($string, $args));

Should probably inline here.

+++ b/core/lib/Drupal/Component/Utility/Xss.phpundefined
@@ -71,7 +71,7 @@ public static function filter($string, $allowed_tags = array('a', 'em', 'strong'
-    return preg_replace_callback('%
+    return drupal_mark_safe(preg_replace_callback('%

Inline, use just one pattern, Fabianx!

+++ b/core/lib/Drupal/Core/Template/Attribute.phpundefined
@@ -30,7 +30,7 @@
-class Attribute implements \ArrayAccess, \IteratorAggregate {
+class Attribute extends \Twig_Markup implements \ArrayAccess, \IteratorAggregate {

I am not totally sure this is still needed - I think it was left-over.

It should be removed.

The check should be like for RenderWrapper in the twig_render_template.

+++ b/core/lib/Drupal/Core/Template/Attribute.phpundefined
@@ -73,6 +73,9 @@ public function offsetSet($name, $value) {
+    elseif ($value instanceof \Twig_Markup) {
+      $value = new AttributeString($name, (string)$value);

This should be removed.

Drupal should not rely on twig as engine (even if its the default one).

+++ b/core/lib/Drupal/Core/Template/AttributeValueBase.phpundefined
@@ -12,7 +12,7 @@
-abstract class AttributeValueBase {
+abstract class AttributeValueBase extends \Twig_Markup {

Please remove.

+++ b/core/lib/Drupal/Core/Template/AttributeValueBase.phpundefined
@@ -69,6 +69,8 @@ public function printed() {
-  abstract function __toString();
+  public function __toString() {
+    return parent::__toString();
+  }

No longer needed.

+++ b/core/lib/Drupal/Core/Template/TwigExtension.phpundefined
@@ -40,6 +40,8 @@ public function getFilters() {
       'passthrough' => new \Twig_Filter_Function('twig_raw_filter'),
       'placeholder' => new \Twig_Filter_Function('twig_raw_filter'),
+      // Helper filter used to replace twig's original raw() filter.
+      'twig_raw' => 'twig_raw',

This should be:

'twig_raw' => new \Twig_Filter_Function('twig_raw'),

Probably _very_ confusing now.

+++ b/core/modules/filter/filter.moduleundefined
@@ -711,7 +711,8 @@ function check_markup($text, $format_id = NULL, $langcode = '', $cache = FALSE,
-      return $cached->data;
+      // @todo: The caller is responsible that this is really safe.
+      return drupal_mark_safe($cached->data);

Inline it!

+++ b/core/modules/filter/filter.moduleundefined
@@ -752,7 +753,8 @@ function check_markup($text, $format_id = NULL, $langcode = '', $cache = FALSE,
-  return $text;
+  // @todo: The caller is responsible that this is really safe.
+  return drupal_mark_safe($text);

And another one to inline ...

That is all :-D.

Thanks for working on this!

star-szr’s picture

FileSize
606 bytes
11.63 KB

Thanks @Fabianx, that's great :)

First the fix for output coming from RenderWrapper, cleanup from #10 coming in the next patch. Blocks are still busted (encoded when they shouldn't be) currently.

star-szr’s picture

FileSize
5.37 KB
9.32 KB

Blocks are still broken but cleaned up as per #10 (other than the if $GLOBALS suggestion) and things are still working as they were before from what I can see. Smaller patch file too :)

@Fabianx or anyone else - I'm not clear why some strings are wrapped in parentheses, would love to know why that is. Should we be doing this for all of them or is there a rule/reason to this? Examples:

+++ b/core/includes/theme.inc
@@ -1122,7 +1122,8 @@ function theme($hook, $variables = array()) {
-  return $output;
+  $GLOBALS['safe_strings'][$output] = TRUE;
+  return ($output);

+++ b/core/lib/Drupal/Component/Utility/String.php
@@ -29,7 +29,9 @@ class String {
-    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
+    $string = (htmlspecialchars($text, ENT_QUOTES, 'UTF-8'));
+    $GLOBALS['safe_strings'][$string] = TRUE;
+    return $string;
star-szr’s picture

Status: Needs work » Needs review
FileSize
7.48 KB

Correct interdiff. Since I forgot to name the last patch -do-not-test.patch, temporarily setting to needs review so I can cancel that test :)

joelpittet’s picture

@Fabianx Would there be possible name collisions with the $GLOBALS suggestion?

Fabianx’s picture

#12: No, the parentheses are a left-over from drupal_mark_safe($output) -> inlining conversion and yes I worked on it in a hurry ...

#15: Yes, indeed, should probably use $_GLOBALS['drupal_safe_strings'] instead.

Crell’s picture

Adding a new global to track rendering state is absolutely not OK. We've been working hard to remove globals from Drupal. Adding in new ones is not cool.

David_Rothstein’s picture

Should this patch be moved to #1818266: [meta] A secure theme system (with twig)? Otherwise we are starting to repeat discussion that already happened there.

(That's why this issue was marked postponed on that one. The fact that #1818266: [meta] A secure theme system (with twig) has "meta" in the issue title is arguably a bit misleading...)

Fabianx’s picture

As I said already, it would be good to have the same "pattern" (including variable name) to be able to do MASS Search+Replace.

Globals are really only used for performance, we could also use the advanced drupal_static_fast pattern, which should be similar to the Globals.
( Just [n] function calls for [n] functions using this technique.)

star-szr’s picture

Assigned: star-szr » Unassigned

I won't have time to look at this for a little while, unassigning for now.

joelpittet’s picture

A bit unclear on why we have to mark strings as 'safe'? Can't we escape everything and {{ safe_string|raw }}? I'm probably being naive but thought I'd ask.

joelpittet’s picture

Issue summary: View changes
Status: Needs work » Needs review
FileSize
1.72 KB

Ok in a crazy attempt to move this forward a bit I started writing some regular expressions... instead of a big patch.

This is what I came up with so far:

Tools: curl, ack, perl and git co = git checkout

ack check_plain -l --print0 | xargs -0 perl -pi -e 's#check_plain\(((?:\((?-1)\)|[^\(\)]++)+)\)#\1#g'
ack String::checkPlain -l --print0 | xargs -0 perl -pi -e 's#String::checkPlain\(((?:\((?-1)\)|[^\(\)]++)+)\)#\1#g'
ack "array_map\('check_plain'" -l --print0 | xargs -0 perl -pi -e "s|array_map\('check_plain', (.+?)\)|\1|g"


// Exclude everything in includes (except preprocess...)
git co -- core/includes/


// Exclude attributes
git co -- core/lib/Drupal/Core/Template/


// Do remove from preprocess in theme.inc!
@todo


// Turn on Auto-Escape
curl https://drupal.org/files/issues/autoescape-on.patch | git apply

I'm trying to ignore the idea of "safe" strings... though it will probably come up again.

Wondering if anybody can push this a bit further... the obvious catch is regions are getting escaped
{{ page_top }}

FYI, the nasty looking perl regular expression just does some nested brace matching.

joelpittet’s picture

Hmmm, the more I think about this issue the more like it seems we are trying to rob peter to pay paul.

We turn this auto-escape on and get:

  • A performance regression because all variables being printed will be escaped by default.
  • Removal of all checkPlain's with addition of all the mark_safe's.

All renderable arrays normally producing markup when printed would need to be "safe"/raw. This leaves all it's variables raw unless there is a twig template that they are being rendered into.

Sorry if this sounds :-(, trying to build a case in my head to make this work... is there any case where a Renderable Array would need to be escaped or can we handle that it it's template/theme function/post and render hooks etc? Maybe we can mark renderable arrays as unsafe in their hook_element_info/hook_theme definitions and assume the rest as safe?

joelpittet’s picture

Any direct call to theme() will generate a string that is unsafe. We've been trying to remove those through the conversions but there are still a number of them. So we'd need a way to mark those.

So far I've found the following need to be marked as safe:

  • Any variable that has been rendered in preprocess with theme(), render() or drupal_render().
  • Any object with a __toString(), DONE already in the patch. For example Attribute, RenderWraper, etc.
  • Any generated URL.
  • Any helper function that generates markup, for example String::placeholder.
  • Any variable that is already checkPlain'd explicitly.
  • And then, in general, any variable that contains markup explicitly.

Those are all the targets to be 'safe'/raw that I can think of.

Crell’s picture

Even if performance is a wash, IMO relying on Twig to handle escaping is a DX win as then Twig is behaving more like it does in every other system that uses it, which is a growing number. That makes Drupal theme adoption easier for developers used to Twig from some other CMS or framework.

joelpittet’s picture

@Crell, probably not a discussion for this thread on which frameworks do and don't have auto escaping on as a cursory search showed two that have it off due to reasons similar to Drupal and one has it off. Though if you see me on IRC ping me with a few examples of CMS's that do that because I'd like to read through how they go about it.

What I'd love to know from you and others, does that $GLOBALS['safe_strings'] look like a viable solution? And if not, maybe some other bright ideas? I imagine if strings were objects we could tag them as safe, but PHP doesn't do that...

One big blocker on this issue is the direct calls to theme(), render() and drupal_render() that we've being trying in twig conversion issues to convert to render arrays as much as possible but there are still cases where those happen in core. That would be an solveable problem though the other edge cases may need something like this $GLOBALS['safe_strings'] idea to get this issue moving, so if that's kosher I'll keep moving with that?

Irrelevant cursory search:

ProcessWire recommends auto-escape off because they too do the processing before:
http://modules.processwire.com/modules/template-twig-replace/

Kohana has auto-escape on because they leave everything to the developer:
https://github.com/jonathangeiger/kohana-twig/blob/master/config/twig.php

Contao auto-escape off:
https://contao.org/en/extension-list/view/twig.10020019.de.html

joelpittet’s picture

Status: Needs work » Needs review
FileSize
39.89 KB

So here's a drastic approach. If I'm lucky it will pass install.

The plan here:
Completely remove TwigReference and it's cohorts so that |raw works.
Kill show and change hide to a filter help with that using #2114563: Remove TwigReference with the help of 'without' filter replacing 'show'/'hide' functions.
Rewrite TwigNodeVisitor to just for Arrays that look like renderable arrays to pass through twig_render_var.

Ideally, I'd just wrap all arrays in a RenderableArray object with a __toString() and drop the twig_render_var function too...

May have gone crazy... but at least I found what was hurting |raw, the node visitor converting too many things to Twig_Node_Print as referenced.

joelpittet’s picture

If i'm reading that correctly, that means I win:) It installed just couldn't login which is progress:)

Crell’s picture

joel: Huh. I'm a bit surprised at that. Given what a big deal Twig and Fabien make about auto-escaping in Twig I wouldn't have expected many projects to bypass it. Although admittedly I don't know how many projects have the Russian-doll templating model we do.

joelpittet’s picture

After a couple more tests, seems that TwigReference is preventing {{ dump(_context|keys) }} from producing any output. So the patch in #30 with that print statement will return results in any twig file that gets rendered but without that patch it will print an empty array()

That along with {{ var|raw }} not working with TwigReference make that wrapper a blocker for this issue and I'll continue to keep it out. I may even postpone this issue on #2114563: Remove TwigReference with the help of 'without' filter replacing 'show'/'hide' functions. as it helps remove TwigReference.

joelpittet’s picture

Assigned: Fabianx » Unassigned
Status: Needs work » Postponed
joelpittet’s picture

Status: Postponed » Needs review

Seems the node visitor coming before auto-escape is causing raw filter not to work still. Not exactly sure why, but may be how the Auto-escape node visitor visits the nodes and looks for the raw filter. Could be a bug in the way it goes about finding the raw filter. Though if you set the priority from -1 to 1, raw seems to work as expected but all you get in the compile is this difference:

        // line 39
        echo "<div";
        echo twig_render_var(twig_escape_filter($this->env, (isset($context["attributes"]) ? $context["attributes"] : null), "html", null, true));
        echo ">
  ";

vs

        // line 39
        echo "<div";
        echo twig_escape_filter($this->env, twig_render_var((isset($context["attributes"]) ? $context["attributes"] : null)), "html", null, true);
        echo ">
  ";

Which indicates to me the raw filter is just a flag dictates when to escape or not during the 'node visitor' phase.

Would be nice to combine those two functions, though they do specific jobs so that would likely shoot ourselves in the foot.

joelpittet’s picture

I'm always keeping in the back of my mind, that maybe auto-escape on will not be a huge win.
Any markup we send to a variable in the template will need to be either explicitly marked as Safe or automaticcly done so. The automatic way may open up potential for security holes, the manual/explicit will be a PITA.

That being said, here's some automagic that does some of the variables. Also note, #38 is not resolved or tackled.
I made a subclass of Twig_Markup so I didn't have to type in the charset each time.

mgifford’s picture

Status: Needs work » Needs review
FileSize
6.68 KB

reroll.

joelpittet’s picture

Status: Needs work » Needs review
FileSize
7.96 KB
2.16 KB

@mgifford thanks for the re-roll. Here's some help that will remove a few fails, I like that it passes install that's a good sign:)

joelpittet’s picture

Title: Rely on Twig auto-escape, don't prepare wasted variables in preprocess » Turn on twig autoescape by default.
Status: Needs work » Needs review
FileSize
7.98 KB

Re-roll and title change.

joelpittet’s picture

Status: Needs work » Needs review
FileSize
14.76 KB
7.25 KB

Found bits that were double escaping and marked them as raw/markup. Biggest change is that all _theme output is returned as safe Markup because it's synonymous with including a template in a template.

This is not done still, but brings things a bit closer yet I believe.

joelpittet’s picture

Status: Needs work » Needs review
FileSize
18.07 KB
3.46 KB

A few more...

joelpittet’s picture

Status: Needs work » Needs review
FileSize
22.82 KB
6.79 KB

Not sure what's up with the contextual links but they aren't liking something here... likely because they are Markup objects being passed to jsonresponse.

nod_’s picture

Only thing I can say is that it's not a JS issue:

    $rendered = array();
    foreach ($ids as $id) {
      $element = array(
        '#type' => 'contextual_links',
        '#contextual_links' => _contextual_id_to_links($id),
      );
      $rendered[$id] = drupal_render($element);
    }

    return new JsonResponse($rendered);

This code gives:

{
block:block=bartik_footer:|menu:menu=footer:: {}
block:block=bartik_powered:: {}
block:block=bartik_search:: {}
block:block=bartik_tools:|menu:menu=tools:: {}
views_ui_edit:view=frontpage:location=page&name=frontpage&display_id=page_1: {}
}

Which is useless. instead of {} we should have a HTML string. Don't know why it's broken.

joelpittet’s picture

Status: Needs work » Needs review
FileSize
30.04 KB
10.75 KB

Thanks for looking at that and giving me a hint @_nod

Here's a few more fixes, hopefully another reduction in exceptions and fails.

joelpittet’s picture

Status: Needs work » Needs review
FileSize
56.54 KB

Few more...

joelpittet’s picture

FileSize
27.59 KB

with interdiff.

xjm’s picture

Priority: Normal » Major
Issue tags: +beta target
joelpittet’s picture

Thanks for the bump @xjm.

I'd really like someone to review what I've got so far to make sure I'm taking a viable path to this holy grail. I likely will eventually get to 0 fails, but not sure if I'm on the right path to this goal... or just opening up security holes or system weirdness in the process.

So anybody is welcome to come ping me on IRC or post in this issue their concerns or suggestions and I'll keep trotting along...

David_Rothstein’s picture

My concern with this approach is the issue discussed in #1818266: [meta] A secure theme system (with twig) (which you also touched on a bit yourself in the comments above, such as #39). If we rely on manually marking strings as safe/raw in the template, then we're effectively shifting responsibility for security from the "developer" to "themer", which doesn't sound good.

Here's an example from the patch:

--- a/core/modules/system/templates/status-report.html.twig
+++ b/core/modules/system/templates/status-report.html.twig
@@ -33,9 +33,9 @@
       </td>
       <td class="status-title">{{ requirement.title }}</td>
       <td class="status-value">
-        {{ requirement.value }}
+        {{ requirement.value|raw }}
         {% if requirement.description %}
-          <div class="description">{{ requirement.description }}</div>
+          <div class="description">{{ requirement.description|raw }}</div>
         {% endif %}
       </td>
     </tr>

Before: All the themer had to do was print the variables that were prepared for them. Unless they were doing something non-standard, they didn't have to worry about security at all.

After: The themer needs to understand that "requirement.description|raw" is correct and safe, but "requirement.title|raw" is incorrect and a security hole. (In this specific case I'm not sure it actually is, but the point is that this would be true anywhere where user input is involved and some of it is expected to have HTML and has been pre-filtered with a function like filterXss(), but some of it isn't and is therefore left for the template to auto-escape.) It's a lot of security-related decisions that the person writing the HTML is being asked to make.

joelpittet’s picture

@David_Rothstein thanks, that a very good way to put that and for reviewing the patch!

Though if you look at D7's use of check_plain(). You'll notice it lives in a number of theme_* functions (theme_links, theme_menu_local_task, theme_aggregator_page_opml, theme_file_link, etc) as well a few templates(mostly contrib templates mind you). Preprocess straddles that divide but it's more of a themer's tool than a developers tool as it's just massaging (mostly) and manipulating(rarely) data to be useful in the template and you see a few check_plain()s there too.

So regardless of this proposal/patch. Themer's need to take some part in escaping/unescaping to some degree.

With this approach of auto-escaping on by default, the only time a themer needs to use {{ variable|raw }} is when {{ variable }} inadvertently escapes their markup passed through which is not intended to be escaped. And would provide more an incentive to review their security implications than currently with everything printing raw defaults unless explicitly escaped.

My biggest concern with this patch is that I may be opening up new security holes inadvertently with my approach to it.

joelpittet’s picture

Status: Needs work » Needs review
FileSize
56.03 KB

Re-rolled

sun’s picture

  1. +++ b/core/includes/common.inc
    @@ -915,7 +916,7 @@ function l($text, $path, array $options = array()) {
    -  return '<a href="' . $url . '"' . $attributes . '>' . $text . '</a>';
    +  return new Markup('<a href="' . $url . '"' . $attributes . '>' . $text . '</a>');
    
    @@ -3481,7 +3482,7 @@ function drupal_render(&$elements, $is_recursive_call = FALSE) {
    -  return $elements['#markup'];
    +  return new Markup($elements['#markup']);
    
    @@ -3509,7 +3510,7 @@ function drupal_render_children(&$element, $children_keys = NULL) {
    -  return $output;
    +  return new Markup($output);
    
    +++ b/core/modules/simpletest/lib/Drupal/simpletest/Form/SimpletestResultsForm.php
    @@ -155,7 +156,7 @@ public function buildForm(array $form, array &$form_state, $test_id = NULL) {
    -        $row[] = $assertion->message;
    +        $row[] = new Markup($assertion->message);
    

    Hm. Wrapping every string literal into a Markup object looks like a performance problem.

    It also hard-codes an assumption on the output being HTML throughout the entire system - i.e., theme template preprocessing starts way before the actual theme template system is even initialized + triggered.

    That doesn't look right to me.

  2. +++ b/core/modules/system/templates/details.html.twig
    @@ -17,17 +17,17 @@
    -      <div class="details-description">{{ description }}</div>
    +      <div class="details-description">{{ description|raw }}</div>
    
    +++ b/core/modules/system/templates/select.html.twig
    @@ -12,4 +12,4 @@
    -<select{{ attributes }}>{{ options }}</select>
    +<select{{ attributes }}>{{ options|raw }}</select>
    

    Hm. All of these changes look very dangerous to me.

    Dangerous, because they are not consistent.

    I guess I share @David_Rothstein's concerns.

    That said, the fundamental idea here appears to be that all variables are escaped by default, so the worst that can happen is that stuff gets double-escaped.

    Sans this change proposal, the worst that can happen is that stuff is not escaped at all.

    While that idea makes sense to me, the inconsistency does not. I wonder whether we can find a more creative solution to address that?

    E.g., a trivial pattern, like moving all template variables that contain raw HTML into a separate top-level variable à la html.description?

    Or a more sophisticated approach of explicitly wrapping such variables into a RawHtmlWrapper object in preprocess functions?

    I think we need to explore some more ideas to complement and make sense of the main change proposal.

joelpittet’s picture

@sun Markup == RawHtmlWrapper, which is what I've been doing. It's extending Twig_Markup class which is marked as safe markup for twig. @see http://twig.sensiolabs.org/api/master/Twig_Markup.html

I may have been hasty on a few of those new Markup's and |raw. And the one offs are fairly easy to change/remove/replace but the ones around _theme() and drupal_render() and in l() are the more overarching ones.

Thanks for having a look at this too! Glad to see some people taking notice!

chx’s picture

Assigned: Unassigned » chx

A new approach without raw is being worked on by @joelpittet and me. Stay tuned. Sneak peek: https://drupal.org/node/2208061#comment-8791489

chx’s picture

Priority: Major » Critical

And if anything then this MUST be a beta blocker.

chx’s picture

Issue summary: View changes
FileSize
48.11 KB
chx’s picture

Status: Needs work » Needs review
FileSize
39.4 KB

A bazillion of those are from assertRaw failing which was tried to be patched individually; I have instead patched assertRaw and assertNoRaw and assert itself.

joelpittet’s picture

FileSize
16.74 KB
57.22 KB

Ok here's a few more items and comments to push this, in hopefully the right directions.

There are still many spots to address. Any time we use a SafeMarkup object (previously above called Markup) we have to justify it as preserving what is already safe (when concatinating two SafeMarkup objects for example). Or document why we are marking it as safe.

We want to as chx made clear last night (avoid creating "safe", but do try to preserve "safe")

joelpittet’s picture

Whoops, sorry @chx, cross posted and somehow deleted your patch? Strange...

joelpittet’s picture

Interdiff from @chx #76.

catch’s picture

Issue tags: -beta target +beta blocker

Discussed with Dries, Angie and Alex - decided this should block the beta.

pwolanin’s picture

Discussed a bit with ircmaxwell in IRC (of course)- his reaction would support this being a beta blocker: "switching to Twig, with auto-escaping off, is worse than just using PHP directly". In other words, while Twig is great, using it without auto-escaping gives a false sense of security that will lead to mistakes.

Though he also thought we were hobbling ourselves by the output of t() being safe markup, among other things, but they were beyond the range of fixable by beta.

chx’s picture

Issue summary: View changes
Status: Needs work » Needs review
FileSize
62.46 KB
ParisLiakos’s picture

+++ b/core/lib/Drupal/Component/Utility/String.php
@@ -7,6 +7,8 @@
+use Drupal\Core\Template\SafeMarkup;

+++ b/core/lib/Drupal/Component/Utility/Xss.php
@@ -7,6 +7,8 @@
+use Drupal\Core\Template\SafeMarkup;

Using stuff from \Drupal\Core in \Drupal\Component is a no-no. (we just closed a critical to make our components not Drupal aware)
If we really want to go this way, then those 2 classes should move in the \Drupal\Core\Utility namespace

pwolanin’s picture

Any reason SafeMarkup cannot be under component?

ParisLiakos’s picture

according to the README there, because it depends on Twig.
core/lib/Drupal/Component/README.txt:

Components MAY depend on other Components, but
that is discouraged. Components MAY NOT depend on any code that is not part of
PHP itself or another Drupal Component.

Twig is neither PHP nor Drupal Component

joelpittet’s picture

Re: #93 To be clear Twig is PHP. It depends on a vendor class called \Twig_Markup. That object instance is looked for by twig to see if the markup is safe. We've extended it to make it easier to wrap without having to pass in the charset to the constructor and add some helpers.

So for this

no-no

. Do you have a proposed solution or are we just throwing a wrench in the tires? Sound like an 'ideal world' problem in an imperfect world, though I'm open to ideas...

joelpittet’s picture

Oh I see what you mean by PHP, like \ ... read that wrong, sorry. Nevertheless.

chx’s picture

We just move things into Core from Component, not a biggie. Will make the patch a bit bigger by the move but we will do that at the very end to avoid reroll hell.

xjm’s picture

@chx, can the moving-of-existing-stuff go into HEAD in a separate issue now?

chx’s picture

that will be fun to merge but: yes.

chx’s picture

Issue summary: View changes
FileSize
205.51 KB
dixon_’s picture

Only made it through 1/3 of the patch. But here's some initial feedback.

Architecturally I don't have much to comment on, generally the approach makes a lot of sense. @chx was very kind to walk me through the changes in the coder lounge. The only problem I have is that PHP makes us go through a lot of hoops to make this work which is quite annoying :) So the DX is taking a small hit with this patch, but I guess we just need to make sure the reasoning behind this approach is properly documented.

  1. +++ b/core/includes/install.core.inc
    @@ -1709,9 +1710,9 @@ function install_finished(&$install_state) {
    -  drupal_set_message(t('Congratulations, you installed @drupal!', array(
    +  drupal_set_message(SafeMarkup::create(t('Congratulations, you installed @drupal!', array(
         '@drupal' => drupal_install_profile_distribution_name(),
    -  )) . $pifr_assertion);
    +  )) . $pifr_assertion));
    

    Shouldn't t() return SafeMarkup itself?

  2. +++ b/core/includes/theme.inc
    @@ -2021,7 +2024,7 @@ function template_preprocess_html(&$variables) {
    +  $variables['head_title'] = SafeMarkup::implode(' | ', $head_title);
    

    @chx Referring to our discussion in the code lounge just now, could you please elaborate a bit more on why you want to remove SafeMarkup::implode() etc. Personally I think they make sense in places like this.

  3. +++ b/core/includes/bootstrap.inc
    @@ -1180,6 +1181,9 @@ function watchdog($type, $message, array $variables = array(), $severity = WATCH
       if ($message) {
    +    if ($message instanceof SafeMarkup) {
    +      $message = chr(0) . $message;
    +    }
         if (!isset($_SESSION['messages'][$type])) {
           $_SESSION['messages'][$type] = array();
         }
    @@ -1223,6 +1227,13 @@ function drupal_set_message($message = NULL, $type = 'status', $repeat = FALSE)
    
    @@ -1223,6 +1227,13 @@ function drupal_set_message($message = NULL, $type = 'status', $repeat = FALSE)
      */
     function drupal_get_messages($type = NULL, $clear_queue = TRUE) {
       if ($messages = drupal_set_message()) {
    +    foreach ($messages as $message_type => $messages_typed) {
    +      foreach ($messages_typed as $key => $message) {
    +        if (!$message instanceof SafeMarkup && $message[0] == chr(0)) {
    +          $messages[$message_type][$key] = SafeMarkup::create(substr($message, 1));
    +        }
    +      }
    +    }
         if ($type) {
    

    Nitpick: I think we need some comments explaining the reason behind the chr(0) technique here.

  4. +++ b/core/includes/theme.maintenance.inc
    @@ -178,6 +181,7 @@ function theme_authorize_report($variables) {
       }
    +
       return $output;
     }
    

    Nitpick: Unnecessary newline.

  5. +++ b/core/modules/field/src/Plugin/views/field/Field.php
    @@ -685,11 +687,11 @@ public function submitGroupByForm(&$form, &$form_state) {
           if ($this->options['multi_type'] == 'separator') {
    -        return implode(Xss::filterAdmin($this->options['separator']), $items);
    +        return SafeMarkup::create(implode(Xss::filterAdmin($this->options['separator']), $items));
    

    Can we not use SafeMarkup::implode() here?

  6. +++ b/core/modules/locale/templates/locale-translation-update-info.html.twig
    @@ -21,7 +21,7 @@
       {% if modules %}
    -    {% set module_list = modules|join(', ') %}
    +    {% set module_list = modules|join(', ')|raw %}
    

    I'm not sure I understand why raw is needed here?

  7. +++ b/core/modules/responsive_image/responsive_image.module
    @@ -253,7 +254,7 @@ function theme_responsive_image($variables) {
    +    return SafeMarkup::create(implode("\n", $output));
    

    Use SafeMarkup::implode() instead?

chx’s picture

Status: Needs work » Needs review
FileSize
240.28 KB

1. it does
2. only safemarkup::implode will stay, the concat will be removed. strReplace is already removed.
3. commented
4. removed
5, 7. done
6. because otherwise Twig double escapes.

chx’s picture

Status: Needs work » Needs review
FileSize
241.41 KB
2.56 KB

So fixing some of #100 broke the patch; strReplace removal is postponed and one of the implodes is back...

chx’s picture

Some eightnine followup issues have been filed. If your review finds issues please consult the Child issues in the sidebar before reporting them. This issue is past overdue by 1.5 years at least, getting in quick and then farming out tidying seems better than making a 250kb patch even bigger.

moshe weitzman’s picture

I reviewed this patch and don't have much good to say about it. I mean, it is great that auto-escape is turned on. To me, that's the only thing that could make the twig tradeoff worthwhile. Twig already added deployment complexity as we now compile .twig to php in the shared filesystem. And the autoescape proposed by this patch degrades our DX substantially. I'm not going to block this patch, but I'm not going to cheerlead for it either. Below are a couple of DX degradation:

-  drupal_set_message(t('Congratulations, you installed @drupal', 
+  drupal_set_message(SafeMarkup::create(t('Congratulations, you installed @drupal!',
-          $groups[$info['group']][$class] = $info;
+          $groups[(string) $info['group']][$class] = $info;
pwolanin’s picture

I'm a bit confused about the code in drupal_set_message(). I assume the session data is serialized - why can't the SafeMarkup be serialized and unserialized instead of using the chr(0) hack. That smells fishy to me in any case.

joelpittet’s picture

@moshe weitzman just because that snippet was a bit confusing for me and maybe others, here is the whole context re #105

+++ b/core/includes/install.core.inc
@@ -1709,9 +1710,9 @@ function install_finished(&$install_state) {
-  drupal_set_message(t('Congratulations, you installed @drupal!', array(
+  drupal_set_message(SafeMarkup::create(t('Congratulations, you installed @drupal!', array(
     '@drupal' => drupal_install_profile_distribution_name(),
-  )) . $pifr_assertion);
+  )) . $pifr_assertion));

Because concatenation flattens the safe object back to an unsafe string it needs to wrap it back up as safe. t() returns SafeMarkup so without the concatination that that SafeMarkup::create() wrap wouldn't be necessary and if you put that $pifr_assertion in a !passthrough or @escaped token it wouldn't be needed for that example.

chx’s picture

Re #106 session is not serialized but http://www.php.net/manual/en/function.session-encode.php session encoded; any better solution is warmly welcome. Also? That cleanup can be a followup. Regarding #105, Joel mostly answered it all, I just want to note that in all the codebase we have only 100 SafeMarkup:: calls of any sort and a significant portion of that is in theme functions which will go away and/or caused by hardwiring HTML, small snippets of HTML but still in module code. Very important:

If you are solely using render arrays with your HTML in Twig templates you never need to create SafeMarkup manually.

chx’s picture

FileSize
240.12 KB

Rerolled after the TranslationWrapper patch and removed the chr(0) hack from drupal_[sg]et_message functions. Hopefully this passes.

David_Rothstein’s picture

Issue tags: +Needs benchmarks

I can imagine many of the SafeMarkup calls going away, but the strings masquerading as objects wouldn't, right? Moshe had one example and here are a couple more:

-    'label' => t('Machine name'),
+    'label' => (string) t('Machine name'),

......

       $bundle_label = String::checkPlain($bundles[$bundle]['label']);
-      $return[$bundle_label] = $entity_ids;
+      $return[(string) $bundle_label] = $entity_ids;

Kind of ugly.

This patch is basically implementing option #1 from #1818266: [meta] A secure theme system (with twig) (as written in that issue's issue summary), but the discussion there quickly moved away from the strings-as-objects approach (presumably because of the DX implications) and instead talked more about registering the safe strings some other way. What's the reason for going back?

The performance implications of that approach weren't great either (latest benchmarks here) and this looks like it could be the same or worse (however, the memory usage should be better which is one advantage of doing it as objects).

@@ -994,7 +999,7 @@ function template_preprocess_views_view_row_rss(&$variables) {
....
-  $variables['description'] = String::checkPlain($item->description);
+  $variables['description'] = $item->description instanceof SafeMarkup ? $item->description : String::checkPlain($item->description);

I don't understand examples like this... can't it just be $item->description and the idea is that Twig takes care of the rest?

Also, at this point should #1818266: [meta] A secure theme system (with twig) be closed as a duplicate? I don't get the difference between the two issues anymore. If so, Fabianx should get major commit credit in anything that happens here.

David_Rothstein’s picture

Haven't fully thought this through but is there a way to have the marking-strings-as-safe happen only in the preprocess layer (and afterwards) rather than throughout Drupal? Then if they have to be treated as objects the impact is more limited.

Preprocess functions are already responsible for figuring out what's safe anyway.

Pseudo-code:

function mymodule_preprocess_THEME_HOOK(&$variables, $html_generator) {
  // Will be filtered and automatically marked as safe.
  $variables['description'] = $html_generator->filterXss($item->description);

  // Not marked as safe; will be automatically check-plained by Twig.
  $variables['title'] = $item->title;

  // Will be filtered and automatically marked as safe (but unnecessary since
  // the above works just as well).
  $variables['title'] = $html_generator->checkPlain($item->title);

  // Some HTML that came from elsewhere in Drupal but that we know is OK; mark
  // it as safe manually.
  $variables['some_html'] = $html_generator->alreadySafeHTML($some_html);

  // Entire object, parts of which are not safe so the whole thing is treated
  // as not safe. Can the "raw" function also be disabled in the template (to
  // prevent people who don't know what they're doing from calling
  // {node.title|raw} and removing the security protection)?
  $variables['node'] = $node;
}
chx’s picture

Issue tags: -Needs benchmarks

I have implemented this from ground up and there is no other way to do this; in that meta whatever is suggested is 'less secure'. Here's the train of thought of why this alone works:

  1. Twig creates Twig_Markup objects
  2. t() contains tags often in the string but also the arguments are wrapped in <em> so we need to make sure it is not double escaped so it needs to return Twig_Markup.
  3. Now you are screwed by PHP and also you need to rewrap your strings.

I am sure there are vague ideas out there but there's nothing that actually works.

In general, I would like to say this patch and the understandable reaction is the classic case of atomic plant and the bike shed. Noone understood the routing issues where the title XSS was conceived, reviewed and committed. Now you understand this issue because it's very easy. I would like to only see the objections of those who objected changing Drupal from secure by default to insecure by default! For the same reason, while I will benchmark this, I completely refuse to accept any objections on performance grounds. You all wanted this default-by-insecure system, it is not unintentional because the title change notice contains this behavior change -- and the exploit code for the security hole as well. Which proves that the escape-by-default behavior of drupal_set_title in Drupal 7, may it rest in peace, was the right way. But you were having none of that and removed one of the very few mechanisms that actually protected a variable from XSS -- and put the exploit code in the change notice while at it. And to be clear, the page title is just the symptom, the changing from secure by default to insecure by default is a mindset, a mindset dangerous beyond imagination.

You wanted this music, now pay the piper.

Also, in closing, already D7 had XSS in contrib commonly and we protect from that too.

chx’s picture

In specifics, #111 is acting too late -- you want to mark strings as secure when they become secure and not do some praying that whatever is passed to a template preprocess function happens to be secure. Yes, there is some of that praying in this patch but we will see that it is gone and we will go back to the root of every string. You can't do that if you slap $html_generator->alreadySafeHTML($some_html); in the preprocess -- how the heck do you know $some_html is secure? You can't.

For a concrete example, in TitleResolver::getTitle this $route_title = $this->t($title, $args, $options); is secure but $route_title = call_user_func_array($callable, $arguments); this might or might not be insecure. Refer to the page title change notice for a callable that is insecure and watch how this patch automatically fixes it.

pwolanin’s picture

Having spent some time discussing with chx, I think this is the right approach given the limitations of PHP, and we have to accept some rough edges initially, but with continued Twig conversion, the existing follow-up issues, etc, you will almost never need to think about the SafeMarkup class when all is said and done.

Twig having auto-escape on will be an unbelievably huge win for D8 security in practice, and will allow us to close up automatically a lot of otherwise gaping security holes currently in 8.x.

chx’s picture

FileSize
238.34 KB

Tons of pretty doc fixes from pwolanin; made Moshe's example more readable.

chx’s picture

FileSize
237.15 KB

I have found the changes to AjaxResponseRenderer are not necessary -- they were added before we discovered the useful JsonSerializable interface.

moshe weitzman’s picture

+class SafeMarkup extends \Twig_Markup implements \JsonSerializable {

Doesn't this pretty much assumes that the site is using the Twig theme engine? If we intend to cripple all other theme engines, we should postpone this issue on #1537050: [meta] Should we keep / improve multiple theme engine functionality?.

@oelpittet - thanks for explaining that one instance of drupal_set_message(). Makes sense.

@chx - I don't understand what you are trying to say in #110 about title XSS and insecure by default. Could you link to an issue or two? Is i still possible rollback other changes as an alternative to this patch? Just curious as having another option here would be helpful.

chx’s picture

Good luck getting every issue leading to #2264041: Add a test to ensure title callbacks are not vulnerable to XSS rolled back but that still won't protect contrib like this patch would.

This doesn't really bind you to Twig; merely you will be using an object that Twig provides; if you read Twig_Markup it's a very simple class independent from the rest of Twig.

pwolanin’s picture

So, I'm not sure what remains to remove the RenderWrapper, but addressing that so that twig core doesn't need to be patched seems like the main remaining task here.

joelpittet’s picture

RenderWrapper removal is assigned to @catch to review.

chx’s picture

We can unhack Twig once RenderWrapper is gone. We know there's a plan and it'll be. I really would like to see this in so that we can start on the many (and some of them are quite big) followups.

markhalliwell’s picture

Where I see a lot of "hesitation" is around the [backend] DX. Yes, this does make the DX a little more difficult on the back-end, but as it's been explained to me, this is just because of limitations in PHP. That's unfortunate, but is not a valid excuse for blocking this issue. We must make core safe.

IMO though, this actually enforces/encourages markup to be put in templates.. where it should be in the first place. In my eyes, this is really just a paradigm shift in the preprocess layer:

Before:
"I have to escape my data because it may be vulnerable to an XSS attack."

After:
"Why is my template escaping my HTML? Oh I have to explicitly define that my markup is safe."

I'm not going to RTBC this just yet (due to the recent discussion and patches), even though I think it really should be. We can follow-up with child issues if/when it is necessary.

pwolanin’s picture

2 items that seem a little funny:

SafeMarkup::create()

Returns an empty string instead of an object if the input is an empty string. Why is that preferred? It would seem to lead to inconsistency?

 /**
  * This class represents an HTML element that appears in the HEAD tag.
  */
-class HeadElement {
+class HeadElement extends SafeMarkup {
 

Could at least do with a code comment to explain what's happening.

ParisLiakos’s picture

This doesn't really bind you to Twig; merely you will be using an object that Twig provides; if you read Twig_Markup it's a very simple class independent from the rest of Twig.

Why dont we create our own Markup class under Drupal\Component then and have SafeMarkup extend it in the same namespcae?
a) It removes an unnecessary dependency to the whole Twig from a *lot* of stuff
b) i won't bump #2280963: Refactor use of SafeMarkup in HWLDFWordAccumulator to major (because its Drupal\Component and uses a class from Drupal\Core)
c) we dont have to do #2280959: Untangle the Core-Component tangle of Twig autoescape

Yes we might be duplicating a 20 line class, but i it saves us a ton of troubles, i really think it is worthy

pwolanin’s picture

@ParisLiakos - because then Twig itself won't respect the content as being safe and will double-escape it.

That's the whole reason SafeMarkup extends Twig_Markup.

chx’s picture

> Returns an empty string instead of an object if the input is an empty string. Why is that preferred? It would seem to lead to inconsistency?

{% if foo %}

When foo is an object that always passes. Even if holds the empty string. You are bumping into the twin of #953034: [meta] Themes improperly check renderable arrays when determining visibility .

class HeadElement extends SafeMarkup {

Well, if you look at the class hiearchy, everything produced is indeed safe markup and I do not really what comment are you looking for.

ParisLiakos’s picture

re #129:
well, then we could have
Drupal\Core\Template\SafeMarkup with extends \Twig_Markup and proxies everything to Drupal\Component\Template\SafeMarkup
but dunno how much ugly is it:)
nvm this wont work..
trying to avoid the PITA of moving String class

chx’s picture

FileSize
234.93 KB

The comment pwolanin wanted is already there, added a @see to be found. Joelpittet rerolled against HEAD. So, new patch. The decrease in size is due to #2138073: Remove module_load_include() call from NodeController::revisionOverview().

David_Rothstein’s picture

I'm still interested in this:

This patch is basically implementing option #1 from #1818266: [meta] A secure theme system (with twig) (as written in that issue's issue summary), but the discussion there quickly moved away from the strings-as-objects approach (presumably because of the DX implications) and instead talked more about registering the safe strings some other way. What's the reason for going back?

There was working code in that issue too (or at least working enough to benchmark) that did exactly that. It's a similar approach as this patch, but the DX was better since it never treated strings as objects (though at the cost of some uglier internal code, and possibly a larger memory hit).

---

Regarding #111, I don't think it's a problem for PHP developers to continue having to know where the strings they're working with came from (and if they're safe). Although it would definitely be nice if they didn't have to worry about it at all either :)

chx’s picture

The meta contained no patch and I honestly forgot what and how it tried to achieve; it was woefully incomplete; however there's now an autocomplete2 branch which explores the possibility of keeping strings scalars and storing all the strings that go through SafeMarkup::create in a static property on that object. I have reverted all the tests to see how badly we are hurt but all the work on this issue was definitely not in vain because other parts of core; all the SafeMarkup:: calls are definitely necessary. If it works out then I will post it in here in perhaps two days. Stay tuned.

David Rothstein, thanks so much!

Edit: the first attempts come up at 109 fail(s), and 42 exception(s) proving this is a very viable approach.

Edit2: Down to 58 fail(s), and 0 exception(s). New patch and issue summary coming soon.

Fabianx’s picture

I think the combination of #132 and https://qa.drupal.org/pifr/test/805208 is a viable one:

a) Through the usage of a factory class, the DX is very nice and very D8-drupaly. +1 to the SafeMarkup::create() and SafeMarkup::is() methods.
b) The biggest concerns around using an isset() array as a safe string registry had been that we store that as a global. The current patch uses a static class, if that is a problem we however could still easily transform that into some kind of SafeString service that is injected. In any case without having to change any code outside of SafeString class, we can have safe strings with nice DX, too. +1 for that, too.

Unrelated:

I don't think we need a Twig Vendor Patch, as we can easily check for "$obj instanceof RenderWrapper" within twig_render unless I am mistaken totally.

Also: yay!

Thanks so much to chx and joelpittet to making this a test-passing reality!

Crell’s picture

I'm generally +1 on designing APIs that are only ugly if you're using them wrong, as it helps encourage people to use them right. (ie, put markup in templates where it belongs.) PHP developers already need to know where their strings come from and if they're safe if they want to avoid sec holes. They're just currently on their own to keep track of it rather than having code to help them do it.

That said:

+++ b/core/lib/Drupal/Core/Page/HeadElement.php
@@ -8,11 +8,14 @@
-class HeadElement {
+class HeadElement extends SafeMarkup {

This doesn't make sense to me. Wouldn't we want the __toString() to return a SafeMarkup object rather than adding it to the class hierarchy? I'd rather avoid adding even more parent classes if we can avoid it, and adding a dependency on the Twig namespace to the Page objects.

Fabianx’s picture

#137 I believe with the new approach in #134 indeed the __toString method would instead call SafeMarkup::create() on its return.

chx’s picture

Issue summary: View changes
FileSize
83.67 KB

New approach: we store strings in SafeMarkup. Strings are strings.

Fabianx’s picture

Status: Needs work » Needs review
FileSize
83.67 KB

Too excited to not fix the small typo ...

Fabianx’s picture

Here are the next steps:

Besides some nits like using SafeMarkup::implode directly and figuring out why we suddenly need to checkPlain strings within ->assertText, the biggest culprit is the amount of |safe that is needed. Oh and we could also turn the static class into an injected service, that is just a search and replace mainly and chx had a POC patch for that, but could also be a safe follow-up.

Luckily things like '<span' | safe are things that Twigs own autoescape already takes care of automatically during compile time and after researching the background of the twig autoescape (found that for the first time) things got clear to me, how to leverage both twig and our own approach together.

Background links:

* https://github.com/fabpot/Twig/issues/4
* https://github.com/fabpot/Twig/pull/158/files

which explain a lot why twig uses what its uses.

Approach

The trick is so simple and obvious that I did not see it:

- Replace the twig_escape_filter with our own by using a NodeVisitor with a very heigh weight so that it runs after all others.

We can then choose to only support HTML format for our own safe strings - or extend SafeMarkup::create to have an optional 'html' parameter, but again could be follow-up.

So the deal is:

- Format == 'html' - check SafeMarkup::is() and checkPlain if not, also check Twig_Markup
- Format != 'html' - call original twig_escape_filter

This will also automagically get the 'raw' filter to work - without having to convert into an Object (twig does not add the escape filter if a raw filter is present in the chain during compile time.)

Oh and maybe add a print_secure PHP function that can be used from non-twig engines and just does a SafeMarkup::is() check and checkPlains if not.

I am leaving this patch for some review by others.

Future steps

In a more advanced implementation, the escape filter could do a lot of what twig_render_var is already doing, so we would be removing our custom print function - if the escape filter is present in the chain to avoid the double function call - but thats kinda micro-optimization, so really a future step.

Fabianx’s picture

FileSize
84.83 KB

Turn on twig autoescape by default and it all works!

Next major steps:

a) See if this passes tests - it should
b) Remove as much |safe calls as possible or replace with |raw filter that now works!
c) Fix assertText weirdness
d) cleanup code to use ::implode directly where possible and ::create where its not needed.
e) Optimize out our twig_render_var as drupal_escape_filter comes immediately before it and does the same, so is aware of render_arrays, etc.!

Next normal steps:

f) Add a print_secure / p function for phptemplate
g) Maybe Dependency inject the SafeMarkup class
h) Support multiple escaping strategies in SafeMarkup with core only using 'html' to future-proof it; support 'all'

chx’s picture

Status: Needs work » Needs review
FileSize
84.96 KB

The check_plain => checkPlain patch broke a few use statements. The interdiff is between #141 and #143 ; this is just a reroll.

Fabianx’s picture

Status: Needs work » Needs review
FileSize
87.2 KB

New patch attached, which corrects test failures and also adds a |safe_join filter, which uses SafeMarkup::implode() instead.

Fabianx’s picture

FileSize
86.95 KB

And another patch:

- Remove |safe filter completely and use |safe_join or |raw instead.
- Optimize template PHP code: Either add the twig_drupal_escape_filter or twig_render_var

Fabianx’s picture

Status: Needs work » Needs review
FileSize
90 KB

- Remove @todo that is resolved
- Add two helper functions: print_safe and p for compatibility with phptemplate and other engines
- Add $strategy parameter to SafeMarkup
- Fixed TokenTrans parser to also work with Drupal's TwigNodeVisitor off.
- Mark trans filters as safe within twig

joelpittet’s picture

This is looking very good so far. Thanks @chx and @Fabianx for looking into another approach. I'm ok with both approaches btw. The DX on our first approach was not ideal to say the least but there are some trade offs to consider in any approach we take...

My biggest concern and I think may be the big difference here is that this latest approach may not scale well with storing strings in a global array. I'll see if I can run a profiling with a 100 nodes on the homepage and see what I get with xhprof_kit between the two approaches(will likely need to reroll the first one due to the volume of core commits in the last week! :-) ).

re #150
@Fabianx I don't think we need to add those two helper functions to drupal core of print_safe() and p(). They add another drupalism like the l() function that is though easy to type is not intuitive by looking at it what it does. Nor do I think we need to support phptemplate to that extent. To avoid any contention, if you think we should have those two functions, post them in a follow up and help avoid this issue from any unnessasary bikeshedding and feature creep.

Thanks for marking those filters as safe, I meant to do that in the fixes for TwigTrans!
And woo hoo for this patch being 90K vs the 230K+!

And lastly, I agree we should likely just replace the |join filter with ours to keep things seamless. And maybe even call twig's join function to keep it in line with upstream? I'm fine with either way just thought I'd |up and speak my mind there because I noticed the @todo/comment.

Fabianx’s picture

> My biggest concern and I think may be the big difference here is that this latest approach may not scale well with storing strings in a global array. I'll see if I can run a
> profiling with a 100 nodes on the homepage and see what I get with xhprof_kit between the two approaches(will likely need to reroll the first one due to the volume of
> core commits in the last week! :-) ).

== Memory wise

The maximum I measured was 1.7 MB for 42 nodes so far, with the minimum being 700kB.

With the formula:

strlen(all_strings) < page_size

as an upper bound, this cannot be worse than the overhead of our arrays of doom - anyway :-D.

And if it was worse, then we have a problem in Drupal anyway, because that means we create lots of strings we don't need.

== Performance wise:

It should be faster as object creation is quite expensive (as shown by removing the TwigReference objects)

chx said it already in #1818266: [meta] A secure theme system (with twig) you cannot beat a hash table and storing in an array and using isset() is using an efficient access to a hash table (isset does not have function overhead).

The last I measured way back when Twig got in was 3.5% overhead in terms of time and 2.9% in function calls, which is minimal compared to all the overhead added within core elsewhere. Lets see where this stands :).

But yes, looking forward to some benchmarks.

> re #150
> @Fabianx I don't think we need to add those two helper functions to drupal core of print_safe() and p(). They add another drupalism like the l() function that is though
> easy to type is not intuitive by looking at it what it does. Nor do I think we need to support phptemplate to that extent. To avoid any contention, if you think we should
> have those two functions, post them in a follow up and help avoid this issue from any unnessasary bikeshedding and feature creep.

Okay, I'll create a follow-up, but core maintainers, webchick in particular, had stated that if we support autoescape by _default_ on in core, that we then need to support a way in templates. p is a function that was used in rails 2.3 (actually its called 'h' though - http://www.railsdispatch.com/posts/security).

catch in particular wanted auto-escape especially so we could remove lots of checkPlain calls that are then done automatically - so auto-escape on by default is
important to be able to reap potential performance benefits even.

So I can revert that commit, its GIT, its no problem, but we will need a helper - if we are serious about it being on by default and don't throw away phptemplate at the same time. As that means Drupal 8 supports phptemplate themes in a way (we have a test for that), so we need to ensure they can be used securely.

>Thanks for marking those filters as safe, I meant to do that in the fixes for TwigTrans!

:)

> And woo hoo for this patch being 90K vs the 230K+!

Yes, and it will get even smaller.

> And lastly, I agree we should likely just replace the |join filter with ours to keep things seamless. And maybe even call twig's join function to keep it in line with
> upstream? I'm fine with either way just thought I'd |up and speak my mind there because I noticed the @todo/comment.

My biggest concern is that someone does something stupid like:

[1,2,3]|join(user_xss_string)

which you probably won't do if you need to do:

[1,2,3]|safe_join(user_xss_string)

and check documentation of safe_join before. What did @fabpot say about it? Do you have the upstream link? He is usually quite sensible to security related issues.

== Main todos left here:

- check assertText weirdness
- remove concat (performance hog without benefit)
- clean up more

joelpittet’s picture

Here's that upstream join issue.
https://github.com/fabpot/Twig/issues/1420

Regarding the helper functions, we seem to be deprecating them, for instance check_plain() in favour String::checkPlain() so I don't see why we wouldn't use a helper method of the domain object (SafeMarkup). But yeah would rather have that discussion in a follow up and like I said not bikeshead this issue if I can help it. Thanks ahead for the revert.

Great point on the string array memory vs arrays of doom, I'll still likely do a profile but that is good to keep in mind.

Fabianx’s picture

Issue tags: +Needs manual testing
FileSize
88.18 KB

- Removed helper functions per #153
- Fix bug with missing translation t() call

== Main todos left here:

- check assertText weirdness: DONE for 1, TODO for 1

-    $this->assertText(String::format('Deleted and replaced configuration entity "@name"', array('@name' => $name_secondary)));
+    // @todo Revisit for a better solution.
+    $this->assertText(String::checkPlain(String::format('Deleted and replaced configuration entity "@name"', array('@name' => $name_secondary))));

This is a drupal_set_message, which is triggered originally from ConfigImporter::checkOp().

- remove concat (performance hog without benefit) - concat is needed as one string is safe, the other one not - DONE
- clean up more - DONE so far I can see.

==

Remaining work:

-    $this->assertText(String::format('Deleted and replaced configuration entity "@name"', array('@name' => $name_secondary)));
+    // @todo Revisit for a better solution.
+    $this->assertText(String::checkPlain(String::format('Deleted and replaced configuration entity "@name"', array('@name' => $name_secondary))));

Fix properly by finding out why that string done deep in configImporter is escaped on output even its wrapped by $this->t().

I don't know how to fix it, as I don't even know how to get to this code path.

==

Next steps:

- This is ready for some serious review in terms of reviewing the patch and the security.
- This needs some heavy manual testing - as all admin screens should be checked for double-escaped output not caught by tests.
-- An alternative would be for someone to 'record' all output of content retrieved during test runs, name it after the test and compare two automated test runs.

dawehner’s picture

I am pretty sure timm will help you on that, see https://drupal.org/node/2229187

ti2m’s picture

I can't promise anything, but in theory I should be able to detect any changes by running regression tests with siteeffect on all the unique backend urls. I'm not able to check any intermediate pages like the install process, node deletion, etc... though. But I'll give the "direct" backend urls a try and let you know about the results.

joelpittet’s picture

@ti2m you may not be able to get them all, though it could save us a good chunk in the regression(escaped HTML manual testing) that we'd otherwise leave up to people to find after the patch was committed. So this would help tons, IMO! Thanks @dawehner for linking these up and @ti2m for your work on that cool set of regression tests!

Fabianx’s picture

Status: Needs work » Needs review
FileSize
90.71 KB

Changelog

- Add functions to set / get the SafeMarkup strings to persist them across page requests.
- Persist SafeMarkup state within $form_state['build_info']['safe_strings'] and $batch['safe_strings'].
- Remove previous work arounds.

Interdiff:

http://cgit.drupalcode.org/sandbox-chx-1857558/diff/?h=autoescape2--fabi...

Explanation

As both $form_state and $batch are per user or at least per role, this should be safe to use for this cases.

Next steps

- Await report from @t2im for how many pages we break
- Manual Testing
- Serious security and patch reviews of the patch

ti2m’s picture

Quick feedback, this seems to work, at least it will catch some of the double-escaped markup. Ran the first set of urls and already found two examples on
/admin/config/regional/date-time/formats/manage/long and /admin/structure/block/list/seven Will take a closer look at it tomorrow and try to run all the urls.

ti2m’s picture

UPDATE: I ran all unique (by router item) urls of a vanilla D8 install (including the patch in #159) that the build in siteeffect crawler found. From the 200+ urls, the following had escaped output:

  • /admin/modules/uninstall
  • /admin/config/regional/date-time/formats/add
  • /admin/config/people/accounts/fields/user.user.user_picture
  • /admin/config/content/formats/manage/basic_html
  • /admin/structure/block
  • /admin/structure/block/list/seven
  • /admin/structure/menu/manage/footer
  • /filter/tips
  • /admin/structure/views/nojs/rearrange-filter/frontpage/page_1

Is that what you are looking for? Then I would proceed and enable all core modules, so far I only used the standard profile.

Fabianx’s picture

@ti2m: This is a great start!

joelpittet’s picture

Xhprof kit Profiling Scenario

  1. 50 nodes on the homepage view
  2. bartik standard profile
  3. blocks: who's online, recent content, recent comments
  4. Images and comments added to nodes.
  5. twig c extension and xdebug turned off.
  6. Render cache turned off
=== 8.x..8.x compared (539fd0a78f1b7..539fd2109980d):

ct  : 313,812|313,812|0|0.0%
wt  : 1,697,643|1,704,440|6,797|0.4%
cpu : 1,684,208|1,690,428|6,220|0.4%
mu  : 53,708,848|53,708,848|0|0.0%
pmu : 53,866,712|53,866,712|0|0.0%

http://www.lionsad.de/xhprof-kit/xhprof/xhprof_html/?run1=539fd0a78f1b7&...

Run 539fd0a78f1b7 uploaded successfully for drupal-perf-joelpittet.
Run 539fd2b68bc13 uploaded successfully for drupal-perf-joelpittet.

=== 8.x..1825952-safemarkup-string-theory compared (539fd0a78f1b7..539fd2b68bc13):

ct  : 313,812|319,209|5,397|1.7%
wt  : 1,697,643|1,738,702|41,059|2.4%
cpu : 1,684,208|1,724,940|40,732|2.4%
mu  : 53,708,848|55,383,616|1,674,768|3.1%
pmu : 53,866,712|55,526,888|1,660,176|3.1%

http://www.lionsad.de/xhprof-kit/xhprof/xhprof_html/?run1=539fd0a78f1b7&...

ct = function calls, wt = wall time, cpu = cpu time used, mu = memory usage, pmu = peak memory usage

Fabianx’s picture

#163: @joelpittet: That looks wonderful!

catch’s picture

fwiw I'd be fine opening a separate follow-up with any outstanding double-escaping issues on it - those are annoying bugs but not security issues and shouldn't hold this particular issue up.

xjm’s picture

fwiw I'd be fine opening a separate follow-up with any outstanding double-escaping issues on it - those are annoying bugs but not security issues and shouldn't hold this particular issue up.

Agreed. So long as we document as many as possible here (thanks @ti2m!) I think those are acceptable small regressions/followups for getting this issue in.

ti2m’s picture

Another (and the last) update on the crawled urls. I enabled all modules on a fresh install and crawled the site as user 1. I only found two more urls with double escaped strings (first two in the list below). But the general problem is, that e.g. node edit forms aren't covered at all as no node exists on a vanilla install. I could post a file with all covered urls, roughly 300, if anyone is interested.

The total list of urls with double escaped strings that I found:

  • /admin/config/regional/translate/settings
  • /admin/config/development/logging
  • /admin/modules/uninstall
  • /admin/config/regional/date-time/formats/add
  • /admin/config/regional/date-time/formats/manage/long
  • /admin/config/people/accounts/fields/user.user.user_picture
  • /admin/config/content/formats/manage/basic_html
  • /admin/structure/block
  • /admin/structure/block/list/seven
  • /admin/structure/views/nojs/rearrange-filter/frontpage/page_1
  • /admin/structure/menu/manage/footer
  • /filter/tips

Let me know when I should test another version of the patch at some point (or another patch in general).

Fabianx’s picture

@ti2m:

Thanks for the very nice list!

I have a working implementation now to visit all URLs using the test suite, but would like to follow up on that here: #2229187: SiteEffect: Automated frontend regression testing to not distract this issue further. It would be great if you could respond there.

Thanks!

For this issue:

- We need reviews and security reviews of the patch!

xjm’s picture

Looking at this today.

I think the API changes section is not up to date? The first thing it mentions is one of the followup issues. Also, I think the proposed commit message could probably be updated as well.Maybe it could also mention ti2m for the thorough testing.

xjm’s picture

Assigned: chx » xjm

Started by just reviewing SafeMarkup. More to follow.

  1. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -0,0 +1,104 @@
    + * A class marking strings as already escaped for XSS purposes.
    + */
    +class SafeMarkup {
    

    So, I think maybe this would be the place to add more detailed documentation of how SafeMarkup should be used, and what happens during rendering and in the theme layer?

    Technically, the docblock should start with a verb. Maybe: "Marks strings that are already escaped for XSS purposes."

  2. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -0,0 +1,104 @@
    +   * Marks strings as being secure.
    

    "Adds a string to a list of strings marked as secure."

  3. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -0,0 +1,104 @@
    +   * @param string $strategy
    +   *   The escaping strategy.
    ...
    +   * @param string $strategy
    +   *   The escaping strategy.
    

    This parameter needs more detailed explanation as to how it should be used. Are there other possible values in HEAD for the escaping strategy than 'html'? (Presumably contrib/users can also define their own escaping strategies?) And what does 'html' mean exactly in this case?

    We only need to add a detailed explanation of what the escaping strategy for is in one place, and then we can add @see to the detailed explanation elsewhere.

  4. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -0,0 +1,104 @@
    +   * Checks if a string is safe to output.
    ...
    +  public static function is($string, $strategy = 'html') {
    

    is() is a weird name. Yes, the string exists. Did we not name it isSafe() on purpose? I guess it's static so it's always going to be SafeMarkup::is()... still a bit yoda. :)

  5. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -0,0 +1,104 @@
    +    return isset(static::$safeStrings[(string) $string][$strategy]) ||
    +      isset(static::$safeStrings[(string) $string]['all']);
    ...
    +   * Concatenates strings in a safe manner.
    ...
    +      foreach ($strategies as $strategy => $value) {
    +        $string = (string) $string;
    +        static::$safeStrings[$string][$strategy] = TRUE;
    

    We are marking every single string in the passed-in list as TRUE -- regardless of what the value is! So if a string were marked as FALSE in the previous request, it would be switched to TRUE here. The assumption appears to be that the $safeStrings array is protected and that any string/strategy pair being set at all indicates it's safe, since in this class the strings only get set in set() or create(). However, what if someone were to extend this class and think that it could also mark unsafe strings? Or the list is polluted in some other way? It seems like it would be more robust/better hardened to only treat the string/strategy pair as safe if it's exactly TRUE. Same goes for in the is() method above.

  6. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -0,0 +1,104 @@
    +  /**
    +   * Implodes strings in a safe manner.
    +   *
    +   * @return \Drupal\Component\Utility\SafeMarkup|string
    +   */
    +  public static function implode($delimiter, array $array, $strategy = 'html') {
    

    This is missing parameter documentation. Also, since the way that this is used has security implications, a more specific explanation of "in a safe manner" would not go amiss. It looks like what it does is check plain any string that is not already considered safe? So maybe:

    Implodes a list of strings safely by escaping any that are not known to be safe.

    Also, maybe it's worth noting that the delimiter is not checked nor escaped, but assumed to be safe? Is that sound from a security perspective?

  7. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -0,0 +1,104 @@
    +        isset(static::$safeStrings[(string) $string]['all']))) {
    

    What's the deal with the 'all' escaping strategy here? So far that's at least two string keys that have special meaning (the other being the default of 'html') but are not documented anywhere that I've yet found.

  8. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -0,0 +1,104 @@
    +   * Sets previously retrieved safe strings.
    ...
    +  public static function set($safe_strings) {
    

    Maybe:

    Adds previously retrieved known safe strings to the safe string list.

    Also, it strikes me that it's more of a setMultiple(). Finally, is there no array typehint in the method on purpose?

  9. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -0,0 +1,104 @@
    +   * This is useful for batch and form API, where it is important to preserve
    +   * the safe markup state across page requets.
    

    Maybe it would be good to add here that the strings are merged into the existing list of safe strings? I.e. the list of safe strings passed in isn't exclusive; it's added on to the current list.

  10. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -0,0 +1,104 @@
    +   * @param array safe_strings
    +   *   The strings retrieved via SafeMarkup::get().
    

    $safe_strings is missing its $.

    Also, the parameter documentation is a little confusing/feels inaccurate. Is this more correct?

    A list of safe strings previously retrieved by SafeMarkup:get().

  11. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -0,0 +1,104 @@
    +  * @return array
    +  *   Returns all strings currently marked safe.
    

    As above. What it actually returns is a list of all known strings and whether or not they were marked as safe.

xjm’s picture

+++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
@@ -0,0 +1,104 @@
+   * Concatenates strings in a safe manner.
+   *
+   * @return \Drupal\Component\Utility\SafeMarkup|string
+   */
+  public static function concat() {
+    return SafeMarkup::implode('', func_get_args());

Oh, one more thing. This also needs better docs of how it's used. It takes any number of (presumed string) arguments, and concatenates them, checking them by string-casting them and then check-plaining them if the string isn't found. However, I don't know from the method names or docs that I can't just pass it arrays or whatever. Maybe we should be doing some argument validation and/or allow an array as an argument?

xjm’s picture

  1. +++ b/core/includes/batch.inc
    @@ -44,6 +45,9 @@ function _batch_page(Request $request) {
    +  // Restore safe strings from previous batches.
    +  $batch['safe_strings'] += array();
    +  SafeMarkup::set($batch['safe_strings']);
    
    @@ -480,6 +484,8 @@ function _batch_finished() {
    +    // Update safe strings.
    +    $batch['safe_strings'] = SafeMarkup::get();
    

    We maybe need to document this new ArrayPI key in the batch topic docs:
    https://api.drupal.org/api/drupal/core%21includes%21form.inc/group/batch/8

  2. +++ b/core/includes/bootstrap.inc
    @@ -1186,7 +1187,10 @@ function drupal_set_message($message = NULL, $type = 'status', $repeat = FALSE)
    +      $_SESSION['messages'][$type][] = array(
    +        'safe' => SafeMarkup::is($message),
    +        'message' => $message,
    +      );
    
    @@ -1224,6 +1228,14 @@ function drupal_set_message($message = NULL, $type = 'status', $repeat = FALSE)
    +    foreach ($messages as $message_type => $message_typed_messages) {
    +      foreach ($message_typed_messages as $key => $message) {
    +        if ($message['safe']) {
    +          $message['message'] = SafeMarkup::create($message['message']);
    +        }
    +        $messages[$message_type][$key] = $message['message'];
    +      }
    +    }
    

    So more ArrayPI action. I think we need to update the docs of drupal_set_message() and drupal_get_messages().

  3. +++ b/core/includes/common.inc
    @@ -29,6 +29,7 @@
    @@ -485,7 +486,7 @@ function format_xml_elements($array) {
    
    @@ -485,7 +486,7 @@ function format_xml_elements($array) {
    -  return $output;
    +  return SafeMarkup::create($output);
    

    Looking at format_xml_elements() closely. The element contents are check-plained. The attributes go through Attribute and it's not clear to me whether those are escaped/rendered safe in any way. And the keys are definitely concatenated straight into the output with no validation. So, I think we need to at a minimum warn not to pass user input, or be a little more careful about validating keys and attributes?

  4. +++ b/core/includes/common.inc
    @@ -2829,6 +2829,7 @@ function drupal_pre_render_conditional_comments($elements) {
    + *     This is not HTML escaped, do not pass in user input.
    

    So, in general, I think we need to do a better job elsewhere of being explicit like we are here about inputs that are treated as safe (and there are possibly opportunities for further hardening when we do have to make such a statement, as it appears there is in this case).

    Also, it looks like the warning about user input only applies to #tag (and the other top-level #keys could be unsanitized and that's okay); is that correct?

    Finally, a nitpick: this is a comma splice. Better:

    This is not HTML-escaped, so do not pass in user input.

  5. +++ b/core/includes/common.inc
    @@ -2841,7 +2842,9 @@ function drupal_pre_render_conditional_comments($elements) {
    +    // Attributes are safe and we are assuming people don't use this function
    +    // and second they don't pass unsafe variables to #tag.
    +    $markup = SafeMarkup::create('<' . $element['#tag'] . $attributes . " />\n");
    

    Whoa, this comment is confusing. What's second? Is it just a run-on sentence? It seems to be saying #attributes is safe, correct? Is this because Attribute does sanitization? (I need to check whether that's the case; see remarks above.)

    I think this is what the comment should be if the answers to the above are "yes":

    This function is intended for internal use, so we assume that no unsafe values are passed in #tag. The attributes are already safe because [reason].

    And then follow that up with the @todo to the followup.

    And all that said... is this really a good idea? Can't we do some sort of validation or escaping on the contents of #tag? Is this okay to punt on like this?

  6. +++ b/core/includes/common.inc
    @@ -2853,6 +2856,8 @@ function drupal_pre_render_html_tag($element) {
    +    // @todo Creating safe markup, avoid if possible!
    +    $markup = SafeMarkup::create($markup);
    

    Is there a followup already for this @todo? If so we can add the link directly. If not, let's file it (if we do agree it's okay to punt to a followup).

  7. +++ b/core/lib/Drupal/Component/Utility/String.php
    @@ -31,7 +33,7 @@ class String {
    -    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    +    return SafeMarkup::create(htmlspecialchars($text, ENT_QUOTES, 'UTF-8'));
    

    Similar to the Xss case; maybe worth mentioning that check-plained strings are (obviously) automatically marked as safe HTML in the docs, with a reference to SafeMarkup where we're going to add the nice detailed docs? :)

  8. +++ b/core/lib/Drupal/Component/Utility/Xss.php
    @@ -90,7 +92,7 @@ public static function filter($string, $html_tags = array('a', 'em', 'strong', '
    -    return preg_replace_callback('%
    +    return SafeMarkup::create(preg_replace_callback('%
    
    @@ -99,7 +101,7 @@ public static function filter($string, $html_tags = array('a', 'em', 'strong', '
    -      )%x', $splitter, $string);
    +      )%x', $splitter, $string));
    

    So, I think we can update the docs for Xss::filter() to indicate that it marks the strings as safe HTML, with an @see to SafeMarkup.

  9. +++ b/core/lib/Drupal/Core/Controller/ExceptionController.php
    @@ -312,7 +313,7 @@ public function on500Html(FlattenException $exception, Request $request) {
    -      drupal_set_message($message, $class, TRUE);
    +      drupal_set_message(SafeMarkup::create($message), $class, TRUE);
    

    This made me arch an eyebrow. Is it correct to automatically mark the exception message is safe and blindly trust that whatever code set the exception message did the right thing? Would it be too ornerous/risk too much double-escaping to check-plain the message if it's not already marked as a safe string? Or is that a terrible idea? (Obviously exception messages are probably the least of our concerns, but I wanted to raise the question.)

  10. +++ b/core/lib/Drupal/Core/CoreServiceProvider.php
    @@ -95,9 +95,7 @@ public static function registerTwig(ContainerBuilder $container) {
    -        // @todo Remove in followup issue
    -        // @see http://drupal.org/node/1712444.
    -        'autoescape' => FALSE,
    +        'autoescape' => TRUE,
    

    http://www.myinstants.com/instant/oh-yeah/

  11. +++ b/core/lib/Drupal/Core/Form/FormBuilder.php
    @@ -363,6 +364,8 @@ public function getCache($form_build_id, &$form_state) {
    +          $form_state['build_info'] += array('safe_strings' => array());
    +          SafeMarkup::set($form_state['build_info']['safe_strings']);
    

    Could use an inline comment, maybe:

    Retrieve the list of previously known safe strings.

    And this is the first place that does make me think my questions above about isset() vs. TRUE expectations for safe strings. It's very common to monkey with $form_state at all levels, and here it's a way of getting around the explicit contract otherwise set by SafeMarkup.

  12. +++ b/core/lib/Drupal/Core/Form/FormBuilder.php
    @@ -385,6 +388,10 @@ public function setCache($form_build_id, $form, $form_state) {
    +    // Cache safeStrings for form re-use.
    

    safeStrings isn't a thing (in this context anyway). Also, this is pedantry, but is it really a "cache"? Maybe:

    Store the list of safe strings for form re-use.

  13. +++ b/core/lib/Drupal/Core/Page/HeadElement.php
    @@ -8,9 +8,12 @@
      * This class represents an HTML element that appears in the HEAD tag.
    + *
    + * @see template_preprocess_html()
    
    @@ -52,7 +55,7 @@ public function __toString() {
    -    return $string;
    +    return SafeMarkup::create($string);
    

    This strikes me as a little weird. We're adding the HEAD tag to the list of safe HTML strings? I mean, it's safe once. It's wrong every other time. I'm not sure this is a problem, it just seems weird for the HTML head element to be in the big generic bucket of HTML-safe strings every single time I look at them, no matter where I am on the page.

I just got to drupal_render() (eek) :) so going to step back a bit and hopefully give folks a chance to answer some of my questions above, since I think they're also relevant to other parts of the patch.

xjm’s picture

In general, I'm concerned by how many times we're calling SafeMarkup::create($some_variable_from_somewhere_else). Having to trace the variable up the function and then up the call chain makes it less clear whether the code is actually secure. I like the pattern used in some places of adding an inline comment justifying the use of SafeMarkup, although we could do a better job of not only stating that certain inputs are safe, but stating why they are.

That said, though, explicitly evaluating each case where we do add a SafeMarkup is still a much easier security check than trying to comb through the entire render and theme layers. So I'm overall very +1 to the solution we've come up with. :)

xjm’s picture

Spoke to chx about a few of the points in IRC. He pointed me to #2280965: [meta] Remove every SafeMarkup::set() call. I'm not sure about doing that as a followup and not in this patch.

chx has given me sandbox access so I'm going to start help cleaning up some of my points mentioned above.

Fabianx’s picture

General things

Escaping Strategy

This was added to me as I converted our drupal_escape_filter to replace the twig autoescape filter. So this is mainly to be compatible with Twig:

- http://twig.sensiolabs.org/doc/tags/autoescape.html
- http://twig.sensiolabs.org/doc/filters/escape.html
- https://api.drupal.org/api/drupal/core%21vendor%21twig%21twig%21lib%21Tw...

CSS, JS and html attributes all need different escaping strategies.

This was added with the idea that core would only support 'html'.

'all' is special cased by twig to being safe for everything, this is mainly the 'raw' filter that is safe for all cases.

https://api.drupal.org/api/drupal/core!vendor!twig!twig!lib!Twig!Extensi...

Detailed comments

#170:

1. I agree the class is a great way to document how SafeMarkup works and what it means to use.
2. agree
3. escaping strategy answered above
4. is() is really weird, I agree.
5. I thought we were escaping any string that was not already safe? We don't use comparison as isset() is fast and we never put a FALSE there, would unset in that case or set to NULL - @todo check more
6. agree; yes need for sure document for this and safe_join that the parameter is not secure.
7. A string is safe if its safe for the given strategy or all strategies.
8. array typehint is oversight on me; yes this is setMultiple rather, indeed.
9. agree, I used adding in as there could be new strings marked safe already.
10. typo yes
11. As we don't use FALSE, the description is correct even if a little misleading.

xjm’s picture

Fabianx’s picture

#171: Agree, 100%. Docs for concat() are needed. We once thought about removing it, but it is way better to have an API that concats and escapes than an API where you just mark it safe.

#172:

1. Good catch, agree.
2. agree
3. @todo format_xml_elements - revisit this, agree
4. agree, marking safe should be explicit and we have the follow-up in any case, too - it is also way simpler with this new patch (just 90k)
5. Yes, I think we could escape the tag element in a way, but its kinda interesting as you would still want to allow script or such ... @todo html_tag
6. no idea, ask chx @todo drupal_pre_render_html_tag
7. agree
8. agree
9. agree @todo on500Html should be marked safe by the caller.
10. ROTFL :-D
11. agree, could use a different key and remove before retrieval. - this is just set before form saving and restored during retrieval so is safe, but obviously leaks the data in the $form_state. Good catch!
12. agree
13. Well, as that function is called directly - I think it might be needed, the alternative is a |raw in the template, which is more eek, but yes would be nicer to escape at the source.

To the drupal_render one:

I think we could even remove the output of that from the list of safe strings - at least from a twig perspective as the drupal_twig_escape_filter does consider drupal_render() render arrays already secure.

In that case only code that does $output = drupal_render(y); (which is bad bad bad for caching anyway) would need to do:

$output = SafeMarkup::create(drupal_render(y));

I have not thought about it from an AJAX perspective or such, yet - though. Just a pre-thought to drupal_render().

Fabianx’s picture

#173:

I do agree that SafeMarkup use cases should be justified.

I wondered if we should allow simple twig inline templates:

before:

$build['#markup'] = SafeMarkup::create('<span class="x">' . $mysecure_value . '</span>');

After (mocked up):


$build['#markup'] = Drupal::twig()->renderTemplate('<span class="x">{ value }</span>', array('value' => $mysecure_value));

That gives us autoescape for free and is compiled and cached to disk like normal templates, so pretty performant and not more difficult to use than t().

As twig supports doing dynamic templates, this might be a 10 line code change or so to support that ...

Moved to: #2289999: Add an easy way to create HTML on the fly without having to create a theme function / template

>That said, though, explicitly evaluating each case where we do add a SafeMarkup is still a much
> easier security check than trying to comb through the entire render and theme layers.
> So I'm overall very +1 to the solution we've come up with. :)

Absolutely agree!!!

xjm’s picture

So, after looking at this a bit more, I'd like to propose changing the names of several methods on SafeMarkup. @chx and/or @joelpittet, could you give feedback on these suggested method names? (It might be there are reasons for the current names that I've overlooked.)

Name in patch Proposed name Why
is() isSafe() If safe message is, speak like Yoda you do.
create() set() It's not creating an object, just setting a flag in a static array.
set() setMultiple() It's not a single setter. It sets multiple things.
get() getAll() It's not a single getter. It gets all the things.

(Still working through a number of fixes in the sandbox; just wanted to raise this question in the meanwhile.)

joelpittet’s picture

I'm fine with those changes, thanks @xjm The create() was a factory method before but it is no longer, just got repurposed from what I recall. I think the same is for the other methods but @Fabianx or @chx may have another opinion as they created those method names.

xjm’s picture

Thanks @joelpittet! @chx also confirmed in IRC he's okay with the renames. Adding that change.

xjm’s picture

It took me a bit to confirm that everything that goes into Attribute is for darn sure absolutely check-plained properly for use as an attribute string, so I added #2290143: Improve the class documentation of \Drupal\Core\Template\Attribute.

xjm’s picture

Issue summary: View changes
FileSize
98.77 KB
50.86 KB

Alright, I've addressed everything in my own reviews either by fixing it or adding an @todo, except #172 points 9 and 13 which my brain to even process right now. Also merged 8.x and pushed everything up in my branch in the sandbox.

The interdiff is nearly as big as the patch. ;) So adding myself to the proposed commit message.

xjm’s picture

  1. +++ b/core/includes/common.inc
    @@ -486,7 +487,10 @@ function format_xml_elements($array) {
    +  // @todo
    +  //   This is marking the output string as safe HTML, but we have only
    +  //   sanitized the attributes and tag values, not the tag names.
    
    @@ -2842,9 +2847,12 @@ function drupal_pre_render_conditional_comments($elements) {
    +    // @todo
    +    //   Escape this string properly instead?
    
    +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -8,24 +8,34 @@
    + * @todo
    + *   Add detailed documentation about how to use SafeMarkup and how it is
    + *   handled during rendering and in the theme layer.
    
    @@ -37,29 +47,55 @@ public static function create($string, $strategy = 'html') {
    +   * @todo
    +   *   Consider checking whether the value set for the string/strategy pair
    +   *   is exactly TRUE, rather than just checking whether it's set, to reduce
    +   *   the risk of false positives or pollution of the string list. The
    +   *   tradeoff is that isset() is reportedly faster than a comparison.
    ...
    +   * @todo
    +   *   Should this (instead or additionally) accept an array of strings as a
    +   *   parameter?
    ...
    +   * @todo
    +   *   Should $delimeter also be check-plained if it's not a known-safe string?
    

    These are the @todo I added for further discussion or improvement. Some of it might be followup material since (e.g.) format_xml_whatsit() is no more unsafe than it was previously.

    And particularly to the point about isSafe(), it still feels ick to me to not check for the TRUE value we set, but I've left the @todo instead of changing it outright because (I imagine) it's very much in the critical path.

  2. +++ b/core/lib/Drupal/Core/Form/FormBuilder.php
    @@ -364,8 +364,11 @@ public function getCache($form_build_id, &$form_state) {
    +          // Retrieve the list of previously known safe strings and store it
    +          // for this request.
               $form_state['build_info'] += array('safe_strings' => array());
    -          SafeMarkup::set($form_state['build_info']['safe_strings']);
    +          SafeMarkup::setMultiple($form_state['build_info']['safe_strings']);
    +          unset($form_state['build_info']['safe_strings']);
    
    @@ -389,8 +392,8 @@ public function setCache($form_build_id, $form, $form_state) {
    +    // Store the known list of safe strings for form re-use.
    +    $form_state['build_info']['safe_strings'] = SafeMarkup::getAll();
    

    So I've tried to reduce the leakage with $form_state here by unsetting the value after we retrieve it. Is that sufficient or should we do more?

    I'm also having flashbacks to monstrous form cache entries bringing down sites. I imagine this safe string list will get freaking enormous. Is this a concern?

Note that I still haven't reviewed the rest of the patch, but I need to go to sleep now. :)

xjm’s picture

Issue tags: +Needs change record

So that I can sleep. This issue will absolutely need a change record. It's like the hugest huge thing.

Fabianx’s picture

+++ b/core/themes/engines/twig/twig.engine
@@ -179,3 +184,72 @@ function twig_without($element) {
+      // Drupal only supports HTML strategy, fallback for other strategies.
+      // @todo Add optional strategy parameter to SafeMarkup function calls,
+      //       to avoid this when its already safe for this strategy.
+      return twig_escape_filter($env, $return, $strategy, $charset, $autoescape);
+    }
+    if (!SafeMarkup::is($return)) {
+      return String::checkPlain($return);
+    }

@todo:

- This is missing is -> isSafe conversion, such drupal installation fails.

- check $autoescape parameter here so that explicit escaping is still possible:

if ($autoescape && SafeMarkup::isSafe($return, $strategy)) {
  return $return;
}

if ($strategy != 'html') {
  return twig_escape_filter($env, $return, $strategy, $charset, $autoescape);
}

return String::checkPlain($return);
  1. +++ b/core/modules/field_ui/src/Form/FieldInstanceEditForm.php
    @@ -9,6 +9,7 @@
    +use Drupal\Component\Utility\SafeMarkup;
    
    @@ -122,7 +123,7 @@ public function buildForm(array $form, array &$form_state, FieldInstanceConfigIn
           '#rows' => 5,
    ...
    +      '#description' => SafeMarkup::set($this->t('Instructions to present to the user below this field on the editing form.<br />Allowed HTML tags: @tags', array('@tags' => _field_filter_xss_display_allowed_tags())) . '<br />' . $this->t('This field supports tokens.')),
           '#weight' => -10,
    

    This should no longer be needed with strings being stored in $form_state['build'].

  2. +++ b/core/modules/system/src/Form/ModulesUninstallForm.php
    --- a/core/modules/system/src/Tests/System/DateFormatsMachineNameTest.php
    +++ b/core/modules/system/src/Tests/System/DateFormatsMachineNameTest.php
    
    +++ b/core/modules/system/src/Tests/System/DateFormatsMachineNameTest.php
    @@ -7,6 +7,7 @@
    +use Drupal\Component\Utility\String;
    
    @@ -48,7 +49,8 @@ public function testDateFormatsMachineNameAllowedValues() {
    +    // @todo Revisit for a better solution.
    +    $this->assertText(String::checkPlain('The machine-readable name must be unique, and can only contain lowercase letters, numbers, and underscores. Additionally, it can not be the reserved word "custom".'), 'It is not possible to create a date format with the machine name that has any character other than lowercase letters, digits or underscore.');
    
    @@ -57,7 +59,8 @@ public function testDateFormatsMachineNameAllowedValues() {
    +    // @todo Revisit for a better solution.
    +    $this->assertText(String::checkPlain('The machine-readable name must be unique, and can only contain lowercase letters, numbers, and underscores. Additionally, it can not be the reserved word "custom".'), 'It is not possible to create a date format with the machine name "custom".');
    

    These 5 lines in core/modules/system/src/Tests/System/DateFormatsMachineNameTest.php can be removed safely. chx reverted them early in his branch, but its still correct in mine.

xjm’s picture

Status: Needs work » Needs review
FileSize
98.76 KB
4.35 KB

Whoops, just fixing some missed renames. Didn't have PHPStorm picking up .engine or .install files and missed the comments.

I haven't addressed the other points of #188.

xjm’s picture

Status: Needs work » Needs review
FileSize
96.36 KB
6.2 KB

Removing SafeMarkup::concat() and fixing the date test.

Fabianx’s picture

FileSize
96.38 KB

Just a quick re-roll against HEAD.

xjm’s picture

Discussed the $form_state question with @catch. His suggestion was to not worry about it for this patch, because there are these other issues already addressing the root causes of form cache issues:

So I've filed: #2295823: Ensure that we don't store excessive lists of safe strings

xjm’s picture

Rerolled against HEAD.

mikey_p’s picture

Something in the latest head broke a few tags in the header which are now escaped.

xjm’s picture

mikey_p’s picture

I was able to resolve the head tags issue with this change:

diff --git a/core/lib/Drupal/Core/Page/HtmlPage.php b/core/lib/Drupal/Core/Page/HtmlPage.php
index eb4c63d..ee04acc 100644
--- a/core/lib/Drupal/Core/Page/HtmlPage.php
+++ b/core/lib/Drupal/Core/Page/HtmlPage.php
@@ -8,6 +8,7 @@
 namespace Drupal\Core\Page;

 use Drupal\Core\Template\Attribute;
+use Drupal\Component\Utility\SafeMarkup;

 /**
  * Data object for an HTML page.
@@ -84,7 +85,7 @@ public function getHtmlAttributes() {
    *   A string of meta and link tags.
    */
   public function getHead() {
-    return implode("\n", $this->getMetaElements()) . implode("\n", $this->getLinkElements());
+    return SafeMarkup::set(SafeMarkup::implode("\n", $this->getMetaElements()) . SafeMarkup::implode("\n", $this->getLinkElements()));
   }

   /**

But I'm guessing that its considered bad code style?

pwolanin’s picture

Status: Needs work » Needs review
FileSize
944 bytes
97.03 KB

Changes based on @mikey_p's fix and some cleanup and added comment.

pwolanin’s picture

FileSize
8.94 KB
103.52 KB

Make the implode delimiter safe and add a unit test.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
3.23 KB
103.71 KB

Easy fix, plus some doxygen cleanup in the unit test and a code comment fix from xjm.

pwolanin’s picture

Issue summary: View changes
xjm’s picture

Issue summary: View changes
FileSize
104.52 KB
4.48 KB

Done for tonight. I just added a few more cleanups. Things we still need to do (notes for tomorrow):

  1. Still need the detailed docs for SafeMarkup. I've put out a call for someone to help with this.
  2. Review/improve CR as needed.
  3. format_xml_elements() @todo needs a followup issue filed and referenced issue Fixed
  4. core/modules/system/system.install @todo Huh? Fixed
  5. @todo in update.report.inc Huh? Fixed
  6. @todo in twig_drupal_escape_filter() Is there an issue for this yet? Reference it.
    +  // We have a string or an object converted to a string: Autoescape it!
    +  if (isset($return)) {
    +    if ($strategy != 'html') {
    +      // Drupal only supports the HTML escaping strategy, so provide a
    +      // fallback for other strategies.
    +      // @todo Add an optional strategy parameter to SafeMarkup function calls,
    +      //   to avoid this when it is already safe for this strategy.
    +      return twig_escape_filter($env, $return, $strategy, $charset, $autoescape);
    +    }
  7. Add a @todo referencing the setMultiple() followup, especially for $form_state. Done
  8. From #188:

    check $autoescape parameter here so that explicit escaping is still possible:

    if ($autoescape && SafeMarkup::isSafe($return, $strategy)) {
      return $return;
    }
    if ($strategy != 'html') {
      return twig_escape_filter($env, $return, $strategy, $charset, $autoescape);
    }
    return String::checkPlain($return);
    

    Edit: Let's file a followup issue for this.

  9. Also from #188:
    +++ b/core/modules/field_ui/src/Form/FieldInstanceEditForm.php
    @@ -9,6 +9,7 @@
    +use Drupal\Component\Utility\SafeMarkup;
    @@ -122,7 +123,7 @@ public function buildForm(array $form, array &$form_state, FieldInstanceConfigIn
           '#rows' => 5,
    ...
    +      '#description' => SafeMarkup::set($this->t('Instructions to present to the user below this field on the editing form.<br />Allowed HTML tags: @tags', array('@tags' => _field_filter_xss_display_allowed_tags())) . '<br />' . $this->t('This field supports tokens.')),
           '#weight' => -10,

    This should no longer be needed with strings being stored in $form_state['build'].

    Done
  10. Address #172.9
  11. Finish line-by-line review.
  12. Add followup(s) for the double-escaped strings that need fixing (see #167). Fixed
  13. Is the following text from the issue summary still true? What does "not allowed" mean in this context? Where is this documented if so?

    You are not allowed to put unsafe user data in #attached. This can be relaxed in a followup but it truly gets gnarly. You are advised to not use #type => html_tag if at all possible or at least not with unsafe user data. This is not something I want to waste an effort on making it work.

Finally, a high-level question. In @catch's original summary:

Twig as it stands introduces a fair bit of overhead into the theme system. Fabianx indicated that a lot of this is from marking $variables as secure so they're not double escaped later.

Ideally, if Twig autoescape is going to be enabled, then we should just pass raw variables to it and let it do the work. This way, if a template doesn't print the date, or a link, or whatever might currently be check_plain()ed in preprocess, we're not spending all this time creating it for it to be never used. In general, we should be able to remove a large chunk of preprocess work, and just let Twig sort out variables on demand

Are we still accomplishing this goal? If I read the recent profiling correctly, there's actually a 1-2% regression for the profiled case. If that's true, we presumably would need critical followups to remove whatever now-redundant work we're doing, right?

chx’s picture

Edit: eh, nevermind. Deleted my comment. Drupal 8 is really hard to work on, tho.

chx’s picture

> 4. core/modules/system/system.install @todo Huh?

Well, that's a botched attempt to document it's safe: it is concat'd from two t() and a fixed br tag.

> 5. @todo in update.report.inc Huh?
> // @todo when converting to Twig, $data might get double-escaped, so

Update functions will get converted to twig templates. This is a warning to avoid double escape problems after that.

> 6. // @todo Add an optional strategy parameter to SafeMarkup function calls,

Not that I know of. (This is Fabianx's code who made an exception to work on this. Otherwise, AFAIK he is fed up with the core queue even worse than me. I can guess why.)

> Is the following text from the issue summary still true? What does "not allowed" mean in this context? Where is this documented if so?

In order: Yes. That I will find you and beat you with a clue-by-four with a blazing security sign on it if you do it. Nowhere ATM? I do not know where to put #attached documentation.

catch’s picture

Are we still accomplishing this goal? If I read the recent profiling correctly, there's actually a 1-2% regression for the profiled case. If that's true, we presumably would need critical followups to remove whatever now-redundant work we're doing, right?

This patch goes some way to enabling that, but it won't fix it. The idea is to stop preparing sanitized variables in preprocess, and just print things when we need them via the templates. The issues dealing with that overall goal are #2060783: Remove the preprocess layer. and #2035055: Introduce hook_theme_prepare[_alter]() and remove hook_preprocess_HOOK() but they're stalled/9.x at this point.

We could open specific issues to audit what's happening in preprocess and try to do less, for example https://api.drupal.org/api/drupal/core%21modules%21node%21node.module/fu... is not a happy function, but that's orthogonal to this issue. Same with any remaining double-escaping issues.

Additionally, when we added Twig to core, we didn't have entity render caching at all, the original Twig regression likely looks a lot different if you don't have field and entity templates rendered every request, and there are much worse issues on cache misses than this.

Fabianx’s picture

I will try to take care of #8 and #9, but will take a moment to get to it.

Dries’s picture

Just wanted to chime in and say I'm generally +1 to this. While DX matters a lot, security matters more. I'm happy to see we already improved the DX a lot. If we can further improve DX in the patch, that would be great, but that shouldn't hold up getting this great security feature committed. Further DX improvements can happen in follow-ups.

xjm’s picture

For #210.3, filed: #2296885: Remove format_xml_elements() (plus pushed a reference to it to the sandbox).

xjm’s picture

So I figured out what the @todo in system.install (#210.4) was about. As a best practice, rather than marking SafeMarkup::set($some_string) when $some_string is composed of miscellaneous t() calls and whitespace, we should do do SafeMarkup::implode('', $array_of_pieces). Attached tries to implement that in that case. (Note that I think things like this are actually okay to put in followups; I just wanted to work through this instance.)

xjm’s picture

Hmm, reverting that actually. It's scope-creep-y; we pretty much know that what we assemble in the system_requirements() is safe. :P I've filed a followup issue for that instead: #2296929: Remove system_requirements() SafeMarkup::set() use

Pseudo-interdiff just shows the updated @todo comment.

xjm’s picture

So here's the zinger that's been in my mind for days, that I was reminded of when looking at #217: How do we keep people from doing something like this (but with the bad less obvious):

  $my_markup = array();
  $my_markup[] = SafeMarkup::set('<script>');
  $my_markup[] = 'some_legit_js_from_my_module_just_added_in_a_bogus_way';
  $my_markup[] = SafeMarkup::set('</script>');
  SafeMarkup::implode('', $my_markup);

...which would be like "Hey entire string list! Script tags for everyone!"

joelpittet’s picture

@xjm I don't think we can prevent that... but side benefit of the DX being a bit harder is people will likely tend away from that I'd hope.

Likely if they want to do that they could also just use whatever the equivalent to drupal_add_js('jQuery(function () { alert("Hello!"); });', 'inline'); is now...

chx’s picture

Yup. We make it harder to shoot yourself in the foot but if you insist... it's your foot. The CR already contains but perhaps not strong enough that SafeMarkup::set is bad practice in and itself. Perhaps put it in the doxygen?

xjm’s picture

Yup. We make it harder to shoot yourself in the foot but if you insist... it's your foot. The CR already contains but perhaps not strong enough that SafeMarkup::set is bad practice in and itself. Perhaps put it in the doxygen?

Absolutely. Other than minor cleanups and final review, super thorough documentation for the class is the last thing blocking this.

@xjm I don't think we can prevent that... but side benefit of the DX being a bit harder is people will likely tend away from that I'd hope.

Likely if they want to do that they could also just use whatever the equivalent to drupal_add_js('jQuery(function () { alert("Hello!"); });', 'inline'); is now...

The differences though are that:

  • _drupal_add_js() is now intended for internal use. (Though the patch that made that change apparently didn't update the docblock.)
  • SafeMarkup::set('<script>') marks the string safe for the whole page. So all it takes is one module using the wrong API to do something safe within that module, and they've circumvented Twig's autoescaping for that string for any other place someone manages to try to inject something.

So I just want to make sure it's darn clear that SafeMarkup::set('<bit of generic html>') is doing it wrong. :) This is also what makes me uneasy with the bit where we are SafeMarkup::set()ing the <head> tag. It's safe once. It's not safe ever again, but Twig doesn't care. HEAD is not regressing, but if our goal is to get rid of some of our own escaping and let Twig do the work, we need to be careful of cases like this.

pwolanin’s picture

Perhaps there should be a way to remove a string from the safe list? e.g. if I want to use a certain tag as a delimiter in implode() but it's not generally safe?

chx’s picture

Nope. Do not complicate the system with an unset; make the core implementations less stupid and mark SafeMarkup::set() as bad practice and move on? BTW I still do not understand how can you create an XSS hole but w/e.

mikey_p’s picture

Making an XSS with it would be pretty hard, but I'm sure possible somewhere on some site. You'd have to have a case where you have multiple fields that are joined together, where you can insert the starting and closing script tags along with the payload and trust that they are combined without any other text in the middle to prevent parse errors. While extremely unlikely, it's not impossible either.

xjm’s picture

Thanks @mikey_p for the clear explanation. That's the edgecase I was trying to get at.

It's not something that should hold up this patch, as we are going to be vastly reducing the number of XSS entry points in the wild and the opportunity for exploit is... let's say academic. But conceptually, for this patch, I want to get what the correct way to use SafeMarkup is (and isn't) so it doesn't get abused. A SafeMarkup::set() call with a variable in it is a red flag. A SafeMarkup::set() call with a very short markup string (should we even be doing SafeMarkup::set('<br />')?) is a red flag. Especially incomplete markup is a red flag. And as chx points out, the first rule of SafeMarkup is: you do not use SafeMarkup.

xjm’s picture

xjm’s picture

Title: Turn on twig autoescape by default. » Turn on twig autoescape by default
Category: Bug report » Task

Just noticed. This isn't a functional bug really. It's a maintainer-approved release-and-beta-blocking priority task.

xjm’s picture

And the double-escaping followup: #2297711: Fix HTML escaping due to Twig autoescape

xjm’s picture

So, I started adding strongly worded documentation to SafeMarkup::set() this morning. But what I realized? Is that all the concerns about using SafeMarkup::set() inappropriately also apply to the sanitization functions that use it. Like. If you call t('<')... If you're thinking "Why would anyone ever do that?" you haven't spent enough of your life cleaning up the bat-guano crazy things that get done on some sites.

xjm’s picture

It's even broader than that; it's not just a coding best practices problem. Since our text filters call Xss::filter() which in turn marks the string as safe, that means all you need is someone with sufficient permissions to create an entity with a filtered text field that contains a single angle bracket, and that field to be used somewhere on the page (or processed somewhere during the page render). It'd be a very roundabout and unlikely thing to exploit, but as long as we use this mechanism for opting strings out of Twig's autoescaping, we need to be careful about what responsibility we shift solely to Twig for sanitization.

xjm’s picture

Attached fixes #210.7, tests reverting the change indicated in #210.9 and attempts to improve the documentation for SafeMarkup::set().

xjm’s picture

BTW my first instinct to address #231 was that we should consider separate buckets of string safeness, i.e. code-defined strings from t() or String::format() are different from things that go through String::checkPlain(), which are also different from a delimiter you want to use SafeMarkup::implode() on or concatenate two translated strings with, which is also different from something that came to us from an input filter. @alexpott also mentioned the idea of scopes when I mentioned this issue to him. But I'm loathe to add any further complexity to this issue; its goal is to make it easier for themers to theme safely, and it accomplishes that. It just gives our theme-layer autoescaping a back door that might be more easily opened than we realize. It might merit a followup discussion, at the least.

pwolanin’s picture

@xjm - I'm not seeing how you could know which bucket to find the string in at render time?

Maybe we should revert the delimiter change?

xjm’s picture

@pwolanin, yeah, I've been debating that all morning, because it encourages people to do a more bad thing than the bad thing they're already doing. I've also been thinking that the implode() method encourages bad practice to begin with. For example, these usages:

+++ b/core/modules/node/src/Plugin/Search/NodeSearch.php
@@ -266,10 +267,12 @@ public function execute() {
-      $node->rendered .= ' ' . $this->moduleHandler->invoke('comment', 'node_update_index', array($node, $item->langcode));
+      $node->rendered = SafeMarkup::implode(' ', array(
+        drupal_render($build),
+        $this->moduleHandler->invoke('comment', 'node_update_index', array($node, $item->langcode)),
+      ));

+++ b/core/modules/rdf/rdf.module
@@ -474,8 +475,8 @@ function rdf_preprocess_comment(&$variables) {
-    $variables['created'] .= $created_metadata_markup;
-    $variables['submitted'] .= $created_metadata_markup;
+    $variables['created'] = SafeMarkup::implode('', array($variables['created'], $created_metadata_markup));
+    $variables['submitted'] = SafeMarkup::set($variables['submitted'] . $created_metadata_markup);

+++ b/core/modules/search/tests/modules/search_embedded_form/search_embedded_form.module
@@ -9,10 +9,12 @@
-  $variables['snippet'] .= drupal_render($form);
+  $variables['snippet'] = SafeMarkup::implode('', array($variables['snippet'] , drupal_render($form)));

+++ b/core/modules/system/tests/modules/batch_test/batch_test.callbacks.inc
@@ -94,7 +95,9 @@ function _batch_test_finished_helper($batch_id, $success, $results, $operations)
-  drupal_set_message(implode('<br>', $messages));
+  // The BR tag is safe as a delimiter.
+  SafeMarkup::set('<br>');
+  drupal_set_message(SafeMarkup::implode('<br>', $messages));

+++ b/core/modules/views/views.theme.inc
@@ -541,7 +542,8 @@ function template_preprocess_views_view_table(&$variables) {
-          $label .= drupal_render($tablesort_indicator);
+          $markup = drupal_render($tablesort_indicator);
+          $label = SafeMarkup::implode('', array($label, $markup));

@@ -632,7 +634,7 @@ function template_preprocess_views_view_table(&$variables) {
-          $field_output = '<' . $element_type . '>' . $field_output . '</' . $element_type . '>';
+          $field_output = SafeMarkup::implode('', array(SafeMarkup::set('<' . $element_type . '>'), $field_output, SafeMarkup::set('</' . $element_type . '>')));

seem questionable to me. And making lists of things should be done though whatever mechanism we use for theme_item_list() now. Edit: The last one is especially egregious, it's doing exactly what we are saying in the set() docs it shouldn't: Adding any (!) of the allowed HTML tags, both their opening and closing versions, to the safe markup list everywhere. This would be necessary to keep the markup from being escaped with or without SafeMarkup::implode(). What we should be doing instead is not setting the tags as safe, checking whether $field_output is already safe and check-plaining it if not, and then marking the whole string as safe without the implode. The DX is worse. But the DX should be worse, because we're doing it wrong.

xjm’s picture

A better method to have would be something like makeSafe() escapeIfUnsafe() or something. It could accept a single string or flat array of strings, and check-plain them only if they're not already set. Then the calling code could do whatever concatenation or implosion it wants, with whatever naughty markup it wants, and then mark only the completed string as safe.

pwolanin’s picture

Part of the follow-up was to have checkPlain itself do a bunch of this work so it's more transparent to the developer.

bfr’s picture

Just few typos i bumped into.

xjm’s picture

Thanks @bfr! Can you provide interdiffs when creating new patches? The patch is developed in a sandbox so we have no way at the moment to incorporate whatever changes you made.

@pwolanin, where is the followup for that?

pwolanin’s picture

xjm’s picture

Ah. So I think that changing checkPlain() to check whether a string is in the safe list before escaping could be part of this patch... but on the other hand, it would also totally change the expectation of what checkPlain() does. checkPlain() should always be the way that people can escape anything, regardless as to whether it's already been marked safe as markup somewhere else. I'll indicate that on that issue.

xjm’s picture

After protracted IRC discussion about this, we agreed that we probably should remove SafeMarkup::implode():

'p_wolanin: xjm: and yes i see now why implode is naughty -- because it encoruages people to call SafeMarkup::set on things that are not really safe by themselves but only together with the whole thing that goes into implode.makes sense finally. I wonder whether SafeMarkup::set('<' . $element_type . '>' . SafeMarkup::escapeIfNotAlreadyEscaped($field_output) . '

xjm’s picture

Attached includes cleanups from @joelpittet and @bfr, rerolled against HEAD. Also updated in the sandbox.

joelpittet’s picture

@chx and I will do a implode implosion run through tomorrow (Tuesday) evening.

Fabianx’s picture

SafeMarkup::implode is still needed for safe joining from twig itself though (see safe_join in the patch) - and there it is good to use. There is also upstream discussion on it.

I don't think the delimiter itself should be needed to be safe though, just marked clearly that you don't put user provided input into the delimiter.

SafeMarkup::implode in that fashion is still useful, though I do agree that its better to split it up into a SafeMarkup::escape function, which checks if its already escaped.

What currently is

SafeMarkup::implode('', SafeMarkup::set('<div>'), $possibly_unsafe_var, SafeMarkup::set('</div>'))

is then just:

SafeMarkup::set('<div>' . SafeMarkup::escape($possible_unsafe_var) . '</div>');

which is indeed better for the strings that are safe.

I still think ultimately this should be:

$build['string'] = array(
  '#type' => 'twig_template',
  '#markup' => '<div>{var}</div>',
  '#context' => array(
    'var' => $possible_unsafe_var,
  ),
);

to prevent all this HTML string binding in core and at least in D8 render arrays are the way to go ...

@xjm: My own instinct also was for different buckets with comparable strings and that is definitely possible using different escaping strategies - but boundaries would need to be clearly defined and in the end you have to decide if its already escaped or not, so while intermediately possible, in the end you have to say yes|no.

xjm’s picture

+++ b/core/themes/engines/twig/twig.engine
@@ -179,3 +184,93 @@ function twig_without($element) {
+/**
+ * Overrides twig_join_filter().
+ *
+ * Safely joins several strings together.
+ *
+ * @param array|Traversable $value
+ *   The pieces to join.
+ * @param string $glue
+ *   The delimiter with which to join the string. Defaults to an empty string.
+ *
+ * @return \Drupal\Component\Utility\SafeMarkup|string
+ *   The imploded string, which is now also marked as safe.
+ */
+function twig_drupal_join_filter($value, $glue = '') {
+  if (is_object($value) && $value instanceof \Traversable) {
+    $value = iterator_to_array($value, false);
+  }
+
+  return SafeMarkup::implode($glue, (array) $value);
+}

To clarify, this is what @Fabianx is referring to for Twig needing the implode method.

I still would be inclined to suggest that this function should simply do what implode() was doing internally before (adding back the documentation about the delimiter needing to be safe edit and obviously using our new escape() method instead of the protected properties) but replacing, and then all other usages in Drupal code should be replaced to only SafeMarkup::set() whole strings.

chx’s picture

Here. implode() removed. For those following at home: we didn't remove implode() because it was problematic in this patch but because it was encouraging behavior that could possibly lead to problems. The following pattern is used a lot:

$result='';
$separator='';
foreach ($array as $item) {
  $result .= $separator . $item;
  $separator = $the_real_separator;
}

to avoid a conditional within the loop.

Fabianx’s picture

I would strongly suggest to keep code duplication to a minimum

What about:

$array = array(
  'string1',
  'string2',
);

$safe_array = SafeMarkup::escape($array);
$safe_output = implode('|', $safe_array);

return SafeMarkup::set($safe_output);

It is the same as the duplicated lines, but makes it more clear that the array is safely escaped before it is passed to ::set.

It also ensures escape can deal with arrays - which I think is important.

In twig_drupal_join_filter can then also just do:

$safe_array = SafeMarkup::escape($array);

return implode($delimiter, $safe_array);

I like the DX of that instead of having to traverse the list again and again ...

Fabianx’s picture

#247 also removes some code for Traversable that is still needed from twig_drupal_join_filter ...

That will need to be brought back, please (as its also in default twig_join_filter).

EDIT: As the array is traversed now, this is not possible.

chx’s picture

In my opinion, everything related to this is transitional, ugly and discouraged (even if don't quite manage to clean all of them up by 8.0.0). I wouldn't waste effort on making a smoother road to a destination (SafeMarkup::set) that we tell people not to go to. Just use a render array, 'mkay?

Fabianx’s picture

Okay, deal.

Lets keep ::escape as is and try to avoid ::set calls as the plague and remove as many as possible and replace with for example the render arrays with twig_template #type.

I still think escape on arrays is useful as escape never sets any safe strings, but fine with a follow-up.

xjm’s picture

Yay! Thanks @chx!

/me cracks knuckles and digs back in.

dags’s picture

Assigned: xjm » dags

Picking up at Jersey Shore Drupal 8 sprint to work on docs.

dags’s picture

FileSize
1.01 KB
86.5 KB

PHPDocs for the SafeMarkup class.

Status: Needs review » Needs work

The last submitted patch, 254: 1825952-254.patch, failed testing.

xjm’s picture

Issue summary: View changes
xjm’s picture

Status: Needs work » Needs review
FileSize
101.51 KB

Here's the patch rerolled against HEAD, with @dags' and @cilefen's docs.

xjm’s picture

And expanding those docs a bit.

cilefen’s picture

FileSize
699 bytes

Typo on the last commit.

cilefen’s picture

I updated the change record to match the new SafeMarkup class comments.

steveoliver’s picture

Status: Needs review » Active

While I still wish the implode() codeblocks looked more along the lines of #248 (requiring support for arrays in SafeMarkup::escape()), given #250, I only have these nitpicks of the current patch in #258:

  1. +++ b/core/includes/bootstrap.inc
    @@ -941,17 +964,23 @@ function drupal_set_message($message = NULL, $type = 'status', $repeat = FALSE)
      *   intact. Defaults to TRUE.
      *
      * @return array
    - *   A multidimensional array with keys corresponding to the set message types.
    - *   The indexed array values of each contain the set messages for that type.
      *   The messages returned are limited to the type specified in the $type
    - *   parameter. If there are no messages of the specified type, an empty array
    - *   is returned.
    + *   parameter, if any. If there are no messages of the specified type, an
    + *   empty array is returned. See drupal_set_message() for the array structure.
      *
    

    The returned array is still indexed by type, is it not? I'd rather keep the specific description of the multidimensional array here than replace it with a "see".

  2. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -0,0 +1,141 @@
    + * string of markup to become double-escaped. SafeMarkup provides a store for
    + * known safe strings and methods to manage them throughout the page request.
    + * This class should be limited to interal use only. Module developers should
    + * instead use render arrays or String::checkPlain() or Xss::filter() to
    + * sanitize strings for output.
    

    - add " for Drupal" in:
    SafeMarkup provides a store for known safe strings and methods for Drupal to manage them ...
    ?
    - typo in 'interal'. should be 'internal'.

  3. +++ b/core/modules/book/src/BookExport.php
    @@ -105,18 +106,16 @@ protected function exportTraverse(array $tree, $callable) {
     ...
    -
    -        $callable_output = call_user_func($callable, $node, $children);
    -        $output .= drupal_render($callable_output);
    +        $build[] = call_user_func($callable, $node, $children);
           }
         }
    -    return $output;
    +    return drupal_render($build);
       }
    
    +++ b/core/modules/comment/comment.module
    @@ -813,12 +813,11 @@ function comment_node_update_index(EntityInterface $node, $langcode) {
             comment_prepare_thread($comments);
    -        $build = comment_view_multiple($comments);
    -        $return .= drupal_render($build);
    +        $build[] = comment_view_multiple($comments);
           }
         }
       }
    -  return $return;
    +  return drupal_render($build);
     }
    

    Preventing a few calls to drupal_render()... nice.

  4. +++ b/core/tests/Drupal/Tests/Component/Utility/SafeMarkupTest.php
    @@ -0,0 +1,110 @@
    +  }
    +  /**
    

    needs one more \n after }

  5. +++ b/core/themes/engines/twig/twig.engine
    @@ -179,3 +184,97 @@ function twig_without($element) {
    +
    +/**
    + * Overrides twig_escape_filter().
    + *
    + * Replacement function for twig's escape filter.
    + *
    + * @param Twig_Environment $env
    + *   A Twig_Environment instance.
    + * @param string $string
    + *   The value to be escaped.
    + * @param string $strategy
    + *   The escaping strategy. Defaults to 'html'.
    + * @param string $charset
    + *  The charset.
    + * @param Boolean $autoescape
    + *   Whether the function is called by the auto-escaping feature (TRUE) or by
    + *   the developer (FALSE).
    + *
    + * @return string|null
    + *   The escaped, rendered output, or NULL if there is no valid output.
    + */
    +function twig_drupal_escape_filter(\Twig_Environment $env, $arg, $strategy = 'html', $charset = NULL, $autoescape = FALSE) {
    +  // Check for a numeric zero.
    +  if ($arg === 0) {
    +    return 0;
    +  }
    

    1. Replacement function for Twig's escape filter (Twig should be capitalized).

    2. @param string $string--actual arg name is "$arg" not "$string".

    3. @param string $charset. "The charset." ==> "The ASCII character set to use for encoding."

    4. @param Boolean should be @param bool $autoescape

    5. @param bool $autoescape description: clearer as: "Whether this function should perform autoescaping (TRUE) or if the argument has already been escaped by the developer (FALSE)."?

star-szr’s picture

Status: Active » Needs work
xjm’s picture

Thanks @steveoliver! I think all those are good changes, except:

The returned array is still indexed by type, is it not? I'd rather keep the specific description of the multidimensional array here than replace it with a "see".

I disagree; it makes no sense to maintain identical documentation in two places, and there is a detailed example in drupal_set_message().

chx’s picture

Do not change the doxygen on twig_drupal_escape_filter beyond the absolute minimum; it's a copypaste from twig original and it shouldn't be more. In particular "The ASCII character set to use for encoding." is incorrect, such a thing does not even exist, ASCII itself is a character set (but htmlspecialchars doesn't accept ASCII) but so is utf-8 and so on.

xjm’s picture

Assigned: dags » xjm

Fixing those minor points.

xjm’s picture

Assigned: xjm » Unassigned
Status: Needs work » Needs review
FileSize
102.17 KB
4.73 KB

Addresses #261.

  1. I added back information about the return array being keyed by message type.
  2. Didn't change the first point; seemed redundant to me. Fixed the "intenal" typo.
  3. (Nothing to change)
  4. Fixed
    1. Fixed
    2. I changed the parameter name to $string because that's what twig_escape_filter() class it (to @chx's point in #264). (string) $string and such feels a little silly but that's what twig_escape_filter() does, so...
    3. Didn't change per #264.
    4. Fixed.
    5. Didn't change per #264.
steveoliver’s picture

@xjm awesome!

1. Manual testing of several standard Drupal pages looks good.

2. I just double-checked that we have follow up issues created, updated, and linked for every @todo.

3. Super-minor, missed last nits:

+++ b/core/themes/engines/twig/twig.engine
@@ -45,6 +47,7 @@ function twig_init(Extension $theme) {
 function twig_render_template($template_file, $variables) {
+  /** @var \Twig_Environment $twig_service */
   $twig_service = \Drupal::service('twig');

Helpful comment? Unconventional format? ...Otherwise +1 RTBC

steveoliver’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: -Needs manual testing, -Needs change record

@chx reminded me 3. is for IDEs. Change record looks good. RTBC! :)

tim.plunkett’s picture

I was about ~60% of the way through the patch, and I didn't see anything that should hold this up.

+1 for RTBC.

Fabianx’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/themes/engines/twig/twig.engine
@@ -179,3 +184,97 @@ function twig_without($element) {
+  // We have a string or an object converted to a string: Autoescape it!
+  if (isset($return)) {
+    if ($strategy != 'html') {
+      // Drupal only supports the HTML escaping strategy, so provide a
+      // fallback for other strategies.
+      // @todo Add an optional strategy parameter to SafeMarkup function calls,
+      //   to avoid this when it is already safe for this strategy.
+      return twig_escape_filter($env, $return, $strategy, $charset, $autoescape);
+    }
+    if (!SafeMarkup::isSafe($return)) {
+      return String::checkPlain($return);
+    }
+    return $return;
+  }

I am late back to the party, but this part needs a little more work unfortunately and not due to the @todo, but due to the fact that we change the meaning of:

|escape in a template.

Currently we don't escape even if someone wants to explicitly escape - which is a major change to how twig autoescape works.

Therefore, sorry for setting CNW so close to merging.

The code should actually be:

// We have a string or an object converted to a string: Autoescape it!
if (isset($return)) {
  if ($autoescape && SafeMarkup::isSafe($return, $strategy) {
    return $return;
  }
  // Drupal only supports the HTML escaping strategy, so provide a
  // fallback for other strategies.
  if ($strategy == 'html') {
    return String::checkPlain($return);
  }
  return twig_escape_filter($env, $return, $strategy, $charset, $autoescape);
}

That also solves the @todo. I knew I was missing something before, but just by re-reviewing now, I finally found it.

Can someone put that change into a patch, thanks?

With this change it is RTBC + 1 from me, too.

scor’s picture

Status: Needs work » Needs review
FileSize
102.14 KB
102.02 KB
1.26 KB

The patch needed a reroll (271-reroll.patch). Addressing @Fabianx comment from #270 in twig-autoescape-1825952-271.patch (which is the one to be committed).

Status: Needs review » Needs work

The last submitted patch, 271: twig-autoescape-1825952-271.patch, failed testing.

cosmicdreams’s picture

+++ b/core/themes/engines/twig/twig.engine
@@ -237,17 +237,15 @@ function twig_drupal_escape_filter(\Twig_Environment $env, $string, $strategy =
+    if ($autoescape && SafeMarkup::isSafe($return, $strategy) {

Syntax Error here, forgot closing ")"

Fabianx’s picture

Uh, yes. Right #273.

Sorry for that.

scor’s picture

Status: Needs work » Needs review
FileSize
102.02 KB
637 bytes
star-szr’s picture

Issue summary: View changes
Fabianx’s picture

Issue summary: View changes
Status: Needs review » Needs work
+++ b/core/modules/file/templates/file-upload-help.html.twig
@@ -11,4 +11,4 @@
-{{ descriptions|join('<br />') }}
+{{ descriptions|safe_join('<br />') }}

+++ b/core/modules/system/tests/themes/test_theme/templates/theme-test-specific-suggestions--variant--foo.html.twig
@@ -2,4 +2,4 @@
-{{ theme_hook_suggestions|join("<br />") }}</p>
+{{ theme_hook_suggestions|safe_join("<br />") }}</p>

+++ b/core/modules/system/tests/themes/test_theme/templates/theme-test-specific-suggestions--variant.html.twig
@@ -2,4 +2,4 @@
-{{ theme_hook_suggestions|join("<br />") }}</p>
+{{ theme_hook_suggestions|safe_join("<br />") }}</p>

+++ b/core/modules/views_ui/templates/views-ui-display-tab-setting.html.twig
@@ -20,6 +20,6 @@
-    {{ settings_links|join('<span class="label">&nbsp;|&nbsp;</span>') }}
+    {{ settings_links|safe_join('<span class="label">&nbsp;|&nbsp;</span>') }}

+++ b/core/themes/engines/twig/twig.engine
@@ -179,3 +184,95 @@ function twig_without($element) {
+function twig_drupal_join_filter($value, $glue = '') {
+  $glue = SafeMarkup::escape($glue);
+  $separator = '';

There are strong arguments from our code to remove making $glue safe and instead document in the function definition that glue is expected to be a string safe for output and user provided data should never be passed as a glue.

/me sings the code needs work song ... (sorry for that!)

EDIT:

To be precise what I mean that needs change is:

/**
  * @param $glue ...
+  *                      Glue is expected to be safe for output and user provided data should never be used as a glue.
  */
function twig_drupal_join_filter($value, $glue = '') {
-  $glue = SafeMarkup::escape($glue);
  $separator = '';
Fabianx’s picture

Issue summary: View changes

The last submitted patch, 271: twig-autoescape-1825952-271-reroll.patch, failed testing.

scor’s picture

Status: Needs work » Needs review
FileSize
102.09 KB
837 bytes

clarified the usage of $glue per #277.

Status: Needs review » Needs work

The last submitted patch, 280: twig-autoescape-1825952-280.patch, failed testing.

chx’s picture

Status: Needs work » Reviewed & tested by the community

Bot fluke.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed 87e675f and pushed to 8.x. Thanks!

Can we get a followup to add explicit testing of twig_drupal_join_filter() and twig_drupal_escape_filter() - something along the lines of TwigTransTest.

diff --git a/core/includes/errors.inc b/core/includes/errors.inc
index 1da90ee..43fe7e7 100644
--- a/core/includes/errors.inc
+++ b/core/includes/errors.inc
@@ -6,7 +6,6 @@
  */
 
 use Drupal\Component\Utility\SafeMarkup;
-use Drupal\Component\Utility\String;
 use Drupal\Component\Utility\Xss;
 use Drupal\Core\Page\DefaultHtmlPageRenderer;
 use Drupal\Core\Utility\Error;
diff --git a/core/includes/theme.maintenance.inc b/core/includes/theme.maintenance.inc
index 71daaee..8eb0775 100644
--- a/core/includes/theme.maintenance.inc
+++ b/core/includes/theme.maintenance.inc
@@ -5,7 +5,6 @@
  * Theming for maintenance pages.
  */
 
-use Drupal\Component\Utility\SafeMarkup;
 use Drupal\Component\Utility\Unicode;
 use Drupal\Core\Site\Settings;
 
diff --git a/core/lib/Drupal/Component/Utility/String.php b/core/lib/Drupal/Component/Utility/String.php
index e48934b..970436c 100644
--- a/core/lib/Drupal/Component/Utility/String.php
+++ b/core/lib/Drupal/Component/Utility/String.php
@@ -7,8 +7,6 @@
 
 namespace Drupal\Component\Utility;
 
-use Drupal\Component\Utility\SafeMarkup;
-
 /**
  * Provides helpers to operate on strings.
  *
diff --git a/core/lib/Drupal/Component/Utility/Xss.php b/core/lib/Drupal/Component/Utility/Xss.php
index 9189c08..ddce179 100644
--- a/core/lib/Drupal/Component/Utility/Xss.php
+++ b/core/lib/Drupal/Component/Utility/Xss.php
@@ -7,8 +7,6 @@
 
 namespace Drupal\Component\Utility;
 
-use Drupal\Component\Utility\SafeMarkup;
-
 /**
  * Provides helper to filter for cross-site scripting.
  *
diff --git a/core/modules/book/src/BookExport.php b/core/modules/book/src/BookExport.php
index 3231802..4b45078 100644
--- a/core/modules/book/src/BookExport.php
+++ b/core/modules/book/src/BookExport.php
@@ -7,7 +7,6 @@
 
 namespace Drupal\book;
 
-use Drupal\Component\Utility\SafeMarkup;
 use Drupal\Core\Entity\EntityManagerInterface;
 use Drupal\node\NodeInterface;
 
diff --git a/core/modules/field/src/Plugin/views/field/Field.php b/core/modules/field/src/Plugin/views/field/Field.php
index c33a070..f3814c4 100644
--- a/core/modules/field/src/Plugin/views/field/Field.php
+++ b/core/modules/field/src/Plugin/views/field/Field.php
@@ -8,7 +8,6 @@
 namespace Drupal\field\Plugin\views\field;
 
 use Drupal\Component\Utility\SafeMarkup;
-use Drupal\Component\Utility\String;
 use Drupal\Component\Utility\Xss;
 use Drupal\Core\Entity\ContentEntityDatabaseStorage;
 use Drupal\Core\Entity\EntityInterface;
diff --git a/core/modules/filter/filter.module b/core/modules/filter/filter.module
index 953924b..68b445e 100644
--- a/core/modules/filter/filter.module
+++ b/core/modules/filter/filter.module
@@ -8,7 +8,6 @@
 use Drupal\Component\Utility\Html;
 use Drupal\Component\Utility\Unicode;
 use Drupal\Component\Utility\NestedArray;
-use Drupal\Component\Utility\SafeMarkup;
 use Drupal\Component\Utility\String;
 use Drupal\Component\Utility\Xss;
 use Drupal\Core\Cache\Cache;
diff --git a/core/modules/system/src/Form/DateFormatFormBase.php b/core/modules/system/src/Form/DateFormatFormBase.php
index 6a4e050..8b0d9fa 100644
--- a/core/modules/system/src/Form/DateFormatFormBase.php
+++ b/core/modules/system/src/Form/DateFormatFormBase.php
@@ -124,7 +124,7 @@ public function form(array $form, array &$form_state) {
       '#machine_name' => array(
         'exists' => array($this, 'exists'),
         'replace_pattern' =>'([^a-z0-9_]+)|(^custom$)',
-        'error' => t('The machine-readable name must be unique, and can only contain lowercase letters, numbers, and underscores. Additionally, it can not be the reserved word "custom".'),
+        'error' => $this->t('The machine-readable name must be unique, and can only contain lowercase letters, numbers, and underscores. Additionally, it can not be the reserved word "custom".'),
       ),
     );
 
diff --git a/core/modules/system/src/Form/ModulesListForm.php b/core/modules/system/src/Form/ModulesListForm.php
index 2c2b6b6..da1ecd0 100644
--- a/core/modules/system/src/Form/ModulesListForm.php
+++ b/core/modules/system/src/Form/ModulesListForm.php
@@ -7,7 +7,6 @@
 
 namespace Drupal\system\Form;
 
-use Drupal\Component\Utility\SafeMarkup;
 use Drupal\Component\Utility\String;
 use Drupal\Component\Utility\Unicode;
 use Drupal\Core\Entity\EntityManagerInterface;
diff --git a/core/modules/views/views.module b/core/modules/views/views.module
index 6de7b71..745d397 100644
--- a/core/modules/views/views.module
+++ b/core/modules/views/views.module
@@ -9,7 +9,6 @@
  * incoming page and block requests.
  */
 
-use Drupal\Component\Utility\SafeMarkup;
 use Drupal\Component\Utility\String;
 use Drupal\Core\Cache\Cache;
 use Drupal\Core\Database\Query\AlterableInterface;

Fixed on commit.

  • alexpott committed 87e675f on
    Issue #1825952 by Fabianx, joelpittet, bdragon, heddn, chx, xjm,...
Fabianx’s picture

OMG, Yes! That is wonderful!

/me sings the squash-the-beta-blocker-song "Another one bites the dust ..." :-D

pfrenssen’s picture

Just a note, for inline variable declarations, please use the standard format, and not the proprietary PHPStorm-only format.

// This only works in PHPStorm.
/** @var \Twig_Environment $twig_service */


// This works in all IDE's, including PHPStorm.
/* @var $twig_service \Twig_Environment */
chx’s picture

pfrenssen thanks I never knew that! Please file a followup.

webchick’s picture

Awesome work getting this in, all!! :)

Reviewing the change notice though, I don't see any before/after D7 vs. D8 code for module devs and especially themers on how they're supposed to upgrade their existing code. Could this please be added? And the cautionary tales about SafeMarkup::set() that are there seem like they probably belong somewhere less transient than a change notice that will only be reviewed by people upgrading code from D7 to D8 (which will eventually reach 0 people).

pfrenssen’s picture

Made a followup for the @var declarations #2305641: Use the "standard" format for @var inline variable type declarations as well as a suggestion to put this in the comment standards #2305593: [policy] Set a standard for @var inline variable type declarations

xjm’s picture

And the cautionary tales about SafeMarkup::set() that are there seem like they probably belong somewhere less transient than a change notice that will only be reviewed by people upgrading code from D7 to D8 (which will eventually reach 0 people).

The cautionary remarks are also all over the SafeMarkup class in the API documentation itself. :)

To the point about D7-to-D8 code... in some ways, it's out of scope. SafeMarkup is an internal use API. The scope of using the render and theme layers properly is way, way beyond this issue. Maybe one of the theme system maintainers could add links to other resources about themable code in D8?

jbrown’s picture

There are double-escaped strings on the module uninstall page:

<label for="edit-uninstall-block" class="module-name table-filter-text-source">Block</label>

Should I make a separate issue?

heddn’s picture

@jbrown, yes please. We knew going into this patch that we'd have a few stray things slip through the cracks. For this, and all other similar findings please open their own separate follow-ups.

xjm’s picture

Yep, please file double-escaping bug reports as children of #2297711: Fix HTML escaping due to Twig autoescape.

thedavidmeister’s picture

yay, this getting in is great news!

yched’s picture

Feedback welcome in #2199637-32: Replace "required" flag of Field module with proper dependencies :

In short, this patch did:

 function field_filter_xss($string) {
-    return Html::normalize(Xss::filter($string, _field_filter_xss_allowed_tags()));
+  return SafeMarkup::set(Html::normalize(Xss::filter($string, _field_filter_xss_allowed_tags())));
  }

But it also made Xss::filter() itself call SafeMarkup::set(), so we end up calling it twice here. Not sure why the wrapping SafeMarkup::set() was added ?

chx’s picture

You are feeding a safe string to Html::normalize so the result will be safe too but the output is not necessarily the same as the input so the new string needs to be marked as safe too.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

hass’s picture

Is this change the reason for #2345903: Form descriptions are checkplained?

xjm’s picture

Issue tags: +SafeMarkup
kay_v’s picture

Issue summary: View changes