In #2545972-96: Remove all code usages SafeMarkup::checkPlain() and rely more on Twig autoescaping @pwolanin pointed out the #plain_text might be preferred to #safe_strategy. In the same issue @Damien Tournoud says that

'#safe_strategy' => Markup::SAFE_STRATEGY_ESCAPE is just plain wrong, as it mixes two radically different issues: the issue of the format (plain text vs. html), and the issue of the sanitization.

Whilst this mixing of "radically different" has been present in HEAD for a long time with ['#markup' => SafeMarkup::checkPlain($stuff)] however we're trying to remove SafeMarkup::checkPlain() so perhaps we should implement this.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

alexpott created an issue. See original summary.

alexpott’s picture

Given that #safe_strategy has only been around for a week perhaps it is acceptable to make this change.

joelpittet’s picture

Hmm, I must have missed when '#safe_strategy' went in. But reading over the patch a couple of times the '#safe_strategy' seems clear in how the text will be treated. The DX looks to improve slightly, but having less keys to add will do that for anything.

#plain_text as a key doesn't tell me that it will be escaped (except if I remember that check_plain(), escaped things, which I'd rather forget ;-) ).

Damien Tournoud’s picture

@alexpott: #2 looks wrong to me: if it is plain-text, it is plain-text and it must be escaped. It is absolutely incorrect to call SafeMarkup::isSafe($elements['#markup'] in that case.

This relates to a misunderstanding you had in the other issue (2545972#104):

I'm intrigued by the idea of introducing #plain_text and removing the #safe_strategy key. But there might be a key benefit of the current approach that is being overlooked. It makes it simpler to change the strategy whilst rendering. For example, at the moment node titles are escaped. However it is not uncommon to what to change this an permit a limited set of tags in the title. In HEAD atm the moment you could just change the strategy to Markup::SAFE_STRATEGY_FILTER and set #allowed_tags to the desired list.

This is just not possible. If something is plain-text, it is plain-text. If something is HTML, it is HTML. There is no in-between, and you cannot just imagine that you can pull a switch and magically start accepting a subset of HTML in a field that was previously plain-text. In an HTML text, some key characters have special meaning (notably &, < and >), and you cannot just change their meaning mid-flight.

On a blog about HTML, for example (it's a very contrived example, but it gets the point across very well), all those titles would be affected by a sudden switch from plain-text to HTML:

  • How to best use the <marquee> tag
  • Why you should stop using &nbsp; and start learning Unicode instead
  • We &#9829; developers
pwolanin’s picture

I agree with Damien - you should be setting a string there and it should always be escaped.

Here's a patch that changes the logic a little to match that notion.

catch’s picture

@Damien that's true in terms of the lifetime of that site - the HTML blog can't change strategy mid-stream (or not without a database update to escape all the existing titles). Also the example isn't that contrived, there are d.o issues with HTML tags in the titles.

However another site might want <em> tags in node titles to work and choose filtering instead. I've seen that implemented on at least a couple of sites in the past, although it's never pretty.

If both sites from the start decide on escaping vs. filtering then there is no 'mid-flight' change at all.

alexpott’s picture

Here is a non-contrived usage of HTML in a title taken from https://www.newscientist.com/:
<i>American Zoo</i> shows animal conservation some tough love https://www.newscientist.com/

And of course we have a few thousand users of https://www.drupal.org/project/html_title and a not inconsiderate amount of help articles and questions on the topic https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF...

The current approach of escape means that people have no alternative but to enable a module or hack their theme because we escape. If we filtered then at least people could enter the the escaped html in the title. However, I'm not suggesting that we change this here I just think we need to make it as easy as possible for a module to exist that changes this behaviour.

Wrt to the current patch - I ummed and ahhed about the making it always escape. I think we might have problems with translation since if a translator adds html, for example: http://www.w3schools.com/tags/tag_bdi.asp

+++ b/core/lib/Drupal/Core/Render/Element/Markup.php
@@ -121,26 +100,20 @@ public function getInfo() {
+      unset($elements['#plain_text']);

Why unset? We don't do this for other render array properties.

pwolanin’s picture

The unset isn't really needed.

Translations shouldn't be a problem if they go to #markup. Really anything that's already escaped and/or marked safe should also be using that.

pwolanin’s picture

I think having the node title in a render array and being able to change the render array (moving text from #text_plain to #markup doesn't seem a lot harder than changing the value of another key) is a lot better than the case now of just having text.

Wim Leers’s picture

I expected a '#type' => 'plain_text', so a PlainText render element type. I think semantically that'd make more sense.

+++ b/core/lib/Drupal/Core/Render/Element/Markup.php
@@ -50,8 +47,7 @@
+ *   '#text_plain' => '<em>This is escaped</em>',

#text_plain? Shouldn't this be #plain_text?

joelpittet’s picture

Status: Needs review » Needs work

@pwolanin my comment in #3 is still my concern. #plain_text to me doesn't mean "escape", but neither does #markup mean filter some of the markup... So I don't think I have much to contribute to this issue.

Needs work on #10

joelpittet’s picture

Status: Needs work » Reviewed & tested by the community

Then again, this patch doesn't make things worse... and for people familiar with the field options of 'Plain Text' (and not oddly named check_plain() function). And #markup is a bit clearer as you don't have to know the strategy it's using on top of all the other implicit things that are happening.

I can't think of a better name, so I change my mind.

joelpittet’s picture

Fixed comment typo from #10

Wim Leers’s picture

Status: Reviewed & tested by the community » Needs review

I'm not yet fully convinced this is actually better.

I don't disagree at all with Damien's explanation/rationale in #4. But, the fact is, the end result/goal of Drupal's render arrays always is… markup. It is from that POV that I always interpreted #markup: Oh, hey, Drupal, you incredibly complex string concatenation machine, when concatenating this subtree of the overall render tree, all you need to do is use this bit of text as the markup. Thanks..

I'm not saying I think #plain_text is a bad idea per se, but I do think it needs further consideration. Hence, moving back to NR. I think this at least needs a Yep, I'm fully convinced because X, Y, Z. from Alex Pott.

Finally, #10 is also not yet answered (Why not '#type' => 'plain_text'?).

joelpittet’s picture

re #10 I'm ok with it being not being a white elephant like #markup and creating #plain_text as a proper Element

effulgentsia’s picture

Similar to #14, I'm also wondering what DX benefits:

return [
  '#plain_text' => $plain_text,
];

has over:

return [
  '#markup' => Html::escape($plain_text),
];

Per #4, in both cases, the developer needs to know that $plain_text is plain text and not an HTML string, and then convey that information to the render array (which per #14 is an HTML concatenation machine). To me, the two approaches above seem like equally direct and straightforward ways of conveying that, but the latter seems more future compatible with #2554073: Allow #markup to be an array of strings and join them safely.

alexpott’s picture

Well one big issue with

return [
  '#markup' => Html::escape($plain_text),
];

is that currently that unnecessarily filters the #markup even though it has been escaped and that is a waste of resources.

joelpittet’s picture

re #17 would we be open to the idea of reverting that #markup ~= Xss Admin Filter so that #markup => "<script>alert('What you put into it');</script>"?

effulgentsia’s picture

Is #17 significant enough to worry about? How common is it to put plain text strings into a render array? Most text string literals should go through t(), which can be assigned to #markup directly. Things like node title (and all other content entity fields) should be output with field formatters, which would allow the appropriate ability for a site to configure whether those should be treated as HTML or text. Sure, maybe there are some content/configuration strings which would go through this Html::escape() followed by Xss::filterAdmin() sequence, but especially if these are short strings, how much overhead are we really talking about?

alexpott’s picture

Things like node title (and all other content entity fields) should be output with field formatters

Unfortunately I don;t think it's working that way - the title is not a field in the way that you can just go and edit the field settings.

The patch attached contains a conversion of the MessageForm to use #plain_text for the item render elements it puts on the page. This nicely illustrates the problem with the Markup render element - the support for #markup (and with this patch #plain_text) is baked into the render. The lines that do this are:

    // If #theme is not implemented and the element has raw #markup as a
    // fallback, prepend the content in #markup to #children. In this case
    // #children will contain whatever is provided by #pre_render prepended to
    // what is rendered recursively above. If #theme is implemented then it is
    // the responsibility of that theme implementation to render #markup if
    // required. Eventually #theme_wrappers will expect both #markup and
    // #children to be a single string as #children.
    if (!$theme_is_implemented && isset($elements['#markup'])) {
      $elements['#children'] = SafeString::create($elements['#markup'] . $elements['#children']);
    }

Also testing this revealed a double escaping bug in the dblog messages list :(

Status: Needs review » Needs work

The last submitted patch, 20: 2555931-2.20.patch, failed testing.

alexpott’s picture

Status: Needs work » Needs review
FileSize
1.28 KB
20.29 KB

Lol... so we have a test asserting on the double escape bug in dblog...

    // Remove all HTML and PHP tags from a tooltip, calling expensive strip_tags()
    // only when a quick strpos() gives suspicion tags are present.
    if (isset($variables['options']['attributes']['title']) && strpos($variables['options']['attributes']['title'], '<') !== FALSE) {
      $variables['options']['attributes']['title'] = strip_tags($variables['options']['attributes']['title']);
    }

This code in link generator is supposed to the strip tags from titles... even though they are later escaped hence the test change might not be quite what you where expecting. fun.

joelpittet’s picture

#22 That reminds me of #2531824: Attribute class to check safe strings before escaping (has tests) I wonder if there is a correlation, maybe not...

joelpittet’s picture

+++ b/core/modules/dblog/src/Tests/DbLogTest.php
@@ -733,11 +733,11 @@ public function testTemporaryUser() {
-    $this->assertRaw('title="&amp;lt;script&amp;gt;alert(&#039;foo&#039;);&amp;lt;/script&amp;gt;Lorem ipsum dolor sit amet, consectetur adipiscing elit. Entry #0">&lt;script&gt;alert(&#039;foo&#039;);&lt;/script&gt;Lorem ipsum dolor sit…</a>');
+    $this->assertRaw('title="alert(&#039;foo&#039;);Lorem ipsum dolor sit amet, consectetur adipiscing &amp; elit. Entry #0">&lt;script&gt;alert(&#039;foo&#039;);&lt;/script&gt;Lorem ipsum dolor sit…</a>');

Should we add assertNoRaw() to check for that script tag?

Damien Tournoud’s picture

For the record, I agree with #16. I am coming here just for one reason which is that '#safe_strategy' => Markup::SAFE_STRATEGY_ESCAPE was just wrong, because it was confusing encoding and sanitization.

pwolanin’s picture

@Damien Tournoud - I think #16 is not as good since it's not possible to change the already-escaped text.

Obviously it would also work fine, but I think it's helpful to have a "native" option in render arrays

andypost’s picture

+1 rtbc, just nits to clean use statements

  1. +++ /dev/null
    @@ -1,147 +0,0 @@
    -class Markup extends RenderElement {
    
    +++ b/core/lib/Drupal/Core/Render/Renderer.php
    @@ -301,12 +303,6 @@ protected function doRender(&$elements, $is_root_call = FALSE) {
    -    if (isset($elements['#markup']) && !isset($elements['#type'])) {
    -      $elements['#type'] = 'markup';
    
    +++ b/core/lib/Drupal/Core/Render/theme.api.php
    @@ -267,11 +267,37 @@
    - *   @see \Drupal\Core\Render\Element\Markup
    

    +1, yay!

  2. +++ b/core/modules/contact/src/MessageForm.php
    @@ -130,12 +130,12 @@ public function form(array $form, FormStateInterface $form_state) {
    -      $form['name']['#markup'] = SafeMarkup::checkPlain($user->getUsername());
    +      $form['name']['#plain_text'] = $user->getUsername();
    ...
    -      $form['mail']['#markup'] = SafeMarkup::checkPlain($user->getEmail());
    +      $form['mail']['#plain_text'] = $user->getEmail();
    

    please clean-up use of class

alexpott’s picture

I think the filtering already escaped text is problematic because of performance and just general sense. What is the point?

I agree that '#safe_strategy' was a misstep so I think we have four options here:

  1. Keep SafeMarkup::checkPlain(). I think this is a bad idea - we the safe string list is difficult to work with and has very hard to predict side effects. We should be doing everything we can to reduce it for D8 and definitely plan for it's removal.
  2. Remove the auto-filtering of #markup. I don't think this is a good idea. The auto-filtering was brought into the render system to make it similar to Twig auto-escape in that the render system would prevent the developers from unintentionally introducing security issues.
  3. Create a new SafeStringInterface object that wraps strings and escapes them so we can pass it to #markup and not do the filtering. This leads to more object creation and does not have the advantage of just scanning a render array and knowing what will be escaped or filtered
  4. Add '#plain_text' support to Renderer (this patch)

I think the fourth approach is preferable since it allows us to alter render arrays to change the behaviour without adding or removing calls to Html::escape() and it gives render arrays a way of working very similar to Twig's auto-escape.

Here's a patch that adds the additional assertion requested by @joelpittet.

Wim Leers’s picture

  1. +++ b/core/lib/Drupal/Core/Render/Renderer.php
    @@ -377,6 +373,12 @@ protected function doRender(&$elements, $is_root_call = FALSE) {
    +    // All render elements support #markup and #text_plain.
    

    s/#text_plain/#plain_text/

  2. +++ b/core/lib/Drupal/Core/Render/Renderer.php
    @@ -756,4 +758,48 @@ protected function xssFilterAdminIfUnsafe($string) {
    +  protected function ensureMarkupIsSafe(array $elements) {
    ...
    +      $elements['#markup'] = SafeString::create(Html::escape($elements['#plain_text']));
    

    I think this means we want to add documentation to Html::escape(), to inform the developer that Html::escape() should generally not be used when constructing render arrays: #plain_text should be used instead.

  3. +++ b/core/modules/dblog/src/Controller/DbLogController.php
    @@ -184,13 +184,16 @@ public function overview() {
    +        // Truncate the title text to 256 chars of message. Attribute will
    +        // escape and unsafe HTML entities in the final text.
    

    "Attribute will escape" sounds strange.

  4. +++ b/core/tests/Drupal/Tests/Core/Render/RendererTest.php
    @@ -101,6 +101,19 @@ public function providerTestRenderBasic() {
    +    // Mixing #plain_text and #markup based renderable array.
    ...
    +    // Safe strings in #plain_text are are still escaped.
    

    Yay for these tests!

alexpott’s picture

@Wim Leers thanks.
1. Fixed
2. Fixed
3. I don't think the comments saying how many characters truncate is truncating too is useful - refactor the comments to hopefully fix this.
4. :)

Wim Leers’s picture

Much clearer, thanks!

+++ b/core/lib/Drupal/Component/Utility/Html.php
@@ -374,6 +374,10 @@ public static function decodeEntities($text) {
+   * When constructing @link theme_render render arrays @endlink passing the output of Html::escape() to

80 cols violation.


I think the third approach is preferable since […]

So… given that, shouldn't we just stop reviewing/rerolling this patch? This is a bit confusing :P

alexpott’s picture

I think the third approach is preferable since

Oops I realised that there was another option of not filtering #markup whilst writing the comment.

That's not an 80 cols violation cause of @link theme_render render arrays @endlink - see https://www.drupal.org/coding-standards/docs#link

Wim Leers’s picture

#32: So… you are still +1 for this patch?

alexpott’s picture

Yes I think it is the way to go given that we want #markup to auto filter and we using Twig's auto-escape.

pwolanin’s picture

Status: Needs review » Reviewed & tested by the community

hey ho, let's go.

lauriii’s picture

I created small CR for this.

alexpott’s picture

There is another reason why this is necessary and the following code does not work...

return [
  '#markup' => Html::escape($plain_text),
];

Xss:adminFilter() is not idempotent wrt strings it does not filter :( So

$plain_text = 'M & S';

$escaped_text = Html::escape($plain_text); // 'M &amp; S'
$output = $render->renderPlain['#markup' => $escaped_text]; // 'M &amp;amp; S'

:(

alexpott’s picture

Thank @lauriii for the CR - we need to update https://www.drupal.org/node/2549395 once this is committed too.

alexpott’s picture

Actaully #37 is fortunately wrong... yay...

$text = Drupal\Component\Utility\Xss::filterAdmin('M & S'); // 'M &amp; s'
$text2 = Drupal\Component\Utility\Xss::filterAdmin($text);  // 'M &amp; s'

The code in Xss::filter() is interesting.

  • catch committed 00d1bdb on 8.0.x
    Issue #2555931 by alexpott, pwolanin, joelpittet: Add #plain_text to...
catch’s picture

Status: Reviewed & tested by the community » Fixed

OK so I liked using the element on the other issue, but it clearly does not work with #item_list.

Adding #plain_text and ditching #safe_strategy is great.

Committed/pushed to 8.0.x, thanks!

stefan.r’s picture

Status: Fixed » Needs work
  1. +++ b/core/lib/Drupal/Component/Utility/Html.php
    @@ -374,6 +374,10 @@ public static function decodeEntities($text) {
    +   * '#markup' is not recommended. Use the '#plain_text' key instead and the
    +   * renderer will autoescape the text.
    

    Maybe "s/and the/so (that) the/" to clarify that it's usage of that key that causes the auto-escaping?

  2. +++ b/core/lib/Drupal/Core/Render/Renderer.php
    @@ -756,4 +758,48 @@ protected function xssFilterAdminIfUnsafe($string) {
    +   * By default all #markup is filtered to protect against XSS using the admin
    +   * tag list. Render arrays can alter the list of tags allowed by the filter
    +   * using the #allowed_tags property. This value should be an array of tags
    +   * that Xss::filter() would accept. Render arrays can escape text instead
    +   * of XSS filtering by setting the #plain_text property instead of #markup. If
    

    s/filtered to protect against XSS using the admin tag list/filtered using the the admin tag list to protect against XSS/

    s/using the/by using the/

    s/tags that Xss::filter() would accept/the tags that Xss::filter() must allow/

    s/instead of #markup/instead of the #markup property/

  3. +++ b/core/lib/Drupal/Core/Render/Renderer.php
    @@ -756,4 +758,48 @@ protected function xssFilterAdminIfUnsafe($string) {
    +   *   A render array with #markup set.
    

    with #markup and/or #plain_text set (though I think it we should only allow "or")

  4. +++ b/core/lib/Drupal/Core/Render/Renderer.php
    @@ -756,4 +758,48 @@ protected function xssFilterAdminIfUnsafe($string) {
    +   * @return \Drupal\Component\Utility\SafeStringInterface|string
    

    Are we missing a [] here?

  5. +++ b/core/lib/Drupal/Core/Render/Renderer.php
    @@ -756,4 +758,48 @@ protected function xssFilterAdminIfUnsafe($string) {
    +   *   The escaped markup wrapped in a SafeString object. If
    +   *   SafeMarkup::isSafe($elements['#markup']) returns TRUE, it won't be
    +   *   escaped or filtered again.
    

    We escape if it's #plain_text, otherwise we filter so let's say "sanitized" instead of "escaped"?

    The second sentence here is a bit confusing and I don't know that it even belongs in the @return, should we just drop it?

    Should we also explain that under which circumstances this will return a regular string? I.e. when the contents of #markup are already in the safe list or we pass in an empty string?

  6. +++ b/core/lib/Drupal/Core/Render/Renderer.php
    @@ -756,4 +758,48 @@ protected function xssFilterAdminIfUnsafe($string) {
    +   * @see \Drupal\Component\Utility\Xss::adminFilter()
    

    Xss::filterAdmin()

  7. +++ b/core/lib/Drupal/Core/Render/theme.api.php
    @@ -267,11 +267,37 @@
      * - #markup: Specifies that the array provides HTML markup directly. Unless
    ...
    + * - #plain_text: Specifies that the array provides text that needs to be
    

    These not only specify what the array provide, should we make explicit that they also provide the actual value, even if it's obvious from the code example?

  8. +++ b/core/lib/Drupal/Core/Render/theme.api.php
    @@ -267,11 +267,37 @@
    + *   vectors. (I.e, <script> and <style> are not allowed.) See
    

    I.e.

  9. +++ b/core/lib/Drupal/Core/Render/theme.api.php
    @@ -267,11 +267,37 @@
    + * - #allowed_tags: If #markup is supplied this can be used to change which tags
    + *   are using to filter the markup. The value should be an array of tags that
    + *   Xss::filter() would accept. If #plain_text is set this value is ignored.
    

    "which tags are using" is missing a "we", but maybe "which tags to use for filtering the markup"?

    Also, "s/tags that Xss::filter() would accept/the tags that Xss::filter() must allow/"

    As to the last sentence, I wonder if we should have an assertion disallowing setting of both #allowed_tags and #markup at the same time, instead of spelling out these edge cases?

  10. +++ b/core/tests/Drupal/Tests/Core/Render/RendererTest.php
    @@ -101,6 +101,19 @@ public function providerTestRenderBasic() {
    +      '#plain_text' => '<em>foo</em>',
    +      '#markup' => 'bar',
    

    So this would give an assertion error if we did assert(!(isset('#plain_text') && isset('#markup')))

stefan.r’s picture

blegh, crossposted with a commit :)

stefan.r’s picture

This is blocking deprecation of SafeMarkup::checkPlain() / SafeMarkup::escap so no reason to revert, let's open a follow-up: #2559577: Follow up to #2555931: Fix comments

stefan.r’s picture

Status: Needs work » Fixed
star-szr’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.