Problem/Motivation

As background, custom menu links can have both a "title" and "description". The "title" is rendered as the link text, while the "description" is rendered in the <a> element's "title" attribute. With that (confusing) terminology in mind, consider the following. The following code has 2 XSS vectors:

$untrusted_user_input_1 = 'javascript:alert(1)';
$untrusted_user_input_2 = '" onmouseover="alert(2)" onmouseout="';
$mistakenly_filtered_user_input_2 = check_markup($untrusted_user_input_2, 'restricted_html');
SafeMarkup::format('<a href="@url" title="@description">@title</a>', [
  '@title' => 'Click me',
  '@url' => $untrusted_user_input_1,
  '@description' => $mistakenly_filtered_user_input_2,
]);

Note that custom menu links themselves do not have this XSS vector, because:

  1. The <a> is not assembled via SafeMarkup::format(), but via Url::fromUri() and $options['attributes'], which are both safe from this.
  2. The "description" is never treated as HTML (check_markup() isn't called on it).

But if a contrib module happens to have a similar use case without these mitigating factors, then it could be vulnerable.

Proposed resolution

Option 1

Decide and document that t() and SafeMarkup::format() are not designed to be all-purpose HTML building functions. They are designed for simple translation and placeholder replacement, and therefore do not need to support placeholder insertion into non-URL attributes.

Option 2

Create a new placeholder prefix (~variable?) for non-URL attributes.

Remaining tasks

User interface changes

API changes

Data model changes

CommentFileSizeAuthor
#140 interdiff-113rebase-115.txt13.28 KByesct
#140 2570431-115.patch15.97 KByesct
#134 interdiff-129-134.txt1.12 KBstefan.r
#134 2570431-134.patch13.09 KBstefan.r
#131 2570431-129.patch13.08 KBstefan.r
#131 interdiff-114-129.txt4.51 KBstefan.r
#129 2570431-128.patch13 KBstefan.r
#129 interdiff-114-128.txt4.38 KBstefan.r
#116 2570431-114.patch13.06 KBstefan.r
#115 2570431-114.patch0 bytesstefan.r
#113 interdiff.2570431.107.113.txt4.18 KByesct
#113 2570431.113.patch13.16 KByesct
#107 interdiff.2570431.101.107.txt5.52 KByesct
#107 2570431-107.patch12.3 KByesct
#101 interdiff-2570431-98-101.txt760 bytesyesct
#101 2570431-101.patch10.72 KByesct
#98 interdiff-2570431-94-98.txt7.29 KByesct
#98 2570431-98.patch10.72 KByesct
#95 2570431-94.patch11.1 KBstefan.r
#95 interdiff-89-94.txt3.86 KBstefan.r
#88 placeholders-2570431-89.patch10.54 KBxjm
#87 interdiff-84-87.txt2.65 KBxjm
#87 placeholders-2570431-87.patch2.65 KBxjm
#84 2570431-81.patch9.8 KBstefan.r
#84 interdiff-74-81.txt11.29 KBstefan.r
#74 2570431-74.patch8.33 KBstefan.r
#74 interdiff-69-74.patch1008 bytesstefan.r
#69 2570431-69.patch8.4 KBheine
#66 interdiff.txt6.83 KBDavid_Rothstein
#66 2570431-66.patch8.98 KBDavid_Rothstein
#62 2570431-61.patch7.93 KBcatch
#59 2570431-59.patch8.01 KBstefan.r
#59 interdiff-57-59.txt2.29 KBstefan.r
#59 2570431-59.patch8.01 KBstefan.r
#57 2570431-57.patch7.95 KBstefan.r
#57 interdiff-50-57.txt3.22 KBstefan.r
#50 interdiff.txt5.02 KBeffulgentsia
#50 2570431-50.patch7.16 KBeffulgentsia
#44 2570431-44.patch4.64 KBstefan.r
#44 interdiff-36-44.txt3 KBstefan.r
#43 2570431-43.patch3.5 KBstefan.r
#36 2570431-36.patch2.19 KBcatch
#17 2570431-16.patch31.61 KBDavid_Rothstein
#9 2570431-9.patch953 bytesDavid_Rothstein

Comments

effulgentsia created an issue. See original summary.

plach’s picture

I may be wrong, but I think this is addressed by the HtmlAttributeOutputFormatter introduced in #2509218: Ensure that SafeString objects can be used in non-HTML contexts.

David_Rothstein’s picture

Priority: Major » Critical

So if things like t('...<a title="@title">...') and t('...<img alt="@alt">...') aren't guaranteed to be safe in Drupal 8, that seems like a critical issue to me (and it's a regression from Drupal 7). Those are relatively common situations.

My first thought was that maybe we should address this directly:

Xss::filter() only filters XSS for a string used within an HTML fragment, not within an attribute value.

For example, Xss:filter() currently escapes standalone < and > characters; perhaps it should also escape standalone " and '? This could be the simplest solution, but also kind of scary to be sure that it covers all cases.

A dedicated placeholder like "~variable" would work, but feels pretty unsatisfying. It leaves @variable as insecure (big change from Drupal 7), and is adding a new placeholder that would only be used in very specific situations (title/alt/data-* attributes are the only places, at least currently). And it breaks the general idea that you should be able to use @variable any time you have arbitrary user input going into a place that is intended to accept arbitrary text.

If this was being designed from the beginning I would say that we should seriously consider having @variable mean "force this to be plain text" (like Drupal 7) and then some other placeholder specifically for passing in HTML, since it's really the caller who knows whether a string is expected to be plain text or not. But that would be a big change now.

David_Rothstein’s picture

I may be wrong, but I think this is addressed by the HtmlAttributeOutputFormatter introduced in #2509218: Ensure that the results of t() can be used as plain text for non-HTML contexts.

From reading that issue it's not obvious to me how, but it was also the first time I saw it :) So in an example similar to the issue summary here, like this:

t('<a title="@title">@link-text</a>', array(
  '@title' => check_markup($user_input, 'restricted_html'),
  '@link-text' => check_markup($other_user_input, 'restricted_html'),
));

How would your code in that issue know that @title must be forced to plain text, but @link-text is OK to remain as HTML?

plach’s picture

Nope, sorry, I misunderstood the issue. I thought it was about situations like:

<img title="<?php print t('Welcome, @user_name', ...) ?>" />
David_Rothstein’s picture

Hm, well now that you mention it, that seems like a good example also (it's related, but not quite the same as, the examples above). It's another case where the Drupal 7 code would be safe but the Drupal 8 code isn't, since Drupal 8 doesn't currently guarantee that @user_name will be plain text.

Clearly this issue is also somewhat related to #2569041: Figure out what to do about attribute filtering in Twig (similar examples as above, but in Twig templates rather than t()).

plach’s picture

#2509218: Ensure that SafeString objects can be used in non-HTML contexts is proposing to change the meaning we give to placeholders: from denoting the kind of formatting that should be applied, we'd switch to describe the type of value that is going to be replaced. This is consistent with #2565895: Add a new :placeholder to SafeMarkup::format() for URLs that handles bad protocols, in fact @value would be a generic string and :url a more specific URL. How those are formatted depends on the specified output formatter, which in turn is tied to the response content type. In this scenario a new placeholder indicating an HTML attribute value would make sense, especially because the native format of strings is supposed to be HTML:

t('<a title="^title" href=":url">@link-text</a>', [
  '^title' => check_markup($user_input, 'restricted_html'),
  '@link-text' => check_markup($other_user_input, 'restricted_html'),
  ':url' => Url::fromRoute('<current>'),
]);
stefan.r’s picture

What kind of attributes would we support here, double and single-quoted only? on* and style would be out as well.

David_Rothstein’s picture

Status: Active » Needs review
StatusFileSize
new953 bytes

If this was being designed from the beginning I would say that we should seriously consider having @variable mean "force this to be plain text" (like Drupal 7)

I'm wondering how many tests will fail if we just do that - trying it in the attached patch. It actually might not be too many, since these days it's pretty hard to get HTML into a t() @variable in the first place (that's why the examples in the issue summary here use check_markup(), which is a somewhat unlikely situation).

If there are failures, many could be due to double-escaping, and there are other ways to solve those if we had to (such as a separate safe list for "plain text" only).

The current behavior of @variable, which skips converting something to plain text if it's already been transformed into XSS-safe HTML, really doesn't make much sense when you think about it - since plain text and HTML are just not the same thing.

plach’s picture

What kind of attributes would we support here, double and single-quoted only? on* and style would be out as well.

Well, if we don't want to parse the string (and I think we don't want to), it's hard to support only specific attributes. In that light @David_Rothstein's proposal of using :attr placeholders for any attribute and not just URLs would make sense. We could use SafeString::create() to explicitly opt-out from sanitization when we need to prevent the attribute value from being mangled.

Status: Needs review » Needs work

The last submitted patch, 9: 2570431-9.patch, failed testing.

The last submitted patch, 9: 2570431-9.patch, failed testing.

stefan.r’s picture

Well, if we don't want to parse the string (and I think we don't want to), it's hard to support only specific attributes.

I meant just documenting what is supported or not, i.e. "option 1" in the issue summary. If we at some point we wanted to be more thorough than that, maybe rather than parsing on run-time, misplaced placeholders could still be caught through automated source code analysis?

catch’s picture

Those are relatively common situations.

Do you have examples from contrib or custom code? Various people have claimed that they're not common and we shouldn't support them. I haven't personally looked. If it is relatively common, then agreed on the critical priority.

For example, Xss:filter() currently escapes standalone < and > characters; perhaps it should also escape standalone " and '? This could be the simplest solution, but also kind of scary to be sure that it covers all cases.

It's just not possible to have a single escaping function that works for both html fragments and attribute values. Most importantly you need to know attribute names (or at least types) before you can escape an attribute correctly. More on that in #2544110: XSS attribute filtering is inconsistent and strips valid attributes and Attributes #2567743: Add protocol filtering to the core Attribute component.

@David_Rothstein's proposal of using :attr placeholders for any attribute and not just URLs would make sense.

We can't do that because protocol filtering can only be applied to URIs. What gets stripped by protocol filtering might be fine in alt or title:

\Drupal\Component\Utility\Urlhelper::stripDangerousProtocols("Llamas: they're cute");
" they're cute"

Or even with a blacklist of protocols instead of a whitelist valid values could still break:

5] self> \Drupal\Component\Utility\Urlhelper::stripDangerousProtocols("javascript: reasons not to use it");
" reasons not to use it"

So if we do want to support non-URI attributes in t() I don't see a way to provide that context without a new placeholder (or parsing the whole string, but we've already agreed to avoid that).

David_Rothstein’s picture

Do you have examples from contrib or custom code? Various people have claimed that they're not common and we shouldn't support them.

I looked and couldn't find too many direct examples, but https://www.drupal.org/project/l10n_server has a bunch that look similar to this:

return t('approved by !author <span title="@ago">on @date</span>', $params)

(Probably not actually insecure in D8 either, since I don't think user input happens to get passed in for @ago - but of course it could be.)

However, I've seen tons of Drupal 7 examples where people build up an HTML string like "<a title="' . check_plain($variable) . '"> and "<div data-whatever="' . check_plain($variable) . '">. And according to https://www.drupal.org/node/2296163 a couple suggested ways to convert that code to Drupal 8 include Twig templates (thereby running into #2569041: Figure out what to do about attribute filtering in Twig which is arguably at least as serious an issue as this one) and SafeMarkup::format() (which is of course exactly this issue - changing title to clarify that). Therefore this pattern will be more likely in Drupal 8 than Drupal 7.

And more generally, <a href=":url"> is obviously a very common pattern in t() strings, so deciding you want to add a dynamic title attribute to that link is really something reasonable we should be supporting. If we don't, people will try title=":title" and see that it's buggy, then gravitate towards title="@title", which will appear to work just fine, and when it does many people will undoubtedly just use it (regardless of what our documentation says).

For example, Xss:filter() currently escapes standalone < and > characters; perhaps it should also escape standalone " and '? This could be the simplest solution, but also kind of scary to be sure that it covers all cases.

It's just not possible to have a single escaping function that works for both html fragments and attribute values. Most importantly you need to know attribute names (or at least types) before you can escape an attribute correctly.

Right, I wasn't suggesting that we ever recommend Xss:filter() for attribute values, just harden it so that in the event it does ever get used on the particular attribute values being discussed in this issue, it is XSS-safe for those. It is definitely a workaround rather than a solution that would address the root cause of this problem.

The more I look at it, the more I think the root cause really is that @variable should never be allowed to return HTML - if the caller says a string is supposed to be plain text, that is what they should always get. The current behavior could at the very least lead to bugs (in addition to the one security issue discussed here). Hence the patch I posted above. More on that in a minute.

David_Rothstein’s picture

Title: Decide if and how to support non-URL attribute values in t() placeholders » Decide if and how to support non-URL attribute values in t() and SafeMarkup::format() placeholders

changing title to clarify that

And now, actually doing that...

David_Rothstein’s picture

Status: Needs work » Needs review
StatusFileSize
new31.61 KB

So the patch in #9 has an incredible number of test failures, but a large number of them should be fixed by the patch in #2568977: Replace SafeMarkup::format() in the link generator - it's a bad example to everyone (which looks close to RTBC). So here's a version that adds the latest patch there on top of #9.

There will still be test failures here (or at least there should be) but I wonder if the others will fall into the category of "bad example of SafeMarkup::format() usage that should be removed anyway" too...

Status: Needs review » Needs work

The last submitted patch, 17: 2570431-16.patch, failed testing.

The last submitted patch, 17: 2570431-16.patch, failed testing.

effulgentsia’s picture

The more I look at it, the more I think the root cause really is that @variable should never be allowed to return HTML - if the caller says a string is supposed to be plain text, that is what they should always get.

What about the "@" symbol means that the caller is requesting it to be treated as plain-text? In D7, the documentation of it is:

 *   - @variable: Escaped to HTML using check_plain(). Use this as the default
 *     choice for anything displayed on a page on the site. 

I always interpreted that second sentence as the important one. And the "escaped using check_plain()" as an implementation detail. By far the most common uses of "@" (or any placeholders) in D7 is for "href" attributes and the content that goes into an HTML fragment. A couple years ago in D8, the "href" attributes were mostly converted to "!" and will soon be converted to ":". Which means every core usage of "@" (and I'm guessing at least 95% of contrib, unless there's data that suggests otherwise) will be for values that go into an HTML fragment. For values inserted into an HTML fragment, with the whole auto-escaping concept introduced into D8 due to Twig, I don't see why it makes sense for users of t() to need to distinguish already escaped (and possibly containing safe markup) placeholders from not yet escaped ones. E.g.,

t('Submitted by @username on @datetime', array('@username' => $variables['author'], '@datetime' => $variables['created']));

In this case, @username happens to have safe HTML (a link generated using proper APIs) and @datetime has a plain-text, not-yet-escaped value. Why should the person constructing the t() string need to know that though? The point of the t() string is to just say: put the following placeholders into here "safely", and in HTML-fragment-context, what HEAD does for "@" satisfies "safely".

For the example in #15:

return t('approved by !author <span title="@ago">on @date</span>', $params)

Ok, so there might be some few rare contrib cases where HTML attributes other than href are contained within t() strings. But even in the above case, the t() author is looking at 2 different placeholder types. Why not use "@" for the more common case of "in an HTML fragment" (as per HEAD), and something else for the less common case of "in an attribute"? Something like:

return t('approved by @author <span title="~ago">on @date</span>', $params)

or whatever other punctuation character to replace "~" if we don't like that one?

David_Rothstein’s picture

The test failures are way way reduced, though still a bit more than I was hoping for. Here's a couple examples of current code that is taking advantage of the ability of @variable to pass through HTML:

  1. Shortcut module does this (simplified version of the code):
      $link_text = t('Add to %shortcut_set shortcuts', array('%shortcut_set' => $shortcut_set->label()));
      $html = SafeMarkup::format('<span class="shortcut-action__message">@text</span>', array('@text' => $link_text));
    

    So this gets HTML (the <em> tag produced by the % placeholder) into @text by taking advantage of the fact that the first t() call marks its output as safe.

  2. The error handling system in core/includes/errors.inc does this (simplified version of the code):
      $html = SafeMarkup::format('%type: @message in %function (line %line of %file). <pre class="backtrace">@backtrace</pre>', array(
        '@message' => SafeString::create(Xss::filterAdmin($message)),
      ));
    

    So this is basically where SafeMarkup::xssFilter() would have been used in the past, but with that removed from core it makes up its own version of the same thing.

  3. The modules page does this in theme_system_modules_details() (simplified version of the code):
      $renderer = \Drupal::service('renderer');
      $description = t('Machine name: @machine-name', array('@machine-name' =>  $renderer->render($machine_name_render)));
    

    I guess that one is sort of the accepted method for getting HTML into @variable now.

It is notable that all 3 cases above could be fixed very easily with the current D8 code simply by changing @variable to !variable. However there is consensus elsewhere not to use !variable anymore. It seems pretty clear to me that we need some placeholder separate from @variable that allows passing in HTML, though (even if it's a new one).

For more information on why the current Drupal 8 behavior of trying to conflate plain text and HTML into the same @variable is problematic, http://acko.net/blog/safe-string-theory-for-the-web is a good read.

effulgentsia’s picture

One other option that is maybe worth suggesting: "@" is already a little magical in that it does something different for if the value is already SafeMarkup::isSafe() than if it's not. What about making it even more magical and having it check whether $string contains ="@foo"? Basically change:

if (!SafeMarkup::isSafe($value)) {
  $args[$key] = Html::escape($value);
}

to:

if (strpos($string, '="' . $key . '"') !== FALSE || !SafeMarkup::isSafe($value)) {
  $args[$key] = Html::escape($value);
}

I ran this idea by @alexpott and he wasn't happy with adding more magic like this, but if we really feel that supporting attributes in t() strings is necessary, and doing it via "@" is necessary, then I'm at a loss for how else to do that.

David_Rothstein’s picture

I cross-posted with #20, but hopefully the blog post linked above actually addresses some of those questions.

To paraphrase an example from the blog post (a non-security-related example):

  1. Someone types "The <b> tag is deprecated" into a text field, that is intended to take plain text as input.
  2. When that is displayed on the site, it needs to be rendered as plain text unconditionally. It would not make sense to ever display it as HTML, since it will mess up the display and mess up what the person was trying to say.
  3. Only the calling code can know whether it's supposed to be plain text or HTML - there is no way to determine that automatically.

To be clear, I'm not questioning the similar "escape only if not marked safe" behavior of the Twig system here - that is basically just "if we're about to print to HTML and no one told us that this string is safe yet, let's do something as secure as possible to avoid XSS issues". That definitely makes sense. But doing that same kind of logic early (in SafeMarkup::format())... that's the part I don't think makes sense.

effulgentsia’s picture

For more information on why the current Drupal 8 behavior of trying to conflate plain text and HTML into the same @variable is problematic, http://acko.net/blog/safe-string-theory-for-the-web is a good read.

I've read that, but don't see how that's an argument against the current behavior of "@". That article says:

Make your HTML templates escape dynamic variables implicitly if possible.

which we now do in Twig via {{ foo }}. And the "@" symbol in t() does exactly the same.

And:

The represented symbols are meaningless unless you know the context to interpret them in.

Right. That's what it's all about. We need to know the context of the placeholder, either by the caller of t() telling us (via a different prefix) or by us automagically figuring it out by parsing the string, as in #22. But within:

t('Submitted by @username on @datetime', array('@username' => $variables['author'], '@datetime' => $variables['created']));

Those two placeholders are output in the same context, so should use the same placeholder prefix, not a different one based on knowledge of the value's history.

David_Rothstein’s picture

But within:

t('Submitted by @username on @datetime', array('@username' => $variables['author'], '@datetime' => $variables['created']));

Those two placeholders are output in the same context, so should use the same placeholder prefix, not a different one based on knowledge of the value's history.

What would happen if the person's username is something like <bob> or I <3 Drupal? If it's not forced to display as plain text, it won't necessarily be displayed correctly.

effulgentsia’s picture

What would happen if the person's username is something like <bob> or I <3 Drupal?

Then SafeMarkup::isSafe() would return FALSE, so it would get escaped, both when printed from Twig, and when replaced via a "@" placeholder. In order for SafeMarkup::isSafe() to return true, something must have set it to true, which wouldn't be the case for those examples. Because SafeMarkup::isSafe() still uses a global registry, maybe you can come up with some other example of a username, which would be actual legit HTML that some other piece of code marked as safe, and then if you displayed that username during the same request as that other piece of code ran, it would display as HTML instead of getting escaped as it should in the context of a username. But I think that after #2557113: Make t() return a TranslationWrapper object to remove reliance on a static, unpredictable safe list, we'll be able to get rid of that global registry, so even that edge case won't be a problem anymore.

David_Rothstein’s picture

But I think that after #2557113: Make t() return a TranslationWrapper object to remove reliance on a static, unpredictable safe list, we'll be able to get rid of that global registry, so even that edge case won't be a problem

If there's no global registry, yes, that removes the chance of accidental collisions (due to interference from code that runs during an unrelated part of the request).

But does it remove the problem entirely? My understanding is that it would not solve the security issue described in the issue summary - or would it? I'm not fully clear on how the above issue works, but since you are familiar with it and didn't mark this issue as a duplicate of that one yet, I assume there is still a problem :)

It's also not too hard to think of non-security-related problems analogous to the example in the issue summary (and which have nothing to do with HTML attributes), e.g. something where you have generated HTML that is being displayed, and then also want to take the same HTML output and render it as plain text so the user can copy-paste the source code. Obviously the security problem involving HTML attributes is the most compelling, though.

Why not use "@" for the more common case of "in an HTML fragment" (as per HEAD), and something else for the less common case of "in an attribute"?

So we do seem to agree that there should be two separate placeholders, though based on the above I would phrase it differently:
1. "in an HTML fragment" => A placeholder that allows HTML (with some kind of safe fallback if it doesn't know the HTML is safe)
2. "in an attribute" => A placeholder that does not allow HTML and that always escapes to plain text

I agree there's no fundamental reason @ couldn't be used for the first and a new placeholder for the second. I think the reason I proposed the opposite is for consistency with Drupal 7. To have @ behave differently in Drupal 8 than it did in Drupal 7 (and in such a way that if you blindly upgraded your code without changing it, you could wind up with security issues), while at the same time introducing a new placeholder for Drupal 8 that behaves the exact same way @ did in Drupal 7... yeah, that scares me a bit :) But it would definitely be the least disruptive way to do it at this point.

effulgentsia’s picture

Will respond to #27 in a bit, but as a pre-requisite to that, I want to respond to #23:

Someone types "The <b> tag is deprecated" into a text field, that is intended to take plain text as input...Only the calling code can know whether it's supposed to be plain text or HTML

The difficulty though is: which calling code? Because in Drupal, there can be many pieces of code in between pulling a plain-text string out of the database and placing it inside of a larger string that requires translation.

A couple examples to illustrate:

  1. As #15 shows, there can be use cases for wanting to wrap a <span> tag around a date in order to provide a tooltip for how long ago that date is. With that in mind, consider the example from #20, which comes from template_preprocess_comment():
    $variables['created'] = format_date($comment->getCreatedTime());
    ...
    $variables['submitted'] = t('Submitted by @username on @datetime', array('@username' => $variables['author'], '@datetime' => $variables['created']));
    

    So here you can argue that the calling code knows that $variables['created'] ought to be treated as plain-text, since we just defined that variable earlier in the same function. However, how confident should this code be that format_date() has no desire to return legitimate HTML, such as the date string with a <span> around it? But also, what if you have some other module that wants to alter $variables['submitted'] to the following:

    function mymodule_preprocess_comment(&$variables) {
    $ago = ...;
    $created = drupal_render([
      '#type' => 'inline_template',
      '#template' => '<span title="{{ ago }}">{{ created }}</span>',
      '#context' => [
        'ago' => $ago,
        'created' => $created,
      ],
    ]);
    $variables['submitted'] = t('Submitted by @username on @datetime', array('@username' => $variables['author'], '@datetime' => $created));
    

    Now, you could argue that here the calling code knows that $variables['created'] is HTML so can use a different prefix for 'datetime', but doing so would mean requiring translators to translate a new string. But what if I want this module to be compatible with existing translations of that "Submitted by ..." string?

  2. Many translation strings use an entity label as a placeholder value. For example, DraggableListBuilder::buildRow() does the following:
    $row['weight'] = array(
      '#type' => 'weight',
      '#title' => t('Weight for @title', array('@title' => $entity->label())),
      ...
    

    Now, I would argue that $entity->label() should always be a plain-text string, but not everyone agrees with me on that. EntityInterface::label() does not say whether HTML is ok there or not. So suppose someone decides to write a module where a particular entity type's label() method returns a SafeString object that contains HTML. If we say that "@" means "always treat as plain text", then the above DraggableListBuilder code would end up double-escaping. Why should code like that have to think about whether HTML tags are allowed in entity labels and choose a prefix based on needing to guess correctly about that? The beauty of Twig auto-escaping and SafeString is it allows for far less code to need to be coupled to such considerations.

catch’s picture

Ok, so there might be some few rare contrib cases where HTML attributes other than href are contained within t() strings. But even in the above case, the t() author is looking at 2 different placeholder types. Why not use "@" for the more common case of "in an HTML fragment" (as per HEAD), and something else for the less common case of "in an attribute"?

So for me I think this is the simplest and most robust solution here. Just another placeholder, rarely used, that always calls Html::escape() (unless it's an AttributeSafeStringInterface, if we add that and there's a use case for it). String parsing the first argument might be fine in the end, but we're talking a couple of lines + docs for the placeholder to avoid that magic so it doesn't seem so bad to just do that.

David_Rothstein’s picture

So in response to #28, yes, if your code doesn't know the nature of the data it's passing to t() and wants to be flexible, it should use whatever placeholder allows HTML (and then leave it up to code which runs earlier to decide). It sounds like we would agree that writing a module which magically switches entity labels from plain text to HTML is easier said than done, but if you want to try to help support that kind of thing, it's what you would do. So for that, the patch in #17 would need to change more than just places that are failing tests.

And yes, it's a shame that the way translations work in Drupal, switching any of these placeholders breaks translations (even though it doesn't change anything that's actually translatable).

David_Rothstein’s picture

@catch, did you have any thoughts on this part?:

To have @ behave differently in Drupal 8 than it did in Drupal 7 (and in such a way that if you blindly upgraded your code without changing it, you could wind up with security issues), while at the same time introducing a new placeholder for Drupal 8 that behaves the exact same way @ did in Drupal 7... yeah, that scares me a bit :)

One thing worth pointing out here, the documentation for @ in Drupal 8 really needs to be fixed here. It is barely different than Drupal 7, but it's doing something very different underneath.

Current documentation (which in fact is just wrong):

   *   - @variable: Escaped to HTML using self::escape(). Use this as the
   *     default choice for anything displayed on a page on the site.

Possible new documentation (for whatever placeholder is used for HTML, assuming it's still @variable below):

    * - @variable: Displayed as HTML, with HTML sanitization performed
    *    before the string is passed in (see @link XYZ @endlink for details).
    *    If an unsanitized string is passed in, it is assumed to be plain text.
    *    Use this as the default choice for anything displayed on the site that
    *    can potentially contain HTML formatting. Do not use this for HTML
    *    attributes (which must always be plain text for security reasons) or
    *    in any other situation where you need to guarantee that the variable
    *    will be output as plain text; instead, use X or Y.
catch’s picture

@David yes I think that's unfortunate that the behaviour switches around like this, but removing the SafeMarkup::isSafe() check would be tricky since a fair bit of code relies on it.

Although while it currently contradicts the other criticals we have, another option would be to move the isSafe() check to !placeholder, and remove it from @.

The ! can be used for literal HTML strings (only if you mark them as safe first) and @ always escapes again. And the new URL placeholder stays as is.

That still gets us an extra placeholder in the end, it's just that it's the one used for HTML strings, not for attributes. With HTML strings being the more common (but still not exactly encouraged) case that might work, but really not sure.

And yes the docs around this really need an update.

David_Rothstein’s picture

Although while it currently contradicts the other criticals we have, another option would be to move the isSafe() check to !placeholder, and remove it from @.

That could work, although the reason I was suggesting a new placeholder above (rather than using !) is that it seems the decision elsewhere was already made to remove !placeholder ... and if we moved the new behavior to !placeholder then it would behave inconsistently between D7 and D8 also (although at least in a way that is more secure in D8 than D7, unlike @ which is currently less secure).

Bottom line:

  1. If this discussion were happening a year or two ago, I think it's a no-brainer that we'd have @placeholder behave the same between D7 and D8, and use a different placeholder for the new behavior being introduced in D8.
  2. At this point, that's a disruptive change, so we could instead keep @placeholder as is (less secure in D8 than D7) and a different placeholder for the old D7 behavior.

Seems like that's the decision that needs to be made.

It's also worth pointing out the effect of all this on %placeholder. Right now in Drupal 8 you can put HTML in %placeholder (unlike Drupal 7), which is nice -- but it can also produce invalid HTML somewhat easily, since not all HTML tags are allowed to be wrapped in <em>.

effulgentsia’s picture

we could instead keep @placeholder as is (less secure in D8 than D7) and a different placeholder for the old D7 behavior

In order to evaluate how good or bad this decision would be, I gathered some data on the prevalence of @placeholder usage in contrib (for 7.x), by downloading them all. Although that project lists ~12K modules, only ~8K downloaded successfully for me. From those 8K modules:

> grep -r --include="*.php" --include="*.module" --include="*.inc" "'@" * | wc -l
38171

In other words, ~38 thousand lines of code that use an @ placeholder somewhere (the single quote matches where you'd be populating $args).

> grep -r --include="*.php" --include="*.module" --include="*.inc" '="@' * | wc -l
4702

In other words, ~5 thousand lines of code where an @ placeholder is used within an attribute value. However, the vast majority of these are "href" attributes, which can be filtered out by:

> grep -r --include="*.php" --include="*.module" --include="*.inc" '[^f]="@' * | wc -l
49

In other words, only ~50 lines of code (among 8000 modules) that use an @ placeholder for some value of an attribute name that doesn't end in "f" (the only HTML attribute that ends in "f" is "href"). And 10 of those 50 are the title="@ago" use case in the l10n_server module per #15.

So picking a new placeholder prefix for the 0.1% case (non-href attributes compared to all usages of @) seems pretty low risk to me.

If there's no global registry, yes, that removes the chance of accidental collisions (due to interference from code that runs during an unrelated part of the request). But does it remove the problem entirely?

No, not entirely. It removes accidental collisions from unrelated variables, but this issue summary's security issue would still be triggered in situations where an @ placeholder is used within an attribute that's within the t() string, if the value itself comes from a function that marked it as safe while leaving HTML in it. Per the issue summary, that would include things like using a formatted text field's "processed" value in a "title" attribute. But most of the 49 cases above (including the @ago ones) don't do that, which further reduces the security risk from modules failing to update those cases to the new prefix.

David_Rothstein’s picture

Interesting data, though keep in mind my point in #15 that a lot of Drupal 7 code like <a title="' . check_plain($variable) . '"> and <div data-whatever="' . check_plain($variable) . '"> will get converted to SafeMarkup::format() for Drupal 8 too. The obvious choice is to use a @ placeholder, but currently that would be wrong.

catch’s picture

Status: Needs work » Needs review
StatusFileSize
new2.19 KB

Discussed this with @alexpott, @plach and @pwolanin at the sprint.

We think the answer to this, at least until an actual contrib or custom module needs to do this in a way which can not be handled via other options, is to say that we explicitly do not support non-URL attributes in SafeMarkup::format().

That doesn't lock us into adding yet-another-placeholder-type now, and it's possible to do it later if we really needed to.

Here's a patch to clean-up the current documentation and make it more unequivocal about the restriction. We can't get this 100% consistent until !placeholder is used, but I tried to make the rest more internally consistent at least.

catch’s picture

Title: Decide if and how to support non-URL attribute values in t() and SafeMarkup::format() placeholders » Document that non-URL attribute values in t() and SafeMarkup::format() are not supported

Status: Needs review » Needs work

The last submitted patch, 36: 2570431-36.patch, failed testing.

David_Rothstein’s picture

I disagree that this is a sufficient fix from a security perspective.

If we're not adding a new placeholder, the absolute bare minimum is to fix the @placeholder documentation as described earlier, and then also explicitly tell people what they can do if they do need an attribute value that doesn't work with :placeholder -- which as far as I can see, would basically be something like this:

SafeMarkup::format('<a title="@title">', array(
  '@title' => SafeString::create(Html::escape($message)),
);

Quite ugly, but not really any different from existing code (like #21.2 above).

effulgentsia’s picture

If we do add docs for #39, then I think we should point people towards options 1 or 2 from https://www.drupal.org/node/2296163 (i.e., something that uses a render array), not option 3 (SafeMarkup::format()). For example, if you're creating an <a> element (#39), shouldn't you be doing it with a '#type' => 'link' render array? If you're doing something for which there is no existing render element type, you can make a Twig file or inline Twig template. However, to document an example of doing it that way, we need #2569041: Figure out what to do about attribute filtering in Twig answered. The answer might be: <a title="{{ title|e('html_attr') }}">...</a>, but we won't know if that's really the answer until that issue's resolved.

But in short, if you're doing something not supported by SafeMarkup::format(), then don't use SafeMarkup::format() :)

plach’s picture

Status: Needs work » Needs review
stefan.r’s picture

Title: Document that non-URL attribute values in t() and SafeMarkup::format() are not supported » Document that non-"href"/"src" attribute values in t() and SafeMarkup::format() are not supported

As to the vague reference to "URL attribute values", we discussed this with @joelpittet and @alexpott at Drupalcon BCN and we can actually be explicit about what we support: just "src" and "href" for now. Any other use cases we can deal with as they come up - if developers use any other attribute, they likely shouldn't be using t() or SafeMarkup::format() in the first place.

stefan.r’s picture

StatusFileSize
new3.5 KB
stefan.r’s picture

StatusFileSize
new3 KB
new4.64 KB
catch’s picture

#39 the what you can do instead can be in the change notice, Drupal.org handbook and/or a topic on api.Drupal.org but it shouldn't be on the docs to Safemarkup::format() or t(). We can add an @see to one or more or those places of course.

rolodmonkey’s picture

@catch: Is there already a handbook page about formatting strings in code for D8? If so, can you provide a link? I would be willing to help update an existing page.

catch’s picture

@RoloDMonkey the only page I could find so far was https://www.drupal.org/node/2489544 (Drupal 7 version here https://www.drupal.org/writing-secure-code) or for Twig there is https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21Render%21...

Everywhere else refers to one of these two pages including for example https://api.drupal.org/api/drupal/core%21core.api.php/group/best_practic... although that links to the 7.x version at the moment.

rolodmonkey’s picture

Maybe we should create a child issue for the documentation?

Once this issue is committed, I would modify this page:

https://www.drupal.org/node/2489544

And probably link to:

- https://api.drupal.org/api/drupal/core%21includes%21common.inc/group/san...
- https://www.drupal.org/node/2357633

effulgentsia’s picture

Assigned: Unassigned » effulgentsia
Status: Needs review » Needs work

I'd like to make some changes to #44, so assigning to myself.

If there are any handbook pages, etc. that people want to edit in the meantime, go ahead, but it also might make sense to wait until this lands first.

effulgentsia’s picture

Assigned: effulgentsia » Unassigned
Status: Needs work » Needs review
StatusFileSize
new7.16 KB
new5.02 KB

Mostly I wanted to remove the reference to "src", because I think it's confusing to privilege "src" but not other, more common, attributes such as "title". So I tried to rewrite this in a way that made "href" the only special attribute.

I'm 9 hours behind Barcelona, so whoever works on this next, feel free to cherry pick between #44 and this and/or continue to tweak either until it's RTBCable.

dawehner’s picture

  1. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -168,12 +168,25 @@
    +   * addition to formatting it. Exception messages, messages for test
    +   * assertions, and variables concatenated without the insertion of
    +   * language-specific words or punctuation are some examples where translation
    +   * is not applicable and calling this function directly is appropriate.
    

    Note: We killed it entirely from exception messages. Exceptions are escaped on render time, which is not when they are created. It feels like this function should at least give pointers where to read more about it. Ideally there would be a help topic in an .api.php file giving an overview of the entire problem space, so people can get started with something.

  2. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -168,12 +168,25 @@
    +   * rather than this function. The "href" attribute is supported via the
    

    What about making a new paragraph here after "this function." Its a different type of documentation afterwards

  3. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -195,8 +208,8 @@
    -   *     this when using the "src" or "href" attributes, ensuring the value
    -   *     is always wrapped in quotes:
    +   *     this when using the "href" attribute, ensuring the value is always
    +   *     wrapped in quotes:
    

    IMHO it is odd to implicit not support images which are really helpful for documentation purposes.

effulgentsia’s picture

IMHO it is odd to implicit not support images which are really helpful for documentation purposes.

Interesting. I've never encountered images inside of t()/format_string() strings for projects I've worked on. What are examples where you have? If it really is a common need, then I'm open to explicitly ok'ing it in the docs, but my concern was if it's rare but explicitly documented as supported, then the line between where SafeMarkup::format() is appropriate and where Twig is more appropriate is harder to communicate.

dawehner’s picture

This is not strictly a case where it was used, but when we still had advanced help in views (Drupal 6/7) we had quite a couple
of images to help people guide on using Drupal. After your question, I also question my answer from before to be honest :) hook_help() is honestly more like a glorified
README. If you want to provide proper documentation you would using a different system anyway.

stefan.r’s picture

given we can't really think of valid use cases I'd be fine with just removing src for now

David_Rothstein’s picture

Title: Document that non-"href"/"src" attribute values in t() and SafeMarkup::format() are not supported » Document that certain (non-"href"/"src") attribute values in t() and SafeMarkup::format() are not supported and may be insecure

The main point here isn't what's supported, it's what's secure. This is a simple string-building function, and people will do whatever they want and whatever looks easiest, regardless of whether it is "supported" or not. But if you make clear that something is insecure, at least they might listen :)

I don't have time to work on this now (maybe later in the week). The patch doesn't look bad, and it's fine to recommend Twig templates, but it doesn't make clear the issues with @ and we still need something along the lines of #31/#39 that directly explains on the @ documentation that it produces HTML, that there can be a security issue if you put HTML in the wrong place, and what to do if you need to guarantee plain text. We don't necessarily need to give an example using an attribute specifically (it can be inferred anyway) and perhaps an example belongs more in the @see entirely (depends on how the wording comes out), but it's a pretty important point.

In terms of what page to link to, there is kind of a mess - there actually seems to be a third in addition to the two linked above, and only the first says anything about Drupal 8:
https://www.drupal.org/node/2489544
https://www.drupal.org/writing-secure-code
https://www.drupal.org/node/28984

For this:

+   * inconvenient. The other placeholder types are unsafe to use within any
+   * attribute and the ":variable" placeholder type may or may not be safe and
+   * may or may not produce the expected output when used for attributes other
+   * than "href".

What does "the :variable placeholder type may or may not be safe" mean... when would it be unsafe?

stefan.r’s picture

@David_Rothstein "style" and "on*" attributes are an example.

I'll take a stab at addressing the concern in #55 in another patch.

stefan.r’s picture

StatusFileSize
new3.22 KB
new7.95 KB

added more warnings about security and took out this bit so as to not repeat ourselves too much, and the last bit about unexpected output would rather belong on the :variable bullet point itself:

The other placeholder types are unsafe to use within any attribute and the ":variable" placeholder type may or may not be safe and may or may not produce the expected output when used for attributes other than "href".

lauriii’s picture

I was just wondering, shouldn't we include also documentation for that @ escaped strings are not safe when they are being printed in CSS or Javascript.

stefan.r’s picture

StatusFileSize
new8.01 KB
new2.29 KB
new8.01 KB
effulgentsia’s picture

Title: Document that certain (non-"href"/"src") attribute values in t() and SafeMarkup::format() are not supported and may be insecure » Document that certain (non-"href") attribute values in t() and SafeMarkup::format() are not supported and may be insecure
Assigned: Unassigned » David_Rothstein

@David_Rothstein: what do you think of #59?

stefan.r’s picture

  1. +++ b/core/lib/Drupal/Component/Utility/PlaceholderTrait.php
    @@ -33,7 +33,11 @@ protected static function placeholderFormat($string, array $args, &$safe = TRUE)
         foreach ($args as $key => $value) {
    
    @@ -49,11 +55,17 @@ protected static function placeholderFormat($string, array $args, &$safe = TRUE)
    +          // This placeholder type is for URLs and URLs are not markup, so we
    

    let's mention href here rather than URLs as well.

  2. +++ b/core/lib/Drupal/Component/Utility/PlaceholderTrait.php
    @@ -49,11 +55,17 @@ protected static function placeholderFormat($string, array $args, &$safe = TRUE)
    +          // (...) This placeholder type may therefore be used
    +          // within "href" attributes. (...)
    

    so that this becomes redundant, as href is the only possible use case

I'm also aware that not all David_Rothstein's concerns have been fully addressed yet, as we currently only merely mention Twig as a possible alternative if more extensive templating / markup use is needed. But while we may use other patterns in core at times, I think that really is the only thing we can actually recommend at this stage. The (simple and formatted) plain-text output formatters (#2509218: Ensure that SafeString objects can be used in non-HTML contexts) might be another possible recommendation as we finish those. If we still think this is incomplete I'd be interested the use cases for guaranteed plain text we may have missed.

We also need to improve the linked documentation in #55 by the time we finish the safe markup issues, right now linking to that in a @see feels a bit futile.

catch’s picture

StatusFileSize
new7.93 KB

I'm also aware that not all David_Rothstein's concerns have been fully addressed yet, as we currently only merely mention Twig as a possible alternative if more extensive templating / markup use is needed. But while we may use other patterns in core at times, I think that really is the only thing we can actually recommend at this stage.

Fully agreed with this.

Interdiff fail, here's the hunk I updated though:

+          // Escape if not already safe. SafeMarkup::isSafe() may return TRUE
+          // for content that is safe within HTML fragments, but not within
+          // other contexts, so this placeholder type must not be used within
+          // HTML attributes, JavaScript or CSS.
+          // @see \Drupal\Component\Utility\SafeMarkup::format()

joelpittet’s picture

Status: Needs review » Reviewed & tested by the community

I've read through these changes a few times and I think this is a great improvement on these docs.

It seems like others are mostly in agreement to this being an improvement and maybe if you disagree with the RTBC can point out the pieces that you agree with and we can commit just those to move forward by moving the contentious bit out or open a follow-up to add further details regarding security?

dawehner’s picture

Looks good for me. We still need a general overview I think, but at least this is something fine.

stefan.r’s picture

This merely changes docs - it seems this is pending on foma; feedback but if this doesn't materialize it shouldn't block commit as we can always change docs if there's big problems with them.

David_Rothstein’s picture

Assigned: David_Rothstein » Unassigned
Status: Reviewed & tested by the community » Needs review
StatusFileSize
new8.98 KB
new6.83 KB

How about this.

I replaced the "may be insecure" wording with direct statements about security risks. And clarified a few other things as well, especially the actual behavior of @variable (since that's the root cause of the confusion here).

I gave up on putting a @see in because I couldn't find any drupal.org documentation that looked ready enough to link to. However, this part which I added:

+   *       ....................................... The recommended method
+   *       to sanitize a variable to plain text is to use a render array:
+   *       @code
+   *       $build = ['#plain_text' => $text_to_be_sanitized];
+   *       $variable = \Drupal::service('renderer')->renderPlain($build);
...

looks like too much detail - it would be great to link somewhere else for that code example instead. Ideally that link would go to somewhere else in the D8 codebase (not drupal.org), but this functionality does not seem to be directly documented in D8 and I'm not sure where to put it. Also, that code is kind of crazy... but I don't see another recommended way to generate plain text and mark it as safe.

We may need to mark #2569041: Figure out what to do about attribute filtering in Twig as critical, since most of what we're doing here is passing the problem off to that issue.

stefan.r’s picture

#66 looks like an improvement. The bit with the render array example is a bit detailed but not excessively so.

Could we clean up/update/merge these pages and link to the resulting page whenever it's up-to-date?

https://www.drupal.org/node/2489544
https://www.drupal.org/writing-secure-code
https://www.drupal.org/node/28984

xjm’s picture

+++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
@@ -168,38 +168,62 @@ public static function checkPlain($text) {
+   *     - If you need to guarantee that a variable will be output as plain
+   *       text, do not rely on the fallback sanitization; instead, perform the
+   *       sanitization yourself before passing it in. The recommended method
+   *       to sanitize a variable to plain text is to use a render array:
+   *       @code
+   *       $build = ['#plain_text' => $text_to_be_sanitized];
+   *       $variable = \Drupal::service('renderer')->renderPlain($build);
+   *       $output = SafeMarkup::format('This is plain text: @variable', ['@variable' => $variable]);
+   *       @endcode

So I think this entire bullet should be removed. It frightens people unnecessarily into making bad choices and provides a code sample that is bad practice or an indicator of code smell to begin with. With the sample code, there is no reason for SafeMarkup::format() to have been used at all.

What we should be telling people is SafeMarkup:format() is not an API for writing markup.

Furthermore, following #2559971: Make SafeMarkup::format() return a safe string object to remove reliance on a static, unpredictable safe list, the only way that @placeholder would already be marked safe is if it were a SafeStringInterface object itself.

heine’s picture

StatusFileSize
new8.4 KB

Attaching reroll to remove the bullet point. Comment intro clearly indicates that markup building is not the goal of the function, leaving the rest as is.

David_Rothstein’s picture

With the sample code, there is no reason for SafeMarkup::format() to have been used at all.

Why? There is no HTML in the string, so it looks like the right thing to use.

A better example would be drupal_set_message(t('This is what you typed: @value')), and we had a few examples above where forcing those to plain text might still be desired even given the other issues (you can't always guarantee that no one filtered a variable before you got it).

David_Rothstein’s picture

However, it is maybe not that common to actually need to guarantee it, and this is sort of covered by the sentence above it ("with HTML sanitization performed by the caller before the string is passed in"). I think if we did link that to one of the documentation pages mentioned above, and that page actually explained how to sanitize strings (either filter XSS or convert to plain text) in a way that also marks them as safe the bullet point could be removed as it would be covered by the documentation page instead.

+   *     following restrictions:
+   *     - Never use this within HTML attributes, JavaScript, or CSS. Doing so
+   *       is a security risk.

If there's only one bullet point, it shouldn't be a bulleted list :)

stefan.r’s picture

I think if we did link that to one of the documentation pages mentioned above, and that page actually explained how to sanitize strings (either filter XSS or convert to plain text) in a way that also marks them as safe the bullet point could be removed as it would be covered by the documentation page instead.

For filter XSS it'd be renderPlain(['#markup'] => some html).. but I don't know that this is a pattern we should be encouraging - we just said that Safemarkup::format() ought to be only for simple use cases any more, for any other ones Twig can be used.

For convert to plain text we can run PlainTextOutput::renderFromHtml() on the output of t()/ SafeMarkup::format(). During discussions at Drupalcon BCN there was consensus on making both the input and the output of t() / SafeMarkup::format() be HTML markup. Then we can turn that into plain text using PlainTextOutput::renderFromHtml() (see https://www.drupal.org/node/2574697)

@David_rothstein I'd be interested in seeing the use cases we're trying to cater to here, that couldn't be covered by Twig?

David_Rothstein’s picture

We said that developers shouldn't be passing in complex HTML themselves (e.g., t('This link to <span class="whatever"><a href="http://example.com" class="something" title="something else">@variable</a></span> has too many HTML tags and attributes')), not that @variable shouldn't contain HTML. Those are different things. See my comment in #21 for several examples where core is currently passing HTML in to a variable placeholder, and @effulgentsia's comment in #28 for some other reasonable-looking cases.

But I actually agree with you that it's not a very common situation. (If you look through a Drupal 7 codebase for examples that look like t('...!variable...', array('!variable' => filter_xss[_admin]($variable))), you won't find much.) That is why earlier in this issue, I was suggesting that Drupal 8 should have some placeholder that (like @variable in Drupal 7) simply escapes the variable to plain text unconditionally -- I think it meets a need and solves the security problem here at the same time. But people don't seem to want to do that, and so now we are trying to document what we currently have instead...

stefan.r’s picture

StatusFileSize
new1008 bytes
new8.33 KB

Removed the bullet point

The last submitted patch, 74: interdiff-69-74.patch, failed testing.

stefan.r’s picture

@David_Rothstein yes the @variable placeholder can perfectly contain simple HTML that's already marked safe. Discussed this with @xjm and she will look at updating all these docs as we finish up the work on the sanitization and in any case she said this would be out of scope for this issue.

Right now I think the advice is to use SafeMarkup::format() for simple use cases and otherwise Twig, and for outputting plain text to use PlainTextOutput::renderFromHtml().

For the case of attributes - they're problematic because we still have the safe list. As we get rid of it we can keep discouraging usage of @ inside HTML attributes, but say "if you really want to do it anyway, do t('<a title="@title" href=":href">', ['@title' => (string) $title];

dawehner’s picture

dawehner’s picture

xjm’s picture

Status: Needs review » Needs work

Discussed with @dawehner and @stefan.r; couple points:

  1. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -168,38 +168,51 @@ public static function checkPlain($text) {
    +   * addition to formatting it. Exception messages, messages for test
    +   * assertions, and variables concatenated without the insertion of
    +   * language-specific words or punctuation are some examples where translation
    +   * is not applicable and calling this function directly is appropriate.
    

    I actually disagree with this. We should actually not use this for exception messages and we should try to get to a point where we don't do it in test assertion messages either. (@todo: Me to find the related issues or ask @alexpott.)

  2. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -168,38 +168,51 @@ public static function checkPlain($text) {
    +   * - Variable placeholders should not be used within HTML attributes; they
    

    It's actually much bigger than just inside attributes, it's anything within the HTML tag (between the < and the > for an HTML tag).

  3. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -168,38 +168,51 @@ public static function checkPlain($text) {
    +   *   - Secure: <a href=":variable">link text</a>
    

    Let's add a second secure example after this to make it clear that hardcoding a static attribute is fine. E.g.:
    - Secure: <a href=":variable" title="My hardcoded title attriubte">link text</a>

  4. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -168,38 +168,51 @@ public static function checkPlain($text) {
    +   *   - Insecure: <a href=":variable" title="@variable">link text</a>
    

    To cover the examples of how it is unsafe to use placeholders inside an HTML tag, we could also add:

    -Insecure: <@tag>This is very unsafe</@tag>
    - Insecure: <a @tag>This is also pretty unsafe</a>
    

  5. +++ b/core/lib/Drupal/Core/Template/Attribute.php
    @@ -41,7 +41,7 @@
    - * input as a value for an attribute that expects an URI (href, src, ...),
    + * input as a value for an attribute that expects a URI (href, src, ...),
    

    Technically not in scope. :P

dawehner’s picture

Points:

  • We should add the <a href=":variable" title="This is a static string case
  • should we explain why putting it into attributes is unsafe?
  • Exception messages should NOT use it, there is simply no reason for it. Tests feel similar but I think we settled on that. For exceptions we converted all of them though.
  • minimal vs. trivial is a bit confusing

The last submitted patch, 74: interdiff-69-74.patch, failed testing.

The last submitted patch, 74: 2570431-74.patch, failed testing.

xjm’s picture

stefan.r’s picture

Status: Needs work » Needs review
StatusFileSize
new11.29 KB
new9.8 KB
xjm’s picture

For reference: #2514044: Do not use SafeMarkup::format in exceptions is the issue where we removed all SafeMarkup::format() use from exception messages.

For the test assertion messages, @alexpott pointed out that:

  1. We should not document things about the testing framework in runtime code.
  2. Each test assertion message already suggests using SafeMarkup::format() for the message (since 2012 I think when we removed t() from them)
  3. In many places, tests use SafeMarkup::format() to provide HTML formatting and single escaping with <pre> tags and so forth for presentation in the SimpleTest UI. The result gets saved to the database and then the final output is XSS-filtered. So it's not feasible to change this in 8.0.x and there is no issue for it yet, but improvements could go into future minor versions.

Thence @stefan.r removed those examples from the latest patch.

xjm’s picture

Assigned: Unassigned » xjm
+++ b/core/lib/Drupal/Component/Utility/FormattableString.php
@@ -17,21 +17,35 @@
+ *   - Insecure: <@variable>text</@variable>
+ *   - Insecure: <a href=":variable" title="@variable">link text</a>
+ *   Only the "href" attribute is supported via the special ":variable"
+ *   placeholder, to allow simple links to be inserted:
+ *   - Secure: <a href=":variable">link text</a>
+ *   - Secure: <a href=":variable" title="static text">link text</a>
+ *   - Insecure: <a @variable>link text</a>
+ *   - Insecure: <a href="@variable">link text</a>

The order here is a mite confusing. I tried to explain in a comment but I think it'll be easier just to upload an updated example, so doing that now. :)

xjm’s picture

Assigned: xjm » Unassigned
StatusFileSize
new2.65 KB
new2.65 KB

So these are the updates I made. However, I then started reading PlaceholderTrait and realized the docs now half overlap and half don't between the string object and the trait. It would probably be better to consolidate the docs a bit, as we previously worked to consolidate the main warnings about sanitization on t().

But my brain isn't up for that atm, so here are the edits I made. I simplified some language and added some clarifications.

xjm’s picture

StatusFileSize
new10.54 KB

Fail.

The last submitted patch, 87: placeholders-2570431-87.patch, failed testing.

The last submitted patch, 87: placeholders-2570431-87.patch, failed testing.

stefan.r’s picture

interdiff-84-88?

xjm’s picture

  1. +++ b/core/lib/Drupal/Component/Utility/FormattableString.php
    @@ -10,28 +10,44 @@
    + * This class is designed for formatting messages that are mostly text,
    + * not as an HTML template language. As such:
    + * - The passed-in string should contain minimal HTML.
    ...
    + * To build HTML that cannot meet the above restrictions, use an HTML template
    + * language, such as Twig, rather than this function.
    

    These bits are still relevant to this class if rearranged.

  2. +++ b/core/lib/Drupal/Component/Utility/FormattableString.php
    @@ -10,28 +10,44 @@
    + * - Variable placeholders should not be used within the "<" and ">" of an
    + *   HTML tag, such as in HTML attribute values, because this would be a
    + *   security risk. Examples:
    + *   - Insecure: <@variable>text</@variable>
    + *   - Insecure: <a @variable>link text</a>
    + *   - Insecure: <a title="@variable">link text</a>
    + *   Only the "href" attribute is supported via the special ":variable"
    + *   placeholder, to allow simple links to be inserted:
    + *   - Secure: <a href=":variable">link text</a>
    + *   - Secure: <a href=":variable" title="static text">link text</a>
    + *   - Insecure: <a href="@variable">link text</a>
    + *   - Insecure: <a href=":variable" title="@variable">link text</a>
    

    So this whole section is more about how to use placeholders, so I think it should be on PlaceholderTrait. That would also allow us to consolidate the docs a bit.

  3. +++ b/core/lib/Drupal/Component/Utility/FormattableString.php
    @@ -10,28 +10,44 @@
    + * See the documentation of
    + * \Drupal\Component\Utility\PlaceholderTrait::placeholderFormat() for
    + * details on the supported placeholders.
    

    This hunk should probably be moved up and added to the first paragraph, since PlaceholderTrait is the thing that actually does all this stuff and has the docs on how exactly it works. And actually, I'd also add the words "...and how to use them securely. Incorrect use of this function class can result in security vulnerabilities."

xjm’s picture

@stefan.r, the interdiff is the one in #87, I just happened to upload it as the patch as well. :P

stefan.r’s picture

I agree with thise comments, could anyone roll a patch as Im on my phone? :)

stefan.r’s picture

StatusFileSize
new3.86 KB
new11.1 KB
justachris’s picture

+++ b/core/lib/Drupal/Component/Utility/FormattableString.php
@@ -10,28 +10,34 @@
+ * This class is designed for formatting messages that are mostly text,
+ * not as an HTML template language. As such:
+ * - The passed-in string should contain minimal HTML.
+ * - The passed-in string should not be used within the "<" and ">" of an
+ *   HTML tag, such as in HTML attribute values. This would be a
+ *   security risk.

Drive-by nitpicking, but now that the number of points here are reduced, and the starts of each are repetitive, this should be something like:

...
not as an HTML template language. As such, the passed-in string:
- Should contain minimal HTML.
- Should not be used within...
dawehner’s picture

Assigned: Unassigned » yesct

I think this is ready now. We documented enough, its verbose enough but its not too verbose.
The FormattableString | TranslatedString point to the placeholder trait, which has the documentation.

yesct’s picture

Assigned: yesct » Unassigned
StatusFileSize
new10.72 KB
new7.29 KB

@justAChris Thanks for reviewing this. I addressed that feedback.

I also read the entire patch,

and fixed a few nits like: "Lines containing comments (including docblocks) must wrap as close to 80 characters as possible without going over" https://www.drupal.org/node/1354#drupal and "If you use a namespace on a class anywhere in documentation, always make sure it is a fully-qualified namespace (beginning with a backslash)." https://www.drupal.org/node/1354#classes

and I took out an unrelated change, so we are touching one less file

and this documentation is really great and super helpful and clear.

dawehner’s picture

Status: Needs review » Reviewed & tested by the community

Looks perfect for me now. Thank you @yesct

yesct’s picture

oops. I found one small thing. wait I will fix it now.

+++ b/core/lib/Drupal/Component/Utility/PlaceholderTrait.php
@@ -15,33 +15,51 @@
+   * - Should not be used within the "<" and ">" of an HTML tag, such as in HTML
+   * attribute values. This would be a security risk. Examples:

missing indent.

yesct’s picture

StatusFileSize
new10.72 KB
new760 bytes

here is the indent.

dawehner’s picture

Great catch!

catch’s picture

Status: Reviewed & tested by the community » Needs work

A few bits left for me, in general this looks great though.

  1. +++ b/core/lib/Drupal/Component/Utility/PlaceholderTrait.php
    @@ -15,33 +15,51 @@
    +   *   attribute values. This would be a security risk. Examples:
    +   *   - Insecure: <@variable>text</@variable>
    +   *   - Insecure: <a @variable>link text</a>
    +   *   - Insecure: <a title="@variable">link text</a>
    

    This looks like a description of the first argument,but it's actually talking about the return value. Can we use something other than @variable, like $return? Or possibly we should use actual @code examples to make this clearer.

  2. +++ b/core/lib/Drupal/Component/Utility/PlaceholderTrait.php
    @@ -15,33 +15,51 @@
    +   * To build HTML that cannot meet these restrictions, use an HTML template
    +   * language such as Twig, rather than this trait.
    

    We don't want to use the results of a Twig template as an HTML attribute either much, which is what much of the previous docs were talking about.

  3. +++ b/core/lib/Drupal/Component/Utility/PlaceholderTrait.php
    @@ -93,11 +120,16 @@ protected static function placeholderFormat($string, array $args, &$safe = TRUE)
    +          // invoke Html::decodeEntities() on it prior to passing it in as a
    

    Should we point to PlainTextOutput::renderFromHtml() now?

alexpott’s picture

+++ b/core/lib/Drupal/Component/Utility/PlaceholderTrait.php
@@ -15,33 +15,51 @@
+   *     caller before the string is passed in. If an unsanitized string (that
+   *     has never been marked as HTML-safe) is passed in, it will be
+   *     automatically sanitized using \Drupal\Component\Utility\Html::escape().
+   *     Use this placeholder as the default choice for anything displayed on
+   *     the site, but not within HTML attributes, JavaScript, or CSS. Doing so
+   *     is a security risk.

This feels like it written for when we have static safe list. If the placeholder is a string it will escaped - if it is an object that implements SafeStringInterface it will not be. If you want to ensure that escaping occurs you can cast to string. This becomes even more true after #2575615: Introduce HtmlEscapedText and remove SafeMarkup::setMultiple() and SafeMarkup::getAll() and remove the static safeStrings list

yesct’s picture

I will work on the feedback from #103 and #104

yesct’s picture

going to assume #2571695: Remove !placeholder and unsafe string return from SafeMarkup::format() will go in first, so ... not focusing on docs related to things that will be removing. (I am still working on the feedback for things that will stay.)

yesct’s picture

Status: Needs work » Needs review
StatusFileSize
new12.3 KB
new5.52 KB

partial work on #104

I have some questions and concerns still.

and have not started on @catch's concerns yet. (I will)

yesct’s picture

dawehner’s picture

We should certainly add some code examples, but beside from that and the third bullet point wiht some odd sentence strucutre.

yesct’s picture

@catch please clarify,
#103 point 2.

We don't want to use the results of a Twig template as an HTML attribute either much, which is what much of the previous docs were talking about.

Related #2569041: Figure out what to do about attribute filtering in Twig

Maybe a todo in the code to update the docs when that figures out what to do?

catch’s picture

The comments both talk about using attributes within t() and using the result of t() within attributes. Gets a bit confusing which it's talking about sometimes.

yesct’s picture

two candidate links for referencing which attributes take url value
http://stackoverflow.com/questions/2725156/complete-list-of-html-tag-att...
and what it references
http://www.w3.org/TR/REC-html40/index/attributes.html

yesct’s picture

Status: Needs review » Needs work
StatusFileSize
new13.16 KB
new4.18 KB

I'm going to dinner, so posting this. (if someone picks this up to work on it, please post a comment when you start.)

It will need to be redone to apply to head since #2571695: Remove !placeholder and unsafe string return from SafeMarkup::format() went in.

then very much should do #103 point 1.

yesct’s picture

I will work on this while on the plane.

stefan.r’s picture

Status: Needs work » Needs review
StatusFileSize
new0 bytes

Reroll with #103.1 addressed. Removed references to the list in #122 as later on in this issue it was decided to support the href attribute only.

stefan.r’s picture

StatusFileSize
new13.06 KB

YesCT sorry didn't see that!

The last submitted patch, 115: 2570431-114.patch, failed testing.

The last submitted patch, 115: 2570431-114.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 116: 2570431-114.patch, failed testing.

stefan.r’s picture

Status: Needs work » Needs review

Testbot acting up on a docs only patch

The last submitted patch, 113: 2570431.113.patch, failed testing.

The last submitted patch, 113: 2570431.113.patch, failed testing.

catch’s picture

+++ b/core/lib/Drupal/Component/Utility/PlaceholderTrait.php
@@ -15,35 +15,86 @@
+   * This trait is designed for formatting messages that are mostly text, not as
+   * an HTML template language. As such, the passed-in string:
+   * - Should contain minimal HTML.
+   * - Should not be used within the "<" and ">" of an HTML tag, such as in HTML
+   *   attribute values. This would be a security risk. Examples:
+   *   @code

Sorry for the not very clear comment earlier.

So with this, it's conflating the 'passed in string' - i.e. first argument to t(). This should contain minimal HTML.

However 'Should not be used within the "<" and ">" of an HTML tag - this is not referring to the string argument, but the return value of __toString (i.e. once arguments are replaced).

We now have a way to support using the return value in attributes via PlainTextOutput::renderFromHtml() and directly in https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21Template%...

This is the bit that I think is still confusing. Very important to differentiate between the first argument and the return value and ideally keep the docs about each as far away from each other as possible.

David_Rothstein’s picture

+   *   - @variable: Use when the return value is to be used as HTML. When the
+   *     placeholder replacement value is:
+   *     - A string, it will be sanitized using
+   *       \Drupal\Component\Utility\Html::escape().
+   *     - A SafeStringInterface object, the object will be cast to a string and
+   *       not sanitized.
+   *     - A SafeStringInterface object cast to a string, forces sanitization.
+   *       Since it is not known what method a SafeStringInterface object could
+   *       have been santitized with, to force sanitization to happen, the
+   *       caller method could cast the object as a string before passing it in
+   *       as the replacement value.
+   *     Use this placeholder as the default choice for anything displayed on
+   *     the site, but not within HTML attributes, JavaScript, or CSS. Doing so
+   *     is a security risk.

This doesn't seem to match the current behavior of Drupal 8 - SafeMarkup::isSafe() still can return TRUE and bypass sanitization when a string is passed in. #2575615: Introduce HtmlEscapedText and remove SafeMarkup::setMultiple() and SafeMarkup::getAll() and remove the static safeStrings list would make it true.

I think it's also a bit too technical compared to the earlier patches, and it's missing one of the key points (if you want to display a string as HTML, you're responsible for filtering XSS before passing it in or else you'll just get plain text instead). The caller cares more about what they need to do (call XYZ sanitization functions if they are passing in HTML, or do XYZ if they are passing in plain text) than the details of SafeStringInterface.

Perhaps this issue should be postponed on #2575615: Introduce HtmlEscapedText and remove SafeMarkup::setMultiple() and SafeMarkup::getAll() and remove the static safeStrings list (?), but without that issue, really the only change we need to make to the @variable documentation in the earlier patches (from #101 and earlier) is to change things like "...string is passed in..." to "...value is passed in...".

David_Rothstein’s picture

To address @catch's comment in #123, could we also go back to something more like the previous wording:

This trait is designed for formatting messages that are mostly text, not as an HTML template language. As such:

  • The passed-in string should contain minimal HTML.
  • Variable placeholders should not be used within the "<" and ">" of an HTML tag. This would be a security risk. Examples:

...

That was actually the original intention - to talk about how variable placeholders should be used. What can be done with the return value is kind of a different question.

David_Rothstein’s picture

+   *     // Insecure (the "@variable" placeholder does not filter dangerous
+   *     // attributes):
+   *     $this->placeholderFormat('<a href="@variable">link text</a>', [':variable' => $variable]);

This example isn't insecure, just broken :) Replace ":variable" with "@variable" to make it insecure.

Also "filter dangerous attributes" should probably be more like "filter dangerous URLs" (or "protocols").

stefan.r’s picture

I think #125 makes sense - and in #126 I did mean to say protocols, and @variable :)

catch’s picture

Status: Needs review » Needs work

#125 changes sound good and nice catch with #126.

stefan.r’s picture

StatusFileSize
new4.38 KB
new13 KB

+ * - @variable: Use when the return value is to be used as HTML. When the

The return value of SafeMarkup::format() is always assumed to be used as HTML - for other contexts we have OutputStrategyInterface::renderFromHtml()... or is this talking about some other return value?

stefan.r’s picture

Status: Needs work » Needs review
stefan.r’s picture

StatusFileSize
new4.51 KB
new13.08 KB

Actually the comment in #129 applies to the %variable documentation as well.

catch’s picture

+++ b/core/lib/Drupal/Component/Utility/FormattableString.php
@@ -10,28 +10,33 @@
+ *   attribute values. This would be a security risk.

This still talks about the passed-in string where it really means the return value. Should we just drop this whole section of docs and leave it to the @see PlaceholderTrait?

catch’s picture

For me that's the last thing blocking commit (although I've also said that before in this issue...). I think we should get what's here in, then iterate further either as part of #2575551: Document the Drupal 8 sanitization API in the API docs and on Drupal.org, or after that issue is fixed if we need to.

stefan.r’s picture

StatusFileSize
new13.09 KB
new1.12 KB

Copied that bit from PlaceholderTrait just now, it's important information so may be worth it to repeat ourselves.

catch’s picture

Status: Needs review » Reviewed & tested by the community

That's the other way to do it, and makes sense until such time as when we merge in docs from traits on api.drupal.org

RTBC for me.

The last submitted patch, 129: 2570431-128.patch, failed testing.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 134: 2570431-134.patch, failed testing.

catch’s picture

Status: Needs work » Reviewed & tested by the community

/me pats PIFR on the head

catch’s picture

Issue tags: +beta target
yesct’s picture

StatusFileSize
new15.97 KB
new13.28 KB

Peter and I worked on this on the plane. I dont know what we want to do with this work.. but here it is.. in case it is useful, maybe in a follow-up

yesct’s picture

hiding mine from the summary table.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

This is a big improvement on the docs. I've also credited @joelpittet and myself since we were present and many discussions of the this issue. Committed 4d8e010 and pushed to 8.0.x. Thanks!

@catch is opening a follow up to continue to improve the docs as the API is refined.

  • alexpott committed 4d8e010 on 8.0.x
    Issue #2570431 by stefan.r, YesCT, David_Rothstein, xjm, catch,...
alexpott’s picture

Accidentally committed #140. Fixed that.

  • alexpott committed 4a8366b on 8.0.x
    Revert "Issue #2570431 by stefan.r, YesCT, David_Rothstein, xjm, catch,...
  • alexpott committed ba36dea on 8.0.x
    Issue #2570431 by stefan.r, YesCT, David_Rothstein, xjm, catch,...
yesct’s picture

@alexpott @catch which issue is the one mentioned in #142?

Is it #2578377: Make translatable docs consolidated and better for developers ?

I'm gonna work on some of the changes from #140 and see if they might still be appropriate.

yesct’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.