Problem/Motivation

SafeString/SafeStringInterface describes a string that is safe for use as an HTML node/fragment. This is now (or will be soon) the return value of t() and SafeMarkup::format().

However, it is not uncommon to want to use those strings in e-mail titles or HTML attributes - such as select list options.

Drupal 7 via !placeholder allowed you to create a string with no HTML escaping - this meant that some t() calls essentially returned plain text rather than HTML, specifically for e-mail subjects but this can open sites up to XSS when they're used elsewhere.

Drupal 8 with SafeString can similarly open up sites to XSS, given that the SafeString itself won't be HTML escaped when used in an HTML attribute. The result of t() and SafeMarkup are always HTML. When you manually create a SafeString it may have been XSS fitlered or Html::escape()d and we don't know.

Then on top of this, it's easy to get double escaping.

We need to consistently treat both the first argument and return values of t() and SafeMarkup::format() as HTML, so that we can translate that HTML to other formats like plain text, retaining the original meaning of the string.

Proposed resolution

Create a simple interface with a method to for HTML to text strategy classes

This is basically just a way to document the correct way to take the HTML string returned from translation wrapper and convert it to plain text.

pros: this will work for other functions that return HTML, and is consistent with how we handle e-mail bodies. Also it would handle any markup added by translators.
cons: it's not obvious why you'd take what might potentially have been a plain text string in the first place, return it 'as HTML', then convert it back to plain text again. But we will just have to document that the "transport protocol" we use for transporting strings through drupal (whether they are HTML or not) is HTML.

Remaining tasks

Reviews

User interface changes

None

API changes

None

Data model changes

None

CommentFileSizeAuthor
#166 2509218-165.patch5.61 KBjhedstrom
#166 interdiff.txt678 bytesjhedstrom
#164 interdiff.txt2.46 KBwim leers
#164 2509218-163.patch5.81 KBwim leers
#162 Screen Shot 2015-09-25 at 14.26.52.png38.43 KBwim leers
#161 2509218-161.patch5.8 KBjhedstrom
#161 interdiff.txt797 bytesjhedstrom
#154 safe_markup-contexts-2509218-153.patch5.81 KBplach
#153 safe_markup-contexts-2509218-153.interdiff.txt2.28 KBplach
#148 safe_markup-contexts-2509218-148.patch5.88 KBplach
#148 safe_markup-contexts-2509218-148.interdiff.txt847 bytesplach
#145 2509218-145.patch5.88 KBstefan.r
#145 interdiff-144-145.txt1.55 KBstefan.r
#144 safe_markup-contexts-2509218-144.patch5.99 KBplach
#144 safe_markup-contexts-2509218-144.interdiff.txt6.7 KBplach
#132 increment.txt633 bytespwolanin
#132 2509218-132.patch9.99 KBpwolanin
#126 2509218-126.patch9.98 KBstefan.r
#126 interdiff-122-126.txt1.74 KBstefan.r
#122 2509218-122.patch10.03 KBimiksu
#122 interdiff.txt1.3 KBimiksu
#120 2509218-120.patch10.03 KBstefan.r
#120 interdiff-119-120.txt711 bytesstefan.r
#119 2509218-119.patch10.02 KBstefan.r
#119 interdiff-111-119.txt865 bytesstefan.r
#111 increment.txt3.39 KBpwolanin
#111 2509218-111.patch10.04 KBpwolanin
#108 increment.txt1.38 KBpwolanin
#108 2509218-107.patch9.56 KBpwolanin
#103 2509218-99.patch9.68 KBstefan.r
#103 interdiff-94-99.txt12.04 KBstefan.r
#101 2509218-98.patch9.64 KBstefan.r
#101 interdiff-94-98.patch11.96 KBstefan.r
#94 interdiff.txt5.17 KBdawehner
#94 2509218-94.patch9.04 KBdawehner
#88 safe_markup-contexts-2509218-88.patch8.86 KBplach
#88 safe_markup-contexts-2509218-88.interdiff.txt3.75 KBplach
#86 safe_markup-contexts-2509218-86.interdiff.txt9.4 KBplach
#86 safe_markup-contexts-2509218-86.patch7.76 KBplach
#83 interdiff.txt2.59 KBdawehner
#83 2509218-83.patch5.23 KBdawehner
#82 2509218-82.patch2.63 KBpwolanin
#80 2509218-80.patch1.67 KBpwolanin
#61 safe_markup_contexts-2509218-61.patch23.6 KBalmaudoh
#61 interdiff.txt14.2 KBalmaudoh
#59 safe_markup_contexts-2509218-56.patch14.19 KBalmaudoh
#56 safe_markup-contexts-2509218-56.review.txt14.19 KBplach
#56 safe_markup-contexts-2509218-56.patch66.5 KBplach
#54 safe_markup-contexts-2509218-54.patch57.01 KBplach
#54 safe_markup-contexts-2509218-54.review.txt15.87 KBplach
#54 safe_markup-contexts-2509218-54.interdiff.txt821 bytesplach
#51 safe_markup-contexts-2509218-49.review.txt15.6 KBplach
#49 safe_markup-contexts-2509218-49.patch56.75 KBplach
#49 safe_markup-contexts-2509218-49.interdiff.txt14.76 KBplach
#40 safe_markup-contexts-2509218-41.patch49.76 KBplach
#40 safe_markup-contexts-2509218-41.interdiff.txt8.91 KBplach
#18 interdiff.txt1.3 KBsubhojit777
#18 make_behave_like_in-2509218-18.patch18.95 KBsubhojit777
#13 interdiff.txt1.94 KBeffulgentsia
#13 2509218.13.patch14.38 KBeffulgentsia
#12 2509218.12.patch12.32 KBeffulgentsia
#8 2509218.8.patch12.32 KBalexpott
#4 interdiff.txt672 byteseffulgentsia
#4 2509218-4.patch21.24 KBeffulgentsia
#2 2509218-2.patch20.47 KBeffulgentsia
SafeMarkup-remove-passthrough.patch2.75 KBeffulgentsia

Comments

Status: Needs review » Needs work

The last submitted patch, SafeMarkup-remove-passthrough.patch, failed testing.

effulgentsia’s picture

Status: Needs work » Needs review
StatusFileSize
new20.47 KB

Status: Needs review » Needs work

The last submitted patch, 2: 2509218-2.patch, failed testing.

effulgentsia’s picture

Status: Needs work » Needs review
StatusFileSize
new21.24 KB
new672 bytes
xjm’s picture

Priority: Normal » Major

+1 for "deprecating" the option this way.

It's interesting that the patch is green when my Views patch has a failure though -- didn't figure out how that happened yet. :)

+++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
@@ -209,38 +209,28 @@ public static function checkPlain($text) {
-   *   - !variable: Inserted as is, with no sanitization or formatting. Only
-   *     use this when the resulting string is being generated for one of:
-   *     - Non-HTML usage, such as a plain-text email.
-   *     - Non-direct HTML output, such as a plain-text variable that will be
-   *       printed as an HTML attribute value and therefore formatted with
-   *       self::checkPlain() as part of that.
-   *     - Some other special reason for suppressing sanitization.

I think we need to keep documentation of this indicating it is a legacy placeholder type that now behaves the same as @ but is deprecated in 8.0.x and that support for it will be removed before 9.0.0.

effulgentsia’s picture

Title: Make ! behave like @ in SafeMarkup::format() » Allow t() to work for non-HTML text
Status: Needs review » Needs work

#2558791: "!"-prefixed tokens should Xss::filterAdmin() but not affect safeness takes a different (and IMO, better) approach to the ! semantics. But I still think the 'html' => FALSE option for t() is worth adding, so retitling.

joelpittet’s picture

+++ b/core/lib/Drupal/Core/StringTranslation/TranslationManager.php
@@ -176,6 +171,26 @@ protected function doTranslate($string, array $options = array()) {
+    $html = isset($options['html']) ? $options['html'] : TRUE;

Could this be a bit more generic so it's inline with other strategies in twig? Doesn't have to be exact 'is_safe' => ['html']

But the idea is that you can make it safe for different contexts. ['js', 'html'] or ['all'], what do you think?
https://github.com/twigphp/Twig/blob/5fdbd991bfcf5ea2492c9ab074a2d2cde18...
https://github.com/twigphp/Twig/blob/5fdbd991bfcf5ea2492c9ab074a2d2cde18...

Also I have tests over here #2531824: Attribute class to check safe strings before escaping (has tests) for a bug that looks like it's partially addressed by this patch for feed_icon.

alexpott’s picture

Status: Needs work » Needs review
StatusFileSize
new12.32 KB

I think given #2557113: Make t() return a TranslationWrapper object to remove reliance on a static, unpredictable safe list we have another option here. If t() returns a TranslationWrapper then we can add a method on it to do this. See patch.

Status: Needs review » Needs work

The last submitted patch, 8: 2509218.8.patch, failed testing.

stefan.r’s picture

I like the idea in #8, it would solve part of the concern in #2558791: "!"-prefixed tokens should Xss::filterAdmin() but not affect safeness

  1. +++ b/core/lib/Drupal/Core/StringTranslation/TranslationInterface.php
    @@ -136,4 +136,31 @@ public function formatPluralTranslated($count, $translation, array $args = array
    +   * Never call translate($user_text) where $user_text is text that a user
    

    s/translate/translateForNonHtml/

  2. +++ b/core/lib/Drupal/Core/StringTranslation/TranslationInterface.php
    @@ -136,4 +136,31 @@ public function formatPluralTranslated($count, $translation, array $args = array
    +   * entered; doing so can lead security problems. The output of this method is
    +   * intended for non-HTML usages. If the caller wants to ensure no HTML tags
    

    Should we be more forceful about how dangerous this is if it ends up in HTML (or javascript)?

  3. +++ b/core/lib/Drupal/Core/StringTranslation/TranslationInterface.php
    @@ -136,4 +136,31 @@ public function formatPluralTranslated($count, $translation, array $args = array
    +   *   on the first character of the key, the value is escaped and/or themed.
    

    Maybe just "wrapped in <em>" as opposed to themed?

  4. +++ b/core/lib/Drupal/Core/StringTranslation/TranslationManager.php
    @@ -265,4 +265,15 @@ public function getNumberOfPlurals($langcode = NULL) {
    +  public function translateForNonHtml($string, array $args = array(), array $options = array()) {
    +    $string = $this->doTranslate($string, $options);
    

    Just an idea, but if we're worried about developers not thinking when they use this, maybe a 'strip_tags' key in the options array that defaults to TRUE...

  5. +++ b/core/lib/Drupal/Core/StringTranslation/TranslationWrapper.php
    @@ -128,4 +145,13 @@ public function jsonSerialize() {
    +   * Returns the translation unescaped for usages in non-HTML.
    

    Should we list a few example use cases here, i.e. error logs and plain text email messages?

  6. +++ b/core/modules/contact/contact.module
    @@ -7,8 +7,10 @@
    +
    

    newline

stefan.r’s picture

+++ b/core/lib/Drupal/Core/StringTranslation/TranslationManager.php
@@ -265,4 +265,15 @@ public function getNumberOfPlurals($langcode = NULL) {
+      $string = strtr($string, $args);

If we have a string that's intended for non-HTML output only this muddles the meaning of the placeholder prefixes a bit as both @, ! and any other prefix would output the unescaped/unfiltered string... but I guess there's no way around that when we have a TranslationWrapper that has a single translatable string that may be output to both HTML and non-HTML.

effulgentsia’s picture

Status: Needs work » Needs review
StatusFileSize
new12.32 KB

Just a reroll.

effulgentsia’s picture

Title: Allow t() to work for non-HTML text » Make ! behave like @ in SafeMarkup::format(), and add a new API for t() to work for non-HTML text
StatusFileSize
new14.38 KB
new1.94 KB

The last submitted patch, 12: 2509218.12.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 13: 2509218.13.patch, failed testing.

The last submitted patch, 12: 2509218.12.patch, failed testing.

The last submitted patch, 13: 2509218.13.patch, failed testing.

subhojit777’s picture

Status: Needs work » Needs review
StatusFileSize
new18.95 KB
new1.3 KB

Reducing the number of failing tests

subhojit777’s picture

+++ b/core/modules/system/src/Tests/Common/XssUnitTest.php
@@ -41,7 +41,7 @@ function testT() {
-    $this->assertEqual($text, 'Verbatim text: <script>', 't replaces verbatim string as-is.');
+    $this->assertEqual($text, 'Verbatim text: &lt;script&gt;', 't replaces and escapes string.');
   }
 
   /**
diff --git a/core/modules/system/src/Tests/Theme/TwigTransTest.php b/core/modules/system/src/Tests/Theme/TwigTransTest.php

diff --git a/core/modules/system/src/Tests/Theme/TwigTransTest.php b/core/modules/system/src/Tests/Theme/TwigTransTest.php
index 21aef18..d30e753 100644

index 21aef18..d30e753 100644
--- a/core/modules/system/src/Tests/Theme/TwigTransTest.php

--- a/core/modules/system/src/Tests/Theme/TwigTransTest.php
+++ b/core/modules/system/src/Tests/Theme/TwigTransTest.php

+++ b/core/modules/system/src/Tests/Theme/TwigTransTest.php
+++ b/core/modules/system/src/Tests/Theme/TwigTransTest.php
@@ -139,7 +139,7 @@ protected function assertTwigTransTags() {

@@ -139,7 +139,7 @@ protected function assertTwigTransTags() {
     );
 
     $this->assertRaw(
-      'PAS-THRU: &"<>',
+      'PAS-THRU: &amp;&quot;&lt;&gt;',

I have just updated the tests since now ! will behave similar to @.

Status: Needs review » Needs work

The last submitted patch, 18: make_behave_like_in-2509218-18.patch, failed testing.

The last submitted patch, 18: make_behave_like_in-2509218-18.patch, failed testing.

joelpittet’s picture

We need to move #2506445-170: Replace !placeholder with @placeholder in t() and format_string() for non-URLs in tests here. This is the diff:

+++ b/core/modules/contact/src/Tests/ContactPersonalTest.php
@@ -79,12 +79,12 @@ function testSendPersonalContactMessage() {
-      '!site-name' => $this->config('system.site')->get('name'),
-      '!subject' => $message['subject[0][value]'],
-      '!recipient-name' => $this->contactUser->getUsername(),
+      '@site-name' => $this->config('system.site')->get('name'),
+      '@subject' => $message['subject[0][value]'],
+      '@recipient-name' => $this->contactUser->getUsername(),
...
-    $this->assertEqual($mail['subject'], t('[!site-name] !subject', $variables), 'Subject is in sent message.');
-    $this->assertTrue(strpos($mail['body'], 'Hello ' . $variables['!recipient-name']) !== FALSE, 'Recipient name is in sent message.');
+    $this->assertEqual($mail['subject'], t('[@site-name] @subject', $variables), 'Subject is in sent message.');
+    $this->assertTrue(strpos($mail['body'], 'Hello ' . $variables['@recipient-name']) !== FALSE, 'Recipient name is in sent message.');

These are for email tokens.

xjm’s picture

Status: Needs work » Postponed

I think we should probably postpone this on #2557113: Make t() return a TranslationWrapper object to remove reliance on a static, unpredictable safe list at this point, since that will change this patch.

xjm’s picture

xjm’s picture

Title: Make ! behave like @ in SafeMarkup::format(), and add a new API for t() to work for non-HTML text » Add new API for t() to work for non-HTML text
Issue tags: +Needs followup

Also, I think we should actually separate the two scopes of this issue. "Add new API for t() to work for non-HTML text" can mean simply using the new behavior from #2557113: Make t() return a TranslationWrapper object to remove reliance on a static, unpredictable safe list to allow @placeholder to be used for emails since they would then be escaped at the same time as other strings, which will improve the DX overall by simplifying things.

However, deprecating !placeholder and making it behave like @placeholder is much broader in scope and does not have consensus. We may actually remove it instead, and anyway fixing the test failures for that change will duplicate efforts on #2506445: Replace !placeholder with @placeholder in t() and format_string() for non-URLs in tests and friends.

Can we create a separate issue for that if we still think it's worth doing (as opposed to just removing them all)?

xjm’s picture

Title: Add new API for t() to work for non-HTML text » Allow t() to work for non-HTML text
xjm’s picture

Also, there are some incorrect hunks in the current patch.

effulgentsia’s picture

In #18, the working assumption was that once #2557113: Make t() return a TranslationWrapper object to remove reliance on a static, unpredictable safe list lands, the API for getting a translated string for non-HTML use (such as email subject or JSON output) would be something like:

t(...)->getTranslationForNonHtml()

with that method name still open for discussion (other options might be ->toPlainText(), etc.).

On a hangout earlier today with a bunch of people, @catch brought up another idea:

Html::toText(t(...))

where that toText() method could call MailFormatHelper::htmlToText() (or we move that implementation into the Html class).

A benefit of that is that it could be the same API for when we want to convert the HTML output of something other than t() to plain-text, such as:

Html::toText($token_service->replace(...))

A drawback might be that if you look at that current implementation of MailFormatHelper::htmlToText(), maybe that's a lot of specific choices on what to do that don't really make sense for strings coming out of t()? Then again, maybe they do?

effulgentsia’s picture

Priority: Major » Critical

Also, raising this to Critical, because it's impossible to remove the last usages of !placeholder from core, such as:

$message['subject'] .= t('[!site-name] !subject', $variables, $options);

without it.

xjm’s picture

One thing #28 does not mention is the risk of adding "!placeholder by another name".

Also, I'm confused that all the suggestions in #28 involve adding methods. I thought that the idea was that converting to late rendering of the translations would mean that the placeholders could be escaped then, in the render process, so they wouldn't be escaped otherwise, same as the expectation for Twig.

dawehner’s picture

Also, I'm confused that all the suggestions in #28 involve adding methods. I thought that the idea was that converting to late rendering of the translations would mean that the placeholders could be escaped then, in the render process, so they wouldn't be escaped otherwise, same as the expectation for Twig.

Well yeah at some point though you need to convert the object to a string. Maybe we could also go with TranslationWrapper->render($strategy)

catch’s picture

Title: Allow t() to work for non-HTML text » Ensure that the results of t() can be used as plain text for non-HTML contexts
Issue summary: View changes
Status: Postponed » Active

Re-titled and updated issue summary to try to summarize the two approaches.

Also if we go for option #2, this wouldn't need to be postponed, so moving back to active for the discussion at least.

plach’s picture

Not sure whether this is BS, since I just started to get my feet wet with the SafeMarkup stuff, but it seems to me that here we are needing an alternative sanitization logic. If we were able to instantiate a different sanitization service depending on the mime type of the output, we could use placeholders semantically and let each service figure out the most appropriate sanitization strategy. We could default to text/html as sanitization context and allow to specify alternative ones via $options. For example:

$args = ['@user_name' => $account->getName()];
// HTML email, @user_name is escaped.
t('Welcome @user_name!', $args);
// Plain text email, @user_name is not escaped.
t('Welcome @user_name!', $args, ['output' => 'text/plain']);
dawehner’s picture

Well, the problem is that like for tokens, the place which generates the t() cannot know yet, how its gonna be used, so making it lazy would help with that.
@plach
I think we should do something, that is as parallel as twig, which means you would pass a sanitization strategy?

catch’s picture

Well, the problem is that like for tokens, the place which generates the t() cannot know yet, how its gonna be used, so making it lazy would help with that.

So that's true for t() strings that end up in e-mail bodies - but we (should) already use MailHelper::htmlToText() for that.

I'm not sure it's true for e-mail subjects though - otherwise those places wouldn't be explicitly using the !placeholder to avoid sanitization.

i.e. $message['subject'] .= t('[!site-name] !subject', $variables, $options); from contact.module

dawehner’s picture

Well, IMHO for email subjects we should be able to strip all tags.

plach’s picture

@dawehner:

Well, the problem is that like for tokens, the place which generates the t() cannot know yet, how its gonna be used, so making it lazy would help with that.

On one hand we cannot have reliable output sanitization without knowing the output format, OTOH I realize that code generating token might miss the required contextual information, at least currently.

I think we'd have two ways to cope with this:

  • We require a string requiring sanitization to be provided the output format as contextual information, as I was suggesting in the example above. For tokens this would mean passing the output format in $options, as we are doing for language right now. How we'd implement that practically I'm not sure about: maybe a user could choose between [title] and [title:plain] or stuff like that.
  • Indeed a better solution DX-wise could be to lazily sanitize/stringify dynamic strings. For instance we could extend the approach introduced in #2557113: Make t() return a TranslationWrapper object to remove reliance on a static, unpredictable safe list and introduce a DynamicStringWrapper that could be passed around until contextual information about output format is finally available. This would allow us to have something like the following:
    $dynamic_string = t('Welcome, @user_name!', ['@user_name' => $account->getName()]);
    
    // Render the string as HTML, inherit Twig autoescaping.
    $build = ['#markup' => $dynamic_string];
    
    // Send as email content.
    function mymodule_send_welcome_email(DynamicStringWrapper $dynamic_string) {
      $dynamic_string->setOutputFormat('text/plain');
      // Send plaintext email, no escaping.
    }
    
    // Include it as HTML attribute, e.g. <input type="submit" value="$dynamic_string">.
    function _drupal_render_attribute(DynamicStringWrapper $dynamic_string) {
      $dynamic_string->setOutputFormat('text/html; x-drupal-context: attribute');
      // Escape attribute delimiters and type-specific content, e.g. URL protocol.
    }
    

    A token value could simply be wrapped into a child TokenStringWrapper having simply @value as hard-coded string pattern.
    In this scenario @value and :url placeholders would determine the value's semantics instead of the sanitization logic: the former would indicate any plain string, while the second would indicate a URL string. This way, depending on the output format, we would know whether escaping the value or sanitizing the URL protocol is needed, for instance.

Well, IMHO for email subjects we should be able to strip all tags.

I think this is not enough: if the string was "HTML-escaped" previously, stripping tags would have no effect and lots of bogus &lt; and &gt; could creep into the mail subject.

catch’s picture

Just discussed this with plach, I'd thought that #2569485: Add AttributeSafeStringInterface and UriAttributeSafeStringInterface was a competing approach to this, but actually I think we should use both.

You get a $translated_string object.

The object has a ->renderAsHtmlAttribute() method.

This strips tags (to avoid <em> tags either being invalidly not escaped, or uglily escaped in an HTML attribute.

It also encodes entities to avoid XSS.

And returns an AttributeSafeStringInterface.

You can then pass an AttributeSafeStringInterface into AttributeString, so that it doesn't end up getting run through Html::escape() when it's already in the right format.

If we only did renderAsPlainText() then that might be OK for an e-mail subject but it's not necessarily OK for an HTML attribute.

If we only do AttributeSafeStringInterface then you can still end up with either escaped or unescaped HTML tags in attributes.

If we do both all cases are covered and it's clear what should be used for what.

plach’s picture

Assigned: Unassigned » plach
Issue tags: +D8 Accelerate

I'll experiment a bit with this, although it would be better to get #2557113: Make t() return a TranslationWrapper object to remove reliance on a static, unpredictable safe list in first.

plach’s picture

Status: Active » Needs review
StatusFileSize
new8.91 KB
new49.76 KB

Here is a first stab, just to get an idea of how things could look like. This includes also #2557113-185: Make t() return a TranslationWrapper object to remove reliance on a static, unpredictable safe list, the interdiff is the new code.

Status: Needs review » Needs work

The last submitted patch, 40: safe_markup-contexts-2509218-41.patch, failed testing.

The last submitted patch, 40: safe_markup-contexts-2509218-41.patch, failed testing.

catch’s picture

So right now the first argument to t() is literal HTML and does not get escaped or filtered again (since it's marked as a SafeString). We also allow translated strings to have HTML in them.

We should probably make that more explicit than it currently is in the documentation, https://api.drupal.org/api/drupal/core%21includes%21bootstrap.inc/functi... doesn't really make it clear that the first argument is HTML at all, except saying don't put variables in there.

Given in this issue we're talking about three different formats of returned string - HTML + plain text + HTML attribute, we need to figure out what that looks like.

Let's say we start with the following string:

t('The &lt;em&gt; tag makes your text look like <em>"this"</em>.')

1. HTML:

Nothing happens, you get this back:

The &lt;em&gt; tag makes your text look like <em>"this"</em>.

Which will look like

The <em> tag makes your text look like "this".

in the browser (i.e. a select option or the title tag of an image or whatever).

2. Plain text:

plach suggested html_entity_decode(strip_tags($string)); The other option would be factoring out https://api.drupal.org/api/drupal/core!lib!Drupal!Core!Mail!MailFormatHe...

That would return:

The <em> tag makes your text look like "this".

Fine for e-mail subject lines and similar.

3. HTML attribute.

For this, I think we'd want to take the plain text string, then Html::escape() it, so:

Html::escape(html_entity_decode(strip_tags($string)));

That gets us:

'The &lt;em&gt; tag makes your text look like &quot;this&quot;.

Which will look like:

The <em> tag makes your text look like "this".

In the browser.

Having typed that out makes me wonder the following (assuming the above is what we want, it might not be):

If we can get a plain text output from t(), then we can pass that plain text to Attributes, and AttributeString would Html::escape() it - and maybe that's enough of an API for passing the results of t() to attributes (which Views currently does).

catch’s picture

Then that same example with arguments:


t('The @tag makes your text look like @result', ['@tag' =>'<em>', '@result' => SafeString::create('<em>"this"</em>']);

1. HTML:

First argument is escaped, second argument is marked as a SafeString so is not escaped. Return is correct HTML.

2. Plain text:

First argument is not escaped- this is fine.

Second argument is ... we could strip_tags() safe string arguments but erggh.

I think it'd be easier to get the HTML string first, then apply the same html_entity_decode(strip_tags($string)) to that. Gets us the same result regardless of whether the HTML is in the string or replacements then.

3. Attributes, as before we just Html::escape() the plain text value.

stefan.r’s picture

Html::escape(html_entity_decode(strip_tags($string)));

Hmm, not sure this is secure for all attributes, we may still need some special cases for other attributes such as on* then, and think about what to do about attributes that are not wrapped in double quotes.

2. Plain text:

The Html::escape(html_entity_decode(strip_tags($string))); might be fine for email subjects but the htmlToText helper is much nicer for longer text, I also quite like some of the things in https://github.com/soundasleep/html2text that we could be doing in htmlToText too.

catch’s picture

Hmm, not sure this is secure for all attributes, we may still need some special cases for other attributes such as on* then, and think about what to do about attributes that are not wrapped in double quotes.

So at the moment we just don't support passing user-entered content to on* (and never should). But if we wanted to provide a way to encode them properly I think we need a new issue for that - would need to be applied to Attributes and Xss::attributes() too.

The Html::escape(html_entity_decode(strip_tags($string))); might be fine for email subjects but the htmlToText helper is much nicer for longer text

Yes I think that's an open question.

effulgentsia’s picture

I haven't thought through #43 and its related comments yet, so this is not a commentary on that, but just want to respond to this part from the issue summary:

"!placeholder by another name", which somewhat turns the pro into a con.

I don't see this as a con, because IMO the "another name" part is enough to address the "makes the sanitization API harder to understand" part of the parent issue. Here's what I mean:

In Drupal 7, ! can be used to signify many different things, among which are:

  1. The output of t() will be used in non-HTML context, such as an email subject, so don't escape HTML entities, because email clients do not render email subjects as HTML.
  2. The output of t() will eventually be used in HTML context, but something else will escape it, such as drupal_attributes() or form_select_options().
  3. The value I'm passing is the output of drupal_render(), t(), or some other rendering function that does its own escaping, so don't re-escape that.
  4. The value I'm passing is not one of the above, but is something I know to be safe for my own reasons (e.g., I'm passing a literal string of HTML, or an implode() of safe parts with a literal glue), and don't (re-)escape that.
  5. The value I'm passing doesn't have any characters that require HTML encoding, such as a machine name, or a URL without a query string or fragment, so don't waste CPU time on a no-op check_plain().

Note that with each of the above cases, the calling code in question might be incorrect in its assumptions (each item below maps to the same number above), each mistake resulting in an XSS vulnerability:

  1. You thought you were returning a string to a caller for use in an email subject, but instead the caller used it within a drupal_set_message().
  2. You thought the values within #options were the responsibility of Form API to escape (which it is in Drupal 7), but then Form API changed to not do it (which happened in Drupal 8).
  3. You thought you were passing the result of a safe rendering function (such as drupal_render()), but suppose the HTML you have is really the result of a Views plugin function that wasn't sufficiently well documented as to whether it needs to return text or HTML, and some Views plugin out there got it wrong (yes, this really happened during D8's development).
  4. Plain human error: See https://www.drupal.org/node/2537866 for a recent, but by far not the only, example.
  5. You thought you were passing a URL without any query string or fragment, but some other module has a hook_url_outbound_alter() implementation that adds those.

The current implementation of ! in Drupal 8 HEAD solves nicely for the first two cases, whether you're right or wrong. If you're right, you get exactly what you asked for, and the resulting string isn't marked safe and doesn't need to be for those contexts. If you're wrong, you end up getting the escaping of the entire string, which is exactly what you should get when a string you thought you were returning for non-HTML use gets used in HTML.

The current implementation of ! in HEAD also solves nicely for the 3rd case if you're right, but causes escaping of the entire t() output (not just the incorrect placeholder) if you're wrong. And the current implementation breaks down even more for the last two cases, regardless of if your assumptions are right or wrong.

The problems with cases 3-5 are the reasons to remove ! entirely. But not arguments against solving just cases 1 and 2 with an API that's clear about that reduced scope.

catch’s picture

The output of t() will eventually be used in HTML context, but something else will escape it, such as drupal_attributes() or form_select_options().

So for me #2 is the tricky one, which #43 attempts to unpick, and I think plach's patch is compatible with what's in #43.

The problem being is that the output of t(), with the same strings and context, could be used as either an attribute value or an HTML fragment, certainly the Views info stuff is like that - and those need two different strategies.

I'd also add an example #6, which is the reverse problem of #1:

6. You thought you were creating a string for drupal_set_message() (or any HTML output), but someone put it into the subject of an e-mail instead.

#43 allows us to handle that case too.

plach’s picture

Status: Needs work » Needs review
StatusFileSize
new14.76 KB
new56.75 KB

This implements #43 - #44 and provides unit tests for that. I will start looking at test failures as soon as #2557113: Make t() return a TranslationWrapper object to remove reliance on a static, unpredictable safe list is committed, since I don't want to waste time chasing every iteration. Most of the failures are unit tests that just need to be adapted to the new API.

plach’s picture

Assigned: plach » Unassigned

Done for tonight

plach’s picture

Here's a version of #49 including only the parts added here, except for some small bits in TranslationInterface and TranslationManager.

Status: Needs review » Needs work

The last submitted patch, 49: safe_markup-contexts-2509218-49.patch, failed testing.

The last submitted patch, 49: safe_markup-contexts-2509218-49.patch, failed testing.

plach’s picture

StatusFileSize
new821 bytes
new15.87 KB
new57.01 KB

Added missing PHP docs

subhojit777’s picture

Status: Needs work » Needs review
plach’s picture

The last submitted patch, 54: safe_markup-contexts-2509218-54.patch, failed testing.

The last submitted patch, 54: safe_markup-contexts-2509218-54.patch, failed testing.

almaudoh’s picture

StatusFileSize
new14.19 KB

I really like this approach. Re-uploaded the review.txt patch in #56 since #2557113: Make t() return a TranslationWrapper object to remove reliance on a static, unpredictable safe list is now in.

+++ b/core/lib/Drupal/Core/StringTranslation/TranslationWrapper.php
@@ -119,34 +130,37 @@ public function getOptions() {
+   * TODO
    *
-   * @return string
-   *   The translated string.
+   * This should be an injected factory, likely a plugin manager.
+   *
+   * @return \Drupal\Core\OutputFormatter\OutputFormatterInterface
    */

Working on a service / plugin manager for replacing placeholders based on content type.

almaudoh’s picture

Assigned: Unassigned » almaudoh
almaudoh’s picture

Assigned: almaudoh » Unassigned
Issue tags: +Needs tests
StatusFileSize
new14.2 KB
new23.6 KB

In this patch...
1. Added a plugin manager with @OutputFormatter annotation to manage the different kinds of output formatters.
2. Moved the three output formatters to the Plugin/OutputFormatter directory

Needs tests for the new classes.

Status: Needs review » Needs work

The last submitted patch, 61: safe_markup_contexts-2509218-61.patch, failed testing.

The last submitted patch, 61: safe_markup_contexts-2509218-61.patch, failed testing.

pwolanin’s picture

This patch looks like it's off track in terms of scope and what it's doing. Let's make it as small as possible - ideally just doc and no API changes

plach’s picture

Thanks, some copy/paste issues:

  1. +++ b/core/lib/Drupal/Core/OutputFormatter/OutputFormatterManager.php
    @@ -0,0 +1,119 @@
    + * Contains \Drupal\Core\Block\BlockManager.
    
  2. +++ b/core/lib/Drupal/Core/OutputFormatter/OutputFormatterManager.php
    @@ -0,0 +1,119 @@
    + * Manages discovery and instantiation of block plugins.
    + *
    + * @todo Add documentation to this class.
    + *
    + * @see \Drupal\Core\Block\BlockPluginInterface
    
  3. +++ b/core/lib/Drupal/Core/OutputFormatter/OutputFormatterManager.php
    @@ -0,0 +1,119 @@
    +   * Constructs a new \Drupal\Core\Block\BlockManager object.
    
plach’s picture

This patch looks like it's off track in terms of scope and what it's doing. Let's make it as small as possible - ideally just doc and no API changes

I'm not sure what API changes you are referring to, can you expand on that? Also, I don't really think we are going to address this issue just with documentation.

pwolanin’s picture

Please update the issue summary before posting any more patches.

I can't even tell if this is going in the right direction. I think adding more complexity to t() and the rest of the API here might be the wrong thing.

From the issue summary I was hoping this would be mostly a documentation issue - instead it's looking light a significant API change.

pwolanin’s picture

@plach - option #2 in the current issue summary suggests it can mostly be a docs issue. It's not clear to me when/if that option was discarded.

plach’s picture

Sorry, we are currently going a different way, see #33 and #37, I will update the issue summary ASAP.

pwolanin’s picture

Status: Needs work » Postponed (maintainer needs more info)

Discussing with alexpott at MOB. We need a conversation in person about broadening this into to a generic (but as simple as possible) system to allow us to render any SafeString with a context-relevant formatter.

catch’s picture

Status: Postponed (maintainer needs more info) » Needs work

This is still critical, and it blocks #2571673: Convert Views t() usage where it is used as an attribute value, which is also critical, even if you don't personally like the solution.

The problem is very much there, and does not 'need more info'.

catch’s picture

Title: Ensure that the results of t() can be used as plain text for non-HTML contexts » Ensure that SafeString objects can be used in non-HTML contexts
Issue summary: View changes

Discussed more with pwolanin and alexpott at the sprint.

I don't think we should use plugins for this, @alexpott suggested just a method that takes a formatter and returns the string in the format, that seems plenty for extensibility to me. It'll be an interface with one method.

There's still a bit of discussion as to whether when we format for an attribute, do we return a plain text string that we then escape again, or return an escaped string and communicate to AttributeString and elsewhere that it doesn't get escaped again (via AttributeFormattedStringInterface or whatever which we don't have yet). However hopefully this clarifies exactly what the need is with the updated issue summary.

catch’s picture

Issue summary: View changes
catch’s picture

Issue summary: View changes
catch’s picture

Issue summary: View changes
wim leers’s picture

Issue summary: View changes

I read through the entire issue, and had the same remarks/questions/doubts about using plugins as #73. But catch posted it 10 minutes before I did :P

@pwolanin and @catch confirmed that we are now going with option 3 in the issue summary. That's why he tried to strike through the rest in #76, but failed miserably, because <del> is not able to strike through block-level elements :P

pwolanin’s picture

really more like #2

pwolanin’s picture

Issue summary: View changes

really more like #2

pwolanin’s picture

Status: Needs work » Needs review
StatusFileSize
new1.67 KB

After further discussion just having an interface and classes with a static method is the simplest possible way to solve this problem.

Here's a starting patch showing the general idea.

pwolanin’s picture

Issue summary: View changes
pwolanin’s picture

StatusFileSize
new2.63 KB
dawehner’s picture

StatusFileSize
new5.23 KB
new2.59 KB

Added some tests

plach’s picture

@pwolanin @dawehner:

Can we restore the test cases using placeholders that were introduced in #49?

plach’s picture

Assigned: Unassigned » plach

Working on this

  1. +++ b/core/lib/Drupal/Component/Utility/AttributeValueOutput.php
    @@ -0,0 +1,26 @@
    +   * @param $string|SafeStringInterface
    
    +++ b/core/lib/Drupal/Component/Utility/OutputStrategyInterface.php
    @@ -0,0 +1,23 @@
    +   * @param $string|SafeStringInterface
    
    +++ b/core/lib/Drupal/Component/Utility/PlainTextOutput.php
    @@ -0,0 +1,26 @@
    +   * @param $string|SafeStringInterface
    

    Missing FQCN

  2. +++ b/core/lib/Drupal/Component/Utility/AttributeValueOutput.php
    @@ -0,0 +1,26 @@
    \ No newline at end of file
    
    +++ b/core/lib/Drupal/Component/Utility/OutputStrategyInterface.php
    @@ -0,0 +1,23 @@
    \ No newline at end of file
    
    +++ b/core/lib/Drupal/Component/Utility/PlainTextOutput.php
    @@ -0,0 +1,26 @@
    \ No newline at end of file
    

    Missing newline

plach’s picture

More tests and docs

Status: Needs review » Needs work

The last submitted patch, 86: safe_markup-contexts-2509218-86.patch, failed testing.

plach’s picture

StatusFileSize
new3.75 KB
new8.86 KB

Fixed test failures.

plach’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests
+++ b/core/lib/Drupal/Component/Utility/OutputStrategyInterface.php
@@ -0,0 +1,25 @@
+/**
+ * Common interface for output strategies.
+ */
+interface OutputStrategyInterface {

Probably this interface should provide more details about what output strategies are, when to use them and how.

catch’s picture

  1. +++ b/core/lib/Drupal/Component/Utility/HtmlAttributeValueOutput.php
    @@ -0,0 +1,29 @@
    + * Implements an output strategy to be used to format strings to be used as
    

    Would be good to avoid 'to be used' twice in the sentecne.

    "Implements an output strategy used to format strings for HTML attribute values."?

  2. +++ b/core/lib/Drupal/Component/Utility/HtmlAttributeValueOutput.php
    @@ -0,0 +1,29 @@
    +     return Html::escape(PlainTextOutput::renderFromHtml($string));
    

    We can skip the Html::escape() - Attributes/AttributeString will do that. I think that's what we decided this afternoon.

  3. +++ b/core/tests/Drupal/Tests/Component/Utility/HtmlAttributeValueOutputTest.php
    @@ -0,0 +1,71 @@
    +    $output = HtmlAttributeValueOutput::renderFromHtml($markup);
    

    Even if we don't have the separate output strategy for attributes vs. plain text, we could have an integration test with AttributeString to ensure it looks right after escaping?

effulgentsia’s picture

+1 to the general approach here.

Re #90.2: if we keep HtmlAttributeValueOutput in this patch at all, let's add a @todo for it to do the escaping AND upcast it to a AttributeSafeStringInterface once #2569485: Add AttributeSafeStringInterface and UriAttributeSafeStringInterface is in.

Should we also add a FormattedPlainTextOutput strategy (or better name if someone comes up with one) that invokes MailFormatHelper::htmlToText()? And if we do, then should we rename PlainTextOutput to SimplePlainTextOutput or similar name, or is it ok for that one to not have any qualifying prefix?

stefan.r’s picture

+1 to #91 - I had discussed this with @plach earlier today and the conclusion was a "fancier" version of the plain text output for larger text could be a nice non-critical followup here. SimplePlainTextOutput and FormattedPlainTextOutput sound great to me

almaudoh’s picture

Some docs nits mostly...

  1. +++ b/core/lib/Drupal/Component/Utility/HtmlAttributeValueOutput.php
    @@ -0,0 +1,29 @@
    + * Contains \Drupal\Component\Utility\PlainTextOutput.
    + */
    ...
    +class HtmlAttributeValueOutput implements OutputStrategyInterface {
    

    Contains \Drupal\Component\Utility\HtmlAttributeValueOutput

  2. +++ b/core/lib/Drupal/Component/Utility/HtmlAttributeValueOutput.php
    @@ -0,0 +1,29 @@
    +   * @param $string|\Drupal\Component\Utility\SafeStringInterface
    +   *   An HTML string or a any object that can be cast to string.
    

    "or a any" :)

  3. +++ b/core/lib/Drupal/Component/Utility/OutputStrategyInterface.php
    @@ -0,0 +1,25 @@
    + * @file
    + * Contains \Drupal\Component\Utility\OutputStrategyInterface.
    ...
    +interface OutputStrategyInterface {
    

    OutputStrategyInterface doesn't really explain for me what this does. Maybe OutputEscapeStrategyInterface or OutputFormatStrategyInterface. Or perhaps just OutputFormatInterface.

So what's the plan for implementation of these on SafeStrings? The IS is not very clear on this. Are we adding a new :: renderAsFormat(OutputStrategyInterface $strategy) method to SafeStringInterface...?

dawehner’s picture

StatusFileSize
new9.04 KB
new5.17 KB

Smal changes here and there.

The last submitted patch, 86: safe_markup-contexts-2509218-86.patch, failed testing.

plach’s picture

Assigned: plach » Unassigned

Not working on this atm...

lauriii’s picture

+1 for the SimplePlainTextOutput to make it possible to create more plaintext output strategies

The last submitted patch, 88: safe_markup-contexts-2509218-88.patch, failed testing.

pwolanin’s picture

I think all of these just return a simple string, so I don't think we should return a HtmlAttributeValueOutput or anything else for any of these.

pwolanin’s picture

+++ b/core/lib/Drupal/Component/Utility/HtmlAttributeValueOutput.php
@@ -0,0 +1,32 @@
+   * @param $string|\Drupal\Component\Utility\SafeStringInterface

I put this here originally, but actually we can accept any object that implements __toString
Not sure how to note that in the @param

+++ b/core/lib/Drupal/Component/Utility/HtmlAttributeValueOutput.php
@@ -0,0 +1,32 @@
+    // @todo Conver the result to AttributeSafeStringInterface, see
+    //   https://www.drupal.org/node/2569485

Typo here, but I also think the comment isn't right - we may use it with AttributeSafeStringInterface but every class implementing this interface should return a string.

+++ b/core/tests/Drupal/Tests/Component/Utility/PlainTextOutputTest.php
@@ -0,0 +1,70 @@
+    $safe_string = $this->prophesize(SafeStringInterface::class);

Why not just create a SafeString object? I don't see the value of a mock here.

stefan.r’s picture

Issue summary: View changes
Issue tags: -Needs issue summary update
StatusFileSize
new11.96 KB
new9.64 KB

Making some further changes

The last submitted patch, 101: interdiff-94-98.patch, failed testing.

stefan.r’s picture

StatusFileSize
new12.04 KB
new9.68 KB

The last submitted patch, 101: 2509218-98.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 103: 2509218-99.patch, failed testing.

pwolanin’s picture

I'm going to try to fix the test fails now.

almaudoh’s picture

+++ b/core/lib/Drupal/Component/Utility/OutputStrategyInterface.php
--- /dev/null
+++ b/core/lib/Drupal/Component/Utility/PlainTextSimpleOutput.php

+++ b/core/lib/Drupal/Component/Utility/PlainTextSimpleOutput.php
@@ -0,0 +1,37 @@
+ * Contains \Drupal\Component\Utility\PlainTextSimpleOutput.
...
+class PlainTextSimpleOutput implements OutputStrategyInterface {

+++ b/core/tests/Drupal/Tests/Component/Utility/PlainTextSimpleOutputTest.php
@@ -0,0 +1,64 @@
+class PlainTextSimpleOutputTest extends UnitTestCase {
...
+    $output = SimplePlainTextOutput::renderFromHtml($markup);

should be SimplePlainTextOutput

pwolanin’s picture

Status: Needs work » Needs review
StatusFileSize
new9.56 KB
new1.38 KB

mis-named classes used in the code.

Status: Needs review » Needs work

The last submitted patch, 108: 2509218-107.patch, failed testing.

stefan.r’s picture

The PlainTextSimple / PlainTextFormatted was deliberate as SimplePlainText seemed more confusing but I don't care much about one or the other. If we're going to go back to SimplePlainText let's rather mention FormattedPlainText in the @todo as well.

I think we'll have to revert the SafeString change as SafeString is in Core\Render - which we're not supposed to refer to in Component?

pwolanin’s picture

Status: Needs work » Needs review
StatusFileSize
new10.04 KB
new3.39 KB

Silly me - the DrupalComponentTest fails if you do that. Back to using mocks, plus fix another class name use.

dawehner’s picture

  1. +++ b/core/lib/Drupal/Component/Utility/OutputStrategyInterface.php
    @@ -0,0 +1,35 @@
    + * Output strategies assist in transforming unsanitized HTML strings into
    + * strings that are appropriate for a given context (i.e. plain-text, HTML
    + * attributes), through performing the relevant sanitization and formatting.
    

    I'm still convinced by the comment about unsanizited. If you pass in something like t('Foo

  2. +++ b/core/lib/Drupal/Component/Utility/PlainTextSimpleOutput.php
    @@ -0,0 +1,37 @@
    + * @todo Provide a PlainTextFormattedOutput strategy that transforms HTML
    + *   into formatted plain text for use in the email body and long texts.
    

    Do we need to come up with the email given that we already have \Drupal\Core\Mail\MailFormatHelper::htmlToText

stefan.r’s picture

@dawehner yes, 1 is confusing, let me see about rewriting that.

As to 2, earlier in the issue a PlainTextFormattedOutput option came up (to be added in a followup). As far as I can see this would be no different from MailFormatHelper, so we could just move the logic from MailFormatHelper to PlainTextFormattedOutput as the formatted plain text could be used in more contexts than just email.

dawehner’s picture

As to 2, earlier in the issue a PlainTextFormattedOutput option came up (to be added in a followup). As far as I can see this would be no different from MailFormatHelper, so we could just move the logic from MailFormatHelper to PlainTextFormattedOutput as the formatted plain text could be used in more contexts than just email.

Do you think this is needed as part of this issue?

The last submitted patch, 101: interdiff-94-98.patch, failed testing.

The last submitted patch, 101: 2509218-98.patch, failed testing.

The last submitted patch, 103: 2509218-99.patch, failed testing.

The last submitted patch, 108: 2509218-107.patch, failed testing.

stefan.r’s picture

StatusFileSize
new865 bytes
new10.02 KB

Do you think this is needed as part of this issue?

I think this should be a followup. Created #2573009: Provide a PlainTextFormattedOutput output strategy.

stefan.r’s picture

StatusFileSize
new711 bytes
new10.03 KB
star-szr’s picture

Some thoughts for now:

  1. +++ b/core/lib/Drupal/Component/Utility/HtmlAttributeValueOutput.php
    @@ -0,0 +1,33 @@
    + * Use this when rendering a given HTML string into an HTML attribute value,
    + * such as select list options. Never use this to render strings into
    + * "style"/"on*" attributes, or attributes that are not wrapped in quotes.
    ...
    +  public static function renderFromHtml($string) {
    +    return Html::escape(PlainTextSimpleOutput::renderFromHtml($string));
    +  }
    

    Didn't we say select lists are special from other attributes?

    I'd also say there are use cases for including tags in a select list, for example if you are choosing different HTML tags for output of a field.

  2. +++ b/core/lib/Drupal/Component/Utility/OutputStrategyInterface.php
    @@ -0,0 +1,35 @@
    +<?php
    +/**
    ...
    + * Contains \Drupal\Component\Utility\OutputStrategyInterface.
    + */
    

    Minor: Blank line needed above docblock.

  3. +++ b/core/lib/Drupal/Component/Utility/OutputStrategyInterface.php
    @@ -0,0 +1,35 @@
    + * appropriate for a given context (i.e. plain-text, HTML attributes), through
    ...
    +   * a given output context (i.e. plain-text email subjects, HTML attribute
    

    Minor: I think these i.e. should both be e.g.,.

  4. +++ b/core/lib/Drupal/Component/Utility/PlainTextSimpleOutput.php
    @@ -0,0 +1,37 @@
    + *   into formatted plain text for use in the email body and long texts.
    

    What does long texts mean here?

imiksu’s picture

StatusFileSize
new1.3 KB
new10.03 KB

Fixed minors (#121.2 and #121.3).

almaudoh’s picture

Re #107 #108, #110: Sorry, didn't know there had been a decision to change the names. Patch looks good.

dawehner’s picture

Given our previous discussion with @stefan.r I think we should add explicit UI test coverage for select form elements with quotes in there and HTML just to see what we exactly should do.

stefan.r’s picture

I'm actually not sure about options elements being special. <option>&lt;</option> in a select does /not/ double escape for me.

stefan.r’s picture

StatusFileSize
new1.74 KB
new9.98 KB
stefan.r’s picture

Discussed this patch with @catch earlier today and we didn't see the need of escaping HTML attributes - merely turning them into plain text is enough. I do think it makes sense to have a dedicated class for them though, even if they only wrap the plain text one.

If select elements really are special let's address them in a followup as I don't think they're blocking any criticals whereas this one is. Let's get this patch in?

Status: Needs review » Needs work

The last submitted patch, 126: 2509218-126.patch, failed testing.

The last submitted patch, 126: 2509218-126.patch, failed testing.

catch’s picture

Select options in a followup is fine with me.

pwolanin’s picture

I don't understand the last change. The help text says "Use this when rendering a given HTML string into an HTML attribute value"

If people are putting it into an attribute value (or if we wire this up to a Twig filter) it needs to be escaped.

pwolanin’s picture

StatusFileSize
new9.99 KB
new633 bytes
pwolanin’s picture

Status: Needs work » Needs review
stefan.r’s picture

@pwolanin I had discussed this with @catch and the idea was not to do any sanitization in these output strategies and arrange this with Twig instead. Which would need updated docs if that's what we want to do.

Alternatively maybe we could sanitize and then mark as safe (for output into attributes).

stefan.r’s picture

re #132 why would we need to escape when outputting into a plain text context? It'd be for output into (non-URL/style/on*) attributes in any case, right?

And wouldn't that attribute value get double escaped whenever Twig has a go at that string?

Status: Needs review » Needs work

The last submitted patch, 132: 2509218-132.patch, failed testing.

The last submitted patch, 132: 2509218-132.patch, failed testing.

pwolanin’s picture

@stefan.r - in the twig context I was assuming you'd use this as a filter, not let Twig do the default autoescaping.

stefan.r’s picture

@pwolanin OK can you double check with @catch what we want to do here then?

plach’s picture

@pwolanin

Regardless of what we decide, plain text should not be HTML-escaped. I think the change was applied to the wrong strategy class.

pwolanin’s picture

Oh, huh - yes. I'm a little tired

catch’s picture

If we let Twig don't we have to add knowledge about the strategy to AttributeString since that's what we use mostly.

I don't think we should do that here. The Views case where t() is put into an attribute can do plain text then put that string into AttributeString which does the escaping.

Anything beyond that is major followup for me not release blocking. And we shouldn't add any strategies we can't use in core at all here either.

plach’s picture

Assigned: Unassigned » plach

Ok, removing the HTML attribute strategy altogether. We can add it back later if needed.

plach’s picture

Assigned: plach » Unassigned
Status: Needs work » Needs review
StatusFileSize
new6.7 KB
new5.99 KB

Also improved docs a bit.

stefan.r’s picture

StatusFileSize
new1.55 KB
new5.88 KB
lauriii’s picture

  1. +++ b/core/lib/Drupal/Component/Utility/PlainTextSimpleOutput.php
    @@ -0,0 +1,31 @@
    + *   into formatted plain text for use in the email body and CLI. See
    

    Nit: into fits the previous line

  2. +++ b/core/lib/Drupal/Core/StringTranslation/TranslatableString.php
    @@ -140,9 +140,6 @@ public function render() {
    -    // @todo https://www.drupal.org/node/2509218 Note that the argument
    -    //   replacement is not stored so that different sanitization strategies can
    -    //   be used in different contexts.
    

    Why was this @todo removed in #88?

plach’s picture

Assigned: Unassigned » plach

On this

plach’s picture

StatusFileSize
new847 bytes
new5.88 KB

@lauriii

Why was this @todo removed in #88?

Because we changed approach: we no longer automatically apply the output strategy from the ::render() method, instead we apply it to its return value.

lauriii’s picture

Assigned: plach » Unassigned
Status: Needs review » Reviewed & tested by the community

This looks good for me now :) Thanks @plach!

plach’s picture

Issue summary: View changes
Issue tags: -Needs followup +API addition

I think we should be done here.

plach’s picture

Working on a CR

almaudoh’s picture

RTBC++

plach’s picture

CR at https://www.drupal.org/node/2574697

Discussed with @alexpott and we agreed PlainTextOutput is a better name for the strategy we are adding here, PlainTextFormattedOutput is still a good name for the upcoming advanced strategy.

plach’s picture

StatusFileSize
new5.81 KB

And now with patch!

plach’s picture

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 154: safe_markup-contexts-2509218-153.patch, failed testing.

The last submitted patch, 154: safe_markup-contexts-2509218-153.patch, failed testing.

The last submitted patch, 154: safe_markup-contexts-2509218-153.patch, failed testing.

jhedstrom’s picture

Fails on bot are PHP Fatal error: Uncaught exception 'ReflectionException' with message 'Class Drupal\Tests\Component\Utility\PlainTextOutputTest does not exist'

+++ b/core/lib/Drupal/Core/StringTranslation/TranslatableString.php
--- /dev/null
+++ b/core/tests/Drupal/Tests/Component/Utility/PlainTextOutputTest.php

+++ b/core/tests/Drupal/Tests/Component/Utility/PlainTextOutputTest.php
@@ -0,0 +1,68 @@
+ * Contains \Drupal\Tests\Component\Utility\PlainTextSimpleOutputTest.

due to a mismatch between class and filename.

jhedstrom’s picture

Assigned: Unassigned » jhedstrom

Working on #159, and whatever else @alexpott finds during review.

jhedstrom’s picture

Assigned: jhedstrom » Unassigned
Status: Needs work » Needs review
StatusFileSize
new797 bytes
new5.8 KB

Quick rename so tests can run while review proceeds.

wim leers’s picture

Issue summary: View changes
StatusFileSize
new38.43 KB

PlainTextFormattedOutput makes no sense to me.

How can something be both plain and formatted? Contrast with the text field types we have: — you must choose, you can't have both.

Can somebody enlighten me?

alexpott’s picture

+++ b/core/lib/Drupal/Component/Utility/PlainTextOutput.php
@@ -0,0 +1,31 @@
+ * @todo Provide a PlainTextFormattedOutput strategy that transforms HTML into
+ *   formatted plain text for use in the email body and CLI. See
+ *   https://www.drupal.org/node/2573009.

I'm too am confused by this comment. I thought this patch would give as the tools necessary to do #2572597: Replace !placeholder with @placeholder in mail code. Tbh I don't get why we have to do this - surely MailFormatHelper::htmlToText() just does what it does and this is great.

wim leers’s picture

StatusFileSize
new5.81 KB
new2.46 KB
  1. +++ b/core/lib/Drupal/Component/Utility/OutputStrategyInterface.php
    @@ -0,0 +1,36 @@
    + * appropriate for a given context (e.g. plain-text), through performing the
    

    s/plain-text/plain text/

  2. +++ b/core/lib/Drupal/Component/Utility/OutputStrategyInterface.php
    @@ -0,0 +1,36 @@
    + * relevant formatting. No santization is applied.
    

    s/santization/sanitization/

  3. +++ b/core/lib/Drupal/Component/Utility/PlainTextOutput.php
    @@ -0,0 +1,31 @@
    + * Provides an output strategy for transforming HTML into simple plain text.
    

    What is "simple" plain text? Isn't plain text always "simple"?

    I'm betting this is related to the "formatted plain text", but that doesn't mean this is clear. I think it's fine to omit.

  4. +++ b/core/lib/Drupal/Component/Utility/PlainTextOutput.php
    @@ -0,0 +1,31 @@
    + * @todo Provide a PlainTextFormattedOutput strategy that transforms HTML into
    + *   formatted plain text for use in the email body and CLI. See
    + *   https://www.drupal.org/node/2573009.
    

    Discussed with @jhedstrom, and now #162 is answered. I still think the name doesn't make sense, but we can discuss that further/refine it in the follow-up issue. I will comment there in a minute.

  5. +++ b/core/tests/Drupal/Tests/Component/Utility/PlainTextOutputTest.php
    @@ -0,0 +1,68 @@
    +   * @param array $args
    

    s/array/string[]/

  6. +++ b/core/tests/Drupal/Tests/Component/Utility/PlainTextOutputTest.php
    @@ -0,0 +1,68 @@
    +   *   none.
    

    s/none/the empty array/

EDIT: to be clear, I fixed all my nits.

wim leers’s picture

Ok, per #163, we need to deal with the mail thing here.

I think PlainTextFormattedOutput is an extremely confusing name. So let's try to find something better. This is the relevant code:

  /**
   * Transforms an HTML string into plain text, preserving its structure.
   *
   * The output will be suitable for use as 'format=flowed; delsp=yes' text
   * (RFC 3676) and can be passed directly to MailManagerInterface::mail() for sending.
   *
   * We deliberately use LF rather than CRLF, see MailManagerInterface::mail().
   *
   * This function provides suitable alternatives for the following tags:
   * <a> <em> <i> <strong> <b> <br> <p> <blockquote> <ul> <ol> <li> <dl> <dt>
   * <dd> <h1> <h2> <h3> <h4> <h5> <h6> <hr>
   *
   * …
   */
  public static function htmlToText($string, $allowed_tags = NULL) {

So it is transforming HTML into plain text, while preserving the HTML structure and just mapping it to a syntax defined in RFC 3676. That RFC is titled The Text/Plain Format and DelSp Parameters.

So it's actually transforming text/html to text/plain.

Suggested names based on this so far: StructuredPlainTextOutput, MailPlainTextOutput, PlainTextMailOutput.


Second, let's look at that RFC for inspiration:

3.  The Problem

   The Text/Plain media type is the lowest common denominator of
   Internet email, with lines of no more than 998 characters (by
   convention usually no more than 78), and where the carriage-return
   and line-feed (CRLF) sequence represents a line break (see [MIME-IMT]
   and [MSG-FMT]).

   Text/Plain is usually displayed as preformatted text, often in a
   fixed font.  That is, the characters start at the left margin of the
   display window, and advance to the right until a CRLF sequence is
   seen, at which point a new line is started, again at the left margin.
   When a line length exceeds the display window, some clients will wrap
   the line, while others invoke a horizontal scroll bar.

   Text which meets this description is defined by this memo as "fixed".

Suggested names based on this: PreformattedPlainTextOutput, MonospacedPlainTextOutput.


Conclusion: pick one of these names:

  1. StructuredPlainTextOutput
  2. MailPlainTextOutput
  3. PlainTextMailOutput
  4. PreformattedPlainTextOutput
  5. MonospacedPlainTextOutput

I think PlainTextMailOutput is the best one.

jhedstrom’s picture

Status: Needs review » Reviewed & tested by the community
StatusFileSize
new678 bytes
new5.61 KB

Discussed with @stefan.r and @alexpott, and decided the @todo could be removed entirely, and that this is rtbc assuming it goes green.

wim leers’s picture

Eh, ok. Can we document here why?

I've transplanted my commment at #165 to #2573009-3: Provide a PlainTextFormattedOutput output strategy.

jhedstrom’s picture

re #167 the reasoning is that the new class isn't strictly needed as part of this fix, so an @todo is premature, but the new class can still be discussed in #2573009: Provide a PlainTextFormattedOutput output strategy.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Thanks everyone - this looks a great solution. Committed 70bad3e and pushed to 8.0.x. Thanks!

  • alexpott committed 70bad3e on 8.0.x
    Issue #2509218 by plach, stefan.r, pwolanin, effulgentsia, dawehner,...

The last submitted patch, 154: safe_markup-contexts-2509218-153.patch, failed testing.

The last submitted patch, 161: 2509218-161.patch, failed testing.

The last submitted patch, 164: 2509218-163.patch, failed testing.

Status: Fixed » Needs work

The last submitted patch, 166: 2509218-165.patch, failed testing.

berdir’s picture

Status: Needs work » Fixed

Too slow old testbot, too slow.

plach’s picture

:)

sun’s picture

This change attempts to introduce a custom concept of "output strategies", not a utility. Why was the code added to Utility?

plach’s picture

@sun:

hey :)

My original patch was providing an OutputFormatter namespace, @pwolanin asked for a simplification of the approach and since both Html and SafeMarkup live in Utility, that felt like a good place also for PlainTextOutput.

Btw, I think it would useful if SafeMarkup implemented OutputStrategyInterface, providing a ::renderFromHtml() method implementation just casting its input to string. That would allow to pass it to methods receiving any output strategy as input.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.