Problem/Motivation

Many calls to SafeMarkup::set() are a result of concatenating safe markup together, either with no glue or known safe glue (not coming from user input). Existing patterns / usecases:

  1. A determinate (but possibly conditional) series of translated strings, joined together with whitespace (spaces, break tags, paragraph tags) and sometimes other basic "text" markup like lists, sometimes with some conditional logic based on user permissions, etc. Commonly used in:
  2.  

  3. An indeterminate, imploded list of translated or sanitized strings that is a list of items. Example issues:

     

  4. An indeterminate, imploded list of strings that is not a list of items. Examples:

     

  5. Composed HTML at the element/tag level. Example issues:
  6. Composed HTML at the sub-tag level. Example issues:

     

  7. Other issues:

Currently there is a way to safely join an array of strings only in Twig templates via |safe_join which maps to twig_drupal_join_filter() in twig.engine. Also currently, there is no known way to replicate this directly in PHP for when you need strings other than using #inline_template and immediately rendering the inline template.

In earlier versions of the autoescape issue there was a SafeMarkup::implode() but after discussion there it was not committed. See the discussion starting around #1825952-235: Turn on twig autoescape by default.

Examples

The following examples could likely benefit from whatever we decide on.

Proposed resolution

One suggestion is

  1. Bring back SafeMarkup::implode() in the form used for |safe_join.
  2. Call it SafeMarkup::join() because it's simpler to type.
  3. Let twig_drupal_join_filter() use the new SafeMarkup::join() to avoid code duplication.

Remaining tasks

TBD

User interface changes

N/A

API changes

TBD

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

star-szr’s picture

Issue summary: View changes
joelpittet’s picture

Issue summary: View changes

Updating IS with proposal

xjm’s picture

Just quoting my comment from that issue so people see the context here:

@pwolanin, yeah, I've been debating that all morning, because it encourages people to do a more bad thing than the bad thing they're already doing. I've also been thinking that the implode() method encourages bad practice to begin with. For example, these usages:

+++ b/core/modules/node/src/Plugin/Search/NodeSearch.php
@@ -266,10 +267,12 @@ public function execute() {
-      $node->rendered .= ' ' . $this->moduleHandler->invoke('comment', 'node_update_index', array($node, $item->langcode));
+      $node->rendered = SafeMarkup::implode(' ', array(
+        drupal_render($build),
+        $this->moduleHandler->invoke('comment', 'node_update_index', array($node, $item->langcode)),
+      ));

+++ b/core/modules/rdf/rdf.module
@@ -474,8 +475,8 @@ function rdf_preprocess_comment(&$variables) {
-    $variables['created'] .= $created_metadata_markup;
-    $variables['submitted'] .= $created_metadata_markup;
+    $variables['created'] = SafeMarkup::implode('', array($variables['created'], $created_metadata_markup));
+    $variables['submitted'] = SafeMarkup::set($variables['submitted'] . $created_metadata_markup);

+++ b/core/modules/search/tests/modules/search_embedded_form/search_embedded_form.module
@@ -9,10 +9,12 @@
-  $variables['snippet'] .= drupal_render($form);
+  $variables['snippet'] = SafeMarkup::implode('', array($variables['snippet'] , drupal_render($form)));

+++ b/core/modules/system/tests/modules/batch_test/batch_test.callbacks.inc
@@ -94,7 +95,9 @@ function _batch_test_finished_helper($batch_id, $success, $results, $operations)
-  drupal_set_message(implode('<br>', $messages));
+  // The BR tag is safe as a delimiter.
+  SafeMarkup::set('<br>');
+  drupal_set_message(SafeMarkup::implode('<br>', $messages));

+++ b/core/modules/views/views.theme.inc
@@ -541,7 +542,8 @@ function template_preprocess_views_view_table(&$variables) {
-          $label .= drupal_render($tablesort_indicator);
+          $markup = drupal_render($tablesort_indicator);
+          $label = SafeMarkup::implode('', array($label, $markup));

@@ -632,7 +634,7 @@ function template_preprocess_views_view_table(&$variables) {
-          $field_output = '<' . $element_type . '>' . $field_output . '</' . $element_type . '>';
+          $field_output = SafeMarkup::implode('', array(SafeMarkup::set('<' . $element_type . '>'), $field_output, SafeMarkup::set('</' . $element_type . '>')));

seem questionable to me. And making lists of things should be done though whatever mechanism we use for theme_item_list() now. Edit: The last one is especially egregious, it's doing exactly what we are saying in the set() docs it shouldn't: Adding any (!) of the allowed HTML tags, both their opening and closing versions, to the safe markup list everywhere. This would be necessary to keep the markup from being escaped with or without SafeMarkup::implode(). What we should be doing instead is not setting the tags as safe, checking whether $field_output is already safe and check-plaining it if not, and then marking the whole string as safe without the implode. The DX is worse. But the DX should be worse, because we're doing it wrong.

I still think the correct solution is to refactor to remove early escaping and rendering wherever possible. This was also what @effulgentsia, @alexpott, and I discussed for SafeMarkup in general at DrupalCon Amsterdam (notes on SafeMarkup from that AMS meeting).

However, I recognize this would be a lot more work, and probably disruptive in some cases, and maybe it's too late depending on what exactly it would entail.

To me, the goal of removing the SafeMarkup::set() calls is not only just that it's for internal use and we don't want to set a bad/dangerous precedent. It's also to actually take the places in core we hacked around originally and fix them so we're leveraging Twig more rather than doing all this work twice. (Part of the original goal for enabling autoescaping was to remove overhead, but we acknowledged in the committed patch that we didn't get there and that it would need to be followup work.)

xjm’s picture

Title: Come up with a solution for imploding/concatenating SafeMarkup in a loop » Determine how to update code that currently joins strings in SafeMarkup::set()

Retitling to focus less on a specific solution. I'm -1 on the proposed resolution still.

xjm’s picture

One key difference between SafeMarkup and Twig is that Twig is only escaping strings right there where they are printed. SafeMarkup is adding them to a giant memory hog of a static array with strings within strings all saved separately, and the string that's marked safe once is marked safe for everywhere. See also #2295823: Ensure that we don't store excessive lists of safe strings and #2488538: Add SafeMarkup::remove() to free memory from marked strings when they're printed (the summary documents an actual memory problem on the core permissions page already).

effulgentsia’s picture

I still think the correct solution is to refactor to remove early escaping and rendering wherever possible. However, I recognize this would be a lot more work, and probably disruptive in some cases, and maybe it's too late depending on what exactly it would entail.

From a cursory look at the IS examples, I think such a refactor (to not require SafeMarkup::join() or inline_template) is quite doable for all but the first one. Not sure yet about that first one though. So perhaps we should get through the ones that we can first and then see what's left and whether to solve them with this issue's proposal or something else?

tetranz’s picture

I'm kind of new here so you're all way ahead of me but I came here after looking at https://www.drupal.org/node/2501971

If we were to bring back SafeFormat::implode, would something like this be the way to go?

function implode(array $values, $delimiter)
{
    $tokens = array();
    $keyedValues = array();
    $idx = 0;

    foreach($values as $value) {
        $token = 'token' . $idx;
        $keyedValues[$token] = $value;
        $tokens[] = $token;
        $idx++;
    }
    return SafeMarkup::format(implode($delimiter, $tokens), $keyedValues);
}

I don't think that's quite the same as the earlier uncommitted code.

I can see how it would be better if it at all possible to get the list all the way to Twig as an array and let Twig handle it.

joelpittet’s picture

Issue summary: View changes

Adding another one that would benefit from this helper method in PHP. #2501971: Remove SafeMarkup::set in FieldStorageConfigListBuilder

The biggest reason is that it's using #type table. and the particular cells aren't easily modified in the template and it's trying to build a comma separated list of the pieces.

joelpittet’s picture

@tetranz sorry you said the same thing... so +1 to that:)

star-szr’s picture

Issue summary: View changes

Adding another example. @effulgentsia I would love to know more!

joelpittet’s picture

Issue summary: View changes
joelpittet’s picture

Issue summary: View changes

Removing template_preprocess_views_ui_view_info because I'm moving that to the template with |safe_join.

joelpittet’s picture

chrisfromredfin’s picture

OK, just spitballing here. Not sure if this helps at all or not, but what about a helper method that does the implode via an inline Twig template?

SafeMarkup::join($collection, $glue) {
  $render = [
    '#type' => 'inline_template',
    '#template' => '{{ collection|safe_join("'.$glue.'" }}',
    '#context' => array(
      'collection' => $collection,
    ),
  ];

  return drupal_render($render);
}

...with that said, quite frankly, this seems pretty dumb. I have no idea why this is would be an advantage at all, but since I'm not 100% sure what I'm doing, hopefully this might at least spark something. :)

Another thought I had is that maybe we have a limited number of use-cases here, and maybe we could have some sort of glue whitelist? That might mean something like SafeMarkup::joinComma(), SafeMarkup::simpleConcat(), and SafeMarkup::joinBreak()... or maybe it's something like:

define(ALLOWED_GLUE, ['<br>', "\n", "", "\t", ", "]);

SafeMarkup::join($collection, $glue) {
  if (!in_array($glue, ALLOWED_GLUE)) { 
    return FALSE;  // or $glue = '';   ???
  }
  ...
}

We're talking about DX here, so the question is... do we default to trusting our developers, or no? If we should just trust them, then we need to just have even more faith in our code review process, to make sure things like

SafeMarkup::set('<' . $element_type . '>')

...don't make it in.

Final thought - if all can be refactored except the first one why can't #2501937: Remove SafeMarkup::set in ViewsUIController::reportFields() (the first one) be refactored as an item list? Perhaps as an inline twig template? It's a list of views that implement a plugin (and the other SafeMarkup::set() that appears in this file is a similar list). So I tend to lean on the xjm side, if these can all be refactored, why not? (Potential answers being it's either too hard, or we're refactoring into something that's even lamer than the proposed safe join.)

joelpittet’s picture

Issue summary: View changes
joelpittet’s picture

Issue summary: View changes

Removing the template_preprocess_file_widget_multiple() because I think I have a solution here:
#2502781: Remove SafeMarkup::set in template_preprocess_file_widget_multiple()

xjm’s picture

So, it occurs to me: a comma-joined string is a special case of an item list. I've overridden item list theming in the past to make things comma-separated. See #2502095: Remove SafeMarkup::set in template_preprocess_views_ui_view_info() which joins the items in the template, and also my latest comment on #2501827: Remove SafeMarkup::set() in file_save_upload() and allow render/template code to control a single-item list about the item listing there.

This brings us back to the issue that certain dedicated HTML-string-printing messaging things (like drupal_set_message(), hook_requirements(), probably hook_help(), etc.) should potentially support render arrays, and the tradeoff there for needing to add a dependency on the renderer and needing to have a fallback for non-HTML output at least in tools like drush.

cilefen’s picture

xjm’s picture

Issue summary: View changes

I had a long discussion with @alexpott yesterday. There are actually a number of different patterns where markup is being joined. This list has the same intent as #2280965: [meta] Remove every SafeMarkup::set() call, but the categories I've identified are actually different:

  1. A determinate (but possibly conditional) series of translated strings, joined together with whitespace (spaces, break tags, paragraph tags) and sometimes other basic "text" markup like lists, sometimes with some conditional logic based on user permissions, etc. Commonly used in:
  2. An indeterminate, imploded list of translated or sanitized strings that is a list of items. Example issues:
  3. An indeterminate, imploded list of strings that is not a list of items.
  4. Composed HTML at the element/tag level. Example issues:
  5. Composed HTML at the sub-tag level. Example issues:

There are also a whole mess of different possible implementations, which I'll post more on later.

xjm’s picture

Issue summary: View changes

Categorizing the issues in the summary previously into the categories I've defined above.

xjm’s picture

Issue summary: View changes
xjm’s picture

So, note that #2501667: Remove SafeMarkup::set in __toString() in Attribute class was in the original summary as an example of an issue that could use this proposed join() method, and that's actually an example of exactly the sort of misuse of the API I think join() encourages, and why @pwolanin and I removed it in the original #1825952: Turn on twig autoescape by default. It encourages polluting the string list with partial strings and partial tags that there is no need to print independently. The pollution of the string list is not a trivial problem; by coming up with a different solution for Attribute in #2505701: Document SafeMarkup::set and Use htmlspecialchars() directly in Attribute() so we don't bloat the list of safe strings, @pwolanin helped us remove 15 MB (!) from the SimpleTest UI, according to @alexpott's testing.

join() also can be misused for some really bad practices, producing malformed or unsafe HTML when assembled (when combined with use of set() or format() that it seems to encourage). Again, here's a couple examples that were in the original SafeMarkup patch:

-  drupal_set_message(implode('<br>', $messages));
+  // The BR tag is safe as a delimiter.
+  SafeMarkup::set('<br>');
+  drupal_set_message(SafeMarkup::implode('<br>', $messages));
-          $field_output = '<' . $element_type . '>' . $field_output . '</' . $element_type . '>';
+          $field_output = SafeMarkup::implode('', array(SafeMarkup::set('<' . $element_type . '>'), $field_output, SafeMarkup::set('</' . $element_type . '>')));

This gives a false sense of security and can lead to adding XSS to the safe list.

By the way, I also have similar concerns about using SafeMarkup::format() for assembly of HTML elements from incomplete parts. Consider this example from patches in #2502089: Remove SafeMarkup::set() in template_preprocess_views_view_table():

+++ b/core/modules/views/views.theme.inc
@@ -576,7 +575,7 @@ function template_preprocess_views_view_table(&$variables) {
-          $field_output = SafeMarkup::set('<' . $element_type . '>' .  SafeMarkup::escape($field_output) . '</' . $element_type . '>');
+          $field_output = SafeMarkup::format('<@element_type>@field_output</@element_type>', ['@element_type' => $element_type, '@field_output' => $field_output]);

The code by itself gives a false sense of security. In the context in Views this is safe (or at least covered by a privileged permission) but suppose someone managed to pass this input:

$element_type = "script";
$field_output = "var foo = String(/Haha sucker/); foo = foo.substring(1, foo.length-1); alert(foo);";

Boom. Plus the same thing about polluting the string list.

So, in general, I think we need to be very careful about what API we provide and what examples we provide of using it.

xjm’s picture

xjm’s picture

xjm’s picture

Alright, so, separately from the different categories of joining strings. Here are strategies that have been used in patches to replace SafeMarkup::set($some_joined_strings) (mostly just the fixed issues for now; I'm going through the open ones still):

Edits: adding additional issues.

hass’s picture

Could someone give me a hint why http://cgit.drupalcode.org/google_analytics/tree/src/Form/GoogleAnalytic... shows now <div class="description"> and how I should fix it now, please?

cilefen’s picture

@hass I answered you on the cross-post #2280965-95: [meta] Remove every SafeMarkup::set() call. Wrapping in Safemarkup::format() should do fine here.

(edited a typo)

cilefen’s picture

Issue summary: View changes
FileSize
72.99 KB

@hass

hass’s picture

What is better with SafeMarkup::format() than SafeMarkup::set()? set is shorter and seems working the same way...

cilefen’s picture

@hass #2280965: [meta] Remove every SafeMarkup::set() call. It is considered an internal function.

cilefen’s picture

All:

I went with SafeMarkup::format() on #2531652: Fix double escaping of domain_mode radios. But, was it done properly? I think I should have used tokens. Despite having attended a contribution sprint on this very topic, I am perpetually confused by it and I think a lot of contrib authors are or will be.

hass’s picture

cilefen’s picture

Re #32, that one should be good because the user-entered strings passed through t().

lauriii’s picture

cilefen’s picture

@lauriii Then we are in trouble for sure.

lauriii’s picture

lauriii’s picture

Is there anything I could do to try to move this forward? This is blocking few issues that are blocking Critical meta from getting finished

akalata’s picture

Issue summary: View changes
akalata’s picture

Starting to review and see if there's a consensus buried in here somewhere...

akalata’s picture

Issue tags: -blocker

from xjm @ MWDS: this issue should no longer be listed as a blocker for anything

xjm’s picture

Status: Active » Closed (duplicate)

All the SafeMarkup::set() calls have been removed from core, so this was resolved through those issues. I'm working on updating https://www.drupal.org/node/2296163 with the kinds of solutions used, but we can call this issue a duplicate now. Thanks everyone!