[meta] !placeholder causes strings to be escaped and makes the sanitization API harder to understand [#2506427]

Update 2015-September-15

A hangout was had by a number of key stakeholders; here were the findings:

No objections were raised about the three items decided by the core committers prior to the call:
- We will not ship D8 with !placeholder continuing to behave as it does in HEAD (sometimes causing the string to be marked unsafe if !placeholder is not already independently safe).
- We will not restore the D7 behavior of !placeholder, given the different expectations D8 vs. D7 developers will have regarding auto-handling of security.
- We will not change !placeholder to automatically Xss::adminFilter() the whole string nor Xss::adminFilter(!placeholder) by itself.
(various issues) On the question of deprecating !placeholder or removing it, consensus was to remove it entirely (critical issue). We can write a smoke test (don't have to commit it) that checks all drupalGet()/drupalPost() calls for unexpected escaping bugs. Announce ! is deprecated ASAP, update change records/docs.
#2557113: Make t() return a TranslationWrapper object to remove reliance on a static, unpredictable safe list On the question of making t() return a TranslationWrapper object, consensus was to do it (critical issue). We can add a temporary BC layer that t() only returns TW in cases where HEAD adds it to the SafeMarkup safe list. Until we finish ! placeholder removal, the behavior will be unchanged for those cases where ! would have caused the string to be marked unsafe. In the TW issue, add this temporary BC layer, and create a postponed follow-up to remove the BC layer once ! is gone. Make sure the change record documents things like the #alt / $attributes case, the optgroup case, impacts on assertIdentical() for objects, etc.; anything we had to do to get tests passing.
#2565895: Add a new :placeholder to SafeMarkup::format() for URLs that handles bad protocols On question of providing a :url placeholder for URLs, consensus was to do it (critical issue). Increases surface area of the API, but for good reason, because URLs are their own string format and XSS vector, and it also removes the need to understand the different URL methods when creating a t() string. We do need to make sure documentation is very thorough, including what the cases to use :placeholder are (which attributes etc. to use it with), and what the cases that are NOT covered by :placeholder nor @placeholder are (other attributes/internal parts of tags). Keep ":" as the placeholder.
A number of other sanitization context concerns were raised by catch, including:
Consensus was that none of those problems are critical, they are all major hardening that could be done before, during, or after RC. However, catch would like to investigate esp. the token sanitization one a bit more before making that a hard call.

Problem/Motivation

Following #2399037: String::format() marks a resulting string as safe even when passed an unsafe passthrough argument, using a !placeholder in a formatted string essentially guarantees that the entire string--not just the placeholder--will be double-escaped when rendered.

The change was introduced based on the reasoning that since !placeholder can include anything, it should not be considered "safe" because no escaping or filtering has been done.

There are a few problems with this, however. In Drupal 7, the documented behavior of !placeholder for format_string() is:

Inserted as is, with no sanitization or formatting. Only use this for text that has already been prepared for HTML display (for example, user-supplied text that has already been run through check_plain() previously, or is expected to contain some limited HTML tags and has already been run through filter_xss() previously).

Even if a developer were using !placeholder safely and correctly when no user input is possible, in Drupal 8, the use of !placeholder will still result in the entire string being re-escaped. Further work to reduce the side effects of the SafeMarkup "safe list" including the decoupling of Xss:filter() from SafeMarkup and the upcoming removal of SafeMarkup::checkPlain() will expand this side effect to even more strings. This unexpected and frustrating behavior makes Drupal 8's sanitization APIs harder to understand and leads to developers doing strange and unsafe things or abusing other APIs to circumvent the problem.

In Drupal 8, the SafeMarkup::format() documentation for !placeholder has been updated to this:

Inserted as is, with no sanitization or formatting. Only use this when the resulting string is being generated for one of:

Non-HTML usage, such as a plain-text email.

Non-direct HTML output, such as a plain-text variable that will be printed as an HTML attribute value and therefore formatted with self::checkPlain() as part of that.

Some other special reason for suppressing sanitization.

However, other than the first bullet about non-HTML usage, using !placeholder as described here will still not "suppress sanitization" but, again, will actually guarantee that the string is escaped again when printed on the page. Furthermore, if other placeholders in the same string were sanitized with @placeholder instead, they are guaranteed to be double-escaped--changing the expected behavior of both as well as for any other HTML in the string, even static, known-safe HTML that the developer defined in the first parameter of t().

Finally, even following the change record for the original change, this still confuses even very experienced D8 developers every time, and the documentation does not actually describe most of the hundreds upon hundreds of !placeholder in core.

Proposed resolution

So far, the committers have agreed the following:

!placeholder will not continue to behave as it does in HEAD (sometimes causing the string to be marked unsafe if !placeholder is not already independently safe).
We will not restore the D7 behavior.
We will not change it to automatically Xss::adminFilter() the whole string nor Xss::adminFilter(!placeholder) by itself

Many different approaches have been discussed to mitigate this problem. Each has benefits and downsides. Some are complementary and can be done together.

~~Revert the change that skips marking strings containing !placeholder as safe~~

We have ruled out this option

Pros
- When #2399037: String::format() marks a resulting string as safe even when passed an unsafe passthrough argument was first committed, it introduced many escaping bugs to core, and there were requests to roll back the change immediately afterward.
- Core and contrib developers, even top Drupal 8 contributors, remain perplexed by the current behavior.
- The change was essentially a BC break for !placeholder, but it was not obvious or clear to most contributors from the change record. Also, subtly changing the behavior of a magically-named array key pattern for an optional argument of a function is a very hard BC break to spot.
- While it's true that !placeholder strings are not guaranteed to be "safe", many other APIs also bypass Twig autoescape in varying ways--e.g., even the Full HTML input filter. "Safe" is actually a misnomer, as it's really an API for determining which string should be printed as-is instead of handled by Twig. This is especially true now that we've reduced SafeMarkup significantly (see #2549943: [plan] Remove as much of the SafeMarkup class's methods as possible) and it is mostly just an internal API for handling the escaping of SafeMarkup::format().
- The previous behavior was after all "how it always worked" and the risks were documented for developers that chose to use it.
Cons
- Previous versions of Drupal didn't have Twig autoescape, and therefore didn't have the expectation that output should be "safe(r) by default".
- We have done a lot of work to remove ways for bypassing Twig autoescape (e.g. the removal of SafeMarkup::set()) but t() is very much a public API.
Replace !placeholder with @placeholder wherever possible.

Issues
- #2564353: Remove !placeholder usage from SafeMarkup::format()
- #2506445: Replace !placeholder with @placeholder in t() and format_string() for non-URLs in tests
Pros
- Many uses of !placeholder are arguably misuses, especially if the intent is that it should only be for non-HTML output (which the current documentation suggests).
- The current behavior of !placeholder is only subtly different from @placeholder
Cons
- Breaks a lot of strings.
- Lots of work.
- By itself, simply escaping a string does not always guarantee safeness (e.g. URLs), so these placeholders may provide a false sense of security.
- May hide other problems.
- More short strings in the side-effect-ridden SafeMarkup safe list (though TBH a small % additional vs. HEAD)
Deprecate !placeholder

Pros
- Same reasoning as above for converting to @placeholder.
- The current behavior of !placeholder is only subtly different from @placeholder, so it doesn't really have a clear/independent usecase as-is.
- Discourages misuse of !placeholder to output raw HTML via t() instead of properly adding it to (e.g.) a Twig template.
Cons
- We can't legitimately deprecate something that core uses hundreds of times.
- A hard BC break may be better (though, as mentioned above, it's hard to have a hard BC break for magically named array key patterns in an optional function parameter).
~~XSS admin filter for !placeholders not already marked safe~~

We have ruled out this option

Issues
- #2558791: "!"-prefixed tokens should Xss::filterAdmin() but not affect safeness
Pros
- Parallel to #markup
- Much softer BC break
- Eliminate lingering double-escape bugs and the worst of the DX WTF
- No translations harmed in the making of this patch
Cons
- Conflates escaping and filtering (which is the source of many bugs and much confusion)
- Extra filter calls for something where the developer has presumably tried to say the markup should be printed as-is
- If only the !placeholder is filtered, this may or may not actually be sufficient for XSS filtering. Xss::filterAdmin() is documented for use on "fields", but !placeholder is sometimes used for t() inside HTML attributes, especially for URLs.
- If the whole string is filtered, it may have unexpected side effects, like filtering out intentional/legitimate markup (e.g. a mailto link or other intentional output), as well as be confusing or unpredictable with the results of other @placeholder also being XSS filtered again as part of the whole. (That said, this is arguably not as bad as the current side effect of escaping those same @placeholder a second time.)
Add a different *placeholder for "non-HTML output only"

Issues

Pros
- Makes it clear that the behavior is different from !placeholder in D7 and earlier.
Cons
- Limited usecases.
- API bloat (more options == more confusing)
Convert !placeholder to do URL sanitization

Issues
- #2565895: Add a new :placeholder to SafeMarkup::format() for URLs that handles bad protocols
Pros
- Simply escaping is insufficient for URL sanitization, and people often get this wrong, but it is a very common usecase for translatable content.
- This is the majority of the uses in core. URLs that are escaped once via URL generation presumably need to not be escaped again.
- Breaks fewer strings.
Cons
- This is another insidious subtle BC break with all the inherent problems.
- Still very confusing for D7 and previous where !placeholder meant "raw"/"as-is".
- Does not cover other usecases like non-HTML output, other HTML that should be printed as-is, other tag attributes, translation for legit mailto links
Add a different *placeholder for URL sanitization

Issues
- #2565895: Add a new :placeholder to SafeMarkup::format() for URLs that handles bad protocols
Pros
- As above
- Avoids the "subtle BC break" problem
Cons
- Breaks more strings
- API bloat
- Does not cover other usecases like non-HTML output, other HTML that should be printed as-is, other tag attributes, translation for legit mailto links
Change t() to return a Translation wrapper, eliminating pollution of the safe list

Issues
- #2557113: Make t() return a TranslationWrapper object to remove reliance on a static, unpredictable safe list
- #2566447: Cast safe string objects to string or use assertEqual() in some assertIdentical() comparisons
Pros
- Reduces the possibility of obscure/unpredictable bugs or XSS vectors due to the potential pollution of the safe list with unsanitized !placeholder
- Provides the possibility of late escaping, so that t() can behave more like Twig autoescape, which will improve DX and reduce bugs from mixing escaping strategies, as well as make #2509218: Ensure that SafeString objects can be used in non-HTML contexts simpler
- Allows future iteration
- Many other benefits for other SafeMarkup issues
Cons
- Disruptive
- Does not by itself prevent !placeholder from being a potential XSS vector in the string itself.

Remaining tasks

Overall, we have decided that while breaking lots of strings close to RC is non-ideal, core setting a confusing example is worse, and if we are going to break them, the sooner we do, the better. For that reason, it makes sense to (carefully) remove any clearly incorrect or unnecessary uses of !placeholder as soon as possible, since this minimizes the problem.

Therefore, we recommend starting with these issues, in this order:

Other issues should probably be postponed until we have consensus on the best course to take.

User interface changes

Ideally, fewer completely unexpected double-escaping bugs.

API changes

Depends on the resolution chosen.

Comments

Comment #1

joelpittet

he/him

English

Vancouver

commented 15 June 2015 at 18:05

I'm +1 on this issue. But one thing just itches at me that may be an missunderstanding on my part. I think one of the use-cases that @effulgentsia mentioned that may still be valid is strings that are not going to HTML.

Examples: non-HTML emails, JSON, CLI.

There may be a better approach for these all but I'd like to know more about the indented use of ! for those cases.

Comment #2

xjm

she/her

English

commented 15 June 2015 at 18:12

@joelpittet, I think CLI output is not core's responsibility and should just be using a striptags/br2nl/html-to-string strategy. We could optionally provide API in core but I don't see it as a blocker. (Same goes for the render arrays for hook_requirements() etc.)

For emails I'd almost prefer to provide a completely separate API that has nothing to do with SafeMarkup. Because it's not markup; it's just string formatting. There are cases where the same string is used in an email and hook_requirements(), though, which is part of why I haven't committed #2501683: Remove SafeMarkup::set in _update_message_text() yet for example.

Comment #3

joelpittet

he/him

English

Vancouver

commented 15 June 2015 at 18:40

If we can avoid putting markup in strings in the first place and not early rendering in the render system we may have a chance at providing CLI or email specific rendered output. #dreamsbig

Comment #4

xjm

she/her

English

commented 15 June 2015 at 18:45

@joelpittet, big dreams indeed. :) I think templates, inline templates, and render arrays are the closest thing we have in core to supporting said big dream -- unfortunately I don't think it's in the cards for 8.0.x.

I started on #2506445: Replace !placeholder with @placeholder in t() and format_string() for non-URLs in tests.

Comment #5

effulgentsia commented 19 June 2015 at 23:50

Deprecate !, possibly by making it just a copy of @.

I started on #2509218: Ensure that SafeString objects can be used in non-HTML contexts.

For emails I'd almost prefer to provide a completely separate API that has nothing to do with SafeMarkup.

I agree, but t() is still our API for translation, whether we're translating for HTML or plain text. But I'll post an idea on how to handle that in that issue.

Comment #6

star-szr

he/him

English

commented 23 June 2015 at 10:05

Note: #2409477: Various ! placeholders throughout core are now showing escaped HTML (especially <a href="!..."> examples) exists. I sense some overlap.

Comment #7

subhojit777

he/him

Bengali

commented 27 June 2015 at 09:46

Status:	Active	» Closed (duplicate)
Related issues:		+#2506445: Replace !placeholder with @placeholder in t() and format_string() for non-URLs in tests

Closed on behalf of #2506445: Replace !placeholder with @placeholder in t() and format_string() for non-URLs in tests

Comment #8

subhojit777

he/him

Bengali

commented 27 June 2015 at 09:49

Status:	Closed (duplicate)	» Active
Related issues:	-#2506445: Replace !placeholder with @placeholder in t() and format_string() for non-URLs in tests

I am sorry later I found that this issue is parent of #2506445: Replace !placeholder with @placeholder in t() and format_string() for non-URLs in tests. I am reopening this issue.

Comment #9

xjm

she/her

English

commented 28 August 2015 at 09:31

Comment #10

xjm

she/her

English

commented 28 August 2015 at 09:31

Title:

Remove !placeholder from formatted strings wherever possible and deprecate it

» [meta] Remove !placeholder from formatted strings wherever possible and deprecate it

Comment #11

justachris commented 29 August 2015 at 02:06

Status update on #2506445: Replace !placeholder with @placeholder in t() and format_string() for non-URLs in tests:
Based on advice from from @xjm and conversations with @joelpittet, the large patch that was created there has been broken into smaller pieces. Comment #93 has the suggested approach, and is how we are starting.
Children issues addressing those listed modules have been created, and they appear as children of this issue. Not on that list, #2551743: Replace !placeholder references in Config with @placeholder is being re-purposed since ConfigImporter & tests is a definable chunk.
We should aim to manually test each one of the new patches, which is something originally missing from the conglomerated patch. Also, because the patches are largely coming from the bigger patch, we need to make sure each patch stays in scope for the individual issue.
At the suggestion of @joelpittet, #2506445: Replace !placeholder with @placeholder in t() and format_string() for non-URLs in tests should only deal with leftovers that are too tedious to chunk out individually, and thus has been left open. The patch there should be considerably smaller than the original.

Comment #12

alexpott

he/they

English

🇪🇺🌍

commented 2 September 2015 at 12:29

So before we start committing all of these I think we really have to consider whether it is worth it... see #2560783-16: Replace !placeholder with :placeholder for URLs in hook_help() implementations and the fact that #2558791: "!"-prefixed tokens should Xss::filterAdmin() but not affect safeness will resolve many of the issues with !

Comment #13

andypost

he/him

Russian

commented 2 September 2015 at 13:14

I'd prefer leave hook_help() out of scope because re-translate help strings is serious pita

Comment #14

joelpittet

he/him

English

Vancouver

commented 2 September 2015 at 14:19

@andypost It's find and replace ! with @. And this whole issue will do that. Where does this happen, maybe it can be automated?

Comment #15

catch

he/him

English

commented 7 September 2015 at 12:36

Status:

Active

» Postponed

After talking to @alexpott I think this issue and its children are postponed on #2558791: "!"-prefixed tokens should Xss::filterAdmin() but not affect safeness.

Comment #16

alexpott

he/they

English

🇪🇺🌍

commented 7 September 2015 at 12:38

Where ! is being used to prevent double escaping (even though it won't) we need to be really careful and add tests - see #2564321: file_save_htaccess() generates error logs which are escaped incorrectly

Comment #17

alexpott

he/they

English

🇪🇺🌍

commented 7 September 2015 at 13:26

Created #2564353: Remove !placeholder usage from SafeMarkup::format() to handle some fixes that we should do regardless of #2558791: "!"-prefixed tokens should Xss::filterAdmin() but not affect safeness and that removes all ! from SafeMarkup::format() in the code base.

Comment #18

xjm

she/her

English

commented 9 September 2015 at 18:26

Title:	[meta] Remove !placeholder from formatted strings wherever possible and deprecate it	» [meta] !placeholder causes strings to be escaped and makes the sanitization API harder to understand
Assigned:	Unassigned	» xjm
Priority:	Major	» Critical
Status:	Postponed	» Active

We (@catch, @alexpott, @xjm, @stefan.r, @dawehner) discussed this this morning agreed this morning that we need to step back and come with an overall plan, because there are several overlapping or conflicting approaches.

All children of this issue (with the broader scope) should be postponed, except for patches that specifically remove !placeholder use where it is not needed or incorrect. This means breaking translations, but we agreed that since we are not in string freeze yet, the disruption to translations is something we need to accept in order to resolve the broader sanitization concerns.

I'm working on writing an updated summary that encompasses all of the different concerns.

Comment #19

xjm

she/her

English

commented 9 September 2015 at 22:15

Issue summary:

View changes

Okay, massive summary update complete enough to share with the class. There may be some errors or stuff missing. I know a lot of people are following this so if you want to comment on https://docs.google.com/document/d/1bYPrq3LKndSRLVb-hwQBNPCvnXdFZj1Fio2t... we can make updates without crossposting all over each other's toes. :)

My personal feeling after spending days researching only this and months thinking about it is still that we must revert the change that causes !placeholder to exclude the string from the safe list and make the documentation about it clearer and stronger and full of stern warnings. Anything else is just too confusing and risky. To me, that's the critical part of this issue.

We can then continue with fixing other parts of it, like removing misues of !placeholder, possibly deprecating it, figuring out what our best practice really is for URLs, etc.

Comment #20

xjm

she/her

English

commented 9 September 2015 at 22:18

Issue summary:

View changes

Comment #21

xjm

she/her

English

commented 9 September 2015 at 22:49

Issue summary:

View changes

(Incorporating some edits in response to feedback from @effulgentsia).

Comment #22

xjm

she/her

English

commented 9 September 2015 at 22:53

Issue summary:

View changes

Comment #23

stefan.r commented 9 September 2015 at 22:53

though, as mentioned above, it's hard to have a hard BC break for magically named array key patterns in an optional function parameter

Couldn't we have that hard BC break through an extra check inside the placeholder processing code with an assert/trigger_error/exception in case any !placeholders are still used? Any code analysis would then just have to flag all usages of array keys starting with "!"

Comment #24

xjm

she/her

English

commented 9 September 2015 at 22:54

Issue summary:

View changes

Comment #25

xjm

she/her

English

commented 10 September 2015 at 00:22

Assigned:

xjm

» Unassigned

Couldn't we have that hard BC break through an extra check inside the placeholder processing code with an assert/trigger_error/exception in case any !placeholders are still used? Any code analysis would then just have to flag all usages of array keys starting with "!"

Yeah, that would seem a better course of action if we go the route of changing what the types of placeholders do.

Comment #26

joelpittet

he/him

English

Vancouver

commented 10 September 2015 at 01:21

Re proposal 1) in the Issue Summary

Revert the change that skips marking strings containing !placeholder as safe

Is still my preferred solution. I appreciate the intent behind the change had great intentions of ensuring safety everywhere in SafeMarkup::format() but as mentioned in the "Pros", it forces people to look for creative solutions to put raw data into the template.

In contrib we've seen SafeMarkup::set() used when the output had hard-coded assets(devel module) came from a third party tool (IIRC kint). And the same or another contrib module while it was being deprecated used SafeMarkup::format() to accomplish the same thing and even when we forced the second parameter to be an array tried committing SafeMarkup::format('haha still safe sucker', []). Personally I'd be using SafeString::create() even though it's internal or make my own and extend SafeStringInterface.

Also to add to my appreciation for the issue that removed !placeholder safeness, it has given us a good hard look(though laborious) at how we are using these placeholders, found bugs in core and re-evaluate why we are using !placeholder when we don't need to.

What this would mean, should we go this route, is that contrib will likely abuse !placeholder some more and the security team will have the same work they did in D7 for that, but I don't think us "slowing people down" will prevent that sec task as people will find workarounds which will still be on the sec team plate. IMO

Comment #27

xjm

she/her

English

commented 10 September 2015 at 01:29

Another possible issue that I have not included in the summary is "late escaping" for t(), which might eliminate the need for a separate "non-HTML output" placeholder. Currently, t()/SafeMarkup::format() sanitization happens immediately, whereas Twig's sanitization happens at render time. If escaping for @placeholder were deferred until render time, then non-HTML output could expect the same behavior from t() as it did from the theme layer. That would require either t() returning an object, which I believe alexpott found very difficult to implement at this point in the release cycle. There's an issue for this somewhere too.

@effulgentsia also had an earlier issue for a different proposal (can't remember what exactly) that I haven't been able to find.

Comment #28

joelpittet

he/him

English

Vancouver

commented 10 September 2015 at 01:42

re #27 I still have hopes for this issue too. (look at the green children!)
#2557113: Make t() return a TranslationWrapper object to remove reliance on a static, unpredictable safe list

Comment #29

effulgentsia commented 10 September 2015 at 06:10

@effulgentsia also had an earlier issue for a different proposal

I think you mean #2509218-4: Ensure that SafeString objects can be used in non-HTML contexts. The patch in that comment makes ! behave exactly like @:

If $options['html'] isn't FALSE, then both would checkPlain() if not already safe and the result would be marked safe.
If $options['html'] is FALSE, then neither would perform any escaping and the result would not be marked safe.

At which point, ! could be marked deprecated, since it would just be an artifact from the days when auto-escaping didn't exist so we needed callers to clarify when to escape and when not to. I still believe that to be a better option than this issue's proposed resolution #1 (but not necessarily better than some of the other options).

Looking through some of the examples in #2558791-89: "!"-prefixed tokens should Xss::filterAdmin() but not affect safeness, the only cases I see where that would not give you the desired result is where a !foo value is set to a PHP concatenation of safe variables (i.e., using the . operator). Can someone here point out an example from any of those files (D7 or D8) that contradicts this claim? And if I'm right about that, then do we really want to do the proposed resolution #1 and reopen a very common XSS vector just to reintroduce BC for that concatenation pattern from D7?

Comment #30

stefan.r commented 10 September 2015 at 07:54

Re #27 isn't that what @alexpott proposed in #2509218-8: Ensure that SafeString objects can be used in non-HTML contexts as well?

As to #2557113: Make t() return a TranslationWrapper object to remove reliance on a static, unpredictable safe list I still think this is possible to implement right after RC or even before if we raise the priority or get some more attention to that issue, it's just blocked on the children and on finding a resolution to this current issue. As soon as these are solved we'll be very close to having something reviewable and green there.

I believe @alexpott's opinion about the feasibility of that issue for 8.0.0 changed as soon as we had a green patch there 11 days ago which was not as complicated as expected (and the required changes for contrib also seem to be small). Also the majority of the effort has already been done in the child issues.

Comment #31

alexpott

he/they

English

🇪🇺🌍

commented 10 September 2015 at 08:31

Some thoughts on the options:

Option 1 I don't think SafeMarkup::format() should become a way to bypass sanitization and filtering. We have twig templates and SafeString objects for that. Note that bypassing is not the same using t() or SafeMarkup::format() for non-HTML output.
Option 2 I think we should do this as a matter of course. I think HEAD is full of bad examples of !placeholder and potential XSS - doing thins will fix that.
Option 3 Is a good idea as long as we have a URL placeholder - because we need to provide a way to safely add URLs to string - it is a super common use case.
Option 4 I think this is a complete no-go for two reasons. XSS filtering is not free - it does 4 preg_replace() before it's even started to chop up the string. XSS filtering the entire string will make changes to things over that the !placeholder.
Option 5 I think the solution of making t() and SafeMarkup::format() return an object is way better because then we can just add a method that returns the unescaped string. The other massive advantage of doing that is the removal of the entire safe list which will make whether some is santized at the Twig level completely predictable and knowable by debugging or dsm()'ing.
Option 6 I disagree about this being insidious especially as we could mitigate with an assert to tell the developer that this is now for URLs and doing this would make less work for translators. BUT a new placeholder is fine too and will make it easier to do the work now.
Option 7 if we don't do 6 we have do this.

Some other random thoughts:

We have to have a way of safely putting a URL in a t()'d string
We need to use t() in non-HTML contexts
SafeMarkup::format() should not be a way of munging HTML together

So given all of that my plan would be to do the following:

Add a new placeholder for URL - I suggest ~ because it is identifiable and on all keyboards and as far as I know has no meaning Drupal.
Make all URLs use ~placeholder
Make t() and SafeMarkup::format return objects
Use these objects to get strings for non-HTML
Remove !placeholder because then we don't have to decide what to do if it is present. If we don't want to do that we should make it do the same as @placeholder

Comment #32

catch

he/him

English

commented 10 September 2015 at 10:26

So I think #31 1-5 sounds like a good plan.

The only modification I'd make is I think we should consider deprecating !placeholder (either for 8.0.x with the intention to remove during RC, or for 9.0.x) rather than completely removing it. It could stay in core with exactly the same behaviour as the new URL placeholder just to give contrib/custom code a bit more time to update. I'm not tied to this idea though, if people really want to do a hard break that's OK with me too. I'd also be fine with changing the meaning of !placeholder to be for URLs as an even softer break. But I can see that completely doing away with it forces us to get to an actually consistent state in core very quickly and a avoids a lot of ah um about whether we actually got the conversions done or not.

Comment #33

sutharsan commented 10 September 2015 at 14:06

JustAChris in #11:

We should aim to manually test each one of the new patches, which is something originally missing from the conglomerated patch.

in any of the 'Replace !placeholder' issues:

Manually test the update and post screen shot after patch, review source for any difference in escaping.

What screenshots?
I suggest only those places where the replacement content can be modified, for example by configuration text or translation. Example code: $this->t('@name field is required.', array('@name' => $elements['#title']))) No screenshots required where the replacement content is a URL or the string is part of a test.

review source
Assume that with "review source", you mean the HTML output.

Comment #34

justachris commented 10 September 2015 at 14:50

@Sutharsan: The direction of this plan has changed since comment #11, thus most of the child issues created at that point shouldn't be worked on right now while the new plan is determined. I'm sorry that I failed to marked those children issues as postponed, will do that now.

Comment #35

effulgentsia commented 10 September 2015 at 15:47

We have to have a way of safely putting a URL in a t()'d string

Why is this a need? We don't have a way to do so either in Twig or elsewhere in PHP. For example, aggregator-item.html.twig prints out <a href="{{ url }}">{{ title }}</a>, which means we are required to (and do) have a template_preprocess_aggregator_item() that calls $variables['url'] = UrlHelper::stripDangerousProtocols($item->getLink());. Similarly, all code that sets a user-input URL on a $attributes value must also explicitly protocol filter: e.g., template_preprocess_form() calls $element['#attributes']['action'] = UrlHelper::stripDangerousProtocols($element['#action']);. So, why do we feel that t() must implement protocol filtering rather than making it the caller's responsibility? Or in other words, why must t() be more auto-URL-protocol-safe than Twig and $attributes are?

If it truly is a need though, what if we make both @ and ! have the same behavior (per #29), but give them both some additionally smart behavior. I.e., parse t()'s first argument for if a placeholder is being assigned into a URI attribute value (potentially defining that the same way that Xss::attributes() does), and if so, then protocol filter it. Though I would then ask: if we do this, should we also put similar intelligence into Attribute::createAttributeValue() and into Twig compilation?

Comment #36

catch

he/him

English

commented 10 September 2015 at 16:10

So, why do we feel that t() must implement protocol filtering rather than making it the caller's responsibility?

So for me, I'd say that we automatically check for external URLs to avoid open redirects now per #2507831: Harden redirect responses to make external URIs opt in (was SA-CORE-2015-002 foward-port).

And we automatically apply escaping via Twig.

So attribute filtering at this point feels like the exception rather than the rule.

Also Xss::filter() does apply protocol filtering to attributes, the specific problem we have with t() and the filterXssAdmin() issue is that the URL is used as an attribute but does not run through attribute filtering even if you use @. It's easy to get wrong.

Or in other words, why must t() be more auto-URL-protocol-safe than Twig and $attributes are?

I think we should consider adding it to https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21Template%... and similar too, yes.

Comment #37

catch

he/him

English

commented 10 September 2015 at 16:11

If it truly is a need though, what if we make both @ and ! have the same behavior (per #29), but give them both some additionally smart behavior. I.e., parse t()'s first argument for if a placeholder is being assigned into a URI attribute value (potentially defining that the same way that Xss::attributes() does), and if so, then protocol filter it.

Is there a reason to do this vs. just protocol filtering everything?

Comment #38

xjm

she/her

English

commented 10 September 2015 at 16:44

So I remain concerned that while URLs are a significant usecase for !placeholder (at least for what remains in core at this point), they're not the only one, so we should not focus only on that.

So far I can think of the following usecases that are not covered by @placeholder:

URLs not from user input that do not need to be sanitized, but have already been generated via the URL generator and so should not be escaped a second time
URLs from user input that have already been sanitized including stripping bad protocols, and so should not be escaped a second time
Placeholders for non-HTML output e.g. emails, etc., so e.g. ' should not be converted to an HTML entity.
Embedding HTML that does not need to be sanitized (and therefore has not been marked "safe") in another string (e.g. a hook_requirements() message, or the thing devel was doing) (this usecase should probably be discouraged)
Embedding HTML that does not not need to be sanitized (and therefore has not been marked "safe") in another string with context for translators

Each of these may have a separate solution (including "Don't use a !placeholder for that; use this other strategy that's already in core") but I want to make sure all of them are covered in the summary. Can anyone think of other usecases that are not covered by this?

Comment #39

stefan.r commented 10 September 2015 at 16:52

@xjm that seems to cover it mostly for contrib as well:

https://www.drupal.org/files/issues/d7-contrib-exclamationmark_0.txt
https://www.drupal.org/files/issues/d8-contrib-exclamationmark_0.txt

Comment #40

effulgentsia commented 10 September 2015 at 16:56

Is there a reason to do this vs. just protocol filtering everything?

Yes, we don't want to strip words that precede colons in title (and some other) attributes and non-attribute text. Such as:

$title = 'Llama: a domesticated South American camelid';
t('More<span class="visually-hidden"> posts about @title</span>', ['@title' => $title]);
t('<a href="@url" title="@title">More info</a> about ...', ['@title' => $title, ...]);

Comment #41

effulgentsia commented 10 September 2015 at 18:06

URLs not from user input that do not need to be sanitized, but have already been generated via the URL generator and so should not be escaped a second time

The URL generator doesn't "escape" (in the sense of Html::escape()) a first time, so there's no "second" time to speak of. The URL generator does stripDangerousProtocols() in some cases, so if we want t() to protocol strip in some cases, then we'd have to address the question of whether it's ok to stripDangerousProtocols() twice. I think it's ok functionally, but perhaps feels wasteful.

URLs from user input that have already been sanitized including stripping bad protocols, and so should not be escaped a second time

In Drupal 7, this was done via check_url(), so if we continued to support check_url() in D8, I would say this should be handled by making check_url() return a SafeString. But, check_url() is now deprecated in D8, so I don't know what the specific use case of already Html::escape()'d-but-not-a-SafeString is. Anyone have a concrete example?

Placeholders for non-HTML output

I think either $options['html'] = FALSE or t() returning an object with different methods for html and non-html output is superior to continuing to do this with a placeholder prefix.

Embedding HTML that does not need to be sanitized

Why does it not need to be sanitized? Because it doesn't use any <script> tags or other questionable markup? Or because it does, but the caller trusts it (e.g., something returned by krumo or another trusted library / data source)? I think the former can be done like this:

t('Some HTML follows: @html', ['@html' => drupal_render(['#markup' => $script_free_html])]);

And the latter perhaps like this if we introduce a '#raw' or '#verbatim' render API support?

t('Some HTML follows: @html', ['@html' => drupal_render(['#raw' => $html_with_trusted_javascript])]);

Comment #42

joelpittet

he/him

English

Vancouver

commented 10 September 2015 at 19:37

@effulgentsia re #35 I tend to agree with your here about letting the caller filter their own potentially bad URLs and I think maybe we are over-engineering this !url problem. The only real problem I see is we have 300K patch that mostly does these urls for hook_help() and it's likely we won't wrap each argument in UrlHelper::stripDangerousProtocols() BUT at the same time those are not variables, they are hard coded URLs with 0 chance of being vulnerabilities. So yes I'm onboard with letting the caller do it.

@catch re #36 and @xjm #38 The biggest problem with this auto-magical filtering I believe is performance as @alexpott mentioned regarding preg_replace() in #31 Yet as mentioned in #38 it has other side effects of previously escaped or filtered, but half of that problem is performance too.
Maybe we can do some kind of hybrid of parsing t() mentioned in #35 in an assert() to alert developers, but we'd still need to know if the value was previously sanitized so that may be tricky...

@stefan.r++ thanks for raw data to parse:)

Comment #43

pwolanin commented 11 September 2015 at 02:18

I think we should go for the quickest win here - and ideally somethign that gives us a result similar to Drupal 7, so #1 from the issue summary. Just pass it through.

We can work to remove use of these placeholders as we wrap up 8.0.x, but let's not try to invent a whole new system at this point.

Is there an issue started for that resolution?

Comment #44

lauriii

he/him

Finnish

Finland

commented 11 September 2015 at 11:44

I'm just wondering if we should organize a call to discuss about this? I'm +1 for #31, however given the tight schedule we might want to consider, if it would be possible to separate the critical parts & BC breaking parts to the post RC phase. We could keep improving rest of it during the RC phase. I'm happy to work on this but right now I feel like this is kind of stuck.

Comment #45

dawehner

German

commented 11 September 2015 at 11:47

Yeah I think we should really do #7, it adds more semantic meaning, is the cleanest solution for many of those conversions
and also important help the developer. We discussed about that on the critical call and decided to start with #2565895: Add a new :placeholder to SafeMarkup::format() for URLs that handles bad protocols

Comment #46

gábor hojtsy

he/him

Hungarian

Hungary

commented 11 September 2015 at 15:46

One more thing I wanted to add is as we discussed this on the criticals call, IF we can automate the updates to the localize.drupal.org strings (reupload fixed strings as suggestions for the teams). That is possible to do after RC1.

Comment #47

dawehner

German

commented 11 September 2015 at 15:59

One more thing I wanted to add is as we discussed this on the criticals call, IF we can automate the updates to the localize.drupal.org strings (reupload fixed strings as suggestions for the teams). That is possible to do after RC1.

While talking with catch later we also thought a bit about also updating the strings on the Drupal installations itself using hook_update_N()/hook_post_update_NAME()
but I'm not entirely sure at that point, whether its needed given that we can sync translations ... do we care about custom made translations?

Comment #48

gábor hojtsy

he/him

Hungarian

Hungary

commented 11 September 2015 at 16:03

Yeah if we want to keep custom translations we do need update functions as well yes. Which is one more good reason to have an easy scriptable change pattern.

Comment #49

webchick

she/they

English

Vancouver 🇨🇦

commented 11 September 2015 at 17:49

Went over this with @xjm, @effulgentsia, @pwolanin, @Gábor Hojtsy and a few others. I believe these are the actionable parts of this now for people who are looking to get this polished off sooner than later:

One way or another, we want to get rid of as many instances of ! in strings as possible before string freeze. That's at #2506445: Replace !placeholder with @placeholder in t() and format_string() for non-URLs in tests
#2509218: Ensure that SafeString objects can be used in non-HTML contexts would allow a BC layer in place while we work through what we ultimately want to do
#2557113: Make t() return a TranslationWrapper object to remove reliance on a static, unpredictable safe list Sorry, I forget why this is good, but alexpott wants it, so. :)
#2545972: Remove all code usages SafeMarkup::checkPlain() and rely more on Twig autoescaping This is another big chunk out of the big meta issue to remove SafeMarkup methods and leads to more secure code.
#2565895: Add a new :placeholder to SafeMarkup::format() for URLs that handles bad protocols This one doesn't have consensus yet, so is a bit risky to work on, but having at least a patch here might make other issues earlier, not sure.

Comment #50

dawehner

German

commented 11 September 2015 at 22:36

Went over this with @xjm, @effulgentsia, @pwolanin, @Gábor Hojtsy and a few others. I believe these are the actionable parts of this now for people who are looking to get this polished off sooner than later:

While I totally agree with every dedicated step you listed and its advantages, this list doesn't make it clear which part of those are actually critical,
and which parts needs / can be resolved in order to resolve this particular critical issue. Did you talked about those points?

One point for me which is critical is that we announce when the string freeze actually happens so translators from > 100 languages can actually rely on it or at least
partially can.

Comment #51

effulgentsia commented 12 September 2015 at 09:08

I'll give a more detailed answer to #50 this weekend, but for now, I'll give the short answer of: removing all core usages of !placeholder is very likely Critical. That means #49.1 and #49.5 will likely need to be promoted to Critical. #49.3 and #49.4 would also be very, very good to do, even if they end up not being strictly Critical, and there's a chance they will be. #49.2 might be moot once #49.3 is done: I'll work on an issue summary for that issue to clarify that this weekend.

Comment #52

webchick

she/they

English

Vancouver 🇨🇦

commented 12 September 2015 at 22:59

Just a note that at xjm's request, I've sent around a Doodle to most of the people involved in this and various sub-issues to try and get alignment on a direction on Mon/Tues.

Comment #53

xjm

she/her

English

commented 13 September 2015 at 01:28

Just a note that I've tentatively promoted #2557113: Make t() return a TranslationWrapper object to remove reliance on a static, unpredictable safe list to critical, because of its combined impact on this issue and many other parts of the problem space.

Aside from that one, #2506445: Replace !placeholder with @placeholder in t() and format_string() for non-URLs in tests is still the most straightforward to move forward at this point. Note that it is just the first step; #2566503: [meta] Replace remaining !placeholder for Non-URL HTML outputs only is the second and there will probably be a few more split off.

Comment #54

catch

he/him

English

commented 13 September 2015 at 14:50

While reviewing this I found #2567741: Attribute/drupal_attributes() docs do not mention protocol filtering on URLs and opened an issue to fix the currently incorrect docs on those functions.

Also opened #2567743: Add protocol filtering to the core Attribute component which would make the current docs accurate. Although I think we should work on both independently - the docs one will be quicker.

sutharsan commented 14 September 2015 at 07:26

Issue summary:	View changes
Issue tags:		+Needs issue summary update

Comment #59

xjm

she/her

English

commented 14 September 2015 at 14:30

Issue summary:	View changes
Issue tags:	-Needs issue summary update

Reverting that summary update in #58, because it deleted a lot of info without explanation.

Comment #60

sutharsan commented 14 September 2015 at 14:50

Oops, I guess I added the tag to an old page (prior to your #57 update).

Comment #61

sutharsan commented 14 September 2015 at 15:07

Issue summary:

View changes

Comment #62

alexpott

he/they

English

🇪🇺🌍

commented 14 September 2015 at 15:29

Issue summary:

View changes

Comment #63

xjm

she/her

English

commented 14 September 2015 at 16:02

Issue summary:

View changes

The committers discussed this issue further today and we've agreed on three points:

!placeholder will not continue to behave as it does in HEAD (sometimes causing the string to be marked unsafe if !placeholder is not already independently safe).
We will not restore the D7 behavior.
We will not change it to automatically Xss::adminFilter() the whole string nor Xss::adminFilter(!placeholder) by itself

Updating the summary to indicate this, but keeping the lists of pros and cons to document why the decisions were made.

Comment #64

xjm

she/her

English

commented 14 September 2015 at 18:52

Issue summary:

View changes

(Updating summary for readability and colorz.)

Comment #65

xjm

she/her

English

commented 14 September 2015 at 19:00

Issue summary:

View changes

Status update:

#2506445: Replace !placeholder with @placeholder in t() and format_string() for non-URLs in tests needs a reroll
#2568013: Replace !placeholder in DisplayTest I have suggested a small update
#2567857: Remove !placeholder in NodeRevisionUiTest and make it actually test the expected output needs review

Comment #66

xjm

she/her

English

commented 15 September 2015 at 00:11

The first set of !placeholder removals (for tests) are mostly in now!

In #2566503-10: [meta] Replace remaining !placeholder for Non-URL HTML outputs only, I've recommended nine targeted issues to file for some non-test usages. Let's get those filed, and then they can proceed in parallel.

Comment #67

webchick

she/they

English

Vancouver 🇨🇦

commented 15 September 2015 at 18:31

Issue summary:

View changes

Ooookay.

Had a massive hangout with the following people:

Alex Bronstein (effulgentsia)
Nathaniel Catchpole (catch)
Lauri Eskola (lauriii)
Jess M (xjm)
Joël Pittet (joelpittet)
Alex Pott (alexpott)
Scott Reeves (Cottser)
Erik Stielstra (Sutharsan)
Daniel Wehner (dawehner)
Peter Wolanin (pwolanin)

(If I missed anyone, sorry, I was taking notes, not paying attention to the hangout ;))

On said hangout we discussed a number of questions:

Are we cool with the stuff in #63 that has already been decided?
Should we merely deprecate !placeholder, or should we remove it? And is doing whatever we do there critical?
Should we make t() return a TranslationWrapper, and if so, is it critical?
Should we introduce a :placeholder for URLs, and if so, is it critical?
Are the various sanitization context issues that catch has found hardening, or release-blocking?

Here's the summary from the call. Also adding this to the issue summary:

No objections were raised about the three items decided by the core committers prior to the call:
- We will not ship D8 with !placeholder continuing to behave as it does in HEAD (sometimes causing the string to be marked unsafe if !placeholder is not already independently safe).
- We will not restore the D7 behavior of !placeholder, given the different expectations D8 vs. D7 developers will have regarding auto-handling of security.
- We will not change !placeholder to automatically Xss::adminFilter() the whole string nor Xss::adminFilter(!placeholder) by itself.
(various issues) On the question of deprecating !placeholder or removing it, consensus was to remove it entirely (critical issue). We can write a smoke test (don't have to commit it) that checks all drupalGet()/drupalPost() calls for unexpected escaping bugs. Announce ! is deprecated ASAP, update change records/docs.
#2557113: Make t() return a TranslationWrapper object to remove reliance on a static, unpredictable safe list On the question of making t() return a TranslationWrapper object, consensus was to do it (critical issue). We can add a temporary BC layer that t() only returns TW in cases where HEAD adds it to the SafeMarkup safe list. Until we finish ! placeholder removal, the behavior will be unchanged for those cases where ! would have caused the string to be marked unsafe. In the TW issue, add this temporary BC layer, and create a postponed follow-up to remove the BC layer once ! is gone. Make sure the change record documents things like the #alt / $attributes case, the optgroup case, impacts on assertIdentical() for objects, etc.; anything we had to do to get tests passing.
#2565895: Add a new :placeholder to SafeMarkup::format() for URLs that handles bad protocols On question of providing a :url placeholder for URLs, consensus was to do it (critical issue). Increases surface area of the API, but for good reason, because URLs are their own string format and XSS vector, and it also removes the need to understand the different URL methods when creating a t() string. We do need to make sure documentation is very thorough, including what the cases to use :placeholder are (which attributes etc. to use it with), and what the cases that are NOT covered by :placeholder nor @placeholder are (other attributes/internal parts of tags). Keep ":" as the placeholder.
A number of other sanitization context concerns were raised by catch, including:
Consensus was that none of those problems are critical, they are all major hardening that could be done before, during, or after RC. However, catch would like to investigate esp. the token sanitization one a bit more before making that a hard call.

Comment #68

dawehner

German

commented 16 September 2015 at 17:09

Issue summary:

View changes

Added a smoke test issue.

Comment #69

sutharsan commented 17 September 2015 at 09:42

I've created #2570093: Replace !placeholder with @placeholder where needed in JavaScript as we did not yet cover the !placeholder in JavaScript.

Comment #70

xjm

she/her

English

commented 27 September 2015 at 12:56

Outstanding tasks:

One usage in Migrate's Sql.php on line 509
The usage in #2575599: Remove !placeholder in ContactPersonalTest and Drupal\migrate\Plugin\migrate\id_map\Sql that got rolled into #2571695: Remove !placeholder and unsafe string return from SafeMarkup::format() atm to make tests pass there.
We've agreed informally that removing the placeholder in JS is not critical, but AFAIK this isn't discussed anywhere at all in the issue queue. So we should file a major plan for that. One related issue I know of is #2567727: LocaleJavascriptTranslationTest needs test coverage for @placeholders in JavaScript.
#2571695: Remove !placeholder and unsafe string return from SafeMarkup::format() is the final step.

Other than that, there are the related discussions around attributes still, but those are not in scope here. So we can downgrade this issue once we've addressed those four points.

Comment #71

xjm

she/her

English

commented 27 September 2015 at 13:05

@jsmith said he will take on points 1 & 2 above.

Someone else could help with the other issue in point 3.

Comment #72

xjm

she/her

English

commented 27 September 2015 at 14:46

Here is the (already existing) issue for point 3: #2570093: Replace !placeholder with @placeholder where needed in JavaScript / #2570101: Remove !placeholder support from Drupal.formatString

I'll update their parent so they show up here.

Comment #73

xjm

she/her

English

commented 27 September 2015 at 14:51

Priority:

Critical

» Major

Given that #2571695: Remove !placeholder and unsafe string return from SafeMarkup::format() and #2575599: Remove !placeholder in ContactPersonalTest and Drupal\migrate\Plugin\migrate\id_map\Sql are both independently marked as critical, and everything else in the scope of this issue is now covered by either fixed criticals or by majors, downgrading to major!

Comment #74

xjm

she/her

5 October 2015 at 21:54

Version:

8.6.x-dev

» 8.8.x-dev

Drupal 8.6.x will not receive any further development aside from security fixes. Bug reports should be targeted against the 8.8.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.9.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Comment #83

5 October 2015 at 21:54

Version:

8.8.x-dev

» 8.9.x-dev

Drupal 8.8.7 was released on June 3, 2020 and is the final full bugfix release for the Drupal 8.8.x series. Drupal 8.8.x will not receive any further development aside from security fixes. Sites should prepare to update to Drupal 8.9.0 or Drupal 9.0.0 for ongoing support.

Bug reports should be targeted against the 8.9.x-dev branch from now on, and new development or disruptive changes should be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Comment #84

5 October 2015 at 21:54

Version:

8.9.x-dev

» 9.2.x-dev

Drupal 8 is end-of-life as of November 17, 2021. There will not be further changes made to Drupal 8. Bugfixes are now made to the 9.3.x and higher branches only. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #85

5 October 2015 at 21:54

Version:

9.2.x-dev

» 9.3.x-dev

Comment #86

avpaderno

he/him

Italian

Brescia, 🇮🇹 🇪🇺

commented 16 April 2022 at 17:41

Version:

9.3.x-dev

» 9.4.x-dev

Comment #87

catch

he/him

English

commented 16 April 2022 at 20:30

Status:

Active

» Fixed

All of the issues here are either closed already, or JavaScript issues which are fine existing in their own right. Closing this out.

Comment #88

30 April 2022 at 20:34

Status:

Fixed

» Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

[meta] !placeholder causes strings to be escaped and makes the sanitization API harder to understand

Update 2015-September-15

Problem/Motivation

Proposed resolution

Revert the change that skips marking strings containing !placeholder as safe

Pros

Cons

Replace !placeholder with @placeholder wherever possible.

Issues

Pros

Cons

Deprecate !placeholder

Pros

Cons

XSS admin filter for !placeholders not already marked safe

Issues

Pros

Cons

Add a different *placeholder for "non-HTML output only"

Issues

Pros

Cons

Convert !placeholder to do URL sanitization

Issues

Pros

Cons

Add a different *placeholder for URL sanitization

Issues

Pros

Cons

Change t() to return a Translation wrapper, eliminating pollution of the safe list

Issues

Pros

Cons

Remaining tasks

User interface changes

API changes

Comments

Revert the change that skips marking strings containing `!placeholder` as safe

Replace `!placeholder` with `@placeholder` wherever possible.