Problem/Motivation
Drupal aims to eliminate a class of security bugs by distinguishing 3 classes of text:
- Text that is already valid safe markup is represented by an object implementing
MarkupInterface
. - A string containing plain text that is automatically escaped when passed to the template system.
- A string containing unsafe markup that should not be passed to the template system as it would be escaped. Instead can use
Xss::filter()
orXss::filterAdmin()
.
It's currently hard to understand how to use Token::replace()
correctly in these 3 cases:
- The comment on the return string instructs the caller to rely on Twig autoescaping. However the return string is markup so this is wrong and will lead to double escaping.
- The return string is unsafe, even if the input was safe - this is potentially surprising and should be made clearer.
- The correct code for escaping tokens in plain text is awkward, requiring 3 nested function calls and leading to unnecessary conversions. This is easily forgotten, and is even wrong in core, see #3264453: Incorrect usage of Token::replace() can corrupt plain text that resembles HTML
This issue has been classed as a bug because the incorrect documentation and difficult usage led to bugged code in core and contrib.
Proposed resolution
- Fix comments to Token::replace().
- Create a new function
replacePlain()
.
Remaining tasks
Deferred to a separate issue: #3264453: Incorrect usage of Token::replace() can corrupt plain text that resembles HTML
User interface changes
None
API changes
See "proposed resolution" and change record.
Data model changes
None
Comment from original issue
@Berdir #2567257-210: hook_tokens() $sanitize option incompatible with Html sanitisation requirements.
Posting this here for now, will probably open a follow-up issue.
I've been working on updating token.module and adjusting functionality/tests for this. Which is probably something we should have done before committing this, to make sure that this works for more than the few use cases that core has.
I think there is at least one example there why supporting some sort of sanitize => FALSE is useful and important.
The basic use case is when you have user-provided, unsafe input and want it to be continue unsafe and un-escaped, because you then rely on autoescape.
One example in token.module is the block label, it has this code:
function token_block_view_alter(&$build, BlockPluginInterface $block) { $config = $block->getConfiguration(); $label = $config['label']; if ($label != '<none>') { // The label is automatically escaped, avoid escaping it twice. $build['#configuration']['label'] = \Drupal::token()->replace($label, array(), array('sanitize' => FALSE)); } }
The problem is that now the block label tokens are escaped twice. There's a test that is creating a node with a ' in it, and right now, that is getting escaped twice (which is exactly what this code is testing), since we force-escape all token return values and then escape the whole string again.
I don't see a proper way to fix this right now. What technically works is using PlainTextOutput::renderFromHtml() but clearly it is not correct to use that in non-plaintext output.
We don't have to pass it to hook implementations, but I really think we need a flag to prevent auto-escaping. We even document:
The caller is responsible for choosing the right escaping / sanitization
but don't actually allow to caller to do that, at least not for token values. But if the token input is untrusted and will be escaped later, we must treat token replacements as untrusted too or we are guaranteed to have double-escaping problems?
(See also the following comments about some discussion and why various things don't work IMHO)
Comment | File | Size | Author |
---|---|---|---|
#147 | token-plain-text-2580723-interdiff-141-147.txt | 1.68 KB | AdamPS |
#147 | token-plain-text-2580723-147.patch | 6.47 KB | AdamPS |
#141 | token-plain-text-2580723-141.patch | 6.47 KB | AdamPS |
#133 | token-plain-text-2580723-interdiff-122-133.txt | 2.45 KB | AdamPS |
#133 | token-plain-text-2580723-133.patch | 21.27 KB | AdamPS |
Comments
Comment #2
BerdirExample patch, this fixes token_block_view_alter().
Comment #3
BerdirAnother name for that option could be plain_text (or plain-text or plaintext or plain) => TRUE/FALSE wher the default would be FALSE.
Comment #4
dawehnerDid you considered to use the don't encode me twice option on
htmlspecialchars()
I really try to figure out why we can't just use
!empty($options['escape']
Comment #5
catchWhat happens to Safestrings?
Comment #6
Berdir#4: Because it was late ;)
#5. Hm, yes, right now, they'd get escaped again. And they wouldn't with escape + renderFromHtml() But I'm not sure what you'd expect to happen if you use a token that prints out markup, like a formatted node body for a block label? Should that really work? It certainly works the same way right now in D7, if you have sanitize FALSE and HTML in your body, then it will be escaped in the block label, so it's certainly not a regression in any way.
Maybe that's something that a better name for the option could clarify, as I said in #3, something with plaintext or userinput in it. Then that might make it clearer that it's not just about esaping or not but that it's really about treating the text and the tokens as unsafe user input that must and will be escaped later if printed in a HTML context.
Comment #7
Berdir#5: Actually they would as well with renderFromHtml() because the result of that isn't safe, so it'd be escaped as well. Really no difference and no way to make safe strings work in plain text token replacements as far as I see.
Comment #8
catchRight, so the problem here is that we'd be mixing strings that are ready-for-HTML (safe string) and strings which are ready-for-escaping (plain text) into a single string replaced as part of the $text argument, with no knowledge of which is which
It's a problem in D7 too, and was a problem that was brought up in the original token issue before it was committed (but was eventually ignored), and we've just spent about a month fixing it. renderFromHtml() will strip tags as well as entity decoding, so you don't get 'double escaped HTML entities' when you use it. It might not be perfect for all use cases, but it at least does what it says it will.
The tokens are either plain text (to be escaped) or HTML, they are not uniformly safe or unsafe.
Comment #9
BerdirDiscussed this a bit with @catch. Maybe we want a separate method to do plain text only replacements and convert all token replacements to plain text?
Comment #12
BerdirI'm running into this problem over and over again.
The only solution that worked as expected for page_manager is this:
See #2668980: Certain characters in token page titles are double-encoded
Instead of not escaping anymore, we treat the input as unsafe, then escape both the text and the token replacements consistently and then treat the resulting string as safe markup, to prevent double escaping. That was the *only* way to avoid double-escaping consistently when having for example a & in the node title with [node:title] or just putting a & directly in the page variant title.
I suppose the core issue could check if the input is safe markup, then use that as-is and otherwise escape.
Especially if the input is safe markup, then I suppose you could still end up with e.g. tokens in attributes being problematic and unsafe, but honestly, I have no idea how you could make that safe. And it's certainly better than the API's we offer now, that result in double-escaping and other problems for everyone that tries to use tokens and put the output on an HTML page.
Comment #14
andypostYep, this issue becomes more annoying because page templates escapes again, so sometimes you need to hack like:
Comment #19
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedI agree the assumption to always run Html::escape is definitely too restrictive. For module RRSSB the output of the tokens is used in a URL, so the correct filter function is rawurlencode.
A workaround is possible like this:
I'm not sure the patch form #9 is the way to go - it seems to duplicate the whole function hard-coding a different filter function
PlainTextOutput::renderFromHtml
. Developers will need different filter functions in different cases and surely we don't want to copy the whole function again and again. It seems better to take the patch from #2 subject to comments in #4 which is also more consistent with D7.In that case you pass a callback function that does the escaping of each token.
However the original case raised by @Berdir suggests that in other cases, the whole return string is ready-for-escaping user-text and will be escaped later, perhaps by TWIG.
Comment #20
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedOK I've had a go at updating the IS based on my experience and the other comments. I feel this can reasonably be viewed as a major bug, because it affects the whole token system and, except where developer have written hacks like #14, there will be double-escaping.
Core experts please tell me if I've got it all wrong:-)
Comment #21
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #22
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedOK I think this needs input from a core committer / expert before it can go further.
1) I belatedly notice that the comment for $text does clearly state the the caller must pass HTML text, so that's not a bug. It does seem a bit inconsistent and unhelpful though - why not escape $text automatically too?
2) The return string has a comment like this
That doesn't seem true. The text has already been escaped and the caller mustn't escape it again. Hence the return value should be converted to
MarkupInterface
.3) And the comment continues like this
That procedure will fail to generate the correct string in many cases. A much closer reversal of Html::escape is to call htmlspecialchars_decode, but even that is not perfect. We need a way to use tokens directly on plain text - either a parameter on the existing method or a separate one.
Comment #23
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedI've had a go at a patch - comments welcome.
Sorry my interdiff tool fails, so I've uploaded a diff. In essence there are two main changes from #9
Comment #25
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedI suspect D8 has many double-escaping bugs relating to this issue. However many of them are not understood/noticed, because most of the time the text doesn't contain a quote.
I see double-escaping in user notification mails when they are formatted as HTML mails, e.g. by swiftmailer. This lead me to notice that in two places
user_mail()
is missing the required call toHtml::escape()
on the$text
parameter.I have updated the issue summary and created a new patch that implements option 1.
I removed
Token::replacePlainText()
from the patch for the moment because that seems secondary. Whilst it would be a convenience wrapper and better performance, it's not a bug. The current interface gives the correct result for plain text if used as the comments require:Comment #27
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedFrom the test failures, it seems that option 1 is not going to be possible because of BC. However the broken tests are generally highlighting broken code!
LinkFieldTest:
LinkFormatter::viewElements
passes the title (which is not HTML) without callingHtml::escape
. This line has a comment "Unsanitized token replacement here" but the comment forToken::replace()
doesn't allow such a usage. This code is exactly what #2567257: hook_tokens() $sanitize option incompatible with Html sanitisation requirements intended to avoid - it mixes unescaped $text with token replacements that might contain markup.LocaleStringIsSafeTest: fails because the input strings are encoded HTML but not
MarkupInterface
. That's not generally correct in D8 and will often lead to double-encoding. The strings are in test code so I guess it didn't matter.I added option 3: wait until D9 then we can make a non-BC change.
Comment #28
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #29
matthiasm11 CreditAttribution: matthiasm11 at Randstad Digital commentedComment #31
bkosborneWe ran into this problem with the CAS Attributes module, which uses the Token utility service to perform token replacement on strings that are intended as field values for the user entity, and thus should have no sanitizing at all applied to them. Currently the module works around this by calling Html::decodeEntities() on the text that was passed thru the service to reverse it. Would be great to be able to tell Token utility to just leave the text alone, which I think is a use case described in the issue summary.
Comment #33
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedAdded an option 3
Comment #34
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedDarn, I am repeating myself. Option 3 was tried in patch #25, but it's wrong. There is a clear precedent for passing unsafe markup to
Token::replace()
so we need to allow for that.Comment #35
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #36
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #37
andypostIt looks very promising for page titles (is what I mean in #14), just needs to update tests to make sure that text is different object
it needs inline comment about why new object created.
Comment #39
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #40
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #41
andypostFix for #37
Comment #42
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedThanks for #41. I feel that this is ready for RTBC. Apart from that last one line change the patch came from me, so it feels better if you are the reviewer - OK with you?
@Berdir, if you are still following this one, it would be great to get clarification from you as raiser that this patch solves your original problem - thanks.
Comment #43
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #44
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedUnblocking this one: AdamPS and andypost have both contributed to it and reviewed each others contribution so it seems fair to set RTBC.
Comment #45
darvanenI've run over the code and have no feedback, +1 RTBC.
I agree with the follow-up issues suggested in remaining tasks > proposed resolution.
Comment #46
darvanenI think this should go to a framework manager who might know more about possible flow-on effects since we're changing the output of the
replace()
method.How confident are we that "calling code does not ever need to be updated because of this patch" per the summary?
Comment #47
BerdirOh I forgot I created this issue originally ;)
You want to test this patch with modules like metatag and pathauto and feed them both actual HTML (e.g. in body field that you use as description metadata) and also special characters like a < and verify that the output remains unchainged. Various contrib modules are doing quite a dance right now with the token replace output to make things work. That might or might not be affected, but it is possible that it can be simplified once this issue happened and they can depend on a corresponding core version.
Comment #48
darvanenWell, here's the result for metatag using the Basic HTML filter in article (node) body from standard installation:
Value:
Output prior to patch:
<meta name="description" content="There's HTML in this body including strong, emphasis, & a link#." />
Output after patch:
<meta name="description" content="There's HTML in this body including strong, emphasis, & a link#." />
I'm not entirely certain what that tells us about this patch. That metatag doesn't require it?
Comment #49
larowlanAdding issue credits first up
Whilst I realise that this change is because the test is wrong, we were passing a safe string to token and it was being double-escaped, I think there would be plenty of projects out there that are relying on this incorrect behaviour as their last line of defense.
For that reason, I think we should go for the complete patch here. I.e adding a second method for
replaceMarkup
and adding the new logic there. We can typehint the text param with MarkupInterface.We can then add a deprecation warning in
replace
if someone passes in anything other than a string.This then allows us to typehint the text param in D10 with string.
I realise we're getting close to the end of the road for D9 deprecations, so we'd need to move fast on that. Feel free to ping me for followups.
Comment #50
darvanen#49: Agree. Thanks for putting your finger on what was making me uneasy about the current approach.
Comment #51
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedThank you all for good feedback. It seems we are all agreed - let's do this properly. I propose that it we are going down the route of deprecations and forcing people to change code then let's do it 100% properly.
New patch creates three new functions
replaceMarkup()
,replacePlain()
andreplaceUnsafe()
and deprecates the originalreplace()
function. See change record.This makes the new interface extremely clear and forces all calling code to examine their use case and pick the appropriate function. Type hinting string @param is worth doing, but note that PHP automatically converts
MarkupInterface
to string as required (I just ran a test to confirm).There will be lots of test fails however I would like to get feedback on the general approach before going further into it.
Comment #52
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #53
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #54
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #55
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedQuick easy-win attempt to fix many of the failing tests - but certainly not all.
Comment #56
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #58
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #59
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedTo reviewers - summary of my direction and reasoning:
replaceMarkup()
.replacePlain()
as a convenience wrapper (was previously a follow on step).replaceUnsafe()
to give symmetry and interface clarity.replace()
entirely, rather than selectively detect some specific cases This should ensure that all developer calling the interface take a good look at their code and choose the correct new function.However it does make the patch much bigger. And in some cases it was a little tricky to be sure which of the new functions to call, i.e. what was contained in $text.
Comment #61
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #63
larowlanI think deprecating ::replace is too disruptive.
As above, I would prefer we only emitted a deprecation error if the passed argument is an object, pointing people to the new replaceMarkup
::replace can just proxy to ::replacePlain or ::replaceMarkup (with the trigger_error)
That way for people using it correctly there's no disruption. As you point out, the casting to string means that we can't type-hint the text argument in D10, but we can throw a Type error - https://3v4l.org/qCJsV
Forcing all the correct calls to ::replace to instead use ::replacePlain feels like a lot of busy-work - what do others think?
Comment #64
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedThanks @larowlan yes I think you are right. Here is a new patch which is simpler and feels good.
With this patch
Token::replace()
is for use on unsafe markup only. We still have 3 separate functions for the 3 cases. However by keeping the name asreplace()
instead ofreplaceUnsafe()
it saves a lot of work.Comment #65
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #67
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedThe deprecation seems to trigger all over the place - I don't yet understand why. Let's try without it.
Comment #68
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedIt seems that WidgetInterface->getFilteredDescription() sometimes returns MarkupInterface and sometimes string.
So I guess that calling
replace()
is OK provided that you treat the result as unsafe. It doesn't really matter if the input happened to be safe.replaceMarkup() should be called when you know the input is safe and want the result to be.
replacePlain() should be called to apply conversion to markup and back.
Comment #69
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedSame as #67 but with the commented out code removed.
Comment #70
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #72
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #73
larowlanThis is looking really good, much smaller change and far less disruption.
I think we should add @see links to the two new methods here.
I also think we're still missing the trigger_error when $unsafe is an object that implements MarkupInterface.
Comment #74
Berdirsupporting null here seems really strange. I understand that we supported it until now and there are probably edge cases calling it like that with a string or null, but IMHO, for the new methods, we should be more strict. At the very least with the markup object, why would you have a use case where you pass in NULL there? If you need to deal with null or empty string then you do that with what you pass to Markup::create()?
Comment #75
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedThanks for the comments. I'll load another patch soon when I have more time.
Please see #68. I was really attracted to adding that but it seems difficult without major disruption - there are common functions in core that return MarkupInterface|string, especially
FieldStorageDefinitionInterface::getDescription()
(even though the documentation says the return value is string|null).Yes I found it strange at first too. However when I thought more, it seemed similar to above. You are imagining the call to Token::replace() is preceded by a call to Markup::create() but often it won't be true as the markup comes from a function.
StylePluginBase::getField()
is documented to return MarkupInterface|null, and maybe other functions not explicitly documented. So either Token::replace*() handles null or lots of calling code has to check for themselves.Comment #76
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedOK I'll have one more try at the ideas in #73 and #74 to see if I can get them to work.
Comment #78
AdamPS CreditAttribution: AdamPS at AlbanyWeb commented#74 Done
#73 I'm afraid I don't see any safe way to add the deprecation
Comment #79
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedRe#73, to recap my thinking from #68:
replaceMarkup()
should be called when you know the input is safe and want the result to be.replacePlain()
should be called to apply conversion to markup and back.replace()
should be called when you treat the result as unsafe and perhaps it doesn't really matter if the input happened to be safe sometimes (no harm in filtering again).In particular the problems is that WidgetInterface->getFilteredDescription() sometimes returns MarkupInterface and sometimes string.
Comment #80
alexpottI've recently discovered this is not this simple.
For example if you but html into site name and site slogan this will cause the html tags in the site slogan to be removed and it'll cause the html in the site name to be escaped and then unescaped - resulting in the return looking like html!
Why is this here?
One thought is that in HEAD in Token::replace() we do:
This is the point where I think we need different behaviour depending on whether we want plain html free output or output intended from markup.
For the plain case we should be doing:
... and then not calling PlainTextOutput::renderFromHtml() on the final result. This would mean that HTML in the site title would be escaped and HTML in site slogan would be stripped. This is how it would appear in the browser so I think this is correct. It think it is possible to argue that we could do the following instead...
As this would result in more consistent plain output.
Comment #81
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedThanks for your comments @alexpott
2.
All the evidence I can see indicates that site name is a plain text config entry rather than an HTML one:
Therefore I would say this is exactly correct. If my site name is "Rock & Roll", "The HTML
I believe "the return looking like html" is how the code behaves now, and any change would be a BC-break.
3.
Because the first parameter to str_replace is not nullable, so passing NULL would trigger a deprecation error from PHP8.1 - see RFC.
However I agree it's not really part of this issue so I can take it out and allow someone to fix it separately if that is preferred.
I would say that for the plain text case, we could be doing:
I removed the call to HtmlEscapedText as it would create a bug with unwanted or double escaping - the return value of
replacePlain()
is plain text, so for example it would be escaped again in a TWIG template.The alternative code is likely slightly better performance, but I claim it gives exactly the same result. The original code converts the plain text token value to markup and then converts back to plain text. The optimised code doesn't convert at all.
Comment #82
alexpott@AdamPS if you put
<strong>Site</strong>
as your site name it will appear as escaped html on you site - ie you will see <strong>Site</strong> and in the HTML code it will be<strong>Site</strong>
. Currently if you doPlainTextOutput::renderFromHtml()
on a token with that site name then the eventual output will have the HTML<strong>Site</strong>
in it.This behaviour is not only surprising - it is a bit dangerous. Consider the following:
Comment #83
AdamPS CreditAttribution: AdamPS at AlbanyWeb commented@alexpott Thanks for commenting again.
I ran your code on a site without the patch from this issue, and I got exactly the same result as you did. Is that what you expected? In other words, are you making a criticism of how the existing code works?
The site name and slogan seem to have different behaviour - the name is treated as a plain text field and the slogan as markup. I also find this surprising and there is no indication in the UI. In my tests it appeared that markup for the slogan was filtered for example to remove script tags.
Please can you explain what you would like to happen here?
Comment #84
alexpott@AdamPS I think that the renderPlain() added by this patch needs to return
"My site Slogan!!!!"
for the example in #82. Under hood it needs to dorenderPlain does not need to care about our MarkupObject as far as I can see.
Another possible change would be do change \Drupal\Component\Render\PlainTextOutput
to
That said the problem here is that \Drupal\Component\Render\PlainTextOutput() means render to context where HTML has no meaning. It does not mean remove all html no matter what.
Comment #85
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedGreat thanks for explaining. So I'm afraid that I don't agree - your argument appears very plausible but I have studied this matter very carefully from my perspective as maintainer of simplenews and swiftmailer modules. The problem with what you suggest is that it would strip perfectly legitimate plain text values that resemble HTML - in particular anything containing a "less than" followed by a letter.
Going back to your paragraph from #82, you said this:
In the first case, the site name has been displayed in an HTML page, so it has been correctly escaped.
In the second case, the key point is that the return value of renderPlain() is plain text - not HTML. So before it is ever displayed on an HTML page, it will be escaped, for example by TWIG auto-escaping. If we already escaped it, then it would be escaped again, which is of course wrong as the escape codes would be displayed on the page. Or the text might be put into a plain text mail. If we escaped it, the mail agent will show the escape codes.
I have uploaded a new patch that improves the code based on your useful comments in #81.
Comment #86
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedRe-roll
Clarification of previous comment. It might be clearer if we take a differ example, such as "Your <best> buys" (I tried to type this above but apologies I forgot to escape the ampersands).
This token replacing code is not just for site-name of course - it's used for any token such as [node:title]. These can reasonably contain a < for technical sites about HTML programming or maths/science.
Comment #87
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #88
darvanenI agree with @AdamPS that I would expect replacePlain() to return
"Your <best> buys"
rather than"Your buys"
if I had entered the former as my site name.Is it still the plan to deprecate passing Markup to replace()? Or does the IS need an update?
Comment #89
darvanenJust want to take a step back here and examine #2578569: Move token sanitization from Token::replace() to Token::generate().
Am I right in thinking that is attempting to solve the same problem as this issue? and if so, is it a better approach or not?
There's a related ticket #2577845: Ensure that all core tokens return SafeStringInterface if needed that I think should be a follow-up for whichever of these methods we go with (assuming crossover of purpose). Or maybe it's a duplicate?
Comment #90
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedThanks @darvanen for taking a look at this issue.
I would say there are 3 separate problems that all need fixing. They could be fixed in any order (it's not really that one "follows-up" another).
Token::replace()
correctly by creating new alternatives with clearer documentation and purpose.Token::generate()
directly, returning tokens that are not sanitised.If you are in agreement, any chance you could set the issue to RTBC please?
Comment #91
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedI don't see how we can - there is core code that returns ambiguously either string or MarkupInterface. It would just force calling code to put in (string) casts which wouldn't really achieve anything.
Edit: I updated the IS.
Comment #92
darvanen@larowlan you called for the deprecation warning in #49 and I tend to agree. What do you think of #91?
Comment #93
larowlanI'd be interested in seeing what failed if we added the deprecation so we could gauge how much misuse there is in core
It may not end up being the final patch (although I still feel that it should be)
Kind of like #3232018: Trigger an error when using \Drupal\Core\Cache\RefinableCacheableDependencyTrait::addCacheableDependency with a non CacheableDependencyInterface object which shows us all the misuse of that API in core.
Comment #94
darvanen#90: Ok cool, we can worry about those later, or in parallel, as you say.
#91 / #93: I see we've been around the mulberry bush a few times on this and have some competing priorities which must be a bit frustrating. I've gone back through the comments and I think @larowlan you might still be thinking ::replace should eventually be proxied to ::replaceString while @AdamPS has left it in place instead of creating a new ::replaceUnsafe method.
There is a failing patch which demostrates the scope of the problem at #65.
So once again I've come around to your point of view @AdamPS :) thanks for updating the IS.
I think if we did want to do away with the "magic" (but also somewhat broken) ::replace method that could be addressed in a separate issue. This issue will enable change, we don't have to carry out all of that change at once.
However, we can't make this RTBC without test coverage of the new API surface, that's just asking for it to be sent back :)
I think \Drupal\Tests\Core\Utility\TokenTest would be a good place to specficially lock in the behaviour of the new methods.
Comment #95
larowlanok thanks @darvanen, I've been in and out of this issue so if you, Adam and Alex have moved on and have a consensus of how to go forward - ignore me
Comment #96
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedThanks for the comments. Here is the test.
I also added one last try at the deprecation - it looks like some recent fixes in other issues have solved some of the problems and I fixed one more place in views.
Comment #97
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedFYI I looked more closely at #2578569: Move token sanitization from Token::replace() to Token::generate() and I propose it could be marked as "Closed (works as designed)"
Comment #98
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #99
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedWow - that was easier than I expected! The deprecation is back in, I have adjusted the CR and IS.
Comment #100
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #101
larowlanThis is looking good to me, it would be good for @alexpott to cast another eye over the latest changes and responses to his earlier feedback
Comment #102
darvanenLooking good to me too.
#99: Well that's good news! I must say I'm a tiny bit suspicous of test failures disappearing like that but if you've found recent work to explain it that's good. Perhaps you could provide one example of where this has happened? Or if that was an assumption we can go looking.
Review of whole patch at #98:
This will now need to reference 9.4.0 and 11.0.0 instead.
Thanks for going the extra mile on this, much easier to read and understand.
The area field in a view is filtered text, it can contain markup. I don't think we should be casting this to a string here, unless I've missed something about how this works?
Thanks for the test, it has enough coverage in my opinion. There's a side-effect here of locking in the string type of the site name and slogan but I don't think that should be an issue, I certainly don't expect those values to change. I think the test illustrates the issue well.
Review of draft change record:
Comment #103
darvanenComment by @andypost on #2578569: Move token sanitization from Token::replace() to Token::generate() that's more relevant here:
I'm not sure how the token replacment method has any effect on translatability though.
Comment #104
darvanenJust closed #2577845: Ensure that all core tokens return SafeStringInterface if needed as a duplicate of this issue . Happy to be wrong, but I'm pretty certain about that one.
I think #2578193: How to deal with invalid URLs which are result of token replacements is solved by the approach we're taking here and would like to also close as duplicate but am less sure. Thoughts over on that ticket please?
Comment #105
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedThanks @darvanen for a thorough review. All changes done, except where commented below.
#102.1
Done for now. @catch said to @larowlan that we should commit to 9.4.x then after we can discuss a backport - in which case we can change it back.
#102.3
I agree, it can contain markup. The
tokenizeValue()
function is a bit hacky as it is ambiguous whether the value contains unsafe HTML or markup. The new code casts it the input string to force it to unsafe, hence the result is also unsafe - which is fine because the processed_text render element sanitises the text. This is exactly the same situation as the old code, where the result is unsafe.If we want to keep the deprecation this is the only solution I see (short of a major rewrite). It would be the same for anywhere in contrib where the input value is ambiguous. Casting to string is OK, provided that you treat the result as unsafe markup.
#103
Yes I agree. The only affect that I can see is that token replacement forces the translation to take place by casting $text to string - whereas otherwise the translation might wait until rendering. Anyway this issue hasn't changed anything relating to translation.
#104
I commented on that issue.
Comment #107
larowlanUpdating title as this is more than just invalid documentation
Comment #108
darvanen#105: That all makes sense, thanks in particular for spelling out the answer to 102.3.
As I suspected, the failure was a random one not to do with this patch which is now passing.
I can't see anything outstanding so let's see how we go.
Comment #109
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedThanks both. I agree with the concept of changing the title - I tweaked it a little for accuracy as
::replace()
is not deprecated entirely, only for specific cases.Comment #110
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #111
darvanenI don’t think you need to change it to needs review for a title change 🙂
Comment #112
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedOops sorry my browser had cached the "Needs review" status and had set it back.
Comment #113
alexpottI think we should change this to string type check. I.e.
if (!is_string($unsafe)) {
... We should also deprecate passing NULLs to this method because it makes no sense.Should have a return type hint.
Should have a return type hint.
I don't think we should document NULLs here if at present they are allowed. This should be deprecated.
Should have a return type hint. (This is for new methods only)
This assert is not necessary once you have return typehints.
This assert is not necessary once you have return typehints.
Comment #114
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedThanks @alexpott. Updated patch contains all except 1&4. I did try deprecating the NULL and I recall there were lots of problems. I'll create a second patch after to reveal them again.
Comment #115
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #116
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedInteresting - the
is_string()
test of course excludes NULL, so this means the deprecation of NULL is fine.New patch fixes the deprecation notice to be consistent with the revised code, which is also clearer in the case that NULL is passed in. The comment already specified
@param string
so it should be pretty self-explanatory.Comment #117
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedAs far as I know this is now ready for commit many thanks.
Comment #118
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedOK I found a problem with the change proposed in #113.5.
The function
doReplace()
will return NULL in case ofreplace(NULL)
, which is deprecated but still allowed until 11.0.0. This fails to meet the new type hint.I propose that the neatest fix is to change
doReplace()
so that the parameter is explicitly string. MarkupInterface will automatically convert. Then we add a string cast inreplace()
in case of the NULL input.Comment #119
alexpott@AdamPS nice catch... it feels like we should have test coverage of passing a NULL in.
This needs test coverage. Using
$this->expectDeprecation()
.Comment #120
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #121
catchGiven this test was apparently wrong before, I think it would be worth some comments here explaining what we're testing, why things end up escaped how they are etc.
Comment #122
AdamPS CreditAttribution: AdamPS at AlbanyWeb commented@catch OK done.
Comment #123
larowlanI think all of the remaining feedback has been addressed here
Updated issue credits
Comment #125
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedUnrelated test failure, now working again
Comment #126
effulgentsia CreditAttribution: effulgentsia at Acquia commentedOverall, I think this patch is great!
However,
While I think the proposed change is the desired behavior, is it a BC break that it's changing existing behavior? Am I correct that the BC version of this would be:
Suppose for example, some module wanted to send an email with a subject of:
What do you think of the <blink> tag?
. Given HEAD's current behavior, it would have to set the subject to:What do you think of the <blink> tag?
, so that the existing renderFromHtml() preserves and decodes it.I think what we're saying in this issue is that that's a bug: that given that the semantics of an email subject is that it is plain text, that the caller should be able to express that as
What do you think of the <blink> tag?
without having to escape it, and I agree with that. However, what do we do about the fact that there might be callers that are escaping it, because that's what was required to make it work with Drupal's existing behavior? If we commit this patch as-is, then for those callers that were escaping it to workaround what is arguably a bug, we'll start sending emails without decoding those escaped entities.I wonder if in this issue, if we should change this and the other usages to the backwards compatible invocation (where we make core's callers to replacePlain() also explicitly renderFromHtml() the first argument too), with todos linking to issues to remove that renderFromHtml() call, and that way we can figure out if we need to deal with BC considerations in those respective issues on a case by case basis given the nature of each usage, rather than holding up this patch on figuring that out.
Or would we rather try to figure out the BC implications of each of those callers here rather than punting that?
Comment #127
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedThanks @effulgentsia for raising this, it is a good question to ask and very well explained.
It's true, this fix will 'break' any sites that have compensated for the bug and are now relying on the bugged behaviour. However I think the same could be said for a fair proportion of bug fixes.
The problem with the "BC" version of the fix that you propose is that I think it doesn't actually fix the bug😃.
What do you think of the <blink> tag
would still be corrupted toWhat do you think of the tag
.I think the only thing we can do here is to add a warning to the change record, which I have now done.
Comment #128
effulgentsia CreditAttribution: effulgentsia at Acquia commentedHm, I don't understand why. Is there something else in our mail system that messes with the email subject after HEAD's existing
PlainTextOutput::renderFromHtml()
call is made? I'll try to make some time early this week to test that out to find out.In the meantime,
This doesn't seem good at all. Perhaps I need to read the comment thread that led to this decision? I think this clearly shows that just because you have a sanitized token template and sanitized token values, does not mean that you end up with an output string that is safe against XSS. So I don't think that it's correct to create a Markup object that says it is. Why not run it through
Xss::filterAdmin()
first before doing that?Comment #129
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedWell that's what PlainTextOutput::renderFromHtml() does:
I believe the comment trail will explain why the HTML escaping was removed from the replaced values like this:
It probably won't say anything more. I don't think anyone has yet considered the scenario in the way that you have just done. And I see what you mean, this definitely needs careful checking.
However I propose that the security problem actually comes from
LocaleStringIsSafeTest
.The original string
Go to the <a href="[locale_test:security_test1]">frontpage</a>
is unsafe, as it is effectively a link to an arbitrary value. The good news is thatXss::filterAdmin()
would strip out the token inside the link. This happens because[locale_test
is not a secure protocol - the colon in a token (whether by accident or by design) neatly triggers the XSS filtering rule.I propose that the test code has a horrible security bug when it comes to this line - it misuses the
t()
function to mark something safe when it's not.The call to t() resolves to constructing a FormattableMarkup, and the comment for the relevant parameter says this
Therefore I believe that this patch is not yet proven to have a security bug, if used correctly. Probably we should change the test to stop being such a bad example. However there is clearly a risk - there might be code that is currently using the code incorrectly, but is currently safe because of the escaping.
Calling
Xss::filterAdmin()
is good because it makes the markup safe without the corruption from HTML escaping. However it could create a new problem: what about code that is deliberately creating HTML containing script tags based on input from a trusted source? I just tested and the "Full HTML" text format could generate such a thing and then the site could enable the Token Filter module. It seems that adding a call toXss::filterAdmin()
could break this site.Tricky. Thoughts welcome at this point.....
Comment #130
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedIf the attacker could control both the original string and the token value, then I have an example of how to combine safe markup to get unsafe markup so it is a real problem (I won't post it publicly).
It's worth adding at this point that these problems are not really created by this patch. The same challenge exists in contrib modules that want token replacement to work properly with the existing interface. The example in the CR illustrates it exactly - the two lines of code below have the same risks as each other
And if you don't call
Markup::create()
but instead rely on auto-escaping, then you get double-escaping.Maybe we could add the call to
Xss::filterAdmin()
but have an option to disable it??Comment #131
darvanenWhat if we had a
::replaceTrustedMarkup
?Comment #132
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedAfter thinking more, I feel that
Xss::filterAdmin()
is the solution. It means that this issue not only solves some bugs but helps improve security.That's a good idea. On balance, my current inclination is to leave it out of this commit:
$markup
has been created by a 'trusted' admin user, this person likely doesn't understand the pitfalls of allowing token replacement and how to avoid them.Xss::filterAdmin()
seems fairly unusual/obscure.::replace()
.New patch coming up...
Comment #133
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedThank you @effulgentsia for your diligence to spot this problem. Here is a new patch for review and I also updated the CR.
Comment #134
effulgentsia CreditAttribution: effulgentsia at Acquia commentedThanks!
Thinking of how to address #130 - #132...
It's already the case that the Markup class is documented as saying that it is internal and should only be used by the render subsystem. The token system isn't the render system, so should not be creating a Markup object. It looks like we already have some pre-existing callers in HEAD, such as system_requirements(), violating this rule, so that's not great, but perhaps we should follow the rule for this patch regardless.
In which case, the intended design of MarkupInterface is that when a new system needs it, it creates its own implementation class that can document and/or implement its own special needs. For example, we have FilteredMarkup, which only documents its own context, but does not implement anything special, and we also have TranslatableMarkup, which implements a bunch of unique things.
So for this patch, what if we create a
Drupal\Core\Utility\TokenReplacedMarkup
class (better name suggestions welcome), where Token::replaceMarkup() constructs a TokenReplacedMarkup object with the raw result of Token::doReplace(), and where TokenReplacedMarkup::__toString() returns that string filtered by Xss::filterAdmin(), and where we add aTokenReplacedMarkup::getUnfiltered()
method (better name suggestions welcome) to return the unfiltered string?Given #132.3, maybe we don't really need the getUnfiltered() method, but if creating a TokenReplacedMarkup class is a good idea anyway, then why not add that method too?
Comment #135
alexpottA great this about having a TokenReplacedMarkup is that we can do the XSS filtering only on __toString() and therefore will only do the work if the thing is used. And yeah I think having a get unfiltered method would seem like a useful thing if well documented that the caller becomes completely responsible for santisation.
Comment #136
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedOK here you go. The patch keeps stretching😃 would be great to get this one in.
Comment #137
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedComment #138
effulgentsia CreditAttribution: effulgentsia at Acquia commentedYes, turns out there is. system_mail() converts from html to text the first time. And then MailManager::doMail() takes what has already been converted to text, treats that as html and converts it to text a second time.
I'm not sure what that means yet in terms of the BC break explained in #126. I need to think about that a bit more.
Comment #139
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedThat is #3174760: Mails resembling HTML are corrupted which I have also been working on and has an RTBC patch.
As far as I can see we need a fix to both this issue and the MailManager one before it is possible to send a mail with the subject
What do you think of the <blink> tag
. Wouldn't any single call toPlainTextOutput::renderFromHtml()
corrupt this title?Comment #140
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedI am aware I am keen to get this issue committed but actually I realise that @effulgentsia you are quite right to be cautious here.
I have a suggestion. How about if we split this issue into 2, and create a follow-on for part 2?
So we would keep the changes to Token.php, new file TokenReplacedMarkup.php and the change in Drupal\views\Plugin\views\area\Text.php which is needed to avoid deprecation warnings.
This hopefully leaves the issue in a commitable state, and is probably easier for people to understand - the contents of the patch are a better match with the current issue summary and the title of the CR "New alternatives to Token::replace()"
Comment #141
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedThis issue has been a learning experience and voyage of discovery for me😃. I made some key assumptions that turned out to be wrong.
1) That substituting safe tokens in safe markup is safe.
replaceMarkup()
was created to give that assumption a place to live without affecting existing code. It occurred to me this morning that seeing as this assumption is false, then there is probably no point inreplaceMarkup()
. The existingreplace()
function can take either safe or unsafe markup and it doesn't matter because the result is the same: unsafe markup.2) That the result of replacing tokens ought to be MarkupInterface. In fact the return comment of
replace()
although partly wrong is also partly correct:It seems the same as hundreds of other places where a string contains unsafe markup - there are various correct ways to handle it.
My confusion mostly came from my attempts to send HTML emails reliably through the venerable Drupal mailsystem, as a maintainer of simplenews and swiftmailer. I found many bugs with corruption from extra escaping or extra rendering to plain making an indiscriminate vendetta on any < symbols that might legitimately occur😃. I've given up that fight and I'm now heading in a new direction with the Symfony Mailer project.
Here's my new proposal for this issue:
replaceMarkup()
from this issue.Comment #142
darvanenVoyage of discovery indeed, for me too.
Things always get more complex before they simplify again, or so I've heard ;)
If for some reason we decide not to implement
replacePlain
the documentation and variable name changes now here are vital in my opinion.Comment #143
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedThanks @darvanen.
Thank you for your support of the documentation and variable name changes.
I am hoping that we would implement
replacePlain()
😃. It's a brand new function so no BC-risk, and it should make it easier for people to use the interface correctly.Comment #144
larowlanConfirming that this isn't a BC concern for sub-classes
https://3v4l.org/GS5RY
Any reason why this can't just use PlainTextOutput::renderFromHtml regardless?
Comment #145
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedThanks @larowlan
We don't because that leads to corruption of plain text that resembles HTML.
That's exactly the sort of bug that this issue is trying to avoid, and that we are fixing in the follow-on issue #3264453: Incorrect usage of Token::replace() can corrupt plain text that resembles HTML. In general, we need to avoid rendering something that is not markup, or escaping something that is.
Comment #146
larowlanThanks, I figured that would be the case.
Can we expand the docs a bit on renderPlain to point this out?
That aside this looks greatly simplified and hence far less disruptive
Comment #147
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedThanks
Good idea - done
Comment #148
larowlanThis looks much simpler now, thanks for your perseverance here @AdamPS
Comment #149
roy_k CreditAttribution: roy_k commentedI tried patch #147. It didnt work and even more I got an error on my site:
the site encountered an error...
I had to undo it. Using D9.3.8 with php 7.4
Solved - If you want to use the entity title and you get the title wrap with HTML, use [node:field_brand:entity] token and it will give you the entity title without the HTML wrap.
Comment #150
larowlan@roy_k
Can you elaborate
- what you expected to happen
- what actually happened
- what the error message was
At present your feedback is missing those details so we can't really provide any insight
Comment #151
roy_k CreditAttribution: roy_k commented@larowlan,
Thank you but I found the solution and put it in my comment so anyone else who had the same problem can find it and use it.
I tried to use the entity title as a file name but when saving the node it created a folder by the title name wrap with HTML tag and not just the entity name. It seems that the wrong token was used and it was not clear which one to use to get it.
Comment #153
alexpottCommitted and pushed 74b25722ac to 10.0.x and 6a2a3a4ea9 to 9.5.x and a4cd4cf82b to 9.4.x. Thanks!
I wish this contained some of the fixes so the only use-case was not a test - but getting this in is more important so committing.
Comment #157
geek-merlinSuch an important fix.
🥂!
Comment #158
AdamPS CreditAttribution: AdamPS at AlbanyWeb commentedGreat many thanks @alexpott, @larowlan and everyone else who helped.