Resulting string format of token_replace(..., array('sanitize' => FALSE)) is undefined [#974782]

Comment	File	Size	Author
#27	token-sanitize-13.patch	67.78 KB	c960657
#26	token-sanitize-12.patch	66.02 KB	c960657
#25	token-sanitize-11.patch	66.87 KB	c960657
#23	token-sanitize-10.patch	66.67 KB	c960657
#22	token-sanitize-9.patch	70.48 KB	c960657
#20	token-sanitize-8.patch	70.34 KB	c960657
#16	token-sanitize-7.patch	38.18 KB	c960657
#11	token-sanitize-5.patch	35.43 KB	c960657
#10	token-sanitize-4.patch	52.07 KB	c960657
	token-sanitize-3.patch	50.97 KB	c960657

Comment #2

German

Karlsruhe

commented 17 November 2010 at 21:44

+++ modules/comment/comment.tokens.inc	17 Nov 2010 20:13:33 -0000
@@ -122,86 +122,85 @@ function comment_tokens($type, $tokens, 
-          $replacements[$original] = $sanitize ? check_plain($mail) : $mail;
+          $replacements[$original] = check_plain($mail);

Changes like this are definitely wrong. When $sanitize is FALSE, then the caller is asking for the original, raw token values, not to be output, but used in other contexts.

Powered by Dreditor.

Log in or register to post comments

Comment #3

dave reid

he/him

English

Nebraska USA

commented 17 November 2010 at 21:46

I would much rather prefer we can pass in $options['output'] = 'plain' or $options['output'] = 'html' or similar. By automatically applying check_plain() to token results we're removing flexibility to contrib modules that need the raw output.

Log in or register to post comments

Comment #4

c960657 commented 17 November 2010 at 23:08

Changes like this are definitely wrong. When $sanitize is FALSE, then the caller is asking for the original, raw token values, not to be output, but used in other contexts.

I think that depends on whether $sanitize == FALSE is supposed to indicate 1) that the output is plain text, or 2) that the output is unsanitized HTML (that should be passed through filter_xss_admin(), not check_plain()). I thought there was consensus about case 1 (i.e. the behaviour is similar to the CHECK_PLAIN/PASS_THROUGH option for drupal_set_title()).

In case 2 we should run decode_entities(strip_tags()) on tokens that may contain HTML in order to convert them to plain text. We can do that, but that actually destroys information. I don't think “sanitize” is the right word to describe this process.

Or, 3) we can have a three-state option, so the function may return either sanitized HTML, unsanitized HTML or plain-text.

Log in or register to post comments

Comment #5

sun

German

Karlsruhe

commented 17 November 2010 at 23:52

So far, the current $sanitize option is meant to return either 1) HTML or 2) *raw* token values (not escaped or sanitized in any way).

If $sanitize is FALSE, then token values are not going to be output, and also not going to end up in HTML. The caller asks for the raw token values as-is. It basically maps to the PASS_THROUGH option of dst().

If $sanitize is TRUE, then token values are going to be output in a HTML context and should therefore be sanitized and escaped in the usual way.

Right now, for some tokens, this means that we're merely escaping via check_plain(), which would map to the CHECK_PLAIN option of dst(). For other tokens, this means that we're sanitizing the value via filter_xss() or other functions.

Can we write down a concrete use-case, in which the current $sanitize option fails?

It sounds like you'd want to replace the current TRUE/FALSE values with:

  * @param $sanitize
  *   (optional) Defaults to TRUE. One of the following values:
  *   - TRUE: Escape and sanitize token values for HTML, respecting user access.
  *   - CHECK_PLAIN: Escape token values for HTML.
  *   - PASS_THROUGH: Return raw token values.
  *   - FALSE: Same as PASS_THROUGH for backwards compatibility.

Log in or register to post comments

Comment #6

David_Rothstein commented 18 November 2010 at 06:29

So is this a critical bug, or is it a feature request?

I don't think I understand where the bug is. Currently, if 'sanitize' is TRUE you get something that is safe to put in HTML (but with the implementing modules deciding exactly what filtering is appropriate for each token depending on where it came from). If you don't want the default filtering, you can pass 'sanitize' as FALSE and then do your own. The user mail system is an example of somewhere that does that (http://api.drupal.org/api/drupal/modules--user--user.module/function/_us...).

So like @sun, I would be curious to see a specific example of where the above doesn't work.

Log in or register to post comments

Comment #7

fago

German

Vienna

commented 18 November 2010 at 08:36

I think with sanitize TRUE everything is ok, however some stuff - like text fields with text processing enabled, is sanitized by default. This might be problematic for modules expecting raw input values.

Thus, for Entity metadata, I've added a flag 'sanitized' => TRUE for the metadata of those properties. Then to ease using it I've adding a 'decode' option:

   *   (optional) A keyed array of options:
   *   - sanitize: A boolean flag indicating that textual properties should be
   *     sanitized for display to a web browser. Defaults to FALSE.
   *   - decode: If set to TRUE and some textual data is already sanitized, it
   *     strips HTML tags and decodes HTML entities. Defaults to FALSE.

Maybe this would fit for token too?
I don't see anything critical here though.

@ improving $sanitize
I don't see why one want have specific CHECK_PLAIN, TRUE, and PASS_THROUGH options though. Only the provider of the data can know how it needs to be sanitized, not the caller.

Log in or register to post comments

Comment #8

c960657 commented 18 November 2010 at 08:51

Can we write down a concrete use-case, in which the current $sanitize option fails?

I'm not sure we understand each other correctly. My point is that if users just gets the raw token values, he doesn't know which format they are in.

For instance, if he calls token_replace('Our slogan is [site:slogan]', array(), array('sanitize' => FALSE)), he will get an HTML string. If he need to use that in a plain-text context (e.g. in a log file or an email subject), he will need to convert it to plain text using decode_entities(strip_tags()). On the other hand, if calls calls token_replace('This is [site:name]', array(), array('sanitize' => FALSE)), he will get a plain-text string that can be used as is. If he calls token_replace($text, array(), array('sanitize' => FALSE)) where $text is some user-supplied value, he has no way of knowing whether he needs to translate the string to plain-text or not. Calling decode_entities(strip_tags()) on a string that is already plain text will not always work.

Log in or register to post comments

Comment #9

sun

German

Karlsruhe

commented 22 November 2010 at 04:13

Priority:

Critical

» Normal

Somehow the provided examples are bogus/wonky.

Calling token_replace('Our slogan is [site:slogan]', array(), array('sanitize' => FALSE)) for usage in a HTML context is wrong. If you want to output HTML, then you use the default value for $sanitize (TRUE).

The $sanitize FALSE option solely exists for code that intends to NOT output token values, but use them in other contexts -- e.g., the file path configuration in a field's settings form.

Demoting to normal, as long as we don't have a valid bug report here.

Log in or register to post comments

Comment #10

c960657 commented 22 November 2010 at 21:55

Status	File	Size
new	token-sanitize-4.patch	52.07 KB

Calling token_replace('Our slogan is [site:slogan]', array(), array('sanitize' => FALSE)) for usage in a HTML context is wrong. If you want to output HTML, then you use the default value for $sanitize (TRUE).

I was talking about a non-HTML context, e.g. in a log file or an email subject.

But let's consider the example with the file field, and let's assume that file_field_widget_uri() actually passes array('sanitize' => FALSE) to token_replace() (it currently does not, but I assume it's supposed to - I have added this change to the patch):

Assume that I have a site with geek humour with the witty name “<jokes>” and the witty slogan “and <stuff>”, i.e. that's what the users will see. On the admin form I have to enter the slogan as and <stuff>, because the slogan field is supposed to contain HTML and is not check_plain()'ed before being displayed on the site. I have configured a file field to store its files in the directory [site:name]-[site:slogan]. This will store the files in http://example.com/sites/default/files/%3Cjokes%3E-and%20%26lt%3Bstuff%3E/foo.txt - in pretty-print (the text displayed in the status bar of modern browser when you hover over the link) http://example.com/sites/default/files/<jokes>-and <stuff>/foo.txt.

What the user sees is the raw values, i.e. exactly what was input on the Site information page. From an end-user point of view it does not make sense, that < appears verbatim in the URL - he would expect to see the < character be encoded the same way for the name and the slogan, i.e. so that the URL would pretty-print as http://example.com/sites/default/files/<jokes>-and <stuff>/foo.txt.

Do you agree?

Log in or register to post comments

Comment #11

c960657 commented 2 August 2011 at 20:52

Version:

7.x-dev

» 8.x-dev

Status	File	Size
new	token-sanitize-5.patch	35.43 KB

Log in or register to post comments

Comment #12

dave reid

he/him

English

Nebraska USA

commented 2 August 2011 at 21:33

The example for using the site slogan isn't really valid in this case. Although it *can* be used it's not a realistic example and this direction is not acceptable to me.

Regardless adding 'sanitize' => FALSE in file_field_widget_uri() is something that *should* be done.

Log in or register to post comments

Comment #13

2 August 2011 at 21:39

Status:

Needs review

» Needs work

The last submitted patch, token-sanitize-5.patch, failed testing.

Log in or register to post comments

Comment #14

dave reid

he/him

English

Nebraska USA

commented 2 August 2011 at 21:54

We probably also probably need to fix some locations where we should also be calling 'sanitize' => FALSE:

system_mail() with $context['subject']
system_send_email_action() with $context['recipient']
system_goto_action()

Log in or register to post comments

Comment #15

c960657 commented 8 August 2011 at 19:19

> The example for using the site slogan isn't really valid in this case. Although it *can*
> be used it's not a realistic example and this direction is not acceptable to me.
Some other examples: Putting the output from token_replace() in a mail subject, in a CSV file or other text file, or in a message posted on IRC.

In these cases we want the result in plain-text, not in HTML, even though some tokens contain HTML and some contain just plain text. Currently, the only way to do this is decode_entities(strip_tags(token_replace($foo, $context, array('sanitize' => TRUE))).

Example:
Create a node with the title “foo © bar” and the body “bar © <strong>foo”. On the node view page, the title appears as “foo © bar” (i.e. exactly as entered) and the body as “bar © foo” (i.e. interpreted as HTML).

Now consider this code used to generate a text file:

$string = "Title: [node:title]\nBody: [node:body]";
$node = node_load(1);
$context = array('node' => $node);
file_put_contents('/tmp/node-a.txt', decode_entities(strip_tags(token_replace($string, $context, array('sanitize' => TRUE)))));
file_put_contents('/tmp/node-b.txt', token_replace($string, $context, array('sanitize' => FALSE)));

node-a.txt contains this:

Title: foo &copy; bar
Body: bar © foo

This is a plain-text representation of what was shown on the node view page, so this is good.

node-b.txt contains this:

Title: foo &copy; bar
Body: bar &copy; <strong>foo</strong>

Here the title is right, but the body contains the raw HTML source code, i.e. the resulting text file contains a mix between HTML and plain text. This is not really useful, except perhaps for generating debug output. In most other real-world use-cases you want to convert the result into some specific format, either plain-text or HTML (or some other format) independent of how the value is represented internally in Drupal.

Log in or register to post comments

Comment #16

c960657 commented 26 November 2011 at 22:54

Status	File	Size
new	token-sanitize-7.patch	38.18 KB

Log in or register to post comments

Comment #17

c960657 commented 26 November 2011 at 22:54

Status:

Needs work

» Needs review

Log in or register to post comments

Comment #18

damien tournoud commented 26 November 2011 at 23:29

I'm not sure I'm happy with this patch.

Just to clarify, we are mixing up two issues all over the place:

The output format: HTML, plain-text, something else?
The sanitization: when output as HTML, is the content pre-filtered for XSS or not?

In core, we basically have four possible options:

raw value (ie. not processed in any way); you are on your own
plain-text (if the source value was HTML, is has been un-encoded)
HTML (unfiltered) (note: if the source value was plain-text, it has been check_plained())
HTML (filtered) (note: if the source value was plain-text, it has been check_plained())

Because we only have one parameter to control those options 'sanitized', the debate is which one of the four it should be:

@c960657 here is arguing that it should be (3) and (4)
@sun and @David_Rothstein are arguing that it should be (1) and (4)
I am arguing that it should be (2) and (4)

After all, the option "you are on your own" has absolutely no use case in core. You cannot use it to output it as plain-text, because it is *not* plain-text :)

Log in or register to post comments

Comment #19

c960657 commented 27 November 2011 at 08:45

@Damian, thank you for the clear summary.

I don't have strong opinions on whether it should be (2)+(4) or (3)+(4). My patch is implementing (3)+(4), but I'd be happy to make a new one implementing (2)+(4). My main concern is that we don't do (1).

Log in or register to post comments

Comment #20

c960657 commented 6 December 2012 at 20:46

Status	File	Size
new	token-sanitize-8.patch	70.34 KB

(One year later ...)

@Damian: I agree, (2)+(4) (or posibly (2)+(3)) makes more sense.

This patch implements (2) and (4). Text passed through _text_sanitize() or check_markup() is considered safe enough (is this how we usually do?).

I also eliminated renamed the option “sanitize” to favour “html” (similar to the option for l()). I think this wording is more precise.

Log in or register to post comments

Comment #21

6 December 2012 at 21:45

Status:

Needs review

» Needs work

The last submitted patch, token-sanitize-8.patch, failed testing.

Log in or register to post comments

Comment #22

c960657 commented 6 December 2012 at 22:03

Status:

Needs work

» Needs review

Status	File	Size
new	token-sanitize-9.patch	70.48 KB

Log in or register to post comments

Comment #23

c960657 commented 8 February 2013 at 20:52

Status	File	Size
new	token-sanitize-10.patch	66.67 KB

Reroll.

Log in or register to post comments

Comment #24

8 February 2013 at 21:56

Status:

Needs review

» Needs work

The last submitted patch, token-sanitize-10.patch, failed testing.

Log in or register to post comments

Comment #25

c960657 commented 9 February 2013 at 15:31

Status:

Needs work

» Needs review

Status	File	Size
new	token-sanitize-11.patch	66.87 KB

Log in or register to post comments

Comment #26

c960657 commented 21 March 2013 at 22:43

Status	File	Size
new	token-sanitize-12.patch	66.02 KB

Reroll.

Log in or register to post comments

Comment #27

c960657 commented 9 June 2013 at 15:28

Status	File	Size
new	token-sanitize-13.patch	67.78 KB

Log in or register to post comments

Comment #28

9 June 2013 at 18:00

Status:	Needs review	» Needs work
Issue tags:	-Security

The last submitted patch, token-sanitize-13.patch, failed testing.

Log in or register to post comments

Comment #29

9 June 2013 at 18:00

Issue tags:

+Security

Restoring issue tags, see #2125755: System messages removed all issue tags during D7 upgrade.

Log in or register to post comments

Comment #30

mgifford

he/him

English

commented 21 March 2014 at 15:53

Issue summary:

View changes

Looks like this patch needs to be completely rewritten..

Log in or register to post comments

Comment #31

21 March 2014 at 15:53

Version:

8.0.x-dev

» 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #32

21 March 2014 at 15:53

Version:

8.1.x-dev

» 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #33

21 March 2014 at 15:53

Version:

8.2.x-dev

» 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #34

21 March 2014 at 15:53

Version:

8.3.x-dev

» 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #35

21 March 2014 at 15:53

Version:

8.4.x-dev

» 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #36

21 March 2014 at 15:53

Version:

8.5.x-dev

» 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #37

21 March 2014 at 15:53

Version:

8.6.x-dev

» 8.8.x-dev

Drupal 8.6.x will not receive any further development aside from security fixes. Bug reports should be targeted against the 8.8.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.9.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Log in or register to post comments

Comment #38

21 March 2014 at 15:53

Version:

8.8.x-dev

» 8.9.x-dev

Drupal 8.8.7 was released on June 3, 2020 and is the final full bugfix release for the Drupal 8.8.x series. Drupal 8.8.x will not receive any further development aside from security fixes. Sites should prepare to update to Drupal 8.9.0 or Drupal 9.0.0 for ongoing support.

Bug reports should be targeted against the 8.9.x-dev branch from now on, and new development or disruptive changes should be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Log in or register to post comments

Comment #39

catch

he/him

English

commented 20 July 2021 at 22:06

Status:

Needs work

» Closed (outdated)

The API has changed (with SafeMarkup and etc.) and there's no longer a $sanitize parameter.

Log in or register to post comments

Comment #40

catch

he/him

English

commented 20 July 2021 at 22:06

Issue tags:

+Bug Smash Initiative

Log in or register to post comments

Title:	Text format of token_replace(,, array('sanitize' => FALSE)) is undefined	» Resulting string format of token_replace(..., array('sanitize' => FALSE)) is undefined
Issue tags:		+Security

Resulting string format of token_replace(..., array('sanitize' => FALSE)) is undefined

Comments

Referenced by