Various issues for D7 identified that the more flexible theme system and overall introduction of the rendering system in Drupal 7 leads to unpredictable rendering chains. It's very hard to trace and track, in which further functions a variable will be used and/or with other functions are re-using string variables. In turn, easily leading to double encoded strings in the output.

#715142: [beta blocker blocker] Various URLs escaped twice, since check_url() resp. filter_xss_bad_protocol() calls check_plain()
#845000: Inconsistency and lack of clarity for which functions are responsible for stripping dangerous URL protocols
#608048: Double encoding entities when displaying site name and slogan

As of PHP 5.2.3, http://php.net/manual/en/function.htmlspecialchars.php comes with a new $double_encode parameter:

When double_encode is turned off PHP will not encode existing html entities, the default is to convert everything.

--
Note: That line in xmlrpc.inc clashes with my other patch in #882298: XML-RPC request (string) values are not safe for UTF-8 -- just included the change here for the sake of completeness.

CommentFileSizeAuthor
drupal.check-plain-double.0.patch2.78 KBsun
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Damien Tournoud’s picture

-1

Fixing the symptom is always a bad idea, and I don't want to deal with the security issues that will likely result from that stuff.

Damien Tournoud’s picture

Version: 7.x-dev » 8.x-dev

Just to make it clear: there are good ways out there to deal with this problem. Rails 3 html_safe? property and ASP.NET 4 IHtmlString interface provide a consistent syntax and consistent operator semantics for "HTML safe" strings.

PHP's $double_encode option is just a hack, because it doesn't carry around the information about the "HTML safeness" of the string.

What if I want to write an article named "Why you shouldn't use the   entity", for example? I would now have to write "Why you shouldn't use the  nbsp; entity", because check_plain() doesn't know that the title field is actually plain-text, so that it *should* actually "double encode" the entities here.

This change would affects everything that is "plain text" in Drupal (including reading from external sources!) and inadequately blurry the line between "plain text" and "HTML". We really cannot consider this in D7.

sun’s picture

These are really good arguments for a won't fix. ;)

jbrown’s picture

Status: Needs review » Closed (won't fix)

This is definitely not a robust solution.