Various issues for D7 identified that the more flexible theme system and overall introduction of the rendering system in Drupal 7 leads to unpredictable rendering chains. It's very hard to trace and track, in which further functions a variable will be used and/or with other functions are re-using string variables. In turn, easily leading to double encoded strings in the output.
#715142: [beta blocker blocker] Various URLs escaped twice, since check_url() resp. filter_xss_bad_protocol() calls check_plain()
#845000: Inconsistency and lack of clarity for which functions are responsible for stripping dangerous URL protocols
#608048: Double encoding entities when displaying site name and slogan
As of PHP 5.2.3, http://php.net/manual/en/function.htmlspecialchars.php comes with a new $double_encode parameter:
When double_encode is turned off PHP will not encode existing html entities, the default is to convert everything.
--
Note: That line in xmlrpc.inc clashes with my other patch in #882298: XML-RPC request (string) values are not safe for UTF-8 -- just included the change here for the sake of completeness.
Comment | File | Size | Author |
---|---|---|---|
drupal.check-plain-double.0.patch | 2.78 KB | sun | |
Comments
Comment #1
Damien Tournoud CreditAttribution: Damien Tournoud commented-1
Fixing the symptom is always a bad idea, and I don't want to deal with the security issues that will likely result from that stuff.
Comment #2
Damien Tournoud CreditAttribution: Damien Tournoud commentedJust to make it clear: there are good ways out there to deal with this problem. Rails 3 html_safe? property and ASP.NET 4 IHtmlString interface provide a consistent syntax and consistent operator semantics for "HTML safe" strings.
PHP's $double_encode option is just a hack, because it doesn't carry around the information about the "HTML safeness" of the string.
What if I want to write an article named
"Why you shouldn't use the entity"
, for example? I would now have to write"Why you shouldn't use the nbsp; entity"
, because check_plain() doesn't know that the title field is actually plain-text, so that it *should* actually "double encode" the entities here.This change would affects everything that is "plain text" in Drupal (including reading from external sources!) and inadequately blurry the line between "plain text" and "HTML". We really cannot consider this in D7.
Comment #3
sunThese are really good arguments for a won't fix. ;)
Comment #4
jbrown CreditAttribution: jbrown commentedThis is definitely not a robust solution.