PHP 5.4 introduces a new flag to htmlentities() "which makes htmlentities()/htmlspecialchars() replace the invalid multibyte sequences with U+FFFD (UTF-8) or &#xFFFD (other encodings).

In a nutshell, all the invalid character sequences are replaced by "�" instead of discarding the whole thing. That would make debugging way easier in many cases where data is fed into Drupal in non-UTF8 (by the database incorrectly returning non-UTF8 errors, by system extensions, on data import, etc. examples of this type of issue are numerous).

Twig's autoescape uses this internally.
https://github.com/twigphp/Twig/blob/641090378dfff2913929306fbbf227269ba...

Beta phase evaluation

Reference: https://www.drupal.org/core/beta-changes
Issue category Task because it improves DX and consistency with Twig's escape
Issue priority Major because our Twig escaping will produce different results than Html::escape flags
Unfrozen changes Unfrozen because it only changes markup.
Prioritized changes The main goal of this issue is security and consistency
Disruption Not disruptive as it will be doing what Twig is already doing and invalid characters will encode into something visible
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Damien Tournoud’s picture

Issue tags: +PHP 5.4

Tagging.

sun’s picture

If this works reliably, then it would be even be a major reason for bumping PHP requirements to 5.4.

DamienMcKenna’s picture

I wouldn't recommend mandating PHP 5.4 be required for D8, but this definitely could be a good optional feature for when PHP 5.4 is available.

tsphethean’s picture

Do we have a baked in means of detecting PHP version, or would something like

    if (defined('ENT_SUBSTITUTE')) {
      $replace_flags = ENT_QUOTES & ENT_SUBSTITUTE;
    }
    else {
      $replace_flags = ENT_QUOTES;
    }

be appropriate?

tsphethean’s picture

Status: Active » Needs review
FileSize
641 bytes

On the assumption the above is ok i've rolled this patch. Tests pass on my local on php 5.3.25 and 5.4.15 - not entirely sure what to do about writing additional unit tests to cater for this. Do we want php version checking logic in our unit tests, or do we just case that the cases are what we expect?

The last submitted patch, ent_substitute-1211866-5.patch, failed testing.

Anonymous’s picture

Issue summary: View changes

Fix preview

joelpittet’s picture

Component: base system » theme system
Category: Feature request » Task
Issue summary: View changes
Issue tags: +Twig

PHP 5.4 is already a requirement and Twig already does this so it would also keep things in line with Twig's escape filter.

https://github.com/twigphp/Twig/blob/1.x/lib/Twig/Extension/Core.php#L1011

+1 to this issue.

joelpittet’s picture

Status: Needs work » Needs review
FileSize
3.05 KB

Re-rolled.

joelpittet’s picture

Status: Needs review » Needs work

The last submitted patch, 8: enable_ent_substitute-1211866-8.patch, failed testing.

alexpott’s picture

Status: Needs work » Postponed
Related issues: +#2550945: Add Html::escape()

This issue seems like a really good idea. Postponing on #2550945: Add Html::escape()

joelpittet’s picture

FYI twig is using this internally so another reason to have this.
https://github.com/twigphp/Twig/blob/641090378dfff2913929306fbbf227269ba...

stefan.r’s picture

stefan.r’s picture

Issue tags: -Needs tests
stefan.r’s picture

Status: Postponed » Active
alexpott’s picture

Status: Active » Needs work
+++ b/core/lib/Drupal/Component/Utility/Html.php
@@ -371,7 +371,8 @@ public static function decodeEntities($text) {
+   * including "&eacute;" and "&lt;" to "é" and "<", converting invalid
+   * multi-byte sequences to the Unicode Replacement Character ("�").

Let's not add this documentation to the bit about how this method is different from Html::decodeEntities. It belongs in the list of conversions above.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
1.71 KB
1.45 KB

The list is a list of entity equivalents though, if we want it to be in the list we could drop the " with their HTML entity equivalents" bit?

stefan.r’s picture

a newline crept in

alexpott’s picture

alexpott’s picture

Status: Needs review » Needs work
+++ b/core/lib/Drupal/Component/Utility/Html.php
@@ -366,7 +366,8 @@ public static function decodeEntities($text) {
+   * (for example, "&lt;" becomes "&amp;lt;"), and invalid UTF-8 encoding
+   * will be converted to the Unicode Replacement Character ("�").

I think it should be "replacement character" - too many capitals :). I like where these docs are now. Thanks.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
694 bytes
1.51 KB
stefan.r’s picture

FileSize
445 bytes
445 bytes
stefan.r’s picture

stefan.r’s picture

Fixing some more test fails

The last submitted patch, 17: 1211866-17.patch, failed testing.

The last submitted patch, 18: 1211866-18.patch, failed testing.

joelpittet’s picture

Awesome! Thanks for jumping on this after the HTML::escape blocker was in.

stefan.r’s picture

clarifying some comments

The last submitted patch, 21: 1211866-19.patch, failed testing.

The last submitted patch, 23: 1211866-20.patch, failed testing.

The last submitted patch, 24: 1211866-21.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 28: 1211866-28.patch, failed testing.

Status: Needs work » Needs review

stefan.r queued 28: 1211866-28.patch for re-testing.

The last submitted patch, 13: 1211866-13.patch, failed testing.

joelpittet’s picture

Status: Needs review » Reviewed & tested by the community

Thank you, green means go.

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs beta evaluation

Normal tasks need a beta evaluation.

joelpittet’s picture

Title: Enable ENT_SUBSTITUTE in check_plain » Enable ENT_SUBSTITUTE flag in Html::escape
Priority: Normal » Major
Issue summary: View changes
Status: Needs work » Reviewed & tested by the community
Issue tags: -Needs beta evaluation
alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed 00360b9 and pushed to 8.0.x. Thanks!

Thanks for adding the beta evaluation to the issue summary.

  • alexpott committed 00360b9 on 8.0.x
    Issue #1211866 by stefan.r, joelpittet, tsphethean: Enable...
Wim Leers’s picture

Wow, this landed fast! Great job, @stefan.r :)

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.