Problem/Motivation

Mixing encodings is a bad idea, as it can lead to bugs, garbled input/output, storing bad data and security issues, so we want to use UTF-8 everywhere and check for this in the Unicode requirements check.

Over in #2332295: Unicode requirements check not working with PHP 5.6 we added support for http_input and http_output for < 5.6 in Unicode::check() be we also want to add support for the newly added internal_encoding, input_encoding and output_encoding for 5.6+.

Per the RFC the precedence of settings for the mb_*, iconv_*, htmlentities functions is:
default_charset < php.* < mbstring.*/iconv.* < encoding specified by functions

So the values must be set as follows:

PHP <5.6

  • mbstring.http_input = pass (set in our .htaccess)
  • mbstring.http_output = pass (set in our .htaccess)
  • mbstring.internal_encoding = irrelevant, we set it ourselves
  • iconv.internal_encoding = empty (default) or utf-8
  • iconv.output_encoding = empty (default) or utf-8

PHP >=5.6

Same as before, except now mbstring.* and iconv.* are deprecated so they may also be empty. Newly introduced variables:

  • input_encoding (misnamed in PHP 5.6/7, the RFC says php.input_encoding) = empty or utf-8
  • internal_encoding (misnamed in PHP 5.6/7, the RFC says php.input_encoding) = empty or utf-8
  • output_encoding (misnamed in PHP 5.6/7, the RFC says php.input_encoding) = empty or utf-8

@see

  1. https://wiki.php.net/rfc/default_encoding
  2. http://php.net/manual/en/mbstring.configuration.php#ini.mbstring.interna...

Proposed resolution

Update Unicode::check() to also test for the newly introduced *_encoding options in PHP 5.6+

Remaining tasks

Review patch

User interface changes

n/a

API changes

n/a

CommentFileSizeAuthor
#77 2455455-nr-bot.txt144 bytesneeds-review-queue-bot
#69 2455455-69.patch7.63 KBalexpott
#61 unicode-requirements-2455455-61.patch4.94 KBDarren Oh
#52 unicode_requirements-2455455-52.patch3.16 KBjoelpittet
#48 unicode_requirements-2455455-48.patch3.15 KBjoelpittet
#43 2455455-42.patch3.15 KBstefan.r
#40 2455455-40.patch3.15 KBstefan.r
#40 2455455-39.patch1.88 KBstefan.r
#38 2455455-38.patch1.91 KBstefan.r
#30 interdiff-23-29.txt10.88 KBstefan.r
#29 2455455-29.patch9.94 KBstefan.r
#29 interdiff.txt1.9 KBstefan.r
#28 2455455-28.patch9.74 KBstefan.r
#23 2455455-23.patch10.83 KBstefan.r
#18 interdiff.txt365 bytesstefan.r
#18 2455455-18.patch11.2 KBstefan.r
#16 2455455-16.patch11.18 KBstefan.r
#14 2455455-14.patch2.09 KBstefan.r
#10 2455455-10.patch2.09 KBstefan.r
#9 2455455-9.patch2.09 KBstefan.r
#8 2455455-8.patch2.11 KBstefan.r

Issue fork drupal-2455455

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

benjy’s picture

benjy’s picture

Issue summary: View changes
benjy’s picture

Title: Unicode requirements should check *_encoding for PHP 5.6 » Unicode requirements should check *_encoding for PHP 7
Issue summary: View changes
Parent issue: » #2454439: [META] Support PHP 7
benjy’s picture

Title: Unicode requirements should check *_encoding for PHP 7 » Unicode requirements should check *_encoding for 5.6+
Issue summary: View changes
benjy’s picture

Issue summary: View changes
cosmicdreams’s picture

default_charset seems to be the configuration setting that is most recommended to be set. default_charset=“UTF-8” to be exact.

Is this issue as simple as adding:

<IfModule mod_php7.c>
  php_value default_charset "UTF-8"
</IfModule>
andypost’s picture

stefan.r’s picture

Status: Active » Reviewed & tested by the community
FileSize
2.11 KB

This patch ought to check that the following parameters equal UTF-8...

  • php.input_encoding
  • php.internal_encoding
  • php.output_encoding

...and that the following parameters be either UTF-8 or not set:

  • iconv.input_encoding
  • iconv.internal_encoding
  • iconv.output_encoding
  • mbstring.http_input
  • mbstring.internal_encoding
  • mbstring.http_output
stefan.r’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
2.09 KB

Oops, typo in the patch and a "selecto" in the issue status!

stefan.r’s picture

FileSize
2.09 KB

The last submitted patch, 8: 2455455-8.patch, failed testing.

The last submitted patch, 9: 2455455-9.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 10: 2455455-10.patch, failed testing.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
2.09 KB
benjy’s picture

Looks reasonable to me at a quick glance. Do we have tests for this stuff?

stefan.r’s picture

FileSize
11.18 KB

We don't yet, Unicode::check() is a static method so it can't be unit tested.

This patch turns it into a non-static method and adds a unit test. I discussed this with @Xano and ideally we'd make Unicode into a service on the container, but it's probably too late for that...

Status: Needs review » Needs work

The last submitted patch, 16: 2455455-16.patch, failed testing.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
11.2 KB
365 bytes

Status: Needs review » Needs work

The last submitted patch, 18: 2455455-18.patch, failed testing.

stefan.r’s picture

Status: Needs work » Needs review

stefan.r queued 18: 2455455-18.patch for re-testing.

benjy’s picture

  1. +++ b/core/lib/Drupal/Component/Utility/Unicode.php
    @@ -141,49 +141,99 @@ public static function setStatus($status) {
    -      static::$status = static::STATUS_SINGLEBYTE;
    +      self::$status = self::STATUS_SINGLEBYTE;
    

    Why change this? I think we should always be using static:: over self.

  2. +++ b/core/tests/Drupal/Tests/Component/Utility/UnicodeTest.php
    @@ -20,13 +20,152 @@
    +    $class = new \ReflectionClass('\Drupal\Component\Utility\Unicode');
    +    $class->getMethod('iniGet')->setAccessible(true);
    +    $class->getMethod('getPhpVersion')->setAccessible(true);
    

    Is this really required when you're creating a mock, it extends the class so we should be able to override those functions anyway?

stefan.r’s picture

FileSize
10.83 KB

Yes that makes sense! :)

stefan.r’s picture

Issue summary: View changes
benjy’s picture

I really like that we can add all this test coverage but the API change of turning the Unicode class into something that needs to be instantiated will need committer approval.

+++ b/core/tests/Drupal/Tests/Component/Utility/UnicodeTest.php
@@ -20,13 +20,141 @@
+  protected $current_ini_variable = array();

Small nitpick but coding standards are camel case for class properties.

alexpott’s picture

Status: Needs review » Needs work

It feels wrong to change the class into something that must be instantiated just for testing.

alexpott’s picture

To clarify - we get test coverage of this by running our test suite on different PHP versions - which we will do soon (hopefully).

stefan.r’s picture

Status: Needs work » Needs review
FileSize
9.74 KB

That would cover the different PHP_VERSION results, but we still cannot do just do ini_set() on all of these variables, for a few of them PHP will give an error on if we try to ini_set() them from a php script.

I guess a less fancy stub like this might work?

stefan.r’s picture

Issue summary: View changes
FileSize
1.9 KB
9.94 KB

coding standards fix

stefan.r’s picture

FileSize
10.88 KB

xjm’s picture

Priority: Normal » Major
Issue tags: +Needs issue summary update

Discussed with @alexpott -- this issue is probably major, but not critical, because while this issue might severely affect sites with certain PHP configurations, we can still fix it in a patch release, and it's not that Drupal is completely unusable without this fixed on 5.6+.

Can we update the issue summary to explain the implications of this being broken more explicitly?

serg2’s picture

@xjm is this a more thorough explanation that I should put in the summary?

Problem/Motivation

In PHP 5.6 a number of encoding module/functions were deprecated, namely:
iconv.input_encoding (Default: php.input_encoding)
iconv.internal_encoding (Default: php.internal_encoding)
iconv.output_encoding (Default: php.output_encoding)
mbstring.http_input (Default: php.input_encoding)
mbstring.internal_encoding (Default: php.internal_encoding)
mbstring.http_output (Default: php.output_encoding)
all functions that take encoding option use php.internal_encoding as default (e.g. htmlentities/mb_strlen/mb_regex/etc)

The default character set (default_charset) is now "UTF-8" and the following functions are available to change encoding:
php.input_encoding
php.internal_encoding
php.output_encoding

(Source https://wiki.php.net/rfc/default_encoding)

In PHP 7 functionality which has been deprecated during the 5.x cycle will be removed.
(Source https://wiki.php.net/rfc/remove_deprecated_functionality_in_php7)

If we are able to remove the usage of these deprecated/removed functions for PHP 5.6+ while preserving compatibility for those using < PHP 5.6 then we should.

Proposed resolution

In Unicode requirements check not working with PHP 5.6 we added support for http_input and http_output for < 5.6 in Unicode::check() but we also need to add support for internal_encoding, input_encoding and output_encoding for 5.6+

The current patch at https://www.drupal.org/files/issues/2455455-29.patch checks that encoding-related parameters are correctly set for PHP versions 5.6 and up.

Remaining tasks
Review patch

User interface changes
n/a

API changes
n/a

catch queued 29: 2455455-29.patch for re-testing.

stefan.r’s picture

Issue summary: View changes
Issue tags: -Needs issue summary update

clarifying IS

stefan.r’s picture

Status: Needs review » Needs work
stefan.r’s picture

Issue summary: View changes

Apparently these value can be an empty string if not set (php 5.6, 7), FALSE (hhvm) or sometimes NULL (according to the manual), so let's allow all of those.

stefan.r’s picture

Issue summary: View changes
stefan.r’s picture

Status: Needs review » Needs work

The last submitted patch, 38: 2455455-38.patch, failed testing.

stefan.r’s picture

The last submitted patch, 40: 2455455-39.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 40: 2455455-40.patch, failed testing.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
3.15 KB

Status: Needs review » Needs work

The last submitted patch, 43: 2455455-42.patch, failed testing.

Status: Needs work » Needs review

mgifford queued 43: 2455455-42.patch for re-testing.

mgifford’s picture

Status: Needs review » Needs work

Patch no longer applies.

mradcliffe’s picture

Issue tags: +Needs reroll

Added tag.

joelpittet’s picture

Applied with fuzz on patch -p1

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

RTBC - looks great to me.

I don't think we need a test, but D8 core committers might disagree.

alexpott’s picture

Version: 8.1.x-dev » 8.2.x-dev
Category: Bug report » Task
Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs followup, +Needs reroll

The way Unicode::check() works is pretty odd. If you fixed internal_encoding you might then have to fix input_encoding and then output_encoding. And each time the status report will only tell you about the next fail. But this is behaviour is not changed by this patch so I think we should file a followup to report on all of the settings at once rather than one at a time.

As this is a hardening and an improvement of reporting I think it is a task not a bug. And therefore only eligible for 8.2.x.

Needs a reroll.

joelpittet’s picture

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Darren Oh’s picture

Issue tags: +Seattle2019
Darren Oh’s picture

Status: Needs review » Reviewed & tested by the community

Code looks good and works in my local tests.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 61: unicode-requirements-2455455-61.patch, failed testing. View results

Darren Oh’s picture

Status: Needs work » Reviewed & tested by the community

Test failure was not caused by this patch. All tests pass now.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 61: unicode-requirements-2455455-61.patch, failed testing. View results

yogeshmpawar’s picture

Status: Needs work » Reviewed & tested by the community

All looks good so setting back to RTBC! Tests failure is not related (see #3055648: Frequent random fail in \Drupal\Tests\media_library\FunctionalJavascript\MediaLibraryTest).

alexpott’s picture

As of Drupal 8.8.x we only support PHP7 so this patch needs adjusting to this reality. It also needs a reroll because #3008870: Drop support for PHPUnit 4.8 once PHP 5 is no longer supported (8.8.x) removed PHP5 code from the codebase.

alexpott’s picture

Status: Reviewed & tested by the community » Needs work

https://www.php.net/manual/en/mbstring.configuration.php - shows that we can ignore mbstring.http_input and mbstring.http_output they're deprecated in PHP 5.6. We should remove them from .htaccess and the requirements checking functions.

alexpott’s picture

Note iconv.* stuff is deprecated in PHP 5.6 too.

For me I think we should deprecate ::check in this issue and move all the checking to system_requirements(). That way we can have a check for each thing rather than the very odd behaviour of checking for each ini setting and returning after the first failure.

alexpott’s picture

Status: Needs work » Needs review
FileSize
7.63 KB

Made a start on this in light of the fact we're now PHP7 only and also to deprecate the Unicode::check() and ::getStatus() functions as these are not necessary any more.

No interdiff as this is a departure in approach.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

needs-review-queue-bot’s picture

Status: Needs review » Needs work
FileSize
144 bytes

The Needs Review Queue Bot tested this issue. It either no longer applies to Drupal core, or fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Darren Oh’s picture

Title: Unicode requirements should check *_encoding for 5.6+ » Fix outdated Unicode requirements check
Status: Needs work » Needs review
andypost’s picture

Status: Needs review » Needs work
Issue tags: +Needs change record, +Needs tests

it needs to replace TODO with real CR and add deprecation test

Also there's some usage in contrib

- http://codcontrib.hank.vps-private.net/search?text=Unicode%3A%3AgetStatu...
- http://codcontrib.hank.vps-private.net/search?text=Unicode%3A%3Acheck%28...
- http://codcontrib.hank.vps-private.net/search?text=Unicode%3A%3ASTATUS_M...

Probably constants should be deprecated too as modules can do extension_loaded('mbstring')

OTOH there's polyfill shipped with core

core$ composer why symfony/polyfill-mbstring
phootwork/lang          v3.2.2 requires symfony/polyfill-mbstring (^1.12) 
symfony/console         v6.3.0 requires symfony/polyfill-mbstring (~1.0)  
symfony/dom-crawler     v6.3.0 requires symfony/polyfill-mbstring (~1.0)  
symfony/filesystem      v6.3.0 requires symfony/polyfill-mbstring (~1.8)  
symfony/http-foundation v6.3.0 requires symfony/polyfill-mbstring (~1.1)  
symfony/mime            v6.3.0 requires symfony/polyfill-mbstring (^1.0)  
symfony/string          v6.3.0 requires symfony/polyfill-mbstring (~1.0)  
symfony/validator       v6.3.0 requires symfony/polyfill-mbstring (~1.0)  
symfony/var-dumper      v6.3.0 requires symfony/polyfill-mbstring (~1.0)  
twig/twig               v3.6.0 requires symfony/polyfill-mbstring (^1.3)  

sourabhjain made their first commit to this issue’s fork.