Problem/Motivation
Mixing encodings is a bad idea, as it can lead to bugs, garbled input/output, storing bad data and security issues, so we want to use UTF-8 everywhere and check for this in the Unicode requirements check.
Over in #2332295: Unicode requirements check not working with PHP 5.6 we added support for http_input and http_output for < 5.6 in Unicode::check() be we also want to add support for the newly added internal_encoding
, input_encoding
and output_encoding
for 5.6+.
Per the RFC the precedence of settings for the mb_*, iconv_*, htmlentities functions is:
default_charset < php.* < mbstring.*/iconv.* < encoding specified by functions
So the values must be set as follows:
PHP <5.6
- mbstring.http_input = pass (set in our .htaccess)
- mbstring.http_output = pass (set in our .htaccess)
- mbstring.internal_encoding = irrelevant, we set it ourselves
- iconv.internal_encoding = empty (default) or utf-8
- iconv.output_encoding = empty (default) or utf-8
PHP >=5.6
Same as before, except now mbstring.* and iconv.* are deprecated so they may also be empty. Newly introduced variables:
- input_encoding (misnamed in PHP 5.6/7, the RFC says php.input_encoding) = empty or utf-8
- internal_encoding (misnamed in PHP 5.6/7, the RFC says php.input_encoding) = empty or utf-8
- output_encoding (misnamed in PHP 5.6/7, the RFC says php.input_encoding) = empty or utf-8
@see
- https://wiki.php.net/rfc/default_encoding
- http://php.net/manual/en/mbstring.configuration.php#ini.mbstring.interna...
Proposed resolution
Update Unicode::check() to also test for the newly introduced *_encoding options in PHP 5.6+
Remaining tasks
Review patch
User interface changes
n/a
API changes
n/a
Comment | File | Size | Author |
---|
Issue fork drupal-2455455
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #1
benjy CreditAttribution: benjy at CodeDrop commentedComment #2
benjy CreditAttribution: benjy at CodeDrop commentedComment #3
benjy CreditAttribution: benjy at CodeDrop commentedComment #4
benjy CreditAttribution: benjy at CodeDrop commentedComment #5
benjy CreditAttribution: benjy at CodeDrop commentedComment #6
cosmicdreams CreditAttribution: cosmicdreams commenteddefault_charset seems to be the configuration setting that is most recommended to be set. default_charset=“UTF-8” to be exact.
Is this issue as simple as adding:
Comment #7
andypostMaybe better to do that in #2455465: Add mod_php7 check to htaccess and remove php5 code
Comment #8
stefan.r CreditAttribution: stefan.r commentedThis patch ought to check that the following parameters equal UTF-8...
...and that the following parameters be either UTF-8 or not set:
Comment #9
stefan.r CreditAttribution: stefan.r commentedOops, typo in the patch and a "selecto" in the issue status!
Comment #10
stefan.r CreditAttribution: stefan.r commentedComment #14
stefan.r CreditAttribution: stefan.r commentedComment #15
benjy CreditAttribution: benjy at CodeDrop commentedLooks reasonable to me at a quick glance. Do we have tests for this stuff?
Comment #16
stefan.r CreditAttribution: stefan.r commentedWe don't yet, Unicode::check() is a static method so it can't be unit tested.
This patch turns it into a non-static method and adds a unit test. I discussed this with @Xano and ideally we'd make Unicode into a service on the container, but it's probably too late for that...
Comment #18
stefan.r CreditAttribution: stefan.r commentedComment #20
stefan.r CreditAttribution: stefan.r commentedComment #22
benjy CreditAttribution: benjy at CodeDrop commentedWhy change this? I think we should always be using static:: over self.
Is this really required when you're creating a mock, it extends the class so we should be able to override those functions anyway?
Comment #23
stefan.r CreditAttribution: stefan.r commentedYes that makes sense! :)
Comment #24
stefan.r CreditAttribution: stefan.r commentedComment #25
benjy CreditAttribution: benjy at CodeDrop commentedI really like that we can add all this test coverage but the API change of turning the Unicode class into something that needs to be instantiated will need committer approval.
Small nitpick but coding standards are camel case for class properties.
Comment #26
alexpottIt feels wrong to change the class into something that must be instantiated just for testing.
Comment #27
alexpottTo clarify - we get test coverage of this by running our test suite on different PHP versions - which we will do soon (hopefully).
Comment #28
stefan.r CreditAttribution: stefan.r commentedThat would cover the different PHP_VERSION results, but we still cannot do just do ini_set() on all of these variables, for a few of them PHP will give an error on if we try to ini_set() them from a php script.
I guess a less fancy stub like this might work?
Comment #29
stefan.r CreditAttribution: stefan.r commentedcoding standards fix
Comment #30
stefan.r CreditAttribution: stefan.r commentedComment #31
xjmDiscussed with @alexpott -- this issue is probably major, but not critical, because while this issue might severely affect sites with certain PHP configurations, we can still fix it in a patch release, and it's not that Drupal is completely unusable without this fixed on 5.6+.
Can we update the issue summary to explain the implications of this being broken more explicitly?
Comment #32
serg2 CreditAttribution: serg2 commented@xjm is this a more thorough explanation that I should put in the summary?
Problem/Motivation
In PHP 5.6 a number of encoding module/functions were deprecated, namely:
iconv.input_encoding (Default: php.input_encoding)
iconv.internal_encoding (Default: php.internal_encoding)
iconv.output_encoding (Default: php.output_encoding)
mbstring.http_input (Default: php.input_encoding)
mbstring.internal_encoding (Default: php.internal_encoding)
mbstring.http_output (Default: php.output_encoding)
all functions that take encoding option use php.internal_encoding as default (e.g. htmlentities/mb_strlen/mb_regex/etc)
The default character set (default_charset) is now "UTF-8" and the following functions are available to change encoding:
php.input_encoding
php.internal_encoding
php.output_encoding
(Source https://wiki.php.net/rfc/default_encoding)
In PHP 7 functionality which has been deprecated during the 5.x cycle will be removed.
(Source https://wiki.php.net/rfc/remove_deprecated_functionality_in_php7)
If we are able to remove the usage of these deprecated/removed functions for PHP 5.6+ while preserving compatibility for those using < PHP 5.6 then we should.
Proposed resolution
In Unicode requirements check not working with PHP 5.6 we added support for http_input and http_output for < 5.6 in Unicode::check() but we also need to add support for internal_encoding, input_encoding and output_encoding for 5.6+
The current patch at https://www.drupal.org/files/issues/2455455-29.patch checks that encoding-related parameters are correctly set for PHP versions 5.6 and up.
Remaining tasks
Review patch
User interface changes
n/a
API changes
n/a
Comment #34
stefan.r CreditAttribution: stefan.r commentedclarifying IS
Comment #35
stefan.r CreditAttribution: stefan.r commentedComment #36
stefan.r CreditAttribution: stefan.r commentedApparently these value can be an empty string if not set (php 5.6, 7), FALSE (hhvm) or sometimes NULL (according to the manual), so let's allow all of those.
Comment #37
stefan.r CreditAttribution: stefan.r commentedComment #38
stefan.r CreditAttribution: stefan.r commentedComment #40
stefan.r CreditAttribution: stefan.r commentedComment #43
stefan.r CreditAttribution: stefan.r commentedComment #46
mgiffordPatch no longer applies.
Comment #47
mradcliffeAdded tag.
Comment #48
joelpittetApplied with fuzz on
patch -p1
Comment #50
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedRTBC - looks great to me.
I don't think we need a test, but D8 core committers might disagree.
Comment #51
alexpottThe way Unicode::check() works is pretty odd. If you fixed internal_encoding you might then have to fix input_encoding and then output_encoding. And each time the status report will only tell you about the next fail. But this is behaviour is not changed by this patch so I think we should file a followup to report on all of the settings at once rather than one at a time.
As this is a hardening and an improvement of reporting I think it is a task not a bug. And therefore only eligible for 8.2.x.
Needs a reroll.
Comment #52
joelpittetRe-rolled
Comment #59
Darren OhComment #60
Darren OhCode looks good and works in my local tests.
Comment #61
Darren OhRe-roll for 8.8.x. This has also been tested locally.
Comment #63
Darren OhTest failure was not caused by this patch. All tests pass now.
Comment #65
yogeshmpawarAll looks good so setting back to RTBC! Tests failure is not related (see #3055648: Frequent random fail in \Drupal\Tests\media_library\FunctionalJavascript\MediaLibraryTest).
Comment #66
alexpottAs of Drupal 8.8.x we only support PHP7 so this patch needs adjusting to this reality. It also needs a reroll because #3008870: Drop support for PHPUnit 4.8 once PHP 5 is no longer supported (8.8.x) removed PHP5 code from the codebase.
Comment #67
alexpotthttps://www.php.net/manual/en/mbstring.configuration.php - shows that we can ignore mbstring.http_input and mbstring.http_output they're deprecated in PHP 5.6. We should remove them from .htaccess and the requirements checking functions.
Comment #68
alexpottNote iconv.* stuff is deprecated in PHP 5.6 too.
For me I think we should deprecate ::check in this issue and move all the checking to system_requirements(). That way we can have a check for each thing rather than the very odd behaviour of checking for each ini setting and returning after the first failure.
Comment #69
alexpottMade a start on this in light of the fact we're now PHP7 only and also to deprecate the Unicode::check() and ::getStatus() functions as these are not necessary any more.
No interdiff as this is a departure in approach.
Comment #77
needs-review-queue-bot CreditAttribution: needs-review-queue-bot as a volunteer commentedThe Needs Review Queue Bot tested this issue. It either no longer applies to Drupal core, or fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".
Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #80
Darren OhComment #81
andypostit needs to replace TODO with real CR and add deprecation test
Also there's some usage in contrib
- http://codcontrib.hank.vps-private.net/search?text=Unicode%3A%3AgetStatu...
- http://codcontrib.hank.vps-private.net/search?text=Unicode%3A%3Acheck%28...
- http://codcontrib.hank.vps-private.net/search?text=Unicode%3A%3ASTATUS_M...
Probably constants should be deprecated too as modules can do
extension_loaded('mbstring')
OTOH there's polyfill shipped with core