Problem/Motivation
Drupal 8 includes Symfony's mbstring polyfill which means that core's Unicode library never using this the non mbstring code path. This makes the static set in Unicode::check() complete redundant and its default value wrong. This impacts unit and kernel testing as this then uses the non mbstring code path when no real site ever uses it.
Proposed resolution
- Remove the status static from Unicode
- Remove the non-mbstring code path from Unicode - it is never used
- Remove the need to call Unicode::check to set anything
- Fix bootstrap.php to normalise unit and kernel test environments
- Fix Unicode::check() so that unicode_requirements() can actually recommend installing the mbstring extension
How to generate most of the patch
# Apply the smaller not tested issue patch from https://www.drupal.org/project/drupal/issues/2849669
# Replace the occurrances including FQDN. Check core/lib/Drupal/Core/Mail/MailFormatHelper.php.
find ./core -type f -not -path "./core/node_modules/*" -not -path "./core/assets/*" -not -path "./core/misc" -exec sed -i -r 's/\\?(Drupal\\Component\\Utility\\)?Unicode::(substr|strlen|strtoupper|strtolower|strpos)/mb_\2/g;' {} \;
# Revert unnecessary changes
git co core/lib/Drupal/Component/Utility/Unicode.php
git co core/tests/Drupal/Tests/Component/Utility/UnicodeTest.php
# Fix unused uses
composer run-script phpcbf -- --sniffs=Drupal.Classes.UnusedUseStatement,PSR2.Namespaces.NamespaceDeclaration
# Apply the smaller not tested post script issue patch from https://www.drupal.org/project/drupal/issues/2849669
Remaining tasks
# Apply the smaller not tested issue patch from https://www.drupal.org/project/drupal/issues/2849669
# Replace the occurrances including FQDN. Check core/lib/Drupal/Core/Mail/MailFormatHelper.php.
find ./core -type f -not -path "./core/node_modules/*" -not -path "./core/assets/*" -not -path "./core/misc" -exec sed -i -r 's/\\?(Drupal\\Component\\Utility\\)?Unicode::(substr|strlen|strtoupper|strtolower|strpos)/mb_\2/g;' {} \;
# Revert unnecessary changes
git co core/lib/Drupal/Component/Utility/Unicode.php
git co core/tests/Drupal/Tests/Component/Utility/UnicodeTest.php
# Fix unused uses
composer run-script phpcbf -- --sniffs=Drupal.Classes.UnusedUseStatement,PSR2.Namespaces.NamespaceDeclaration
# Apply the smaller not tested post script issue patch from https://www.drupal.org/project/drupal/issues/2849669
Remaining tasks
doit
User interface changes
none
API changes
Deprecate Unicode::setStatus() since this existed for testing purposes only and the static it sets is removed.
Data model changes
none
Original issue summary
Problem/Motivation
KernelTestBase does not provide an environment consistent with a booted DrupalKernel since the locale is not set and by default the use of mb string functions is disabled.
This has been manually worked around in some tests like \Drupal\Tests\token\Kernel\FieldTest::testTokenViewMode()
This was causing unexpected test fails for me when writing new core kernel tests that depend on Drupal correctly lowercasing unicode characters.
Proposed resolution
Copy this code from \Drupal\Core\DrupalKernel::bootEnvironment()
and add it to \Drupal\KernelTests\KernelTestBase::bootEnvironment()
// Set sane locale settings, to ensure consistent string, dates, times and
// numbers handling.
setlocale(LC_ALL, 'C');
// Detect string handling method.
Unicode::check();
| Comment | File | Size | Author |
|---|---|---|---|
| #63 | 2849669-script-2-63.patch | 232.74 KB | alexpott |
| #61 | 2849669-61-script.patch | 232.74 KB | alexpott |
| #60 | 2849669-60-script.patch | 232.71 KB | alexpott |
| #54 | 2849669-script-54.patch | 232.55 KB | alexpott |
| #49 | 2849669-script-49.patch | 232.5 KB | alexpott |
Comments
Comment #2
pwolanin commentedComment #3
dawehnerInteresting. I totally agree that we should add that.
Is there any way we could actually test that? I guess for example by removing the workaround in the token test?
Comment #4
alexpottWe do part of this already in core/tests/bootstrap.php
So that shouldn't be required. The way
\Drupal\Component\Utility\Unicode::check()works is not nice. Why not make it lazy initialize? I.e if the status has not been set - set it correctly ingetStatus(). /me looks at the waySymfony\Polyfill\Mbstringworks and is jealous.Comment #5
alexpottOh hilarious... we're loading vendor/symfony/polyfill-mbstring/bootstrap.php cause Symfony uses it - so our Unicode could depend on that and we could remove the static completely.
Comment #6
alexpottLike so... no static and no unnecessary calls to Unicode::check() which now just does checking rather than any static initialisation.
Comment #7
alexpottAlso Unicode::setStatus() is horrid - why that was added for unit tests I'll never know - we shoulda just used reflection. Ughs. Probably my fault :) #1938670: Convert unicode.inc to \Drupal\Component\Utility\Unicode
Comment #8
alexpottOh and just for fun there is not D8 installation in the land where \Drupal\Component\Utility\Unicode::$status = \Drupal\Component\Utility\Unicode::STATUS_SINGLEBYTE apart from in our tests because
function_exists('mb_strlen')will always return TRUE because of the polyfill :)Comment #9
alexpottEven more fun because this will never return FALSE we'll never recommend someone installs the mbstring extension!
Comment #10
alexpottLet's fix #9 and update the issue summary.
Comment #11
alexpottFixing up UnicodeTest since there's no single byte code path to test anymore.
Comment #12
alexpottMissed a couple of things to clean up.
Comment #13
alexpott7 files changed, 48 insertions, 263 deletions.- less code yay!Comment #14
dawehnerReally nice work alex! I'll review it later properly.
Comment #15
mpdonadioIf the mb mb_convert_encoding() can do the same thing as iconv(), when why do we still have both libraries? Can we just use symfony/polyfill-mbstring instead of symfony/polyfill-iconv? Looks like this is the only usage on iconv() in core.
Comment #16
alexpott@mpdonadio well considering we already have a polyfill for iconv core is always going to be doing that code path. Hence the change I chose to make. And the mbstring string polyfill is dependent on the iconv polyfill cause it uses iconv too...
Comment #17
dawehnerI am wondering whether we should deprecate most of those functions now.
Comment #18
alexpott@dawehner we totally could do that. On it.
Comment #19
alexpottMade Unicode::mb_substr() simpler too... the NULL check here is pointless - https://secure.php.net/manual/en/function.mb-substr.php
We should file a followup to replace all the calls to the deprecated functions.
Comment #20
alexpottCreated the CR https://www.drupal.org/node/2850048 and filed the followup #2850046: Remove usages of \Drupal\Component\Utility\Unicode() functions
Comment #21
dawehnerI'm wondering whether this is some weird performance optimization? I doubt it thought to be honest. Do we have an explicit test to ensure this actually works?
Comment #22
alexpott\Drupal\Tests\Component\Utility\UnicodeTest::testSubstr tests passing a NULL. I looked back at the original commit - #26688: Add mbstring support to Drupal no notes as to why - I just can't see it as a perf optimisation.
Comment #23
alexpottFixed a typo and improved the deprecation message.
Comment #24
Munavijayalakshmi commentedRerolled the patch.
Comment #25
dawehnerI would guess this file is not excluded from the CS checking, right?
Nitpick: We need to point to 8.4.x now ...
Comment #26
naveenvalechawe are requiring a new package but not any changes in composer.lock file?
Comment #27
alexpott@naveenvalecha that's because other things are already dependent on it. But also
core/lib/Drupal/Component/Utility/composer.jsonshould be update to havesymfony/polyfill-mbstringtoo.Comment #30
alexpottSo here's a patch with proper deprecation notices for 8.6.0. The real patch with the conversions can be scripted by doing the following:
Comment #32
alexpottNot bad... just a few stragglers.
Comment #33
alexpott175 files changed, 401 insertions, 724 deletions.A lot less code and once we move to Drupal 9 less surface area to support too - 6 methods from Unicode are deprecated here.Comment #34
mile23Just mentioning because maybe someone more knowledgeable than me should close it: #2911497: Make \Drupal\Component\Utility\Unicode::truncateBytes more readable
Comment #35
alexpottRerolled.
Comment #36
alexpottComment #37
dawehnerThis doesn't remove the use statement
While our test coverage for substr and others is quite good we just have 3 test cases for convertToUtf8. I'm wondering whether we should expand the test coverage for higher confidence.
Any reason to not already require 1.7 or so?
Comment #38
alexpottThanks for the review @dawehner. #37.1 is caused because we ignore the Diff directory in core/phpcs.xml.dist so just removed them in the non-scripted patch.
Re. #37.2 well nothing is changing here. We are always using the polyfill anyway because
if (function_exists('iconv')) {will always return TRUE. So i don't think extra testing buys us anything.Re #37.3 well because it probably is not actually required? As a framework I think we should have as wide as possible dependencies.
Comment #39
mile23Do these deletions break the dependency drupal/core-diff and drupal/core-utility? If so, let's remove it from composer.json.
Comment #40
alexpottWell we have a new dependency. But nice catch!
Comment #43
borisson_I can't see anything to change or nitpick on in #40, that looks super solid. Deleting code is the best! Awesome work.
Comment #44
alexpottThe patch does not apply... rerolled...
The diff of #40 and #44 scripted patches is:
Can't interdiff because it is a reroll.
Comment #45
catchI'm wondering if we really want to deprecate Unicode::strlen() or leave them as wrappers, considering the class itself will be staying in core. With Unicode::strlen() we have a way to document that people shouldn't use strlen(), but without it we don't have a way to tell people to use mb_strlen() instead. Could we maybe just split the deprecations out to a separate issue?
Comment #46
alexpottI'd be against that (a) because ideally we won't load Unicode on many requests. (b) we have 100's of usages in core of
strlen()already so it's not as if we're telling people to not usestrlen(). They already have to decide what to use. And with this patch in D9 it'll be simpler. Just use PHP functions either mb_strlen() or strlen() depending on your use-case.Comment #47
catchThat's fair, back to RTBC.
Comment #48
larowlanWe have a circular reference here, core-render relies on core-utility - is that a problem? And if so do we have an existing issue to fix it (pre-existing in HEAD)?
>80 here, but pre-existing, so don't bother if not rerolling for something else
uber-nit: I realise this is just a test, but should we use mb_strtolower here to set a good example?
Because these are component tests, should the composer.json of this component include the bridge in its require-dev?
Comment #49
alexpottThanks for the review @larowlan re #48:
New patch also cleans up other comment flow issues found as part of checking out #48.2
The patch to commit is the huge one :)
Comment #50
alexpottGiven that the only thing to fix in #48 was an uber-nit moving back to rtbc,
Comment #51
mile23I can tell you with a high degree of certainty that Utility currently doesn't need dev dependencies: #2876669-47: Fix dependency version requirement declarations in components
However, when this patch lands, the travis tests will (or at least should) fail for the reasons in #48.4.
Still trying to figure out how to test for all this stuff in an easy way.
Comment #52
mile23Actually, #51 is incorrect. I can't say with a high degree of certainty about Utility, because it was failing, so I commented it out of the travis test and never got back to it. :-)
Working on it now. https://github.com/paul-m/drupal_component_tester/commit/4004314d0e41a01...
Comment #54
alexpottGit managed to rebase this... let's see what the both thinks.
Comment #55
tstoecklerYay, this looks great!
core/composer.jsonstill has a reference tolib/Drupal/Component/Utility/Unicode.phpinautoload->classmap. That should no longer be necessary after this. On the other hand, I thinklib/Drupal/Component/Utility/Timer.phpthere has also been obsolete for a while so we can also fix that together in a follow-up. Leaving RTBC for now.Comment #56
alexpott@tstoeckler there is an issue to remove those already - somewhere in the issue queue - it will be done as part of #1826054: [Meta] Expose Drupal Components outside of Drupal
Comment #57
tstoecklerOK, perfect then. RTBC++
Comment #58
mile23Based on the work from #52, I ended up discovering that drupal/core-utility depends on drupal/core-assertion, at least for running tests: https://github.com/paul-m/drupal_component_tester/blob/master/patch/fixe...
Leaving RTBC because we don't have a formal way to confirm this in the Drupalsphere (as opposed to the Travissphere).
One of these days we might get #2876669: Fix dependency version requirement declarations in components :-)
Comment #60
alexpottRe ran the script.
Comment #61
alexpottRerolled again.
Comment #63
alexpottRerolled.
Comment #64
catchThis seems like the wrong place for a lot of this - should we open another follow-up to factor it out elsewhere?
Comment #65
mile23Is this really fixed?
Comment #66
alexpottNope not yet. Here's the requested follow-up - good idea to move this somewhere else #2971452: Deprecate Unicode::check() and move the application set up stuff somewhere else.
Comment #67
catchSorry I got interrupted, meant to commit/push the patch while also asking for the follow-up. I've committed/pushed the patch to 8.6.x now so marking this fixed again!
Comment #69
mile23Just tagging #2876669: Fix dependency version requirement declarations in components because that needs a re-roll now.