We don't need to use UTF-8 for all our indexed machine names/UUIDs when they might as well be simple ASCII, especially not if we are looking to implement utf8mb4 in the future, which would further increase index size and could affect query buffer performance. (VAR)CHAR utf8mb4 fields also have a 191 character limitation on their indexes, while ASCII would allow for indexing all 255 characters.
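The 191 versus 255 figures above follow from simple arithmetic. The sketch below assumes InnoDB's classic 767-byte index key prefix limit and MySQL's worst-case bytes per character for each character set; neither number is stated in this issue, and the helper names are made up for illustration.

```python
# Assumption (not from this issue): InnoDB's classic index key prefix
# limit of 767 bytes, as used by the COMPACT/REDUNDANT row formats.
INDEX_PREFIX_LIMIT_BYTES = 767

# Worst-case bytes per character for each MySQL character set.
MAX_BYTES_PER_CHAR = {"ascii": 1, "utf8": 3, "utf8mb4": 4}

# VARCHAR(255), matching the 255 characters mentioned above.
COLUMN_LENGTH = 255


def indexable_chars(charset: str, column_length: int = COLUMN_LENGTH) -> int:
    """Return how many characters of the column fit into one index key."""
    by_limit = INDEX_PREFIX_LIMIT_BYTES // MAX_BYTES_PER_CHAR[charset]
    return min(column_length, by_limit)


def index_key_bytes(charset: str, column_length: int = COLUMN_LENGTH) -> int:
    """Return the worst-case bytes one index key reserves for the column."""
    return indexable_chars(charset, column_length) * MAX_BYTES_PER_CHAR[charset]


if __name__ == "__main__":
    for charset in MAX_BYTES_PER_CHAR:
        print(charset, indexable_chars(charset), index_key_bytes(charset))
```

Under these assumptions an ASCII column indexes all 255 characters in 255 bytes, while utf8mb4 indexes only 191 characters and still reserves 764 bytes per key.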
Furthermore, not specifying at the schema level which fields should be UTF-8 and which should be plain ASCII adds to our technical debt, as developers may use UTF-8 characters in places where we haven't tested that they actually work. Using non-ASCII characters in a module name prevents us from using hooks, for instance, as PHP only accepts ASCII characters in function names.
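To illustrate the hook problem, here is a minimal sketch of the kind of check that an ASCII-only column would enforce at the database level. It assumes hook implementations are named after the module (`module_hook`) and that machine names are limited to lowercase ASCII letters, digits and underscores; the exact rule is an assumption, and both function names are hypothetical.

```python
import re

# Hypothetical rule: lowercase ASCII letters, digits and underscores,
# not starting with a digit. The issue itself does not spell this out.
MACHINE_NAME_RE = re.compile(r"[a-z_][a-z0-9_]*")


def is_safe_machine_name(name: str) -> bool:
    """Return True if the name is pure ASCII and matches the assumed rule."""
    return name.isascii() and MACHINE_NAME_RE.fullmatch(name) is not None


def hook_function_name(module: str, hook: str) -> str:
    """Build the function name a hook implementation would need."""
    if not is_safe_machine_name(module):
        raise ValueError(f"module name {module!r} is not a safe machine name")
    return f"{module}_{hook}"


if __name__ == "__main__":
    print(hook_function_name("node", "form_alter"))
    print(is_safe_machine_name("caf\u00e9_menu"))
```

A name such as `café_menu` fails the check, so no hook function name is derived from it.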
Originally proposed by @sun & @Damien Tournoud:
It's probably worth tidying up our schema (for example: machine name-type keys should probably use an ascii character set so as to reduce the size of the index)
we introduce support for an ascii or binary charset and use it to tidy up our indexes and primary keys (especially machine names and UUIDs that have popped up all over the place in Drupal 8).
What needs to happen is that we stop using Unicode columns (and indexes) where they are not necessary. This is a comprehensive change that cannot be worked around by simply bumping database version requirements.
- Review patch
- File a follow-up issue for ASCII support in database engines other than MySQL
User interface changes
- By cleaning up our schema definitions in this issue, we explicitly disallow non-ASCII characters for certain machine names at the database level. If we didn't disallow this already, we now do, and we stick to supporting UTF-8 for content only.
- Addition of a new "is_ascii" setting on the string formatter (in addition to the already existing "length" and "case_sensitive" settings).
- Addition of a new "varchar_ascii" type on the schema definition.
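As a rough illustration of what a "varchar_ascii" column could translate to on MySQL, the sketch below maps a schema-style field definition to a column clause. This is not the patch's implementation: the function name, the "binary" key and the dictionary layout are assumptions made for the example. Only the MySQL character set and collation names (`ascii`, `ascii_general_ci`, `ascii_bin`) are real.

```python
def mysql_column_sql(name: str, spec: dict) -> str:
    """Build a MySQL column clause from a hypothetical schema field spec."""
    length = spec.get("length", 255)
    if spec["type"] == "varchar_ascii":
        # Case-sensitive (binary) columns use the binary ASCII collation.
        collation = "ascii_bin" if spec.get("binary") else "ascii_general_ci"
        sql = f"`{name}` VARCHAR({length}) CHARACTER SET ascii COLLATE {collation}"
    elif spec["type"] == "varchar":
        # Plain varchar inherits the table's default (Unicode) character set.
        sql = f"`{name}` VARCHAR({length})"
    else:
        raise ValueError(f"unsupported type: {spec['type']!r}")
    if spec.get("not null"):
        sql += " NOT NULL"
    return sql


if __name__ == "__main__":
    # A machine-name column stored as ASCII; a title column left as Unicode.
    print(mysql_column_sql("module", {"type": "varchar_ascii", "length": 255, "not null": True}))
    print(mysql_column_sql("title", {"type": "varchar", "length": 255}))
```

Keeping the character set on the column rather than the table lets content fields stay UTF-8 while machine names and UUIDs get the smaller ASCII index.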
Beta phase evaluation
|Issue category||Task because this is an API cleanup / performance issue|
|Issue priority||Normal, but affects performance, blocks and reduces fragility|
|Prioritized changes||The main goal of this issue is API clean-up and unblocking full UTF-8 support in|
|Disruption||Not disruptive for core because we don't use non-ASCII characters in funny places anyway|
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 92,250 pass(es).
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 91,959 pass(es).
FAILED: [[SimpleTest]]: [PHP 5.4 MySQL] 91,719 pass(es), 97 fail(s), and 145 exception(s).