Follow-up to #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols)

Problem/Motivation

LanguageItem fields are hardcoded to be regular varchar columns in LanguageItem::schema() and also have an is_ascii flag set that does nothing. This flag was supposed to (and did in an earlier revision of the patch) set langcode fields to be ascii.

So currently some langcode fields are ASCII-only (those defined through hook_schema(), ie node_access.langcode), and some aren't (those created through LanguageItem, ie. node.langcode)

Proposed resolution

Convert all langcode fields to ASCII for consistency.

We want to take out is_ascii in the schema definition, as it actually does nothing, and set the column type to varchar_ascii instead.

Remaining tasks

Review patch

User interface changes

N/A

API changes

N/A

Data model changes

N/A

Beta phase evaluation

Reference: https://www.drupal.org/core/beta-changes
Issue category Bug because we changed is_ascii to varchar_ascii in #1923406: Use ASCII character set on alphanumeric fields so we can index all 255 characters and forgot to apply this change to langcode fields
Issue priority Normal
Prioritized changes The main goal of this issue is documentation/DX improvement, as developers may look at core files for documentation on how to implement another field type. is_ascii is a field setting, and varchar_ascii is a schema thing. This is confusing enough as it is, and the current code only confuses things even more.
Disruption No, just an update hook
CommentFileSizeAuthor
#71 2549821-59_0.patch7.23 KBstefan.r
#67 interdiff-58-59.txt385 bytesstefan.r
#66 2549821-59.patch7.23 KBstefan.r
#60 2549821-58.patch7.23 KBstefan.r
#60 interdiff-57-58.txt2.56 KBstefan.r
#57 2549821-57.patch9.99 KBstefan.r
#54 2549821-54.patch7.23 KBstefan.r
#53 2549821-53.patch4.68 KBstefan.r
#49 2549821-49.patch4.68 KBstefan.r
#49 interdiff-42-49.txt694 bytesstefan.r
#46 2549821-46.patch6.36 KBstefan.r
#42 2549821-42.patch3.81 KBstefan.r
#40 2549821-40.patch10.78 KBstefan.r
#37 2549821-37.patch6.34 KBstefan.r
#37 interdiff-35-37.txt610 bytesstefan.r
#35 2549821-35.patch6.35 KBstefan.r
#35 interdiff-21-35.txt385 bytesstefan.r
#31 2549821-21.patch6.28 KBstefan.r
#27 interdiff-21-27.txt1.71 KBstefan.r
#27 2549821-27.patch6.21 KBstefan.r
#21 2549821-21.patch6.28 KBjhedstrom
#21 interdiff.txt695 bytesjhedstrom
#20 2549821-20.patch6.37 KBjhedstrom
#20 interdiff.txt4.47 KBjhedstrom
#17 2549821-17.patch5.56 KBstefan.r
#13 2549821-12.patch4.03 KBstefan.r
#13 interdiff-10-12.txt3.37 KBstefan.r
#11 interdiff.txt951 bytesjhedstrom
#10 2549821-10.patch1.96 KBjhedstrom
#2 2549821-1.patch1.04 KBstefan.r
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

stefan.r created an issue. See original summary.

stefan.r’s picture

Status: Active » Needs review
FileSize
1.04 KB
stefan.r’s picture

Issue summary: View changes
stefan.r’s picture

Issue summary: View changes
jhedstrom’s picture

+++ b/core/lib/Drupal/Core/Field/Plugin/Field/FieldType/LanguageItem.php
@@ -74,9 +75,9 @@ public static function schema(FieldStorageDefinitionInterface $field_definition)
-          'is_ascii' => TRUE,

Won't this result in triggered entity schema changes for columns with data since the stored schema will contain is_ascii => TRUE?

stefan.r’s picture

if it does, it shouldn't, as is_ascii is not a valid schema property

jhedstrom’s picture

It may not be valid, but it is stored:

MariaDB [d8]> select value from key_value where collection = 'entity.storage_schema.sql' and name = 'user.field_schema_data.langcode';
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| value                                                                                                                                                                                                                                                                                                    |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| a:2:{s:5:"users";a:1:{s:6:"fields";a:1:{s:8:"langcode";a:4:{s:4:"type";s:7:"varchar";s:6:"length";i:12;s:8:"is_ascii";b:1;s:8:"not null";b:1;}}}s:16:"users_field_data";a:1:{s:6:"fields";a:1:{s:8:"langcode";a:4:{s:4:"type";s:7:"varchar";s:6:"length";i:12;s:8:"is_ascii";b:1;s:8:"not null";b:1;}}}} |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

I'm not sure why this doesn't break our existing upgrade path tests since the starting database contains data in users_field_data.

jhedstrom’s picture

Assigned: Unassigned » jhedstrom

I'm going to dig into #7 a bit.

Berdir’s picture

What i suggested in IRC is to just go through the stored schema and clean it up by hand.

jhedstrom’s picture

Assigned: jhedstrom » Unassigned
FileSize
1.96 KB

This quick change to an upgrade test does show that this change requires entity updates to be applied. The second runUpdates call was necessary to see the actual exception that gets thrown for tables with data in them. Cleaning the schema by hand may work as suggested in #9.

jhedstrom’s picture

FileSize
951 bytes

Status: Needs review » Needs work

The last submitted patch, 10: 2549821-10.patch, failed testing.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
3.37 KB
4.03 KB

Here's an update hook that removes the is_ascii key from the stored schemas.

stefan.r’s picture

While this patch would still be an improvement, I'm thinking maybe we may as well fix this completely (by changing the mysql charset on LanguageItem fields) instead of making it go from completely broken to half-broken?

Status: Needs review » Needs work

The last submitted patch, 13: 2549821-12.patch, failed testing.

jhedstrom’s picture

instead of making it go from completely broken to half-broken?

Complete fix makes sense to me since. This should be possible using an approach similar to the update hook in #13.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
5.56 KB

Status: Needs review » Needs work

The last submitted patch, 17: 2549821-17.patch, failed testing.

jhedstrom’s picture

Assigned: Unassigned » jhedstrom

The update hook is looking good. I'm going to dig into why the test is failing, and move it to a dedicated test class.

jhedstrom’s picture

Assigned: jhedstrom » Unassigned
Status: Needs work » Needs review
FileSize
4.47 KB
6.37 KB

This test actually looks for the change we're making, and moves it to a dedicated test case.

jhedstrom’s picture

FileSize
695 bytes
6.28 KB

Oops, this removes a lingering debug.

stefan.r’s picture

Test looks great!

+++ b/core/modules/system/src/Tests/Entity/Update/LangcodeToAsciiUpdateTest.php
@@ -0,0 +1,80 @@
+    // Only testable on MySQL.
+    // @see https://www.drupal.org/node/301038

This is fine, varchar_ascii only works on MySQL :)

jhedstrom’s picture

Still good to document though so folks know where to go if they are looking at the code and wondering why it's mysql-only.

stefan.r’s picture

Issue summary: View changes
stefan.r’s picture

Issue summary: View changes
plach’s picture

Overall, this looks great. Only one (minor) complaint:

+++ b/core/modules/system/system.install
@@ -1241,3 +1241,58 @@ function system_update_8003() {
+        $schema = $schema_property->getValue($field_definition);
+        if (isset($schema['columns']['value']['is_ascii'])) {
+          $schema['columns']['value']['type'] = 'varchar_ascii';
+          unset($schema['columns']['value']['is_ascii']);
+        }

Can't we just hard-code the schema definition here? It should be more reliable.

FYI #2542748: Automatic entity updates can fail when there is existing content, leaving the site's schema in an unpredictable state is adding a check so that only relevant field schema properties actually trigger a mismatch between code definitions and state ones.

stefan.r’s picture

@plach like this?

Nice that #2542748: Automatic entity updates can fail when there is existing content, leaving the site's schema in an unpredictable state would simplify this, do we postpone this issue on that or at this point it doesn't matter?

stefan.r’s picture

Technically if we hardcode the schema like this #27 should check for the length of 12 as well -- in the extreme edge case same mistake in a contrib/custom module implementing their own field and is upgrading from an older beta, we might accidently reduce the length to 12 otherwise.

plach’s picture

Yeah, good point, let's go back to #21.

Status: Needs review » Needs work

The last submitted patch, 27: 2549821-27.patch, failed testing.

stefan.r’s picture

stefan.r’s picture

Status: Needs work » Needs review
plach’s picture

Status: Needs review » Reviewed & tested by the community

Thanks!

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 31: 2549821-21.patch, failed testing.

stefan.r’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
385 bytes
6.35 KB

reroll and back to RTBC

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 35: 2549821-35.patch, failed testing.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
610 bytes
6.34 KB

Dumps are now set in setDatabaseDumpFiles()

stefan.r’s picture

Status: Needs review » Reviewed & tested by the community

Test went from red to green, back to RTBC considering the interdiff

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 37: 2549821-37.patch, failed testing.

stefan.r’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
10.78 KB

reroll

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 40: 2549821-40.patch, failed testing.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
3.81 KB

That included a file from another patch :/

plach’s picture

Status: Needs review » Reviewed & tested by the community
alexpott’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/modules/system/system.install
@@ -1463,5 +1463,60 @@ function _system_update_create_block($name, $theme_name, array $values) {
+function system_update_8006() {

Needs tests. All updates should have tests and be run against all DBs.

alexpott’s picture

And obviously we need to check with and without data.

+++ b/core/modules/system/system.install
@@ -1463,5 +1463,60 @@ function _system_update_create_block($name, $theme_name, array $values) {
+                if ($database->driver() == 'mysql') {
+                  $database_schema->changeField($table_name, $field_name, $field_name, $schema_copy[$item_name][$table_name]['fields'][$field_name]);
+                }

Ah I see an mysql only update... hmmm. Still needs testing.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
6.36 KB

This actually already had tests :/

stefan.r’s picture

Status: Needs review » Reviewed & tested by the community

Compared to patch previously rtbced by @plach, this has the test again and is green now so can probably go back

stefan.r’s picture

Status: Reviewed & tested by the community » Needs work

And obviously we need to check with and without data.

Missed this bit. This has no filled test here yet, I'll add one

stefan.r’s picture

Status: Needs work » Needs review
FileSize
694 bytes
4.68 KB

Status: Needs review » Needs work

The last submitted patch, 49: 2549821-49.patch, failed testing.

Status: Needs work » Needs review

stefan.r queued 49: 2549821-49.patch for re-testing.

Status: Needs review » Needs work

The last submitted patch, 49: 2549821-49.patch, failed testing.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
4.68 KB
stefan.r’s picture

The last submitted patch, 53: 2549821-53.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 54: 2549821-54.patch, failed testing.

stefan.r’s picture

plach’s picture

Status: Needs review » Reviewed & tested by the community

Looks good, thanks

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/modules/system/src/Tests/Entity/Update/LangcodeToAsciiUpdateTest.php
--- /dev/null
+++ b/core/modules/system/src/Tests/Entity/Update/LangcodeToAsciiUpdateTest.php.rej

oops

stefan.r’s picture

plach’s picture

Status: Needs review » Reviewed & tested by the community

Thanks

The last submitted patch, 57: 2549821-57.patch, failed testing.

The last submitted patch, 57: 2549821-57.patch, failed testing.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 60: 2549821-58.patch, failed testing.

stefan.r’s picture

+++ b/core/modules/system/system.install
@@ -1570,5 +1570,60 @@ function _system_update_create_block($name, $theme_name, array $values) {
+function system_update_8006() {

is taken...

stefan.r’s picture

stefan.r’s picture

FileSize
385 bytes

The last submitted patch, 60: 2549821-58.patch, failed testing.

stefan.r’s picture

seemingly unrelated fails on postgres/sqlite

plach’s picture

Try to reupload the patch

stefan.r’s picture

plach’s picture

Status: Needs review » Reviewed & tested by the community

Looks good now

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed e8dfde4 and pushed to 8.0.x. Thanks!

Thanks for adding the beta evaluation to the issue summary.

  • alexpott committed e8dfde4 on 8.0.x
    Issue #2549821 by stefan.r, jhedstrom, plach: Langcode fields should be...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.