Problem/Motivation

The Drupal MySQL driver does not currently provide full UTF-8 support. The (confusingly named) MySQL utf8 charset only provides support for the basic character plane (out of 17 in total), which constitutes 6% of possible characters. As of MySQL 5.5.3 (2010), the utf8mb4 charset provides full UTF-8 support, being completely backwards compatible, not requiring more space than utf8 for characters that are in the utf8 set, and using extra byte for characters outside of the utf8 set.

Proposed resolution

Provide an optional setting in settings.php to enable utf8mb4:

$database['default']['default']['charset'] = 'utf8mb4';
$database['default']['default']['collation'] = 'utf8mb4_general_ci';

Note: everything in the attached patch is completely optional. All current Drupal installs are unaffected until the database settings for a MySQL database are changed to the utf8mb4 charset.

The patch should completely fix new installations, however when using this setting on an existing installation, all existing tables must be converted to the utf8mb4 charset first.

In order for this setting to work, all of the following conditions need to be met:

1. In order to allow for large indexes, MySQL must be set up with the following my.cnf settings:

[mysqld]
innodb_large_prefix=true
innodb_file_format=barracuda
innodb_file_per_table=true

These settings are available as of MySQL 5.5.14, and are defaults in MySQL 5.7.7 and up.
2. The PHP MySQL driver must support the utf8mb4 charset (libmysqlclient 5.5.3 and up, as well as mysqlnd 5.0.9 and up).
3. The MySQL server must support the utf8mb4 charset (5.5.3 and up).

Remaining tasks

User interface changes

N/A

If you set up your database in settings.php prior to running the installer, and your database system does not support utf8mb4, we finish early and show the following error:

API changes

N/A

CommentFileSizeAuthor
#171 interdiff.txt6.89 KBDavid_Rothstein
#171 2488180-171.patch14.58 KBDavid_Rothstein
#165 2488180-165.patch14.58 KBstefan.r
#165 interdiff.txt493 bytesstefan.r
#164 interdiff.txt1.84 KBstefan.r
#164 2488180-164.patch14.59 KBstefan.r
#162 interdiff.txt597 bytesstefan.r
#162 2488180-160.patch14.42 KBstefan.r
#158 interdiff.txt12.52 KBstefan.r
#158 2488180-156.patch14.21 KBstefan.r
#152 2488180-152.patch18.21 KBstefan.r
#152 interdiff.txt9.42 KBstefan.r
#147 interdiff.txt3.7 KBstefan.r
#147 d7-2488180-143.patch16.46 KBstefan.r
#142 Capture d’écran 2016-06-26 à 13.20.44.png136.64 KBstefan.r
#141 interdiff.txt1.55 KBstefan.r
#141 d7-2488180-141.patch16.4 KBstefan.r
#139 d7-2488180-140.patch16.91 KBstefan.r
#139 interdiff.txt908 bytesstefan.r
#138 ss.png198.32 KBstefan.r
#138 interdiff.txt11.41 KBstefan.r
#138 d7-2488180-138.patch16.41 KBstefan.r
#133 interdiff.txt1.19 KBstefan.r
#133 d7-2488180-133.patch11.97 KBstefan.r
#131 d7-2488180-131.patch11.91 KBstefan.r
#129 d7-2488180-128.patch11.68 KBstefan.r
#129 interdiff.txt14.19 KBstefan.r
#120 interdiff.txt886 bytesstefan.r
#120 2488180-120.patch7.59 KBstefan.r
#118 2488180-mysql-utf8mb4-support-reroll-for-7.41.patch7.67 KBKhaledBlah
#87 87-2488180-mysql-utf8mb4-support-reroll-for-7.41.patch6.54 KBadammalone
#85 0001-2488180-mysql-utf8mb4-support-reroll-for-7.41.patch4.38 KBstovak
#60 interdiff.txt2.32 KBNoe_
#60 2488180-60.patch3.89 KBNoe_
#54 0001-mysql-utf8mb4-support-2488180-54.patch3.38 KBstovak
#51 mysql-utf8mb4-support-2488180-51.patch3.2 KBstovak
#48 2488180-48.patch2.01 KBpwolanin
#45 increment.txt702 bytespwolanin
#45 2488180-45.patch712 bytespwolanin
#44 increment.txt710 bytespwolanin
#44 2488180-44.patch703 bytespwolanin
#40 2488180-40.patch664 bytespwolanin
#8 0001-0001-2488180-6-D7-backport-MySQL-driver-does-not-sup.patch3.59 KBstovak
#6 0001-2488180-6-D7-backport-MySQL-driver-does-not-support-.patch3.6 KBstovak
#4 0001-D7-backport-MySQL-driver-does-not-support-full-UTF8.patch2.86 KBstovak
#1 0001-allow-mysql-charset-to-be-configured-in-settings.php_.patch2.93 KBstovak
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

stovak’s picture

stovak’s picture

Assigned: stovak » Unassigned
stovak’s picture

stovak’s picture

stovak’s picture

Status: Active » Needs review
stovak’s picture

stovak’s picture

The last one fixes an issue with schema lengths being incompatible with utf8mb4. Indexed varchar fields cannot be longer than 191 bytes.

stovak’s picture

stovak’s picture

stovak’s picture

Issue summary: View changes
stovak’s picture

Issue summary: View changes
David Strauss’s picture

-1 on adding a configuration option. If "utf8mb4" is as good as you're saying, why not always use it?

joelpittet’s picture

@David Strauss Wordpress recently auto-updated to this and from what I've heard they had a few issues with the transition to utfmb4.

https://make.wordpress.org/core/2015/04/02/the-utf8mb4-upgrade/
https://core.trac.wordpress.org/ticket/21212

Maybe the issues weren't all that severe but opt-in may be a better approach here?

David Strauss’s picture

@joelpittet Looks like the only issue, other than minimum versions, is hitting the maximum index size a little faster.

joelpittet’s picture

@David Strauss they are auto updating the charset/collation as well on the upgrade which we are not. And I failed pretty bad last attempt at this upgrade on a D7 site(willing to try again though). So just wanted to put in a warning.

We could try to do an update hook that covers all the tables Drupal knows about.
https://core.trac.wordpress.org/attachment/ticket/21212/21212.2.diff

stovak’s picture

@david - Existing installs will not have the tables converted and moving the entire Drupal platform to use this encoding exclusively was not something I assumed everyone would want. I have a script to convert the tables to utf8mb4. The initial goal as just "making drupal go" on mb4, which it currently does not. Should i include the table updater in /scripts/ folder with this patch?

joelpittet’s picture

@stovak I'm interested to see the script you use and give it a whirl.

joelpittet’s picture

I'm leaning towards the full on utf8mb4 by default method. How can we mitigate the risks?

joelpittet’s picture

Status: Needs review » Needs work

Definitely needs some docs and some changes to default.settings.php

  1. +++ b/includes/database/mysql/database.inc
    @@ -27,6 +27,10 @@ class DatabaseConnection_mysql extends DatabaseConnection {
    +    ¶
    +    // Allow the drupal default charset to be overridden in the settings.php ¶
    

    Extra white space at the end of the lines.

  2. +++ b/includes/database/mysql/schema.inc
    @@ -77,11 +77,11 @@ class DatabaseSchema_mysql extends DatabaseSchema {
    -
    

    Whitespace removal not needed.

  3. +++ b/includes/database/mysql/schema.inc
    @@ -77,11 +77,11 @@ class DatabaseSchema_mysql extends DatabaseSchema {
    +      // allow charset to be overridden in settings.php
    

    Capital letter to start period at the end. Coding standards.

  4. +++ b/includes/database/mysql/schema.inc
    @@ -129,6 +129,14 @@ class DatabaseSchema_mysql extends DatabaseSchema {
    +      // length should be 191 and not 255
    

    Period at the end.

mikeytown2’s picture

Just throwing this out that if we had support for charset and collation in hook_schema that would be awesome. I've switched some tables over to ascii, ascii_bin for 5-20% faster queries. Best example being the bunder in advagg where I've been able to drop 40ms of query time (100ms to 60ms) #2487087: Speed up the bundler, use char instead of varchar; also use ascii_bin instead of utf8_bin.

joelpittet’s picture

Specified key was too long; max key length is 767 bytes

happened quite a few times with:
ALTER TABLE table_name_here CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci

The following tables:

actions
advagg_aggregates_hashes
authcache_p13n_key_value
cache
cache_* (all of them)
commerce_checkout_pane
commerce_coupon
commerce_order
commerce_product
current_search
ds_field_settings
ds_layout_settings
entityqueue_queue
entityqueue_subqueue
facetapi
field_group
file_managed 
file_usage 
filter
filter_format
image_styles
imce_dir_man
menu_router
message_type
message_type_category
metatag_config
node_access
nodeaccess
oauth_common_nonce
page_manager_handlers
page_manager_pages
page_manager_weights
panels_mini
registry
registry_file
rules_dependencies
rules_tags
search_api_db_product_display_field_* (a few taxonomy terms)
search_api_db_product_display_text
semaphore
system
taxonomy_vocabulary
trigger_assignments
vef_video_styles
workbench_moderation_states

How safe is it to convert the 255 varchar columns to 191? Wouldn't that truncate data? Or is there a better way?

stovak’s picture

@joel - here's a symfony console command i used on my sites. If included in the patch, I would port the script so that the symfony stack wasn't a requirement and use the conventions of the other files in the /scripts folder, e.g. use database from bootstrap.inc.

https://gist.github.com/stovak/eb7361c5de2921197f04

Yes, the script truncates the columns. Currently, i'm only working on a single site buildout that has the emoji requirement and I didn't have any columns with long values that were varchar, so truncation to varchar(191) was not an issue for me. I'm not sure how to mitigate the risk for the rest of the community. That would need some careful thought. Off the top of my head, I think node titles would be the biggest culprit.

I agree that EVENTUALLY D7 should embrace mb4 as its default. MySQL totally screwed the pooch with it's 3byte implementation and it's ridiculous that we find ourselves here. I'm just not sure how the Drupal community... who are resistant to upgrading from php 5.3... would feel having this forced on them in a point Drupal release. There would need to be messaging and pre-warning about the fact that it was a breaking change. There's no going back once you've made the conversion.

If it just pops up in a release that the update changes the table structure of the entire db, I think there would be villagers with pitchforks.

stovak’s picture

@mikey = I think that kind of a change is out of scope of this thread. That would need to be a different issue. This focus is not performance, its ability to store correct text values for languages that require 4-byte characters.

stovak’s picture

Here's the first stab at converting the DB script away from Symfony and using it standalone:

https://gist.github.com/stovak/8b0e45774226b915d0ac

mikeytown2’s picture

If D7 schema supported different charset and collations the conversion could easily be done by using db_change_field().

Code could look something like this once the changes are put into the correct hook_schema implementations.

$schema = drupal_get_complete_schema();
foreach ($schema as $table => $values) {
  foreach ($values['fields'] as $field_name => $field_values) {
    db_change_field($table, $field_name, $field_name, $field_values);
  }
}

Edit: Here's the script I created to change the collation in advagg

/**
 * Convert the table to the specified collation.
 *
 * @param string $table_name
 *   Preform the operation on this table.
 * @param array $fields
 *   An array of field names
 * @param string $collation
 *   The db collation to change to table columns to.
 *
 * @return array
 *   Returns an array of tables and column names.
 */
function advagg_install_change_table_collation($table_name, array $fields, $collation) {
  $db_type = Database::getConnection()->databaseType();
  // Skip if not MySQL.
  if ($db_type !== 'mysql') {
    return FALSE;
  }
  $table_name = Database::getConnection()->prefixTables('{' . db_escape_table($table_name) . '}');
  $results = db_query("SHOW FULL FIELDS FROM $table_name")->fetchAllAssoc('Field');
  foreach ($results as $row) {
    if (!in_array($row->Field, $fields)) {
      continue;
    }
    $charset = strtolower(substr($collation, 0, strpos($collation, '_')));
    $query = "ALTER TABLE $table_name MODIFY {$row->Field} {$row->Type} CHARACTER SET $charset COLLATE $collation";
    db_query($query);
  }
}
David_Rothstein’s picture

Why was this filed as a separate issue from #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols)? I think we might want to close it as a duplicate; that issue is already marked for backport, and having discussions going on in two places at once isn't really good.

Regarding the default behavior, seems worth keeping an eye on WordPress's experience. We don't want to break things, but it would be really nice to make this the default. Presumably if we did that we'd do it in a similar way as they did (don't change the core MySQL requirement; just make it the default only for sites that are running on a new enough version of MySQL). But yeah, making it an opt-in configuration setting is certainly the safer way to go.

In https://www.drupal.org/node/2473301#comment-9858035 @catch suggested making it the default on new installs only (again, only if their MySQL version is high enough), which may be a good compromise.

stovak’s picture

@david The back port fix is so completely different from the D8 fix I assumed it would be tracked separately. The DB subsystems are completely different. But if this conversation needs to go on somewhere else, i'm more than happy to close this issue. Not really interested in anything other than getting the issue resolved and back ported to 7.

The "patch" process for core is a mystery to me so I did what I knew to do.

stefan.r’s picture

Status: Needs work » Active

In terms of charsets we should probably only allow utf8/utf8mb4 (and possibly ascii, although that may be outside of the scope of this issue) as options. We don't support other charsets on D8 either.

Truncating tables to 191 characters seems quite invasive. We'd only need to do it on columns with unique indexes or primary keys (regular columns such as node title are fine), but there are still a lot of those in core and contrib, and some may actually need the extra 64 characters.

Our best may be to only use utf8mb4 only if innodb_large_prefixes is enabled (as of MySQL 5.7 that will be the case by default, in earlier versions that requires some extra configuration), which allows for the varchar fields to stay at 255 characters even with utf8mb4. That way we wouldn't need to deal with the 191 character limitation or backport workarounds such as #1923406: Use ASCII character set on alphanumeric fields so we can index all 255 characters

stefan.r’s picture

Status: Active » Closed (duplicate)

Can we summarize the comments from this thread in #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols)? There's no need for this to be a separate issue and the D8 patch is close to being committed anyway.

rlustemberg’s picture

I don't understand why this is being closed as a duplicate. I have not seen any Drupal 7 solution on https://www.drupal.org/node/1314214

David_Rothstein’s picture

@david The back port fix is so completely different from the D8 fix I assumed it would be tracked separately. The DB subsystems are completely different. But if this conversation needs to go on somewhere else, i'm more than happy to close this issue. Not really interested in anything other than getting the issue resolved and back ported to 7.

Thanks, and no worries. It looks like we got some valuable discussion here anyway, about how to handle the D7-specific part of this.

From a code standpoint, I actually don't think they are very different at all. Most of the Drupal 8 patch is dealing with 191 character issues around the codebase which would need to be considered in Drupal 7 too... Although I guess @stefan.r has an interesting idea above for how to avoid dealing with that in Drupal 7.

stovak’s picture

I agree that truncating those fields is not anything we should put in core. Hence my reticence at including my "update" script. I hadn't thought about truncating ONLY fields that have indexes. I will look into that today and modify the latest version of my conversion script.

@stefan - i think we have three different use cases here:

  1. new installs of drupal where tables/fields are created
  2. existing installs of drupal being converted to the new collation
  3. Adding fields/modules enabling and creating fields to an existing (assumed live) site

My suggestion would be incremental change that eventually resolves all three use cases. I would say that the first push to core should simply solve the problems with *ENABLING* mb4. In other words... its there if you want it, we're going to solve all the problems before we make it mandatory for all users in a later update.

So including an "update" script (not necessarily mine) and a way to simply 'make it go' to solve the problem for people who _NEED_ to solve the problem. Then later come back and update core to ONLY use mb4 encoding once the install and new field process has been sorted.

stovak’s picture

Status: Closed (duplicate) » Active
stovak’s picture

I'd like to continue this conversation a little while longer before we push back to the other ticket, after the d8 patch is committed, if that's ok with all concerned.

stefan.r’s picture

Well the utf8->utf8mb migration module could also be contrib. And actually regular indexes are fine (it's not a big deal if those are only 191 characters, InnoDB will just index 191 characters out of the 255 automatically), it's just primary keys and unique keys that pose a problem.

Which can be worked around if we really need to support utf8mb4 on older MySQL versions, but those workarounds would be a bigger effort than the D8 effort, because of contrib and migration path/BC issues.

stefan.r’s picture

D8 patch has now been committed so feel free to summarize and move discussion to #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols)

pwolanin’s picture

It seems the D8 patch is not done?

@David_Rothstein, for Drupal 7 I would suggest we just strip them on input as a 1st pass. I don't see how the Drupal 7 mysql requirement could be raised to support this?

stefan.r’s picture

Yes, the D8 patch was reverted and needs review now

pwolanin’s picture

Looks like you could just filter using like:

preg_replace('/[^\x0000-\xffff]/u', '', $string)

or maybe:

preg_replace('/[^\x0000-\xffff]/u', '?', $string)

Not sure how slow that might be, however.

Looks like the best place to try it might be in drupal_write_record() - it wouldn't catch 100% of queries, but the schema is already loaded there and we already case some fields to int or string.

pwolanin’s picture

Status: Active » Needs review
FileSize
664 bytes

Here's a little patch to start. Let's see if it breaks anything.

Status: Needs review » Needs work

The last submitted patch, 40: 2488180-40.patch, failed testing.

stefan.r’s picture

+++ b/includes/common.inc
@@ -7267,6 +7267,10 @@ function drupal_write_record($table, &$record, $primary_keys = array()) {
+      elseif ($info['type'] != 'blob' && $info['mysql_character_set'] == 'utf8') {

mysql_character_set may not always be set

mikeytown2’s picture

preg_replace can return NULL so I would want to make sure it's not NULL at a minimum.
http://stackoverflow.com/questions/3466035/how-to-skip-invalid-character...

pwolanin’s picture

Status: Needs work » Needs review
FileSize
703 bytes
710 bytes

ok, easy to add isset() check

pwolanin’s picture

FileSize
712 bytes
702 bytes

oops - forgot to deal w/ NuLL question - we can cast to string like the else branch.

pwolanin’s picture

Title: D7 backport: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols) » Drupal 7 MySQL does not support full UTF-8 (emojis, asian symbols, mathematical symbols)
pwolanin’s picture

Looks like we should use instead the invalid character utf-8 character: U+FFFD

pwolanin’s picture

Status: Needs review » Needs work
Related issues: +#1211866: Enable ENT_SUBSTITUTE flag in Html::escape
FileSize
2.01 KB

Seems like the patch above is wrong - $info['mysql_character_set'] is only set when getting the CREATE TABLE sql.

This ought to be closer (includes a test) but everything blows up.

David_Rothstein’s picture

Note that #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols) was committed to Drupal 8 a few days ago - should we merge these issues back together?

stovak’s picture

I have created a table_converter module. Would welcome feedback.

https://github.com/stovak/table_converter

-tom

stovak’s picture

Rerolled original patch for current Drupal HEAD

geerlingguy’s picture

Since #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols) was merged into D8, should this issue be marked duplicate now, and the work merged back into that issue?

dpopdan’s picture

It is working now on Drupal 8, but what about Drupal 7 ?

stovak’s picture

re-rolled for 7.38

stovak’s picture

@geerlingguy - This is a non-breaking change that only ENABLES the ability to do UTF8MB4. What do you think the chances are we can get this in a future update to d7?

geerlingguy’s picture

Status: Needs work » Needs review

I can't speak for the community, or for the D7 core maintainers, but this patch does only add the ability to set a default charset and collation for each database connection, so it's non-destructive, and would be a welcome improvement to allow D7 sites everywhere to work with utf8mb4 if desired.

From #25:

If D7 schema supported different charset and collations the conversion could easily be done by using db_change_field().

That's basically all this patch does, and since it's non-destructive and doesn't change anything for existing Drupal 7 sites, I definitely give a thumbs up. However, one thing that this patch would introduce that is not present in Drupal 8 (AFAICT) is the database charset setting. In Drupal 8's default.settings.php, only the collation is configurable per-database. Also, I think there was a whitespace issue or two in the patch when I grabbed it and applied it to my sites; that would need cleaning up.

If we want that setting added, I would think it needs to be added to D8 first then backported like this patch.

Setting to needs review so the testbot can have a go.

The last submitted patch, 48: 2488180-48.patch, failed testing.

joelpittet’s picture

Status: Needs review » Needs work

I much prefer this incremental approach but echo what @geerlingguy said in #56, glad to see collation and charset configurable per database. Seems to pass tests. May want to add some notes in default.settings.php and clean up the comment coding standards to help this patch get through the core gates.

  1. +++ b/includes/database/mysql/database.inc
    @@ -25,6 +25,8 @@ class DatabaseConnection_mysql extends DatabaseConnection {
    +    $charset = (array_key_exists('charset', $connection_options) ? $connection_options['charset'] : 'utf8');
    

    Same here re isset/empty vs array_key_exists.

  2. +++ b/includes/database/mysql/schema.inc
    @@ -81,7 +81,8 @@ class DatabaseSchema_mysql extends DatabaseSchema {
    +      // allow charset to be overridden in settings.php.
    

    Nit: Capital 'a' to start the sentence.

  3. +++ b/includes/database/mysql/schema.inc
    @@ -81,7 +81,8 @@ class DatabaseSchema_mysql extends DatabaseSchema {
    +      'mysql_character_set' => (array_key_exists('charset', $info) ? $info['charset'] : 'utf8'),
    

    We usually use isset() or empty() unless you expect a NULL or flasey value'd charset key.

  4. +++ b/includes/database/mysql/schema.inc
    @@ -130,7 +131,14 @@ class DatabaseSchema_mysql extends DatabaseSchema {
    +      if ($spec['length'] >= 192 && $info['charset'] == "utf8mb4") {
    +        $spec['length'] = 191;
    +      }
    

    Will this affect existing fields?

stovak’s picture

@joel -
1-3 NP. Will clean these up and re-roll.
4. No, only new fields. This code does not change the database. For changes to existing structure: https://github.com/stovak/table_converter

-t

Noe_’s picture

Status: Needs work » Needs review
FileSize
3.89 KB
2.32 KB

Added some documentation about the patch in /sites/default/default.settings.php
Replaced the array_key_exists by isset function calls.
Fixed the nitpicking start with a capital :)

joelpittet’s picture

Thanks for the clean up, not sure I know enough about this system to RTBC but thanks for code standard nit cleanup and answering my question.

stovak’s picture

Thanks guys, sorry, got in the weeds with another project.

geerlingguy’s picture

Status: Needs review » Reviewed & tested by the community

RTBC by me (@stovak, you can confirm as well) for D7, but with the caveat that we need to add the charset configuration option to Drupal 8 if we add it in D7, otherwise we're introducing a new feature in D7 that's not supported in D8!

This patch adds two optional settings.php configuration directives to specify charset and collation per-database, and in Drupal's critical path, it only adds one isset() call and two string concatenations, so it only adds a couple microseconds to execution time, while affording every D7 site the ability, if so desired, to support emojis. Not having support is a serious deficiency in today's mobile-centric world, where emoji are fairly common in community sites (and even in many 'brochure' style sites, eesh!).

geerlingguy’s picture

Should we consolidate efforts with #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols)? It seems that the only major differences here are we're not changing the default collation, since we have to support older versions of MySQL in D7, and we're adding charset configuration (which should first be added to Drupal 8).

stefan.r’s picture

This needs a check for $charset being either utf8 or utf8mb4, as we probably shouldn't be supporting all possible charsets in core. Maybe we can throw an exception if it's neither of those? Or make it a boolean instead? (TRUE for utf8mb4, FALSE for utf8)

This will also break primary/unique keys that are > 191 characters, unless inno_db_large_prefixes is turned on. IMO we should only support utf8mb4 on D7 if that database option is turned on, otherwise we have to deal with the gargantuan task of reworking all contrib modules.

#63 I don't think this will need a "forward port" to Drupal 8 as a decision was made for Drupal 8 not to support utf8

stovak’s picture

@geerlingguy I don't see this as a "feature that d7 has" so much as "hacky bullshit way of making d7 support something d8 already does by default." A/D?

@stefan.r this module doesn't make any changes to the tables themselves, only makes it POSSIBLE for utf8mb4. It's not turned on by default and you have to explicitly convert the tables in order for the 191 length restriction to be in effect.

-tom

stefan.r’s picture

Status: Reviewed & tested by the community » Needs work

Actually the current implementation won't necessarily break for fields over 191 characters, but if the charset is set to utf8mb4 in the database settings it _will_ limit all new fields (including non-string fields, where the limitation is not needed) to 191 characters on field creation, which could have a couple of unintended consequences we need to see that we can live with.

So this probably shouldn't be RTBC yet :)

joelpittet’s picture

re #67 thanks that was really the worry I had tried to ask in #58.4

stefan.r’s picture

Well one of the main concerns with the current patch is how problematic it would be to support in both core and contrib. Mixing charsets (and by implication, character cutoffs) on the same install doesn't seem like a good idea to begin with, and we also shouldn't have maintainers trying to figure out which table is running on which charset whenever there's a support request caused by this patch.

As to the character limitation, in D8 we found a way to deal with it (with varchar_ascii and some rework) but in D7 I doubt we will, considering the amount of contrib modules out there.

However, if innodb_large_prefixes is turned on, the character limitation doesn't apply and the support concerns disappear as for new installs everything would still be the same as usual, and as far as the upgrade path is concerned, the steps would merely be:

1. Backup the tables
2. Convert all tables to utf8mb4
3. Everything still works as usual, maybe with a tiny performance degradation?

So for any patch we do here, I would suggest to only allow utf8mb4 if inno_db_large_prefixes is enabled (which as of MySQL 5.7 GA will be the case by default anyway), so we can forget about the whole 191 character problem in D7.

stovak’s picture

i'm on board with this. It was my desire to only _ENABLE_ this and not to actually change anyone's install. What changes need to be made to the patch in order for this plan to go forward? Turn off the 191 limit if that setting is applied?

stefan.r’s picture

For the brave ones who are fine with tinkering the 191 limit may be OK (as well as the current patch along with @stovak's migration script), but if we want to get this patch into core it should probably not have any trace of a 191 character limit at all.

I believe we can test for this with SELECT @innodb_large_prefix, so if that doesn't have the right value, we should error if the charset is set to utf8mb4.

The patch should also disallow marking any new fields as utf8mb4 before all existing tables have been converted as we don't want to cause new issues by mixing charsets.

joelpittet’s picture

Can we provide some instructions on how to enable innodb_large_prefix?

joelpittet’s picture

Also is the query not SELECT @@innodb_large_prefix? SELECT @innodb_large_prefix gives me NULL. (Doesn't know much about server variable queries).

I'm using mariadb 10 locally but same NULL on production MySQL 5.5.44 servers.

stefan.r’s picture

Not sure about the query. As to how to turn it on, see https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_in...

The following must be in my.cnf:

innodb_large_prefix=on
innodb_file_format=barracuda
innodb_file_per_table=true

tables must also be created with ROW_FORMAT=DYNAMIC

joelpittet’s picture

I think query with the double @@ seems to work quite well on mariadb and mysql. Just need a confirm.

I know how to do these things myself but if we expect people to change MySQL configuration we should give them some kind of rudimentary instructions.

How do you check forROW_FORMAT=DYNAMIC:
Edit:
SELECT row_format FROM information_schema.tables WHERE table_schema="your_db"

And could you explain why those settings are all needed and will all of them be on by default in 5.7?
Edit: it says most of this in the link you provided;)

Noe_’s picture

From what I understand utf8mb4 is in D8 see: https://www.drupal.org/node/1314214#comment-10036841

And what are the next actionable steps to get this into core.
I really need this because now my core is patched, so I would be willing to help out. I just need to know what eed to do.

First check if @@innodb_large_prefix is on and if it is, just allow utf8 and utf8mb4?

stovak’s picture

So I think the suggestion on the table is the code surrounding the 191 character limit on Indexes needs to be replaced with a check for this config flag.

If its there, allow the index to continue without intervention, if not, limit the column to 191 chars...???? I don't know here, i'm asserting. Please feel free to suggest anything you think makes sense.

-t

stefan.r’s picture

Instead of a silent failure like that it should fail harder IMO, like a big error screen if the utf8mb4 flag is set in the config file but the innodb_large_prefix option is disabled... along with a warning in the documentation right above the config file flag that enabling the flag will break your site if that innodb setting is not set.

And we should probably keep failing if that flag is set but tables haven't been converted to the current character set yet (similarly if all tables have been converted to utf8mb4 and people set the config file setting back to utf8)...

So for existing installs, the actual steps would be:

1. Set up InnoDB database settings
2. Set up the config setting
3. Turn on maintenance mode
4. Run migration script that backups database and converts all tables if the backup is successful
5. Turn off maintenance mode

While for new installs it would be:

1. Set up InnoDB database settings
2. Set up the config setting
3. Install Drupal as usual

hass’s picture

Should this patch not allow people to install Drupal with the installer correctly? Editing the settings file first is not very common. People expect this to work from installer, too. I think we need to be able to select the collation in installer, too. The patch also does not verify if mySQL is capable of doing all right plus is does not check the client requirements. There is also a D8 case open for this. As people may lose data otherwise or corrupt it's database I think this need to be much more fools prove.

stefan.r’s picture

I agree ideally we want this to be an installer toggle that's only checkable when the database client/server have support (and the right config settings).

I actually dont think there's any chance of corruption/data loss though, the database driver would fail rather than corrupt anything.

stovak’s picture

Excellent point & i agree completely. I will attempt to add this in the near future if someone doesn't beat me to it.

Personal note: The sites I work on are retail so August/Sept is my busy season getting ready for xmas. Won't have much time in the next 30-60 days.

-tom

devbanana’s picture

I'd love any updates on this. I converted my database and tables to utf8mb4, but now can't install any new modules that create tables because they try to create it in utf8 by default.

darol100’s picture

What are the steps to continue fixing this issue ? What things needs to be done in order to commit this to Drupal core ?

stovak’s picture

Not sure about anyone else, but I think we left it needing a hook for the installer to have the option of installing utf8mb4 if support was available. I haven't been able to code that and probably won't get to it until after xmas.

stovak’s picture

Rerolled for 7.41

twistor’s picture

Status: Needs work » Needs review
adammalone’s picture

Ok let's take a quick pass at this. It's not pretty but it seems to work. I'm open to ideas about how we present the option to the user. Unfortunately we can't check utf8mb4 presence before presenting the db connection form because we'll have no credentials. We are also unable to check after that form has been submitted because the tables will have been created so it's a bit of a Frankenstein's monster in between job.

stovak’s picture

So I think there's 2 parts to this. First basic support, which the #85 patch does. Can we disassociate the user-facing stuff to get it elegant and the developer-facing stuff to just add support? I'd like someone to mark this RTBC on my #85 patch to get it into the next core update. Then we can continue this thread on the changes to the installer to allow utf8mb4 during the install process.

Agree/Disagree?

-tom

geerlingguy’s picture

@stovak:

I'd like someone to mark this RTBC on my #85 patch to get it into the next core update. Then we can continue this thread on the changes to the installer to allow utf8mb4 during the install process.

I vote +1 for this too—let's get in the backend functionality first ("support full UTF-8"), then work on the user-facing components for new installs. Since that will likely require more bikeshedding, I vote we create a separate ticket to implement the UI for choosing utf8mb4 support during site install, and work to get what's in #85 into core.

adammalone’s picture

Status: Needs review » Reviewed & tested by the community

Works for me. Marking this as RTBC based on #85 and spinning out #2600982: Create UI support for utf8mb4 support for the UI patch and inevitable bikeshedding.

stovak’s picture

As they say in Texas, "muchos grassy ass"

stefan.r’s picture

Status: Reviewed & tested by the community » Needs work
  1. +++ b/includes/database/mysql/install.inc
    @@ -29,5 +36,39 @@ class DatabaseTasks_mysql extends DatabaseTasks {
    +      try {
    +        if (db_query("SHOW CHARSET LIKE 'utf8mb4'")->fetchField() &&
    +        db_query("SHOW COLLATION  LIKE 'utf8mb4%'")->fetchField()) {
    +          $this->pass(st('Database is encoded in utf8mb4'));
    +        }
    +        else {
    +          $this->fail(st("The database software must have both utf8mb4 charset and collation available to it."));
    +        }
    +      }
    

    Could we use the Drupal 8 code here instead?

  2. +++ b/includes/database/mysql/schema.inc
    @@ -131,6 +132,14 @@ class DatabaseSchema_mysql extends DatabaseSchema {
    +    $info = $this->connection->getConnectionOptions();
    +    if (isset($spec['length']) && isset($info['charset'])) {
    +      // If the length of the field is set and the collation is utf8mb4
    +      // length should be 191 and not 255.
    +      if ($spec['length'] >= 192 && $info['charset'] == "utf8mb4") {
    +        $spec['length'] = 191;
    +      }
    +    }
    

    I don't think this will be able to go into a core release. I understand this will only apply to new tables but just silently changing the length of the fields from 255 to 191 is a BC break and will lead to issues with contrib or custom modules as well.

    Per #67 maybe we could check for the innodb_large_prefix option instead (which is now the default in mysql 5.7 GA) and refrain from changing the length of fields?

adammalone’s picture

Thanks for the review @stefan.r. Could I politely ask for part 1 of your review to be cross posted over to #2600982: Create UI support for utf8mb4 support since we're limiting the scope of this issue to what the patch in #85 adds; that is, the backend functionality.

For part 2, if the change only applies to new tables and new Drupal installs does that count as BC breakage? With utf8mb4 collation we do have to limit field sizes due to the extra information that's stored - are you suggesting we make innodb_large_prefix support a prerequisite for utf8mb4 collations?

stefan.r’s picture

Yes that is what I was suggesting, as we can't just have a field that refers to the same thing in two tables that has a length of 255 in one place and 191 in another. And changing max length of all string fields from 255 to 191 is definitely a BC break for both contrib and core. We don't do this in D8 either. For D8 we went through a big effort to make this workable for installs without innodb_large_prefix (see #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols)), but I feel doing the same for D7 is not realistic.

So even if we split out the UI functionality some points will still need to be addressed before this is close to RTBC:

  1. The column length issue.
  2. The current patch allows defining any charset. Probably we should only allow utf8 and utf8mb4.
  3. I don't think we want to mix utf8mb4 tables and utf8 tables on the same install, it should be either all utf8 or all utf8mb4. So a prerequisite to fields being created in another charset (whether utf8 or utf8mb4) will be for all the existing ones to be in that same charset as well. So for existing installs either we include conversion logic or we ask the user to convert manually. Until that happens we either fail or we ignore the charset defined in settings.php.
  4. If we split out the UI support, let's take it out of the current patch as well?
  5. This still needs tests.
darol100’s picture

Issue tags: +Needs tests
geerlingguy’s picture

I hid the patch from #87, addressing point #4

stovak’s picture

I will address Stefan's concerns one at a time: addressing #94.1 "Column Length Issue":

So what this code does is truncate the index for fields > 192 chars when installed AFTER the config for UTF8MB4 has been set. This will not effect the install process because we are not making UTF8MB4 available to the install process in this ticket. This is for newly-created fields and enabled modules and doesn't truncate any currently existing data. So I'm thinking we surround the current check for field size with a check for innodb_large_prefix and if innodb_large_prefix fails, then set the length at 192?

Again, this will not actually CHANGE any currently existing field, but will change newly created fields after the charset and collation has been applied and configured.

$info = $this->connection->getConnectionOptions();

$innodb_large_prefix = strtoupper(
  db_query("SHOW VARIABLES LIKE 'innodb_large_prefix'")->fetchField(1)
);
if (isset($spec['length']) 
          && isset($info['charset']) 
          && $innodb_large_prefix != "ON") {
      // If the length of the field is set and the collation is utf8mb4
      // and innodb_large_prefix is not enabled on the server then
      // length should be 191 and not 255 to prevent error
      if ($spec['length'] >= 192 && $info['charset'] == "utf8mb4") {
        $spec['length'] = 191;
      }
}

Stefan?

stefan.r’s picture

@stovak I'm aware of that, and didn't we go through this earlier in the issue? I thought that after #31/#32 you agreed that truncation of the fields should not be put in core...

IMO if innodb_large_prefix fails we should just not allow utf8mb4 at all as we have no good way to deal with the 191 character problem.

stovak’s picture

@stefan: Probably, I don't remember. This thing has dragged out six months and honestly I'm actually very sorry I wanted to contribute any code to core.

Considering the fact that is developer-facing only how do we "prevent UTF8MB4 from being used"? I say we use my fix above with a drupal_set_message that the field has been truncated and let the developer solve the problem by enabling large_prefix...??

stovak’s picture

@stefan:

$info = $this->connection->getConnectionOptions();
$innodb_large_prefix = strtoupper(db_query("SHOW VARIABLES LIKE 'innodb_large_prefix'")->fetchField(1));
if (isset($spec['length']) && isset($info['charset']) && $innodb_large_prefix != "ON") {
 // If the length of the field is set and the collation is utf8mb4
 // and innodb_large_prefix is not enabled on the server then
 // length should be 191 and not 255 to prevent error.
 if ($spec['length'] >= 192 && $info['charset'] == "utf8mb4") {
    $spec['length'] = 191;
    drupal_set_message(":name has been truncted to 191 characters because utf8mb4 has been enabled.",
          array(":name" => $name)
        );
  }
}
stovak’s picture

Updated documentation:

 * For handling proper UTF8 including emojis, asian symbols, mathematical
 * symbols, you may set the collation and charset to utf8mb4:
 * @code
 * $databases['default']['default'] = array(
 *   'driver' => 'mysql',
 *   'database' => 'databasename',
 *   'username' => 'username',
 *   'password' => 'password',
 *   'host' => 'localhost',
 *   'prefix' => 'main_',
 *   'collation' => 'utf8mb4_general_ci',
 *   'charset' => 'utf8mb4',
 * );
 * @endcode
 *
 * NOTE: When enabling UTF8MB4 it's vital you also enable innodb_large_prefix
 * on your mysql install. Otherwise indexes are limited to 191 characters.
 * In your my.cnf the following entries are suggested:
 *
 * [mysql]
 * default-character-set = utf8mb4
 *
 * [mysqld]
 * innodb_large_prefix=1
 * character_set_server = utf8mb4
 * character-set-client-handshake = FALSE
 * collation_server = utf8mb4_unicode_ci
 * default-character-set = utf8mb4

stefan.r’s picture

This thing has dragged out six months and honestly I'm actually very sorry I wanted to contribute any code to core.

I understand your frustration but we all want to get this into core. This is a hard issue which will need quite a bit of work so you help is very much appreciated!

I think truncating is a bad idea, full stop. The Drupal 7 maintainer already mentioned in #31 that _if_ we truncate, we need to work around all the issues that may cause, and a trigger_error on createField is not nearly enough. Ideally we would do a fatal error on connect if people set the charset to utf8mb4 without setting the innodb_large_prefix setting, but that may have performance implications as we don't want to add an extra query every time on connect.

Also see #74, in order to be able to set innodb_large_prefix there are further prerequisites (and this is only available on MySQL 5.5.14 and up).

nithinkolekar’s picture

Until this hard issue get fixed (related to 191 characters) and committed to core could it be possible to support converting 4byte character to its equivalent html entities? just like https://www.drupal.org/node/2531908#comment-10122828 instead of stripping those characters.

update for future readers: feature implemented by helgi in module https://www.drupal.org/project/unicode.

stefan.r’s picture

#103 this seems like a good idea. Maybe a new filter as a contrib module?

nithinkolekar’s picture

back to original issue...

I gone through complete thread and like to revise few key points based on above comments

1. #2600982: Create UI support for utf8mb4 support should be postponed until this issue get RTBC and committed .

2. make utf8mb4 only applicable/available(fresh install) for MySQL >=5.5.14 (this way we can force user to upgrade the mysql if they really need utf8mb4 ?)

3. If MySQL >=5.5.14 check for innodb_large_prefix fails do nothing. (also applies to #2600982: Create UI support for utf8mb4 support As per the #87's concern this must be checked after credential submission. If innodb_large_prefix is disabled then notify user to enable it or suggest them to stick to utf8)

long live 255 chars :)

joelpittet’s picture

@stovak was testing your script out in #24 but it had some issue with $db_url which looks like D6 variable.

Have you thought of maybe a drush script, it would be bootstrapped already and they could just run drush scr convert.php

joelpittet’s picture

@stovak similar script but for drush, didn't have to touch the individual column it would seem (my test was to throw some emoji's on a node).

https://gist.github.com/14c155902e6b62bc8c8e

I had a pain of a time getting open_files_limit working on my OSX because of the conversion to innodb_file_per_table.

Here's the settings I needed:

# Emoji Help #
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
skip-character-set-client-handshake

innodb_file_format     = Barracuda
innodb_file_format_max = Barracuda
innodb_file_per_table  = 1
innodb_large_prefix    = 1

I also had to do this a few times (locally) but that may be because I was changing buffer pool sizes.
http://dba.stackexchange.com/questions/1261/how-to-safely-change-mysql-i...

# Tweaked for brew's mariadb 
mysql -uroot -proot -e"SET GLOBAL innodb_fast_shutdown = 0"
mysql.server stop
mv /usr/local/var/mysql/ib_logfile[01] /tmp
mysql.server start

mysql.server start doesn't work well on OS X with mariadb... (tones of reboots instead;-) <- lack of emoji)

sebas79’s picture

Hello guys,

I'm following this issue for a couple of weeks now. With the guide of midwesternmac i got it working for my site.

As I also mentioned on Midwesternmac I can only add emoji to the body field and not to the title or comment field. Do you have the same experience?

joelpittet’s picture

Oh I did need to touch the columns, I've updated my script to add some of the stuff @mikeytown2 mentioned in #25

https://gist.github.com/joelpittet/14c155902e6b62bc8c8e

In the process of testing it out some more.

@sebas79 Sorry haven't tried out that table converter module, it looks pretty good maybe @stovak can help with why it didn't work for title and other fields.

infojunkie’s picture

For those interested, we've forked @stovak's module to add drush support: https://github.com/meedan/drupal_utf8mb4_converter. We made a few other fixes to handle different edge cases. The big part that still needs fixing is the truncation of keys.

joelpittet’s picture

@infojunkie I added some large prefix support to the README that I mentioned in #107 as a pull request to @stovak's repo

stovak’s picture

So I love the changes you guys have done with my table converter. Send me a pull request and i'll pull all your changes back into my source. I also like the name change.

What I've been wresting with is what to do if innodb_large_prefix = OFF and Schema decides to install a new table with a 255 char length string field index. Anything more than 191 will throw a sequel error. My current solution was to truncate the field index and send the user a message. Stefan is dead set against it and I understand his point.

I just don't know what to do in this eventuality.

For Reference:

   *   Name of the field.
   * @param $spec
   *   The field specification, as per the schema data structure format.
   */
  protected function createFieldSql($name, $spec) {
    $sql = "`" . $name . "` " . $spec['mysql_type'];

    $info = $this->connection->getConnectionOptions();
    $innodb_large_prefix = strtoupper(
      db_query("SHOW VARIABLES LIKE 'innodb_large_prefix'")->fetchField(1)
    );
    if (isset($spec['length']) && isset($info['charset']) && $innodb_large_prefix != "ON") {
      // If the length of the field is set and the collation is utf8mb4
      // and innodb_large_prefix is not enabled on the server then
      // length should be 191 and not 255 to prevent error.
      if ($spec['length'] >= 192 && $info['charset'] == "utf8mb4") {
        $spec['length'] = 191;
        drupal_set_message(
           t(":name has been truncted to 191 characters because utf8mb4 has been enabled.",
          	array(":name" => $name))
        );
      }
    }
    if (in_array($spec['mysql_type'], array('VARCHAR', 'CHAR', 'TINYTEXT', 'MEDIUMTEXT', 'LONGTEXT', 'TEXT'))) {
      if (isset($spec['length'])) {
        $sql .= '(' . $spec['length'] . ')';
      }
      if (!empty($spec['binary'])) {
        $sql .= ' BINARY';
      }
stefan.r’s picture

What we did in Drupal 8 was introduce varchar_ascii, which does allow for 255 character indexes. I don't think we can go through that exercise for Drupal 7, so if innodb_large_prefix = OFF, just don't use utf8mb4 - either throw an error or fall back to utf8

stovak’s picture

I'm ok with it throwing an exception. I don't think the developer should get unexpected behavior of reverting to utf8.

Anyone else care to chime in?

nithinkolekar’s picture

+1 for 113 and that is what I suggested at #105. Instead of struggling with the old version mysql , just update to version which supports utf8mb4.

stovak’s picture

Look well o, wolves:

   *   Name of the field.
   * @param $spec
   *   The field specification, as per the schema data structure format.
   */
  protected function createFieldSql($name, $spec) {
    $sql = "`" . $name . "` " . $spec['mysql_type'];

    $info = $this->connection->getConnectionOptions();
    $innodb_large_prefix = strtoupper(
      db_query("SHOW VARIABLES LIKE 'innodb_large_prefix'")->fetchField(1)
    );
    if (isset($spec['length']) && isset($info['charset']) && $innodb_large_prefix != "ON") {
      // If the length of the field is set and the collation is utf8mb4
      // and innodb_large_prefix is not enabled on the server then
      // length should be 191 and not 255 to prevent error.
      if ($spec['length'] >= 192 && $info['charset'] == "utf8mb4") {
        throw new Exception(t("If you're going to set the charset to UTF8MB4 you must set innodb_large_prefix=ON in your my.cnf config file."));
      }
    }
    if (in_array($spec['mysql_type'], array('VARCHAR', 'CHAR', 'TINYTEXT', 'MEDIUMTEXT', 'LONGTEXT', 'TEXT'))) {
      if (isset($spec['length'])) {
        $sql .= '(' . $spec['length'] . ')';
      }
      if (!empty($spec['binary'])) {
        $sql .= ' BINARY';
      }
KhaledBlah’s picture

@infojunkie @stovak

I still received the error message:

comment: SQLSTATE[HY000]: General error: 1709 Index column size too large. The maximum column size is 767 bytes.

Turns out that the table's row format must be either ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED as said before by @joelpittet and @stefan.r.

I've forked @infojunkie's repo at https://github.com/khaledblah/drupal_utf8mb4_converter and sent a pull request (https://github.com/meedan/drupal_utf8mb4_converter/pull/1).

KhaledBlah’s picture

So I used the patch from #87 to support utf8mb4 on my system but although I had all the necessary settings it would still fail with the error message: "General error: 1709 Index column size too large. The maximum column size is 767 bytes." I figured this was because when the table was created ROW_FORMAT=DYNAMIC was not added to the query and ROW_FORMAT=COMPACT (the default) just does not work in conjunction with "innodb_large_prefix=1". (see ROW_FORMAT at https://dev.mysql.com/doc/refman/5.6/en/create-table.html).

So I took the patch from #112 (I slightly modified the message) and added the bit "ROW_FORMAT=DYNAMIC" in createTableSql() in case innodb_large_prefix is set to ON. This is in accordance to the mysql documentation and obviously only done if charset is set to "utf8mb4".

infojunkie’s picture

Thanks KhaledBlah. Can you explain the overlap between the pull request at #117 and the patch at #118? Are they meant to work independently, or together? I'm testing the pull request locally, without the patch at #118, and it seems to function correctly for my use case.

stefan.r’s picture

Applied the fix from #116. Leaving needs work as the issues in #94 still need to be addressed.

KhaledBlah’s picture

Hi infojunkie,

the pull request and the patch in #118 are related because in that patch, new fields will be created using a table with the ROW_FORMAT=DYNAMIC setting if innodb_large_prefix is "ON". In my pull request there is no check for the "innodb_large_prefix" setting, however, since I have quite a few fields with rows of type varchar and a length > 191 characters it was necessary to first alter the tables before the rows and second make the sure a table has the appropriate ROW_FORMAT setting. This way the same indexes will be available on the converted tables/rows like before even if they have more than 191 characters (obviously the "innodb_large_prefix" setting is "ON" in my DBMS).

infojunkie’s picture

OK thanks KhaledBlah. Merged your pull request.

helmo’s picture

After some more reading I found a module (strip_utf8mb4) with a quickfix/workaround.
It was mentioned already in the D8 discussion, but not yet in this issue.

+1 for fixing it properly though.

Helgi Andri Jónsson’s picture

Hello everyone,

I made a module that uses entity metadata wrappers to loop over the properties of an entity on hook_entity_presave and it converts all unicode characters to html entities for properties of type text and formatted text. I know it would be great have this solved in core and its not really a solution to strip them from input as that would be counter to the expectation of a user experience.

I hope the module solves a middle ground. I also made sure to include configurable settings to allow us to blacklist or whitelist entities to check for unicode. Therefore we can isolate such input to certain entities like comments or messages.

https://www.drupal.org/project/unicode

Best Regards,
Helgi.

jduhls’s picture

Working with an existing (dev) site/database:

1. I dumped my database and (UPDATED!) replaced CHARSET=utf8 with CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC in the create table statements

2. I upgraded from MySQL 5.1 to 5.6.

3. I updated my mysql configuration using all except "init_connect": http://stackoverflow.com/a/36405914

4. I re-imported my database dump.

5. I updated charset/collation of my drupal database using: https://gist.github.com/jduhls/dd77f7dfea879766d01ee60279ab0c3e

6. I applied #120.

7. I updated my settings.php: http://www.jeffgeerling.com/blogs/jeff-geerling/solving-emoji-problem-dr...

So far so good working with Twitter API results.

DamienMcKenna’s picture

Just to mention it, the Unicode module does indeed fix the problem for Field API, though you have to specifically enable it for the entity types that are affected. So +1 for it as a workaround.

stefan.r’s picture

I will look at this during a Drupal Dev Days next week :)

Aiming to get a simple version of this patch that makes this work with late MySQL 5.6 and 5.7 into Drupal 7.50 (which is due to be released early July at the earliest) :)

barobba’s picture

I am glad this is being addressed for D7 because it is pretty important to me. I am working on implementing a site for a Native American tribe has a 5-byte character (1XXXX). So, I am looking at what to change in the existing site now that I am inserting text, and Drupal (in combination with MySQL) is crashing.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
14.19 KB
11.68 KB

Just posting a first draft of the backport here. The idea is we allow people to set the charset to be 'utf8mb4' in the database connection info array prior to running the installer and then do a hook_requirements() check.

We'll point to https://www.drupal.org/node/2754539 for instructions on how to make this work for existing installations, for which we'll work on a D7 contrib project that checks requirements and converts all tables.

Status: Needs review » Needs work

The last submitted patch, 129: d7-2488180-128.patch, failed testing.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
11.91 KB

Status: Needs review » Needs work

The last submitted patch, 131: d7-2488180-131.patch, failed testing.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
11.97 KB
1.19 KB

Let's see if this fixes the test failure

pwolanin’s picture

+++ b/modules/system/system.install
@@ -518,6 +524,61 @@ function system_requirements($phase) {
+  $utf8mb4_enabled = $connection->utf8mb4IsEnabled();
+  $utf8mb4_supported = $connection->utf8mb4IsSupported();

It seems like these methods need to be implemented in the base DatabaseConnection class?

Does this patch work on postgres or sqlite?

I see now there is a check that the method exists, but still seems like it would be better to have them implemented in the base class or every subclass for consistency?

Fabianx’s picture

Status: Needs review » Needs work
  1. +++ b/includes/database/mysql/database.inc
    @@ -28,6 +28,11 @@ class DatabaseConnection_mysql extends DatabaseConnection {
    +    $charset = $this->utf8mb4IsEnabled() && !(isset($this->connectionOptions['_dsn_utf8_fallback']) && $this->connectionOptions['_dsn_utf8_fallback'] === TRUE) ? 'utf8mb4' : 'utf8';
    

    nit - This line is a _little_ long ... :D

  2. +++ b/includes/install.inc
    @@ -433,7 +433,35 @@ abstract class DatabaseTasks {
    +        if ($e->getCode() == 2019) {
    

    Lets use a constant for that.

    In case this is the same in D8, lets open a child issue to fix that in both versions.

    --

    Nothing checks that this fails due to the charset being set to utf8mb4. It could be set to garbage and this error message is misleading.

Overall the approach is nice, however I can't see why we don't autodetect support for UTF-8 MB and use that automatically?

As that is only for new installations, I don't think it would be a BC-break.

Contrib might totally not support it, so it should indeed be an optional feature - even if the DX to set this up prior to installation is pretty bad.

--

And yes, tests are still much missing. Kill the tests with 'fire'! :D

--

Edit: Do we need to change things in core from varchar to varchar_ascii as well?

We do not as we only support special configurations.

Fabianx’s picture

The technical approach has been approved by a framework manager. (me)

pwolanin’s picture

I think this needs tests for 4-byte utf8, which should pass on all 3 supported SQL backends (needs some hacking around to set the mysql options in the conntection)

stefan.r’s picture

Added tests and addressed #134, #135 :)

Can only run this locally as far as MySQL is concerned, test results attached

stefan.r’s picture

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

This looks very great to me! RTBC

stefan.r’s picture

stefan.r’s picture

stefan.r’s picture

Issue summary: View changes
stefan.r’s picture

Created a contrib project at https://www.drupal.org/project/utf8mb4_convert that we can point to in the release notes

stefan.r’s picture

Issue summary: View changes

Updated installation instructions in https://www.drupal.org/node/2754539

pwolanin’s picture

Status: Reviewed & tested by the community » Needs work

Some minor concerns over code comments and text

  1. +++ b/includes/database/database.inc
    @@ -1313,6 +1313,26 @@ abstract class DatabaseConnection extends PDO {
    +    // Not applicable, only relevant for database systems that have issues with
    +    // supporting full UTF-8.
    +    return FALSE;
    

    This code comment doesn't seem quite right - it should be something like "By default we assume database back-ends may not support 4 byte utf-8"

  2. +++ b/modules/node/node.test
    @@ -2986,3 +2986,39 @@ class NodePageCacheTest extends NodeWebTestCase {
    +    if ($connection->driver() == 'mysql' && !($connection->utf8mb4IsSupported() && $connection->utf8mb4IsEnabled())) {
    

    It would be good to get this to run on the testbot - I'm not clear why it can't now - at the least a follow-up issue for that?

  3. +++ b/modules/system/system.install
    @@ -518,6 +524,54 @@ function system_requirements($phase) {
    +      $requirements['description'] = $t('Multi-byte UTF-8 for @driver is enabled.', array('@driver' => $driver));
    

    This string is not accurate - MySQL always supports up to 3 byte utf-8, so this is a test for 4 byte or maybe "extended multibyte"?

stefan.r’s picture

Status: Needs work » Needs review
FileSize
16.46 KB
3.7 KB

Filed #2756079: Ensure testbot can test 4 byte UTF-8.

I'm going with "4 byte UTF-8" since "extended multibyte" is not very Googleable (only 2 results ;)

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

Back to RTBC

Fabianx’s picture

Issue tags: +Pending Drupal 7 commit

Marking for commit, even though probably David should take a look, too.

pwolanin’s picture

It looks like the default.settings.php examples are much less detailed for Drupal 8? Perhaps those need to be updated too?

Fabianx’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: -Pending Drupal 7 commit

Discussed in IRC:

- We want this to be only enabled if we can be sure that an admin converted all tables first OR is just installing the system

Therefore we will be guarding the DB option with an additional variable that all DB tables have been converted, which is obviously true for new installs by itself.

And warn on hook_requirements() that it was enabled, but tables had not been converted and only reference the contrib project, but not the variable.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
9.42 KB
18.21 KB
Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

Back to RTBC. Already now user input can kill a table and the user warning is loud and clear.

stefan.r’s picture

Assigned: Unassigned » David_Rothstein

Assigning to David for final review

David_Rothstein’s picture

Assigned: David_Rothstein » Unassigned
Status: Reviewed & tested by the community » Needs work

Overall I like the way this is trying to be nonintrusive for existing installations. Here's an initial review; it seems to need work and probably more review and testing from someone else.

  1. +function _system_check_db_utf8mb4_requirements($phase) {
    +  global $install_state;
    +  // Skip the requirements check if we're in the initial step of the installer
    +  // where the settings have not yet been verified.
    +  if ($install_state['settings_verified'] === FALSE) {
    +    return array();
    +  }
    

    Does this ever run in the installer at all? I couldn't get it to (I think the requirements check doesn't ever happen after settings have been verified).

  2. +    if ($utf8mb4_supported) {
    +      // Supported, not active.
    +      $requirements['value'] = $t('Not enabled');
    +      $requirements['description'] = $t('4 byte UTF-8 for @driver is not activated, but it is supported on your system. It is highly recommended you enable this. See <a href="https://www.drupal.org/node/2754539">https://www.drupal.org/node/2754539</a> for more information.', array('@driver' => $driver));
    +      $requirements['severity'] = REQUIREMENT_WARNING;
    +    }
    

    This does not explain why it's a warning condition - what bad thing happens if I don't enable it?

    Also if the only way I can enable it is to edit settings.php by hand, that's going to be a pretty bad user experience (especially if it interrupts the installer).

    Is there a reason we can't just make this a checkbox on the Database Configuration screen in the installer? Then (a) it won't interrupt anything, (b) it should be possible to write it to settings.php automatically (along with all the other database connection info), (c) none of the hook_requirements() code would need to run in the installer at all.

  3. +    $version = $this->getAttribute(\PDO::ATTR_CLIENT_VERSION);
    ....
    +    catch (\Exception $e) {
    ....
    +      catch  (\Exception $e) {
    

    Don't think we can have the leading "\" on any of these (for PHP 5.2).

    Also (minor) there's an extra space in the last one after "catch".

  4. +          $this->fail(t('Your MySQL server and PHP MySQL driver must support utf8mb4 character encoding. Make sure to use a database system that supports this (such as MySQL/MariaDB/Percona 5.5.3 and up), and that the utf8mb4 character set is compiled in. See the <a href="@documentation" target="_blank">MySQL documentation</a> for more information.', array('@documentation' => 'https://dev.mysql.com/doc/refman/5.0/en/cannot-initialize-character-set.html')));
    

    Pretty sure it should use st() rather than t().

  5. +        // Detect utf8mb4 incompatibility.
    +        // Error code for "Can't initialize character set" error.
    +        if ($e->getCode() == self::UNSUPPORTED_CHARSET) {
    +          $this->fail(t('Your MySQL server and PHP MySQL driver must support utf8mb4 character encoding. Make sure to use a database system that supports this (such as MySQL/MariaDB/Percona 5.5.3 and up), and that the utf8mb4 character set is compiled in. See the <a href="@documentation" target="_blank">MySQL documentation</a> for more information.', array('@documentation' => 'https://dev.mysql.com/doc/refman/5.0/en/cannot-initialize-character-set.html')));
    +          $info = Database::getConnectionInfo();
    +          $info_copy = $info;
    +          // Set a flag to fall back to utf8. Note: this flag should only be
    +          // used here and is for internal use only.
    +          $info_copy['default']['_dsn_utf8_fallback'] = TRUE;
    +          // In order to change the Database::$databaseInfo array, we need to
    +          // remove the active connection, then re-add it with the new info.
    +          Database::removeConnection('default');
    +          Database::addConnectionInfo('default', 'default', $info_copy['default']);
    +          // Connect with the new database info, using the utf8 character set so
    +          // that we can run the checkEngineVersion test.
    +          Database::getConnection();
    +          // Revert to the old settings.
    +          Database::removeConnection('default');
    +          Database::addConnectionInfo('default', 'default', $info['default']);
    

    Is all this really just so we can give a better user experience to someone who edits settings.php and puts the wrong character encoding in it? If so, is that even worth it compared to just letting them see the "Can't initialize character set" error that the database gives them?

    Maybe I'm not understanding what this is for.

  6. --- a/includes/database/pgsql/database.inc
    +++ b/includes/database/pgsql/database.inc
    @@ -216,6 +216,20 @@ class DatabaseConnection_pgsql extends DatabaseConnection {
     
         return $id;
       }
    +
    +  /**
    +   * {@inheritdoc}
    +   */
    +  public function utf8mb4IsActive() {
    +    return FALSE;
    +  }
    +
    +  /**
    +   * {@inheritdoc}
    +   */
    +  public function utf8mb4IsSupported() {
    +    return TRUE;
    +  }
    

    Is this really supposed to be always supported but never active?

  7. @@ -851,6 +858,12 @@ function install_verify_settings() {
         if (empty($errors)) {
           return TRUE;
         }
    +    else {
    +      foreach ($errors as $name => $message) {
    +        include('includes/form.inc');
    +        form_set_error($name, $message);
    +      }
    +    }
    

    This doesn't look quite right to me. What will it do on a command-line installation?

  8. +    // On MySQL, this test will only run if 'charset' is set to 'utf8mb4' in
    +    // settings.php.
    +    if ($connection->driver() == 'mysql' && !($connection->utf8mb4IsSupported() && $connection->utf8mb4IsActive())) {
    +      return;
    +    }
    

    I would think the 'mysql' check isn't needed here. Don't we always want to skip these tests (for any database driver) if utf8mb4 isn't turned on?

  9. +    $this->assertTrue(mb_strlen($title, 'utf-8') < strlen($title), 'Title has multi-byte characters.');
    

    Should this be drupal_strlen()? I don't think we require mb_strlen()...

  10. ... <a href="https://www.drupal.org/node/2754539">https://www.drupal.org/node/2754539</a> ....
    

    This should have some link text (maybe "documentation for adding Multi-byte UTF-8 support").

    The page is not anywhere in the drupal.org book structure right now - guess it will need to be moved there eventually. Or should it just become a change notice?

  11. +  /**
    +   * {@inheritdoc}
    +   */
    

    We haven't ever done that in Drupal 7; though I can see the point in starting someday (if it works with api.drupal.org). It's going to lead to inconsistent documentation though.

  12. +    if (FALSE !== strpos($version, 'mysqlnd')) {
    

    Probably put the FALSE on the right side - that's how it tends to be done elsewhere in core.

  13. +
    +    /**
    +   * Check if we can connect to the database.
    +   */
    

    Formatting is wrong.

  14. + * emojis, asian symbols, and mathematical symbols, you may set the collation
    

    "Asian" (capitalized)

  15. + * When using this setting on an existing installation, ensure that
    + * all existing tables have been converted to the utf8mb4 charset,
    + * for example by using the utf8mb4_convert contrib project, so as to prevent
    + * mixing data with different charsets.
    

    "contrib" => "contributed". Also the line spacing looks off on the first couple lines.

    It would be good to link to https://www.drupal.org/project/utf8mb4_convert directly here.

    I think it's OK to reference that contrib project in the core settings.php file, but right now it has an alpha release and doesn't look like many people have tried to use it, so I think the project page should indicate its experimental status somehow. Just so people know what they're getting into... :)

stovak’s picture

Never ask yourselves why more people don't contribute to core. Seriously, this started out as a 5-10 line patch that i've been using in production without issue for over a year. We could have added the patch to core to *support* MB4 for those that needed it and then "bike-shedded" it to fuck and back but whatever.

Good luck with whoever this ends up. i'm out.

-tom

stefan.r’s picture

Status: Needs work » Needs review

Thanks!

1. Absolutely unnecessary indeed! Seems it does run, hence the failing test :)

2. We actually decided to split this out on purpose (validated with xjm, alexpott, pfrenssen) and implement the UI in a follow-up (most likely a checkbox). But yes, very good idea, and the very next thing I will work on as soon as this is committed.

3. If we believe the interruption is too intrusive (maybe it is, but it can also be ignored and clicked away, similarly to the opcache warning in Drupal 8), perhaps we can downgrade the severity level of that specific error during the installer? I believe that would stop the error from showing unless there were other issues. Or alternatively, specifically spell out that it's an expert operation, and tell the user to click continue and ignore the message if it's not a feature they want?

5. Yes, it's backported from D8, to prevent an ugly error from appearing and the installer dying (in preparation for the UI work as well). But for now, it only happens if people actually manually override their settings.php, so the reason for it appearing will probably quite clear (unlike in D8, where utf8mb4 is always on). So for this issue, let's take this all out, and in the UI followup we properly look at putting this back in.

7. Yes, most likely that should be an exception instead. But now that 5 is taken out, we don't need it :)

stefan.r’s picture

stefan.r’s picture

@stovak good to hear the patch worked in production -- the current one is functionally very similar.

Status: Needs review » Needs work

The last submitted patch, 158: 2488180-156.patch, failed testing.

Fabianx’s picture

#156: I can understand your frustration, but we are working to improve the core process.

On the other hand we do have a responsibility and we try to prevent users from shooting themselves in the foot or loosing data due to bugs.

Also we try to prevent 100s of support requests from happening in the first place.

If someone applies the patch, they are on their own. But if someone just reads in a blogpost that for their emoji, they "just" need to change one line in settings.php, then from experience we get to deal with all the different bug cases: MySQL version not supported, Tables just partially converted, etc.

Here you just have 3 core maintainers discussing the last tweaks on this patch. I am pretty sure it will be part of 7.50.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
14.42 KB
597 bytes

So we're going to have to put the code from point 1 back, seems it runs after all :)

David_Rothstein’s picture

2. We actually decided to split this out on purpose (validated with xjm, alexpott, pfrenssen) and implement the UI in a follow-up (most likely a checkbox)...

3. If we believe the interruption is too intrusive (maybe it is, but it can also be ignored and clicked away, similarly to the opcache warning in Drupal 8), perhaps we can downgrade the severity level of that specific error during the installer? I believe that would stop the error from showing unless there were other issues. Or alternatively, specifically spell out that it's an expert operation, and tell the user to click continue and ignore the message if it's not a feature they want?

I would say #2 helps answer #3. Postponing the UI to a followup definitely makes sense to me, but that should mean actually postponing it :) A warning that interrupts the installer is a pretty big UI change.

I think if you want to include a REQUIREMENT_INFO that informs people of the new feature and links to the documentation page, that's a small UI change which is probably reasonable. (And so is a REQUIREMENT_ERROR if they tried to configure it in settings.php but failed to do it correctly.) But I don't see the reasons for putting in any warnings at this point.

So we're going to have to put the code from point 1 back, seems it runs after all :)

Sorry, my earlier comment about this was a little unclear. What I meant was that this code:

  if ($install_state['settings_verified'] === FALSE) {
    return array();
  }

will always return when called during the requirements phase of the installer; any code that runs after it will never be reached.

Except actually I wasn't 100% right... During a normal install with a fresh settings.php I think I was right. But if settings.php has been preconfigured (by hand) with database connection info before running the installer, then your code does work OK.

I am not sure if you intended it to only run in the second case (if so, I'd suggest a code comment), but that seems to be what it's currently doing.

Also, this should probably check isset($install_state['settings_verified']) beforehand since outside of the installer $install_state itself will always be empty?

stefan.r’s picture

Does this seem right? (the warning on enabled was a debugging artifact, slipped in in the previous patch)

stefan.r’s picture

David_Rothstein’s picture

Yes, taking a quick look that looks right to me now - thanks!

travelvc’s picture

Hi all thanks very much for this work on this, including stovak and stefan.r

I successfully applied patch #165 and ran stefan.r's drush tool 'utf8mb4_convert' and it works great. I have about 20,000 users and over 2,000 nodes.

I did find it worthwhile to check the character sets of my tables after running the script with 'show table status from DATABASE_NAME' which identified a few module tables which didn't get converted (such as search_api_db tables) due to what I assume are key lengths.

As a general comment - with the advent of more mobile users with emoji keyboards, I'm seeing many more of our users enter this kind of data into fields. So this update is much appreciated and I'm looking forward to it being officially released so I can move this to my project's master branch.

stefan.r’s picture

@travelvc please file an issue against the contrib project issue queue with further information -- I am assuming it skipped those tables as only Drupal-managed tables (returned by drupal_get_schema()) are converted

scuba_fly’s picture

I successfully applied patch #165 as well.
Also changed my db settings using http://www.jeffgeerling.com/blogs/jeff-geerling/solving-emoji-problem-drupal-7 Thnx @geerlingguy!

I can now successfully insert smileys into the field_body

Fabianx’s picture

Assigned: Unassigned » David_Rothstein
Status: Needs review » Reviewed & tested by the community

Back to RTBC, changes look great to me.

I think David should commit this.

David_Rothstein’s picture

Title: Drupal 7 MySQL does not support full UTF-8 (emojis, asian symbols, mathematical symbols) » Support full UTF-8 (emojis, Asian symbols, mathematical symbols) on MySQL and other database drivers when they are configured to allow it
Issue tags: -backport to D7 +String change in 7.50
FileSize
14.58 KB
6.89 KB

I reviewed this again and did some testing. (I especially tested the status report page since the changes there had less testing than the others.) Good to see that the main functionality has had some nice testing above as well.

I had to make a few changes to the patch (see attached). I'm leaving this RTBC because we really do want to get it in today and the changes aren't huge, but a couple are notable:

  1. The biggest issue is that it wasn't working at all on SQLite because the code was in the wrong place. (I don't think anyone tested the patch there....) Fixed that, and based on manual testing it does now seem like SQLite and PostgreSQL work with this out of the box.
  2. Changed "It is highly recommended you enable this" to "It is recommended that you enable this" in the status report message. (Since this is only a REQUIREMENT_INFO, and since the process for enabling this on an existing site is still pretty experimental, and since many sites don't need this at all... I thought "highly recommended" was definitely too strong.)
  3. Changed "databases need conversion" and similar to "database tables need conversion"... pretty sure that's what it's supposed to be.
  4. Pulled the documentation URL out of the translatable strings (helpful if we ever need to change the documentation URL someday).
  5. Changed this:
    - * utf8mb4_convert contributed project at
    - * @link https://www.drupal.org/project/utf8mb4_convert @endlink, so as to
    - * prevent mixing data with different charsets.
    + * utf8mb4_convert contributed project available at
    + * https://www.drupal.org/project/utf8mb4_convert, so as to prevent mixing data
    + * with different charsets.
    

    I think we don't want @link in settings.php documentation; it's not used anywhere else and the file is supposed to be "human-readable".

  6. A couple other minor/inconsequential changes.

For this:

+  $driver = $connection->driver();

I thought about whether we want to display the human-readable driver name rather than the machine name here... for the most part we would, but "4 byte UTF-8 for MySQL, MariaDB, or equivalent is disabled" looked too weird when I tried it, so I left it as is (with the machine name) :)

Thanks everyone. Hoping to be able to commit this later today.

David_Rothstein’s picture

Issue tags: +Needs change record

We need a change record here. Mostly it can just point people to the documentation page (https://www.drupal.org/node/2754539) I think.

I edited that page now to move it into the drupal.org book structure, as discussed above. Wasn't entirely sure where it belonged, but I put it under the "Troubleshooting" section for now since people will tend to get to it via a message they see on the Status Report page.

Ideally that documentation page should be edited to say something brief about other database drivers too, not just MySQL? (Since to start with, anyone using a non-core driver is going to get a message on the status report page pointing them to this documentation....)

There may be some followups to do (for drupal.org documentation or maybe INSTALL.txt) to point people to this documentation from there as well.

David_Rothstein’s picture

We need a change record here. Mostly it can just point people to the documentation page (https://www.drupal.org/node/2754539) I think.

Actually it should also address the new class methods that database drivers in contrib can implement if they want to support this.

stefan.r’s picture

Those changes look great, thanks. Ha at the SQLite mess-up ;)

Added a bit about SQLite/PostgreSQL in the book page.

Fabianx’s picture

RTBC for the interdiff in #171 as well.

David_Rothstein’s picture

Assigned: David_Rothstein » Unassigned
Status: Reviewed & tested by the community » Fixed
Issue tags: -Needs change record +7.50 release notes

Clicking through to the test results, the new tests added in this issue pass on all environments (there's a failure on the SQLite tests but it's due to a failure to drop tables at the end of the test or something like that - happens for the other tests too and not related to this issue). No obvious regressions either, so this is good to go.

Committed to 7.x - thanks!

I wrote a change record here (review needed):
https://www.drupal.org/node/2761183

I also updated the documentation a bit more regarding other database drivers:
https://www.drupal.org/node/2754539/revisions/view/9847329/9848305

Reviews welcome of that too (especially what I wrote about MariaDB/etc, which I think is correct but am not positive).

  • David_Rothstein committed bc60c92 on 7.x
    Issue #2488180 by stefan.r, stovak, pwolanin, David_Rothstein, Noe_,...
hgoto’s picture

Great!! I'm looking forward to getting this change in my sites. Thank you all who addressed this issue.

David_Rothstein’s picture

For some followups noted above:

Hopefully that covers everything.

ptmkenny’s picture

I'm really glad to see this in core, thanks! For the past year, I was using a patch in this thread to add emoji support, although the linked patch only affects textfields.

I see that the first post now says that however when using this setting on an existing installation, all existing tables must be converted to the utf8mb4 charset first., but with the old patch I was able to only convert the textfields to utf8mb4 and leave everything else as utf8. Is there really no way to use the utf8mb4 collation in 7.50 without converting all tables?

I ask because there is no easy way to run the utf8mb4_convert drush utility on my server, which means I will have to convert the database manually, which may be a bit beyond my skill level.

cilefen’s picture

cilefen’s picture

jibran’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

geek-merlin’s picture

vpsaravanan’s picture

utf8mb4_convert requires drupal version 7.50. We have to convert character set and collation in all tables or we can convert the affected tables. provide detailed explaination for sites running lesser than 7.50 version

This variable drupal_all_databases_are_utf8mb4 is only set while running installer. This variable is not set for the existing sites. As this variable is not set, it throws requirement error while running update.php in status report page