Last updated August 2, 2016. Created on June 23, 2016.
Edited by David_Rothstein, jasom, hussainweb, stefan.r. Log in to edit this page.

Drupal 7.50 introduces multi-byte UTF-8 support for MySQL and other database drivers, allowing for emojis, Asian symbols, mathematical symbols, etc.

For MySQL

In order to be able to use this, the following requirements must be met:

  1. In order to allow for large indexes, MySQL must be set up with the following my.cnf settings:
    [mysqld]
    innodb_large_prefix=true
    innodb_file_format=barracuda
    innodb_file_per_table=true

    These settings are available as of MySQL 5.5.14, and are defaults in MySQL 5.7.7 and up.

    After editing the my.cnf file (often located at /etc/mysql/my.cnf), don't forget to restart the mysql service (for example, with the command sudo service mysql restart) in order to load the new configuration.

  2. The PHP MySQL driver must support the utf8mb4 charset (libmysqlclient 5.5.3 and up, as well as mysqlnd 5.0.9 and up).
  3. The MySQL server must support the utf8mb4 charset (5.5.3 and up).

Run the command mysql --version to see your current MySQL version.

Steps to enable

For existing Drupal installations

  • First, back up your database and convert all existing tables to utf8mb4, such as by using the drush command provided by the utf8mb4_convert contrib project.
  • Set the "charset" and "collation" keys on the database connection array in settings.php to "utf8mb4" and "utf8mb4_general_ci" respectively:
    $databases['default']['default'] = array(
      'driver' => 'mysql',
      'database' => 'databasename',
      'username' => 'username',
      'password' => 'password',
      'host' => 'localhost',
      'charset' => 'utf8mb4',
      'collation' => 'utf8mb4_general_ci',
    );
    

For new Drupal installations

  • Prior to running the installer, set the "charset" and "collation" keys on the database connection array in settings.php to "utf8mb4" and "utf8mb4_general_ci" respectively:
    $databases['default']['default'] = array(
      'driver' => 'mysql',
      'database' => 'databasename',
      'username' => 'username',
      'password' => 'password',
      'host' => 'localhost',
      'charset' => 'utf8mb4',
      'collation' => 'utf8mb4_general_ci',
    );
    
  • Run the installer as usual.

Other database systems

  • PostgreSQL and SQLite support 4-byte UTF-8 out-of-the-box, so no special changes are needed for these.
  • MariaDB and other MySQL equivalents should likely work with similar instructions as above for MySQL.
  • Database drivers provided by contributed modules may or may not support this feature. The best place to look for answers is the issue queue of the contributed module which provides the database driver you are using.

Looking for support? Visit the Drupal.org forums, or join #drupal-support in IRC.

Comments

sah62’s picture

I just completed the changes for an existing Drupal installation. I ran into two issues:

  1. One of my databases includes a table prefix specified in settings.php. The utf8mb4_convert drush command didn't account for that correctly and I needed to comment out line #161* to stop the prefix from being appended where it didn't need to be. Commenting out the prefix setting in settings.php might have also worked; I didn't try that.
  2. This database was originally set up for use with Drupal 6.x and some of the tables were still using the MyISAM storage engine. utf8mb4_convert produced key length errors when attempting to convert those tables. I needed to convert those tables to InnoDB before running utf8mb4_convert.

* Line 161:
$table_name = Database::getConnection()->prefixTables('{' . db_escape_table($table_name) . '}');

stefan.r’s picture

@sah can you check whether 7.x-1.0-alpha4 (or up) fixes these issues? Thanks!

sah62’s picture

I'll try if I have time either today or over the weekend. Since I converted all of my tables to InnoDB it won't be easy to see if that fix works, but the double prefix thing will be easy to check.

Thanks for taking the time to write and share the code!

geerlingguy’s picture

Also, be sure to post issues for that module in particular to it's issue queue—stefan.r has been extremely helpful and quick to fix any problems so far, and it's been a very smooth upgrade experience for me!

__________________
Personal site: www.jeffgeerling.com

brad.bulger’s picture

what if we don't want to enable 4 byte UTF-8 for mysql? how can i stop it from complaining at every turn?

sonicthoughts’s picture

There is no need to warn if this is not enabled. This takes up resources if multibyte not being used.

stefan.r’s picture

jaxtheking’s picture

I followed the instructions to the letter but the Status Report page still displayed that UTF8MB4 was not supported.

While all my DB tables were set to InnoDB, I realised the default storage engine in cPanel (even one installed a month ago with PHP-FPM) is still set to MyISAM. This caused the test table creation to fail.

I of course changed the default storage engine to InnoDB now but I wonder... should we not specify ENGINE=InnoDB when creating the test table to see if UTF8MB4 is supported?

David_Rothstein’s picture

andrewmriv’s picture

Perhaps I am missing a step. I think that I have followed all of the steps above but now get the following message on the status update page:

4 byte UTF-8 for mysql is not activated, but it is supported on your system. It is recommended that you enable this to allow 4-byte UTF-8 input such as emojis, Asian symbols and mathematical symbols to be stored correctly. See the documentation on adding 4 byte UTF-8 support for more information.

I am using Ubuntu 14, PHP 7, MySQL version 14.14 (Distrib 5.5.49)

Nico Heulsen’s picture

+1 We are getting the same error on multiple environments.

stefan.r’s picture

You'll want to set this on all relevant databases:

  'charset' => 'utf8mb4',
  'collation' => 'utf8mb4_general_ci',
andrewmriv’s picture

My mistake. I have several Drupal sites running and I was simply accidentally modifying the settings.php on a totally different drupal install!

Everything works just fine on Ubuntu 14

ljwilson’s picture

After converting a dev box with Windows/Apache 2.2/PHP 5.6/MariaDB 10.1 with no issues, I next tested on a dev Linux box running RHEL 6.8/Apache 2.2/PHP 5.3/MariaDB 10.0. All my settings in /etc/my.cnf were taking hold, but I could not get
drush utf8mb4-convert-databases
to work:

$ drush utf8mb4-convert-databases
This will convert all databases defined in settings.php to utf8mb4. Back up your databases before continuing! Continue? (y/n): y
Target MySQL database: mydatabase@localhost (default:default)
The default:default MySQL database does not support UTF8MB4! Ensure that the conditions listed in settings.php related to innodb_large_prefix, the server version, and the MySQL driver version are met. See https://www.drupal.org/node/2754539 for more information.

From googling it looked like if I could use mysqlnd instead of libmysqlclient that is used by the version of PHP 5.3 on RHEL 6 then I could make it work, so I upgraded to the next RHEL supported PHP version (5.4) that had mysqlnd built-in. My Drupal site's status page was happy with that, but drush (even though I had bash set to use php 5.4) was still trying to use the original PHP 5.3, and its libmysqlclient.

So the fix was to make sure to set (e.g.)
export DRUSH_PHP=/opt/rh/php54/root/usr/bin/php

which is where the RedHat Software Collection puts the PHP 5.4 version executable.

From what I've read I believe the libmysqlclient driver version installed my MariaDB 10.x is ok for using utf8mb4, but drush (when using the default php 5.3 that comes with RHEL 6) returns this version (which I assume the client gets from the libmysqlclient)
5.3.12-MariaDB

which fails the check in includes/database/mysql/database.inc

 // The libmysqlclient driver supports utf8mb4 starting at version 5.5.3.
      if (version_compare($version, '5.5.3', '<')) {
        return FALSE;
      }

Even though Drupal Site Status page returns a server version of
5.5.5-10.1.16-MariaDB

...jack

amar.deokar’s picture

There is bit unclarity over what exactly we need to put in settings.php. So for those who have problem in installing drupal 7 with "utf8mb4" support, here are the steps.

1. Create database with default collation utf8mb4_general_ci from mysql prompt.
mysql> CREATE DATABASE databasename DEFAULT COLLATE utf8mb4_general_ci;

2. Fill database connection array with your values in settings.php before running install.php. (You need to set value for all keys, not just value for 'charset' & 'collation' keys.)

$databases['default']['default'] = array(
  'driver' => 'mysql',
  'database' => 'databasename',
  'username' => 'username',
  'password' => 'password',
  'host' => 'localhost',
  'charset' => 'utf8mb4',
  'collation' => 'utf8mb4_general_ci',
);

3 . Run the installer(install.php) in browser.
http://localhost/drupal/install.php

haagendazs’s picture

Thanks for clarifying these steps for a new D7 installation. The primary documentation should be enhanced with your description, as it is slightly misleading.

DrCuriosity’s picture

...a time saver to get rid of the "Enabled, but database tables need conversion" warning on your status page:

drush ev "variable_set('drupal_all_databases_are_utf8mb4',TRUE)"

Only do this if your databases are already converted prior to upgrading to Drupal 7.50+.

[ETA: drupal.org doesn't seem to have utf8mb4 support yet, so don't be surprised if you get a server error if your comment contains an extended emoji character.]

andrewmriv’s picture

I had recent success on CentOS 6, PHP 5.6, MySQL 14.14.

After making the changes to the my.cnf file listed above, running "drush utf8mb4-convert-databases" failed.
Additionally, I was told on the status report that the system did NOT support UTF8MB4.

What fixed it for me on CentOS was removing the following line from my.cnf:
default-storage-engine=MyISAM

After that, everything worked out.

I was able to run "drush utf8mb4-convert-databases" after downloading:
https://www.drupal.org/project/utf8mb4_convert

And making the changes to the settings.php file were a success as well.

sgriffin’s picture

This post is just for google results.. and a thanks for your solution!

Yes, drush complained until I commented out
default-storage-engine=MyISAM
in /etc/my.cnf

This will convert all databases defined in settings.php to utf8mb4.

Back up your databases before continuing! Continue? (y/n): y
Target MySQL database: (default:default)
The default:default MySQL database does not support UTF8MB4!
Ensure that the conditions listed in settings.php related to innodb_large_prefix, the serve r version, and the MySQL driver version are met. See https://www.drupal.org/node/2754539 for more information.

DrCuriosity’s picture

The utf8*_general_ci collations don't incorporate the Unicode sorting rules for extended characters. This Stack Overflow link has a good rundown of the issue.

If your site is ever likely to use more than a standard Latin alphabet, I'd consider using the utf8mb4_unicode_ci collation instead of the general one.

ressa’s picture

I just tested utf8mb4_unicode_ci versus utf8_danish_ci.

# utf8mb4_unicode_ci doesn't sort "å" correctly

ALTER DATABASE COLLATE utf8mb4_unicode_ci;
ALTER TABLE node_field_data CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Order:

  • a
  • å
  • d
  • Ð

# utf8_danish_ci sorts "å" correctly

ALTER DATABASE COLLATE utf8_danish_ci;
ALTER TABLE node_field_data CONVERT TO CHARACTER SET utf8 COLLATE utf8_danish_ci;

Order:

  • a
  • d
  • Ð
  • å

Try Drupal 8 at simplytest.me
DrCuriosity’s picture

This is true. If you have a specific language case, then you should definitely use the collation appropriate to that language instead. (Though I'd still recommend trying utf8mb4_danish_ci instead of the non-mb4 version.)

The issue with the utf8mb4_unicode_ci collation is that, while it can collect similar symbols like 'a', 'á' and 'å' together for sorting, there's no definitive, canonical ordering that's going to work for the alphabets of all languages. The best bet is to have a read through §11.1 of the MySQL documentation, and choose the collation that's best for your use case.

ressa’s picture

Thank you very much for that tip @DrCuriosity. I tested with utf8mb4_danish_ci and it sorts correctly after this command: ALTER TABLE node_field_data CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_danish_ci;

I tried setting the default collation in MySQL by inserting the below in /etc/mysql/my.cnf and restarting MySQL:

[mysqld]
# UTF8 character-set
init_connect = 'SET collation_connection = utf8mb4_danish_ci'
init_connect = 'SET NAMES utf8mb4'
character-set-server = utf8mb4
collation-server = utf8mb4_danish_ci
skip-character-set-client-handshake

It works fine if I create a database from command line, using the default settings. But if I install Drupal with the "drush si" command, the collation is set to utf8_general_ci ...

# Default MySQL collation

mysql> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8mb4                  | utf8mb4_danish_ci    |
+--------------------------+----------------------+
1 row in set (0.00 sec)

# Collation is set correctly, using the default utf8mb4_danish_ci

mysql> create database danish_test;
mysql> use danish_test;
mysql> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8mb4                  | utf8mb4_danish_ci    |
+--------------------------+----------------------+
1 row in set (0.00 sec)

# Drupal 8 database, created with drush site-install

mysql> use drupal_installation_test_danish;
mysql> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8                     | utf8_general_ci      |
+--------------------------+----------------------+
1 row in set (0.00 sec)

So it seems like Drupal somehow overrides the default collation ...

MySQL variables:

mysql> show variables like 'collation_database';
+--------------------+-------------------+
| Variable_name      | Value             |
+--------------------+-------------------+
| collation_database | utf8mb4_danish_ci |
+--------------------+-------------------+
1 row in set (0.00 sec)

mysql> show variables like '%character%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8mb4                    |
| character_set_connection | utf8mb4                    |
| character_set_database   | utf8mb4                    |
| character_set_filesystem | binary                     |
| character_set_results    | utf8mb4                    |
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

mysql> show variables like '%collation%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | utf8mb4_danish_ci |
| collation_database   | utf8mb4_danish_ci |
| collation_server     | utf8mb4_danish_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)

EDIT: I tried the suggestion in For clarity on new drupal installations with "utf8mb4" support inserting the collation in settings.php, and that worked, so it seems like drush site-install is causing the problem.
Perhaps it will be possible to define database collation with drush site-install as a parameter at some point?


Try Drupal 8 at simplytest.me
metakel’s picture

I am not sure why but my conversion using the utf8mb4_convert module failed.

Is there a way to convert the dumped sql files directly to the new utf8mb4 required?

DrCuriosity’s picture

Is there a way to convert the dumped sql files directly to the new utf8mb4 required?

You could potentially look through your SQL dump and hand edit a variety of charset, collation and key length values manually, but if you're not sure why the conversion failed then it's probably not a good idea to do that unless you absolutely have to and know what you're doing. (And even then, be very sure you have made a backup.)

Londova’s picture

The file "my.cnf" is not available for changes on shared Hosting environment (for example on Godaddy Linux with CPanel).
Is it possible to get a Step by Step guidance how to make the settings on shared Hosting? Thanks.

andrewmriv’s picture

Truthfully, you are best of creating a ticket to contact the system administrator about that.

Depending on your host, it may not be guaranteed that they will do it for you.

Perhaps consider a VPS host in the future.

Good luck!

libbyy’s picture

Hi
How do I convert my database to utf8mb4 without drush?

dodge’s picture

I was able to use this module:

https://github.com/meedan/drupal_utf8mb4_converter

***Edit: Actually forget that, it doesn't seem to be working to convert the actual columns within each table. Maybe you'll have more luck than I am with it though...

stefan.r’s picture

Using https://github.com/meedan/drupal_utf8mb4_converter is not recommended/supported, it cuts off fields at 191 characters which can lead to unexpected behavior. Instead, use https://www.drupal.org/project/utf8mb4_convert, which is more widely tested. If you cannot use drush, invoke the relevant functions from a separate PHP file.

dodge’s picture

I would love to, but those of us (myself included) who don't use Drush may not know how to do that.

There are several references to drush functions in the php code. Presumably "drush_print" and "drush_confirm" can simply be replaced with the usual php print and confirm functions?

There is also:

$charset = drush_get_option('charset', 'utf8mb4');

I've changed this to:

$charset = 'utf8mb4';

And the same for the collation option.

I saved this script on my server and then ran the "drush_utf8mb4_convert_databases()" function and I get "Please install Drupal 7.50 or above prior to running this script."

I'm using Drupal 7.50.

stefan.r’s picture

Try putting your script in the drupal root directory and prefacing the function call with:

define('DRUPAL_ROOT', getcwd());

require_once DRUPAL_ROOT . '/includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
dodge’s picture

Great thank you, that has worked.

abdulvahab232’s picture

Hi team,

I just finished the following steps in 10 min

https://www.drupal.org/project/utf8mb4_convert

Thank you https://www.drupal.org/u/stefanr-0

Kavana’s picture

for this non-drush user and PHP beginner. I am still confused on how to do this.

dodge’s picture

Here's what I did step by step, hopefully it will help!

1. Backup your database

2. Download and extract the project from https://www.drupal.org/project/utf8mb4_convert

3. Do the following in the utf8mb4_convert.drush.inc file:
- Replace all instances of "drush_print" with "print"
- Replace all instances of "drush_confirm" with "confirm"
- In "function drush_utf8mb4_convert_databases()", replace "$charset = drush_get_option('charset', 'utf8mb4');" with "$charset = 'utf8mb4';"
- In the same function, replace "$collation = drush_get_option('collation', 'utf8mb4_general_ci');" with "$collation = 'utf8mb4_general_ci';"
- Save the file

4. Place the edited file in the root drupal directory on your server (/usr/share/nginx/html for me)

5. Create another php script in the same directory, e.g. "nano script.php"

6. Add the following to the file:

<?php
define('DRUPAL_ROOT', getcwd());
require_once DRUPAL_ROOT . '/includes/bootstrap.inc';
require_once('utf8mb4_convert.drush.inc');
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
drush_utf8mb4_convert_databases();

7. Save and run your "script.php" script

8. Follow the rest of the guidelines at the top of this page

Kavana’s picture

Error: Call to undefined function confirm() in drush_utf8mb4_convert_databases() (regel 226 van C:\xampp\htdocs\lsd\utf8mb4_convert.drush.inc).
And I'm not sure why.

Thank you so much for writing it down for me tho.

dodge’s picture

Strange.... you can probably just comment out that section in the code though as it's just a confirmation dialogue asking if you're sure you want to proceed.

Kavana’s picture

It worked like that. Thank you very much!

gisle’s picture

When I visit the status report on my site (Home » Administration » Reports » Status Report), I get the following: "4 byte UTF-8 for mysql is not activated, but it is supported on your system."

However, I've activated this (following instructions on this page - i.e. changing my.cnf and uncommenting the code block that sets the charset and collation in settings.php and restarting mariadb). When I look at the collation for the tables in phpMyAdmin, they are utf8mb4_general_ci (see screendump).

Screendump

The server runs RHEL7 with PHP 5.4.16. MariaDB and libmysqlclient is ver. 5.5.52. Drupal is ver. 7.53.

What can be wrong with my setup?

- gisle

gisle’s picture

The instructions in the summary tells you:

Set the "charset" and "collation" keys on the database connection array in settings.php to "utf8mb4" and "utf8mb4_general_ci" respectively:

This is followed by some example code. You find the exact same example code in the default settings.php that is created when you install Drupal 7. My initial take on this was to uncomment the example. This does not work because there is already code in settings.php to initialize the database settings array, and this existing code takes precedence.

However, if you edit settings.php and add the "charset" and "collation" keys with the right values to the existing database connection array, it works as advertised.

This is how the working configuration looks in my settings.php.

$databases = array (
  'default' => 
  array (
    'default' => 
    array (
      'database' => 'MY_DB',
      'username' => 'MY_UN',
      'password' => 'MY_PW',
      'host' => 'localhost',
      'port' => '',
      'driver' => 'mysql',
      'prefix' => '',
      'charset' => 'utf8mb4',
      'collation' => 'utf8mb4_general_ci',
    ),
  ),
);

- gisle

gisle’s picture

I like to use drush site-install to script setting up new sites. How do I get the correct values for "charset" and "collation" into settings.php when scripting installs of new sites?

- gisle