Last updated January 18, 2011. Created on December 5, 2007.
Edited by jm9, LeeHunter, JirkaRybka. Log in to edit this page.

The MySQL version 4.1 introduced some changes to character set handling, which are often giving us trouble if moving a Drupal site from one to the other (i.e. from 4.0 or below, to 4.1 or above). Basically, Drupal always uses utf-8 encoding, but MySQL 4.0 and lower doesn't support it, so utf-8 data are treated as default Latin 1 inside the MySQL 4.0 engine. It doesn't matter there, as MySQL output echoes the same data as sent in still, so Drupal works fine, but it matters a lot in the migration process, where MySQL might attempt to "convert" the data from Latin 1 to utf-8, not knowing that we have utf-8 in there already, and so the site gets broken then.

There are a few typical symptoms, and typical scenarios, (all discussed later in more detail):

- "Garbled text": Any special characters (non-english, advanced punctuation and the like) in your site's contents turned into seemingly random pairs of weird characters after moving the site to 4.1. (Typically, "ÅÃÄ" are frequently seen, always followed by some other non-english character).

- "Question marks": Your contents is more or less fine, but some special characters (non-english, advanced punctuation and the like) are replaced by question-marks after the move to 4.1.

- "Illegal mix of collations": Your site worked after the move to 4.1 somehow, but later crashed badly (typically after running update.php on the next upgrade, installing a new module, or the like). Missing sidebars, virtually any part of the pages vanished, no way to access even administrative pages - all that is possible. Looking into the logs (manually in the 'watchdog' table in worst case), you'll see error messages "Illegal mix of collations".

IMPORTANT: All these symptoms mean, that your database is NOT healthy after the move. Don't try to run your site in such a state: Any new data stored to the database are only making things worse! You need to fix your database during the migration process, or immediately after, before going live again! Keep backups! If you did a wrong step, you'll most probably need to restore your database from backup, and try again; it's usually easier and more safe, than fixing broken data afterwards. If you don't have any backup, create one now!

Note that basic English characters are the same across all encodings, so your English contents (as well as Drupal's internal data) is not affected, and you might not even notice that there's a problem, if running English site. But still, some things may be broken.


If you (or your hosting provider) just upgraded MySQL to 4.1+, without really moving the data to another machine or another database, you've correct utf-8 data already in your database, but most probably you've wrong encoding information (for details, see (B) in the table below). You see the "Garbled text" problem, and your tables are "latin1_swedish_ci" in the database.

You have two options:

A. Alter encoding/collation directly in database. This must be done in a tricky way (changing columns to binary types and back), to avoid real conversion of stored data. The operation, however simple for the server, is not so easy to do manually, so a specialized helper-script was developed for this purpose.

Just grab the file mysqlfix.php from the sandbox, put it into your Drupal root directory (the one where update.php is), and point your browser there (i.e. something like Then just follow on-screen instructions. The script will offer you a way to choose tables (both Drupal and/or other tables in your database), and process them (including large databases in batch processing). It's compatible with Drupal versions 4.7.x, 5.x and 6.x, supports prefixed tables, and uses the same access checks as update.php does. Note that if you're not logged in as user 1 yet, it might be safer to edit the access check flag in file, rather than attempting to browse the broken site to log in (any data written to the broken database is a risky business).

Although this script is not really dangerous (theoretically it doesn't hurt to process a table twice), you should always take the common safety precautions while doing this sort of work:
- Before starting, always create a backup copy of your database (even if you're backing up a broken site)
- Give appropriate attention to security: If accessing the script with access check disabled setting, restore the original setting immediately after use.
- Remove any single-use helper scripts (like this one) from your server, if no more needed.
- Use on your own risk.

B. Export and re-import. This option is more complicated and resources-expensive for the server, and depends on your permissions on the system, but still it's an alternative solution (especially if you can't run the helper script above for some reason, and/or you're really familiar with command line). It's basically the same process, as migration between two different servers:

First, export your database (from the newly upgraded 4.1 server) in backwards-compatible mode: mysqldump --user=username --password=password --default-character-set=latin1 --compatible=mysql40 dbname > dump.sql
Then, empty your database, by removing all the tables, and change it's default encoding/collation to "utf8_general_ci": mysql --user=username --password=password --execute="DROP TABLE table_name, table_name...... ; ALTER DATABASE db_name DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;" Since specifying all the tables may be annoying, better create a brand new database instead (if you've permissions): mysql --user=username --password=password --execute="DROP DATABASE dbname; CREATE DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci;" Both these ways should result in an empty database with default encoding/collation "utf8_general_ci".
Finally, import the file back into the database: mysql --user=username --password=password --default-character-set=utf8 dbname < dump.sql

Now, your database should be perfectly OK, "utf8_general_ci" set on all tables, and Drupal working fine. You're done.


If you're migrating to a different server (which may as well be localhost for testing, or the like), you need to export the data from the old 4.0 database, and import to 4.1 along with correct utf-8 encoding information, so that the database engine understand the data correctly and doesn't do some unwanted "conversion". This is important; if you imported your file in a wrong way, your data got broken, and you'll probably need to try again from scratch, rather than fixing the broken data (which is always a bit unsafe).

On the old 4.0 server, export your database:
mysqldump --user=username --password=password --default-character-set=latin1 dbname > dump.sql

On the new 4.1 server, create the new database with "utf8_general_ci":
mysql --user=username --password=password --execute="CREATE DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci;"
OR alternatively alter your existing database:
mysql --user=username --password=password --execute="ALTER DATABASE db_name DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;"

If the 4.1 database already was "utf8_general_ci", you may proceed happily to the next step; you needn't to do any of these two options then. If the default was utf8, but some other national collation, and you haven't permissions to alter the database settings, you may proceed too, and alter all the imported tables and columns afterwards manually, to be "utf8_general_ci". But if you've the database default encoding other than utf8 (and no permissions to alter), your best chance is probably to try and hack your dump.sql file, adding /*!40100 DEFAULT CHARACTER SET UTF8 */ at the end of every single CREATE TABLE statement, just before the final semicolon (tedious work with an utf-8 compatible text editor). If the database is latin1, you may also import with --default-character-set=latin1, and then proceed to the 1.A scenario above, to fix the rest.

Import the file to your new utf8 database:
mysql --user=username --password=password --default-character-set=utf8 dbname < dump.sql

Now, your database should be perfectly OK, "utf8_general_ci" set on all tables, and Drupal working fine. You're done.

NOTE ADDED JAN 2010: In some situations, you might encounter MySQL ERROR 1062 "Duplicate entry ... for key ..." This happens when two or more key values are considered duplicates when they are evaluated by the new version of MySQL using "utf8_general_ci" (for example "András" and "Andras"). In MySQL 4.0 these values were not considered duplicates. Likely places where this might occur include the "search_total" and "users" tables. Find and eliminate the duplicates, and then reimport the database as described above. If the duplicates are in "search_total", you may need to figure out how to truncate your search tables and reindex the site on the newer version of MySQL.


This is probably rare - it might only happen, if you're moving the site to a local server for testing, having the local server not upgraded yet. It's rather simple: Export in backwards-compatible mode, then import to 4.0:

From 4.1:
mysqldump --user=username --password=password --default-character-set=utf8 --compatible=mysql40 dbname > dump.sql

To 4.0:
mysql --user=username --password=password dbname < dump.sql


MySQL keeps track of your data's nature by using an encoding/collation information on various places. While collation (the part with various nation-names in it) is only affecting some operations like sorting and comparing, and so may be changed safely, encoding (utf8 or latin1) is addressing the nature of your data, and so needs to be always correct:

- Table columns: The encoding here describes your data directly. This is the most important bit for your database health; any change done to single columns is likely to perform a real conversion of the data, so better don't try to alter encoding here. (The script in scenario 1.A above is a way to change encoding without really changing data.)

- Tables: The info on tables is used as a default for any new columns created inside the table. You may change this to utf8 with queries like ALTER TABLE table_name CHARACTER SET utf8 COLLATE utf8_general_ci;.

- Database: The info on database is used as a default for any new tables created inside the database. This is important for imports from 4.0 dumps, as these contain no information about tables encoding/collation. You may change the database defaults to utf8 with the query ALTER DATABASE db_name DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;, if you have permissions for that.

- Connection: This is the encoding of data coming in/out of MySQL engine, i.e. data processed by Drupal and shown on your screen. It's specified through the --default-character-set=utf8 switch for each new connection (or with SET NAMES utf8 query, as seen in Drupal itself, as well as *some* MySQL 4.1 dumps). Default is latin1, which we don't want.

If the connection is different from the table column's encoding, MySQL performs a conversion, attempting to present more or less correct data. Being that the case, you may encounter the "Garbled" or "Question marks" problems, either on data read from OR stored to the database, or at the very least a bit worse performance. This is where all the damage happens on migration.

Generally, whichever way you choose to manipulate your data, you need all these four settings to be "utf8" and "utf8_general_ci", or at the very least you need to have all table columns as "utf8_general_ci".

Having table columns on latin1 means conversion (bad performance, and question-marks problem), and having table columns different from each other means "Illegal mix of collations" (MySQL can't perform certain operations (joins) between different collations).

When Drupal adds a new table to the database (for example during update.php run, or new module installation), "utf8_general_ci" is always enforced, so you needn't to worry about the defaults of your database. But however, some other outside-of-Drupal tools/situations - especially imports needed for migrations and backup-restoring, or manually performed queries - may fail to enforce that, resulting in the database defaults being used for new tables. That's why we want to have the database default set to "utf8_general_ci" - or else double check all newly created/imported tables, and change them to "utf8_general_ci" immediately.


Sometimes it happens, that you can't use the shown command-line examples, because you're on a different system, you've not enough permissions, or you have no access to the command line at all. If your hosting provider don't let you in, you may install some of existing web-based applications, such as for example the popular PhpMyAdmin, or for large databases some other ones, like MySQL Dumper, which is able to handle large data avoiding php timeout problems. You may then execute your operations using these applications - sometimes it's even more comfortable.

Make sure that such applications are always protected from unauthorized access! If there's no other option (MySQL Dumper), consider at least basic HTTP authentication.

This page can't describe all the options of all applications in detail - please refer to corresponding manuals for that. As for the examples provided here - for dumping/importing data, you need to use the corresponding functionality provided, while other queries may be usually run directly:

For example, from this command:
mysql --user=username --password=password --execute="ALTER DATABASE db_name DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;"
You need to take the actual query:
And just paste it to the "SQL" input of PhpMyAdmin (or similar).

There's one important bit, though: The connection encoding. To emulate the --default-character-set=utf8 switch, it's necessary to execute SET NAMES utf8 query on every page load. Some applications do this, some not. PhpMyAdmin lets you choose the connection, while MySQL Dumper (speaking of version 1.21-b6) needs to be hacked a bit to import your MySQL 4.0 dump well: Add a line $res=MSD_query('SET NAMES utf8'); just above final return; of function MSD_mysql_connect() in the file mysqldumper/inc/mysql.php.


Depending on how the migration was done (attempted), you may find yourself in four different states:

                   |Import done with the switch | Import done without   |
                   |--default-character-set=utf8|encoding specification,|
                   |  or SET NAMES utf8 query   |   or with latin1      |
|        |File from|                            |                       |
|        |MySQL 4.0|                            |           (B)         |
|        |(encoding|            (A)             | (Also happens after   |
|        |info not |                            | MySQL upgrade without |
|Database|included)|                            | really moving data)   |
|default +---------+----------------------------+-----------------------+
| latin1 |File from|                            |                       |
|        |MySQL 4.1|                            |                       |
|        |(with    |            (C)             |           (D)         |
|        | encoding| (You want to fix database  |(You want to fix datab.|
|        |  info)  |  default collation too )   | default collation too)|
|Database|No matter|                            |                       |
|default | which   |            (C)             |           (D)         |
| utf8   |  file   |                            |                       |

(A) - You have "latin1_swedish_ci" on your tables and columns, your contents looks more or less correct, but some special characters are replaced by question-marks. Currently, your data are stored in Latin 1 encoding, which can't handle all the characters, and requires MySQL to perform a conversion on each query.

It's probably best to use your backup now, and start the migration over again.

If you have no backup to revert to, you may try to execute the following query for each of your Drupal tables:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
This will convert your data and tables back to correct utf-8 state (including any data submitted to the site in the broken state, even), but however, the question-marks in your contents will stay. This is caused by latin1 being incapable to store full range of utf-8 characters, so the damage is permanent, and this way should not be considered safe.

You still want to correct the database defaults, to avoid running into the "Illegal mix of collations" error later (see paragraph 4.):

(B) - You have "latin1_swedish_ci" on your tables and columns, and your contents looks "Garbled". But still, your data are stored as correct utf-8 and no damage done yet! You have only just wrong encoding information in your database (causing MySQL to break your data on the way out).

The fix is not as simple as executing a query or two. Please read the scenario 1. above for details, or revert to a backup if this is a result of unsuccessfull import just done. The fix is safe, however, unless some new data got submitted in garbled state.

It's also a good idea to fix the database defaults (see above).

(C) - Congratulations! You have "utf8_general_ci" on your tables and columns, correct data inside, and Drupal working well. Unless you need to correct the database defaults, you're done already, and completely safe.

(D) - You have "utf8_general_ci" on your tables and columns, but your contents is "Garbled". Unfortunately, that's really how your data are stored in this case (after double utf-8 encoding, all two-byte characters turned into four bytes, and so into pairs of weird characters, one per original byte).

It's probably best to use your backup now, and start the migration over again.

If you have no backup to revert to, you may try to convert your data back with the following query for each of your Drupal tables:
ALTER TABLE table_name CONVERT TO CHARACTER SET latin1 COLLATE latin1_swedish_ci;
If everything went well, this took you back to the (B) state, which is still wrong, but allows you to proceed to an other fix at least. Now you may perform the steps described in scenario 1. above, to fix the remaining problem. But since your data undergone double conversion, this way should not be considered safe.

Depending on the circumstances, you might still need to correct the database defaults.


See also the official MySQL page.

More discussion may be also found on the original issue and various pages linked from there.

Looking for support? Visit the forums, or join #drupal-support in IRC.