Demo only records/dumps the charset for tables and columns, but not their collation.

Hence, when dumping and restoring from a snapshot, all tables and fields are recreated using MySQL's default collation (which is almost impossible to alter, as far as I can tell).

Especially when trying to use snapshots between MySQL 5.0 and 5.1 servers, this bug prevents successful importing of snapshots through other means (phpMyAdmin). To spit out some non-obvious details here, MySQL 5.1 contains an incompatible change in handling of keys/indexes for the utf8_general_ci collation, which apparently is only mentioned in a single, almost hidden sentence in MySQL's upgrade docs. When trying to import a snapshot from 5.0 in this collation, you'll get nasty duplicate key errors and similar weird errors. The solution is to switch the collation from utf8_general_ci to utf8_unicode_ci, which can be done in various ways, but doesn't matter for this issue.

What matters is that Demo snapshots do not contain any collation specs for table/column schema information when exporting on 5.0. Apparently, the information exists when exporting on 5.1.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

sun’s picture

Issue tags: +utf8_unicode_ci, +MySQL 5.1

Tagging.

sun’s picture

Issue tags: +utf8_general_ci

.

sun’s picture

Background info:

List of default charsets and collations: http://dev.mysql.com/doc/refman/5.1/en/charset-charsets.html

The original bug report that lead to the backwards-incompatible collation change:
http://bugs.mysql.com/bug.php?id=43593

Incompatible change: Character set or collation changes were made in MySQL 5.1.21, 5.1.23, and 5.1.24
that may require table indexes to be rebuilt.
For details, see Section 2.12.3, “Checking Whether Table Indexes Must Be Rebuilt”.

Source: http://dev.mysql.com/doc/refman/5.1/en/upgrading-from-previous-series.html

MySQL 5.1.24, 5.4.0 (Bug#27877)
Affects indexes that use the utf8_general_ci or ucs2_general_ci collation for columns that contain 'ß' LATIN SMALL LETTER SHARP S (German).
Affected tables can be detected by CHECK TABLE ... FOR UPGRADE as of MySQL 5.1.30, 5.4.0 (see Bug#40053).

Source: http://dev.mysql.com/doc/refman/5.1/en/checking-table-incompatibilities....

CAST() is using the default collation of the connection.
http://dev.mysql.com/doc/refman/5.1/en/charset-convert.html

SET NAMES uses the default collation of the server (utf8_general_ci for charset utf8), only SET CHARACTER SET respects the default collation of the database.
http://dev.mysql.com/doc/refman/5.1/en/charset-connection.html

Both in D6 and D7, Drupal's MySQL driver initializes the connection with:

  mysqli_query($connection, 'SET NAMES "utf8"');
sun’s picture

Actually, SHOW CREATE TABLE contains collation information, but only if the table/column collation differs from the default collation.

We therefore have separate issues:

  1. Demo module needs to insert "similar" lines like the following (and like PMA + mysqldump) into dumps, so as to read the dumped values in the proper charset and collation during reset/import:
    /*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
    /*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
    /*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
    /*!40101 SET NAMES utf8 */;
    
    ...
    
    /*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
    /*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
    /*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
    

    We can skip the conditional comments, as D6 requires MySQL 4.1.

    We perhaps want to use SET CHARACTER SET instead of SET NAMES. Not sure though.

  2. Additionally, the default database collation can differ between servers. Since SHOW CREATE TABLE only adds collation info for tables and columns in case they differ from the defaults, we need to include a COLLATE statement to the CREATE TABLE statement:
    CREATE TABLE `batch` (
      `bid` int(10) unsigned NOT NULL auto_increment,
      `token` varchar(64) collate utf8_unicode_ci NOT NULL,
      `timestamp` int(11) NOT NULL,
      `batch` longtext collate utf8_unicode_ci,
      PRIMARY KEY  (`bid`),
      KEY `token` (`token`)
    ) ENGINE=MyISAM AUTO_INCREMENT=21 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
    

    Note that CHARSET is always added, also if it is equal to the default.

    The MySQL reference manual does not state whether the collation specified via COLLATE for the table is taken over as default for table columns. Let's hope so, but needs manual testing.

  3. Drupal core needs to use SET CHARACTER SET instead of SET NAMES when initializing the database connection, so as to respect the database's default collation for new tables and columns, instead of using the servers/connections default collation.
sun’s picture

Status: Active » Needs review
FileSize
5.14 KB

Implements 1) and 2), plus plenty of comments.

sun’s picture

sun’s picture

Status: Needs review » Needs work

On Win32, PHP 5.2.6 PDO with MySQL 5.0.x also seems to append an InnoDB free space marker to all table comments, which we should manually remove, if existent:

) ENGINE=InnoDB COMMENT='Sets up access permissions for blocks based on user roles; InnoDB free: 84992 kB';
sun’s picture

Status: Needs work » Patch (to be ported)
FileSize
6.75 KB

Committed attached patch to HEAD (D7).

sun’s picture

Status: Patch (to be ported) » Needs review
FileSize
8.5 KB

Same patch for D6 (untested).

Last hunk needs further consideration. We likely need a new helper to generate that table info and status.

sun’s picture

Status: Needs review » Fixed
FileSize
9.5 KB

Committed attached patch to D6.

Status: Fixed » Closed (fixed)
Issue tags: -utf8_unicode_ci, -MySQL 5.1, -utf8_general_ci

Automatically closed -- issue fixed for 2 weeks with no activity.