If you run a 4.6 database with two similar values in a table ("Bèr Kessels@drupal.org" and "Ber Kessels@drupal.org" in my case, in both users and authmap) through the update process to 4.7, Drupal will spit out a:
user warning: Duplicate entry 'Bèr Kessels@drupal.org' for key 2 query: ALTER TABLE authmap CHANGE `authname` `authname` varchar(128) CHARACTER SET utf8 DEFAULT '' NOT NULL, CHANGE `module` `module` varchar(128) CHARACTER SET utf8 DEFAULT '' NOT NULL in /sr...........in/includes/database.mysql.inc on line 120.
error. This error will also stop the processing of any other variables in the table. In the users table (Update 169):
Failed: ALTER TABLE {users} CHANGE `name` `name` varchar(60) CHARACTER SET utf8 DEFAULT '' NOT NULL, CHANGE `pass` `pass` varchar(32) CHARACTER SET utf8 DEFAULT '' NOT NULL, CHANGE `mail` `mail` varchar(64) CHARACTER SET utf8 DEFAULT '' NULL, CHANGE `theme` `theme` varchar(255) CHARACTER SET utf8 DEFAULT '' NOT NULL, CHANGE `signature` `signature` varchar(255) CHARACTER SET utf8 DEFAULT '' NOT NULL, CHANGE `timezone` `timezone` varchar(8) CHARACTER SET utf8 DEFAULT '' NULL, CHANGE `language` `language` varchar(12) CHARACTER SET utf8 DEFAULT '' NOT NULL, CHANGE `init` `init` varchar(64) CHARACTER SET utf8 DEFAULT '' NULL, CHANGE `data` `data` longtext CHARACTER SET utf8 DEFAULT '' NULL, CHANGE `picture` `picture` varchar(255) CHARACTER SET utf8 DEFAULT '' NOT NULL
This causes some of the values in the table to not be UTF8. This means users in the non-converted part of the database can:
1) Not post nodes (they'll reciveve user doesn't exist errors, even though they are logged in)
2) Not be found by the user search engine (even if you enter the exact name, or run the query from the command line select * from users where LOWER(name) LIKE LOWER('a_name_you_know_is_there_but_with_a_higher_uid_than_error_user');
)
This occurred for me while updating a 4.6 site to 4.7 with many thousands of users. It would be extremely time consuming (and potentially disastrous) to go though and remove nicks with special characters. The database that Drupal upgrade started as latin1.
If anyone needs further logs or information I'm on IRC as "mozillaman", on skype as "mozillamonks" or available at my contact form.
Thanks!
Robin
| Comment | File | Size | Author |
|---|---|---|---|
| #8 | update.php.patch.txt | 780 bytes | patricksettle |
Comments
Comment #1
robin monks commentedManually scanning the users table, and then renaming all users with odd characters in their names from both the users and authmap tables allowed this to update correctly. Although this is no excuse for Drupal's updater's bad converting.
Robin
Comment #2
killes@www.drop.org commentednobody else reported this problem, downgrading.
Comment #3
patricksettle commentedI've the exact same problem upgrading from 4.6 to 4.7.3 There's two problems that need to be addressed, 1st... while the upgrader gave the SQL error on the first field that it came across that it thought it was a "duplicate" (in my case it was Éleanore and Eleanore) it did not give a notification that it failed the update of the user table down below with where the "updates" are listed. It should show that which ever update that touches the user table has failed.
Secondly, the error it self. I'm not sure why MySQL is seening Éleanore and Eleanore as the same, so I'm not sure what the problem is. I'll be digging around mysql.com to see what I can come up with... but someone alot more familure with MySQL than me should also be looking into this.
Seems like a pretty critical bug to me, what with it fubar-ing the User table... without much notification to the user. Feel free to correct me.
Comment #4
patricksettle commentedDid some more research, and I beleive the "problem" is that the update is using utf8_general_ci for the Collation. utf8_general_ci is case insensitive. Which means F == f when it comes to unique fields, such as the user name field. It also means that É == E as it's the same letter but with an accent accourding to MySQL's collation. A possible solution is changing any Char based unique fields to utf8_bin collation. Which is Case Sensitive.... thus É != E the down side of course is that, in this case, usernames would be come Case sensitive ( bob and Bob would be two different users), which could be annoying.
The reason it worked under Latin is because there's no connection between É and E they are unawear of each other... thus not the same character.
Still looking for addition info on this.
Comment #5
patricksettle commentedApparently the MySQL folks feel that it's simply a miss-labled collation, and that a Accent Sensitive Collation is a "new" feature request. http://bugs.mysql.com/bug.php?id=19567
To avoid this using the LOWER() or UPPER() SQL command to convert all the names to lower or uppercase then using UTF8_bin, and changing the user.module to do the same on user functions, that would allow users to punch in their usernames any way they want, and the system would see them in all lowercase for example.... I'll poke around the user.module and see if I can come up with some sort of patch... but I'm sure someone out there's -alot- more familure with this than I am....
Other 'solutions' I can dream up at the moment is just not using utf8 encoding on the user table for MySQL untill they fix/implement "new feature", leaving it latin1... Or remove the Unique setting on the name field and do regexp searches for the username, and have php check for existing names to prevent username dups... which are both craptastick ideas if you ask me. :)
Comment #6
patricksettle commented><
Just looked into the code and found it already there....
So the only thing that needs to be changed is the Collation. Instead of UTF8-general-ci for the name field of the user table, it needs to be UTF8_bin...
Comment #7
patricksettle commentedsorry about that... previous message was this:
I checked int he code and found the lower() already in use. The only thing that need to happen to "fix" this bug in mysql is to use the utf8_bin collation. Don't have PostgreSQL, so I'm not sure if this is a problem for them.
Comment #8
patricksettle commentedHere's a possible patch for update.php to change the collate to UTF8_bin for the name field in the user table. I've tested it out a couple times using 4.7.3 and mysql 4.1 worked fine.
But I've only glanced into pgsql, and from the little I saw it seems you may not have a choice in the Collation of a field. Can some pgsql user confirm this?
If that is the case then the following would have to be worked into my patch.
Comment #9
Steven commentedWe want to use the case/accent-insensitive collation. The fact that the colliding usernames were both allowed in 4.6 should be considered a bug.
I'm sure we could eventually come up with a fix that uses forced collations before doing the actual conversions, but it would be a ton of work. Honestly the chances of this problem occuring are tiny. When it does occur, it is relatively easy to fix by hand.
I'm going to mark this as won'tfix.