Drupal.org's collations are binary, allowing users with names only varying by case, like http://drupal.org/user/689172 and http://drupal.org/user/21209. Association.drupal.org does not allow this, so bakery can not copy the 2nd user.

Work with all users with case-only changes to make their names unique. Use non-binary collation everywhere.

Drupal.org on D7 now prevents new users from using case-insensitive duplicate names, stopping the source of the problem. Now we need to resolve existing duplicates.

Queries to run on staging only:

Find all accounts with
SELECT count(1) c, u.name FROM users u WHERE u.uid > 0 GROUP BY lower(u.name) HAVING c > 1;

Find blocked duplicates to rename
SELECT count(1) c, sum(u.status = 1) active, u.name FROM users u WHERE u.uid > 0 GROUP BY lower(u.name) HAVING c > 1 AND c > active;

Summarize duplicates with elevated roles:
SELECT count(DISTINCT u.uid) c, sum(ur.rid IN (3,26,16,4,38,12,30,14,22,28,32,34,7,24,20)) roles, u.name FROM users u INNER JOIN users_roles ur ON ur.uid = u.uid WHERE u.uid > 0 GROUP BY lower(u.name) HAVING c > 1 AND roles > 1 ORDER BY NULL;

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

drumm’s picture

(This combined with #1033360: bakery_request_account() needs to check for existing email and be better at cleanup causes uid 0 to be deleted, which was lots of fun to debug.)

Gerhard Killesreiter’s picture

How many accounts are there that would need fixing? Generally, I would prefer to use A).

Gerhard Killesreiter’s picture

Neil has compiled a list of accounts. It is unfortunately a lenghty list ( ~ 25k users).

I've compiled a 2nd list that looks whether the account was ever used. That cuts it to 13k users.

I believe we should do the following:

1) Make sure that no new "duplicate" accounts can be created through some code in drupalorg

2) somehow take care of the 12k not used accounts

3) Add code that tells users to chose a different username when they log in or edit their account.

4) Look into which of the used duplicate accounts still got used in the past year or so. Change the usernames of those that didn't get used.

We should then see what further actions need to be taken after some time.

Gerhard Killesreiter’s picture

We could also consider to send a mail to all the affected accounts. This would need some way for them to indicate whether

1) they still intend to use their account on d.o

2) If they are willing to change their username

This should have links in the mail to click on. In cases where people do not reply we should then deactivate the account (and change the username).

nnewton’s picture

subscribe

drumm’s picture

We should go ahead and put binary collations on everything since these duplicate accounts aren't going away quickly. Then proceed with cleanups as killes described.

drumm’s picture

Title: Sync collations on {user}.name, mail, and init » Deny new accounts with case-insensitive duplicate names
Project: Drupal.org infrastructure » Drupal.org customizations
Version: » 6.x-2.x-dev
Component: Operating system » Code
Priority: Major » Normal

I ran ALTER TABLE users CHANGE name name varchar(60) collate utf8_bin default NULL; on

http://localize.drupal.org/
http://api.drupal.org/
http://groups.drupal.org/
http://association.drupal.org/
http://association.drupal.org/intranet/
https://security.drupal.org/
http://chicago2011.drupal.org/

And verified each had uid 0, g.d.o did not. I made a note of this at http://drupal.org/node/1030588.

This issue is now about preventing more duplicate accounts.

twardnw’s picture

As this came up in conversation recently, I went and tested this out, I was able to register TwardNW

Franz-m’s picture

"If they are willing to change their username"

I guess that depends on the effects on 'connected' accounts: If going on several other Drupal-sites with the same nick (but outside the reach of the drupal.org account which I expect to update automatically) , this would be messy ...

Franz-m’s picture

I still bother for the 'old accounts', and I am affected by this bug ...

I would like to propose a modified procedure:
Simply "allow" for "old accounts" with duplicate/different case names to log in on a subdomain account in spite of violating the normal case rule - and only the old cases, simply recognized by registration date.
- No new account with "wrong" case will be allowed, so now new cases are to expect.
- the "old accounts" can be catered for in the bakery module. Log in /with such an account/ is not so frequent to give a performance problem I assume.

drumm’s picture

Title: Deny new accounts with case-insensitive duplicate names » Deny new & changed accounts with case-insensitive duplicate names

Properly supporting old accounts which only vary by case isn't just bakery module. The root cause is the database's binary or not collation on the user name column. Bakery (correctly) works with what core does, assuming usernames aren't case sensitive.

Patches and modifications for speed on Drupal.org unintentionally opened up this situation. We always want to get back to vanilla Drupal core when practical, and more-vanilla subsites are running into this right now.

DanZ’s picture

Version: 6.x-2.x-dev » 7.x-3.x-dev

A strange thing happened to me that may be related to this. Here's the history, as well as I can remember it:

Account http://drupal.org/user/102772, already existed, named "danz".

Years later, I created account http://drupal.org/user/2101870, named DanZ. It worked fine. I used it to make forum posts, set up GIT sandboxes, and write comments on api.drupal.org.

I had to use a different GIT name. However, it's not possible to change that once it's set. I decided to make a new account with the same user name and a different GIT name.

I re-named http://drupal.org/user/2101870 to "Dan Z" (with a space).

I added a new e-mail address to 2101870 and made it primary.

I removed the original e-mail address from 2101870.

I created this account, named "DanZ" http://drupal.org/user/2392514 with the original e-mail address.

This account mostly works, but connecting to api.drupal.org gives me:

Your user account on Drupal API appears to have problems. Would you like to try to repair it yourself? Otherwise you can contact the site administrators.

This takes me to http://api.drupal.org/bakery/repair.

Repair account
An account with a matching username was found. Repairing it will reset the email address to match your master account. If this is the correct account, please enter your Drupal API password.
Username: *
Email address: * *hidden*
Will change to dzerkle-drupal@ztwistbooks.com.
Password: *
Enter the password that accompanies your username.

It looks good up to here, but then it doesn't recognize my password, and I'm stuck.

See #1851332: api.drupal.org account broken for my support request.

(This is a 7.x-3.x issue now, right?)

drumm’s picture

Since we are upgrading Drupal.org to D7, the version doesn't matter. If we want Drupal.org to handle this on D6, we need it there. And on D7 regardless.

What happens is that each subsite will get one user or the other in their users table. Since the core default is non-binary collation, the subsite loads that user for any cases used. Luckily, this doesn't result in you becoming the other user or anything. It does prevent bakery from functioning correctly.

Really, we should never have allowed this on Drupal.org, but it was an unintended side effect of performance improvements. We can't just switch back because we would need to stop new duplicates from happening and resolve the existing ones.

markpavlitski’s picture

Status: Active » Needs review
FileSize
2.25 KB

This patch adds a validation function when registering or editing a user profile, preventing duplicate case-insensitive usernames.

drumm’s picture

Status: Needs review » Fixed

Looks good, committed.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

drumm’s picture

Title: Deny new & changed accounts with case-insensitive duplicate names » Clean up accounts with case-insensitive duplicate names
Project: Drupal.org customizations » Drupal.org infrastructure
Version: 7.x-3.x-dev »
Component: Code » Other
Status: Closed (fixed) » Active

Now that Drupal.org on D7 will prevent new users from using case-insensitive duplicate names, stopping the source of the problem, we need to resolve existing duplicates. We need a plan along the lines of:

  1. Block/rename or delete duplicates who have never logged in.
  2. Rename duplicates who are blocked.
  3. Make a policy & process to handle duplicates between two active people.
drumm’s picture

Issue summary: View changes

Getting ready to kick back to infra.

drumm’s picture

Issue summary: View changes

This currently affects just over 48,280 accounts.

drumm’s picture

We're (Drupal Association staff) are starting by cleaning up the duplicates of users with elevated roles. They can be found with a query like:

SELECT DISTINCT u.uid, u.name, dupe.name, dupe.uid, dupe.status, from_unixtime(dupe.access), from_unixtime(dupe.login) FROM users_roles ur INNER JOIN users u ON u.uid = ur.uid INNER JOIN users dupe ON lower(dupe.name) = lower(u.name) AND dupe.uid <> u.uid WHERE ur.rid IN (3, 26, 16, 4, 38, 12, 30) ORDER BY u.uid;

Since this is borderline-sensitive information, there is a slim chance someone made a duplicate with the intent of impersonation, I'm not posting them here. We're only getting a handful of accounts fixed manually at this time.

drumm’s picture

This initial round has been cleaned up, the next is including Git vetted users, (3, 26, 16, 4, 38, 12, 30, 14, 22, 28, 32, 34, 7, 24). 383 more duplicates.

I see quite a few of the duplicates have not logged in, their value in users.access is 0. Should we rename or delete their accounts?

drumm’s picture

Specifically, 88 out of 379 have users.access = 0.

(4 more were blocked accounts that I renamed.)

killes@www.drop.org’s picture

I think deleting them is fine.

tvn’s picture

I agree, deleting them should be ok. Except if there are any accounts which were created a few hours/minutes before you ran the query.

drumm’s picture

The count right now is 48,556.

Notice this has gone up since #18. I believe this is due to #2299919: Ensure accounts created on subsites can not have duplicate usernames. That is closed up as of earlier today, with #2319463: Redirect to Drupal.org for new accounts being backported and deployed don the D6 sites.

drumm’s picture

Specifically, 88 out of 379 have users.access = 0.

I was mistaken, there were only 9, which are now deleted. The 88, now ~79, have user.login = 0. Are those okay to delete or rename?

tvn’s picture

I'd same rename, unless their last access date is old, like older than a year.

drumm’s picture

There are currently 62 accounts duplicating Git Vetted users with user.login = 0. The most-recent access time is over a year ago, 2013-05-06.

(The query I'm using now is SELECT DISTINCT u.uid, u.name, dupe.name, dupe.uid, dupe.status, from_unixtime(dupe.access), from_unixtime(dupe.login) FROM users_roles ur INNER JOIN users u ON u.uid = ur.uid INNER JOIN users dupe ON lower(dupe.name) = lower(u.name) AND dupe.uid > u.uid AND dupe.login = 0 WHERE ur.rid IN (3, 26, 16, 4, 38, 12, 30, 14, 22, 28, 32, 34, 7, 24) ORDER BY dupe.access;)

tvn’s picture

Then it's probably ok to delete those.

drumm’s picture

Done. I can get a new list once everything is synced down to staging.

joshuami’s picture

drumm’s picture

Issue summary: View changes

We've been consistently choosing to rename blocked accounts that are duplicates of other accounts. Adding the query to find those to the issue summary.

drumm’s picture

Issue summary: View changes
drumm’s picture

Issue summary: View changes

Adding the query from #27 to the issue summary.

drumm’s picture

This currently affects just over 47,828, we've made a bit of progress. Users with role administrator, content moderator, Documentation moderator, drupal.org issue queue squad, Full HTML user, Git administrator, Git vetted user, list maintainer, Packaging whitelist maintainer, security team, spam fighter, testing administrator, user administrator, and webmaster have had their duplicates cleaned up.

The next roles to tackle are Git user and Not a spammer

drumm’s picture

Issue summary: View changes

For duplicates of accounts with the Git user role, there are 87 accounts, all of which have not actually logged in, and were created over a year ago. We've been deleting those, so I'll continue doing that. It would be good to make this into a general policy, so we can free up unused usernames.

drumm’s picture

Issue summary: View changes
drumm’s picture

This currently affects 45,486 users.

lizzjoy’s picture

Git Users with duplicate names have been updated.

drumm’s picture

This currently affects 44,970 users.

drumm’s picture

Issue summary: View changes

Putting in a new “Find blocked duplicates to rename” query that’s much faster. We have 19 cases of that we could resolve easily. 2,2087 total.

drumm’s picture

Assigned: Unassigned » drumm
Issue summary: View changes

Putting in faster query for “Summarize duplicates with elevated roles.” These and duplicates involving blocked accounts have been resolved, except for one where the two accounts appear to be owned by the same person.

We will be renaming accounts starting with those who have not logged in the longest.

drumm’s picture

I am currently renaming conflicts where an account that has not logged in conflicts with one that has. This will remove roughly 1/3 of the duplicates.

drumm’s picture

Next round of automatic renames are for people who have not logged in since Jan 1, 2010. That will resolve 8,409 of the 12,322 remaining conflicts.

drumm’s picture

Status: Active » Needs review

We’re down the the last 5 conflicts which we’re resolving manually. The queries used to pick users to rename were:

$result = db_query('SELECT count(1) c, sum(u.login = 0) stale, u.name, group_concat(if(u.login = 0, u.uid, NULL)) uids FROM {users} u WHERE u.uid > 0 GROUP BY lower(u.name) HAVING c > 1 AND stale > 0 AND c > stale');
$result = db_query("SELECT count(1) c, sum(u.login = 0) stale, u.name, substr(regexp_substr(group_concat(u.uid ORDER BY u.uid), ',.*'), 2) uids FROM {users} u WHERE u.uid > 0 GROUP BY lower(u.name) HAVING c > 1 AND stale > 0");
$result = db_query("SELECT count(1) c, sum(u.login < unix_timestamp('2010-01-01')) stale, u.name, group_concat(if(u.login < unix_timestamp('2010-01-01'), u.uid, NULL) ORDER BY u.uid) uids FROM {users} u WHERE u.uid > 0 GROUP BY lower(u.name) HAVING c > 1 AND stale > 0");
$result = db_query("SELECT count(1) c, sum(u.login < unix_timestamp('2015-01-01')) stale, u.name, group_concat(if(u.login < unix_timestamp('2015-01-01'), u.uid, NULL) ORDER BY u.uid) uids FROM {users} u WHERE u.uid > 0 GROUP BY lower(u.name) HAVING c > 1 AND stale > 0 AND stale < c");
$result = db_query("SELECT count(1) c, sum(u.login < unix_timestamp('2015-01-01')) stale, u.name, substr(regexp_substr(group_concat(if(u.login < unix_timestamp('2015-01-01'), u.uid, NULL) ORDER BY u.uid), ',.*'), 2) uids FROM {users} u WHERE u.uid > 0 GROUP BY lower(u.name) HAVING c > 1 AND stale > 0");

And the actual renaming was done with

foreach ($result as $row) {
  $uids[] = explode(',', $row->uids);
}
$uids = call_user_func_array('array_merge', $uids);
foreach (array_chunk($uids, 50) as $uids_chunk) {
  foreach (user_load_multiple($uids_chunk) as $account) {
    // Find usable username.
    $n = 0;
    do {
      $n += 1;
      $new_name = $account->name . '-' . $n;
    }
    while (db_query_range('SELECT 1 FROM {users} WHERE lower(name) = lower(:new_name)', 0, 1, [':new_name' => $new_name])->fetchField());
    drush_log(dt('@uid @old_name → @new_name', [
      '@uid' => $account->uid,
      '@old_name' => $account->name,
      '@new_name' => $new_name,
    ]));
    $account->name = $new_name;
    $account_wrapper = entity_metadata_wrapper('user', $account);
    $account_wrapper->field_notes = ltrim($account_wrapper->field_notes->value() . ' ' . dt('Renamed due to https://www.drupal.org/project/infrastructure/issues/1034852'));
    user_save($account);
  }
}
hestenet’s picture

Status: Needs review » Fixed

After giving these final 5 users 1 weeks notice, the final five accounts are cleaned up.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.