Drupal.org's collations are binary, allowing users with names only varying by case, like http://drupal.org/user/689172 and http://drupal.org/user/21209. Association.drupal.org does not allow this, so bakery can not copy the 2nd user.
Work with all users with case-only changes to make their names unique. Use non-binary collation everywhere.
Drupal.org on D7 now prevents new users from using case-insensitive duplicate names, stopping the source of the problem. Now we need to resolve existing duplicates.
Queries to run on staging only:
Find all accounts with
SELECT count(1) c, u.name FROM users u WHERE u.uid > 0 GROUP BY lower(u.name) HAVING c > 1;
Find blocked duplicates to rename
SELECT count(1) c, sum(u.status = 1) active, u.name FROM users u WHERE u.uid > 0 GROUP BY lower(u.name) HAVING c > 1 AND c > active;
Summarize duplicates with elevated roles:
SELECT count(DISTINCT u.uid) c, sum(ur.rid IN (3,26,16,4,38,12,30,14,22,28,32,34,7,24,20)) roles, u.name FROM users u INNER JOIN users_roles ur ON ur.uid = u.uid WHERE u.uid > 0 GROUP BY lower(u.name) HAVING c > 1 AND roles > 1 ORDER BY NULL;
Comment | File | Size | Author |
---|---|---|---|
#14 | drupalorg-prevent-duplicate-names-1034852-14.patch | 2.25 KB | markpavlitski |
Comments
Comment #1
drumm(This combined with #1033360: bakery_request_account() needs to check for existing email and be better at cleanup causes uid 0 to be deleted, which was lots of fun to debug.)
Comment #2
Gerhard Killesreiter CreditAttribution: Gerhard Killesreiter commentedHow many accounts are there that would need fixing? Generally, I would prefer to use A).
Comment #3
Gerhard Killesreiter CreditAttribution: Gerhard Killesreiter commentedNeil has compiled a list of accounts. It is unfortunately a lenghty list ( ~ 25k users).
I've compiled a 2nd list that looks whether the account was ever used. That cuts it to 13k users.
I believe we should do the following:
1) Make sure that no new "duplicate" accounts can be created through some code in drupalorg
2) somehow take care of the 12k not used accounts
3) Add code that tells users to chose a different username when they log in or edit their account.
4) Look into which of the used duplicate accounts still got used in the past year or so. Change the usernames of those that didn't get used.
We should then see what further actions need to be taken after some time.
Comment #4
Gerhard Killesreiter CreditAttribution: Gerhard Killesreiter commentedWe could also consider to send a mail to all the affected accounts. This would need some way for them to indicate whether
1) they still intend to use their account on d.o
2) If they are willing to change their username
This should have links in the mail to click on. In cases where people do not reply we should then deactivate the account (and change the username).
Comment #5
nnewton CreditAttribution: nnewton commentedsubscribe
Comment #6
drummWe should go ahead and put binary collations on everything since these duplicate accounts aren't going away quickly. Then proceed with cleanups as killes described.
Comment #7
drummI ran
ALTER TABLE users CHANGE name name varchar(60) collate utf8_bin default NULL;
onhttp://localize.drupal.org/
http://api.drupal.org/
http://groups.drupal.org/
http://association.drupal.org/
http://association.drupal.org/intranet/
https://security.drupal.org/
http://chicago2011.drupal.org/
And verified each had uid 0, g.d.o did not. I made a note of this at http://drupal.org/node/1030588.
This issue is now about preventing more duplicate accounts.
Comment #8
twardnw CreditAttribution: twardnw commentedAs this came up in conversation recently, I went and tested this out, I was able to register TwardNW
Comment #9
Franz-m CreditAttribution: Franz-m commented"If they are willing to change their username"
I guess that depends on the effects on 'connected' accounts: If going on several other Drupal-sites with the same nick (but outside the reach of the drupal.org account which I expect to update automatically) , this would be messy ...
Comment #10
Franz-m CreditAttribution: Franz-m commentedI still bother for the 'old accounts', and I am affected by this bug ...
I would like to propose a modified procedure:
Simply "allow" for "old accounts" with duplicate/different case names to log in on a subdomain account in spite of violating the normal case rule - and only the old cases, simply recognized by registration date.
- No new account with "wrong" case will be allowed, so now new cases are to expect.
- the "old accounts" can be catered for in the bakery module. Log in /with such an account/ is not so frequent to give a performance problem I assume.
Comment #11
drummProperly supporting old accounts which only vary by case isn't just bakery module. The root cause is the database's binary or not collation on the user name column. Bakery (correctly) works with what core does, assuming usernames aren't case sensitive.
Patches and modifications for speed on Drupal.org unintentionally opened up this situation. We always want to get back to vanilla Drupal core when practical, and more-vanilla subsites are running into this right now.
Comment #12
DanZ CreditAttribution: DanZ commentedA strange thing happened to me that may be related to this. Here's the history, as well as I can remember it:
Account http://drupal.org/user/102772, already existed, named "danz".
Years later, I created account http://drupal.org/user/2101870, named DanZ. It worked fine. I used it to make forum posts, set up GIT sandboxes, and write comments on api.drupal.org.
I had to use a different GIT name. However, it's not possible to change that once it's set. I decided to make a new account with the same user name and a different GIT name.
I re-named http://drupal.org/user/2101870 to "Dan Z" (with a space).
I added a new e-mail address to 2101870 and made it primary.
I removed the original e-mail address from 2101870.
I created this account, named "DanZ" http://drupal.org/user/2392514 with the original e-mail address.
This account mostly works, but connecting to api.drupal.org gives me:
This takes me to http://api.drupal.org/bakery/repair.
It looks good up to here, but then it doesn't recognize my password, and I'm stuck.
See #1851332: api.drupal.org account broken for my support request.
(This is a 7.x-3.x issue now, right?)
Comment #13
drummSince we are upgrading Drupal.org to D7, the version doesn't matter. If we want Drupal.org to handle this on D6, we need it there. And on D7 regardless.
What happens is that each subsite will get one user or the other in their users table. Since the core default is non-binary collation, the subsite loads that user for any cases used. Luckily, this doesn't result in you becoming the other user or anything. It does prevent bakery from functioning correctly.
Really, we should never have allowed this on Drupal.org, but it was an unintended side effect of performance improvements. We can't just switch back because we would need to stop new duplicates from happening and resolve the existing ones.
Comment #14
markpavlitski CreditAttribution: markpavlitski commentedThis patch adds a validation function when registering or editing a user profile, preventing duplicate case-insensitive usernames.
Comment #15
drummLooks good, committed.
Comment #17
drummNow that Drupal.org on D7 will prevent new users from using case-insensitive duplicate names, stopping the source of the problem, we need to resolve existing duplicates. We need a plan along the lines of:
Comment #17.0
drummGetting ready to kick back to infra.
Comment #18
drummThis currently affects just over 48,280 accounts.
Comment #19
drummWe're (Drupal Association staff) are starting by cleaning up the duplicates of users with elevated roles. They can be found with a query like:
SELECT DISTINCT u.uid, u.name, dupe.name, dupe.uid, dupe.status, from_unixtime(dupe.access), from_unixtime(dupe.login) FROM users_roles ur INNER JOIN users u ON u.uid = ur.uid INNER JOIN users dupe ON lower(dupe.name) = lower(u.name) AND dupe.uid <> u.uid WHERE ur.rid IN (3, 26, 16, 4, 38, 12, 30) ORDER BY u.uid;
Since this is borderline-sensitive information, there is a slim chance someone made a duplicate with the intent of impersonation, I'm not posting them here. We're only getting a handful of accounts fixed manually at this time.
Comment #20
drummThis initial round has been cleaned up, the next is including Git vetted users,
(3, 26, 16, 4, 38, 12, 30, 14, 22, 28, 32, 34, 7, 24)
. 383 more duplicates.I see quite a few of the duplicates have not logged in, their value in users.access is 0. Should we rename or delete their accounts?
Comment #21
drummSpecifically, 88 out of 379 have users.access = 0.
(4 more were blocked accounts that I renamed.)
Comment #22
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedI think deleting them is fine.
Comment #23
tvn CreditAttribution: tvn commentedI agree, deleting them should be ok. Except if there are any accounts which were created a few hours/minutes before you ran the query.
Comment #24
drummThe count right now is 48,556.
Notice this has gone up since #18. I believe this is due to #2299919: Ensure accounts created on subsites can not have duplicate usernames. That is closed up as of earlier today, with #2319463: Redirect to Drupal.org for new accounts being backported and deployed don the D6 sites.
Comment #25
drummI was mistaken, there were only 9, which are now deleted. The 88, now ~79, have user.login = 0. Are those okay to delete or rename?
Comment #26
tvn CreditAttribution: tvn commentedI'd same rename, unless their last access date is old, like older than a year.
Comment #27
drummThere are currently 62 accounts duplicating Git Vetted users with
user.login = 0
. The most-recent access time is over a year ago, 2013-05-06.(The query I'm using now is
SELECT DISTINCT u.uid, u.name, dupe.name, dupe.uid, dupe.status, from_unixtime(dupe.access), from_unixtime(dupe.login) FROM users_roles ur INNER JOIN users u ON u.uid = ur.uid INNER JOIN users dupe ON lower(dupe.name) = lower(u.name) AND dupe.uid > u.uid AND dupe.login = 0 WHERE ur.rid IN (3, 26, 16, 4, 38, 12, 30, 14, 22, 28, 32, 34, 7, 24) ORDER BY dupe.access;
)Comment #28
tvn CreditAttribution: tvn commentedThen it's probably ok to delete those.
Comment #29
drummDone. I can get a new list once everything is synced down to staging.
Comment #30
joshuamiComment #31
drummWe've been consistently choosing to rename blocked accounts that are duplicates of other accounts. Adding the query to find those to the issue summary.
Comment #32
drummComment #33
drummAdding the query from #27 to the issue summary.
Comment #34
drummThis currently affects just over 47,828, we've made a bit of progress. Users with role administrator, content moderator, Documentation moderator, drupal.org issue queue squad, Full HTML user, Git administrator, Git vetted user, list maintainer, Packaging whitelist maintainer, security team, spam fighter, testing administrator, user administrator, and webmaster have had their duplicates cleaned up.
The next roles to tackle are Git user and Not a spammer
Comment #35
drummFor duplicates of accounts with the Git user role, there are 87 accounts, all of which have not actually logged in, and were created over a year ago. We've been deleting those, so I'll continue doing that. It would be good to make this into a general policy, so we can free up unused usernames.
Comment #36
drummComment #37
drummThis currently affects 45,486 users.
Comment #38
lizzjoyGit Users with duplicate names have been updated.
Comment #39
drummThis currently affects 44,970 users.
Comment #40
drummPutting in a new “Find blocked duplicates to rename” query that’s much faster. We have 19 cases of that we could resolve easily. 2,2087 total.
Comment #41
drummPutting in faster query for “Summarize duplicates with elevated roles.” These and duplicates involving blocked accounts have been resolved, except for one where the two accounts appear to be owned by the same person.
We will be renaming accounts starting with those who have not logged in the longest.
Comment #42
drummI am currently renaming conflicts where an account that has not logged in conflicts with one that has. This will remove roughly 1/3 of the duplicates.
Comment #43
drummNext round of automatic renames are for people who have not logged in since Jan 1, 2010. That will resolve 8,409 of the 12,322 remaining conflicts.
Comment #44
drummWe’re down the the last 5 conflicts which we’re resolving manually. The queries used to pick users to rename were:
And the actual renaming was done with
Comment #45
hestenetAfter giving these final 5 users 1 weeks notice, the final five accounts are cleaned up.