In email_registration_user, if the generated username already exists, the code will issue a query to find the last numeric sequence number in use with that username, then add one to it and change the username to that. But because Drupal is generally run in a multi-process or multi-thread environment and the check and the update do not happen atomically, it is possible for another user to create an account with the same generated name, which would be assigned the same sequence number. The sequence is like this:
- User 1 creates account 'racer@example.com'
- User 2 creates account 'racer@aol.com' at about the same time
- email_registration_user runs for User 1, finds that username racer already exists, chooses sequence number 1
- email_registration_user runs for User 2, finds that username racer already exists, chooses sequence number 1
- email_registration_user runs for User 1, changes username to racer_1
- email_registration_user runs for User 2, changes username to racer_1. SQL statement fails because of duplicate username, and user gets a weird SQL error in their browser
My solution is just to always append their UID. We already know it is unique. This also simplifies the query, and make it work on Postgres (see http://drupal.org/node/421078).
Patch is attached.
Comment | File | Size | Author |
---|---|---|---|
#37 | 423920-37.patch | 3.64 KB | andypost |
#37 | interdiff.txt | 757 bytes | andypost |
Comments
Comment #1
twistor CreditAttribution: twistor commentedI just posted a patch to another issue when I saw this and it got me thinking. Why not just replace the @ with an underscore and put the whole email address in. There shouldn't be any collisions at all that way, plus every user name could be computed from their email address without added information.
Comment #2
scottgifford CreditAttribution: scottgifford commentedThere is a privacy issue with revealing the user's entire email address as their username, if their username is ever visible.
Comment #3
gregglesI agree with scottgifford - that we can't show the entire e-mail address. Even showing partial e-mail is a potential issue because spammers can guess the domain (yahoo, hotmail, gmail will cover 75% of the users on most sites).
What about #247717: provide a hook so other modules can help generate the user name which would let the admin define a style for the username generation including UID. On small sites the race condition will never be a problem and uids in the name are kind of annoying so they could be ommitted there.
Comment #4
scottgifford CreditAttribution: scottgifford commentedIMO it is a mistake to ignore the race condition, even for smaller sites. Don't sites generally avoid showing the username if they are using email registration, and so the username doesn't matter at all?
Without using the UID, the trick would be to either make the change atomically (perhaps by locking the SQL tables), or else to detect and handle the error, and try again with a new username. That's probably a better solution overall, but using the UID was fast and easy. :)
Comment #5
Christopher Herberte CreditAttribution: Christopher Herberte commentedThis would fix also #421078 postgres compatibility issue. Adding _UID certainly would make the function less complicated although there would need to be a check for existing usernames for sites already using the module.
Comment #6
scottgifford CreditAttribution: scottgifford commentedHrm, so you're saying that maybe I create an account for "user@example.com that is assigned the UID "123", so username "user_123" is generated. Previously a user has signed in and already has the name "user_123". I hadn't thought about that; we don't have to worry about it on our site because we have used this patch since the beginning.
Maybe the easiest solution is to just lock the tables, then. Adding a new user is a fairly infrequent event, and the tables will just be locked for a short time.
Comment #7
gregglesDrupal now has a semaphore system that we could potentially use instead of the UID.
Comment #8
gregglesUsing the uid seems like a decent idea so we avoid the slowish semaphore system.
Attached is a patch to do this. I didn't want to do it in the uniqueness check nor the email_registration_cleanup_username because those have a specific purpose and this seems like it would be an unwanted side-effect in the use of those functions.
Comment #10
greggles#8: 423920_prevent_race_with_uid.patch queued for re-testing.
Comment #11
gregglesWhoops. I committed that patch accidentally so it stopped applying - see https://drupal.org/node/1996664#comment-7823177
Comment #13
gregglesOK, that's a valid failure now though probably just the test needs to be updated.
Comment #14
gregglesahem.
Comment #15
gregglesHere's a reroll/rethink.
1) this should be in the overridable default cleanup function
2) it should be keep the functions compatible with any current usage (e.g. I work on a codebase that uses the function directly without passing in uid)
3) for some reason $uid was in a function where it wasn't used - removed that
Comment #17
gregglesOK, let's try this.
Comment #19
gregglesOne more try - somehow I missed the fact that $uid was being used in the unique check.
Yay testing!
Comment #20
gregglesAnd status to get it tested.
Comment #22
gregglesGreat, fixed!
I don't plan to backport things to 6.x. If someone cares about that please reroll the patch and change the status to needs review.
Comment #24
joachim CreditAttribution: joachim commentedIs there any way to turn this off? My clients are complaining about the appended uid :(
Comment #25
gregglesOf course! implement hook_email_registration_name and then this code gets skipped. Copy and paste what you like.
I'd also be open to making this optional if you want to write a patch for that instead.
Comment #26
joachim CreditAttribution: joachim commented> Using the uid seems like a decent idea so we avoid the slowish semaphore system.
Is the semaphore system that slow? Surely user registration is an event that doesn't need to be optimized for speed -- we're doing a load of DB inserts anyway.
And while I can write a patch to add an option to turn off the uid suffix:
a) it means anyone turning it off is then vulnerable to the race condition problem, which at least one person in this thread thinks is unwise even for small sites (plus I don't know what is meant by 'small'!)
b) it means that we still have the really ugly appending of the uid enabled by default, and the only available option for sites that are large enough to be concerned about the race condition
Can we rethink this?
Comment #27
gregglesIn my experience the semaphore system is slow, yes. I guess that was using the default mysql-based lock which is slower than something like a redis/memcache lock, but most sites do use the default one.
I don't find the uid on the end to be particularly "ugly". I also believe that most people using this module have hidden the username on their site and are using something like https://drupal.org/project/realname
There's a proposal which addresses this in a slightly different way at #2273265: An interface to configure username. Maybe that interests you?
Comment #28
joachim CreditAttribution: joachim commented> In my experience the semaphore system is slow, yes
That's fair enough, but my point is, does performance matter for a process such as user registration? It happens relatively rarely (compared to things like page loads and user logins), and it's already a slow process as it involves a lot of DB writes.
> I don't find the uid on the end to be particularly "ugly".
My client's words, not mine! :)
On my project, we're just telling users to log in with the first part of their email address -- due to #657472: Add setting to allow users to login with email address or username. If they now have a UID appended, then that's no longer true, and they have to be aware of what their username is -- which this module is meant to hide!
> There's a proposal which addresses this in a slightly different way at #2273265: An interface to configure username: An interface to configure username. Maybe that interests you?
The thing is that that won't fix the race condition if you configure it to not append the UID -- you'll hit the problem that this issue was meant to tackle.
PS. In case it's not clear, I'm volunteering to work on changing this to use the semaphone system :) -- should I reopen this or file a new issue?
Comment #29
emjayess CreditAttribution: emjayess commentedcaught me off-guard (no I don't always read release notes :P); do find it unsightly; did not have realname in place... but expediting that now; thanks for the documentation! :)
Comment #30
andypostTests are fixed in commit 4278/57f, needs port to 8
Comment #31
estoyausentePorted to D8, please test it.
Comment #32
estoyausenteComment #34
estoyausenteI ported it but I can't reproduce because I can't create two user at time, only have 1 mouse ^^
But... if it's a condition to create a 8.0 version we must to fix it :P
Comment #35
andyposttabs should be replaced with spaces
not properly ported
Comment #36
andypostRe-roll and fix nits, not sure it's ok to change it but no usages in contrib found
- http://grep.xnddx.ru/search?text=email_registration_cleanup_username
- http://grep.xnddx.ru/search?text=email_registration_unique_username
Comment #37
andypostunderscore expected
Comment #38
gregglesNeeds work for the test fails.
Comment #39
gregglesThis also changes the API, so is worth an extra big release note on that.
Comment #40
Grevil CreditAttribution: Grevil at DROWL.de commentedComment #41
Grevil CreditAttribution: Grevil at DROWL.de commentedThis can be still reproduced in 2.x, in "hook_ENTITY_TYPE_presave()" through commenting out
$new_name = email_registration_unique_username($new_name, (int) $account->id());
in line 58 in the email_registration.module.This is quite the edge case, but should be solved nonetheless. I am not a big fan of adding a UUID at the end of the username. Could this eventually solved by executing an SQL transaction?
Comment #42
joachim CreditAttribution: joachim commented> Why not just replace the @ with an underscore and put the whole email address in. There shouldn't be any collisions at all that way, plus every user name could be computed from their email address without added information.
> There is a privacy issue with revealing the user's entire email address as their username, if their username is ever visible.
Alternative idea: instead of appending the uid, or the email address, append a short hash of the username. That way we get something (probably!) unique, but we don't need the uid, and we don't expose the email address.
See https://roytanck.com/2021/10/17/generating-short-hashes-in-php/ for instance for creating short hashes.
Comment #43
AnybodyGreat idea @joachim! Anyone willing to implement that?