Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
I've been playing around with sql-sanitize for a while and I've found myself executing UPDATE users SET init = mail;
almost every time after --sanitize-email.
I'm aware that in some cases, like when you're using bakery, users.init doesn't contain an email address but I think it's the edge case and probably drush can provide a command modifier for such cases.
So should --sanitize-email modify users.init?
I think at least a mention to init field in the help/docs would be useful.
Best regards
Comment | File | Size | Author |
---|---|---|---|
#15 | 1827954.patch | 808 bytes | douggreen |
#12 | 1827954.patch | 1.48 KB | douggreen |
#10 | 1827954.patch | 1.23 KB | douggreen |
#4 | 1827954.patch | 1.2 KB | douggreen |
#3 | 1827954.patch | 1.2 KB | douggreen |
Comments
Comment #1
moshe weitzman CreditAttribution: moshe weitzman commentedYes, I agree that we should sanitize this.
Comment #2
douggreen CreditAttribution: douggreen commentedAttached is a simple fix. It involves two additional passes on the {user} table. But I thought this was preferable to some fairly messy SQL, that would need to be different for MySQL and Postgres.
I don't think I set the version right. This is a patch for 6.0-dev.
Comment #3
douggreen CreditAttribution: douggreen commentedAttached is an updated version that (a) makes sure we are sanitizing emails, and (b) goes a step further and sanitizes user names that look like emails.
Comment #4
douggreen CreditAttribution: douggreen commented... And fixing a message to be consistent ("email" instead of "mail").
Comment #5
gregglesI think this is a better version.
Comment #6
greg.1.anderson CreditAttribution: greg.1.anderson commentedThis looks like a good idea. If users change their email address, though, then init != mail, and init will not be sanitized. Would it make more sense to just invariantly set init to mail after sanitizing mail?
Comment #7
gregglesIt's really hard to say what the "right" thing is to do. If you use bakery and want to sanitize a subsite then init will contain http://example.com/user/1/edit - is it appropriate to clear out the init column in that case?
I think #4 attempts to retain some data if that's appropriate, but as #6 points out that might leave some emails exposed. I think it would be best to just munge the init column 100% and/or move that to http://drupal.org/project/paranoia
Comment #8
douggreen CreditAttribution: douggreen commentedYes, the init = mail check does not catch everything. But it will catch quite a bit more that we don't current catch. So, while it's not perfect, it is better than what we have.
The other option is to write a LIKE to find email addresses. But as soon as we do this, someone will need to test the SQL on Postgres.
This will look for email addresses like foo@bar and ignore it if it's within a serialized string.
FYI, any module that does something different with init should implement hook_drush_sql_sync_sanitize.
Comment #9
greg.1.anderson CreditAttribution: greg.1.anderson commentedI think that LIKE '%@%' AND init NOT LIKE '%:%' is good. It is unfortunate that bakery and other modules overload the users init field, but I suppose there is nothing we can do about that here.
Comment #10
douggreen CreditAttribution: douggreen commentedre-rolled with new WHERE clause ....
Comment #11
moshe weitzman CreditAttribution: moshe weitzman commentedI think we should just set init to the new value of mail and be done with it. We don't need an extra query just for init. if bakery module is too lazy to add own column or join to own table, then those sites will need to write own --my-sanitize routine (if they actually care abut preserving the init column).
Comment #12
douggreen CreditAttribution: douggreen commented@moshe, are you also against the user name UPDATE?
Attached is a patch that just sets init, but also resets name's that look like email addresses.
Comment #13
douggreen CreditAttribution: douggreen commentedComment #14
moshe weitzman CreditAttribution: moshe weitzman commentedWe should leave username alone IMO.
Comment #15
douggreen CreditAttribution: douggreen commentedOk, attached patch fixes only the init issue per Moshe's request/suggestion.
I'm a bit frustrated that the additional sanitation code isn't making it into drush. For anyone else interested in this, here is the custom code to do the additional sanitation, just put this in your own sample.drush.inc.
Comment #16
douggreen CreditAttribution: douggreen commentedComment #17
greggles@douggreen - I feel your frustration, a similar discussion happened in #1565686: make drush sql sanitize more thorough which is part of why I suggested earlier that you make your work a patch to paranoia. It would be totally appropriate there to mess with usernames. I hope you will make a followup to paranoia.
Comment #18
moshe weitzman CreditAttribution: moshe weitzman commentedCommitted to 5 and 6. Thanks.
Comment #19
gregglesI created #1884338: more sanitizing - usernames/emails in paranoia so we don't forget it.