I've been playing around with sql-sanitize for a while and I've found myself executing UPDATE users SET init = mail; almost every time after --sanitize-email.

I'm aware that in some cases, like when you're using bakery, users.init doesn't contain an email address but I think it's the edge case and probably drush can provide a command modifier for such cases.

So should --sanitize-email modify users.init?

I think at least a mention to init field in the help/docs would be useful.

Best regards

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

moshe weitzman’s picture

Title: Shouldn't sanitize-email update users.init field? » Sanitize-email update users.init column
Category: support » bug

Yes, I agree that we should sanitize this.

douggreen’s picture

Version: » All-versions-4.x-dev
Component: SQL » User Commands
Status: Active » Needs review

Attached is a simple fix. It involves two additional passes on the {user} table. But I thought this was preferable to some fairly messy SQL, that would need to be different for MySQL and Postgres.

I don't think I set the version right. This is a patch for 6.0-dev.

douggreen’s picture

FileSize
1.2 KB

Attached is an updated version that (a) makes sure we are sanitizing emails, and (b) goes a step further and sanitizes user names that look like emails.

douggreen’s picture

FileSize
1.2 KB

... And fixing a message to be consistent ("email" instead of "mail").

greggles’s picture

Version: All-versions-4.x-dev » 7.x-5.8

I think this is a better version.

greg.1.anderson’s picture

This looks like a good idea. If users change their email address, though, then init != mail, and init will not be sanitized. Would it make more sense to just invariantly set init to mail after sanitizing mail?

greggles’s picture

It's really hard to say what the "right" thing is to do. If you use bakery and want to sanitize a subsite then init will contain http://example.com/user/1/edit - is it appropriate to clear out the init column in that case?

I think #4 attempts to retain some data if that's appropriate, but as #6 points out that might leave some emails exposed. I think it would be best to just munge the init column 100% and/or move that to http://drupal.org/project/paranoia

douggreen’s picture

Yes, the init = mail check does not catch everything. But it will catch quite a bit more that we don't current catch. So, while it's not perfect, it is better than what we have.

The other option is to write a LIKE to find email addresses. But as soon as we do this, someone will need to test the SQL on Postgres.

This will look for email addresses like foo@bar and ignore it if it's within a serialized string.

UPDATE users SET init = 'sql-sync' WHERE init LIKE '%@%' AND init NOT LIKE '%:%';

FYI, any module that does something different with init should implement hook_drush_sql_sync_sanitize.

greg.1.anderson’s picture

Status: Needs review » Needs work

I think that LIKE '%@%' AND init NOT LIKE '%:%' is good. It is unfortunate that bakery and other modules overload the users init field, but I suppose there is nothing we can do about that here.

douggreen’s picture

FileSize
1.23 KB

re-rolled with new WHERE clause ....

moshe weitzman’s picture

I think we should just set init to the new value of mail and be done with it. We don't need an extra query just for init. if bakery module is too lazy to add own column or join to own table, then those sites will need to write own --my-sanitize routine (if they actually care abut preserving the init column).

douggreen’s picture

FileSize
1.48 KB

@moshe, are you also against the user name UPDATE?

Attached is a patch that just sets init, but also resets name's that look like email addresses.

douggreen’s picture

Status: Needs work » Needs review
moshe weitzman’s picture

Status: Needs review » Needs work

We should leave username alone IMO.

douggreen’s picture

FileSize
808 bytes

Ok, attached patch fixes only the init issue per Moshe's request/suggestion.

I'm a bit frustrated that the additional sanitation code isn't making it into drush. For anyone else interested in this, here is the custom code to do the additional sanitation, just put this in your own sample.drush.inc.

/**
 * Implements hook_drush_sql_sync_sanitize().
 */
function sample_drush_sql_sync_sanitize($source) {
  drush_sql_register_post_sync_op('stratfor-user-name', dt('Reset email based names in user table'), "UPDATE users SET name = mail WHERE name LIKE '%@%';");
  drush_sql_register_post_sync_op('stratfor-user-init', dt('Reset customer init in user table'), "UPDATE users SET init = NULL;");
  drush_sql_register_post_sync_op("stratfor-vars", dt('Reset variables'), "UPDATE variables SET value = NULL WHERE name IN ('cron_key')");
}
douggreen’s picture

Status: Needs work » Needs review
greggles’s picture

@douggreen - I feel your frustration, a similar discussion happened in #1565686: make drush sql sanitize more thorough which is part of why I suggested earlier that you make your work a patch to paranoia. It would be totally appropriate there to mess with usernames. I hope you will make a followup to paranoia.

moshe weitzman’s picture

Status: Needs review » Fixed

Committed to 5 and 6. Thanks.

greggles’s picture

I created #1884338: more sanitizing - usernames/emails in paranoia so we don't forget it.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.