With an approach finally settled in #898816: Consider using real names and/or e-mail addresses for the author/committer metadata?, we need a small, custom module that collects the necessary information from d.o users. Much of what that module will need to capture is described in #898816-15: Consider using real names and/or e-mail addresses for the author/committer metadata?, but I'll recap here.

The module must gather the following data from current d.o CVS account holders:

  • If they'd like to use the pseudo-email [username]@[uid].no-reply.drupal.org (e.g., I would be sdboyer@146719.no-reply.drupal.org). This is the default.
  • Or, if they'd like to use a real email address. If so, they should be able to select between their primary d.o email address or any email address registered with the Multiple Email Addresses module.
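To make the two choices above concrete, here is a minimal Python sketch (not the module's PHP). The function names `pseudo_email` and `choose_email` and their parameters are hypothetical; only the address format comes from the description above.

```python
def pseudo_email(username: str, uid: int) -> str:
    """Build the default anonymized address: [username]@[uid].no-reply.drupal.org."""
    return f"{username}@{uid}.no-reply.drupal.org"


def choose_email(username, uid, primary, extra_emails, requested=None):
    """Return the address to record for an account.

    With no explicit request, the pseudo-email is used (the default).
    A requested address must be the pseudo-email, the primary d.o address,
    or one of the additional registered addresses.
    """
    default = pseudo_email(username, uid)
    if requested is None:
        return default
    allowed = {default, primary, *extra_emails}
    if requested not in allowed:
        raise ValueError(f"{requested!r} is not registered for this account")
    return requested
```

For example, `pseudo_email("sdboyer", 146719)` gives the address quoted in the first bullet.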

I don't really care if this module form_alters the account registration form we see at http://drupal.org/user/!uid/edit/cvs, or if it defines its own form on the user account - whatever's easiest for getting the data.

What makes this a bit more complicated is the data storage requirements. Ordinarily, it'd be fine to just store this on $user->data or something, but there are a few requirements that make it a little funky:

  • We need to ensure that there's an entry for every CVS account holder, including those who haven't ever touched this form, and that the entry for them is the default, anonymized email. That probably means a hook_enable() implementation which fills a db table with all the initial values.
  • We need to be able to make arbitrary additions to the table and be assured that they're going to stick around. For example, Ken Rickard changed his CVS username somewhere waaay back from 'agentken' to 'agentrickard', and it'd be nice if we could manually capture that. Of course, that'd be data we manually enter, no need to accommodate it in the UI.
  • We need to be able to write this data to a flat file on disk, so that the migration scripts can use it easily. The way to do this that'll work best for our infra is creating a drush command that dumps all the data into the flat file. Just gotta make sure the drush command takes a path to the desired output file location as a parameter.

I'm figuring this module will probably need a table with no more than three fields - uid, name, email. That's all we really need. Though if uid is made a PK, it'll complicate scenarios like the agentken/agentrickard one above...well, figure it out :P
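One way to picture the storage requirements is the Python/SQLite sketch below. It is only an illustration under assumptions: the table name `cvs_migration`, the composite key on (uid, name), and the uid 42 in the usage note are hypothetical, and the real module would use Drupal's schema API rather than sqlite3. Keying on (uid, name) instead of uid alone is one possible answer to the agentken/agentrickard case.

```python
import sqlite3


def create_table(conn):
    # Three fields only; the key is (uid, name) so one uid can own several names.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS cvs_migration (
               uid   INTEGER NOT NULL,
               name  TEXT    NOT NULL,
               email TEXT    NOT NULL,
               PRIMARY KEY (uid, name)
           )"""
    )


def fill_defaults(conn, accounts):
    """Insert the anonymized default for every (uid, name) pair.

    INSERT OR IGNORE leaves existing rows alone, so manual additions and
    user-chosen addresses survive a re-run.
    """
    conn.executemany(
        "INSERT OR IGNORE INTO cvs_migration (uid, name, email) VALUES (?, ?, ?)",
        [(uid, name, f"{name}@{uid}.no-reply.drupal.org") for uid, name in accounts],
    )
    conn.commit()
```

A manually entered row such as `(42, 'agentken', ...)` next to `(42, 'agentrickard', ...)` is then legal, and running `fill_defaults` again does not overwrite either row.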

Note that I'm assigning this to neclimdul because he's the one primarily working on the migration scripts right now, but we're hoping for a volunteer on this one. So if you wanna do it, feel free to reassign to yourself :)

Comments

Jonathan Webb’s picture

Are the anonymized email addresses based on values from {versioncontrol_accounts}? For example would the rough idea for the initial data population be:

INSERT INTO {versioncontrol_migration} (`uid`, `vcs_username`, `mail`)
SELECT `uid`, `vcs_username`, CONCAT(`vcs_username`, '@', `uid`, '.no-reply.drupal.org') AS mail
FROM {versioncontrol_accounts}

Would this be a D6 module?

Josh The Geek’s picture

Drupal.org runs D6, so yes.

Jonathan Webb’s picture

Assigned: neclimdul » Jonathan Webb

Assuming I'm on the right track with the query above, I should have some code ready for testing by tomorrow afternoon.

marvil07’s picture

{versioncontrol_accounts} is probably going to disappear (see #983926: Remove account class), but that data will probably live on in versioncontrol_account_status.

Anyway, the data should be written by an independent module, and naturally you cannot assume that it lives there (versioncontrol is not on d.o now).

The data about accounts is now in the {cvs_accounts} table (see the cvslog project). The PK in that table is the "cvs_user" field, so, @sdboyer, it seems it already tracks multiple accounts per user :-)

Jonathan Webb’s picture

Thank you for the info, Marco!

I've posted a preliminary version at: https://github.com/webbj74/cvsmigration

[removed verbose status info --JW]

So far it appears to be operating as desired. I should have the Multiple Email Address functionality in time for the stand-up tomorrow afternoon.

Jonathan Webb’s picture

Status: Active » Needs review

This module is ready for review: https://github.com/webbj74/cvsmigration

What it does presently:

  • When enabled it goes through {cvs_accounts} and associates anonymized cvs_user@uid.no-reply.drupal.org email addresses with all of the cvs_user entries that don't already have a repository mail associated with them.
  • Adds "Repository Email" select item to the cvs_user_edit_form; options include:
    • User's d.o email address
    • List of all repository emails for the user (i.e. allows the 'agentken' to 'agentrickard' situation).
    • List of all emails for the user from Multiple Emails Module (if enabled).
    • Anonymized cvs_user@uid.no-reply.drupal.org email (if it isn't already listed)
  • Drush command to download a flat-file
    • Command: drush cvsmigration-export [filepath]
    • Example CSV Output:
      "cvs_username1","example1@999.no-reply.drupal.org"
      "cvs_username2","example2@999.no-reply.drupal.org"
      

Tried to upload an example of the csv file output, but d.o is not allowing it.
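For readers following along, the "Repository Email" option list described above could be assembled roughly like this. This is a Python sketch, not the module's actual PHP; the function name and parameters are hypothetical, and the ordering simply follows the bullets above.

```python
def repository_email_options(primary, repository_emails, multiple_emails, anonymized):
    """Build the ordered, de-duplicated list of selectable addresses.

    The anonymized address is appended last and only if it is not
    already present among the earlier entries.
    """
    options = []
    for address in [primary, *repository_emails, *multiple_emails, anonymized]:
        if address and address not in options:
            options.append(address)
    return options
```

Passing an empty list for `multiple_emails` models the case where the Multiple Emails module is not enabled.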

Josh The Geek’s picture

Name the file blah.txt and it should allow it. Shouldn't the output include the uid?

neclimdul’s picture

@josh the geek technically it does, in the fake email address. I don't see how having it separately would help me.

Josh The Geek’s picture

@neclimdul OK.
The code looks good, but I don't have a CVS repo to test it with.

sdboyer’s picture

This is really fantastic. I've tested it out, and it appears to all work quite swimmingly. There are just a couple, pretty small issues:

  • It's actually the Drupal username, not the cvs username, that we want to use in generating the anonymized email address. These anonymized email addresses will still be possible for people to use post-migration (when the idea of a CVS username will be a non sequitur), so it doesn't make sense to base them on CVS usernames now.
  • There was so much discussion & focus on the email address in #898816: Consider using real names and/or e-mail addresses for the author/committer metadata? that I think I completely forgot to mention - we actually need another piece of user information to properly fill up git data. This one's (relatively) trivial, though - we need the contents of the full-name field that d.o has. On any $user object retrieved via user_load() on d.o, that'll be in $user->full-name. Yeah, I dunno what's up with that CLEARLY invalid property name, but whatever...you can access it by transforming $user into an array or by assigning a var with the value and then using it to access the property (e.g., $var = "full-name"; $fullname = $user->$var - though note that, VERY frustratingly, I was only able to access it if I did $var = "full-name". $var = 'full-name' consistently failed.). So, I dun much care how it gets implemented - maybe the drush command loads the user account and stitches it in, or maybe we add another column to the {cvs_migration} table.

I'd SWEAR there's a third quick thing, but I can't remember it to save my life. I've made a pull request with changes that cover item #1: https://github.com/webbj74/cvsmigration/pull/1

Regardless, this is close enough that I intend to demo it tomorrow.

Jonathan Webb’s picture

Thanks for the feedback. After I pull in your changes, I'll update the .install to also perform the change based on Drupal user name (since that is where the emails are created en masse).

I may modify the set of invalid email username characters to be a little more conservative "just in case" (right now invalid chars are converted to underscores). I'm just a little concerned about the liberalness of the characters allowed in Drupal usernames (such as apostrophes) and whether they translate into actually valid email addresses. I currently use the same regular expression as Drupal's valid_email_address, but this expression isn't a perfect match to RFC 2822.

Also, I think it makes the most sense to add $user['full-name'] in the Drush stage. So I will make that change as well (and I will include the uid in the output for good measure).

Jonathan Webb’s picture

Updated: https://github.com/webbj74/cvsmigration to include sdboyer's changes, and some minor mods.

The drush output has 4 fields: "d.o uid","d.o Full Name","CVS Username","Email". Example (q.v. attached zip):

"1","Jonathan W. Webb","jonathan","developer2011+cvsmigration@jwwebb.info"
"21","Test\", \"CSV Injection","testuser","trasiwitucl@21.no-reply.drupal.org"
"24","shuswucl","shuswucl","shuswucl@24.no-reply.drupal.org"

In the event that the full name field is not populated, drush will fall back to $user->name. I use addslashes() to clean the "Full Name" field.
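As a point of comparison, here is a small Python sketch of the same four-field export with the full-name fallback. It is not the drush command itself, and the dictionary keys are hypothetical. Note one deliberate difference: Python's csv module escapes an embedded double quote by doubling it, rather than with a backslash as addslashes() does, so whatever reads the file has to expect the same convention.

```python
import csv


def export_rows(users, out):
    """Write uid, full name, CVS username and email as fully quoted CSV."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for user in users:
        # Fall back to the account name when the full-name field is empty.
        full_name = user.get("full-name") or user["name"]
        writer.writerow([user["uid"], full_name, user["cvs_user"], user["mail"]])
```

Writing to an open file handle works the same way as writing to an in-memory buffer, which keeps the output path a parameter of the caller.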

I also reduced the set of valid email address characters to [a-zA-Z0-9_\-\+]+. If $user->name has chars outside of this set, they are converted to underscores.
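A rough Python equivalent of that substitution, for illustration only (the function name is hypothetical, and the module's own PHP expression may differ in detail):

```python
import re

# Anything outside a-z, A-Z, 0-9, underscore, hyphen and plus is replaced.
_INVALID_LOCAL_PART = re.compile(r"[^a-zA-Z0-9_\-+]")


def sanitize_local_part(name: str) -> str:
    """Convert each disallowed character in a username to an underscore."""
    return _INVALID_LOCAL_PART.sub("_", name)
```

So a username containing an apostrophe or a space keeps its length, with each such character replaced by a single underscore.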

Josh The Geek’s picture

You should probably reduce the full name to a-zA-Z0-9_-+ too. The slashes really mess up Quick Look on the Mac. I'll fork your repo and fix this. My GitHub username is JoshTheGeek.

Josh The Geek’s picture

See my pull request at https://github.com/webbj74/cvsmigration/pull/2 . It also allows spaces.

Jonathan Webb’s picture

I merged this change to the drush output, but modified the regular expression to strip just control chars and quotes (rather than transform them to underscores). Thanks!
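Stripping (rather than replacing) could look like the Python sketch below. The exact character class used in the module isn't given in this thread, so this assumes ASCII control characters plus single and double quotes; the function name is hypothetical.

```python
import re

# Assumption: "control chars and quotes" means ASCII 0x00-0x1F, 0x7F, ' and ".
_STRIP_FROM_NAME = re.compile(r"[\x00-\x1f\x7f\"']")


def clean_full_name(name: str) -> str:
    """Remove control characters and quotes; everything else is kept as-is."""
    return _STRIP_FROM_NAME.sub("", name)
```

Spaces and other punctuation pass through untouched, so a name such as "Jonathan W. Webb" is unchanged.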

sdboyer’s picture

Status: Needs review » Reviewed & tested by the community

This all looks good to me. Let's get this module up, make it a real thing. Once we fix up multiple_email, we'll get it on d.o.

eliza411’s picture

Issue tags: +git sprint 7

Tagging for Git Sprint 7 with the intention of having this poised for launch right at the beginning of Sprint 8.

Jonathan Webb’s picture

I will create a project on d.o and upload it to CVS.

Jonathan Webb’s picture

Project CVS Migration Prefs has been added. I imported the project history from github using git-cvsexportcommit (not perfect, but was able to maintain credits for the contribs by sdboyer & joshthegeek). Dev snapshot is pending. Thanks for the assistance everyone!

marvil07’s picture

Status: Reviewed & tested by the community » Fixed

Josh The Geek’s picture

Status: Fixed » Postponed

Shouldn't we set this to fixed after it's launched on d.o?

eliza411’s picture

Status: Postponed » Fixed

The module is created and hosted on d.o, which satisfies the issue. When and how it is deployed will be accounted for in a different issue as part of the specific deployment planning process.

Status: Fixed » Closed (fixed)
Issue tags: -git phase 2, -git low hanging fruit, -git sprint 6, -git sprint 7

Automatically closed -- issue fixed for 2 weeks with no activity.