All Git commits carry committer and author metadata in the form “John Doe <email>”. Currently, the commits imported from CVS are stored in the form “mikl <mikl>”, i.e. with the d.o username as both name and e-mail address.
When people start committing directly to Git, it will change to have the details the user provides in his .gitconfig. That will obviously cause a mismatch between the two.
Most Git sites match commits to users via the e-mail address. So if I commit as mikkel@example.com and that is also one of my registered e-mails, commits will be credited to my user account.
I think we currently have both name and e-mail available on d.o, and in my opinion the best thing to do would be to simply use those for the metadata, but if that is not an option, I’d like to suggest that we at least change the e-mail to something more unique, like username@git.drupal.org or something, so we’ll be able to distinguish the ported commits and figure out which user they belong to.
| Comment | File | Size | Author |
|---|---|---|---|
| #23 | git-workflow.png | 86.16 KB | webchick |
Comments
Comment #1
mparker17
I don't particularly like the idea of my personal e-mail address floating out there in the wild for anyone to find. I get enough spam as it is. I'd much rather that people who want to contact me use the personal contact form I've enabled on my Drupal user account page.
I like the way that Github identifies users: by an SSH key. That way, I can set `user.email` in my `.gitconfig` file to be "mparker17@work" or "mparker17@home" or whatever. I'd also be happy with keeping my `user.email` set to username@git.drupal.org
Comment #2
rfay
I guess I think that "realname <email>" is the natural git way to do this, but I don't blame people for not liking that. Too many people would be annoyed by it for us to do it.
mparker17's "username@git.drupal.org" would work fine.
Comment #3
mikl
Or perhaps uid@git.drupal.org, since people can change their names – so in my case 58679@git.drupal.org
Comment #4
mparker17
That sounds like a great idea to me!
Comment #5
bojanz commented
I understand the reasoning behind using the UID, but what most people look for is the username itself. It needs to be shown somewhere.
Bojan Zivanovic / bojanz@git.drupal.org sounds cool (although I have no problem with using my real email, already do it on GitHub)
Comment #6
rfay
And I don't think it's that horrible that usernames may change over time. It's happened even with core committers, and it hasn't ruined anything. username++, username@git.drupal.org++
Comment #7
webchick
I like username@git.drupal.org as well. :) And no worries on it not syncing with the Drupal.org username; our current CVS ones don't either.
Comment #8
webchick
Marking as "needs review". I have a feeling this is related to #782408: Decide on an approach to recording commit statistics (and thereby, standardize the recommended merge workflow).
Comment #9
webchick
Comment #10
sdboyer commented
Some of this has already been discussed in #716300: Formatting of ~/.gitconfig and git user data. This conversation is, to some extent, a duplicate of that one. I'll leave it open for the moment, though, and tag it.
My goal in this discussion is to walk the line between protecting privacy and following the general patterns used in the wider git community (which is to provide a real email address, though typically an intentionally public-facing one).
To be clear, while doing direct repository browsing will reveal email addresses, regardless of the solution we decide on here we will have author information point to d.o profiles on the equivalent to http://drupal.org/cvs .
Comment #11
mparker17
This does make me feel better. Brief clarification though: when you talk about e-mails being revealed through direct browsing above, does this include browsing the web interface at http://git.drupal.org/ (which redirects to http://git.drupalcode.org/ at the time of writing) or would someone see e-mail addresses only if they were to `git clone` some part of the repository?
Comment #12
mparker17
Random, crazy, off-the-wall idea that might satisfy both sides of this discussion: what if username@git.drupal.org forwarded to the e-mail address in our user profile but also respected the hourly threshold of Drupal's per-user contact form (i.e.: "3 submits per hour" rule)?
Comment #13
eliza411 commented
Tagging for consideration in git sprint 5
Comment #14
sdboyer commented
Gonna wrap this up in sprint 5.
Comment #15
sdboyer commented
After some further reflection and a clearer plan on how versioncontrol will actually handle mapping raw git user data to d.o users (short version - we'll have a pluggable, extensible system for performing the mapping, so we can implement as many different mapping strategies as we want), I've come to a solution that should satisfy all camps, and not require a ton of coding. Let me start by running through the relevant considerations:
`user.email` data must be some sort of real, unique, verifiable identifier of a person. A proper email address is preferable, but not strictly necessary. What really matters is that we can take that raw data and map it to a d.o user account.
... `user.email`. And, where reasonably possible, we should adopt common practices of the wider world. More concretely: a d.o user profile link could never, ever resolve to a user account on github, but an email address can.
It's very VERY important to understand that even if commits do contain real email addresses, at NO point will those emails EVER be visible over any d.o-based HTTP offerings. Spambots would have to clone a git repository and look at raw git data to get emails.
With all that in mind, then, here's my plan:
As I said, I think this proposal is adequate and ready to roll. And if I don't hear any big, whiny objections to it in a couple days, I'm gonna mark this fixed and we're gonna roll into implementation :)
Comment #16
sdboyer commented
Oh, and, sorry - username@git.drupal.org is not a good idea for a few reasons:
A profile link has none of these problems, and thus is vastly preferable.
Comment #17
webchick
Hm. I actually think we need to pick one single way and apply it to everyone. Otherwise, reading commit messages will get extremely hairy, as will displaying data from commit messages (if $user->email is a URL, print it this way, else if it's an email address, print it that way). Providing choices also means additional coding, additional documentation to explain the differences and pros/cons between the two, etc. Ick.
I guess I'm confused why we can't do the Multiple Email addresses thing, since Git natively works with email addresses, and no matter which e-mail is associated with a commit, replace it with a link to their user profile once commits are pushed. That protects contributors' e-mail addresses and ensures that the most important data about a person on d.o -- their d.o profile -- remains the primary way of identifying them.
Or do I miss a cluestick?
Comment #18
sdboyer commented
First question is whether you're talking about "reading commit messages" as in the output of `git log`, or if you're talking about reading the commit messages that appear on d.o. WRT the output of git log, I actually tend to find the 'name' portion to be what I look at to quickly identify who made a commit, not the email, and I think that's what most people do. What people put for their name is entirely up to them (we won't be using it at all for the purpose of mapping), but I suspect most people will invest some effort in keeping it consistent.
As for output on d.o, here's the cluestick :) vcapi keeps 'author_uid' and 'committer_uid' fields in its records of commits - foreign keys to `{users}.uid`. These fields contain the result of the mapping logic, which is pluggable & extensible - and therefore, can handle the variety of different possible types of `user.email` strings from git. Mapping logic is run and these fields are populated when the commit data is initially read in (and later re-run on cron for commits that failed to map to a known user). The only thing that Views ever looks at is these uid fields - the endpoint of the mapping logic. All Views ever has to do is turn a uid into a link to a profile.
Comment #19
dww
Choice is inevitable here, since people can configure their Git clients however they want. Granted, we could make an arbitrary rule that says "only if your Git email address [sic] is really a link to your d.o user profile will we associate the commits with your d.o account", but a) that's not going to mean everyone's going to pay attention to the rule and b) it doesn't necessarily make it any easier to code this stuff (unless we go out of our way to build an inflexible hard-coded system from the start, which would be rather silly).
That said, a 1 week window is *way* too short to expect all CVS account holders to get the email, read it, understand the implications of the choices they have, and make their decision. For example, I could easily be offline for more than a week at a time over the next few months, and then I'd miss my chance. Assuming there are no major objections to the kind of plan Sam spelled out above, I think we need to move forward on the multiple addresses module and radio button ASAP to give existing CVS users more time to make this choice.
Cheers,
-Derek
Comment #20
mikl
I agree with Derek here. Getting this done could take a while, and I do think it's necessary. The commits I make with Git get tagged with my real e-mail, and if those I've done with CVS don't get the same, it will be kinda confusing when the Git history is used elsewhere, like on Github.
Comment #21
webchick
"Choice is inevitable here, since people can configure their Git clients however they want."
Sure, but they won't abuse user.email to put a d.o profile address in there unless we explicitly tell them to do this. And we shouldn't. We should just tell them to put in whatever they'd normally put there, and resolve it to a d.o user profile link when it's output on the site (which is what Sam says).
So why offer them a choice to abuse user.email for something it's not intended for, and which would be totally bizarre and d.o-specific?
Comment #22
webchick
I mean, if I'm concerned about people not knowing my email, I can always stick yourmom@mailinator.com in there. But it should still be an email address. The setting is called user.email. :)
I still feel like I'm missing something here, so I guess I'll request an after-talk on our call today.
Comment #23
webchick
In other words, here's how I would expect this to work:
Drupal.org would cross-reference the e-mail address associated with the commits with its "multiple mail" table, and return the user ID, which VCAPI stores. Then on commit message views, it does theme('username').
And I guess if you try and perform a push and the multiple email looker upper doesn't find your email, it kicks back an error and directs you to your user profile.
No choices. No documentation. No complicated explanations about identity. No weird timeboxing of having to make some kind of major decision. We just deal with it. And a person's username/d.o profile remains their primary means of identity on the site.
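A minimal sketch of the lookup and push check described in this comment, written in Python rather than Drupal's PHP. The `multiple_mails` dictionary and the functions `lookup_uid` and `validate_push` are hypothetical stand-ins for the "multiple mail" table and the push-time check; they are not actual VCAPI or Drupal.org APIs, and the sample row is purely illustrative.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the "multiple mail" table: e-mail -> Drupal uid.
# The row below is illustrative sample data only.
multiple_mails: Dict[str, int] = {
    "mikkel@example.com": 58679,
}


def lookup_uid(email: str) -> Optional[int]:
    """Return the uid registered for this e-mail, or None if it is unknown."""
    return multiple_mails.get(email.strip().lower())


def validate_push(commit_emails: List[str]) -> List[str]:
    """Return the e-mails in a push that do not map to any known account.

    An empty list means every commit in the push could be credited.
    """
    return [email for email in commit_emails if lookup_uid(email) is None]
```

If `validate_push` returns a non-empty list, the push could be rejected with a message pointing the user to their profile, as suggested above. Whether rejecting is actually desirable is a separate question for this thread.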
Comment #24
mikl
#23: I think the choice is mainly concerned with what is to be done with all the CVS commits when converted to Git. Should they have user.email set? And if so, to what address?
Comment #25
marvil07 commented
The diagram is pretty clear :-), thanks!
Yep, "Multiple email lookup thinger thing" is #979040: Make pluggable the process of mapping of raw vcs data to Drupal users.
The only problem I see is what we are supposed to show when author_uid or committer_uid is 0 (not mapped; this can happen if someone does not include the mail/whatever-we-map-to). It is actually going to happen on #970244: Create a views handler to map operation author/committer to their drupal user. I mean, if we just show the plain VCS data for author/committer, it would follow the backend data, which means whatever the VCS user has put in his/her git configuration.
Comment #26
webchick
Re: #24, I think we just use Drupal.org's users.mail field. I don't quite see why we can't do this, if the following is true:
Or, if we want to be especially paranoid in preparation for said future smart spambots, we can simply make all incoming records map to http://drupal.org/user/XXXX as Sam said. Done and done.
I'm still not quite understanding why we need to offer users the choice on how to map their old data. It seems like picking the way we deal with legacy commits is a policy change firmly under our control. VCAPI doesn't care one way or the other, because it has the uid association, which is the only thing that matters in terms of associating "karma".
#25: IMO you stop this at the push level with some validation hooks that check for a condition where it can't resolve someone's user.email value to something in the multiple_mails table. We reject the push and tell them to add the email address XXX@XXX.XXX to their Drupal.org user account under the "Multiple mails" tab.
We probably also then need some UI validation so that we don't allow people to delete an e-mail address associated with their account if it's associated with one or more commit messages in VCAPI.
Comment #27
mikl
#26: Well, I am all for just sticking people's e-mail in there, but I know that there are privacy concerns here.
As for validating e-mails on push, that would be troublesome, since you may want to push commits made by someone else, and not want to associate their e-mail with your d.o account.
Additionally, if the user has not configured an e-mail address, Git falls back to one built from the Unix username and hostname, so something like dude@MacBookPro.local. Probably not something you want to associate with your d.o account either.
I think we should be as flexible as possible in this regard. Git is a new tool to people, and there is plenty of stuff to be confused about, without us adding additional complexity.
Comment #28
webchick
Ah, that's true about pulling in commits from other folks who may or may not have accounts on d.o. I hadn't thought about that.
What if we treat it like theme('username') does then? Store 0 as the uid in VCAPI, but reference whatever user.name is from Git when displaying:
"23c431a by webchick (not verified)"
I guess we need a cron job to periodically attempt to re-map these unmapped contact records then? Hm.
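A rough Python sketch of that idea: unmapped commits keep uid 0, are displayed with the Git name plus "(not verified)", and a cron-style pass retries the mapping later. The `Commit` class, `format_author`, and `remap_unmapped` are hypothetical names for illustration, not VCAPI code, and the `lookup` callable is assumed to return a uid or `None`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class Commit:
    revision: str
    author_name: str     # user.name as recorded by Git
    author_email: str    # user.email as recorded by Git
    author_uid: int = 0  # 0 means "not mapped to a d.o account"


def format_author(commit: Commit, usernames: Dict[int, str]) -> str:
    """Show the d.o username when mapped, else the raw Git name, flagged."""
    if commit.author_uid:
        return f"{commit.revision} by {usernames[commit.author_uid]}"
    return f"{commit.revision} by {commit.author_name} (not verified)"


def remap_unmapped(
    commits: Iterable[Commit],
    lookup: Callable[[str], Optional[int]],
) -> int:
    """Cron-style pass: retry the mapping for commits still at uid 0.

    Returns the number of commits that were newly mapped.
    """
    fixed = 0
    for commit in commits:
        if commit.author_uid == 0:
            uid = lookup(commit.author_email)
            if uid:
                commit.author_uid = uid
                fixed += 1
    return fixed
```

Running `remap_unmapped` periodically means a commit pushed before its author registers the address gets credited once the address shows up in the lookup.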
Comment #29
marvil07 commented
That's the current behaviour ;-)
Sounds like a good idea. I mean, instead of printing the plain {versioncontrol_operations}.committer (or author), pass it to a function that extracts the name (this is naturally only for the git backend), so an analog of VersioncontrolBackend::formatRevisionIdentifier() for author/committer should be fine (already proven on views in #976136: Let backends overwrite revision field output on views using backend class).
Comment #30
mikl
#28: As for the (not verified) part, that may suggest that other commits with your name on them are verified to be yours. That is not so, unless you were to start GPG-signing your commits (that possibility is actually one of Git's design parameters).
There's nothing stopping me from setting my Git up to use “Angie Byron” and “angie@lolbots.com” for user.name and user.email and then having my Git actions on d.o show up as yours. That's the price of decentralisation, so we should probably not do anything to indicate that these data are completely reliable.
Comment #31
sdboyer commented
Quite literally exactly my plan :) Really, all of it. The handler, and the cron job.
Comment #32
sdboyer commented
@webchick #898816-26: Consider using real names and/or e-mail addresses for the author/committer metadata?
Maybe I wasn't clear - people are making the choice about how their old commits get mapped, but this is ALSO a choice they'd be free to make going forward. If someone (like say mparker17 in #898816-2: Consider using real names and/or e-mail addresses for the author/committer metadata?!), doesn't ever want an email address to go floating anywhere - and that is really, REALLY well within their right - then they can choose to have their old CVS commits use the profile link, AND make all their new git commits use the profile link as well. It'll still work.
@webchick #898816-21: Consider using real names and/or e-mail addresses for the author/committer metadata?
Lemme put this in no uncertain terms: I believe that neither I, nor anyone else, have the right to take personal information (an email address) given to me with a particular set of expectations about where it is publicly displayed, then make the arbitrary choice to start displaying that information in some other public manner. I am bound, at least ethically and possibly legally, to consult them prior to playing with their privacy - and that consultation is what I'm proposing here.
If you accept all that, then it's just a question of what the most intuitive alternative is going to be - and I explained why I prefer the profile link in #898816-16: Consider using real names and/or e-mail addresses for the author/committer metadata?. And since we can't make it just work for legacy commits, the approach gets grandfathered in.
@dww #898816-19: Consider using real names and/or e-mail addresses for the author/committer metadata?
You're absolutely right, it was stupid to even put that there. I put it there because we were originally trying to make this happen at the same time as our public unveil, but there's really just no way in hell that'll happen. Really, the better plan would be to basically let people decide right up until (a few days before) launch.
@mikl
You're quite correct. Given that the only thing one can really do is give OTHER people credit for their own work, though, my thought on this has always been that it'd be more of a nuisance than a cause for real concern. However, if it does become a problem, we'll be able to consult the push history to figure out who's actually been putting in erroneous authorship information, and deal with the situation accordingly.
Comment #33
sdboyer commented
On Friday's sprint wrap-up call, Angie convinced me/us that we'd be OK using the pseudo-email (e.g., sdboyer@no-reply.drupal.org). Of the points I raised in #898816-16: Consider using real names and/or e-mail addresses for the author/committer metadata?, the only true blocker is the possibility of user name changes, as it causes a potentially nasty data inconsistency issue. We can get around that one of two ways:
So we get to respect git standards, have consistent logs, and respect people's privacy. woot!
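To make the "pluggable mapping" idea from earlier in the thread concrete, here is a small Python sketch in which each strategy takes the raw Git `user.email` string and returns a uid or `None`. Everything here is an assumption for illustration: the function names, the sample data (including uid 1001), and the exact string formats are hypothetical and are not the versioncontrol implementation.

```python
import re
from typing import Callable, Dict, List, Optional

Mapper = Callable[[str], Optional[int]]

# Hypothetical sample data standing in for d.o tables.
USERNAMES: Dict[str, int] = {"sdboyer": 1001}
REGISTERED_MAILS: Dict[str, int] = {"sam@example.com": 1001}


def map_pseudo_email(value: str) -> Optional[int]:
    """username@no-reply.drupal.org -> uid, via the current username."""
    m = re.fullmatch(r"([^@\s]+)@no-reply\.drupal\.org", value, re.IGNORECASE)
    return USERNAMES.get(m.group(1).lower()) if m else None


def map_profile_link(value: str) -> Optional[int]:
    """http://drupal.org/user/XXXX -> uid taken straight from the URL."""
    m = re.fullmatch(r"https?://drupal\.org/user/(\d+)/?", value)
    return int(m.group(1)) if m else None


def map_registered_email(value: str) -> Optional[int]:
    """A real e-mail address registered on the account -> uid."""
    return REGISTERED_MAILS.get(value.lower())


# Strategies are tried in order; adding one means appending to this list.
MAPPERS: List[Mapper] = [map_pseudo_email, map_profile_link, map_registered_email]


def map_to_uid(value: str) -> int:
    """Return the first uid any strategy yields, or 0 for 'not mapped'."""
    for mapper in MAPPERS:
        uid = mapper(value.strip())
        if uid is not None:
            return uid
    return 0
```

Note that `map_pseudo_email` resolves through the current username, which is exactly where a rename would cause the data inconsistency mentioned above; the profile-link strategy does not have that dependency.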
Comment #34
Josh The Geek commented
See related #992802: Create a module to capture the migration preferences of current d.o CVS account holders.