Disclaimer: I really appreciate all the work drupal core developers do. This is about suggesting ideas for the benefit of all.

Problem/Motivation

Since we moved to git, we have a lot more flexibility about how to accept changes for the project. Originally, on CVS, each maintainer was responsible for commit messages, merge strategies and tagging.
Now that we are using git, those aspects have changed, and now the responsibility is sometimes shared by authors.

Workflow

With Git, we can decide on how to include the changes into branches(instead of just use commit on CVS).
This is related to the workflow(s) we are using. Currently, we seem to be using two workflows:

  • one for normal issues: Centralized, applying the patches manually(avoiding git-am when possible, see the attribution section too).
  • and one for initiatives: Dictator and Lieutenants, merging from public initiative sandboxes.

It's OK to use more than one workflow, but we should use the same conventions for commit messages and merge strategy levels.

Merge strategy

About the centralized workflow: it does not make sense to have merge commits(e.g. f356dc2), because there is no really original information in commits.
These merges are probably auto-generated when pull'ing before push, and in that case it makes sense to just rebase the changes(e.g. do git fetch, git rebase origin/8.x). In general it is a bad idea to make merges without reason.

About the lieutenants workflow:

  • It does make sense to have merge commits from initiative sandboxes to upstream.
  • Initiative sandboxes should not merge from upstream unless needed(code added upstream needed for the initiative), see references for details.
  • Initiative owners should ensure they receive a clean history, see references for details.

References:

Commit messages

Historically, the drupal project has enforced coding standards, and I remember a lot of issues being marked to needs work because of empty spaces, or typos on documentation.
Also, a lot of the history of drupal core code embraces the commit messages handbook page.

So, it seems like as a community, we support the idea of paying attention to detail.

But, recently, mainly from initiative merges, we have included a lot of commits that do not conform with the commit messages handbook page.
It's not only about the attribution and issue linking, but instead about the contents of the commits, they include trivial changes such as 4a8adb2(those should be kept private when possible). Think i.e. is that commit really one that should be at the side of 0275068 where attribution is shared?

I have also noticed that some commit messages start with "- #nid" instead of "Issue #nid", I guess that's an old version of dreditor on a pusher maintainer?

So, we need to decide if we are going to use commit messages handbook page for all commits, or we just drop it in favor of any other convention(s). Please notice this includes commits merged from initiative sandboxes.

Attribution

On drupal core issue queue, we usually have many people participating on one issue, so we usually want to make attribution for many people on one commit.

On CVS, we were forced to include attribution on the commit message, because there was no VCS supported way to do it.
Now with Git, we can use real attribution, but sadly only for one author and one commiter.

Some projects like the linux kernel or git itself, uses special strings at the end of the commit message for that propose, maybe we should consider using the same.
sdboyer also propossed another way using git notes.

Any solution we choose for this, we need to be consistent for every commit added upstream(also all commits coming form initiative merges).

See #2323715: [policy, no patch] Determine format for commit credit for individuals/organizations/customers for a possible implementation.

Tags

Git provides two types of tags: simple and annotated, the last one can also be gpg-signed.

Currently we are using simple tags.
Annotated gpg-signed tags are usually the recommended for real release tags, mainly for verification.

Should we start using gpg-signed annotated tags?

Proposed resolution

(description of the proposed solution, the rationale behind it, and workarounds for people who cannot use the patch)

Discuss about the three topics to agree on conventions.

Remaining tasks

  • Receive feedback about the three points
  • Agree on conventions
  • Public conventions as a handbook page on documentation.

Comments

dww’s picture

I think this is a really important issue, although I can't bump it to major in good consciousness. Given how completely obsessed we are with the fine details in every aspect of the code (including comments), it shocks me how sloppy and apparently unconcerned we are about the history of the changes. To me, this is immensely valuable information. When debugging a problem, git blame and git bisect are two of your best friends, but unless the history is clean and coherent, these tools are much less useful, and their output doesn't necessarily bring you to the right issue where the decisions that led to the buggy code were actually made. After spending hours, sometimes 100s of them, on a given change, we leave it up to the over-worked core maintainers to construct the right commit message, which they often do totally improv and rushed, they instantly commit / push with 0 input/review from the community, and then it's locked that way forever. (Note: I know webchick often asks in IRC for help on the commit message, but it's inconsistent and she doesn't always have time for that nor get the help she deserves).

As marvil07 points out, this occasionally sloppy history is compounded many times via merging from initiative sandboxes, where there's even less attention to the detail of the history itself (and divergent standards/practices for maintaining the history).

As I wrote almost exactly 1 year ago on g.d.o I think it *must* be the responsibility of sandbox maintainers and/or initiative leads to ensure clean proper history that can be safely merged into core. We can't possibly expect the core maintainers themselves to clean up the messes in the sandboxes.

In terms of the commit message itself, a very simple improvement might be to include a section in the recommended issue summary template where everyone participating in an issue can be constructing a proper, well-formatted, clear, concise commit message to accompany the issue. That would be useful both for our patch workflow and issues that involve merging commits from a sandbox. We should let the patch authors and issue reviewers/participates help draft and refine the commit messages -- ensures some peer review, should hopefully save the core maintainers time, and help improve the overall quality and consistency of the commits. Of course, the sandboxes themselves also need good commit messages (and the right granularity of commits) for those merges not to introduce crap into the upstream history. But even still, considering the commit message part of what has to get written for a real issue (in addition to the code, comments, potentially change notice, etc) seems like a win all around.

I know some people advocate the "commit early, commit often, even trivial small things, and don't waste your time with issues for every change" approach. Okay, fine. Do that on your local clone. You can even push that dreck to your sandbox if you must. But don't ask a core maintainer to merge it into core. Make a new branch where you consolidate "fixed stuff", "fixed more stuff", "got it working", "sort of", "whoops", and "try this!" into a single "Issue #1234567: Implemented the frobulator." commit that actually tells someone looking at git blame output what happened, why, and where to read the details. If you say "but the detailed history is really important" then don't delete your sandbox once it's merged and reference the "raw" branch where all your flailing was happening in your clean commit/merge message. But don't force me to wade through dozens of your incoherent changes unless I actually need to crawl that deeply into your brain to figure out WTF you were thinking. ;)

Attribution is a harder problem since Git natively only supports a single author and committer. I don't think the Linux approach of "signed-off-by" conventions at the bottom of the commits is the way to go, nor the "Issue #X by Y, Z" we do now. In both cases, you can't fix it after the fact. Our convention is particularly problematic since it eats up valuable space in what should be the very short 1-line (e.g. ~50 chars) summary of what changed. I think sdboyer's proposal for git notes is the only sane way to actually do attribution right given our situation. However, I honestly think that deserves to be spun out into a separate issue (which can be linked from here) since it's a hefty conversation to have on its own.

Bottom line: the history of when/why/how we're changing our code is at least as valuable as the latest version of the code itself, and yet as a community, we rarely value this history and give it much attention at all. I strongly support changing this and fostering a culture where clean history is nearly as important as clean code. The good news is it's a *lot* easier to keep the history clean than the code itself, so long as people understand the tools we're using. ;) And if we can make any changes to our tools to make it easier, I'm ready to help.

Thanks,
-Derek

catch’s picture

In terms of branch merges, this is a duplicate of #1462472: [policy] Define standard for merging in branches from core sandboxes.

On the centralized workflow, it does not make sense to have merge commits(e.g. f356dc2), because there is not really original information on the commits.
This merges are probably auto-generated when pull'ing before push, and in that case it make sense to just rebase the changes(e.g. do git fetch, git rebase origin/8.x). In general it is a bad idea to make merges without reasons.

I think this one is "remind Dries to do git pull before committing patches".

Some projects like the linux kernel or git itself, uses special strings at the end of the commit message for that propose, maybe we should consider using the same.
sdboyer also propossed another way using git notes.

I'd quite like to move to something like this, and completely agree we should use one method of attribution and stick to it (at least for things like contributor stats), and that author of commit messages won't work for us.

jhodgdon’s picture

Component: documentation » other
Issue tags: +coding standards

Although this is not really "coding" standards, it is a process standard... and it's not documentation. So tagging to trigger that people who are interested in standards and processes can find this issue, and changing component.

Crell’s picture

A couple of thoughts in no particular order:

Core committers should get in the habit of doing a pull --rebase. That does a pull but a rebase of the local commits instead of a merge, resulting in the same end result as if you had pulled first before committing anything locally. This is a "safe" use of rebase.

I largely agree with Linus's stance, but not entirely. There are two issues.

One, it assumes that I'm working in a vacuum. On almost anything interesting I am pushing my branch regularly to a sandbox of some sort, for simple backup purposes if nothing else. But usually, I want to collaborate with others. Collaborating with others means rebasing becomes a very bad idea. In fact, rebasing a branch and then pushing the rebased branch is something that d.o actively prevents you from doing. There are ways around it, but they're highly annoying. We may need to give up on that "safety check".

Two is the definition of "when necessary". The kernel branch for WSCCI, for instance, has a LOT of merges from upstream; more than I want, really. Generally they happened for one of two reasons: Some code changed upstream that I needed (affecting something in the kernel's line of site), or because I want to roll a patch for testbot. The latter is an artifact of the problem that there is really only one viable way to test a change to Drupal, and that is to post a head-compatible patch to an issue and let testbot handle it. That requires merging from upstream first to generate a viable patch at all. While that could be done in a temporary side branch, that is a bigger hassle and also can lead to misleading results; very often there are merges that need to be manually resolved when merging from upstream, which in turn change the very code I'm testing. That manual merge really should be done in the mainline kernel branch, not in a throw-away branch, because we'll just have to redo it anyway.

So for initiatve and other sandbox work, making testbot work for arbitrary branches in sandboxes is a prerequisite for not abusing Git merges. Given how extensive the changes are that we're talking about in some cases, I simply cannot maintain a tidy history and get validation that we haven't just broken the entire system.

For downstream branches, I have been strongly encouraging people to use feature/issue branches in the WSCCI sandbox that I can merge. That way Git tracks their credit for me. I am, frankly, not all that concerned about commit counts so "does this change deserve 2 commit objects or 3" is not something I lose sleep over. Lately I have been rebasing feature branches, then ff-merging them into my mainline, then deleting the feature branch. I also sometimes remove or squash intermediary commits if I subjectively think they make sense. (Some developers would commit something, then commit a revert, then do it a different way. I'd generally rip out both the original commit and the revert.) That keeps the history reasonably linear aside from the upstream syncing that I cannot avoid, as above.

If we can figure out a way to handle the upstream syncing better than we do now (which I agree entirely results in a messier history than we want, but I cannot avoid right now), we'll be in a much better place.

marvil07’s picture

Adding a new tag because of the attribution part of this issue.

marvil07’s picture

marvil07’s picture

Issue summary: View changes

Fixed language-proper typos up to half of the article.

marvil07’s picture

Link to possible implementation ticket for attribution.

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.