Spin-off from #1321540-83: Convert DBTNG to namespaces; separate Drupal bits
Problem
- Ugly Drupal core commit log/history after merging in a larger complex branch from a sandbox.
Goal
- Actively support and endorse merging of larger/complex changes from core sandboxes.
- Retain a clean commit log/history for Drupal core.
- Define a standard for merging in branches from core sandboxes.
Details
- The commit log/history looks very awful after merging in #1321540: Convert DBTNG to namespaces; separate Drupal bits. Drupal core's commit history makes little sense due to this merge now. It's close to impossible to determine and review the individual commits that belong to this change(set).
- The common practice is to prepare a to-be-merged branch in a separate merge branch upfront.
Read a high-level description of this in Rebasing and merging: some git best practices and the often referenced Linus Torvalds' direction on clean history. - That dedicated merge branch is rebased onto the latest core branch. Hence,
- all change commits are known to be based on the latest code (which may be the case here now, but we factually don't know)
- the git commit log/history is clean; i.e.:
- all commits from the merged branch cleanly appear after each other
- no upstream merge commits from the sandbox
- Additionally, pull request branch maintainers are asked to clean up the merge branch during the rebase; e.g.,
- combine multiple related commits into one (avoiding plenty of
"Add more/fix use statements"
commits) - squash other needlessly verbose commits
That is, because such commits are completely stupid and pointless with regard to the long-term history. Simply have a quick look at node.module's history now — next to a plethora of senseless upstream merge commits of the sandbox, you also see commit messages like
"Add more use statements and class renaming."
, which are completely missing context and thus don't make sense in terms of Node module's history. - combine multiple related commits into one (avoiding plenty of
- If there's any need for retaining the original branch for downstream repos/branches, then that branch remains to be unaffected.
We should really follow that practice of prepared merge branches, because, in fact, it made more sense to review the patch file in the issue, which ultimately contradicts the merge of the branch (and thus, it would have made more sense to commit the patch instead). A properly prepared merge branch avoids all of these issues.
Comments
Comment #1
xjmClosed #1462712: Discuss best practices for managing core merges as duplicate. Here's what I said over there:
Rebasing the main sandbox/merge branch is probably not a good idea because it alters the actual history of the branch development.
However, if we are going to do more of these merges in the future, here's the workflow I'd suggest:
This would also apply to individual developers making a self-contained change. The change might include multiple commits in that developer's branch, but the developer would squash it into a single commit when merging back into the main sandbox branch.
Comment #2
sunNo, the rebase changes the history of the (temporary) branch to be merged upstream only. The original development branch(es) can happily remain as is.
To roughly clarify what to do, once the sandbox branch is ready:
After that,
bigchange
still contains the unmodified original development branch containing the entirety of the history, including upstream merge commits, individual commits, and everything else. In short: Everything inbigchange
remains as-is.However,
bigchange-merge
is completely rebased against latest HEAD of Drupal core, and can be further cleaned up and modified as needed, before the branch is merged into core.After
bigchange-merge
has been merged into core, if needed,bigchange
can simply merge in upstream again to take over the new history. Its actual content/history doesn't matter at all, because if there's any other dependentotherbigchange
topic branch in the sandbox that will be merged into core, then that will be equally branched and rebased into anotherbigchange-merge
, too.Comment #3
xjmI'm still not sure I agree about rebasing the main branch against 8.x before the merge. What about commits that are in response to changes in core? If you rebase, suddenly those commits are out of context.
Comment #4
Crell CreditAttribution: Crell commentedOne huge squash merge also offers zero advantages over just having a patch. You get no history, no context, no nothin'. The history looks no different than having a patch file, so we gain nothing. That doesn't mean we shouldn't rebase out some commits, but we should not destroy the entire history.
We should also consider that making a "tidy" merge branch and then merging it is extra steps for the branch/patch submitter and committer need to synchronize on. That could easily add an extra week to the time-to-commit of an issue as that gets sorted out. That harms development velocity.
I will also note that the "log doesn't make sense" issue is only true if you're using a linear log view. If you're using a branch-aware log view (gitk, gitX for Mac, or just git log --graph), you get a more accurate representation of the history. Really, a linear log for a non-trivial git project makes no sense in the first place unless you're just looking at a small part of recent history.
Comment #5
xjmTo clarify, I did not mean "one huge squash merge"--I meant many small squash merges into the main sandbox branch, and then a single full merge from that branch into core HEAD.
Edit: E.g.:
Get squashed into:
Which gets merged into:
I should note this is similar to the workflow we use at work, so I'm partial. :)
Edit: Also, it sounds like Crell is thinking this is something that would be done at the end of development. I'd disagree with any proposal along those lines. I am saying a small-scale part of the daily workflow that involves bundling fixing one's whitespace and typos in something with the work on that thing before one foists that commit off on other devs. :)
Comment #6
neclimdulrebasing destroy's history. end of story. dates are changed, order is rearranged, history is changed. If you're on a private branch and preparing a patch or even a series of patches this destruction is sometimes acceptable. With large changes it is not acceotabke, this is how weird regressions happen.
While a straight chronological view can get a /little/ weird with merges, they are actually more clear because they show when actual development was happening and tools like gitk, giggle and the others crell mentioned can help you inspect when, where and why regressions happened which would be completely impossible with a rebased branch.
As far as having 2 branches for changes, this doesn't actually help it, if anything it makes it worse because there's no way to actually tell that to resolve some commits' history you actually need to look on a different branch to get the full history. As I said before, history is lost.
edit: fix quotes
Comment #7
gddI pretty much completely agree with everything Crell said. I'm not necessarily against rebasing out some pointless commits, but on the other hand I do think it is a little bit of a waste of time on the part of the maintainers managing those sandboxes. Git inherently lends itself to small atomic commits, and I'm going to be less likely to do them if I know I'm going to have to go through and clean them all out again later. This doesn't mean that perhaps we shouldn't all become a little more cognizant of our commit messages and cleanup commits (I haven't really been giving that much thought to be honest.) I completely agree that one squash-merge is a total waste. Note that at the very beginning of the initiative process, I opened this topic up for discussion here:
http://groups.drupal.org/node/148184
A lot of that discussion is still directly relevant.
I plan on having CMI merged into core. The 20-odd contributors to the branch deserve full credit and I want to see them get it.
Comment #8
Damien Tournoud CreditAttribution: Damien Tournoud commentedWhat we need to learn here is how to build patchsets.
The history of the trial and error of the developer working on a patch has absolutely zero value. On the other hand, splitting a big change into separate smaller changes that make sense in isolation and can be reviewed separately has a lot of value.
In that particular case, I don't see how a proper patchset would have been done (it's basically moving a lot of code around), but at the minimum the merge history of the development branch should have been removed:
Comment #9
neclimdulI have a hard time disagreeing with anything Damien said. We definitely as a community don't understand managing development branches or auto merge commits. Education and probably some helpful process guidelines for getting a clean merge to core are inline. Maybe we're all arguing the same point and disagreeing on the semantics. Thanks Damien.
Comment #9.0
Damien Tournoud CreditAttribution: Damien Tournoud commentedUpdated issue summary.
Comment #10
cyberswat CreditAttribution: cyberswat commentedI use the --no-ff flag to accomplish something similar to this type of grouping. It gives us the ability to quickly evaluate a group of merges for a feature and keep them grouped together so they can be targeted efficiently. Each developer uses a regular merge as work is happening with the release manager using a merge --no-ff. This has proven to be invaluable where I work.
Comment #11
webchickWe've merged from sandboxes a few times now, and no one's bumped this issue in almost a year, I'm going to go out on a limb and say it's not "major" anymore.
It would be good to get this wrapped up one way or another, though.
Comment #12
webchickAlso, feels like documentation to me.
Comment #13
neclimdulSeems realistic though I'd like to see more interest in things like this. I feel like passing up and ignoring things like this leads to things like this https://twitter.com/Dries/status/288682798636756993 not happening.
disclaimer: I'm not in a position to really complain as I'm in as much of a time crunch for community efforts as anyone else I just wanted to voice my opinion on the importance.
Comment #14
marvil07 CreditAttribution: marvil07 commentedBacklink to #1583840: Define conventions around drupal core git interaction, which consider merges and also other aspects on git interaction.
Comment #14.0
marvil07 CreditAttribution: marvil07 commentedUpdated issue summary.
Comment #15
mgiffordOk, so what are the todo's to get this done? This has been inactive for almost a year.