Problem

  • Even with a cap on the number of open critical/major bugs, the time to release can get very long.

Goal

  • Lower the issue queue thresholds over time in an attempt to shorten the code freeze.

Details

  • We introduced issue queue thresholds, but there's no guarantee that we won't sit at the maximum number of issues for a long period of time.
  • In fact, we had a very long period at the end of D7 with only a few critical bugs, which were insanely hard to resolve.
  • The fix for one critical bug can easily introduce two new critical bugs.

Proposed solution

Pick a rough date for a final, hard code freeze after which only straight bug fixes are accepted. This is Month 0. If possible, release the first RC at this point. Four months before that, we start a countdown:

  1. Each week, both the critical bugs and tasks thresholds get reduced by 1. (i.e., after 15 weeks, both are zero)
  2. Before we hit zero, and as long as we're under thresholds, API clean-ups, additions, and minor features can still get in. Otherwise, only bug fixes.

    Any patch which is known to lead to new critical issues (e.g., adding a new API then converting stuff to it) cannot be accepted. This means a beta could be released towards the end of the 15 weeks, since no major changes can land by then.

  3. When hitting zero, no more API changes.
  4. If we hit zero critical bugs before the thresholds get to 0, then the RC period should be much shorter.

    If we're still way over the thresholds when we get there, then this idea didn't work, but it should not have been much worse than before.
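
The countdown above can be sketched in a few lines of Python. This is only an illustration of the proposal, not an existing tool: the function names are hypothetical, the starting threshold of 15 is taken from the original report, and treating "at the threshold" as still within it is an assumption.

```python
INITIAL_THRESHOLD = 15  # starting threshold for critical bugs and tasks (from the report)


def threshold_for_week(week, initial=INITIAL_THRESHOLD):
    """Threshold in effect `week` weeks into the countdown (week 0 = start)."""
    return max(initial - week, 0)


def allowed_changes(week, open_bugs, open_tasks, initial=INITIAL_THRESHOLD):
    """Rough reading of steps 1-3: what may be committed in a given week."""
    limit = threshold_for_week(week, initial)
    if limit == 0:
        # Step 3: once the thresholds hit zero, no more API changes.
        return "bug fixes only (no API changes)"
    # Assumption: being exactly at the threshold still counts as within it.
    if open_bugs <= limit and open_tasks <= limit:
        return "bug fixes, API clean-ups, additions, minor features"
    return "bug fixes only"


if __name__ == "__main__":
    # Hypothetical queue state: 8 critical bugs and 9 critical tasks throughout.
    for week in (0, 5, 10, 15):
        print(week, threshold_for_week(week), allowed_changes(week, 8, 9))
```

With a constant 8 bugs and 9 tasks, small features stay open in the early weeks and close once the shrinking threshold drops below the queue counts, which is the "how slushy it is depends on the thresholds" behaviour the proposal is after.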

Original report

by @catch

Have been thinking about this for a while. I haven't talked to Dries about it yet (although I've sent him a link to the piratepad version). Just to make it clear, this isn't "catch the Drupal 8 co-maintainer" posting the issue, it's "catch who lived through the Drupal 6 and Drupal 7 code freezes and just about survived".

Drupal 7 code freeze lasted 12 months if you include only freeze, 18 months if you include slush, and 24 months if you include the first 6 months or so after Drupal 7.0 where very few 8.x patches went in. However you count it, it's too long.

We have issue queue thresholds now, and they're mostly keeping the major/critical bug counts stable (and those queues triaged so it's possible to see what's in there). However at some point we need to get down from 15 critical bugs to 0. And even if there are always 15 critical bugs at any one time, it's possible to fix hundreds of them via throughput. So it's never going to be a case of fixing exactly 15 bugs then releasing, sadly.

It should hopefully be better than 300 -> 0 this time. But there's no guarantee that getting from 15 -> 0 couldn't take a long time, and it is time that is usually not much fun, when much of the rest of core development is shut down.

So we could try something like this. It's just a rough idea, but anything would be better than last time.

First, pick a rough date for a final, hard code freeze date after which only straight bug fixes are accepted, and we release as soon as possible. This is Month 0. If possible, we'd put out an RC as soon as we hit it.

4 months before that, we start a count down.

Each week, both the critical bugs and tasks thresholds get reduced by 1. Meaning that after 15 weeks, both will hit zero.

While all thresholds are under, but before they hit zero, clean-ups, API additions and minor features can still get in. If thresholds are over, tough.

Any patch which is known to lead to new critical issues (for example adding a new API then converting stuff to it), is out of bounds unless thresholds are way, way under - since we will be actively trying to reduce the number of critical bugs/tasks each week, so knowingly taking them over that would actively go against this. This means we should be able to release a beta towards the end of the 15 weeks since no major changes should be able to land then.

When we hit zero (or if we're over thresholds the whole time), it means de-facto no API changes at all - since any API change means a change notification which is a critical task. As always we have to keep open exceptions when an API change is required for a major or critical bug fix, but those are fairly rare.

If we got to 0 critical bugs before the thresholds get to 0, then we could expect a much shorter RC period. If it's still way over when we get there, then this idea didn't work but it should not have been much worse than before.

Comments

moshe weitzman’s picture

Could you elaborate on why current thresholds are an inadequate solution? I can't parse "it's possible to fix hundreds of them via throughput."

My one worry is that the period before freeze is typically a tremendously productive period for Drupal. We often get high profile features like drag+drop on block admin, garland theme (picked some oldies here - sorry), etc. It would be a shame for all our users if RTBC was in feature shutdown during this period. It kinda muddies the effectiveness of a single deadline.

catch’s picture

Current thresholds keep us at 15 critical bugs.

Once we hit 'code freeze', then we have to get from 15 to zero by some point. During that time, people could open dozens of critical bugs, so we could end up needing to fix 150 critical bugs during code freeze, to actually get from 15 to 0.

For example webchick said that over 700 critical bugs were fixed during Drupal 7 code freeze, even though the total never went over 300.
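
To make the throughput point concrete, here is a minimal simulation sketch. The weekly rates are made-up numbers chosen only to illustrate the argument, and `weeks_to_zero` is a hypothetical helper, not something that exists on drupal.org.

```python
def weeks_to_zero(open_now, opened_per_week, fixed_per_week, max_weeks=520):
    """Simulate a critical queue; return (weeks until empty, total bugs fixed).

    Returns (None, total_fixed) if the queue never empties within max_weeks.
    """
    total_fixed = 0
    for week in range(1, max_weeks + 1):
        open_now += opened_per_week            # new criticals reported this week
        fixed = min(fixed_per_week, open_now)  # can't fix more than are open
        open_now -= fixed
        total_fixed += fixed
        if open_now == 0:
            return week, total_fixed
    return None, total_fixed


if __name__ == "__main__":
    # Hypothetical rates: start at 15 open, 9 opened and 10 fixed per week.
    print(weeks_to_zero(15, 9, 10))   # (15, 150)
    # If fixes only keep pace with new reports, the count never drops.
    print(weeks_to_zero(15, 10, 10, max_weeks=52))  # (None, 520)
```

In the first run the open count never exceeds 24, yet 150 bugs have to be fixed to get from 15 to 0. That is the same effect as the Drupal 7 numbers above, where far more bugs were fixed than were ever open at once.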

So the idea is that instead of a hard freeze, and trying to get from 15 to zero in one go, we have a soft freeze (similar to slush last time), but how slushy it is depends on the state of the (ever decreasing) thresholds. That way, if we're doing really well, then we get to keep putting in (small) features and API cleanup right up until beta/RC. If we're not, then we didn't lose anything, because currently it would've been a hard freeze anyway at that point.

arianek’s picture

Huge +1 here.

I think this would go a long way to avoiding what went on the last year of D7's cycle and getting us launch ready efficiently while still allowing some changes in. It's almost like turning up the 'agile' meter from level 5 to 9 progressively before launch, so that by the time we want to be launch ready, we're fully agile - i.e. always ready to do the full official release while still working on improvements.

This would also superbly improve on morale of those whose work doesn't qualify for the kinds of extensions that some features got in D7. It was so disheartening to have to stop working on small changes while huge changes continued for another year.

As far as the work that happens in the final crunch, it will just have to happen a bit earlier. And once people get used to working like this, the release cycles should also speed up a bit so it won't be such a panic that if the feature doesn't get in now it has to wait 3 more years. I could see us getting down to even 18 month cycles with a system like this.

Aside: Scope creeping the conversation slightly... [Edit: as I'm not sure where this should actually go, people were saying not to discuss it on the other thread.]

I was skimming this Debian stuff that damien_vancouver was posting about http://wiki.debian.org/DebianReleases#Introduction and it sounds like (despite being a fairly big change in how we think about our major versions), it could make a lot of sense. (Just skimming the intro will give you a quick idea of the system.)

So, the idea that damien_vancouver was talking about is when D8 is released and D6 becomes unsupported, D7 remains the "stable" release which most end users use in production, D8 becomes "testing" which more advanced companies and those who are contributing work with, and D9 becomes "unstable" most recent version. And everything trickles down in backports as usual. It seems like a much more accurate way of naming/thinking of things to what we actually do, how the contrib lag affects the broader community, etc.

arianek’s picture

Issue summary: View changes

Updated issue summary.

sun’s picture

Issue summary: View changes

Updated issue summary.

Crell’s picture

Debian is distributed as one big "thing", with all packages. They (sort of) have control over that repo.

Core devs have 0 control over when contrib devs update modules, or decide to rewrite modules, or decide to abandon modules. Any such plan is predicated on the idea that holding back core development for a period will make contrib development move faster. There is no empirical evidence for that claim whatsoever as far as I am aware.

arianek’s picture

@crell I don't think the point was that contrib development would go *faster*. I think the point was that it would make it clearer, and easier for end users to know what version to use (partly depending on what level they're at). So.... are you totally against the idea of thinking of the major versions like described in #3? (And if so, would love to hear why, as well as your thoughts on how this might relate back to how we home in on releasability.)

webchick’s picture

I think the argument against it is that until the newest version is stable, clients don't have a reason to request it, so developers don't have anyone paying them to update modules, so modules only get ported on a "when I get around to it" schedule. Which is basically the same situation that we have each release during code freeze, when APIs are stable (more or less).

In terms of empirical evidence to support this view, see http://webchick.net/node/89. Both Drupal 6 and Drupal 7 (which actually did have an entire year of code freeze) repeated the pattern of "when I get around to it" accounting for a small handful of module porting before release, versus 6 months post-release more than doubling that number. Many of those ported modules come from client work, from both large and small sites.

We also know from http://drupal.org/node/1333898#comment-5301952 that the community (including contributed module authors) doesn't give a flying hoot about Drupal N+1 until it's stable (N.0). The installs of Drupal 7 shot from around 6500 to around 25,000 *overnight*. So in order to ensure you ever get out of a deadlock situation wrt contrib, you *have* to announce the stable version, promote the crap out of it, and get people moving to it so "trickle down" occurs.

Based on that data, releasing 8.0, then calling it "unstable" for a year, is likely only going to mean that we reach that "plateau of productivity" two years after release instead of one.

(This whole discussion doesn't really seem on-topic for the discussion of how we should handle code freeze, however. :))

sun’s picture

Contrib porting has nothing to do with this issue.

(But for the record, I need to add that it was wrong to declare a plateau of productivity while only having ~66% of contrib modules ported. That said: Wrong issue, wrong topic, wrong discussion. To explore this topic further, please create a new issue and link to it from here. Thanks!)

This issue is about making sense of issue queue thresholds during the last release cycle phases/periods.

catch’s picture

The only way this relates to the release itself is that if it went really, really well, we might be able to have a shorter beta period and a longer RC period.

Drupal 6 had a critical security bug fixed on the morning of the day it was released. Thanks git log, here it is #221072: Missing access_control on theme configuration.

Drupal 7 had a critical bug opened less than 36 hours after release #1017672: D6 to D7 update process permanently deletes comment bodies and other data, and throws fatal SQL errors, one that was actually a known issue in August 2010 but which had been forgotten about as it was the final limp towards a beta and just needed a Drupal 6 backport to get fixed #895014: All fields of a node type are lost on module disable. Now that Drupal 7 criticals will block a Drupal 8 release, we can't make that particular mistake again fortunately.

Now there's always going to be something that gets caught just after 8.0 release time when the extra 20,000 people decide to try it out, but if we're able to not have such a horrible time getting to release candidate, then we might be able to survive sitting at that stage for a bit longer.

catch’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

Updated issue summary.