Problem/Motivation

There is more discussion about this in Drupal Slack, #contribute. The links to those threads are in #90.

We have broadly three types of issues committed against core:

1. Features, introducing new functionality.
2. Tasks, changing or refactoring existing functionality
3. Bug fixes, fixing a bug with existing functionality, usually but not always without significant changes to behaviour.

For features we always require test coverage, although in general the bigger the feature being added, the harder it can be to determine whether the test coverage is sufficient.

For tasks the amount of new or changed test coverage depends on the kind of change being made. However refactoring that doesn't break tests or requires only minor changes to tests is generally assumed to have coverage.

For bug fixes we assume that if there's a bug, it must always be possible to write a test that fails without the bug fix being applied, so more or less every bug fix requires an accompanying test - there are rare occasions where multiple issues are being worked on at once, and test coverage is added in dependent issues.

While it's not the intention, this means that the test requirements for fixing a bug, are more strict than they are for refactoring or adding new features.

The types of bug we have are obviously very disparate, from single character typos in exception message placeholders to upgrade path data loss and fatal errors.

A test written for a specific bug fix will ensure it doesn't regress over time, and if the code path was otherwise completely untested it may prevent other regressions in the future, but in general it's less likely that a bug fix causes a regression than a task or new feature.

Bug fixes are sometimes complex issues where multiple code paths need testing, but sometimes they're straightforward developer mistakes that weren't caught in review, and we'd never have thought to have added explicit test coverage if they'd been spotted during patch review instead of after commit. See case studies below.

Bug fixes are also often the place where new or irregular core contributors interact with the core queue most, because they're contributing back fixes from client sites (where by definition they're manually testing the bugfixes they contribute, and will often have them applied via composer.patches.json). In general there is a high incentive to make improvements to the bug fix itself, but often neither knowledge nor incentive to write comprehensive test coverage - if you have the patch in composer.patches.json it's already fixed for you and apart from having to re-roll every year or so that might be the last time you think about it. Even if you do 8 re-rolls of the same 1-5 line bugfix, it will take you less time than 8 hours writing an rewriting a 50-line test and responding to reviews - and you might get reviews after you've already moved on from the project that needed the bug fix.

On top of this, it can be very hard to know where to add test coverage in core due to the sheer number of tests that we have, and unclear responsibilities and boundaries for when to add new assertions to an existing test, new methods on an existing test class, new test classes altogether etc. This represents a high barrier to contribution where there is an easy opt-out and quite significant consequences to the rest of the community for those issues not being fixed (possibly not for the specific issue, but in aggregate across hundreds of small bugfixes).

The is a further consequence that often the tests we require for small bugfixes aren't well scoped at all - they're one-off tests for very specific developer mistakes or simulating very specific contrib or custom code path interaction edge cases, not well structured generic tests of functionality (like the JSON:API/REST coverage) or high-impact coverage like upgrade path tests. By being extremely specific, they don't necessarily provide good coverage of the area they're testing, or they might be very tied to implementation and result in maintenance burden over time. This often leaves us with a choice of committing poor quality tests to get the bug fix applied, or ever-expanding the scope of a tiny issue to introduce better test coverage it was never the issue's fault didn't exist in the first place.

And with phpstan, test coverage is sometimes not the best cure anyway - we can look for general types of developer mistakes across core and contrib, vs. testing one specific case of it we already fixed. This is only just starting, but it gives us better options than we had in 2010.

Case study 1

#3223302: Changed Method label() in Taxonomy Terms results in Field/Name Changes is a still open issue, with a one-line bug fix, and ~100 lines of test coverage. The bug was opened in July 2021, a merge request was opened in November 2021, it was marked RTBC in January 2022, it was marked needs work in April 2022 for one change to the bugfix, and to add test coverage, tests were added, it was then marked needs work again for documentation issues with the test coverage.

The test added ensures that the specific developer mistake (calling ::label() in a getEntityLabelFieldValue() method) won't happen again for taxonomy terms. However, it does not add test coverage for other entity types in core - this would be out of scope for that issue, even though they're equally prone to a similar regression being introduced. The bug can't be reproduced with just core, you need contrib or custom code altering the term label to run into it.

Further, it's not even clear that test coverage is the best way to stop this happening again, the most likely cause would be a new entity type copying and pasting this pattern (in contrib or core) and making the same mistake, and for that phpstan seems more appropriate.

So to fix a one-line bug, we're adding a 100-line test that doesn't address the issue for other entity types or stop new entity types making the same mistake, and the bug still isn't fixed in any version of core.

My suggestion in this case would be to open a new issue 'Document and/or add test coverage/code style rules to ensure that entity label field getters don't call ::label()), and just commit the one-liner, but our current policy works against that kind of solution because it would be committing the bugfix 'without tests'.

Case study 2

#2561639: Wrong placeholder for exception shown when new fields are created during module installation fixes a typo in an exception message resulting in a placeholder not being replaced. It was opened in September 2015 with a patch. Marked needs work for tests in November 2015. Tests were written in the same month. It has gone between needs work, needs review, and RTBC for seven years due to various issues with the test coverage. It was eventually closed as a duplicate because the issue was fixed in #3258969: Wrong argument for @message in ModuleInstaller::install call to watchdog_exception, and committed without tests, meaning that all the effort on the original issue was wasted. And of course in the meantime the bug was still occurring.

Heuristics

One risk of adopting this approach is it increases the likelihood of discussions on issues about whether they need test coverage or not, which has the potential to delay commits and burn people out too. To try to prevent this, we can start with a reasonably strict set of heuristics to apply to issues, unless the answer to the majority of these is 'yes' or there is some other specific reason not to include test coverage, we'd require tests by default. see below in the proposed text.

Original report:
Writing tests is a good approach. It helps to reduce the number of bugs as we add new features. However for patches that do not bring any new functionality but fix bugs in existing features requiring tests is a destructive policy because it slows down the process of fixing bugs in Drupal core.
It is a quite typical case, when a patch for a bug is submitted and even reviewed but then it gets declined by core committer because of lack of tests. After this the issue may get stuck for a very long time (months/years). And there is a great chance the patch will never be commited. Furthermore it also happens that a fix that was already commited is reverted because of this testing policy. Which is kind of crazy because reverting a commit that fixes a bug is like deliberately commiting this bug.

Another point here is that writing tests may require different level of skills and area of expertise than fixing a bug. Drupal testing sub-system is rather complicated. At this moment it consists of 5 (?) different types of tests. On the other hand the patches for many bugs are trivial and could be processed in novice issues.

Proposed resolution

Documentation changes

1. Upload a test case that fails

Current text

Details

Bug fixes should be accompanied by changes to a test (either modifying an existing test case or adding a new one) that demonstrate the bug. This can be shown by uploading a patch with only the test changes, which Drupal.org's automated testing system should show as a fail, alongside a patch containing both the tests and the bug fix itself. If a bug fix is not accompanied by tests, there should be an explanation in the issue for why no tests are needed (usually only for trivial bugs where testing would be convoluted) before it is marked RTBC. Patches that need tests written can have the Needs tests [637 issues] tag added. If the issue might affect specific environments differently (such as a change affecting how database queries are built), the test-only patch should be tested against all supported environments in the automated testing infrastructure.

Proposed text

Details

Bug fixes should be accompanied by changes to a test (either modifying an existing test case or adding a new one) that demonstrate the bug. This can be shown by running the 'tests only' job in the gitlab merge request pipeline interface. If a bug fix is not accompanied by tests, there should be an explanation in the issue for why no tests are needed (usually only for trivial bugs where testing would be convoluted) before it is marked RTBC. Patches that need tests written can have the Needs tests [637 issues] tag added. If the issue might affect specific environments differently (such as a change affecting how database queries are built), the test-only patch should be tested against all supported environments in the automated testing infrastructure.
Add the following to the Upload a test case that fails in the details section, below the existing paragraph.

An automated test for a bug fix may not be needed if the following are true and if the majority of answers to the questions below is 'Yes'.

  1. The issue has clear ‘Steps to reproduce’ in the Issue Summary.
  2. The fix is 'trivial' with small, easy to understand changes.
Questions
  1. Is the fix is easy to verify by manual testing?
  2. Is the fix in self-contained/@internal code where we expect minimal interaction with contrib? Examples are plugins, controllers etc.
  3. Is the fix achieved without adding new, untested, code paths?
  4. Is an explicit 'regression' test needed?
  5. Is it easy for someone who did not work on the original bug report to add the test coverage in a followup issue?
  6. Does the issue expose a general lack of test coverage for the specific subsystem? If so, is it better to add generic test coverage for that subsystem in a separate issue?
  7. If this fix is committed without test coverage but then later regresses, is the impact likely to be minimal or at least no worse than leaving the bug unfixed?

Tests may still be required at the discretion of core committers.

Remaining tasks

framework manager review

Done. Update Testing core gate
Done. Create a 'Needs no tests' tag that was suggested in #93, #104.
Done. Follow up to document when an update path test is needed See #100. Followup #3438786: [policy] Document when update tests are needed.
Done. A followup has been made for the suggestions about testing #3372120: [policy, no patch] Improving the test suite

Comments

Chi created an issue. See original summary.

larowlan’s picture

I'm a solid -1 on this proposal.

Tests are your insurance policy.

I don't want to commit untested code.

The issue summary claims requiring tests slows down progress. I would claim the converse is true. So many Drupal 6 patches (even in the security queue) languished because we couldn't be sure whether the patch broke something else.

Tests are an indication of quality and are part of Drupal's value proposition. Without them, that prestige is lost.

alexpott’s picture

I recognise the problem the issue talks about. I have seen issues get stuck at needing tests. But I disagree with the solution. For one thing it really shouldn't just a core committer's job to discern that there is ample test coverage for a bugfix. It really should be part of the process of marking something as "reviewed and tested by the community". Recently I've added the following template to the committer list of response templates to try to give some resources to new contributors:

In order to commit a bug fix we need an automated to test to prove that we've fixed the bug and ensure that we don't break it again in the future. For more information about writing tests in Drupal 8 see the following links:

  1. https://www.drupal.org/docs/8/testing
  2. https://api.drupal.org/api/drupal/core%21core.api.php/group/testing/8.5.x

For me, at the end of the day unless we have a test of some sort there is no proof that the bug is actually fixed or will stay fixed.

chi’s picture

Tests are an indication of quality and are part of Drupal's value proposition.

It's too formal. I think the main indication of quality should be number of bugs. At the moment we've got 800+ bug reports tagged with "Needs tests".
https://www.drupal.org/project/issues/search?projects=Drupal+core&projec...
A significant number of them have patches.

To comply with this value proposition we have to keep all these bugs in core and make people involved into reporting/fixing/reviewing these issues frustrated.

I'd like to emphasize that this issue is only about bug reports. The sole purpose of tests supplied with bug fix is to prevent a regression. It's nice to have it but this shouldn't lock the patch for a bug from getting commited.

dawehner’s picture

I think fixing bugs has a underestimated effort, especially compared to new features. Bugs is something every site is potential affected by. Things which are "normal" tagged here can be quite major for sites, and major issues can easily end up being truely critical. Then you multiply this effect by 200.000 and you realize that bug fixes are huge!

The point you made though that we should get these bugs fixed quicker is not a bad one.

I think not writing tests though isn't the solution for getting these bug fixes in quicker. Both reviewers and committers needs to have confidence that the problem was actually understood properly. Reading the test can dramatically help with that.

The question though now is: How can make it easier to write those tests?

  • Are there any problems with writing them?
  • Can we add some additional motivation for random people to pick those tests up?
  • Can we do something like a bug fixing initiative, trying to get people involved?
chi’s picture

@dawehner, this is connected to other global d.o. problems, lack of reviewers and terrible issue management.

catch’s picture

Another way to look at this, is that when code has bugs, this means it was originally added without test coverage.

There are multiple reasons for this:

- the code was added before the test suite
- independently tested pieces of code don't work well together (i.e. missing functional tests)
- code was added via refactoring or feature development without adequate coverage.

The last one we could potentially improve on by measuring code coverage a bit more. As well as showing where code isn't tested when it's new, we could also identify untested code in core and open test-only issues to add coverage. Although code coverage tools don't measure quality of testing so there's only so far it gets you.

#2737719: EntityResource: Provide comprehensive test coverage: for every entity type, every format, every method was a good example of systematically adding test coverage orthogonally to fixing bugs.

dawehner’s picture

@dawehner, this is connected to other global d.o. problems, lack of reviewers and terrible issue management.

Right, the tests are just a symptom :)

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

damienmckenna’s picture

Another way to look at this, is that when code has bugs, this means it was originally added without test coverage.

Not necessarily, bugs can often occur because of unintended consequences that weren't thought of - you can't imagine every possible scenario when adding a change, so you add test coverage for scenarios you expect.

anavarre’s picture

@Chi I understand where you come from and have certainly experienced the same pain at times in the pre-D8 era. But now that we have Composer and can easily add patches with cweagans/composer-patches, I have less felt the urge to get something committed quickly instead of being committed correctly (with tests). Is it not a solution that works well enough (or that you find acceptable) in your Drupal projects?

My main fear is by fixing a bug without test coverage we might just have introduced another bug.

chi’s picture

composer-patchesis a great tool, but before applying the patch you have to identify the bug, search drupal.org for it, review the found patch and occasionally re-roll it.

My main fear is by fixing a bug without test coverage we might just have introduced another bug.

Typically tests for bug fixes only cover the bug is being fixed. The won't prevent introducing another bugs. That's a job for existing tests.

webchick’s picture

We introduced this policy originally in part because in Drupal 5/6 days, Form API checkboxes were broken in different ways (and sometimes the same ways) every other release. We most definitely do not want to go back to those days, especially now that Drupal core now is about 100x the size/complexity as it was back then.

At the same time, I think @Chi does bring up valid points that while fixing a bug is something all PHP developers are at least capable of doing, writing tests to ensure the bug you found is fixed is something a much, much smaller subset of all PHP developers are capable of doing, at least without investing between 2x and 1000x the time it took for them to fix the bug in the first place. (Especially now that we got rid of SimpleTest and there are at least 3-5 different decisions that need to be made about what tests to even write in the first place...)

I wonder if there's some kind of middle-ground that could be explored, like if we could generate some kind of objective heuristics to measure things like, "how likely is this bug to be reintroduced again" and "what's the impact on the product if this breaks in the future." It sounds messy though, and open to arguments, vs. "All regressions must come with tests" is a super easy rule to wrap your head around, and to communicate.

Are there any heuristic frameworks out there that other open source projects employ? The only one I've personally seen is just a straight-up dumb measurement for "test coverage %age" and you aim for that to be 100. To get there, we do in fact need test coverage for every bug.

There might also be other more "human" solutions to this problem, such as forming a "test squad" (similar to a "bug squad") for people passionate about QA, that goes around looking at the "Needs tests" queue and filling in holes.

Anyway. I think it's a valid concern. Let's brainstorm creative ideas in which we might be able to address it, without sacrificing on product quality.

dww’s picture

I totally see both sides of this one. I just spent 70 minutes investigating the cause of a weird (but major) regression in core:

#3101299-15: Install module from .zip URL fails

Once I understood what had changed, it took 5 minutes to write the fix and upload a patch (#16).

Then it took me another ~2 hours to get test coverage working that showed the bug and proved the fix (comments #17-#22). :( And I'm in the "much smaller subset of all PHP developers [that] are capable of doing [it]." :/

OTOH, the lack of solid test coverage in the Update Manager is what allowed #2908176: Deprecate archiver procedural functions to break this functionality in the first place...

Here's another critical bug in core that I've been working on, that now has a fix, but is "Needs work" because I haven't had the ~4 hours to write tests for it: #3113992: The 'Update' page has no idea that some updates are incompatible

Basically, I love and hate our test suite. ;)

I love that it does as much as it does to help ensure the quality of Drupal and prevent regressions. 10+ years ago I was on the Drupal Security team, in the era of Drupal 5-6, and every time we had a security issue we were considering, we had to collectively "click through Drupal" to try to figure out if we were about to break anything. It was a nightmare. It's a miracle we did as well as we did. I love that the existing tests have regularly prevented me from breaking things that I wouldn't have considered while I was "fixing" something else. Regularly.

Yet, I hate that it slows us down so much, that every issue runs the risk of being delayed by hours / months / years while we sort out perfect tests for it. I hate that otherwise ready issues get shot down for minuscule quibbles about indentation of comments in the test coverage. I hate that in spite of this policy and all the angst and frustration it causes, we still have totally bizzaro tests and cryptic stuff that basically no one understands. I hate how hard it is to write and debug tests (even for someone as waist-deep in the muck as I am).

Unfortunately, given our track record on getting follow-ups done and committed, I don't think we can afford to move the 'Needs tests' part to a follow-up. That just means we're "never" going to do it. ;)

Sadly, I think I have to fall on the -1 side of this specific proposal. But I deeply sympathize with @Chi's points, and am definitely in support of finding ways to:

a) Make writing tests easier (I'm not sure how).

b) Better document the existing tests (e.g. issues like #3115435: Make clear why each XML update.module fixture is created the way it is).

c) Incentivize test writing. Like cash money. ;) I wish the organizations that financially benefit so much from Drupal sponsored more test writing, test sprints, etc.

d) Prioritize (like core initiative style) efforts to rationalize, standardize and clean-up our existing tests. We've learned *a lot* as a community about tests over the last ~6+ years, but lots of our tests haven't caught up to that. The tests for Views module, for example, are completely whack. Many views handlers have dedicated Kernel tests for them (yay), but other handlers are only tested via previewing a view in the views UI with a Functional test (WTF?). Part of why writing new tests is so hard is that if you look for existing test coverage of something, it's very unlikely you'll find it, and if you do, you might find a completely absurd thing to base your new test on. :(

e) Make it easier and faster to iterate on tests in the issue queue (e.g. #2569585: Split incoming patches into psr0/psr4 tests and code and run just new tests first.)
...

catch’s picture

There are obviously tests that are more important and less important, and easier and harder to write.

Anything in the database layer, form API etc. will have wide-ranging impact if it regresses. Also in general, we already have a lot of test coverage for those components, so fixing tests is usually adding some assertions and/or extra test data and/or a new test method to an existing suite of tests.

Upgrade path tests - we know that exact code is going to run on hundreds of thousands of real sites and have terrible consequences if it goes wrong.

One-off admin forms on the other hand might have zero test coverage, which is where you get the massive time discrepancies dww describes. [#13485281] is something that either works or does not, but does not really impact the rest of the system because it's isolated to one specific form/feature.

chi’s picture

Once again, tests for bugs will not catch any possible side issues as they only cover the bug itself. So that bug fixes without tests will not make the code more unstable than it is. These bugs are already in core.

One more example. I've just spent a couple hours debugging rather curious issue. Turns out it was already reported.
#2706241: AccessAwareRouter does not respect HTTP method
It has 21 followers (they likely wasted a lot of time on this as well) and very trivial fix (3 lines of code) proposed 4 years ago.

dww’s picture

@Chi re: #18: We have bugs in Drupal because we don't have complete test coverage. Every time there's a regression, it's because other features / bugs didn't come with test coverage (or were added before we even had tests, etc). No, the required tests for fixing bug A don't solve test coverage for bug B. But if bug A had/has test coverage, fixing bug B will not cause a regression in bug A. That's the point.

chi’s picture

#re 19. I understand the intention but.. it simply does't work. Instead of bug fix + test we are getting nothing.

Another point here is that regression tests are typically not well designed because of fragmentation. Someone just asked me why it is not possible to set CSS class in views-view-list.html.twig template. That of course led to the D.O. issue that had a reviewed patch but blocked because of missing tests.
#2845400: Adding attributes to views-view-list.html.twig doesn't work if Views List class Style option is empty
The fix there is trivial but a test would be not. Curently we do not test preprocess hooks at all, if we create a test only for template_preprocess_views_view_list() it would inconsistent.

dww’s picture

@Chi: Yup. I'm coming around to your point of view. At first I was highly sympathetic but leaning the other way. Now I think I might join you. In the month since I last mentioned #3113992: The 'Update' page has no idea that some updates are incompatible that issue more or less broke my spirit about trying to improve core. I had a working fix for a critical bug in core back in Feb 26. 1 month, ~50 comments, ~25 patches, and 3 core releases (all with this known critical bug!) later, I finally had "perfect" test coverage that the core team was willing to commit. It's a miracle that bug is now fixed and committed. I nearly gave up multiple times. If I didn't feel some personal responsibility towards the Update Manager (having been the original author), I would have walked away and we'd probably still be shipping versions of core with a nice select all checkbox to crash your site, while various folks bickered over the perfect comments for the tests. *sigh*

catch’s picture

So we already don't require test coverage for constructor param deprecations. This is because at first we weren't adding those deprecations at all, then we started adding them as a best effort bc layer.

Deprecating the constructor param is only an if statement and a @trigger_error(). Testing the deprecation would require at least an entirely new test method. Also test coverage we know we'd have to remove again in the next major core version.

Bug fixes are slightly different because at least the test coverage remains permanently (at least as long as the feature being tested does), but #2845400-5: Adding attributes to views-view-list.html.twig doesn't work if Views List class Style option is empty is an example of where I don't think we necessarily need to add test coverage with the patch.

If we look at the patch:

diff --git a/core/modules/views/views.theme.inc b/core/modules/views/views.theme.inc
index 4866d1c..8b73bcb 100644
--- a/core/modules/views/views.theme.inc
+++ b/core/modules/views/views.theme.inc
@@ -801,6 +801,7 @@ function template_preprocess_views_view_unformatted(&$variables) {
  */
 function template_preprocess_views_view_list(&$variables) {
   $handler  = $variables['view']->style_plugin;
+  $variables['list']['attributes'] = new Attribute();
 
   // Fetch classes from handler options.
   if ($handler->options['class']) {
@@ -808,7 +809,7 @@ function template_preprocess_views_view_list(&$variables) {
     $class = array_map('\Drupal\Component\Utility\Html::cleanCssIdentifier', $class);
 
     // Initialize a new attribute class for $class.
-    $variables['list']['attributes'] = new Attribute(['class' => $class]);
+    $variables['list']['attributes']->addClass($class);
   }
 
   // Fetch wrapper classes from handler options.

There is no new code path added here, it's just a modification of an existing code path. So any implicit test coverage we have of template_preprocess_views_views_list() is going to test as much or as little as it currently does.

Preprocess functions, like constructors, also aren't part of our backwards compatibility promise as such - what they do is arbitrary and can be altered by contrib modules etc., the resulting $variables array counts as a data structure. So the test would tell us that the preprocess function is doing things a certain way, but it would not tell us that it's doing things correctly.

Also any potential regression is going to be limited to preprocessing of views list views, it's not in the views API, or the routing system, or Twig integration, or similar places where even an obscure regression could end up having a big knock-on effect in contrib and real sites.

However on that specific issue, #2845400-9: Adding attributes to views-view-list.html.twig doesn't work if Views List class Style option is empty was suggesting adding generic test coverage in a follow-up, and could have put the issue back to needs review or RTBC, or anyone else could have done the same in the intervening three years. So I'm not sure the issue was really blocked on test coverage - it might be the reason @alexpott didn't commit it that specific day, but it's not the reason it stayed stalled for three years afterwards. There are also a couple of additional points made in subsequent comments that may or may not require changes to the patch or their own follow-ups. Even if we explicitly don't require test coverage for every bug fix, there will still need to be a discussion about whether test coverage is necessary, and that will sometimes take longer than writing the test would have.

Another case where we don't have 100% test coverage, but could really use it, is upgrade path tests. We test the update functions themselves, but we tend to only have a database dump for a specific version.

For example:
1. A database dump first created with 8.0.0 and progressively upgraded and dumped to 8.8.0
2. A new install of 8.8.0
3. U[dating from 8.8.0 to 8.9.0, then 8.9.0 to 9.0.0
4. Updating from 8.8.0 to 8.8.5 (and every other patch release), then to 9.1.0

etc. etc.

We ran into this lack of test coverage in #3098427: Manipulating the revision metadata keys before running the BC layer breaks the BC layer. However manually creating these tests would be an impossible effort, and requiring critical upgrade path fixes to wait on it would mean real sites losing data in the meantime, so we rely on testing a specific known quantity of upgrade path but not every possible route a site can take (and it's impossible to then add in contrib modules and custom code that occasionally break our assumptions).

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

catch’s picture

Another example of an issue where tests were IMO optional or even not particularly desirable: #3266525: MINIMUM_SUPPORTED_PHP is less than MINIMUM_PHP. It only took a week to get in, but all of that week was reviewing the (non-trivial, for a trivial bug) test coverage.

joachim’s picture

I think #3223302: Changed Method label() in Taxonomy Terms results in Field/Name Changes is another example.

It's a one-line fix, and it needs a whole ton of code and an expensive functional test. The fix is just 'getName() should return just the name field, and not piggy-back the label() method'.

catch’s picture

Yes I think #3223302: Changed Method label() in Taxonomy Terms results in Field/Name Changes is a good example. Commented there since it brings up other questions too.

catch’s picture

Title: [policy no patch] Stop requiring tests for bug fixes » [policy no patch] Better scoping for bug fix test coverage
Category: Task » Plan
Issue summary: View changes

I've rewritten the issue summary here to try to narrow slightly the scope of this issue and make it a bit less alarming in general.

catch’s picture

Issue summary: View changes
danflanagan8’s picture

Here's an example of an issue that didn't need tests: #3126746: LayoutBuilderHtmlEntityFormController breaks decoration

That's a case where a drupalcs or rector rule would be more appropriate. If a class name ends in FormController does it extend FormController?

Instead, I added an expensive functional test that even came with its own test module.

Most cases aren't as cut and dry as this one, but it would be awesome to have some heuristic to go to.

catch’s picture

Issue summary: View changes
catch’s picture

Issue summary: View changes
catch’s picture

Two more examples:

#2895933: EntityReverse will always use the 'standard' join plugin because of an undefined variable $def - would be/is caught by phpstan.

#3266341: Views pagers do math on disparate data types, resulting in type errors in PHP 8 - missing cast that's only an issue on PHP 8.1 - potentially also catchable by phpstan if it can determine the original variable is a string.

In both cases these issues might highlight lack of test coverage, but adding test coverage for those changes is definitely slowing down fixing the actual functional bugs/PHP 8.1 compatibility, where phpstan would be enough to stop a regression.

catch’s picture

A one character fix for a placeholder in an error message, open for seven years due to missing test coverage then reviews of the test coverage.

#2561639: Wrong placeholder for exception shown when new fields are created during module installation

catch’s picture

Issue summary: View changes
catch’s picture

Issue summary: View changes
quietone’s picture

@catch, thanks for updating the IS and providing the case studies.

I went back to look at the Testing section of the core gates. It states

If a bug fix is not accompanied by tests, there should be an explanation in the issue for why no tests are needed (usually only for trivial bugs where testing would be convoluted) before it is marked RTBC.

It seems to me that the proposed resolution is in line with the Testing gate and goes further by defining bugs which are 'trivial'.

danflanagan8’s picture

Here's another mini case study: #3078030: Duplicated summary item when linking to content with the MediaThumbnailFormatter

This is a admin-facing visual bug with a very easy and obvious fix. I picked it up to add test coverage, which turned out to be surprisingly difficult because I could only find one example in all of core where a field formatter's form was tested. (Maybe there are more tests in there, but I looked hard and didn't find them.) And the test I found (and copied off of) was an expensive FunctionalJavascript test (which was also in the wrong namespace and was like NOT a good example to copy off of). Then when the test got reviewed there was a request to make a small change in the test and then it got stuck in NR limbo.

Why is the bug relevant?

Bug fixes are sometimes complex issues where multiple code paths need testing, but sometimes they're straightforward developer mistakes that weren't caught in review

^^ This bug clearly falls into the "straightforward developer mistakes" category

we'd never have thought to have added explicit test coverage if they'd been spotted during patch review instead of after commit

^^ Obviously we wouldn't have added test coverage if the bug was found during review, because almost no core field formatters have any test coverage for their form.

it can be very hard to know where to add test coverage

^^ In this case there is no existing test class or test method to modify and no good examples to copy off of.

The is a further consequence that often the tests we require for small bugfixes aren't well scoped at all - they're one-off tests for very specific developer mistakes...This often leaves us with a choice of committing poor quality tests to get the bug fix applied, or ever-expanding the scope of a tiny issue to introduce better test coverage it was never the issue's fault didn't exist in the first place.

^^ The test I wrote exposes the bug and shows the patch fixes it, which would allow the fix to get committed. But I don't think it's a "high quality" test. It's a FunctionalJS test (i.e. expensive) and covers only a small piece of how the field formatter form operates. There are way more interesting assertions that could be made about other functionality of the field formatter form. And why stop at the MediaThumbnailFormatter? Why not expand the scope to all field formatters since (almost) none of them have coverage?

larowlan’s picture

Adding #730374: Module packages using different letter-casing appear multiple times as a litmus test

It's been open for 12 years

No-one has asked for tests (yet) but technically even though it's a one-line change we'd normally ask for one.

I know above I said I was a solid -1 on this but in retrospect I think there are cases where an obvious fix is fine without a test, especially if the test adds a maintenance overhead an the impact of a regression is minimal.

#730374: Module packages using different letter-casing appear multiple times would be a good test case - should it have tests or not?

catch’s picture

Copying the patch over from #730374: Module packages using different letter-casing appear multiple times to discuss a couple of things:

diff --git a/core/modules/system/src/Form/ModulesListForm.php b/core/modules/system/src/Form/ModulesListForm.php
index 0a52d2f..bb274f2 100644
--- a/core/modules/system/src/Form/ModulesListForm.php
+++ b/core/modules/system/src/Form/ModulesListForm.php
@@ -157,7 +157,7 @@ public function buildForm(array $form, FormStateInterface $form_state) {
     $form['modules']['#tree'] = TRUE;
     foreach ($modules as $filename => $module) {
       if (empty($module->info['hidden'])) {
-        $package = $module->info['package'];
+        $package = Unicode::ucfirst(mb_strtolower($module->info['package']));
         $form['modules'][$package][$filename] = $this->buildRow($modules, $module, $distribution);
         $form['modules'][$package][$filename]['#parents'] = ['modules', $filename];
       }

As discussed in #bugsmash, I think the main thing missing from this issue is clearer heuristics for which kinds of issues we're talking about. The one advantage of the current rule is it doesn't result in endless discussions about whether a bug fix needs a test or not, and these could also cause issues to get stalled and burn people out.

Questions I've been asking myself about the various patches in this issue, to try to come up with some heuristics for when there's an edge case. The answers to the questions aren't that important, more using the patch to generate some questions. Although in this case it's a 'yes' for all of them.

  1. Are there clear steps to reproduce on the issue?
    Yes.
  2. Is it easy to verify via manual testing that the bug is fixed?
    Yes - it's purely a UI bug.
  3. Is it a 'trivial' patch with small, easy to understand changes?
    Yes.
  4. Is the code being changed in self-contained/@internal code that we don't expect contrib modules to interact with significantly? (particularly plugins, controller etc.)
    Yes.
  5. Is the bug fix achieved without adding new, untested, code paths?
    Yes.
  6. To add test coverage, would it need an explicit 'regression' test that mainly prevents us reverting the patch rather than improving coverage in general?
    Yes. We'd need to add a test module and explicitly check that there aren't multiple packages on this form.
  7. If the test coverage was deferrred to a follow-up, would it be easy for someone who didn't work on the original bug report to pick it up? Yes (although in this case I'm not sure we want test coverage for this at all).
  8. (edit: One other question, but answering it for #3078030: Duplicated summary item when linking to content with the MediaThumbnailFormatter) Does the issue expose a general lack of test coverage for the specific subsystem, and if so would it be better to add generic test coverage for that subsystem in a separate issue so that the test coverage can be added in a maintainable way, rather than a regression test? Yes.
catch’s picture

Issue summary: View changes
catch’s picture

Also bringing this back from dww in #18:

Prioritize (like core initiative style) efforts to rationalize, standardize and clean-up our existing tests. We've learned *a lot* as a community about tests over the last ~6+ years, but lots of our tests haven't caught up to that. The tests for Views module, for example, are completely whack. Many views handlers have dedicated Kernel tests for them (yay), but other handlers are only tested via previewing a view in the views UI with a Functional test (WTF?). Part of why writing new tests is so hard is that if you look for existing test coverage of something, it's very unlikely you'll find it, and if you do, you might find a completely absurd thing to base your new test on. :(

Would add I think this is extremely important too. The ideal remedy for a bug fix missing test coverage is that you just add an extra assertion or data provider entry, or maybe an extra method to an existing test. Some of our test coverage is like this, a lot of it is not. If we spent less effort writing (and reviewing) one-off regression tests for typos, and more effort consolidating and modernising our test coverage, it will slowly get easier to add new coverage over time too instead of constantly adding to the soup of hard-to-maintain and understand tests. We have a lot of test coverage, which stops a lot of regressions, but the tests are also poorly organised and themselves a cause of technical debt - as evidenced by how hard it was to remove aggregator, hal, quickedit from core, mainly due to them being dependencies of dozens of tests in different parts of core.

effulgentsia’s picture

In addition to the questions in #42, for me the most important question/heuristic is:

When this fix gets regressed, are we ok with production sites that have come to rely on the fixed behavior encountering this bug?

I think for the example in #42, our answer would be Yes. It's not that big a deal if sites that got developed with the correct casing got launched in production, and in some future update, those production sites start seeing incorrect casing on the module list form.

In other cases (not necessarily any of the examples brought up in this issue), I think committing a fix without a test would be a bad idea, because in a way it's better for sites to encounter the bug before launching, then for sites to launch with the bug fixed, and then later have the bug reintroduced after the site is already in production.

catch’s picture

Issue summary: View changes

I've added the questions to the issue summary under a new 'heuristics' heading, incorporating a slightly reworded version of @effulgentsia's (as "If the bug was to regress, is this likely to impact sites that were applying the patch or working around the bug in some other way, that would have removed those mitigations once the bug was fixed in core?").

alexpott’s picture

Here's another example where we've added a test for a deprecation message and nothing else - #3256549: Deprecate core/drupal.date asset library in 9.4.0. Discussed with @catch and we agreed that the test added here does not add much value.

If we deprecated code that was heavily used by core and contrib and we changed the underlying logic to call non deprecated code - I think there is some value in testing deprecated code to prove that for the same inputs the outputs are the same. But the generic does the deprecation message text get triggered test is not offering much (well apart from "does this code trigger PHP notices?" - if called in the in the way the deprecation test calls it).

catch’s picture

#47 is a similar case to what we ended up deciding for constructor argument deprecations (which are sometimes tasks, but sometimes side-effects of bugfixes).

Initially, we didn't provide bc layers for constructor changes at all because they're @internal, but that made it tricky for some modules to provide support for multiple branches. So instead, we switched to 'best effort' bc - as in we provide bc and trigger deprecations even for @internal code where it's straightforward to do so. However, if we also required test coverage for those best effort deprecations, then the very small change to add/remove/change a constructor argument balloons to lots more work, and not only that but the test coverage is removed again in the next release anyway. So we settled on 'trigger deprecations but don't bother testing' as the most useful for contrib with the least overhead for core contributors.

effulgentsia’s picture

Issue summary: View changes

In other cases (not necessarily any of the examples brought up in this issue), I think committing a fix without a test would be a bad idea, because in a way it's better for sites to encounter the bug before launching, then for sites to launch with the bug fixed, and then later have the bug reintroduced after the site is already in production.

Actually, I think Case Study 1 (#3223302: Changed Method label() in Taxonomy Terms results in Field/Name Changes) is an example of this. For sites that subclass Term and override label(), this is a data destroying bug. It's better to encounter this bug before launching the site (which is the case if we leave HEAD as-is and don't commit the fix) than to encounter it after launching the site during a Drupal core update (which would be the case if we commit the fix without a test, and it later gets regressed, which is a not insignificant risk when you don't have test coverage).

#3223302-33: Changed Method label() in Taxonomy Terms results in Field/Name Changes points out that if all we do is add test coverage for Term, then we're still missing the equivalent test coverage for every other entity type. We shouldn't block that issue on test coverage for other entity types, since that would be scope creep. And so then the question is whether it's sufficiently desirable (or desirable at all) to add the test coverage for Term and then have a follow-up to generalize it, or whether it's better to commit the fix without a test, and have a follow-up to add generalized test coverage and/or a Rector rule.

My current opinion is that for that type of issue (where the impact of regression is non-trivial), that it is desirable to add the test with the fix, even if the test coverage has limited scope (only the entity type being fixed) and gets generalized and/or converted to a Rector rule later.

While I appreciate questioning how valuable it is to protect against a regression in Term while leaving all other entity types untested (and therefore just as prone to regression as Term would be if we didn't add a test for it), I think the key difference is that while it's theoretically possible that the same bug could get introduced into other entity types, it actually did get introduced into Term. For me, test coverage is about building up reliability gradually based on learning from what actually happened. Generalizing to cover similar scenarios is wonderful, but followups might or might not happen, so at a minimum adding reliability for the specific situation that occurred is still better than not doing that.

However, I also appreciate the need to balance the value we get from tests against the morale cost of issues being stuck for a long time. That's why I'm +1 for allowing bug fixes without tests for issues where the impact of a regression is no big deal. But for issues where the impact of a regression would be potentially a big deal, I think it's better to require a test, but perhaps be less picky about the details of how the test is implemented: as long as the test sufficiently protects against the specific regression, then I think that's good enough and any improvements to it could be a followup.

catch’s picture

It's better to encounter this bug before launching the site (which is the case if we leave HEAD as-is and don't commit the fix)

Just to note this assumes that people are always finding bugs before launching sites, and not afterwards, which is very often the case.

So it is really the question between 'should we leave this unfixed for both new and existing sites, vs. fixing it but with the potential for it to regress later'.

And my main question with the term label one, is why do we think that issue would ever get regressed back to calling ::label() except for a straight revert of the previous commit? I haven't done the archaeology yet, but my guess would be the bad ::label() call was added when that method was first committed to core, and it's never been changed since, which would indicate a very tiny chance of the logic changing again in the next 10 years or so after we fix the bug.

edit: git blame confirms the logic was added in 2014 via #2016701: Expand TermInterface to provide methods.

a8a8b2cbcc9 core/modules/taxonomy/lib/Drupal/taxonomy/Entity/Term.php             (Alex Pott           2014-03-17 10:11:38 +0000 255)     return $this->label();

Other areas that aren't one-line methods with identical logic in 8 other entity types that get more churn the answer would be different though.

danflanagan8’s picture

Here's an issue that just came to mind that fits the bill here really well: #2521758: Module anchors on Permissions pages do not get you where you would expect

Let's see what the heuristics have to say about this one.

  1. Are there clear steps to reproduce on the issue? YES
  2. Is it easy to verify via manual testing that the bug is fixed? YES
  3. Is it a 'trivial' patch with small, easy to understand changes? YES
  4. Is the code being changed in self-contained/@internal code that we don't expect contrib modules to interact with significantly? (plugins, controllers etc.) YES
  5. If the bug was to regress, is this likely to impact sites that were applying the patch or working around the bug in some other way, that would have removed those mitigations once the bug was fixed in core? NO
  6. Is the bug fix achieved without adding new, untested, code paths? YES
  7. To add test coverage, would it need an explicit 'regression' test that mainly prevents us reverting the patch rather than improving coverage in general? YES
  8. If the test coverage was deferrred to a follow-up, would it be easy for someone who didn't work on the original bug report to pick it up? YES
  9. Does the issue expose a general lack of test coverage for the specific subsystem, and if so would it be better to add generic test coverage for that subsystem in a separate issue so that the test coverage can be added in a maintainable way, rather than a regression test? NO

Heuristics definitely favor no test here. Maybe #5 could be re-worded so that an answer of YES suggests a test may not be needed.

Something interesting and new about this issue, when compared to the ones already discussed, is that I don't have any idea how one even WOULD write a test for it. Is it even possible? This might be a bug where manual testing is the ONLY way to go.

Here's another issue where I believe manual testing is the only possibility due to a limitation with chromedriver as described in a comment of mine: #3062742-27: Drag-and-drop reordering doesn't work if the region content is nested in other elements.

But in this case, the heuristics make it clear that this issue is not trivial. I say that because a trivial patch should definitely have YES as the answer to 1-4.

  1. No
  2. Yes
  3. No
  4. No
  5. Yes
  6. No
  7. No
  8. No
  9. No

I guess this is a long way of asking, "What to do in situations where an automated test is impossible or at least extraordinarily cumbersome?"

catch’s picture

Issue summary: View changes

Tried rewording the regressions bullet so that the YES/NO is the same way around as the others.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

catch’s picture

Another example: #3173639: Entity storage exception during module install missing @message parameter in watchdog_exception() call. for that one it was RTBC with test coverage, I committed only the fix and dropped the tests.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

quietone’s picture

Issue summary: View changes

The change proposed here would go into the Testing core gate. So, I have added suggested text for that change to the Issue Summary. That includes simplifing the wording of the 9 heuristic questions and re-ordering them to put the simplest ones to answer first (I think).

quietone’s picture

Title: [policy no patch] Better scoping for bug fix test coverage » [June 19] [policy, no patch] Better scoping for bug fix test coverage
Issue summary: View changes

Updating the title to meet the 'Special issue titles'.

Adding a date to set a defined review period because this affects the Testing Core Gate. Any points not agreed to in that time frame will be moved to a follow up issue. The review time of three weeks is experimental. In another organization I was in, a policy was open for review and feedback for one month. But this issue is not a full review of the policy, it is about adding a new section specific for Bug Reports. And since this issue already has two rounds of discussion and a clear proposal, two weeks seems sufficient.

quietone’s picture

Issue summary: View changes

Fix review date in the IS and add item to test the heuristics.

smustgrave’s picture

Not sure how I feel about the idea of skipping tests. Definitely can see the argument but could be a dangerous can of worms where people start pushing for more changes without tests.

If this were something we did I think we should incorporate
a) A crystal clear list of what qualifies for allowing to skip tests
b) I think 2 sign offs from either framework manager or product managers.

poker10’s picture

I would say that we are going to offer possibility to skip tests to many bug fixes, if we use this wording:

If the answer to the the majority of the following questions is 'no' an automated test is required.

Personally I would tighten this to at least 2/3 of "YES" answers needed to allow skipping tests. Currently the Testing core gate allows to skip tests on some specific trivial fixes, see #39. But I see the added value of this policy change by providing explicit questions, so that all contributors would know whether the tests are needed or not.

Looking at this as a provisional committer, I would say that for me it is a way better/safer to have a failing test as a proof, than going thru steps to reproduce and trying to "decipher" what the reporter meant. Certainly there are bug fixes, where the fix is trivial, but the steps to reproduce are not and are tricky.

So beside the change in ratio I have proposed, I think we should define which questions should have mandatory "YES" to allow tests to be skipped. For example the questions 1, 2 and 3. I cannot imagine how the process will be better if there will be "NO" answer to the question no. 1 (so the issue summary will not be clear / complete), but the issue will still qualify for tests to be skipped, so there will be no failing test. Very risky to me.

neclimdul’s picture

I'm definitely have concerns about loosening testing too much. The point of testing is to prove that a bug doesn't exist so IMO its pretty important that the majority of bug fixes have a test to prove they have a fix.

There are definitely cases we should allow bypassing patches but they should be pretty exceptional. We don't have a framework for testing it, it requires elaborate conditions or impractical environments, etc.

Talking in slack, it came up that a lot of features don't have tests so simple bug fixes require and entire boiler plating to add tests. I was running into this so many times I spent a large part of the d8 release cycle just picking subsystems and writing unit tests. But lets be real, if we don't add the test, its just going to get an follow up that no-one pays attention to and never gets committed.

Looking directly at the current heuristics, I don't really understand a lot of them. Others seem like things that should _always_ have tests so I don't understand how they would count toward bypassing testing.

catch’s picture

@neclimdul did you look through case studies 1 and 2? And also the example from #47?

chi’s picture

I'm definitely have concerns about loosening testing too much.

We can't loose something that we don't have. If we block a bug fix with "Needs tests" tag the total number of tests in Drupal core will not increase.

chi’s picture

So far the main argument in favor of updating the policy is a number of efforts sometimes needed to create a test for simple bug fixes. This has blocked a huge number of bug fixes from being committed.

However, that's not the only reason. Regression tests are hard to design well because they have a limited scope. Imagine a case when you need to write a single test for a component that does not have any tests at all. Even a simple test quite often requires creating a base class, helper modules, fixtures, etc. Such work would be definitely out of scope of a bug report.

The proposed heuristics look quite complex. I am afraid they will complicate the issue workflow a big deal. What if we just replace the "Needs tests" tag with "Needs test follow-up"? So any bug fix that don't have a test should provide a link to a follow-up issue.

The only exception when "Needs tests" is really needed is the case when a bug is hard to reproduce.

smustgrave’s picture

See what you mean about when tests don't exist at all, so adding a quick assertion isn't possible.

But if we pushed tests to follow-ups what are the odds they'll get worked on?

chi’s picture

Follow-ups may have a wider scope. Multiple bug reports can reference a single test issue.

poker10’s picture

This has blocked a huge number of bug fixes from being committed.

But how many new bugs these fixes could cause, if they would have been committed without tests? I have seen many D7 issues where a trivial bugfix patch was ready, but eventually it was found that it was not fixing the issue mentioned in the IS. And only the test-only patch discovered that.

Not to mention what is more efficient/more safe approach - set-up a test site, trying to simulate steps to reproduce vs check a failing test where the exact steps to reproduce the bug are present? Yes, ideally you want both, but without a test it is a way worse.

The only exception when "Needs tests" is really needed is the case when a bug is hard to reproduce.

I disagree and would completely turn it around. Tests should be needed everytime, except when answers to selected questions (not majority) from heuristics are YES (because I cannot imagine both "NO" answer to the question no. 1 + skipped tests, for example). Tests are a guarantee that the same bug will not appear in the future. If we decide to skip tests in a decent number of bug fixes, then we can easily end up with more bugs that we have now, because of regressions.

But if we pushed tests to follow-ups what are the odds they'll get worked on?

+1. I would say these issues are not so attractive. Even if they can have a wider scope.

I see the point of skipping tests when adding tests would require very big effort and consume a lot of time, but these cases should be rare.

chi’s picture

But how many new bugs these fixes could cause, if they would have been committed without tests?

Exact same amount as with tests. Regression tests have very limited scope. They must not test anything besides the bug they created for. So that they won't help on catching any new bugs produced by the fix.

neclimdul’s picture

First, I love this discussion. I've scrapped an entire comment I was thinking about because @smustgrave nailed it. To expand though, the bugs exist because there wasn't a test. The bug almost _is_ not having a test so skipping the test means we didn't really fix the bug. Also I 100% agree the follow will just languish and we'll revert of re-introduce the bug. Chi is also hitting a lot of the other points I'm thinking about though maybe from a different perspective so I'll talk more about that.

I've been thinking about this all day in response to catch's comment because I had in fact read the case studies and they seem like issues that definitely need patches to me. So the implication that they would change my mind(2nd time since they came up in slack) made me take a step back. Clearly we're looking at the problem different and I needed to do some soul searching.

So... here goes.

I'd been starting to resolve on a conclusion and reading Chi's comments I think its reinforcing what I was thinking. I think the problem at the heart of at least the case studies is that writing tests is too is hard and slow to iterate on.

Basically, fixing a string should not require deep understanding of core subsystems and weeks or months fighting with testbot. So if it it took 2, 3, or more years to make tests pass because of that, I don't think that's a "we bypass testing" problem but a "we do testing bad" problem.

From personal experience, writing this sort of test is several hours of digging through subsystems to understand requirements on how I could test the code, then guessing at what might pass on testbot, then submitting it to testbot and waiting an hour or two for it to complete, meaning I'll probably get feedback the next day. And that means when I do get feedback I've probably moved on to other things and because job is not working on core all day I'm probably moved on to some other task and not be able to look at it again for a week or two and...

If that aligns with most peoples experience, and its that hard for us, then imagine what its like for someone outside the core developer community and I think we've reached the root of the problem. So if we still think testing is important(🙏), I think that means we should be focusing our efforts and making it all around easier and faster to write tests not remove requirements for when tests are needed.

Things that stand out and have been in the back of my mind for a while are:

  • Easier testing
    • Out of the box ways to quickly create consistent testing environments with ddev, lando, et.al so people can quickly test things in a way that _will_ pass on testbot.
    • Easier ways to test from local and remote IDE's. For example, run-test serves us well but it often it leads different results drawing out test development.
  • Faster tests. I know the importance of this is controversial but its a serious burden to creating tests.
    • Push tests out of browser tests. I think a lot of people default to it because it doesn't require any understanding of core systems to have something that "works". I think we can provide test assertions, mock tools like we do for logging for key systems like entities, plugins, the installer along with documentation we can point people too.
    • make browser tests faster and easier. I know there's a long standing issue to provide mock db's etc. we should figure that out because one is slow. Many is miserable.
  • Can we export a failed test list in a way we can iterate on locally? I don't know how but its so hard to parse the current output and recreate and that should be front and center along with documentation on how to run them.

There's probably more that could go into this but it's already a monster information dump so I'll stop there and hope for the best.

catch’s picture

@Chi

We can't lose something that we don't have. If we block a bug fix with "Needs tests" tag the total number of tests in Drupal core will not increase.

Yes this is my position as well. We are constantly adding test coverage to core, and this issue wouldn't change that, what it would mean is some bugfixes getting committed faster and the same amount of tests getting committed as we do currently.

Since I added case study #2 to the issue summary, the bug was fixed (without a corresponding test) in #3258969: Wrong argument for @message in ModuleInstaller::install call to watchdog_exception, in five comments and less than one month in a duplicate issue. Has that resulted in less test coverage than leaving the other issue open for five years?

@neclimdul I do agree with a lot of points in #70, but we already have issues like #3082230: [meta] Convert some tests to Build tests and #3041700: [META] Convert some tests into Kernel or Unit tests, which notably people do work on even though those issues aren't prerequisites for people's bug reports. So I think the idea that people don't work on tests unless we refuse to commit their bugfixes is not really the case. In fact I think we'd be much, much better off if our testing effort was spent on refactoring the test coverage we already have, than adding more one-off tests for incorrect placeholders in error messages, or per my example from #47 adding a test that we called trigger_error() with an arbitrary string in code that's going to be deleted in a few months, along with the test we added for it.

There is so much refactoring we could do - from converting more functional tests to kernel tests to unit tests, to making some entity-specific tests generic to all entity types (node revision UI tests for example?), to make it possible to write upgrade path tests without having to import database dumps into specific old tags of Drupal and re-export and compress them. This would then make adding new test coverage easier in the long run, but in the meantime we have one character bugfixes that we can either commit or hold up.

catch’s picture

To expand though, the bugs exist because there wasn't a test. The bug almost _is_ not having a test so skipping the test means we didn't really fix the bug.

So this is true in a way, and it was the motivation behind the current policy when simpletest was added to core, but I think it's missing an important component:

We commit three categories of issue, bugs, tasks and features.

Sometimes tasks and features are expected to come with new test coverage, but often for smaller tasks and even some features, not breaking the existing test coverage is sufficient.

When large new features or tasks are committed, because we don't have any code coverage reports for core, we can't see what code is untested or under-tested.

However for bug reports, we assume that more or less every line of the bug report should have test coverage, and in some cases require adding test coverage for much more than the code that's changed because it didn't previously exist.

This means the bar for fixing bugs is consistently higher than it is for at least a subset of tasks and features. There are other things that make tasks and features hard to work on, it's not that they get an easier time overall, but it's by definition easier to introduce a bug without test coverage than to fix it.

Also, although we revert a lot more issues than we used to, the nature of core development is that if a bug was introduced more than a few weeks ago, the chance of a revert is pretty low - because we stack issues on top of each other which can make straight reverts impossible in a lot of cases.

Typing this out did remind me that code coverage reports on MRs would be really nice - i.e. when looking at lines of code changed, also see the code coverage report, and ideally a diff of that from the target branch.

chi’s picture

Issue summary: View changes

Removed unrelated text from the issue summary.

neclimdul’s picture

Sometimes tasks and features are expected to come with new test coverage, but often for smaller tasks and even some features, not breaking the existing test coverage is sufficient.

Is there an issue to fix that? It seems like a big problem kicking this off the original feature developer that understood the code to someone that just wants to fix their problem.

When large new features or tasks are committed, because we don't have any code coverage reports for core, we can't see what code is untested or under-tested.

I used to run coverage reports daily but it became a big pain and was constantly breaking because run-test isn't phpunit and it broke on the regular. The bigger problem though was the heavy use of browser tests which will never trigger useful coverage. @see better testing tools outside browser tests.

This means the bar for fixing bugs is consistently higher than it is for at least a subset of tasks and features.

Still sounds like we found the problem upstream. The feature should have had more tests. Bugs should still need tests, its the right thing.

the nature of core development is that if a bug was introduced more than a few weeks ago, the chance of a revert is pretty low

You don't have to tell me... but that's a rant for a different time.

neclimdul’s picture

Just opened my issue dashboard to find a trivial issue that got kicked back to NW last year because I added an entire unit test with almost full coverage. Its now RTBC because someone removed the test. Maybe I'm just doing this all wrong. :(

catch’s picture

Is there an issue to fix that? It seems like a big problem kicking this off the original feature developer that understood the code to someone that just wants to fix their problem.

I don't think there's an explicit issue. We are not usually knowingly committing stuff with gaping holes in test coverage, it's just that without code coverage reports, it's hard to review (especially with already large patches) exactly how much of the code is covered. If an issue is adding loads of new code and not much tests you can see it, when existing code is being refactored it's harder.

Once again the chaotic nature of much of our existing test coverage makes all of this difficult, because it's hard to tell what's already covered vs. not. A lot of these issues date all the way back to the original addition of simpletest where we tried to add as much coverage as possible without the experience of organising tests, and to a ten year old, huge, untested code base. We still have some of those original functional tests in core more or less unchanged.

I still remember writing bits of TermTest, some hunks in there were committed in 036a026433 which is from 2008, which was itself a refactor of test coverage that had been added previously, it was probably one of the first simpletests I worked on. It predates terms as entities and terms as entity reference fields. ::testTaxonomyNode might possibly have made sense at the time, but now it's doing.. what... testing that entity reference fields work? I'll open an issue to delete that method now (edit: now here #3365040: Remove TermTest::testTaxonomyNode(). We've have probably installed Drupal and run that test hundreds of thousands of times, maybe millions. If you look through git history, we have been updating it for changes to the testing API, moving it to PSR-0, moving it under core/ (both of these broke git blame so I had to resort to git log -S to find the original commit), all for a method that has been pointless since probably 2011 or so.

This is partly what I want to avoid adding to - we probably have dozens of tests that are not actually adding 'code coverage' to core, they're duplicating coverage in other tests. When we have to add an entirely new test to assert very small bug fixes, it's sometimes a sign that we have no coverage at all, but sometimes it's a symptom of not being able to find the appropriate test to add a couple of assertions to, even though that test exists in core somewhere. If we spent more of our time on refactoring test coverage, and less on adding new functional tests for specific regressions, we might end up in a better state overall because we'd know where to add those assertions to and duplicate less.

A recent example where I broke something is #956186: Allow AJAX to use GET requests - the existing test coverage caught multiple bugs in iterations of the patch, and the patch updated some tests for new expectations. It also removed one assertion it shouldn't have (mea culpa) which provided a hint as to the regression. #3364088: Ajax state leaking to Views destination paths was opened two months later - fortunately it's an easy fix and came with tests (although they're FunctionalJavascript tests, but we don't have an easy way other way to test ViewsAjaxController), and it was all discovered and fixed before the 10.1.0 release candidate. If we'd found the regression a day or so after the original patch landed, we'd probably have reverted and recommitted with extra tests added.

The original issue adding ViewsAjaxController to core was probably... views in core, which overall we have a lot more views test coverage in core now (and a lot more core APIs tested via Views) than we did before it was added, but it's also one where we were adding coverage for years-old untested code. So there is not a moment in that issue where we can point to a specific event where code was added without coverage (except perhaps me removing one assertion, or before automated testing existed), it's more of a cumulative thing over time.

I used to run coverage reports daily but it became a big pain and was constantly breaking because run-test isn't phpunit and it broke on the regular. The bigger problem though was the heavy use of browser tests which will never trigger useful coverage.

I think unit and kernel test coverage would give us a lot of useful information, and it would be useful to have them as separate reports (or at least somehow filterable). We could potentially never add functional test coverage reports, seem a bit pointless anyway.

danflanagan8’s picture

I'm an example of a community member who would be interested in working on follow-up issues to add missing test coverage and/or issues to refactor test coverage in general. We'd just need some kind of tag so they'd be easy to locate.

And I'm not all talk here. I've posted test-only fail patches to dozens of bugs. I coordinated a large part of the effort to refactor tests that used Classy to use Stark leading up to D10. I've taken time to refactor tests on contrib modules I maintain. I've added test coverage for contrib modules I don't maintain that never had test coverage.

I give a big +1 to the heuristics proposed in this issue. When it comes to the actual implementation of the heuristics, I think it would be nice to have a tag that could be added to an issue that indicates that the community believes no tests are needed. Kind of like the opposite of the "Needs tests" tag. That would allow interested or concerned parties to review such issues from time to time.

The term "testing theatre" just occurred to me. We've heard of "security theatre" before and agree that some security measure provide no value. There are some tests that provide no value too. They're all for show. "Testing theatre" is part of what would be addressed by this issue.

catch’s picture

I think it would be nice to have a tag that could be added to an issue that indicates that the community believes no tests are needed.

Yep I think we're also talking about at least two things here:

1. Cases where we actually don't want to add a test at all - for me there are not many cases like that, but I do think deprecation tests where the only logic tested is a call to trigger_error() is an example.

i.e. where the tested code looks more or less like

function old_function($foo, $bar) {
  trigger_error('...', E_USER_DEPRECATED);
   new_function($foo, $bar);
}

Partly because what it's testing is... not doing much, partly because the test has to be removed along with the deprecated code, so we're 'digging a hole and filling it in again' to quote Keynes. This doesn't include bc layers that have actual logic involved which generally I'd want tests for even if we're deleting it later, because there is something to go wrong that we have to maintain in the interim with those.

2. Cases where we would like to have test coverage, but adding test coverage is particularly onerous compared to the actual bugfix and/or the test coverage that we add just to have a regression test for the bug is likely to be of lower quality than if we wrote a test for the functionality in general.

You can say we should write a test for the functionality in general, but that is expanding the scope of the bug fix way beyond what it should be, where other similar bugfixes might involve adding a single additional assertion to an existing test (or even data provider). You could then say we should open a prerequisite issue to add the test coverage, and postpone the bugfix on that - and in most cases I think that is worse than committing the bugfix and adding the tests in a follow-up.

catch’s picture

Actually there's a #3 which is 'covered by a PHP stan rule' - we are not quite there with PHPStan levels, just starting to get that from the levels we're enabling now, but it will be more in the future.

quietone’s picture

I have read all the comments and summarized as follows.

This is the for/against the proposal which show support for the proposal. Comment if I have made a mistake.

for: poker10, danflanagan8, catch, larowlan, webchick. dww, alexpott, effulgentsia, chi
against: smustgrave, neclimdul, anavarre, dww
unknown: dawehner

Next are the case mentioned when tests are not needed.

  1. Deprecation tests where the only logic tested is a call to trigger_error().
  2. Where the impact of a regression is minimal and the fix is obvious or a test adds maintenance overhead.
  3. Where we would like to have test coverage, but adding test coverage is particularly onerous compared to the actual bugfix and/or the test coverage that we add just to have a regression test for the bug is likely to be of lower quality than if we wrote a test for the functionality in general.
  4. Covered by a PHP stan rule

Here are other suggestions.

  1. There was concern that the heuristics were complex
  2. Tighten the number of YES responses to the heuristic questions from a majority to 2/3
  3. Identify the heuristic questions that must be a YES
  4. Make use of a new tag 'Needs test followup'
  5. There were at least 3 people who wanted to spend time improving the testing suite instead of making this change. dww proposed the following:
    • Make writing tests easier.
    • Better document the existing tests.
    • Incentivize test writing.
quietone’s picture

Title: [June 19] [policy, no patch] Better scoping for bug fix test coverage » [policy, no patch] Better scoping for bug fix test coverage

I suggest that the next step is to work through the suggestions in #80.

1. Are the question complex themselves or are there too many? How can they be made simpler?
2. What is you opinion on changing the criterion?
3. Should we identify the questions that must be a YES? If this was done, then those question could be changed to set criteria.
4. Is a new tag needed?
5. I guess this one needs a followup. I have not looked. Catch mentions some issues in #71 but that does not cover the documentation and incentive point.

I am removing the date from the title because there was no comment that would block making this change.

larowlan’s picture

Issue summary: View changes

Adding an anchor for Heuristics heading

quietone’s picture

Component: base system » other
Issue summary: View changes
Status: Active » Needs review

I have made a followup for the suggestions regarding testing.
I have updated the Issue Summary with proposed text change to the Core Testing gate.

Can anyone explain how the proposed tag 'Needs test followup' will be used compared to 'Needs tests'?

quietone’s picture

quietone’s picture

Issue summary: View changes

Update IS to clarify next steps.

This got a behind schedule as a result of my holiday. The proposed resolution has been updated by me with what I think is a summary of the discussion. This needs to be reviewed, possibly tweaked, and approved so the Core Gate documentation can be updated.

mikelutz’s picture

Status: Needs review » Needs work

Some grammar and punctuation fixes:

The fix 'trivial' with small, easy to understand changes

The fix is 'trivial' with small, easy to understand changes

Is explicit 'regression' test needed?

Is an explicit 'regression' test needed? / Are explicit 'regression' tests needed?

If this fix is committed with test coverage but then later regresses, is the impact likely to be minimal or at least no worse than leaving the bug unfixed. For example, few sites were applying workarounds or patches that were removed when the fix was committed?

If this fix is committed with test coverage but then later regresses, is the impact likely to be minimal or at least no worse than leaving the bug unfixed? For example: few sites were applying workarounds or patches that were removed when the fix was committed.

quietone’s picture

Issue summary: View changes
Status: Needs work » Needs review

@mikelutz, thanks!

Fixes made for #86. Back to NR

quietone’s picture

Issue summary: View changes

In Slack, mikelutz said he didn't like this sentence.
For example, few sites were applying workarounds or patches that were removed when the fix was committed?
I too found it not very clear or helpful so we agreed to remove it.

I am updating the Issue Summary.

xjm’s picture

catch’s picture

Issue summary: View changes

Took the original 'heuristics' out of the issue summary, since I got confused which was the proposed text.

poker10’s picture

I think that the selected questions for mandatory *yes* are well chosen, so +1 from me for the current proposal.

bbrala’s picture

After reading rhis full issue and IS, I want to give my +1 tot his proposal. Combined with a tag which helps us identify if a test is needed.

Wish I had the time, writing tests is not too bad imo and fun even at times.

catch’s picture

Another issue where a test wasn't helpful or would be nearly impossible to write #3378341: claro.jquery.ui css assets may be added the page multiple times. Might have been caught by generic CSS regression testing, but otherwise we'd be testing the exact structure of Claro's library definitions.

quietone’s picture

Status: Needs review » Reviewed & tested by the community

There haven't been further suggestions/tweaks here so I am going to set this to RTBC.

This still needs feedback from the framework managers, release managers and the testing subsystem maintainers.

quietone’s picture

Issue summary: View changes
mondrake’s picture

@quietone asked me to comment here.

In general, +1.

TBH I think this could be debated forever, so the sheer fact that some consensus was built in five years deserves recognition. If we were quicker in this sort of things, my approach would be 'let's go and see how it works, and fix the rough edges along the way'. But we're generally not quick, so I think more reviews are needed to agree this is robust enough for the next few years to come.

Couple of points.

1) PHPStan - I suggest issues that are fixing PHPStan baseline (are they bugs or tasks?) are exempt from adding tests if the existing tests do not fail in the process. We have around 1k baselined error on levels 0 and 1 alone, and we have been on PHPStan for 2 years already IIRC. Fixing them is sometimes a really painful process. Level 2 will add 2k more. IMHO we need a sort of fast track here or we will remain negligible levels for a long time. Note: the lower the PHPStan level, the higher the risk of type related bugs in code - so while we stay on L1, more and more bug issues will be opened.

2) Follow-ups - IMHO we should also add to the commit gate that a follow-up issue EXISTS with the description of the additional test required, when heuristics tell us that one is needed.

catch’s picture

Agree with #1 and #2 of @mondrake's points.

PHPStan is like free test coverage for some types of issue, so if we can fix a phpstan baseline error without breaking existing tests, then phpstan will at least protect against us reverting the specific fix we made, if not necessarily regressing the functional bug that resulted from it.

The follow-up needing to exist and have a legible issue summary also makes sense.

catch’s picture

Since @mondrake has +1d, removing 'needs subsystem maintainer review' tag.

acbramley’s picture

I'm wondering if we can also document when an update path test is needed?

Take #3043330: Reduce the number of field blocks created for entities (possibly to zero) for example - a critical bug that greatly affects performance of large Drupal installs.

The update hook is trivial - it sets a boolean value in a new config.

Writing an update path test for that is not trivial - even for someone extremely familiar with writing tests, and someone that has written upgrade path tests in the past (like me).

This one in particular is harder because layout_builder needs to be installed on top of the generic db dump file before we run updates, I spent ~60 minutes this morning getting this test working, and it would've been MUCH longer if I hadn't had multiple resources to help me (one being the existing MenuLinksetSettingsUpdateTest to copy from, and the layout-builder.php stub file that I stumbled upon to do the module install for me.

We then run that update hook on every single MR going forward, all to test that a one liner runs correctly? Something that any number of contributors can look at and visually confirm it's doing the right thing.

larowlan’s picture

I realise in comment #1 I said solid -1 but that was before the criteria were established.

So +1 from me from a framework manager perspective.

Leaving the tag so @alexpott and @effulgentsia can also opine

danielveza’s picture

I have a couple of issues that I think are worth raising in regards to this:

#3115759: Prevent auto-adding new fields to LB layouts is in the exact same situation that @acbramley raised in #100. The new functionality is tested, the only part that is missing a test is the update hook, which is simply setting a new boolean config value to TRUE.

#3033153: Contextual links can be used to drag and drop Layout Blocks. is a one line fix on a 5+ year old issue. Testing dragging Layout Builder blocks appears to be an existing issue because the existing tests around this simulate the element moving in the DOM rather than performing an actual drag.

Testing this particular issue requires us to be able to click and hold on an element. We don't have this in core and I can't think of a way to do this in a test without writing code that I feel like may be prone to random failures.

Considering this is a very old issue with a one liner fix that can be easily tested manually I think it may be a good candidate for this.

dww’s picture

+1 to #100 and #102.

Here's another to consider: what tests could / should we add for #3097907: Active toolbar tray has weak affordance and fails WCAG color criteria? That issue has already been so painful, I don't think I could handle another 2 years for the tests. 😬 I'm not really sure what we'd be testing. It's a major accessibility bug with an entirely CSS fix. Are we supposed to have automated coverage for the accessibility of our CSS? (beyond the lint job, which is already passing).

dww’s picture

P.s. we’ve already got the “Needs tests” tag. Do we want to introduce something like “Needs no tests” to indicate we’ve considered it and it’s not worth it, undesirable, can be punted to follow-up, etc?

longwave’s picture

I have been watching this issue for a long time and never commented because I am very on the fence about it. On the one hand I see that we would move faster and fix more bugs if we committed some fixes without tests, but on the other I am worried about introducing regressions later on. But after sitting on it for a while I think the proposal here is a good compromise to both those arguments.

I also agree with #100/102 that update tests are one of the most painful things to write, and concur that if the update hook is trivial then we should not require explicit testing; we do have test coverage that ensures that all updates are at least successful, and we have test coverage for e.g. the config/state APIs, so there seems little value in ensuring that an update hook can correctly make a minor change via those APIs. I further agree with #103 that CSS fixes and some accessibility fixes really can't be tested.

Overall, the suggestion in the IS that the majority of the questions must be answered yes is a good failsafe. However:

  1. Is the fix is easy to verify by manual testing?
  2. Is the fix in self-contained/@internal code where we expect minimal interaction with contrib? Examples are plugins, controllers etc.
  3. Is the fix achieved without adding new, untested, code paths?
  4. Is an explicit 'regression' test needed?
  5. Is it easy for someone who did not work on the original bug report to add the test coverage in a followup issue?
  6. Does the issue expose a general lack of test coverage for the specific subsystem? If so, is it better to add generic test coverage for that subsystem in a separate issue?
  7. If this fix is committed with test coverage but then later regresses, is the impact likely to be minimal or at least no worse than leaving the bug unfixed?

Question 4 stands out to me here. It's either the wrong way round - Is no explicit regression test needed? Yes means we can proceed without a test - or perhaps even it can be removed as a duplicate of question 7?

The first part of question 6 also suffers from a similar problem, and I think "with" should be "without" in question 7.

Suggested rewording where, to me, primarily "yes" answers point towards not requiring test coverage:

  1. Is the fix is easy to verify by manual testing?
  2. Is the fix in self-contained/@internal code where we expect minimal interaction with contrib? Examples are plugins, controllers etc.
  3. Is the fix achieved without adding new, untested, code paths?
  4. If this fix is committed without test coverage but then later regresses, is the impact likely to be minimal or at least no worse than leaving the bug unfixed?
  5. Is it easy for someone who did not work on the original bug report to add the test coverage in a followup issue?
  6. If the issue exposes a general lack of test coverage for the specific subsystem, is it better to add generic test coverage for that subsystem in a separate issue?

I moved the final regression question higher as to me that is more important than the latter two questions.

I also agree with #104 in that we could use "Needs no tests" when a decision has been made after answering the questions, or "Needs tests" if we decide the opposite, and also in conjunction with "Needs followup" if we decide to push test coverage to a separate issue (which should be opened and the tag removed before the bug fix is committed).

catch’s picture

What I was thinking of when I wrote 'is a specific regression test needed' is we have in some cases written one-off tests that are very specifically written for 'is this bug that we used to have not there any more', and are not 'does this system work as it's intended to', and that over time makes the overall test suite harder to maintain, and reduces our test vs. coverage ratio (i.e. we've have several cases of multiple tests testing more or less exactly the same thing, added years apart in different issues). But I think the rewording in #105 is plenty here and covers that situation well enough.

smustgrave’s picture

Also want to add an issue type to the discussion. What about php warnings? Usually I push back on those tickets because someone just puts an isset() or is empty() to mask an issue. But if that’s the fix does it need a test?

Also agree on most of the suggestions here but would request some post be made in slack #contribute, #bugsmash, and #needs-review-queue-initiative when a final decision is made

catch’s picture

Per #106 here's an example of what would be a pure regression test if added: #3416184: Content Moderation should only validate its own workflow type - we'd only be asserting that something that used to happen doesn't happen any more, probably with quite a lot of test code involved to do that, but the actual code flow is already tested so it wouldn't add actual additional code coverage.

quietone’s picture

Issue summary: View changes
Issue tags: -Needs release manager review

I have updated the Issue Summary with the suggestions in #105. Thank you, @longwave, for rewriting the questions.

There is support in the later comments for a 'Needs no tests' tag that was suggested in #104.

And I think a follow up is needed to document when an update path test is needed? See #93, #100.

I also agree with this change so I will remove the 'needs release manager' tag.

quietone’s picture

Issue summary: View changes
mondrake’s picture

Status: Reviewed & tested by the community » Needs work

From the IS

This can be shown by uploading a patch with only the test changes, which Drupal.org's automated testing system should show as a fail, alongside a patch containing both the tests and the bug fix itself.

This was the flow before introduction of MRs, that is now going away. Needs to be adjusted to current reality.

catch’s picture

Issue summary: View changes
Status: Needs work » Reviewed & tested by the community

That text was pre-existing and not changed by the proposal here, but yeah we might as well update it while we're doing this. Changed to:

This can be shown by running the 'tests only' job in the gitlab merge request pipeline interface.

nicxvan’s picture

Issue summary: View changes

I updated the final bullet point with -> without, I think that was the intention.

nicxvan’s picture

I have been thinking about this all day as well, I think while having a patch sit stale for a long time (particularly waiting for tests) can be disheartening, especially for the user that created it.

I think the practical reality for most project is that as long as people know about patching and follow best practices with build processes having a patch that fixes the bug can be nearly as good as having it committed. (There are obvious exceptions to this, but I'm specifically talking about the simple fixes to logic or typos that are mentioned in this issue).

I think in general having a patch only get committed once a test is there to ensure no future regression is essential to the stability.

I think having regressions are more damaging than having to re roll patches every so often.
I do think that finding a way to increase velocity is also essential for these types of fixes.

pameeela’s picture

Issue summary: View changes

Updated IS to note that the 'Case study 2' issue was eventually closed as a duplicate because the issue was fixed in another ticket, committed without tests :|

Certainly drives home the point a bit.

quietone’s picture

Issue summary: View changes

@nicxvan, thanks for the feedback. Yes, there is risk with this change but there are benefits. We can trial this change for about a year or so and adjust as needed.

There is support for this and I have begun to work on the remaining task. I updated the core gates and then notices this is tagged for framework manager review. Oops. I will ping the framework managers.

alexpott’s picture

I think the case studies outline why this is a good idea. And I think it is worth being as pragmatic as possible when committing changes. This is a framework manager +1 for the documentation updates. We will definitely make some mistakes due to this - but we make mistakes already and I don't think this will make it worse and has a good chance of making some simple things better.

One interesting case is #3370946: Page title should contextualize the local navigation... that issue got caught out by #2862702: PrepareModulesEntityUninstallForm::formTitle does not exist I think that with this policy we might have committed a 1 line fix for #2862702: PrepareModulesEntityUninstallForm::formTitle does not exist which means that #3370946: Page title should contextualize the local navigation would not have caused a break adn then might not have ended up so robust. I'm not sure what this tale tells us BUT the fact that we've had a callback point to non-existant code for years and fixing has proved hard should tell us something (maybe that Drupal's titling system is very complex).

quietone’s picture

Issue tags: +needs no tests

Making tag

quietone’s picture

Issue summary: View changes

Adda this description for the tag:

Indicates that tests are not needed according to Upload a test case that fails of the testing core gate. Intended to be used sparingly.

quietone’s picture

Status: Reviewed & tested by the community » Fixed

The next steps here was discussed in committer slack a few days ago. One committer expressed a desire to give this a closer review but that was well over a month ago. So, to close this we agreed to open a new issue to review this change in 6 months. That issue is #3442384: [policy review] Testing core gate, heuristics section.

Therefor I am changing this to fixed.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

quietone’s picture

Version: 11.x-dev » 10.3.x-dev

Changing to latest version when this was closed.