Problem/Motivation

PHPUnit has some functionality for raising flags for tests that might cause problems: "risky" tests. This was previously part of strict mode, which caused some problems, but more recent versions of PHPUnit provide more fine-grained control. Such flags can help identify problematic tests before they cause problems.

Proposed resolution

Enable some sane strict flags but don't enforce any failures yet.
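A sketch of how this might look in core/phpunit.xml.dist; the exact attribute set was debated in the comments, so treat these flag names and values as illustrative rather than as the committed configuration:

```xml
<!-- Illustrative fragment of core/phpunit.xml.dist: enable some strict
     flags without enforcing failures. This is a sketch, not the
     committed configuration. -->
<phpunit bootstrap="tests/bootstrap.php" colors="true"
         beStrictAboutTestsThatDoNotTestAnything="true"
         beStrictAboutOutputDuringTests="true"
         beStrictAboutChangesToGlobalState="true"
         checkForUnintentionallyCoveredCode="false">
```

With these flags PHPUnit reports offending tests as "risky" (an `R` in the progress output) without failing the run.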

Remaining tasks

none.

User interface changes

none.

API changes

none.

Beta phase evaluation

Reference: https://www.drupal.org/core/beta-changes
Issue category Bug because we decided to be strict but we are not.
Issue priority Major because this was a regression and we decided to be strict
Unfrozen changes Tests
Disruption Should be limited. Most phpunit tests do a good job already and we don't seem to be adding risky tests.

Comments

dawehner’s picture

Status: Active » Needs review
FileSize
434 bytes
PASSED: [[SimpleTest]]: [MySQL] 58,397 pass(es). View

Removed the strict tag, as this is the only way to allow debugging of that stuff.

pwolanin’s picture

It seems that PHPUnit allows you to add --strict on the command line, but there's no way to suppress a strict flag in the config. So, either we should do this, or anyone who wants to debug has to copy and edit the config file (or hack it temporarily).

dawehner’s picture

So, either we should do this, or anyone who wants to debug has to copy and edit the config file (or hack it temporarily).

I am sorry but this is the worst idea ever.

tim.plunkett’s picture

Status: Needs review » Reviewed & tested by the community

This is a bummer, but the correct thing to do.

After this is committed, can it be set back to postponed? We can fix this for real once https://github.com/sebastianbergmann/phpunit/issues/914 happens.

dawehner’s picture

Well, Drupal 8 probably can't use PHPUnit 3.8, as it requires PHP 5.4.

chx’s picture

Component: simpletest.module » other
Mile23’s picture

OK, so how about having 'needs to pass --strict' as a coding standard, setting the testbots to use --strict, and removing it from phpunit.xml.dist?

#2057905: [policy, no patch] Discuss the standards for phpunit based tests

webchick’s picture

Status: Reviewed & tested by the community » Postponed

This bug report made me laugh out loud. Thanks for that. :)

Committed and pushed to 8.x. Thanks! Marking postponed, per #4.

I didn't quite follow #7 though? So we can restore this functionality in a different way?

tim.plunkett’s picture

Status: Postponed » Needs review
FileSize
459 bytes
PASSED: [[SimpleTest]]: [MySQL] 58,810 pass(es). View
1.05 KB
PASSED: [[SimpleTest]]: [MySQL] 58,413 pass(es). View

Actually, what about this?
Then testbot will run them as strict.

(The fail patch reverts one of the fixes to make it strict).

dawehner’s picture

Well, it is a problem if tests in the testbot behave differently than on local systems.

Mile23’s picture

Yah, it's better to have consistency.

So: Farewell, --strict. It was nice knowing you. But maybe still have a PHPUnit coding standard about always using an assertion, for clarity, and also because you can't make a fail message without it.

tim.plunkett’s picture

Status: Needs review » Postponed

Okay, back to postponed, in case PHPUnit itself is ever fixed.

tim.plunkett’s picture

PHPUnit fixed this https://github.com/sebastianbergmann/phpunit/commit/9101f51

But that isn't in stable yet and will be only in 3.4...

Mile23’s picture

Issue summary: View changes
Status: Postponed » Needs work

Drupal's composer.json has us up to PHPUnit 4.1.*.

Is strict a problem or not at this point, for debugging?

Could other more granular settings be used? As per: http://phpunit.de/manual/4.1/en/strict-mode.html

Over in #2298667: Code review for report generation we have some degree of contention on this issue, and really, that's not where we should be deciding it, since this issue exists.

sun’s picture

Component: other » phpunit

PHPUnit 4's new strict mode applies different code coverage rules than before, which is not really documented yet. Sebastian Bergmann's recent speaker slide-deck mentions it, but doesn't explain it. This blog post attempts an explanation; short version:

Unless every function called by the tested code is specified in @covers, a test will be marked as "risky". — That's a very explicit level of code coverage reporting, which may be suitable for 100% stable code that isn't supposed to be changed anymore, but for Drupal, that's too much.

I recommend using the following granular sub-options instead, to enable all aspects of --strict but explicitly disable the new "risky" behavior:

  beStrictAboutTestsThatDoNotTestAnything="true"
  beStrictAboutOutputDuringTests="true"
  beStrictAboutTestSize="true"
  checkForUnintentionallyCoveredCode="false"
  forceCoversAnnotation="true"  [optional]
Mile23’s picture

@sun: Unless every function called by the tested code is specified in @covers, a test will be marked as "risky". — That's a very explicit level of code coverage reporting, which may be suitable for 100% stable code that isn't supposed to be changed anymore, but for Drupal, that's too much.

Actually, it's not hard to fix those things, and what we're talking about here is which kind of errors should fail in the testbot.

If we're past the days when strict kills the debug process, then let's start catching errors in tests during CI, instead of after the fact like this: https://www.drupal.org/comment/8980283

Mile23’s picture

Here are the strict rules for PHPUnit: https://phpunit.de/manual/4.1/en/strict-mode.html

Here are the rules for coverage: https://phpunit.de/manual/4.1/en/appendixes.configuration.html

I would argue that checkForUnintentionallyCoveredCode should be TRUE, rather than false as @sun suggests, because it only really affects coverage reports and that only seems to interest metrics nerds like myself.

But the rest will go a long way towards helping maintain the quality of the unit tests through the testbot, so that we can at least run a coverage report to figure out what to test next.

The main problem is that 'risky' doesn't equal 'fail,' which means no one will discover the errors of their ways.

neclimdul’s picture

Issue summary: View changes
Status: Needs work » Needs review
Related issues: +#2542486: Fix risky phpunit tests
FileSize
706 bytes
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 99,741 pass(es). View

Oh hey, look at this guy still sitting here. Filed #2542486: Fix risky phpunit tests to clean up the "risky" tests this didn't catch, since this never got finished.

And a patch. I added beStrictAboutChangesToGlobalState because it should be easy enough to be good about that now, and set checkForUnintentionallyCoveredCode to true because I agree with Mile23.

dawehner’s picture

  1. +++ b/core/phpunit.xml.dist
    @@ -1,6 +1,12 @@
    +         beStrictAboutTestsThatDoNotTestAnything="true"
    

    I really don't like that one. If you have a void function you have a void function, don't you?

  2. +++ b/core/phpunit.xml.dist
    @@ -1,6 +1,12 @@
    +         beStrictAboutOutputDuringTests="true"
    

    +1 about that

neclimdul’s picture

How would you have a void function for a test? The only one that sort of did this in our current suite was SpecialAttributesRouteSubscriberTest::testOnRouteBuildingValidVariables(), because it asserts that an event handler doesn't throw a PHP warning when called with some valid values. There isn't an explicit method for this in PHPUnit, but because PHPUnit throws an exception in its error handler, you can just assert that code is run after the method call. Everything else was basically doing something like:

if (is_null($value)) {
  $this->fail($msg);
}

instead of

$this->assertNotNull($value, $msg);
neclimdul’s picture

Note: None of these changes affects testbot because as Mile23 noted,

The main problem is that 'risky' doesn't equal 'fail,' which means no one will discover the errors of their ways.

To do that we'd probably need to update the test runner.

dawehner’s picture

Well, let's assume your method returns NULL always but you want to test the interaction with some storage backend. Testing a ::set() method is a common example for that.

neclimdul’s picture

Mock commands are assertions so mock the backend. I assume we already test a lot of set methods?

For things that don't use mocks generally there is either a connected get method you use to confirm set worked which you can assert or there is some behavior you have to test which implies an assertion on that behavior.

dawehner’s picture

Mock commands are assertions so mock the backend.

OH I wasn't aware of that ...

neclimdul’s picture

Yeah, there are likely many current tests that are missing ->assertX methods but are covered for this by their mocks. This is a feature of PHPUnit, by design, specifically for this sort of case.

Mile23’s picture

Yes, setting an expectation, like:

$mock_thingie->expects($this->once())
  ->method('foo');

..counts as an assertion.

If you say $mock->expects($this->any()); I'm pretty sure it doesn't count as an assertion.

Also, if you have a void function you can say assertNull(), which asserts that the function is still void as expected.
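As a quick illustration of this point (class and interface names below are hypothetical, not from core), a test of a void set() method can satisfy beStrictAboutTestsThatDoNotTestAnything purely through a mock expectation:

```php
<?php

// Hypothetical example: the mock expectation counts as an assertion,
// so this test is not flagged as risky even though set() returns
// nothing and no explicit ->assertX() call appears.
class StorageSetterTest extends \PHPUnit_Framework_TestCase {

  public function testSetDelegatesToBackend() {
    // A hypothetical backend interface with a set($key, $value) method.
    $backend = $this->getMock('BackendInterface');
    $backend->expects($this->once())
      ->method('set')
      ->with('some_key', 'some_value');

    $setter = new StorageSetter($backend);
    $setter->set('some_key', 'some_value');
  }

}
```

If set() is never called, or called with other arguments, the expectation fails the test; that verification is what PHPUnit counts toward the assertion total.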

Review:

+++ b/core/phpunit.xml.dist
@@ -1,6 +1,12 @@
+         checkForUnintentionallyCoveredCode="true"
+         forceCoversAnnotation="true">

I'm generally of the opinion that stricter=better, but have you tried to create a coverage report with these settings?

./vendor/bin/phpunit --coverage-html ../../coverage is an easy way.

We'd probably need a follow-up to fix all the @covers errors. There are a lot. :-)

neclimdul’s picture

Coverage worked. checkForUnintentionallyCoveredCode I actually check in my own CI, so I would have caught that previously. I've made a couple patches on and off. ;)

And yeah... forceCoversAnnotation doesn't make sense for what it actually does:

Code Coverage will only be recorded for tests that use the @covers annotation documented in the section called “@covers”.
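For reference, the annotation in question sits in the test method's docblock; the covered class/method here is just an example, not a claim about where coverage is missing:

```php
/**
 * Tests URL validation.
 *
 * @covers \Drupal\Component\Utility\UrlHelper::isValid
 */
public function testIsValid() {
  // ... assertions against UrlHelper::isValid() ...
}
```

With forceCoversAnnotation="true", any test method lacking such an annotation simply contributes nothing to the coverage report.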

neclimdul’s picture

forceCoversAnnotation is sort of a double-edged sword, btw. Because we don't really use coverage a whole lot as a community, the report is sort of in a weird place, and I'm OK with this going either way.

On one hand, it can be really helpful to have it turned off and see what code should have had @covers entries. The downside being you get giant reports of everything that didn't have a coverage tag and used a service. This is sometimes so large you have to inspect the page HTML to see all the entries.

On the other hand, it is really handy to see only the tests explicitly covering the code you are looking at. It's terse and clear. The downside is that we aren't good about marking coverage, so lots of code looks to have low coverage. @see previous hand for figuring out what's actually covering code.

Mile23’s picture

Status: Needs review » Needs work

If we'd been disciplined about it from the start, we'd want forceCoversAnnotation, but since we haven't, we don't. :-)

It can be useful to look at a coverage report and see which tests touch specific lines of code, even if it's because they have lazy multiple @covers, or don't have @uses.

Really that's the only way to see such things, since 'risky' from checkForUnintentionallyCoveredCode isn't a fail. And we can't fix them if we don't see them. In fact, it's the only way to decide if it's important.

So: No forceCoversAnnotation but yes checkForUnintentionallyCoveredCode.

dawehner’s picture

Please keep in mind that with BrowserTestBase and KernelTestBaseTNG you will run more than one class; that's for sure.

Mile23’s picture

You mean functional tests, right? Yes. Different from unit tests. Functional tests should be marked as @coversNothing, because you're not testing behavior at the method or line-of-code level; you're testing behavior at a systems-interaction level.

neclimdul’s picture

Status: Needs work » Needs review
FileSize
667 bytes
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 111,373 pass(es). View
546 bytes
Mile23’s picture

Status: Needs review » Reviewed & tested by the community

Ossum.

Turn on the strict stuff, please.

alexpott’s picture

Status: Reviewed & tested by the community » Needs review

Are we sure that this is not going to create random fails depending on machine performance? Basically I'm asking: are all our tests sized correctly?

neclimdul’s picture

I don't have a good way to confirm that it won't, but I'm fairly confident. The only tests at this point that even have @medium have it because of sleep calls. I don't know about going forward, but the thing we should probably keep an eye out for causing random failures would be tests doing lots of file operations. Scanning all of core or something.

I will say that I added the --enforce-time-limit option to my coverage reports. It runs in a VM sitting on a six-year-old VM server with a bunch of other development VMs running, and with Xdebug and Xdebug's coverage engine running, it takes a moderate amount of time. It has not had any problem with the current sizing of tests.

alexpott’s picture

Status: Needs review » Needs work
Issue tags: +Needs issue summary update

So we should have had a new issue after #8, because the RTBC patch is doing exactly the opposite of the issue title and summary. However, that's all history.

The issue summary needs updating (and should record the potted history) and we need a new title.

neclimdul’s picture

Title: phpunit strict mode basically kills debugging as you just have 1 second » Add some sane strictness to phpunit tests to catch risky tests
Issue summary: View changes
Status: Needs work » Needs review

dawehner’s picture

Issue summary: View changes
  1. +++ b/core/phpunit.xml.dist
    @@ -1,6 +1,11 @@
    +         beStrictAboutTestsThatDoNotTestAnything="true"
    

Doesn't that require some pointless assertNull(someVoidFunction());

  2. +++ b/core/phpunit.xml.dist
    @@ -1,6 +1,11 @@
    +         beStrictAboutTestSize="true"
    

    So this enables a time limit?

  3. +++ b/core/phpunit.xml.dist
    @@ -1,6 +1,11 @@
    +         beStrictAboutChangesToGlobalState="true"
    

    +1 if you can explicit opt out.

neclimdul’s picture

FileSize
628 bytes
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 112,530 pass(es). View
474 bytes
  1. If so, you should probably delete that test method, because it's not really testing anything. Mocks are assertions, expected exceptions are assertions, assertions are assertions; it's generally pretty easy to assert something. These false assertions really happen when a test is literally testing "method runs without dying." This is better covered by testing what the method is supposed to do; otherwise there isn't actually any proof that the method did the thing that might have died.
  2. Yes but... no... &*%$. I went to confirm this marks things risky now, but it doesn't work. Not documented anywhere, but hidden in the code: it only works if you include the phpunit/php-invoker library, which we currently don't. Removed it for now.
  3. I don't think you can. I don't know why we'd want to allow that; if you add a global, clean up the global. We've already fixed the core tests that had been doing this.

An important point that may address all of your points, though: all of these flags now only mark the test as risky. This is mostly just additional information, helpful to developers manually testing and maybe someday in DrupalCI.

dawehner’s picture

If so, you should probably delete that test method, because it's not really testing anything. Mocks are assertions, expected exceptions are assertions, assertions are assertions; it's generally pretty easy to assert something. These false assertions really happen when a test is literally testing "method runs without dying." This is better covered by testing what the method is supposed to do; otherwise there isn't actually any proof that the method did the thing that might have died.

Ah I see, so in those cases it should at least have a $this->assertNull($return);.

I don't think you can. I don't know why we'd want to allow that; if you add a global, clean up the global. We've already fixed the core tests that had been doing this.

Good point!

+++ b/core/phpunit.xml.dist
@@ -1,6 +1,10 @@
+         checkForUnintentionallyCoveredCode="true">

I'm a bit confused about this particular statement ... won't this cause issues all over the place with our kernel tests? Did you actually run all tests with xdebug enabled for code coverage?

neclimdul’s picture

Lost response...

I'm a bit confused about this particular statement ... won't this cause issues all over the place with our kernel tests? Did you actually ran all tests having xdebug enabled for code coverage?

Yes, I run it on every commit. Yes, most of our tests treat everything as coverage of everything they touch. Only a couple people writing tests are very strict about their covers flags, and there are _lots_ of old tests without them, so this flags quite a bit even in our unit tests. However, it only does so when running coverage reports. Even with xdebug enabled, unless you're running one of the coverage options, it's completely ignored.

neclimdul’s picture

Status: Needs review » Needs work

Wow, so I ran all the same options as command line arguments, but somehow that's not behaving the same, and there is actually an error in HEAD that popped up last month. Working on figuring out what's going on.

dawehner’s picture

@neclimdul
So do you think we would one day add code coverage reports on the testbot?

neclimdul’s picture

neclimdul’s picture

@dawehner I think it might be useful, but it would be an incremental thing. We're not treating anything risky as a failure yet, so even then it would just have a bunch of R's in the runner.

Here's what things look like now.

.RRRRRRRRRRRRRRRRRRRR.RRRRRRRRRRRRRRRRRRRRR..................   61 / 8819 (  0%)
..................RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR  122 / 8819 (  1%)
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR  183 / 8819 (  2%)
RRRRRRRRRRRRRR.RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR  244 / 8819 (  2%)
RR...RRRR...........................RRRRRRRR..............RRR  305 / 8819 (  3%)
RRR.R.RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR..RRRRRR  366 / 8819 (  4%)
RRRRRRRRRRRRRRRRRRRRRRRRRRR.....RRR..........................  427 / 8819 (  4%)
Mile23’s picture

I added the patch in #40 and did ./vendor/bin/phpunit -c core -v, but I don't see Rs in those places. I got this (plus the verbose info):

FAILURES!
Tests: 8966, Assertions: 17478, Errors: 1, Skipped: 146, Risky: 6.

The error was SimpletestPhpunitRunCommandTest, reported above.

Risky:

There were 6 risky tests:

1) Drupal\Tests\ComposerIntegrationTest::testAllModulesReplaced
This test did not perform any assertions

2) Drupal\Tests\Core\Entity\EntityManagerTest::testOnFieldDefinitionUpdate
This test did not perform any assertions

3) Drupal\Tests\Core\Update\UpdateRegistryTest::testRegisterInvokedUpdatesWithoutExistingUpdates
This test did not perform any assertions

4) Drupal\Tests\Core\Update\UpdateRegistryTest::testRegisterInvokedUpdatesWithMultiple
This test did not perform any assertions

5) Drupal\Tests\Core\Update\UpdateRegistryTest::testRegisterInvokedUpdatesWithExistingUpdates
This test did not perform any assertions

6) Drupal\Tests\Core\Update\UpdateRegistryTest::testFilterOutInvokedUpdatesByModule
This test did not perform any assertions

The skipped tests dealt with SIMPLETEST_DB because I forgot to supply that. :-)

Mile23’s picture

Re: #41:

Ah I see, so in those cases it should at least have a $this->assertNull($return);.

If the test is whether something happened without throwing an exception, then that's kind of reasonable. But it also might leave us with a false pass, which is worse than a strict fail.

neclimdul’s picture

neclimdul’s picture

neclimdul’s picture

Status: Needs work » Needs review
FileSize
1.19 KB
591 bytes

Patch rerolled with a fix for #2597510: Simpletest unit test is failing to assert success case correctly. Since this patch is changing the behavior, it seems like we can go ahead and fix it here, I think.

neclimdul’s picture

I'm going to backtrack on my previous opinion only slightly. checkForUnintentionallyCoveredCode is going to need to be false for now, but only because it causes leaky problems with our current UnitTestCase implementation. As it is now, we need to add

 * @uses \Drupal
 * @uses \Drupal\Component\FileCache\FileCacheFactory

to every single unit test to make it work, because methods on these classes are called in the setUp method and will be treated as unintentionally covered code. That's not reasonable, so adding a follow-up to sort this out.

neclimdul’s picture

derp, didn't finish the todo comment.

Mile23’s picture

Status: Needs review » Reviewed & tested by the community

we need to add [@uses] to every single unit test to make it work because methods on these classes are called in the setUp method and will be treated as unintentionally covered code. That's not reasonable so adding a follow up to sort this out.

Actually, it would be cool to avoid anything in a base test class that requires setUp(). We don't want to carry dependencies with us that we don't need. IIRC many of our tests are based on \PHPUnit_Framework_TestCase for this reason, and the testing system can deal with that.

But yah, definitely in a follow-up.

Also, I approve.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed
Issue tags: -Needs issue summary update

This is a sensible improvement to PHPUnit testing. Committed adfca62 and pushed to 8.0.x and 8.1.x. Thanks!

  • alexpott committed e746bd9 on 8.1.x
    Issue #2105583 by neclimdul, tim.plunkett, dawehner, Mile23: Add some...

  • alexpott committed adfca62 on
    Issue #2105583 by neclimdul, tim.plunkett, dawehner, Mile23: Add some...
klausi’s picture

We now have some risky tests appearing in Rules, and PHPUnit tells us that the global variable 'loader' is not the same after the test run for BanIpTest. Executed with:

../vendor/bin/phpunit ../modules/rules/tests/src/Integration/Action/BanIPTest.php --verbose

Any hints where and how a global 'loader' variable is manipulated in Drupal phpunit tests appreciated!

chx’s picture

core/tests/bootstrap.php has $loader = require __DIR__ . '/../../autoload.php'; in the global scope. Later, core/tests/Drupal/KernelTests/KernelTestBase.php utilizes it via $this->classLoader = $GLOBALS['loader'];.

neclimdul’s picture

this isn't using KTB but something similar sounds likely.

chx’s picture

The bootstrap is not KTB-specific; core/phpunit.xml.dist has: <phpunit bootstrap="tests/bootstrap.php" colors="true"

neclimdul’s picture

So, what wasn't clear to me was when it got modified. But, looking at the error and how the error is reported, I think there's a clue.

-    'loader' => Composer\Autoload\ClassLoader Object &000000006e1802190000000003d296ee (
+    'loader' => Composer\Autoload\ClassLoader Object &000000006e18032d0000000003d296ee (

The differing string is a hash of the object, so something inside the class loader is changing. chx asked an important question: how did this ever pass, given that the class loader should always be changing? I'm guessing a lot of class loading happens in some hidden step and everything just "worked." But somehow in this contrib test something is different; my guess is it has to do with traits, because the only thing in the test is a call to a method on a trait, but I'm not 100% on that.

Either way, having the class loader stored in a global variable is problematic for asserting global state. My preference would be to follow up and put the loader logic in a function and take it out of the global scope. I've tested this and it solves the problem.
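A sketch of that suggested follow-up for core/tests/bootstrap.php; the function name here is invented for illustration, and the actual follow-up patch may well look different:

```php
<?php

/**
 * Sketch: wrap the autoloader setup in a function so the class loader
 * no longer lives in a global variable that PHPUnit's global-state
 * snapshot compares before and after each test.
 */
function drupal_phpunit_get_class_loader() {
  // Keep the loader in a static instead of the global scope.
  static $loader;
  if (!isset($loader)) {
    $loader = require __DIR__ . '/../../autoload.php';
  }
  return $loader;
}

drupal_phpunit_get_class_loader();
```

Callers such as KernelTestBase would then need to call this function instead of reading $GLOBALS['loader'], which is the API change a follow-up would have to sort out.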

Mile23’s picture

BanIPTest extends RulesIntegrationTestBase, which does its own stuff with namespaces, which is kind of interesting, in that it might be testing against classes which can't be resolved in a specific subset of namespaces.

But I'd wager that at some point this isolated namespace gets sent off to be global in some way.

Or maybe not.

run-tests.sh does the same trick as well. It saves the results of the autoload scan in a global so that it can be re-used if it's still there.

We could modify tests/bootstrap.php and rename the global to $test_runner_autoloader_dont_overload_this_absurdly_long_name or something...

In an ideal world, everything would be properly injected.

neclimdul’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.