Problem/Motivation
Core can now specify a drupalci.yml file to control the testbot builds
Proposed resolution
Start with a drupalci.yml file that allows us to leverage some of the new features, primarily a fail fast mechanism to stop 1 hour tests if the unit tests are known failures.
Then we can add additional commands to do things like upgrade phpunit etc.
Remaining tasks
User interface changes
API changes
Data model changes
| Comment | File | Size | Author |
|---|---|---|---|
| #41 | interdiff_41.txt | 857 bytes | Mixologic |
| #41 | 2951843_41.patch | 2.28 KB | Mixologic |
| #34 | interdiff.txt | 566 bytes | mile23 |
| #34 | 2951843_34.patch | 2.28 KB | mile23 |
| #24 | 2951843_21.patch | 2.28 KB | mile23 |
Comments
Comment #2
mile23Adds core/drupalci.yml
Comment #3
mile23A more realistic use-case. See #2932606-22: Use PHPUnit 6 for PHP 7.0 / 7.1 testing
Comment #4
mile23Adds Composer dependency bounds checking.
Eventually, project maintainers will be able to specify which type of CI build should do this kind of thing, so this isn't a great example. (It runs many of the same tests three times.)
Comment #5
dawehnergg
I'm curious whether we could run the unit tests first and individual, then then kernel tests and so on and so forth.
Comment #6
mile23Indeed we can. :-)
Buy @mixologic a beer.
Comment #7
mile23Well, still buy him a beer. But that's the idea: Arbitrary number of run_tests because that's what you need.
Filed #2959930: run_tests doesn't seem to want to run more than twice
Comment #8
Mixologicdang. I did that thing again where I loaded this issue two hours ago, saw that the run_tests.sh doesnt want to run three times due to the fact that it keeps doing 'ln -s ' . $sourcedir . ' ' . $sourcedir . '/subdirectory', without checking whether its there first, so first time, it makes /var/www/html/subdirectory, the second time it makes /var/www/html/subdirectory/html, and the third time it fails as things already exist.
I'll fix this now.
Comment #9
Mixologicokay, well, we can see a that theres some things. that phpunit upgrade thing might get in the way of min max testing. anyhow, 3rd times a charm run_tests plugin is now no longer a blocker.
Comment #12
dawehnerI'm curious whether we should document why we are doing this.
Comment #13
MixologicWe should keep this issue on scope, and not introduce any min/max testing here, as thats not how we plan on doing it (i.e. not going to run run_tests on every patch with min/max deps etc.)
This patch contains the new halt-on-fail settings that allows us to fail fast, along with some documentation.
Comment #14
mile23Console output for #13 says:
Comment #15
Mixologicyep. I forgot where they go and put it in / instead of /core .. resubmitting.
Comment #16
mile23Well, the scope is pretty vague....
Comment #17
xjmThis is a big deal.
Comment #18
dwwThis sounds like a great step forward.
Mostly the new file looks great.
1 tiny nitpick, and 1 question (and 2 meta comments)
A) Nit:
Missing period. ;)
B) Question:
I'm way out of the loop on our bots and testing infrastructure, so maybe this is a dumb question, but WTF is that? ;) All the other sections under
testing:are quite self-documenting, even for someone who's not closely following. But then there's this step, at the very end, that uses a totally different syntax that seems to declare there's nothing to do in that step? Or maybe there's no input to declare how this step works but the bot instinctively knows what to do? Perhaps a comment is in order here?C) Meta: Could there be a comment at the top of the file that briefly explains what this file is and ideally points to the docs (?) on what goes in it, expected syntax, possible options, etc? I love the comments at the start of
testing:to explain those specific options and why we're setting them. I'm talking about general docs on the overall structure of the file, options, etc. Maybe such a doc doesn't exist. Still, a tiny bit of context at the top of the file might help people who are wondering but can't put it all together from the filename alone.Totally out of scope from this issue, but curious:
D) Other meta: it's interesting that everything is nested under
build:. I tend to think of build, then test, as separate steps. Build == get everything together you need to run/test/release whatever. Then you can test it as phase N+1. But here everything is declared as sub-steps of build. Ditto "assessment". Can anything happen afterbuild:? If not, why does it have to be a parent item at all? If this file always has that as a single top-level item, with nothing defined for itself, and simply as a container to hold sub-processes (sorry if that's not the terminology in use), why is it there in the first place?Thanks!
-Derek
Comment #19
mile23@dww: There's some helpful documentation about how this all works: https://www.drupal.org/drupalorg/docs/drupal-ci/customizing-drupalci-tes... and https://www.drupal.org/drupalorg/docs/drupal-ci/configurable-drupalci-bu...
In testbot-world, the thing the testbot does is called a 'build.'
The whole thing is also nested under 'assessment' which is the stage of the build that does linting and coding standards type stuff (validate_codebase), as well as actually running tests (testing).
The assessment stage from the drupalci.yml file replaces the one in the existing default build, which you can see here: http://cgit.drupalcode.org/drupalci_testbot/tree/build_definitions/devel...
The top-level keys are there because there might very well be more options forthcoming. :-)
Comment #20
dawehnerIt is weird that we have to configure that.
This is confusing. Why we are not running the one shipped in core?
Comment #21
mile23Addressing #18 and #20.
It defaults to false, but it's good to be explicit, and also lets people know that you can change it.
Comment #22
dawehnerThank you @Mile23
This looks good to me. I like that you added more documentation explaining things better.
Comment #23
dwwSame here. I'm happy with the current comments and links to docs. +1 towards RTBC from me. However, we should at least wait for the current build to come back green since there were some functional changes since the last successful test run, not just comments.
Maybe it'd be worth seeing it in action? Can we add a patch here that adds the file, but also intentionally breaks something in a unit test, and verify the build bailed early and completed quickly?
Thanks!
-Derek
Comment #24
mile23Testing each type, with a re-upload of the patch from #21.
Comment #26
mile23Unit test:
Comment #28
dwwSweet, nice! Thanks for doing that.
One thing I notice is that PIFT is somehow getting confused by these new kinds of test results:
Note the 2nd class name is truncated. So, the styles don't kick in and mark the test results red.
Compare with the test results from comment #6:
That's clearly a bug in something other than this particular issue, but we might as well resolve that before we commit this...
Comment #29
mile23Yah, I saw that, too. Also there's a tweak or two needed for the composer cache, since run-tests.sh now updates phpunit under some circumstances.
Added: #2960518: run_tests can't fail a build, only error
Comment #30
Mixologic@Mile23 fixed the drupalci issue, and I reviewed/merged/tested/deployed it and now it will send proper error codes to jenkins to cause pift to find the failed test results.
I've requeued the tests in #24, and they are starting to come back as proper failures.
There isnt necessarily any indicator in the failure results that it failed fast on unit tests or where it stopped. It just shows less passing tests, and the correct number of fails. But Im of the opinion that will be okay, and we can always adjust PIFT to be more descriptive if need be.
Comment #33
quietone commentedSpotted a typo.
s/deprectaions/deprecations/
Comment #34
mile23Addresses typo in #33.
See #24 for test-only runs which prove the testbot fails early.
Comment #35
dwwLooks great. PIFT no longer seems confused, and the classes and styles are working on #24 as expected.
The only remaining (extremely minor) concern is why the difference between this:
and this:
? Both mean "run this thing that the bot knows about and don't send any input/arguments/info", right? If so, why use one syntax in one place and another in the other? Seems like we should always include the
{ }or never, but not a mix of both styles.Otherwise, I think this is RTBC.
Thanks!
-Derek
Comment #36
MixologicGood catch, thats another throwback to what yaml processors put out for "empty",
But its definitely easier to read without, and doesnt really add value, so lets leave them blank.
Comment #38
Mixologicpackagist race condition funtimes.
Comment #39
mile23Not really the parent issue.
Comment #40
berdirshould we combine those maybe, no real reason to have them separate?
If one phpunit functional test would fail then this would be faster, on the other hand, if done like this then if there is for example a single/few tests that run several minutes longer than the other ((InstallUninstall and so on), then we have to wait until that finishes until we can start with the legacy simpletest tests.
Also, the advantage of having it fail early on the phpunit functional tests will get slimmmer with each test we convert.
One downside of halt-on-fail is that we lose the ability to see how many tests are failing, being able to fix them all and upload a new patch. But I assume that halt-on-fail is still for the whole group.. if 20 functional tests are failing then I will see the results of all of them, it won't abort on the first failing test? It's just unfortunate if one kernel test is failing, you fix that and then you get to see the 20 functional tests that are failing.
Comment #41
MixologicI was thinking that perhaps we could reverse them around so that the legacy simpletest ones ran before the PHPUnit-Functional ones (the legacy ones only take 3 min).
It gives us one invocation of run-tests.sh per 'type' that we have defined, and it gives us a nice list of tests that need to still be converted from legacy to phpunit.
One thing we could do is not halt-on-fail for the simpletest type, that way it would run simpletest, and then phpunit-functional tests, and keep going if we accidentally broke a legacy test.
Another advantage of splitting up the simpletest from phpunit-functional test runs is that I can add a feature to the run-tests.sh plugin be able to leverage the junit.xml data from phpunit, only only resort to relying on the simpletest database for that portion of tests.
That would allow us to get skipped test data, as well as timing information because phpunit already has that data in the tests.
I've updated the patch to switch the positions of the commands, as well as change halt-on-fail to false for the simpletest legacy version.
Comment #42
mile23Working on letting run-tests.sh reflect skipped/incomplete here, so it doesn't have to live in the testbot: #2905007: Allow run-tests.sh to report skipped/incomplete PHPUnit tests
Comment #43
berdirThe update makes sense to me. I think this is ready unless someone sees a problem with it?
The only problem I see is that sometimes, you want to upload a patch to see how well it is actualy working on the real tests and it's a bit annoying then if it fails on some small unit test coverage problem. But on the plus side, you do get *that* information then very fast and you can change drupalci.yml in your patch if you want to see all the test results...
I also wanted to suggest to have a test only patch to show tha it does fail early as expected, but I see we already had that in #24.
Comment #44
alexpottI remember when we introduced PHPUnit there were some developers that were angry that it stopped on the first fail in a test. That didn't make sense to me. You knew what tests failing so the correct thing is to run the test locally till you get it to pass. The problem here is that some people view DrupalCI as their local runner because tests take so long.
However what is being proposed here is a bit different because after a test run you no longer know all the tests that are failing. I'm not sure that this is the correct behaviour. It'd be nice to see prior art in this area but this is not the default behaviour of PHPUnit or any runner I've used. Yes it can often be configured, for example
--stop-on-failurein PHPunit. But I think that this behaviour change is not in the spirit of the principle of least astonishment. If I submit a patch to drupal.org my expectation is that I'll be told what tests are failing and which are not. This change means I can fix all the errors Drupal.org is telling me about. Get it a patch "green" locally and then submit it only to discover that DrupalCI could have told me about more fails that I had little chance of discovering after the first test run.I'm +1 on introducing drupalci.yml and running tests in this order so people get quick feedback if they are watching the test runner but I'm -1 on stopping other test types just because a previous test type has failed.
Comment #45
MixologicThere is a difference in the way that this would fail fast vs the way that phpunit and/or run-tests.sh handle both
--stop-on-failureand--die-on-failBoth phpunit and run-tests.sh's handling stop running tests as soon as the first failure is discovered, whereas drupalci would only stop if it detects a failure in an entire group.
The one fail at a time strategy would be maddening indeed.
Ultimately theres two goals with this strategy, save on testing costs by not overtesting obviously broken patches, and improve developer experience by not having to wait as long to get feedback on a patch that needs work.
Given that there were about 7000 failed core patch tests in the last year, and only some fraction of those had unit test failures, that means we're not going to save a significant amount of money(~1500$/yr?),
So really it comes down to whether or not rapid feedback is actually an improvement over testing the entire patch.
Comment #46
dawehnerYeah I think this is a totally fine behaviour. One could communicate this a bit better, but I think it is good enough.
Yeah I doubt it is super helpful for unit tests, but I think it could be quite an impact on kernel tests.
Comment #47
alexpottI think that this approach will have the opposite effect. It will take longer, result in more tests because there will be more fails. Because people will not know all the fails if there are fails in the next type of test. If people want immediate feedback then right now they can watch the test run live so to speak. And if we wanted we could detect the fail earlier of DrupalCI and report back to the issue but continue running the tests to get all the info.
This is not true for a group of tests. If you tell them to run a group it tests the entire group. And each test method inside each test is guaranteed to run too. Yes when a fail is discovered inside a test method that method is terminated but not the entire test and not the any other test.
Comment #48
dawehnerI guess this depends on the the time ratio between kernel tests and functionality tests. If kernel tests costs like 20times less than functional tests, I think having the chance of not running functionality tests could be quite helpful.
Comment #49
alexpottDoesn't it more depend on how many patches are only broken in unit tests / kernel tests. Which has to be very low number. For all other cases this will cost time. And if you have a fail in a Functional and FunctionalJavascript this change is going to be very annoying.
Comment #50
alexpottAlso random fails become even more annoying with this change.
Comment #51
Anonymous (not verified) commentedWhat about the next rule:
And just save the UI for run all tests manually.
Perhaps also more strict rule:
If the patch not contains tests, so you need run it manually.
Comment #52
mile23We can't currently run only tests that have changed. But we also can't assume that only modified tests will tell us about regressions.
If we think about this in terms of the circumstances in which we run tests, there are a bunch: Patch, RTBC, Commit, Release.
For a patch we really want a lot of information. If we fail the build based on unit test fails only, then we lose some value because we don't get a lot of information back.
For RTBC patches, we want to fail early because there are a lot of them and they run (about) every day. So what we care most about is pass/fail, and reporting status, not a lot of info about why the patch failed. Developers would then run tests locally to do another patch. Though in practice they'd probably just restart the testbot for a full build.
For commit you might want to fail early because there might be a bunch of commits per day. You can shorten that cycle and do your reversion while you're still working on it, not the next day.
For a release, you might do even more tests than you do in the other build targets, like doing a composer update test or something, or triggering builds of demo sites to make sure they work. For these you would definitely want to fail early on large-scope black box tests, and then move on to unit tests at the end, since you've already passed the gateway of the commit test.
This is just to illustrate that we need the testbot build targets feature: #2951375: Allow for build targets #2949209: Phase I: Allow drupalci.yml to influence the build
Comment #53
MixologicThis looks like a classic "trying to do more than one thing in a patch" (adding a drupalci.yml, changing the policy on how we do testing).
So how about we trim the scope a little bit. Since we've already got a lot of discussion here about the pros/cons of the policy change, I opened a child issue #2969363: Add a drupalci.yml to core which introduces a drupalci.yml to core, and preserves the existing testing behavior.
That will allow us to accomplish having a drupalci.yml file in core that will give people an example of what they can do in their contrib projects, and make it easier for developers to control the drupalci in their own patches without having to resort to run-tests.sh hacking etc. (like running just one group of tests, or deliberately failing fast etc).
I'll postpone this on the child, but we can still continue discussing whether fail fast is a net positive or not.
Comment #54
lokapujyaWe should just have an advanced option to fast fail and also to #2969258: Modify run_tests task to allow contrib modules to specify a subdirectory. This would be used in some cases, for example: a test-only patch. The whole test suite doesn't need to run; It's not a regression test. It can help reduce the response time during sprints.
Comment #55
mile23You can include a drupalci.yml file in the test-only patch. You can currently only pass through to run-tests.sh's --types parameter (using the 'types' config) and groups using 'testgroups' like you see in the patches here.
The testbot doesn't have a way to specify directories or classes, even though run-tests.sh could let you. So I moved that issue to the testbot queue with a rescope: #2969258: Modify run_tests task to allow contrib modules to specify a subdirectory
Also, run-tests.sh (via TestDiscovery) can give a false-positive if you have it run tests by group: #2296615: Accurately support multiple @groups per test class so we have to fix that, too for this to work.
Comment #56
MixologicThe testbot does support directories, classes, and groups, see my comment here: #2969258-5: Modify run_tests task to allow contrib modules to specify a subdirectory
Comment #57
MixologicOkay, we now have a drupalci.yml file in core. Now we can discuss whether or not this would be a valuable, time saver and cost saver, or whether it would be a negative time suck.
We have our known knowns, which is our current status quo. What we dont really know, but can speculate about, is whether a fast failure is better or worse for development velocity, and for cost savings.
I think that most of our speculation may or may not be grounded, but the only way to actually validate our assumptions would be to trial failing fast for a short amount of time, and at the end of the trial decide whether we should switch back (unless, of course, there is vehement pushback before the trial is even over).
The alternative is that we decide we never want to fail fast, or we cant decide whether or not to trial it. Either way is the same, and the status quo, and requires no changes.
Comment #58
dawehnerI'm wondering something: How often do unit tests vs. kernel tests vs. functional tests fail?
How often do unit/kernel tests fail and functional tests fail as well?
@Mixologic Do we have these kind of numbers available?
Comment #59
MixologicWe've got that data, and by data I mean we have unbelievably large amounts of it. As in really, really hard to write a query against it that does anything other than filter it. We basically store every result on failing tests, but we've only got it as raw data, so figuring out which tests had failing unit tests that *also* had failing kernel tests is substantial.
Theres 936 million rows in the pift_ci_result table, so its kinda hard to do true aggregate analysis on it on the hardware that we have.
Perhaps if I dumped that table and pushed it through some aws processes to create a report we'd know what the answer is, frequency wise, but Im thinking the amount of effort to make that happen and get that data might be able to be skipped if we just tried it out.
Worst case scenaro is that some developers lose some velocity for however long it takes us to determine that its not better, and we revert it.
Comment #60
dawehnerOne major point I could so: By letting kernel/unit tests fail first (I think running both in one batch feels fine), we point people towards easy to debug failure first. This could save some people not super familiar with test results quite some time.
Comment #61
berdirWe might not have easily available data on which parts fail how often, but it is now very easy to see how long each part takes, from the last patch here:
unit tests: 7seconds
kernel tests: 2 minutes
web tests: 27 minutes
js tests: 20 minutes
I think we shouldn't underestimate the testbot time this could save.
One thing is javascript tests. With the current setup, they take almost about as much time as the other tests combined because they're not running concurrent. Maybe that will improve with nightwatch, we'll see, didn't look into that yet. But I think we can agree that we only need to run those tests if the others passed? Doing only that would already save 20min or so for each test run that has a fail.
Also, if kernel tests don't fail for most changes then it also doesn't hurt all those patches that don't. And if a patch breaks a kernel test then it's probably fairly low level and it actually makes sense to fix that first as it's fairly likely that a ton of web tests will fail too?
Unit tests on the other hand might not be worth to fail early. The thing is that updating mocks and so on is often a pretty tedious thing and I often postpone that until I have a sense of how well a patch is doing. So we could not fail early on those, at least not initially.
This is also not the same as --stop-on-failure, we'd still run all tests of a group, so you get all kernel test fails or all web test fails still.
Maybe we could make it somehow configurable( Similar to how we can select the branch to run tests with, allow to select fail early or not)?
Comment #62
MixologicRe: javascript tests - Been working on converting the tests to use chromedriver vs phantomjs so we can parallelize here:
https://www.drupal.org/project/drupal/issues/2942900
That drops the test times down to 5 min from 20.
The idea behind having a drupalci.yml file is that you can configure how drupalci works on a patch by patch basis by making changes to the file as part of your submitted patch, without us having to continuously to add more ui/ux options to the issue queue, as that becomes a lot more code for us to have to maintain.
Comment #73
quietone commentedThank you to everyone who worked on this improvement!
Closing as outdated because we are using gitlab for testing now.