Problem/Motivation

DrupalCI currently uses a number of work-arounds to signal itself, Jenkins, and D.O what's going on.

These different strategies are inconsistent and hard to maintain.

Proposed resolution

Based on #2843562-5: Patch plugin doesn't signal an application error

Some design requirements for DrupalCI exception handling and other signaling:

A D.O-level ‘CI Error’ should mean that the CI system broke, not that there was an testing or validation error during CI.

All drupalci code should strive to avoid ‘CI Error’ at D.O level. ‘CI Error’ comes from drupalci returning anything other than 0.

This means BuildTaskException should always be handled, and should always be handled consistently by Build::executeBuild(). Intermediaries can catch and process these exceptions, but then must re-throw them.

The Build object eventually turns a BuildTaskException into buildstate.json. (todo: rename buildstate, give BTE a way to generate its own json.)

buildstate.json is a single quoted string which can be read by D.O. It should contain a user-facing explanation of what happened. It could appear on D.O in the in-comment test run results, such as "Build Successful" or "CI Error".

Plugins should handle exceptions from code they call, for likely issues such as a container is missing. These should then be turned into a BuildTaskException which is then thrown to be handled by Build.

Remaining tasks

User interface changes

API changes

Data model changes

Comments

Mile23 created an issue. See original summary.

Mile23’s picture

Issue summary: View changes
Mile23’s picture

Note that Build::executeBuild() says this:

    catch (BuildTaskException $e) {
      $this->saveBuildState($e->getMessage());
      return 2;
    } finally {

That is, it always returns 2 for an exception.

Mixologic’s picture

That is, it always returns 2 for an exception.

Right, thats what we want.

This:

All drupalci code should strive to avoid ‘CI Error’ at D.O level. ‘CI Error’ comes from drupalci returning anything other than 0.

Reads funny.

It is only *currently* that way, because anything other than 0 causes jenkins to FAIL the build. Drupal.org does not *currently* know what to do with FAILURE.

So, We first have to give drupal.org something it can use when it sees FAILURE - i.e. buildstate.json needs to be populated with a short message describing the nature of the failure: Patch Failed to Apply, PHP syntax Error, Composer Require Error, etc etc.

Second thing is change drupal.org to not just default to CI ERROR when it sees FAILURE. It should see FAILURE and then parse buildstate.json and display that message Instead of "DrupalCI Error"

Then, once that is working, we can return to having drupalCI spit out proper signal responses. I..e it should always spit out a 2 if there is a BuildTaskException, and 1 if there were failed assessments, and 0 if everything is hunky dory. And probably something other than 0, 1, or 2 if a portal in the spacetime continuum opens up to unlighted chambers beyond time and space amidst the muffled, maddening beating of vile drums and the thin monotonous whine of accursed flutes, and Azathoth gets up off his throne and declares a segfault.

Mixologic’s picture

Re-reading it seems like you were already saying what I said. Carry on.

Mixologic’s picture

Assigned: Unassigned » Mixologic

Im gonna take a quick stab at this. It was actually listed on my sprint as a thing to do before next week anyhow.

  • Mixologic committed 342cddf on 2843592-build-results-messaging
    Issue #2843592: Refactors all instances of exception throwing in...
  • Mixologic committed aeca661 on 2843592-build-results-messaging
    Issue #2843592: Adds a BuildResult object, tweaks buildException to have...
Mixologic’s picture

Title: [plan] Normalize on signal and error reporting » Normalize on signal and error reporting
Assigned: Mixologic » Unassigned
Category: Plan » Task
Status: Active » Needs work

Okay, I havent actually ran any tests on this yet, but I did wanna get this WIP out there.

  1. BuildTaskBase now has a method called "Terminate Build" -> cleaner API for plugins so that you dont have to remember to do things like spit out an IO message as well as throw the exception.
  2. BuildTaskException now has a couple of properties, the label and the details, and overrides the constructor
  3. BuildResult / BuildResultInterface are a thing - as soon as I saw that 'label/details' were going to be array keys used by multiple things, I put on my OO hat and objectified them. Its responsible for serializing itself now too, and the 'saveBuildState' just creates an artifact out of it.
  4. Refactored out all of the usages of throwing the build exception to be terminateBuild. This is where I might have broken some tests as some output may have changed.

Some TODOS: calls to terminateBuild ought to be < 50 chars and make sense, and spit out good data in the exceptionDetails. for example, the Patch plugin should fix this: #181174: Included rejected patch file as text on results page by sticking whatever patch failure message we have into the details.

Im gonna go ahead and 'unplan' this and convert it to a task.

Mile23’s picture

Title: Normalize on signal and error reporting » [plan] Normalize on signal and error reporting
Category: Task » Plan
Status: Needs work » Active

I like this architectural change to drupalci, but I'm switching this back to a plan so we can be clear on what issues need to happen in other projects.

BTW the tests fail. I'd fix them but I still don't understand a plan of action, which is why this issue is a plan. :-)

OK, so we have three things, and this will largely restate what I think you're saying:

  1. drupalci says 0, 1, or 2 depending on what went wrong or what went right.
  2. Jenkins sees either 1 (a good result, just failing tests) or 2 (something went awry) and incorrectly tells D.O that it's the end of the world.
  3. D.O sees Jenkins say END_OF_WORLD, so it reports an error, even if it was just a fail. And there are presumably workarounds at the D.O layer so that it can report fails.

So if the constraint where Jenkins gives us the choice between 0 or ARMAGEDDON is hard (or at least not something we want to muck with), then we can do the following:

  1. drupalci says 0 for pass or fail, 1 (or 2+) for armageddon. We store pass or fail in buildresults.json like this: {"status":1, "reason":"Your tests failed."}
  2. Jenkins sees 0 and knows it's all OK, even though your tests failed.
  3. D.O sees buildresults.json, reads it in, tells the user.

This would mean that we should never return anything but 0 from drupalci if we can help it, but it also means we have to modify D.O to read in our status file.

This leads to two questions:

  1. Is there a follow-up to modify D.O to read the status file?
  2. If we fix drupalci to only ever return 0 then will fails still be marked as fail? That is, can we work on this and fix it and then deploy it before the D.O fix, without breaking CI?

  • Mixologic committed d1ba22f on 2843592-build-results-messaging
    Issue #2843592: Build Outcome is the new black.
    
Mixologic’s picture

Okay, I fixed the tests - some were due to things erroring a little harder than they did previously (like git cloning over the top of a replicated local checkout). I forgot to add in the Issue number when I pushed the commits so they are here:

jenkins has three relevant statusus, that are shouty for some reason, but I reshout them here:. PASS, UNSTABLE and FAILED. Since we are executing drupalci via a jenkins bash plugin, It assumes that 0 = success, and 1 or greater = FAILED. The only way we get to UNSTABLE (which means, drupalci ran okay, but some tests did not pass), is by returning 0, and also having some test xml that jenkins can interpret as meaning "mark unstable"

So the constraint jenkins gives us isnt hard, its just that we dont want to muck with it *yet*.

D.O is currently interpreting FAILED as ARMAGEDDON, and yes, I've discussed it with Neil, but haven't opened an issue, to have d.o. read in our status file. I wanted to have our status file contain valid info, be named the right thing, and have a format defined inside it. I.e. we need to button down what the API is going to be.
But basically instead of D.O seeing FAILED via the jenkins api and reporting END_OF_WORLD, we want it to see FAILED at the api and read in our buildoutcome.json, and tell the user why any plugin decided to terminate the build.

So to answer your second question: drupalci shouldnt be fixed to only ever return 0. It should eventually return 0 for everything passed, 1 for tests failed, and 2 for exceptions encountered. The only thing that we *currently* care about overriding to be 0 is applying patches as that is currently the only thing that *should* be a 2, but isnt so that d.o. can get its dummy xml file to communicate to the user that a patch failed to apply.

So the next steps are
1. Make sure that buildoutcome.json is populated with a sane message every time something craps out. (I just renamed it)
2. Once 1 is done, Open an issue with pift to start consuming that data when it detects a FAILURE.
3. Once 2 is done, we can reset all of the signals that drupalci emits back to their ideal state (0 - all things go,1 - assesments failed,2 - exception thrown).

Once all of that is finished we can think about creating a service for the buildResult, and allowing plugins to *also* insert summary news? It might be better for d.o. to not have to figure out what it should display in a little green/red/white pill. e.g. when we add codesniffer, we'll have our X tests passed, X tests failed pills. Do we add new, codesniffer pills? if so, what do they say? or do we just leave that up to d.o. to sort out and not have plugin's define that?

  • Mile23 committed 0436a1e on 2843592-build-results-messaging
    Issue #2843592: BuildResultsInterface extends \JsonSerializable, is now...
Mile23’s picture

0436a1e uses \JsonSerializable and turns BuildResultsInterface into an immutable, by removing setters, because no one should be changing result outputs.

Added a test to ensure that we have the keys we expect in BuildResults, because eventually that will become buildoutcome.json.

Still needs a test to ensure that buildoutcome.json was written and that it contains the keys we need. We'd probably add that to existing pass and fail tests.

Once all of that is finished we can think about creating a service for the buildResult, and allowing plugins to *also* insert summary news? It might be better for d.o. to not have to figure out what it should display in a little green/red/white pill. e.g. when we add codesniffer, we'll have our X tests passed, X tests failed pills. Do we add new, codesniffer pills? if so, what do they say? or do we just leave that up to d.o. to sort out and not have plugin's define that?

OK, so if the rule is that buildoutcome.json can only contain one status output, then D.O just says what it says. The user can then poke around for more info, and we're done. :-)

If we want to have a sort of warning status along with the pass/fail/explode, then we might add that. For instance, if coder fails, but doesn't fail the test run, then it could have its own pill. We'd want to structure that into buildoutcome.json, so it could look like this:

{
  'buildLabel': "Pass",
  "buildDescription": "Yup, you're good."
  "warnings": [
    "phpcs": "Code failed phpcs.",
    "eslint": "Bad javascript!"
  ]
}

Then D.O could pick that up and figure out what to do with it.

This would mean that if you send errors-dont-stop-build to a given plugin, then errors get demoted to warnings.

  • Mile23 committed b56007c on 2843592-build-results-messaging
    Issue #2843592: Added tests for buildoutcome.json to all Application...
Mile23’s picture

OK, so after chatting in IRC I seem I'm barking up the wrong tree WRT having more warning stuff in buildoutcome.json. Jenkins has us covered.

Added tests for the presence of buildoutcome.json in artifacts to all the Application tests.

Mixologic’s picture

Status: Active » Needs work

Dood. b56007c is awesome. Thanks.

Also, This altogether looks like a good first step. Im going to push it to dev and watch all the tests.

I'll mark the 'plan' needs work, because I need to open some follow ups for D.O.

Also, we need to add a couple of things into simpletest that detect other failed states, like 'no tests found' - that one is sorta tricky because we call the plugin twice, and there might not be javascript tests, so its almost like if *both* simpletest plugins failed we need to set the build outcome to "No tests executed"

Mixologic’s picture

Other situations for buildstate:

Fatal errors running run-tests to get the groups either because of faulty php, or because of missing @groups:
https://dispatcher.drupalci.org/job/default/284809/console

Invalid composer dependency: (note that broken_dependency isnt a thing anymore, but real, wrong ones may still exist somehow)

https://dispatcher.drupalci.org/job/default/284623/console
The requested package drupal/broken_dependency could not be found in any version

Another good example of that is something that adds a 'repositories' key to their modules composer.json (a nono) and then 'requires' that : http://cgit.drupalcode.org/yamlblock/tree/composer.json

Mixologic’s picture

Allright, now we're getting somewhere.

Lots of things have been added back in, so now we should audit all of the BuildTask plugins to make sure that when things fail, and if they fail hard (like cannot keep processing) we 'terminate' the build.

CreateDatabase is a good example of this. If it cannot create the database, it spews an error message, and keeps on truckin, and it ought not do that.

Mile23’s picture

Status: Needs work » Closed (outdated)

Well the origin branch for this has been deleted, and when I merged dev into it locally there was no diff between it and dev. So marking this closed. :-)