Sometimes I'm careless and submit a patch to a project that has "automatic tests" enabled. I know from past encounters that by doing so, I set myself up for a humiliating experience. But sometimes I just want to get some bug fixed or feature added – so I brace myself and do it.

I would like to share a recent such encounter. It's about this little patch: #2497785: The path for advanced help in submodule Views Slideshow: Cycle must be updated.

The patch is not exactly rocket science. It is a one-liner replacing a quoted string containing a wrong path with a quoted string containing the correct path (i.e. no code change).
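
To give a sense of the scale of the change, here is a sketch of what such a one-line fix looks like. This is a hypothetical reconstruction for illustration only – the file name and both path strings are invented, not copied from the actual patch:

    --- a/views_slideshow_cycle/views_slideshow_cycle.module
    +++ b/views_slideshow_cycle/views_slideshow_cycle.module
    -  $help_path = 'help/views_slideshow/cycle-options';
    +  $help_path = 'help/views_slideshow_cycle/cycle-options';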

It failed (of course) – my patches always fail automatic testing.

The reason given for failing is:

D7 CI error

I guess there was a letter shortage in the context where this message was created. But there are plenty of letters in the "Console Output" context (see below). Maybe some of them could be put to better use here?

Exploring further, I click on this message. This takes me to this page, where the reason for failure has been further abbreviated to:

CI error

So "D7" wasn't part of the problem. That's nice to know.

And there is another link: "View results on dispatcher". Clicking it leads to this page, which seems to be the home page of Mr. Jenkins.

And Mr. Jenkins' page has all sorts of fancy stuff: "Status" and "Console Output" and "View Build Information" and "Parameters". I click through all of those, and even read all the letters in "Console Output" (letters galore – too many, in fact), and I am still unable to figure out why my one-line patch was a total failure.

Don't get me wrong: I think automatic testing is a great idea, I really do – but only if the feedback it provides makes sense to a human. The idea behind providing feedback must be to motivate the human to use that feedback to improve his code. What the current feedback communicates to the contributor is this: “You and your patch failed – and you are too stupid to understand why.”

I know that it's a computer and that I should not take it personally, but for me the user experience is just bad – it is humiliating to have work rejected and not be able to understand why.

And on a less personal level: exposing contributors to error messages made up of obscure initials, and to raw "console output", is probably not the best motivation.

Update 2016-02-12:
I see from #2 by drumm that the failure of this particular patch is a bug.

Fine, but fixing this bug does not fix the real problem: the output produced by the automated testing is not suitable for humans.

The output from this particular test should not have been:

D7 CI error

… plus a lot of crud – including a very long and unreadable (to humans) console message. It should have been something like this message – given the (IMHO faulty) logic of the current test bot:

The testing failed because the test bot did not find any tests for your patch.

Contributors who upload patches to fix bugs and add features to Drupal should be among the most valued members of the Drupal community. This means that we should care about them and their motivation.

In my company, we don't roll out new features without at least some usability and UX testing. It may be testing that involves observing real users (i.e. not members of the dev. team) using the new features, or it may be something that does not involve users (e.g. heuristic evaluation) – but usability testing is mandatory before major changes to a configuration are put in production.

Not being a member of the infrastructure team, I don't know how you go about your business, but I would suggest that you review "best practices" from the industry regarding major changes to the configuration, and perhaps adopt some of them for your workflow.

Attachment (comment #6): testoutput.png, 25.61 KB, uploaded by gisle

Comments

gisle created an issue. See original summary.

drumm (comment #2)

Status: Active » Closed (duplicate)

This is indeed a bug; #2645590: Ensure that simpletest job doesn't "fail" testing if no tests are present is the root cause.

The key part of the console output page is

ERROR: No valid tests were specified.

If you were to run run-tests.sh locally, you would see just that; but we have quite a bit more verbosity added in. That is communicated as a console message, but doesn't make it back to Drupal.org in machine-readable form (until that underlying bug is fixed).
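
As a rough sketch of what that looks like locally – assuming a Drupal 7 checkout with the Simpletest module enabled; the site URL and the test group name below are placeholders, not the bot's exact invocation:

    # Run from the Drupal root; the URL and the test group are examples.
    php scripts/run-tests.sh --url http://localhost/ 'Views Slideshow'

    # When no tests match, run-tests.sh prints:
    # ERROR: No valid tests were specified.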

gisle (comment #3)

Issue summary: View changes
Status: Closed (duplicate) » Active

drumm wrote:

This is indeed a bug

What caused this particular patch to fail may have been a bug.

However, fixing that bug will not fix the issue I am trying to raise here – that the motivation of contributors matters, and the test bot currently offers a UX that is demotivating because its output is not designed with ordinary humans in mind.

Updating the issue summary to make it clearer what the issue is about, and setting it back to active, as it is not a duplicate of #2645590: Ensure that simpletest job doesn't "fail" testing if no tests are present.

drumm (comment #4)

#2645590: Ensure that simpletest job doesn't "fail" testing if no tests are present will make this suitable for humans, replacing the "CI Error" message on Drupal.org.

Developers do need the full verbose output for debugging. For example, what if a project says “no tests are present” when the maintainer thought they had committed tests? They need a way to dig in and find out exactly what is happening.

hestenet (comment #5)

@gisle - I think it would be better to look at some test results that do not have a known bug affecting their output to evaluate the human friendliness of the test results.

If we take these two test result sets for example (picked at random from the core queue):

https://www.drupal.org/pift-ci-job/163960

https://www.drupal.org/pift-ci-job/175616

Can you articulate what sorts of changes you think would make this more human friendly?

Bear in mind that DrupalCI is mostly responsible for parsing and ordering the output of the test results, not for the actual contents of those results (those have more to do with the tests themselves).

One suggestion we've heard before is to have the 'view results' link go directly to the plain text console log, rather than to the build itself in the Jenkins dispatcher.

gisle (comment #6)

Attachment: testoutput.png (25.61 KB)

hestenet wrote:

Can you articulate what sorts of changes you think would make this more human friendly?

This issue is not only about the end result (the text visible to the user), but also about the workflow that puts that text in front of the user.

Why was a version of the testing software deployed on this production website that produces the text "D7 CI error" when there are no tests?

  • Did nobody think of this (IMHO quite obvious) use case, so that there were no automated tests for it in the automated testing project?
  • Does the team deploying the automated testing framework believe that old-school usability testing (e.g. observing real users using the software) is not needed?

That being said, I think the text immediately visible to the user (before the user drills down to the cruft that "[d]evelopers need" according to #2) could do with some love. Below is a screenshot from https://www.drupal.org/pift-ci-job/163960 with my scribbling on it (right-click on the image to see it full size):
[Screenshot: annotated test output]

But first the good: It is nice to see only the failing classes. I also think the iconography is good. The green and red symbols communicate at a glance what worked and what failed.

There are some design elements I am less enamored with:

  1. I don't understand the three numbers (13, 2, 8) near the top left. I think they need to be explained to the user, or removed.
  2. I am also puzzled by the word pairs in the table heading ("Taxonomy.Taxonomy"). In this particular case they may be "failing classes" – but I see such pairs of words separated by a dot in projects that only use functions, not classes, e.g. Freelinking: https://www.drupal.org/pift-ci-job/176438.
  3. As a developer, I understand what a "PHP Fatal Error" is. But what is "Other"? I had a look at a few of those, and all I looked at were failed assertions. If so, why are they not called "Failed assertions"?
hestenet (comment #7)

User testing is wonderful and we do it as much as we can. Things that may seem surprisingly obvious in retrospect do sometimes slip through the cracks.

  1. I don't understand the three numbers (13, 2, 8) near the top left. I think they need to be explained to the user, or removed.
  2. I am also puzzled by the word pairs in the table heading ("Taxonomy.Taxonomy"). In this particular case they may be "failing classes" – but I see such pairs of words separated by a dot in projects that only use functions, not classes, e.g. Freelinking: https://www.drupal.org/pift-ci-job/176438.
  3. As a developer, I understand what a "PHP Fatal Error" is. But what is "Other"? I had a look at a few of those, and all I looked at were failed assertions. If so, why are they not called "Failed assertions"?

Responding to the screenshot, and the associated questions:

  1. Those numbers are intended to indicate how many passes and fails there are within the larger test class, and then within the test itself.
  2. That string is the test class (it is not defined by the automated testing infrastructure, but by the people writing the tests); see the sketch after this list.
  3. [Failed assertion] might indeed be a good category to add, though that would really be feedback for the tests themselves, rather than for the testing system. We would continue to want a catch-all [other] category for unanticipated types of failures.
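
To make that concrete: in Drupal 7's SimpleTest, each test class declares its own display metadata via getInfo(). Below is a simplified approximation of a core taxonomy test class (the exact strings in core may differ, and reading the "Taxonomy.Taxonomy" label as being composed from this metadata is my interpretation, not documented behaviour):

    // In Drupal 7, test classes live in a .test file that the module
    // registers in its .info file (files[] = example.test).
    // The 'name' and 'group' strings below are written by the test
    // authors, not by the testing infrastructure.
    class TaxonomyTermTestCase extends DrupalWebTestCase {
      public static function getInfo() {
        return array(
          'name' => 'Taxonomy term functions and forms',
          'description' => 'Tests taxonomy term functions and forms.',
          'group' => 'Taxonomy',
        );
      }
    }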

I do think there must be a bug in the display of the pass/fail summary numbers – if you look at the same numbers on the 'all classes' view, you can see the logic more clearly. Not sure how the extra '8' is slipping in.

gisle (comment #8)

That string [Taxonomy.Taxonomy] is the test class (it is not defined by the automated testing infrastructure, but by the people writing the tests).

Then the string "Failing classes" (describing what follows) should say "Failing test classes". I interpret the term "Failing classes" as referring to classes present in the project being tested, not to test classes defined by the people writing the tests.

Mixologic (comment #9)

Project: Drupal.org customizations » Project issue file test

Moving to the testing queue.