Problem/Motivation

For the past couple of weeks, there has been a very high fail rate in the Forum test group on PHP7/Postgres. It is not always the same tests within the namespace. The rate of failure is especially high with Postgres/PHP7 on 8.2.x (~75%), but it also appears to occur at a lower rate with Postgres/PHP5.5 and on 8.1.x.

Example:
https://www.drupal.org/pift-ci-job/396471

Forum.Drupal\Tests\forum\Functional\ForumUninstallTest
✗	testForumUninstallWithField
fail: [Other] Line 29 of core/modules/forum/tests/src/Functional/ForumUninstallTest.php:
Drupal\Tests\forum\Functional\ForumUninstallTest::testForumUninstallWithField
Behat\Mink\Exception\DriverException: There is no element matching XPath "//html"

/var/www/html/vendor/behat/mink-browserkit-driver/src/BrowserKitDriver.php:832
/var/www/html/vendor/behat/mink-browserkit-driver/src/BrowserKitDriver.php:347
/var/www/html/vendor/behat/mink/src/Element/Element.php:176
/var/www/html/vendor/behat/mink/src/WebAssert.php:257
/var/www/html/core/tests/Drupal/FunctionalTests/AssertLegacyTrait.php:67
/var/www/html/core/modules/forum/tests/src/Functional/ForumUninstallTest.php:122
✓		- testForumUninstallWithoutFieldStorage
✗	rupal\Tests\forum\Functional\ForumUninstallTe
fail: [Other] Line 0 of sites/default/files/simpletest/phpunit-863.xml:
PHPunit Test failed to complete
✗	nkno
fail: [run-tests.sh check] Line 0 of :
FATAL Drupal\Tests\forum\Functional\ForumUninstallTest: test runner returned a non-zero error code (2)

It appears to have started on July 19, the day #2737805: Convert web tests to browser tests for forum module was committed. This is the first result I saw on HEAD:
https://www.drupal.org/pift-ci-job/379115

The random fail does not seem to happen when one or more forum tests are run repeatedly on the Postgres/PHP7 environment, only when the whole test suite is run. This likely points to either a DrupalCI problem or a conflict with some other test not being torn down cleanly. See #10 for the ~75% fail rate in forum tests, on the following specific environments:

  • 8.2.x, PostgreSQL 9.1, PHP 7: 75% fail rate
  • 8.1.x, PostgreSQL 9.1, PHP 7: 75% fail rate
  • 8.2.x, MySQL 5.5, PHP 7: Not observed
  • 8.2.x, PostgreSQL 9.1, PHP 5.5: 25% fail rate
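To make the rates above easier to reason about when judging test runs, here is a minimal sketch of the arithmetic. It assumes each run fails independently with the listed rate, which the thread does not establish, and the function names are hypothetical. It computes how likely a streak of passing runs is while the bug is still present, and how many consecutive passes are needed before that explanation becomes unlikely.

```python
def prob_all_pass(fail_rate: float, runs: int) -> float:
    """Chance that `runs` independent runs all pass at the given per-run fail rate."""
    if not 0.0 < fail_rate <= 1.0:
        raise ValueError("fail_rate must be in (0, 1]")
    return (1.0 - fail_rate) ** runs


def runs_needed(fail_rate: float, max_false_clear: float = 0.05) -> int:
    """Smallest number of consecutive passes whose chance of occurring
    with the bug still present is at most `max_false_clear`."""
    n = 1
    while prob_all_pass(fail_rate, n) > max_false_clear:
        n += 1
    return n


if __name__ == "__main__":
    # Rates taken from the list above (assumed independent per run).
    for label, rate in [("Postgres/PHP 7", 0.75), ("Postgres/PHP 5.5", 0.25)]:
        print(label, runs_needed(rate))
```

Under that assumption, a few green runs say a lot about the PHP 7 environments but much less about the PHP 5.5 one, where the rate is lower.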

Proposed resolution

Figure out which commit introduced the fails. If it is the forum test conversion above (#2737805), revert it until we can work out why this is happening.

Remaining tasks

TBD

Comment  File                                  Size       Author
#31      random-fail-2776269-31.patch          9.16 KB    klausi
#29      random-fail-2776269-29.patch          13.19 KB   klausi
#11      forum_test_revert_only.patch          8.53 KB    xjm
#10      not_a_patch.patch                     279 bytes  xjm
#8       test_no_revert_forum_namespace.patch  607 bytes  xjm
#5       anybody_out_there.patch               26.02 KB   xjm
#4       test_with_revert.patch                9.05 KB    xjm
#3       test_no_revert.patch                  550 bytes  xjm

Comments

xjm created an issue. See original summary.

xjm’s picture

Status: Active » Needs review
File size: 550 bytes

First, a baseline on HEAD for one of the failing tests.

xjm’s picture

Next, the equivalent test prior to the conversion, with a revert. If this also fails on Postgres, then we have a different regression causing the fails. If it passes, we need to revert the Forum conversion until we figure out what's going on.

xjm’s picture

Hmm, I don't think #3 is actually adding to the test list; maybe that only works for SimpleTests.

Status: Needs review » Needs work

The last submitted patch, 5: anybody_out_there.patch, failed testing.

xjm’s picture

Nope, it is running. So possibly the fail is due to some interaction between multiple tests.

xjm’s picture

Status: Needs work » Needs review
xjm’s picture

Issue tags: +Random test failure
xjm’s picture

Title: High random fail rate in forum tests on Postgres/PHP7 » High random fail rate in BTB forum tests on Postgres/PHP7
xjm’s picture

Title: High random fail rate in BTB forum tests on Postgres/PHP7 » High random fail rate in BTB forum tests on Postgres (especially, but not only, with PHP7)
xjm’s picture

Priority: Critical » Major
Status: Needs review » Active

I've reverted #2737805: Convert web tests to browser tests for forum module based on the results here and updated the summary with more detail on what circumstances appear to cause the fail and how frequently. Downgrading to major since the random fail is no longer in HEAD. I do still consider this a major bug because it is now blocking the initiative to improve the testing framework.

Setting back to active since there is not an actual patch, only demonstrations of the random fail.

xjm’s picture

I also reverted #2755991: Convert web tests to browser tests for telephone module because of a similar, lower-rate fail (also shown, I believe, in one of the results above, as well as in the branch history for HEAD).

xjm’s picture

Also pinged @Mixologic to see if he has any ideas about this issue.

klausi’s picture

I wanted to do some investigation by running DrupalCI locally, but failed on #2780339: ./drupalci init fails with 400 Guzzle exception when creating images. Any tips on that would be very much appreciated!

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

klausi’s picture

Issue tags: +Drupalaton
klausi’s picture

Yay, we see the random fails as expected! It looks like #2771547: In Browser and FunctionalJavascript tests SIMPLETEST_USER_AGENT cookie needs to be set every 5 seconds is indeed the solution. Should we close this as a duplicate?

dawehner’s picture

Status: Needs review » Closed (duplicate)

I am convinced that we figured out the random failures ...

So this is just proof that PGSQL page requests sometimes take longer than 5 seconds. Well, that is how it is.
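To illustrate why a 5 second limit turns slow page requests into random fails, here is a minimal sketch of a time-windowed signed token. It is not Drupal's actual code: the token layout, the function names, and the use of HMAC-SHA256 are all assumptions made for illustration. The point is only that a token issued once and reused will be rejected as soon as a request is checked more than 5 seconds after the token was created.

```python
import hashlib
import hmac

WINDOW_SECONDS = 5  # the 5 second window discussed in this issue


def sign(prefix: str, issued_at: int, key: bytes) -> str:
    """HMAC over the test prefix and issue time (hypothetical layout)."""
    message = f"{prefix}:{issued_at}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def make_token(prefix: str, issued_at: int, key: bytes) -> str:
    """Build a token of the form prefix:issued_at:signature."""
    return f"{prefix}:{issued_at}:{sign(prefix, issued_at, key)}"


def is_valid(token: str, now: int, key: bytes) -> bool:
    """Accept the token only if it is correctly signed and at most
    WINDOW_SECONDS old at the moment it is checked."""
    try:
        prefix, issued_raw, mac = token.split(":")
        issued_at = int(issued_raw)
    except ValueError:
        return False
    age = now - issued_at
    fresh = 0 <= age <= WINDOW_SECONDS
    return fresh and hmac.compare_digest(mac, sign(prefix, issued_at, key))


if __name__ == "__main__":
    key = b"hypothetical-private-key"
    token = make_token("test123", issued_at=1000, key=key)
    print(is_valid(token, now=1003, key=key))  # checked within the window
    print(is_valid(token, now=1007, key=key))  # checked after the window
```

In this sketch, regenerating the token before each request keeps it inside the window regardless of how slow the previous page load was.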