Problem/Motivation
ContactStorageTest has now failed daily testing for the past three days straight on PHP 7.1.x-dev only, for both the 8.4.x and 8.3.x branches:
https://www.drupal.org/pift-ci-job/602717
https://www.drupal.org/pift-ci-job/602029
Contact.Drupal\contact\Tests\ContactStorageTest
✗
testContactStorage
fail: [Completion check] Line 38 of core/modules/contact/src/Tests/ContactStorageTest.php:
The test did not complete due to a fatal error.
✓ - setUp
✗
Unknown
fail: [run-tests.sh check] Line 0 of :
FATAL Drupal\contact\Tests\ContactStorageTest: test runner returned a non-zero error code (139).
Looking at the full console output on CI, it appears to be a segfault:
02:57:46 Segmentation fault (core dumped)
02:57:46 FATAL Drupal\contact\Tests\ContactStorageTest: test runner returned a non-zero error code (139).
02:57:46 Drupal\contact\Tests\ContactStorageTest 0 passes 1 fails
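For reference, the 139 in the runner output is consistent with the segfault line: by the common POSIX shell convention, a process killed by signal N is reported with exit status 128 + N, and signal 11 is SIGSEGV. A minimal sketch of that decoding follows; the helper name `describe_exit_status` is made up for illustration and is not part of run-tests.sh.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Explain a shell-style exit status such as the 139 reported above.

    Assumes the usual shell convention: statuses above 128 mean the
    process was terminated by signal (status - 128).
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"unknown signal {signum}"
        return f"terminated by {name}"
    return f"exited normally with code {status}"


print(describe_exit_status(139))  # prints "terminated by SIGSEGV"
```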
Proposed resolution
TBD
Remaining tasks
TBD
User interface changes
TBD
API changes
TBD
Data model changes
TBD
Comments
Comment #2
xjm
Comment #3
xjm
Comment #4
xjm
On 8.3.x it is actually getting as far as Views' DisplayPathTest before it hits the segfault. On 8.4.x, though, the segfault happened at ContactStorageTest in all three results.
Comment #5
alexpott
I've built PHP 7.1 off the dev branch and run ContactStorageTest and DisplayPathTest on both 8.3.x and 8.4.x - no fails :(
This might be a DrupalCI related issue and due to a PHP 7.1 change. We need to see the core dump from the testbot.
Comment #6
Mixologic
We're saving core dumps at least, but I had accidentally stripped out the debug symbols from the binaries, so not quite yet an automated core dump artifact.
So I fixed that, re-ran the tests, re-ran gdb to get the backtraces:
Here ya go:
Comment #7
Mixologic
Also, I've fixed the containers: they now have proper debug symbols, and I added a core dump detection step that autoruns gdb's backtrace facilities if it finds one.
This will probably only work when the cli dumps core. I'll probably need a different case for when apache dumps core as part of mod_php.
So, now, whenever a core dump happens it shows up in the console, as well as an artifact of the build process under /artifacts/simpletest.standard/{corefilename}.debug
I just re-ran the 8.3.x/7.1.x and got these results:
https://dispatcher.drupalci.org/job/php-7.1.x-apache_mysql5.5/56/console
(also here as a distinct artifact)
https://dispatcher.drupalci.org/job/php-7.1.x-apache_mysql5.5/56/artifac...
https://www.drupal.org/pift-ci-job/602717 is currently re-queued to attempt again as well to see if we're getting the same core dump as we did in #6
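The core dump detection step described above could look roughly like the sketch below. This is not the actual DrupalCI implementation: the function names, the `core*` file pattern, and the paths in the usage note are assumptions for illustration. Only `gdb --batch -ex bt <binary> <corefile>` and the `{corefilename}.debug` naming come from gdb itself and from the comment above.

```python
import subprocess
from pathlib import Path


def find_core_files(directory: Path) -> list[Path]:
    """Return files that look like core dumps (assumed pattern: core*)."""
    return sorted(
        p for p in directory.glob("core*")
        if p.is_file() and p.suffix != ".debug"  # skip earlier backtrace output
    )


def backtrace_command(binary: str, core: Path) -> list[str]:
    """Build a non-interactive gdb invocation that prints a backtrace."""
    return ["gdb", "--batch", "-ex", "bt", binary, str(core)]


def dump_backtraces(binary: str, directory: Path) -> list[Path]:
    """Run gdb on every core file and save output as {corefilename}.debug."""
    written = []
    for core in find_core_files(directory):
        result = subprocess.run(
            backtrace_command(binary, core),
            capture_output=True, text=True, check=False,
        )
        out = core.with_name(core.name + ".debug")
        out.write_text(result.stdout)
        written.append(out)
    return written
```

A call such as `dump_backtraces("/usr/local/bin/php", Path("/var/www/html"))` (both paths hypothetical) would cover the cli case only; a crash inside mod_php would need the apache binary passed to gdb instead.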
Comment #8
Mixologic
Yep. https://dispatcher.drupalci.org/job/php-7.1.x-apache_mysql5.5/57/artifac... from https://www.drupal.org/pift-ci-job/602717 shows the same segfault results as I posted in #6
Comment #9
Mixologic
And I caught one with two segfaults: https://dispatcher.drupalci.org/job/php-7.1.x-apache_mysql5.5/59/artifac...
Comment #10
alexpott
So the line of code we're crashing on is:
In a file that has not had any changes for a long time: https://github.com/symfony/routing/commits/master/RouteCompiler.php
Comment #11
alexpott
The other segfault in ViewsDisplayTest is here:
See https://dispatcher.drupalci.org/job/php-7.1.x-apache_mysql5.5/59/artifac...
Comment #12
xjm
This time it was InstallUninstallTest: https://www.drupal.org/pift-ci-job/604164
I bet the segfault might also move again following the BTB conversion that just landed.
The fact that these are happening only on 7.1.x-dev consistently makes me strongly suspect that it's something PHP committed a few days ago, not something we did.
Comment #13
alexpott
Comment #14
xjm
Hm, I cannot see precise commonalities between #6, #11, and #13. :(
Comment #15
alexpott
The other issue is that this is not repeatable for me by building PHP locally, which makes a new commit in PHP introducing this less likely. Non-repeatable consistent errors are annoying.
Comment #16
Mixologic
I have tried re-running just the failing tests on the testbots and they do not fail when run in isolation, ergo it's something to do with concurrency.
Comment #17
xjm
Hm, the last time we had segfaults with concurrency was an APC problem, no?
Comment #18
xjm
@alexpott If that's the case (not a recent PHP commit), how could it be happening only on 7.1.x-dev, and not on the stable 7.1 environment nor any other?
Comment #19
Mixologic
I've been meaning to disable php7 apc anyhow as a test. (Our test runs went from 17->22 min when we added apc + compiled differently. /me is curious if apc slowed things down somehow.)
Comment #20
Mixologic
The other thing that's possible is that we could build a php7.1 container pinned to a commit of 7.1.x from before the day this started happening (a Feb 16th container).
Comment #21
alexpott
@Mixologic if you could work out which commit caused the regression that'd be amazing. As this seems to be something that only happens under high concurrency, it's going to be really hard to work out what is happening unless someone with full access to DrupalCI spends time working this out.
Comment #22
Mixologic
I built a php container using commit https://github.com/php/php-src/commit/aa1d92e3e5189b34625b61c62dfd7bc441... which was on the 16th, and I see no segfaults with our latest. That indicates to me that either it was one of the three commits that php added on the 17th:
https://github.com/php/php-src/commits/PHP-7.1
OR we somehow magically fixed this error by committing "59b4509 (grafted, HEAD, origin/8.4.x, 8.4.x) Issue #2854926 by xjm, himanshu-dixit, borisson_, boaloysius, alexpott: Remove unneeded control structures in ContentEntityBase" because now, all of a sudden, php 7.1.x is passing: https://www.drupal.org/pift-ci-job/606817
OR one of the commits that php7.1.x had on the 23rd also fixed it.
Results here.
https://dispatcher.drupalci.org/job/drupalci_test_containers/818/console...
I tried with another commit that happened on the next day that looked suspect, https://github.com/php/php-src/commit/513582814b0ca82d81eb6b98897d745e0f..., except that ended up building the 7.0.x branch, so I'm redoing it now with the proper commit:
https://github.com/php/php-src/commit/c240feb7f4471d26b9622f48990e782031...
except that also seems to be working fine: https://dispatcher.drupalci.org/job/drupalci_test_containers/820/console...
which makes me wonder if it wasn't our fix after all that somehow changed things?
Anyhow, I queued up https://dispatcher.drupalci.org/job/php-7.1.x-apache_mysql5.5/68 to run 67 again to see if the pass was a random anomaly or not.
Comment #23
Mixologic
Welp. Turns out head containers haven't been building.
Feb 19 2017 23:32:54 was the last time they got built, so the pass in https://dispatcher.drupalci.org/job/php-7.1.x-apache_mysql5.5/67/ is *not* due to an upstream fix.
Comment #24
Mixologic
Okay. We have new containers, and new tests that still fail. That success was just an anomaly. Which is a superbummer, because that means that these core dumps are not guaranteed, and a test against a particular build may or may not reveal flaws. :/
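To put a number on "may or may not reveal flaws": if a broken build only segfaults in some fraction of runs, a single green run says little. The sketch below assumes each run fails independently with a fixed probability, which is a simplification, and the 0.5 failure rate is a hypothetical value, not something measured on DrupalCI.

```python
import math


def miss_probability(failure_rate: float, runs: int) -> float:
    """Chance that a broken build passes every one of `runs` independent runs."""
    return (1.0 - failure_rate) ** runs


def runs_needed(failure_rate: float, confidence: float) -> int:
    """Smallest number of runs that exposes the flaw with at least `confidence`."""
    if not (0.0 < failure_rate < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("failure_rate and confidence must be between 0 and 1")
    # Solve (1 - failure_rate) ** n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


# Hypothetical: the segfault shows up in half of all runs on a bad build.
print(miss_probability(0.5, 3))  # 0.125
print(runs_needed(0.5, 0.95))    # 5
```

Under those assumptions, one passing run against a pinned commit would leave a 50% chance that the commit is still bad, which fits with the single pass turning out to be an anomaly.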
Comment #25
Mixologic
This may have healed itself. Then again, maybe not: the head containers haven't been rebuilding for a bit, but it looks like this problem fixed itself before that.
Comment #26
xjm
I... okay, so. Well, it's passing again now and has for a few days. Downgrading to major, but leaving open for now since we still have no idea why it happened or if it will come back.
Comment #27
xjm
Discussed with @catch, @Cottser, and @cilefen. We agreed that there is no bug here anymore since the issue was resolved as mysteriously as it appeared. At first we considered keeping the issue open as a major task in case it recurred and since we are seeing a different segfault on 5.6, but this issue does not need to be open for #2859704: Intermittent segfaults on DrupalCI (some "did not complete due to a fatal error" with no additional info) to reference it. We can reopen this issue if the problem recurs.
Comment #28
Mixologic
As an aside, php 7.1.3 and php 7.0.17 came out today and I updated the containers to match. If this *does* come back, it could come back on the 7.1 or 7.0 branches now.
Comment #29
xjm