Problem/Motivation

ContactStorageTest has now failed daily testing for the past three days straight on 7.1.x-dev only, for both 8.4.x and 8.3.x:
https://www.drupal.org/pift-ci-job/602717
https://www.drupal.org/pift-ci-job/602029

Contact.Drupal\contact\Tests\ContactStorageTest
✗	
testContactStorage
fail: [Completion check] Line 38 of core/modules/contact/src/Tests/ContactStorageTest.php:
The test did not complete due to a fatal error.
✓		- setUp
✗	
Unknown
fail: [run-tests.sh check] Line 0 of :
FATAL Drupal\contact\Tests\ContactStorageTest: test runner returned a non-zero error code (139).

Looking at the full console output on CI, it appears to be a segfault:

02:57:46 Segmentation fault (core dumped)
02:57:46 FATAL Drupal\contact\Tests\ContactStorageTest: test runner returned a non-zero error code (139).
02:57:46 Drupal\contact\Tests\ContactStorageTest                        0 passes   1 fails   

Proposed resolution

TBD

Remaining tasks

TBD

User interface changes

TBD

API changes

TBD

Data model changes

TBD

Comments

xjm created an issue. See original summary.

xjm’s picture

Issue summary: View changes
xjm’s picture

Issue summary: View changes
xjm’s picture

On 8.3.x it is actually getting as far as Views' DisplayPathTest before it hits the segfault:

01:22:49 Segmentation fault (core dumped)
01:22:49 FATAL Drupal\views_ui\Tests\DisplayPathTest: test runner returned a non-zero error code (139).
01:22:49 Drupal\views_ui\Tests\DisplayPathTest                          0 passes   1 fails                            

All three results on 8.4.x the segfault happened at ContactStorageTest though.

alexpott’s picture

I've built PHP 7.1 off the dev branch and run ContactStorageTest and DisplayPathTest on both 8.3.x and 8.4.x - no fails :(

This might be a DrupalCI related issue and due to a PHP 7.1 change. We need to see the core dump from the testbot.

Mixologic’s picture

We're saving core dumps at least, but I had accidentally stripped out the debug symbols from the binaries, so not quite yet an automated core dump artifact.

So I fixed that, re-ran the tests, re-ran gdb to get the backtraces:

Here ya go:

Reading symbols from /usr/local/bin/php...done.
[New LWP 2308]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/local/bin/php ./core/scripts/run-tests.sh --url http://localhost/checkout'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  zend_mm_alloc_small (size=56, bin_num=6, heap=0x7f8ed7400040) at /usr/src/php/Zend/zend_alloc.c:1261
1261			heap->free_slot[bin_num] = p->next_free_slot;
(gdb) source /usr/src/php/.gdbinit
(gdb) bt
#0  zend_mm_alloc_small (size=56, bin_num=6, heap=0x7f8ed7400040) at /usr/src/php/Zend/zend_alloc.c:1261
#1  _emalloc_56 () at /usr/src/php/Zend/zend_alloc.c:2336
#2  0x00000000008d7389 in ZEND_INIT_ARRAY_SPEC_CONST_UNUSED_HANDLER () at /usr/src/php/Zend/zend_vm_execute.h:7759
#3  0x00000000008c71bb in execute_ex (ex=<optimized out>) at /usr/src/php/Zend/zend_vm_execute.h:429
#4  0x0000000000922490 in zend_execute (op_array=0x7f8ed749f2a0, op_array@entry=0x7f8e1d017400, return_value=return_value@entry=0x7f8ed7413d10) at /usr/src/php/Zend/zend_vm_execute.h:474
#5  0x0000000000880734 in zend_execute_scripts (type=type@entry=8, retval=0x7f8ed7413d10, retval@entry=0x0, file_count=file_count@entry=3) at /usr/src/php/Zend/zend.c:1476
#6  0x00000000008205b0 in php_execute_script (primary_file=0x7ffd421109a0) at /usr/src/php/main/main.c:2537
#7  0x000000000092461a in do_cli (argc=-683671488, argv=0x7f8ed7413dc0) at /usr/src/php/sapi/cli/php_cli.c:993
#8  0x0000000000449efc in main (argc=-683671488, argv=0x7f8ed7413dc0) at /usr/src/php/sapi/cli/php_cli.c:1381
(gdb) zbacktrace
[0x7f8ed7413d10] Symfony\Component\Routing\RouteCompiler->compilePattern(object[0x7f8ed7413d60], "/user/reset/{uid}/{timestamp}/{hash}/login", false) /var/www/html/vendor/symfony/routing/RouteCompiler.php:155
[0x7f8ed7413be0] Symfony\Component\Routing\RouteCompiler->compile(object[0x7f8ed7413c30]) /var/www/html/vendor/symfony/routing/RouteCompiler.php:65
[0x7f8ed7413b60] Symfony\Component\Routing\Route->compile() /var/www/html/vendor/symfony/routing/Route.php:625
[0x7f8ed7413a10] Drupal\Core\EventSubscriber\SpecialAttributesRouteSubscriber->alterRoutes(object[0x7f8ed7413a60]) /var/www/html/core/lib/Drupal/Core/EventSubscriber/SpecialAttributesRouteSubscriber.php:30
[0x7f8ed7413970] Drupal\Core\EventSubscriber\SpecialAttributesRouteSubscriber->onAlterRoutes(object[0x7f8ed74139c0], object[0x7f8ed74139d0], object[0x7f8ed74139e0]) /var/www/html/core/lib/Drupal/Core/EventSubscriber/SpecialAttributesRouteSubscriber.php:45
[0x7f8ed7413870] Drupal\Component\EventDispatcher\ContainerAwareEventDispatcher->dispatch("routing.route_alter", object[0x7f8ed74138d0]) /var/www/html/core/lib/Drupal/Component/EventDispatcher/ContainerAwareEventDispatcher.php:111
[0x7f8ed7413750] Drupal\Core\Routing\RouteBuilder->rebuild() /var/www/html/core/lib/Drupal/Core/Routing/RouteBuilder.php:184
[0x7f8ed74136f0] Drupal\Core\ProxyClass\Routing\RouteBuilder->rebuild() /var/www/html/core/lib/Drupal/Core/ProxyClass/Routing/RouteBuilder.php:83
[0x7f8ed7413600] drupal_flush_all_caches() /var/www/html/core/includes/common.inc:1141
[0x7f8ed74135a0] Drupal\simpletest\WebTestBase->resetAll() /var/www/html/core/lib/Drupal/Core/Test/FunctionalTestSetupTrait.php:197
[0x7f8ed7413540] Drupal\simpletest\WebTestBase->rebuildAll() /var/www/html/core/lib/Drupal/Core/Test/FunctionalTestSetupTrait.php:441
[0x7f8ed74134d0] Drupal\simpletest\WebTestBase->setUp() /var/www/html/core/modules/simpletest/src/WebTestBase.php:445
[0x7f8ed7413480] Drupal\contact\Tests\ContactSitewideTest->setUp() /var/www/html/core/modules/contact/src/Tests/ContactSitewideTest.php:36
[0x7f8ed7413310] Drupal\simpletest\TestBase->run(array(0)[0x7f8ed7413360]) /var/www/html/core/modules/simpletest/src/TestBase.php:947
[0x7f8ed7413210] simpletest_script_run_one_test("397", "Drupal\contact\Tests\ContactStorageTest") /var/www/html/core/scripts/run-tests.sh:792
[0x7f8ed7413030] (main) /var/www/html/core/scripts/run-tests.sh:66
Mixologic’s picture

Also. I've fixed the containers, they now have proper debug symbols, and I added a core dump detection step that autoruns gdb's backtrace facilities if it finds one.

This will probably only work for when the cli core dumps, I'll probably need a different case for when apache dumps core as part of mod_php.

So, now, whenever a core dump happens it shows up in the console, as well as an artifact of the build process under /artifacts/simpletest.standard/{corefilename}.debug

I just re-ran the 8.3.x/7.1.x and got these results:
https://dispatcher.drupalci.org/job/php-7.1.x-apache_mysql5.5/56/console

(also here as a distinct artifact)
https://dispatcher.drupalci.org/job/php-7.1.x-apache_mysql5.5/56/artifac...

https://www.drupal.org/pift-ci-job/602717 is currently re-queued to attempt again as well to see if we're getting the same core dump as we did in #6

Mixologic’s picture

Mixologic’s picture

alexpott’s picture

So the line of code we're crashing on is:

         if ($pos < strlen($pattern)) {

In a file that has not had any changes for a long time https://github.com/symfony/routing/commits/master/RouteCompiler.php

alexpott’s picture

The other segfault in ViewsDisplayTest is here:

[0x7f385a014140] apcu_fetch(array(1)[0x7f385a014190]) [internal function]
[0x7f385a014050] Drupal\Core\Cache\ApcuBackend->getMultiple(reference, false) /var/www/html/core/lib/Drupal/Core/Cache/ApcuBackend.php:88 
[0x7f385a013f50] Drupal\Core\Cache\ChainedFastBackend->getMultiple(reference, false) /var/www/html/core/lib/Drupal/Core/Cache/ChainedFastBackend.php:141 
[0x7f385a013eb0] Drupal\Core\Cache\ChainedFastBackend->get("entity_type") /var/www/html/core/lib/Drupal/Core/Cache/ChainedFastBackend.php:103 
[0x7f385a013e40] Drupal\Core\Plugin\DefaultPluginManager->cacheGet("entity_type") /var/www/html/core/lib/Drupal/Core/Cache/UseCacheBackendTrait.php:37 
[0x7f385a013dc0] Drupal\Core\Plugin\DefaultPluginManager->getCachedDefinitions() /var/www/html/core/lib/Drupal/Core/Plugin/DefaultPluginManager.php:208 
[0x7f385a013d50] Drupal\Core\Plugin\DefaultPluginManager->getDefinitions() /var/www/html/core/lib/Drupal/Core/Plugin/DefaultPluginManager.php:174 
[0x7f385a013cd0] Drupal\Core\Plugin\DefaultPluginManager->getDefinition("block", false) /var/www/html/core/lib/Drupal/Component/Plugin/Discovery/DiscoveryCachedTrait.php:22 
[0x7f385a013c30] Drupal\Core\Entity\EntityTypeManager->getDefinition("block") /var/www/html/core/lib/Drupal/Core/Entity/EntityTypeManager.php:126 
[0x7f385a013b70] Drupal\Core\Entity\EntityTypeManager->getHandler("block", "storage") /var/www/html/core/lib/Drupal/Core/Entity/EntityTypeManager.php:233 
[0x7f385a013b00] Drupal\Core\Entity\EntityTypeManager->getStorage("block") /var/www/html/core/lib/Drupal/Core/Entity/EntityTypeManager.php:169 
[0x7f385a013a70] Drupal\Core\Entity\EntityTypeListener->onEntityTypeCreate(object[0x7f385a013ac0]) /var/www/html/core/lib/Drupal/Core/Entity/EntityTypeListener.php:69 
[0x7f385a013a00] Drupal\Core\Entity\EntityManager->onEntityTypeCreate(object[0x7f385a013a50]) /var/www/html/core/lib/Drupal/Core/Entity/EntityManager.php:384 
[0x7f385a013990] Drupal\Core\Entity\EntityDefinitionUpdateManager->installEntityType(object[0x7f385a0139e0]) /var/www/html/core/lib/Drupal/Core/Entity/EntityDefinitionUpdateManager.php:135 
[0x7f385a013700] Drupal\Core\Extension\ModuleInstaller->install(reference, true) /var/www/html/core/lib/Drupal/Core/Extension/ModuleInstaller.php:219 
[0x7f385a013680] Drupal\Core\ProxyClass\Extension\ModuleInstaller->install(array(7)[0x7f385a0136d0], true) /var/www/html/core/lib/Drupal/Core/ProxyClass/Extension/ModuleInstaller.php:83 
[0x7f385a0135c0] Drupal\simpletest\WebTestBase->installModulesFromClassProperty(object[0x7f385a013610]) /var/www/html/core/lib/Drupal/Core/Test/FunctionalTestSetupTrait.php:419 
[0x7f385a013550] Drupal\simpletest\WebTestBase->setUp() /var/www/html/core/modules/simpletest/src/WebTestBase.php:439 
[0x7f385a0134e0] Drupal\views\Tests\ViewTestBase->setUp() /var/www/html/core/modules/views/src/Tests/ViewTestBase.php:31 
[0x7f385a013480] Drupal\views_ui\Tests\UITestBase->setUp() /var/www/html/core/modules/views_ui/src/Tests/UITestBase.php:37 
[0x7f385a013310] Drupal\simpletest\TestBase->run(array(0)[0x7f385a013360]) /var/www/html/core/modules/simpletest/src/TestBase.php:947 
[0x7f385a013210] simpletest_script_run_one_test("2361", "Drupal\views_ui\Tests\DisplayTest") /var/www/html/core/scripts/run-tests.sh:792 
[0x7f385a013030] (main) /var/www/html/core/scripts/run-tests.sh:66 

See https://dispatcher.drupalci.org/job/php-7.1.x-apache_mysql5.5/59/artifac...

xjm’s picture

Title: ContactStorageTest failing with a segfault on PHP 7.1.x-dev » Segfaults in tests on PHP 7.1.x-dev

This time it was InstallUninstallTest:
https://www.drupal.org/pift-ci-job/604164

01:34:04 Segmentation fault (core dumped)
01:34:04 FATAL Drupal\system\Tests\Module\InstallUninstallTest: test runner returned a non-zero error code (139).
01:34:04 Drupal\system\Tests\Module\InstallUninstallTest                0 passes   1 fails           

I bet the segfault might move again following the BTB conversion that just landed also.

The fact that these are happening only on 7.1.x-dev consistently makes me strongly suspect that it's something PHP committed a few days ago, not something we did.

alexpott’s picture

Program terminated with signal SIGSEGV, Segmentation fault.
#0  zend_mm_alloc_small (size=56, bin_num=6, heap=0x7f4bdb000040)
    at /usr/src/php/Zend/zend_alloc.c:1261
1261			heap->free_slot[bin_num] = p->next_free_slot;
#0  zend_mm_alloc_small (size=56, bin_num=6, heap=0x7f4bdb000040)
    at /usr/src/php/Zend/zend_alloc.c:1261
#1  _emalloc_56 () at /usr/src/php/Zend/zend_alloc.c:2336
#2  0x00000000008922ae in zend_array_dup (source=0x7f4b1def1fc0)
    at /usr/src/php/Zend/zend_hash.c:1764
#3  0x000000000090986b in ZEND_UNSET_DIM_SPEC_CV_CONST_HANDLER ()
    at /usr/src/php/Zend/zend_vm_execute.h:39973
#4  0x00000000008c71cb in execute_ex (ex=<optimized out>)
    at /usr/src/php/Zend/zend_vm_execute.h:429
#5  0x00000000009224a0 in zend_execute (op_array=0x7f4bdb09f2a0, 
    op_array@entry=0x7f4b202b2048, 
    return_value=return_value@entry=0x7f4bdb013c70)
    at /usr/src/php/Zend/zend_vm_execute.h:474
#6  0x0000000000880744 in zend_execute_scripts (type=type@entry=8, 
    retval=0x7f4bdb013c70, retval@entry=0x0, file_count=file_count@entry=3)
    at /usr/src/php/Zend/zend.c:1476
#7  0x00000000008205c0 in php_execute_script (primary_file=0x7fff4de012c0)
    at /usr/src/php/main/main.c:2537
#8  0x000000000092462a in do_cli (argc=-620756928, argv=0x160)
    at /usr/src/php/sapi/cli/php_cli.c:993
#9  0x0000000000449efc in main (argc=-620756928, argv=0x160)
    at /usr/src/php/sapi/cli/php_cli.c:1381
[0x7f4bdb013c70] Drupal\Core\DependencyInjection\YamlFileLoader->parseDefinition("user_access_denied_subscriber", array(3)[0x7f4bdb013cd0], "core/modules/user/user.services.yml") /var/www/html/core/lib/Drupal/Core/DependencyInjection/YamlFileLoader.php:280 
[0x7f4bdb013b90] Drupal\Core\DependencyInjection\YamlFileLoader->parseDefinitions(array(2)[0x7f4bdb013be0], "core/modules/user/user.services.yml") /var/www/html/core/lib/Drupal/Core/DependencyInjection/YamlFileLoader.php:124 
[0x7f4bdb013ae0] Drupal\Core\DependencyInjection\YamlFileLoader->load("core/modules/user/user.services.yml") /var/www/html/core/lib/Drupal/Core/DependencyInjection/YamlFileLoader.php:95 
[0x7f4bdb013940] Drupal\Core\DrupalKernel->compileContainer() /var/www/html/core/lib/Drupal/Core/DrupalKernel.php:1254 
[0x7f4bdb013840] Drupal\Core\DrupalKernel->initializeContainer() /var/www/html/core/lib/Drupal/Core/DrupalKernel.php:873 
[0x7f4bdb0137e0] Drupal\Core\DrupalKernel->rebuildContainer() /var/www/html/core/lib/Drupal/Core/DrupalKernel.php:1143 
[0x7f4bdb013780] Drupal\simpletest\WebTestBase->rebuildContainer() /var/www/html/core/lib/Drupal/Core/Test/FunctionalTestSetupTrait.php:177 
[0x7f4bdb0136d0] Drupal\system\Tests\Module\ModuleTestBase->assertModules(array(1)[0x7f4bdb013720], false) /var/www/html/core/modules/system/src/Tests/Module/ModuleTestBase.php:149 
[0x7f4bdb013660] Drupal\system\Tests\Module\InstallUninstallTest->assertModuleNotInstalled("file") /var/www/html/core/modules/system/src/Tests/Module/InstallUninstallTest.php:219 
[0x7f4bdb013480] Drupal\system\Tests\Module\InstallUninstallTest->testInstallUninstall() /var/www/html/core/modules/system/src/Tests/Module/InstallUninstallTest.php:89 
[0x7f4bdb013310] Drupal\simpletest\TestBase->run(array(0)[0x7f4bdb013360]) /var/www/html/core/modules/simpletest/src/TestBase.php:958 
[0x7f4bdb013210] simpletest_script_run_one_test("1383", "Drupal\system\Tests\Module\InstallUninstallTest") /var/www/html/core/scripts/run-tests.sh:792 
[0x7f4bdb013030] (main) /var/www/html/core/scripts/run-tests.sh:66 
xjm’s picture

Title: Segfaults in tests on PHP 7.1.x-dev » Segfaults in every test run on PHP 7.1.x-dev

Hm I cannot see precise commonalities between #6, #11, and #13. :(

alexpott’s picture

The other issue is that this is not repeatable by building PHP locally for me. Which makes a new commit in PHP introducing this less likely. Non-repeatable consistent errors are annoying.

Mixologic’s picture

I have tried re-running just the failing tests on the testbots and they do not fail when run in isolation, ergo it's something to do with concurrency.

xjm’s picture

Hm, the last time we had segfaults with concurrency was an APC problem no?

xjm’s picture

@alexpott If that's the case (not a recent PHP commit) how could it be happening only on 7.1.x-dev, and not on the stable 7.1 environment nor any other?

Mixologic’s picture

I've been meaning to disable php7 apc anyhow as a test.. (our testruns went from 17->22 min when we added apc + compiled differently.. /me is curious if apc slowed things down somehow)

Mixologic’s picture

The other thing thats possible, is we could build a php7.1 container that is pinned to a commit of 7.1.x from before the day this was happening. (feb 16th container)

alexpott’s picture

@Mixologic if you could work out which commit caused the regression that'd be amazing. As this seems to be something that only happens under high concurrency it's going to be really hard to work out what is happening unless someone with full access to DrupalCI spends time working this out.

Mixologic’s picture

I built a php container using commit https://github.com/php/php-src/commit/aa1d92e3e5189b34625b61c62dfd7bc441... which was on the 16th, so, no segfaults that I see with our latest, which either indicates to me it was one of the three commits that php added on the 17th:
https://github.com/php/php-src/commits/PHP-7.1

OR we somehow magically fixed this error by committing "59b4509 (grafted, HEAD, origin/8.4.x, 8.4.x) Issue #2854926 by xjm, himanshu-dixit, borisson_, boaloysius, alexpott: Remove unneeded control structures in ContentEntityBase" because now, all of a sudden, php 7.1.x is passing: https://www.drupal.org/pift-ci-job/606817

OR one of the commits that php7.1.x had on the 23rd also fixed it.

Results here.
https://dispatcher.drupalci.org/job/drupalci_test_containers/818/console...

I tried with another commit that happened on the next day that looked suspect, that https://github.com/php/php-src/commit/513582814b0ca82d81eb6b98897d745e0f..., except that ended up building th 7.0.x branch - so Im redoing now with the proper commit:

https://github.com/php/php-src/commit/c240feb7f4471d26b9622f48990e782031...

except that also seems to be working fine: https://dispatcher.drupalci.org/job/drupalci_test_containers/820/console...

which makes me wonder if it wasnt our fix after all that somehow changed things?

Anyhow, I queued up https://dispatcher.drupalci.org/job/php-7.1.x-apache_mysql5.5/68 to run 67 again to see if the pass was a random anomoly or not.

Mixologic’s picture

Welp. turns out head containers haven't been building.

Feb 19 2017 23:32:54 was the last time they got built, so the pass in https://dispatcher.drupalci.org/job/php-7.1.x-apache_mysql5.5/67/ is *not* due to an upstream fix.

Mixologic’s picture

okay. we have new containers. And new tests that still fail. that success was just an anomoly. Which is a superbummer because that means that these core dumps are not guaranteed, and a test against a particular build may or may not reveal flaws. :/

Mixologic’s picture

This may have healed itself. Then again maybe not, the head containers havent been rebuilding for a bit, but it looks like this problem fixed itself before that.

xjm’s picture

Title: Segfaults in every test run on PHP 7.1.x-dev » Figure out why there were segfaults in every test run on PHP 7.1.x-dev
Priority: Critical » Major

I... okay, so. well, it's passing again now and has for a few days. Downgrading to major, but leaving open for now since we still have no idea why it happened or if it will come back.

xjm’s picture

Category: Bug report » Task
Status: Active » Closed (cannot reproduce)

Discussed with @catch, @Cottser, and @cilefen. We agreed that there is no bug here anymore since the issue was resolved as mysteriously as it appeared. At first we considered keeping the issue open as a major task in case it recurred and since we are seeing a different segfault on 5.6, but this issue does not need to be open for #2859704: Intermittent segfaults on DrupalCI (some "did not complete due to a fatal error" with no additional info) to reference it. We can reopen this issue if the problem recurs.

Mixologic’s picture

As an Aside, php 7.1.3 and php 7.0.17 came out today and I updated the containers to match. If this *does* come back it could come back on the 7.1 or 7.0 branches now.

xjm’s picture