Problem/Motivation

After #2749955: Random fails in UpdatePathTestBase tests, we are continuing to see random fails in UpdatePathTestBase tests. For example, from #2627512-139: Datetime Views plugins don't support timezones:

https://www.drupal.org/pift-ci-job/532197

https://www.drupal.org/pift-ci-job/532126

https://www.drupal.org/pift-ci-job/532196

https://www.drupal.org/pift-ci-job/532198

When you look at these reports, there isn't a clear symptom other than

fail: [Browser] Line 248 of core/modules/system/src/Tests/Update/UpdatePathTestBase.php:
GET http://localhost/checkout/update.php/start?id=2&op=do_nojs returned 0 (0 bytes).

fail: [Other] Line 263 of core/modules/system/src/Tests/Update/UpdatePathTestBase.php:
Schema key block.block.bartik_login:settings.cache failed with: missing schema

at some point in the output.

Given the recent work that has gone into test isolation (need links to the related issues) for APCu, database, and filesystem collisions during parallel tests, these new fails more baffling.

Proposed resolution

Until this can be isolated, I think we need to consider two radical options

- Disable parallel tests
- or Force a fresh testbot for each full run

that should(?) point us in the right direction.

Remaining tasks

User interface changes

API changes

Data model changes

CommentFileSizeAuthor
#131 2828559-131.patch572 bytesalexpott
#126 2828559-126.patch579 bytescatch
#119 control.patch292 bytesxjm
#117 2828559-117.patch326 bytestacituseu
#113 gc_after_test_setup.patch437 byteshchonov
#110 update_tests_gc.patch483 byteshchonov
#107 run_update_tests_isolated.patch2.37 KBhchonov
#104 2828559-104.patch449 bytesmpdonadio
#94 2828559-94.patch695 bytesalexpott
#88 Screen Shot 2017-01-04 at 10.04.19.png387.46 KBwim leers
#86 2828559-86.patch5.94 KBtacituseu
#79 2828559-79.patch5.5 KBtacituseu
#77 2828559-77-2.patch5.5 KBtacituseu
#76 2828559-76.patch3.41 KBtacituseu
#75 memory-usage-rolling.ods158.82 KBtacituseu
#71 2828559-71.patch3.79 KBtacituseu
#69 2828559-69.patch967 bytestacituseu
#68 2828559-68.patch383 bytesmpdonadio
#65 disable_opcache_apache_php5.patch454 byteshchonov
#59 2828559-59.patch1.61 KBcatch
#47 interdiff-45-47.txt567 bytesmpdonadio
#47 interdiff-41-47.txt635 bytesmpdonadio
#47 2828559-47.patch1.61 KBmpdonadio
#45 interdiff-41-45.txt636 bytesmpdonadio
#45 2828559-45.patch1.61 KBmpdonadio
#41 2828559-41.patch1010 bytesalexpott
#40 2828559-40.patch886 bytesalexpott
#39 2828559-no-cron-in-maintenance-mode-2.patch532 bytesmpdonadio
#39 2828559-no-cron-in-maintenance-mode.patch1.12 KBmpdonadio
#36 2828559.36.patch1.09 KBalexpott
#24 2828559-2-24.patch2.27 KBalexpott
#24 17-24-interdiff.txt3 KBalexpott
#21 2828559-21.patch717 bytesalexpott
#21 2828559-21.test-only.patch716 bytesalexpott
#17 interdiff.txt1.63 KBamateescu
#17 2828559-17.patch3.41 KBamateescu
#12 interdiff.txt941 bytesamateescu
#12 2828559-12.patch2.51 KBamateescu
#10 2828559-10.patch1.59 KBamateescu
#8 2828559-8.patch1.72 KBamateescu
#7 2828559.patch638 bytesamateescu

Comments

mpdonadio created an issue. See original summary.

mpdonadio’s picture

I guess another possibility is to run all tests (with parallelism) except for UpdatePathTestBase tests, and then run those at the end in serial.

?

xjm’s picture

Issue tags: +Triaged D8 critical

@alexpott, @effulgentsia, @cilefen, @catch, and I agreed that this is a critical issue since it's a random test failure for core.

xjm’s picture

Title: More random fails in UpdatePathTestBase tests » More random fails in UpdatePathTestBase tests: "settings.cache failed with: missing schema"
Anonymous’s picture

Sorry if this information would be useless. But I had a similar example https://www.drupal.org/pift-ci-job/480599.

amateescu’s picture

StatusFileSize
new638 bytes

At @alexpott's suggestion, here's a patch which will show us exactly what updates didn't run.

amateescu’s picture

Status: Active » Needs review
StatusFileSize
new1.72 KB

Better one.

Status: Needs review » Needs work

The last submitted patch, 8: 2828559-8.patch, failed testing.

amateescu’s picture

Status: Needs work » Needs review
StatusFileSize
new1.59 KB

More better :D

Status: Needs review » Needs work

The last submitted patch, 10: 2828559-10.patch, failed testing.

amateescu’s picture

Status: Needs work » Needs review
StatusFileSize
new2.51 KB
new941 bytes

We need to reset the static cache for drupal_get_installed_schema_version() now.

Status: Needs review » Needs work

The last submitted patch, 12: 2828559-12.patch, failed testing.

The last submitted patch, 12: 2828559-12.patch, failed testing.

The last submitted patch, 12: 2828559-12.patch, failed testing.

The last submitted patch, 12: 2828559-12.patch, failed testing.

amateescu’s picture

Status: Needs work » Needs review
StatusFileSize
new3.41 KB
new1.63 KB

Missed a spot.

hchonov’s picture

We've noticed in #2248983: Define the revision metadata base fields in the entity annotation in order for the storage to create them only in the revision table that the problem is reproducible with PHP 5.5 but not with PHP 7 when we put our update logic in View::preSave and through a hook_post_update trigger re-save of all views. However the problem does not occur if we instead directly update the configs in hook_update_n.

So currently there we have a patch which is constantly failing and not randomly.

hchonov’s picture

The random fails in our case have been caused highly probable by PHP 5.5 bug by using "foreach by reference" as shown in the interdiff from #2248983-176: Define the revision metadata base fields in the entity annotation in order for the storage to create them only in the revision table where the patch finally passed as instead using "foreach by reference" I've modified the array values directly.

Anonymous’s picture

Regexp foreach.+=>\s*& found about 50 cases in core. We can try replace it, right?

Edit: and 1 case in vendor /vendor/guzzlehttp/guzzle/src/UriTemplate.php

alexpott’s picture

StatusFileSize
new716 bytes
new717 bytes

Might have worked this out :) We have protection in WebTestBase to prevent following to many redirects... the problem is that this is set to 5 and sometimes (depending on load) the update batch can't be completed in 5 refreshes.

alexpott’s picture

I think the issue summary has it wrong btw... it is quoting the second fail - the first fail from all of the tests is:

fail: [Browser] Line 248 of core/modules/system/src/Tests/Update/UpdatePathTestBase.php:
GET http://localhost/checkout/update.php/start?id=2&op=do_nojs returned 0 (0 bytes).
alexpott’s picture

So #21 is not right - the batch system meta refreshes which are handled separately :( but still I think we should be focussing on the 0 bytes not the schema stuff.

Looking at #17.

+++ b/core/modules/system/src/Tests/Update/UpdatePathRC1TestBaseTest.php
@@ -37,7 +37,7 @@ public function testDatabaseLoaded() {
+      $this->assertEqual(drupal_get_installed_schema_version($module, TRUE), $schema, new FormattableMarkup('Module @module schema is @schema', ['@module' => $module, '@schema' => $schema]));

+++ b/core/modules/system/src/Tests/Update/UpdatePathTestBaseTest.php
@@ -32,7 +32,7 @@ protected function setDatabaseDumpFiles() {
+      $this->assertEqual(drupal_get_installed_schema_version($module, TRUE), 8000, SafeMarkup::format('Module @module schema is 8000', ['@module' => $module]));

This change is a shame. Any ideas how we can get around doing that?

alexpott’s picture

StatusFileSize
new3 KB
new2.27 KB

I cannot see any reason why drupal_get_installed_schema_version() is not using drupal_static() - let's use that so we can reset it.

I think we should commit this so we have additional information about what is going on here. At the moment we are flying blind.

alexpott’s picture

So the static was added in #35924: Make update.php work with modules before drupal_static() existed and it just never got updated.

alexpott’s picture

amateescu’s picture

The interdiff from #24 makes perfect sense, why didn't I think of that? :P

+1 from me!

catch’s picture

This looks good, I don't think the drupal_static() is a problem for 8.2.x.

alexpott’s picture

I've committed #24 to 8.2.x and 8.3.x because without this extra info we're never going to find out what is happening.

Committed and pushed 5e5ef24 to 8.3.x and 607ae50 to 8.2.x. Thanks!

  • alexpott committed 5e5ef24 on 8.3.x
    Issue #2828559 by amateescu, alexpott: More random fails in...

  • alexpott committed 607ae50 on 8.2.x
    Issue #2828559 by amateescu, alexpott: More random fails in...
alexpott’s picture

Okay we've had the first fail with the new code in place https://www.drupal.org/pift-ci-job/548483. So more information...

  • The disable_blocks_with_missing_contexts() update function from the block module did not run.
  • The disabled_region_update() update function from the block module did not run.
  • The fix_negate_in_conditions() update function from the block module did not run.
  • The add_message_redirect_field_to_contact_form() update function from the contact module did not run.
  • The clear_cache_for_file_reference_filter() update function from the editor module did not run.
  • The email_widget_size_setting() update function from the field module did not run.
  • The entity_reference_handler_setting() update function from the field module did not run.
  • The save_custom_storage_property() update function from the field module did not run.
  • The image_style_dependencies() update function from the image module did not run.
  • The add_region_to_entity_displays() update function from the system module did not run.
  • The clear_twig_cache() update function from the system module did not run.
  • The recalculate_configuration_entity_dependencies() update function from the system module did not run.
  • The boolean_filter_values() update function from the views module did not run.
  • The cleanup_duplicate_views_data() update function from the views module did not run.
  • The field_formatter_dependencies() update function from the views module did not run.
  • The serializer_dependencies() update function from the views module did not run.
  • The taxonomy_index_tid() update function from the views module did not run.
  • The update_cacheability_metadata() update function from the views module did not run.

So all the hook_update_N's have run but some of the post updates have not... actually it is not some! It is ALL of the post updates!

berdir’s picture

Are we sure it is only post updates? Post update functions don't run if any of the normal update functions fail.

The entity field definitions differences are also strange. Those should be fixed by actual update functions, not by post update functions, so that seems another indication that not all update functions where executed?

alexpott’s picture

In-between running hook_update_N() and post-updates we call drupal_flush_all_caches() to ensure that the environment is clean before allowing the full API use in post updates. I wonder if immediately after drupal_flush_all_caches() if we do a refresh for the batch there is a problem. So things work if a post update is run but not if one is not.

mpdonadio’s picture

#34, what happens in this scenario if two updates are running at the same time b/c test concurrency?

alexpott’s picture

StatusFileSize
new1.09 KB

@mpdonadio I'm not sure concurrency has much to do with this - there are separate URLs, separate DBs, separate everythings... however it is worth testing this...

mpdonadio’s picture

In wonder if this is related to #2607572: Simpletest UI throwing 500 errors during tests where the batch tries to run another set, and the queue is actually empty, no it never actually finishes properly.

alexpott’s picture

One thought is that automated_cron is installed by system_update_8013() and there is nothing to prevent cron from running whilst the updates are still running. This seems a bit unexpected to me. Also given that cron on request termination it could be running at the same time drupal_flush_all_caches() is running. However if this was the cause I'd expect the missing updates to be more random :(

Still I'm pretty sure that cron shouldn't run if there are outstanding database updates.

mpdonadio’s picture

Two quick hacks to check the theory in #38.

alexpott’s picture

StatusFileSize
new886 bytes

Well we're not putting the site into maintenance mode :) and since #565154: cron.php will run regardless of whether the site is in maintenance mode or not. actually we prevent cron running in maintenance when run via the cron url so... let's put do updates properly :)

I propose that if this patch is green we commit it. Because we have no reliable way of causing the error yet so we can't prove if anything fixes it - only that it does not.

alexpott’s picture

StatusFileSize
new1010 bytes

Oops $this->assert(200) is just silly :) - also let's preserve the value just in case people are testing the updating does not change this value.

mpdonadio’s picture

#40, confused here, DbUpdateController::triggerBatch() puts the side in maintenance mode if it isn't already:

  protected function triggerBatch(Request $request) {
    $maintenance_mode = $this->state->get('system.maintenance_mode', FALSE);
    // Store the current maintenance mode status in the session so that it can
    // be restored at the end of the batch.
    $_SESSION['maintenance_mode'] = $maintenance_mode;
    // During the update, always put the site into maintenance mode so that
    // in-progress schema changes do not affect visiting users.
    if (empty($maintenance_mode)) {
      $this->state->set('system.maintenance_mode', TRUE);
    }

and then AutomatedCron::onTerminate() doesn't have any checks. TBH, I was just checking maintenance_mode because I was too lazy to set/clear a state variable before/after updates in DbUpdateController to test the theory in #38 about cron running during updates and triggering a cache clear in the middle.

The last submitted patch, 40: 2828559-40.patch, failed testing.

alexpott’s picture

@mpdonadio - ah you are right the checks are built into the cron route rather than the cron service. I think the checks should be in the cron service and not the cron controller.

mpdonadio’s picture

StatusFileSize
new1.61 KB
new636 bytes

So, the cron route in system.routing.yml is wired up to CronController::run(), which has no check on it at all for maintenance_mode (not sure when that diverted from #565154: cron.php will run regardless of whether the site is in maintenance mode or not.; can't fully trace out the history of cron.php getting converted to the current route implementation]).

Here is the access check in Cron::run() along with #41. This should prevent cron from running in maintenance_mode from anywhere (the route, the form, the event, the tests).

We may want to explicitly set a lock around the update and check that instead (or both?).

Side note in CronController:

 public function run() {
    $this->cron->run();

    // HTTP 204 is "No content", meaning "I did what you asked and we're done."
    return new Response('', 204);
  }

Since Cron::run() can return FALSE, we may want a followup here to check for the return code, and return a 503 if FALSE. There are similar checks when run manually (eg, CronForm::submitForm()).

Status: Needs review » Needs work

The last submitted patch, 45: 2828559-45.patch, failed testing.

mpdonadio’s picture

Status: Needs work » Needs review
StatusFileSize
new1.61 KB
new635 bytes
new567 bytes

Well that was embarrassing.

mpdonadio’s picture

Issue summary: View changes

Fails, https://www.drupal.org/pift-ci-job/550289

Oddly, it is a zero byte symptom, but something I don't recall seeing before

1) Drupal\Tests\rest\Functional\EntityResource\Comment\CommentJsonAnonTest::testDelete
GuzzleHttp\Exception\ConnectException: cURL error 28: Operation timed out after 30001 milliseconds with 0 bytes received (see http://curl.haxx.se/libcurl/c/libcurl-errors.html)

/var/www/html/vendor/guzzlehttp/guzzle/src/Handler/CurlFactory.php:186
/var/www/html/vendor/guzzlehttp/guzzle/src/Handler/CurlFactory.php:150
/var/www/html/vendor/guzzlehttp/guzzle/src/Handler/CurlFactory.php:103
/var/www/html/vendor/guzzlehttp/guzzle/src/Handler/CurlHandler.php:43
/var/www/html/vendor/guzzlehttp/guzzle/src/Handler/Proxy.php:28
/var/www/html/vendor/guzzlehttp/guzzle/src/Handler/Proxy.php:51
/var/www/html/core/lib/Drupal/Core/Test/HttpClientMiddleware/TestHttpClientMiddleware.php:31
/var/www/html/vendor/guzzlehttp/guzzle/src/PrepareBodyMiddleware.php:42
/var/www/html/vendor/guzzlehttp/guzzle/src/Middleware.php:36
/var/www/html/vendor/guzzlehttp/guzzle/src/RedirectMiddleware.php:52
/var/www/html/vendor/guzzlehttp/guzzle/src/Middleware.php:59
/var/www/html/vendor/guzzlehttp/guzzle/src/HandlerStack.php:67
/var/www/html/vendor/guzzlehttp/guzzle/src/Client.php:275
/var/www/html/vendor/guzzlehttp/guzzle/src/Client.php:123
/var/www/html/vendor/guzzlehttp/guzzle/src/Client.php:129
/var/www/html/vendor/fabpot/goutte/Goutte/Client.php:138
/var/www/html/vendor/symfony/browser-kit/Client.php:297
/var/www/html/vendor/symfony/browser-kit/Client.php:481
/var/www/html/vendor/symfony/browser-kit/Client.php:313
/var/www/html/vendor/behat/mink-browserkit-driver/src/BrowserKitDriver.php:144
/var/www/html/vendor/behat/mink/src/Session.php:143
/var/www/html/core/tests/Drupal/Tests/BrowserTestBase.php:372
/var/www/html/core/tests/Drupal/Tests/BrowserTestBase.php:475
/var/www/html/core/modules/rest/tests/src/Functional/ResourceTestBase.php:101
/var/www/html/core/modules/rest/tests/src/Functional/EntityResource/EntityResourceTestBase.php:154

Going to trigger a few more runs.

hchonov’s picture

We've noticed some strange test fails in our setups and they were caused when running tests through run-tests.sh, but they were not reproducible when running tests through the Simpletest UI. The same happens for example with #2837707-7: EntityChangedConstraintValidator should be retrieving the latest changed time through an entity query instead by loading the unchanged entity as well. The test fails with PHP 5.6 on the test bot and at my machine locally through run-tests.sh, but passes when I execute it through the Simpletest UI.

xjm’s picture

There's a new variety of this fail, possibly just the same bug reported in a new way with the new CI deployment:
https://www.drupal.org/pift-ci-job/556488

xjm’s picture

So a number of people have said they saw this fail a lot more frequently today on patches, to the point that they thought HEAD was broken. 8.3.x HEAD failed with different variants of it twice in a row before passing on a third retest.

alexpott’s picture

https://dispatcher.drupalci.org/job/php5.5_mysql5.5/4637/artifact/jenkin... shows that a segfault occurred on the recent fails. A segfault explains two things… the 0 byte response and the complete lack of error info - which has always been happening with this one

mpdonadio’s picture

#52, is there a core dump to go along with it so we can see if there is a stack trace, or is it long gone?

wim leers’s picture

Is this issue also the one that explains the near-constant failures in Drupal\views\Tests\Update\ArgumentPlaceholderUpdatePathTest since yesterday, for example https://www.drupal.org/pift-ci-job/557113?

dawehner’s picture

Note: This failure should be fixed by #2784159: Remove CURL timeout in BTB

wim leers’s picture

catch reverted a patch at #2834291-39: Add a SQL index for entity types that are using EntityPublishedInterface, as a best guess effort, but it didn't help, unfortunately.

mpdonadio’s picture

#55, currently that isn't applied to 8.2.x. But yeah, I think the error in #48 is a pretty clear indication that this is the 30 second problem in Guzzle.

dawehner’s picture

+++ b/core/modules/system/src/Tests/Update/UpdatePathTestBase.php
@@ -242,13 +242,17 @@ protected function runUpdates() {
 
+    $pre_update_maintenance_mode = \Drupal::state()->get('system.maintenance_mode', FALSE);
+    \Drupal::state()->set('system.maintenance_mode', TRUE);
     $this->drupalGet($this->updateUrl);
     $this->clickLink(t('Continue'));
 
     $this->doSelectionTest();
     // Run the update hooks.
     $this->clickLink(t('Apply pending updates'));
+    $this->assertResponse(200);
 
+    \Drupal::state()->set('system.maintenance_mode', $pre_update_maintenance_mode);

I am a bit confused about those changes, as ideally update.php toggles the maintenance mode for us already, doesn't it?

catch’s picture

StatusFileSize
new1.61 KB

The fails in #47 look to me like they might be different random test failures. If that's the case, we should consider committing this at least as an interim improvement. Posting the same patch again, and setting of some tests. And it definitely makes sense in itself regardless.

Status: Needs review » Needs work

The last submitted patch, 59: 2828559-59.patch, failed testing.

mpdonadio’s picture

In #59 it looks like none of the fails have the 0-byte thing, which suggests #2784159: Remove CURL timeout in BTB may be the true root cause.

hchonov’s picture

Has it been considered converting the update path tests to phpunit tests and see if the random failures still occur?

tacituseu’s picture

error.log from #59 (https://www.drupal.org/pift-ci-job/558039) has:

zend_mm_heap corrupted

in it

catch’s picture

The migrate_drupal_ui fails like https://www.drupal.org/pift-ci-job/558033 are with the old WebTestCase not BrowserTestBase so can't really be related.

Looks like at least some are failing with the 0 byte response too: https://www.drupal.org/pift-ci-job/558039

hchonov’s picture

Status: Needs work » Needs review
StatusFileSize
new454 bytes

What if we disable OPCache just to ensure it is not causing the problems? I am not sure if the Drupal CI is using mod_php?

Status: Needs review » Needs work

The last submitted patch, 65: disable_opcache_apache_php5.patch, failed testing.

Mixologic’s picture

re #64 the fail mentioned in #64 at least is showing zend_mm_heap corrupted in the error log, along with three apache core dumps: https://dispatcher.drupalci.org/job/default/284689/artifact/jenkins-defa...

mpdonadio’s picture

Status: Needs work » Needs review
StatusFileSize
new383 bytes

Have we tried the USE_ZEND_ALLOC=0 trick in the apache environment?

tacituseu’s picture

StatusFileSize
new967 bytes

This might be issue with peak memory usage under high concurrency, as couple of memory intensive tests are running near each other (UpdatePathTestBaseFilledTest, UpdatePathRC1TestBaseFilledTest), add to that update batches spawned from them and usage of zlib and it could get Apache running out of available memory trying to create/send response causing a segfault (so ulimately mpm_prefork/MaxRequestWorkers sizing issue).
Due to high concurrency tests are not finishing in deterministric order so that would explain why sometimes the peaks miss each other and they pass.

Traced the source of spikes to:
Drupal\system\Tests\Update\UpdatePathTestBase::setUp()

    // Load the database(s).
    foreach ($this->databaseDumpFiles as $file) {
      if (substr($file, -3) == '.gz') {
        $file = "compress.zlib://$file";
      }
      // spikes with '/../../../tests/fixtures/update/drupal-8.filled.standard.php.gz'
      require $file;
    } 

which gets called in a loop from Drupal\simpletest\TestBase::run()

    foreach ($test_methods as $method) {
      ...
      try {
        $this->setUp();
      }
      ...
      try {
        $this->$method();
      }
      ...
    }

So it is called 3 times for UpdatePathTestBaseFilledTest (testUpdatedSite, testDatabaseLoaded, testUpdateHookN) which makes it even worse.

Attached a crude patch to get some data on memory usage for each test, hope parsers won't mind abuse of #debug.

Status: Needs review » Needs work

The last submitted patch, 69: 2828559-69.patch, failed testing.

tacituseu’s picture

StatusFileSize
new3.79 KB

Now with phpunit support. Usage: check console output on dispatcher, 'messages' == 'bytes'.

tacituseu’s picture

Status: Needs work » Needs review

The last submitted patch, 7: 2828559.patch, failed testing.

alexpott’s picture

+++ b/core/scripts/run-tests.sh
@@ -457,6 +457,8 @@ function simpletest_script_init() {
+    // increase our chances, we want to get full profile
+    $args['concurrency'] = 8;

This will reduce the chances of concurrency issues on testbot? no? It has concurrency 31

tacituseu’s picture

StatusFileSize
new158.82 KB

@alexpott: Yes, it was about increasing chances for good pass, so we will get data from all the tests.

What I was after is finding out approximate max memory usage of 31 concurrently running tests.
In short PHP 5.5: 1.82GB, PHP 5.6: 1.83GB, PHP 7: 1.55GB. Now this is with concurrency forced to 8, normal run would have different characteristic (higher).
Details in attached spreadsheet.

tacituseu’s picture

StatusFileSize
new3.41 KB

Now to get one from normal (concurrency: 31) pass.

tacituseu’s picture

StatusFileSize
new5.5 KB

New one keeps log allowing to reconstruct what is executing at the same time, also covers update.php.

Status: Needs review » Needs work

The last submitted patch, 77: 2828559-77-2.patch, failed testing.

tacituseu’s picture

StatusFileSize
new5.5 KB

Once more.

tacituseu’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 79: 2828559-79.patch, failed testing.

berdir’s picture

Is it correct that this only happens with 5.5?

What if we switch the default test branch to 7 until this is resolved at least? It looks like more patches fail with this than don't at the moment, which makes it extremely annoying to work on core atm.

catch’s picture

I tried to change the branch testing default, but the option is greyed out for me - I haven't change this since the UI change so not sure if it's me or the UI...

Mixologic’s picture

Sorry, looks like we have a bug in the qa page. I'll ping @drumm about it on monday.

claudiu.cristea’s picture

There's now way to temporarily increase the memory of the instances and see if the failure goes away?

tacituseu’s picture

StatusFileSize
new5.94 KB

Results from #79 give 2.13GB for PHP 5.5 and 8.3.x, noticed there's also 2GB APCu cache defined in the container, adding it as datapoint.

tacituseu’s picture

Status: Needs work » Needs review
wim leers’s picture

Title: More random fails in UpdatePathTestBase tests: "settings.cache failed with: missing schema" » Many UpdatePathTestBase tests failing 100% of the time
StatusFileSize
new387.46 KB

I think this is more accurate at this point.

Perhaps more importantly, I found a contrib module with the same problem: the CDN module, which I maintain! It does daily re-testing of the PHP7 branch. As such, it noticed that since December 20, it's failing 100% of the time in its update path test! See https://www.drupal.org/pift-ci-job/565105. Despite no commits since December 2.

Screenshot for posterity:

What's so great about this? Several things:

  1. it's much faster to test the CDN module than it is to test Drupal core
  2. it appears to happen 100% of the time
  3. it happens on PHP 7, which means this problem is not isolated to PHP 5.5

That should all be quite helpful in pinpointing the root cause.

wim leers’s picture

Title: Many UpdatePathTestBase tests failing 100% of the time » Many UpdatePathTestBase tests failing 100% of the time since DrupalCI update
wim leers’s picture

So I wonder if the CDN update path test and the failing update path tests in core are doing something particular that some other update path tests are not doing.

The fixture that the (sole) CDN update path test is using does not contain any serialized PHP. In fact, it might very well be the simplest of all fixtures. Yet it still causes this failure.

It also appears that it's failing with a far more useful error message:

exception: [Warning] Line 176 of core/modules/system/src/Tests/Update/UpdatePathTestBase.php:
require(compress.zlib:///var/www/html/modules/contrib/cdn/tests/src/Functional/Update/../../../../../../core/modules/system/tests/fixtures/update/drupal-8.bare.standard.php.gz): failed to open stream: operation failedrequire() (Line: 176)
Drupal\system\Tests\Update\UpdatePathTestBase->setUp() (Line: 947)
Drupal\simpletest\TestBase->run(Array) (Line: 752)
simpletest_script_run_one_test('1', 'Drupal\Tests\cdn\Functional\Update\CdnDefaultSettingsUpdateTest') (Line: 66)

This immediately points to the same area of code that @tacituseu traced it to in #69!

berdir’s picture

@Wim: That doesn't look like the same error to me, it clearly says that it is missing a file.

The reason for that apparently is that CI uses composer to install modules now and it does so in modules/contrib. You have a hardcoded relative path to the database dump, that doesn't work. You need a drupal_get_path() or something there, or (as that will likely not work at that point) some lower-level discovery API or some dynamic logic to find the modules folder. Keep in mind that technically, modules can still be in sites/all/modules/contrib or site specific folders too.

alexpott’s picture

Thinking about things the bots have the following config:

opcache.enable_cli => On => On
opcache.max_accelerated_files => 100000 => 100000
opcache.max_file_size => 0 => 0

So we're going to opcache the uncompressed database file for no good reason. Maybe it just breaks.

alexpott’s picture

I'm not sure why enable_cli is actually on - I can't remember if this helps on not. But if it was off we'd not cache these files.

alexpott’s picture

StatusFileSize
new695 bytes
alexpott’s picture

So I think we can exclude opcaching of the database dumps as the issue because we have a fail in #94 - https://www.drupal.org/pift-ci-job/565630

alexpott’s picture

Also thinking about it some more it can't the extraction that is the problem because it is failing whilst running the updates.

alexpott’s picture

Title: Many UpdatePathTestBase tests failing 100% of the time since DrupalCI update » UpdatePathTestBase tests randomly failing

Updating issue title since #88 is a red-herring and the fails fortunately are not 100% of the time.

hchonov’s picture

So if I understand it correctly what @tacituseu has shown is :

- concurrency=8 : memory consumption for PHP 5.5 = 1.82GB and all the tests pass

- concurrency=31 : memory consumption for PHP 5.5 = 2.13GB and update tests fail

This sounds like some 2GB memory limit and also that

a) Drupal increased the memory usage or has now new tests that require much more memory

or

b) a server setup has changed introducing this memory limit.

So in order to get the tests pass again and the work on the other issues continue probably we should set the concurrency to 8 as a temporary workaround and investigate further the problem here.

hchonov’s picture

Also it probably makes sense to execute each failed test again with concurency=1 to ensure we would not go into failing tests due to memory limits.

tacituseu’s picture

PHP 7 failures seem to be a recent and different issue.

With regard to my memory usage hunt I expected it to be much worse.

The bots are running on cc2.8xlarge AWS instances, and from the testbot code it doesn't look like containers are run with memory constraints so it is unlikely to be the reason (it still might be for local/Vagrant testing due to : v.customize [ "modifyvm", :id, "--memory", "4096" ]).

tacituseu’s picture

As for #98 and 1.82GB vs 2.13GB:
1.82GB was just an approximation by summing memory usage of tests finishing together, there is plenty of small tests that come and go quite quickly and lower the figure.
2.13GB figure takes into account which tests are actually running together ($children array in simpletest_script_execute_batch()). At the end of the run there is a bunch of tests in it that are longer running/taking more resources:

Drupal\system\Tests\Module\InstallUninstallTest  (this one is cooking for 20minutes !!)
Drupal\system\Tests\Update\UpdatePathTestBaseFilledTest (8 mins)
Drupal\user\Tests\UserPasswordResetTest (5min)
Drupal\views\Tests\Handler\AreaTest (2min)
Drupal\views\Tests\Handler\HandlerTest (2min)
Drupal\views\Tests\Plugin\DisplayTest
Drupal\views\Tests\Plugin\ExposedFormTest
Drupal\views\Tests\Plugin\PagerTest
Drupal\views\Tests\Update\ArgumentPlaceholderUpdatePathTest
Drupal\views\Tests\Wizard\TaggedWithTest
Drupal\views_ui\Tests\AreaEntityUITest
Drupal\views_ui\Tests\CustomBooleanTest
Drupal\views_ui\Tests\DefaultViewsTest
Drupal\views_ui\Tests\DisplayCRUDTest
Drupal\views_ui\Tests\DisplayFeedTest
Drupal\views_ui\Tests\DisplayPathTest
Drupal\views_ui\Tests\DisplayTest
Drupal\views_ui\Tests\ExposedFormUITest
Drupal\views_ui\Tests\FieldUITest
Drupal\views_ui\Tests\FilterBooleanWebTest
Drupal\views_ui\Tests\FilterNumericWebTest
Drupal\views_ui\Tests\FilterUITest
Drupal\views_ui\Tests\GroupByTest
Drupal\views_ui\Tests\HandlerTest
Drupal\views_ui\Tests\NewViewConfigSchemaTest
Drupal\views_ui\Tests\OverrideDisplaysTest
Drupal\views_ui\Tests\PreviewTest
Drupal\views_ui\Tests\QueryTest
Drupal\views_ui\Tests\RearrangeFieldsTest
Drupal\views_ui\Tests\RedirectTest
Drupal\views_ui\Tests\ReportFieldsTest
claudiu.cristea’s picture

If we know the date when the failures start to occur, let's add a patch that reverts all changes since that date and see if the failures persist. If yes, there it's something with the infrastructure.

xjm’s picture

@claudiu.cristea, these fails have been happening for most of 2016 at least. One difference is that they seem to be happening more frequently now.

mpdonadio’s picture

StatusFileSize
new449 bytes

How long would test runs take if we eliminate concurrency as a stop-gap? Or maybe drop it to something really low? There are a lot of patches that won't stay RTBC, and this is really become problematic for some issues. Being selfish here, but I feel like I lost a lot of time out of the 8.2.x cycle with being able to get patches in.

Status: Needs review » Needs work

The last submitted patch, 104: 2828559-104.patch, failed testing.

Mixologic’s picture

Sorry. Testbots are configured to abort after 1:50 minutes to trap hung builds.

I've been working on new container builds and should have the entire process reconstructed by next week - from that point forward troubleshooting environment issues will be much, much easier to manage.

hchonov’s picture

Status: Needs work » Needs review
StatusFileSize
new2.37 KB

What about running the update tests with concurrency=1 and all other with concurrency=31?

Mixologic’s picture

I dont think thats going to work. Run-tests.sh calls itself, and the tests are executed in an arbitrary order, so there may already be 27 tests running when the update test gets executed.

In addition it appears as though that patch revealed something that *should* be failing the test. The log for that passing test has a fatal error in run-tests.sh:
https://dispatcher.drupalci.org/job/default/293122/consoleFull

Fatal error: Allowed memory size of 335544320 bytes exhausted (tried to allocate 1048576 bytes) in /var/www/html/core/scripts/run-tests.sh on line 640
PHP Fatal error:  Allowed memory size of 335544320 bytes exhausted (tried to allocate 1048576 bytes) in /var/www/html/core/scripts/run-tests.sh on line 640

So looks like drupalci isnt catching fatals in run-tests.sh. bummer.

Mixologic’s picture

Been working on new containers and container build process: some input/+1s etc would be helpful here: #2841441: Migrating to new php container versions

hchonov’s picture

StatusFileSize
new483 bytes

All the tests should pass now when we force the garbage collection in the setup of update tests.

catch’s picture

https://www.drupal.org/pift-ci-job/567196 looks the same to me.

We've switched the default patch testing environment to PHP 5.6 to try to stop the issue queue falling over all the time.

hchonov’s picture

So it was just a luck that the patch was green in #2816097-50: IGNORE patch testing issue :(

hchonov’s picture

StatusFileSize
new437 bytes
Mixologic’s picture

Okay, so I have drupalci now set so that the host AMI allows for core dumps, and the containers are being started with the proper ulimits. I still need to have them dropping the cores in the right place and adding the php-src so we have zbacktrace as well as apache-dbg, but at least we're making core files if I can catch them.

I *did* catch one for this pgsql/mysql 5.5 fail:https://dispatcher.drupalci.org/job/php5.5_postgres9.1/1670/

root@8c88fb795987:/var/lib/drupalci/artifacts# gdb /usr/lib/debug/usr/sbin/apache2 core.apache2.11.1483729544.59632
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/debug/usr/sbin/apache2...done.
[New LWP 59632]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/apache2 -D FOREGROUND'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  zval_scan (pz=0x0) at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:516
516							p = p->pListNext;
(gdb) bt
#0  zval_scan (pz=0x0) at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:516
#1  0x00007f62189b2625 in zval_scan (pz=<optimized out>) at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:568
#2  0x00007f62189b2665 in zval_scan (pz=<optimized out>) at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:544
#3  0x00007f62189b2625 in zval_scan (pz=<optimized out>) at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:568
#4  0x00007f62189b2625 in zval_scan (pz=<optimized out>) at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:568
#5  0x00007f62189b2665 in zval_scan (pz=<optimized out>) at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:544
#6  0x00007f62189b2625 in zval_scan (pz=<optimized out>) at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:568
#7  0x00007f62189b2625 in zval_scan (pz=<optimized out>) at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:568
#8  0x00007f62189b29b9 in gc_scan_roots () at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:627
#9  gc_collect_cycles () at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:796
#10 0x00007f62189b3174 in gc_zval_possible_root (zv=zv@entry=0x7f621fa59bb8) at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:166
#11 0x00007f6218a3fdbd in gc_zval_check_possible_root (z=0x7f621fa59bb8) at /tmp/php-build/source/5.5.23/Zend/zend_gc.h:183
#12 i_zval_ptr_dtor (zval_ptr=0x7f621fa59bb8) at /tmp/php-build/source/5.5.23/Zend/zend_execute.h:90
#13 i_free_compiled_variables (execute_data=0x7f621cd67130) at /tmp/php-build/source/5.5.23/Zend/zend_execute.c:1510
#14 zend_leave_helper_SPEC (execute_data=0x7f621cd67130) at /tmp/php-build/source/5.5.23/Zend/zend_vm_execute.h:399
#15 0x00007f62189bdbe8 in execute_ex (execute_data=0x7f621cd67130) at /tmp/php-build/source/5.5.23/Zend/zend_vm_execute.h:363
#16 0x00007f6218986128 in zend_call_function (fci=fci@entry=0x7ffd99f476c0, fci_cache=<optimized out>, fci_cache@entry=0x7ffd99f47690) at /tmp/php-build/source/5.5.23/Zend/zend_execute_API.c:934
#17 0x00007f6218885d92 in zif_call_user_func_array (ht=<optimized out>, return_value=0x7f621f2afdb8, return_value_ptr=<optimized out>, this_ptr=<optimized out>, return_value_used=<optimized out>)
    at /tmp/php-build/source/5.5.23/ext/standard/basic_functions.c:4812
#18 0x00007f6218a43688 in zend_do_fcall_common_helper_SPEC (execute_data=<optimized out>) at /tmp/php-build/source/5.5.23/Zend/zend_vm_execute.h:550
#19 0x00007f62189bdbe8 in execute_ex (execute_data=0x7f621cd64df8) at /tmp/php-build/source/5.5.23/Zend/zend_vm_execute.h:363
#20 0x00007f6218986128 in zend_call_function (fci=fci@entry=0x7ffd99f479b0, fci_cache=<optimized out>, fci_cache@entry=0x7ffd99f47980) at /tmp/php-build/source/5.5.23/Zend/zend_execute_API.c:934
#21 0x00007f6218885d92 in zif_call_user_func_array (ht=<optimized out>, return_value=0x7f621eb283b8, return_value_ptr=<optimized out>, this_ptr=<optimized out>, return_value_used=<optimized out>)
    at /tmp/php-build/source/5.5.23/ext/standard/basic_functions.c:4812
#22 0x00007f6218a43688 in zend_do_fcall_common_helper_SPEC (execute_data=<optimized out>) at /tmp/php-build/source/5.5.23/Zend/zend_vm_execute.h:550
#23 0x00007f62189bdbe8 in execute_ex (execute_data=0x7f621cd64720) at /tmp/php-build/source/5.5.23/Zend/zend_vm_execute.h:363
#24 0x00007f6218995620 in zend_execute_scripts (type=type@entry=8, retval=retval@entry=0x0, file_count=file_count@entry=3) at /tmp/php-build/source/5.5.23/Zend/zend.c:1327
#25 0x00007f6218936172 in php_execute_script (primary_file=primary_file@entry=0x7ffd99f49ea0) at /tmp/php-build/source/5.5.23/main/main.c:2525
#26 0x00007f6218a450da in php_handler (r=<optimized out>) at /tmp/php-build/source/5.5.23/sapi/apache2handler/sapi_apache2.c:667
#27 0x00007f621cc26be0 in ?? ()
#28 0x00007f621ccdd0a0 in ?? ()
#29 0x0000000000000000 in ?? ()
(gdb) source /tmp/php
php-build/ phpenv/
(gdb) source /tmp/php-build/source/5.5.23/.g
.gdbinit        .gitattributes  .gitignore
(gdb) source /tmp/php-build/source/5.5.23/.gdbinit
(gdb) zbacktrace
[0x1cd66f80] mapConditionOperator() /var/www/html/core/lib/Drupal/Core/Database/Query/Condition.php:207
[0x1cd66d88] compile() /var/www/html/core/lib/Drupal/Core/Database/Query/Condition.php:172
[0x1cd66c10] compile() /var/www/html/core/lib/Drupal/Core/Database/Query/Select.php:227
[0x1cd66ab8] compile() /var/www/html/core/lib/Drupal/Core/Database/Query/Select.php:431
[0x1cd669d0] getArguments() /var/www/html/core/lib/Drupal/Core/Database/Query/Select.php:507
[0x1cd668c8] execute() /var/www/html/core/lib/Drupal/Core/Database/Driver/pgsql/Select.php:145
[0x1cd66760] execute() /var/www/html/core/lib/Drupal/Core/Database/Query/Merge.php:365
[0x1cd66608] execute() /var/www/html/core/lib/Drupal/Core/Config/DatabaseStorage.php:150
[0x1cd664c0] doWrite() /var/www/html/core/lib/Drupal/Core/Config/DatabaseStorage.php:123
[0x1cd66388] write() /var/www/html/core/lib/Drupal/Core/Config/CachedStorage.php:124
[0x1cd66230] write() /var/www/html/core/lib/Drupal/Core/Config/Config.php:222
[0x1cd66040] save() /var/www/html/core/lib/Drupal/Core/Config/Entity/ConfigEntityStorage.php:280
[0x1cd65ee8] doSave() /var/www/html/core/lib/Drupal/Core/Entity/EntityStorageBase.php:392
[0x1cd65db8] save() /var/www/html/core/lib/Drupal/Core/Config/Entity/ConfigEntityStorage.php:259
[0x1cd65c78] save() /var/www/html/core/lib/Drupal/Core/Entity/Entity.php:364
[0x1cd65ba0] save() /var/www/html/core/lib/Drupal/Core/Config/Entity/ConfigEntityBase.php:637
[0x1cd659f8] save() /var/www/html/core/modules/views/src/EventSubscriber/ViewsEntitySchemaSubscriber.php:190
[0x1cd65880] onEntityTypeUpdate() /var/www/html/core/lib/Drupal/Core/Entity/EntityTypeEventSubscriberTrait.php:47
[0x1cd65710] onEntityTypeEvent() /var/www/html/core/lib/Drupal/Component/EventDispatcher/ContainerAwareEventDispatcher.php:111
[0x1cd65558] dispatch() /var/www/html/core/lib/Drupal/Core/Entity/EntityTypeListener.php:95
[0x1cd65440] onEntityTypeUpdate() /var/www/html/core/lib/Drupal/Core/Entity/EntityManager.php:393
[0x1cd65328] onEntityTypeUpdate() /var/www/html/core/lib/Drupal/Core/Entity/EntityDefinitionUpdateManager.php:144
[0x1cd65208] updateEntityType() /var/www/html/core/modules/system/system.install:1172
[0x1cd65078] system_update_8004() /var/www/html/core/includes/update.inc:178
[0x99f475c0] update_do_one()
[0x1cd64df8] call_user_func_array() /var/www/html/core/includes/batch.inc:252
[0x1cd64c28] _batch_process() /var/www/html/core/includes/batch.inc:144
[0x1cd64a40] _batch_progress_page() /var/www/html/core/includes/batch.inc:72
[0x1cd648a0] _batch_page() /var/www/html/core/modules/system/src/Controller/DbUpdateController.php:186
[0x99f478b0] handle()
[0x1cd64720] call_user_func_array() /var/www/html/core/lib/Drupal/Core/Update/UpdateKernel.php:110
[0x1cd645c0] handleRaw() /var/www/html/core/lib/Drupal/Core/Update/UpdateKernel.php:73
[0x1cd644b0] handle() /var/www/html/update.php:19

Hopefully that gives us something to go on. The container build process has run into some snags that I have to work out (notable 5.6.29 had a core dump, 2 failed tests, and took 1hr 20 min.

Mixologic’s picture

Got it to fail again, looks like its the exact same spot (this time on mysql and not pgsql)
https://dispatcher.drupalci.org/job/php5.5_mysql5.5/4714/

PHP5.5 MySQL 5.5

(gdb) source /tmp/php-build/source/5.5.23/.gdbinit
(gdb) bt --help
No symbol "help" in current context.
(gdb) bt help
No symbol "help" in current context.
(gdb) bt
#0  zval_scan (pz=0x0) at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:516
#1  0x00007f8e00149625 in zval_scan (pz=<optimized out>) at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:568
#2  0x00007f8e00149625 in zval_scan (pz=<optimized out>) at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:568
#3  0x00007f8e00149625 in zval_scan (pz=<optimized out>) at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:568
#4  0x00007f8e00149665 in zval_scan (pz=<optimized out>) at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:544
#5  0x00007f8e00149625 in zval_scan (pz=<optimized out>) at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:568
#6  0x00007f8e00149625 in zval_scan (pz=<optimized out>) at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:568
#7  0x00007f8e001499b9 in gc_scan_roots () at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:627
#8  gc_collect_cycles () at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:796
#9  0x00007f8e0014a174 in gc_zval_possible_root (zv=zv@entry=0x7f8e07298988) at /tmp/php-build/source/5.5.23/Zend/zend_gc.c:166
#10 0x00007f8e001da5ae in gc_zval_check_possible_root (z=0x7f8e07298988) at /tmp/php-build/source/5.5.23/Zend/zend_gc.h:183
#11 i_zval_ptr_dtor (zval_ptr=0x7f8e07298988) at /tmp/php-build/source/5.5.23/Zend/zend_execute.h:90
#12 zend_vm_stack_clear_multiple (nested=1) at /tmp/php-build/source/5.5.23/Zend/zend_execute.h:298
#13 zend_do_fcall_common_helper_SPEC (execute_data=<optimized out>) at /tmp/php-build/source/5.5.23/Zend/zend_vm_execute.h:642
#14 0x00007f8e00154be8 in execute_ex (execute_data=0x7f8e044feb18) at /tmp/php-build/source/5.5.23/Zend/zend_vm_execute.h:363
#15 0x00007f8e0011d128 in zend_call_function (fci=fci@entry=0x7f8e009ea760 <basic_globals+416>, fci_cache=<optimized out>, fci_cache@entry=0x7f8e009ea7a8 <basic_globals+488>) at /tmp/php-build/source/5.5.23/Zend/zend_execute_API.c:934
#16 0x00007f8e00010663 in php_array_walk (target_hash=0x7f8e0764dde8, userdata=0x0, recursive=recursive@entry=0) at /tmp/php-build/source/5.5.23/ext/standard/array.c:1094
#17 0x00007f8e00014de0 in zif_array_walk (ht=<optimized out>, return_value=0x7f8e073925a0, return_value_ptr=<optimized out>, this_ptr=<optimized out>, return_value_used=<optimized out>) at /tmp/php-build/source/5.5.23/ext/standard/array.c:1139
#18 0x00007f8e001da688 in zend_do_fcall_common_helper_SPEC (execute_data=<optimized out>) at /tmp/php-build/source/5.5.23/Zend/zend_vm_execute.h:550
#19 0x00007f8e00154be8 in execute_ex (execute_data=0x7f8e044fd688) at /tmp/php-build/source/5.5.23/Zend/zend_vm_execute.h:363
#20 0x00007f8e0011d128 in zend_call_function (fci=fci@entry=0x7fffa55d6020, fci_cache=<optimized out>, fci_cache@entry=0x7fffa55d5ff0) at /tmp/php-build/source/5.5.23/Zend/zend_execute_API.c:934
#21 0x00007f8e0001cd92 in zif_call_user_func_array (ht=<optimized out>, return_value=0x7f8e05826870, return_value_ptr=<optimized out>, this_ptr=<optimized out>, return_value_used=<optimized out>) at /tmp/php-build/source/5.5.23/ext/standard/basic_functions.c:4812
#22 0x00007f8e001da688 in zend_do_fcall_common_helper_SPEC (execute_data=<optimized out>) at /tmp/php-build/source/5.5.23/Zend/zend_vm_execute.h:550
#23 0x00007f8e00154be8 in execute_ex (execute_data=0x7f8e044fbdf8) at /tmp/php-build/source/5.5.23/Zend/zend_vm_execute.h:363
#24 0x00007f8e0011d128 in zend_call_function (fci=fci@entry=0x7fffa55d6310, fci_cache=<optimized out>, fci_cache@entry=0x7fffa55d62e0) at /tmp/php-build/source/5.5.23/Zend/zend_execute_API.c:934
#25 0x00007f8e0001cd92 in zif_call_user_func_array (ht=<optimized out>, return_value=0x7f8e05805630, return_value_ptr=<optimized out>, this_ptr=<optimized out>, return_value_used=<optimized out>) at /tmp/php-build/source/5.5.23/ext/standard/basic_functions.c:4812
#26 0x00007f8e001da688 in zend_do_fcall_common_helper_SPEC (execute_data=<optimized out>) at /tmp/php-build/source/5.5.23/Zend/zend_vm_execute.h:550
#27 0x00007f8e00154be8 in execute_ex (execute_data=0x7f8e044fb720) at /tmp/php-build/source/5.5.23/Zend/zend_vm_execute.h:363
#28 0x00007f8e0012c620 in zend_execute_scripts (type=type@entry=8, retval=retval@entry=0x0, file_count=file_count@entry=3) at /tmp/php-build/source/5.5.23/Zend/zend.c:1327
#29 0x00007f8e000cd172 in php_execute_script (primary_file=primary_file@entry=0x7fffa55d8800) at /tmp/php-build/source/5.5.23/main/main.c:2525
#30 0x00007f8e001dc0da in php_handler (r=<optimized out>) at /tmp/php-build/source/5.5.23/sapi/apache2handler/sapi_apache2.c:667
#31 0x00007f8e043bdbe0 in ?? ()
#32 0x00007f8e044690a0 in ?? ()
#33 0x0000000000000000 in ?? ()
(gdb) zbacktrace
[0x044feb18] getColumns() /var/www/html/core/lib/Drupal/Core/Field/BaseFieldDefinition.php:663
[0x044fe9d0] getColumns() /var/www/html/core/lib/Drupal/Core/Entity/Sql/DefaultTableMapping.php:204
[0x044fe878] getFieldColumnName() /var/www/html/core/lib/Drupal/Core/Entity/Sql/DefaultTableMapping.php:190
[0x044fe6e0] getColumnNames() /var/www/html/core/lib/Drupal/Core/Entity/Sql/DefaultTableMapping.php:107
[0x044fe570] getAllColumns() /var/www/html/core/lib/Drupal/Core/Entity/Query/Sql/Tables.php:374
[0x044fe248] getTableMapping() /var/www/html/core/lib/Drupal/Core/Entity/Query/Sql/Tables.php:197
[0x044fe098] addField() /var/www/html/core/lib/Drupal/Core/Entity/Query/Sql/Condition.php:44
[0x044fdf58] compile() /var/www/html/core/lib/Drupal/Core/Entity/Query/Sql/Query.php:155
[0x044fde90] compile() /var/www/html/core/lib/Drupal/Core/Entity/Query/Sql/Query.php:74
[0x044fdd98] execute() /var/www/html/core/lib/Drupal/Core/Entity/EntityStorageBase.php:503
[0x044fdc58] loadByProperties() /var/www/html/core/lib/Drupal/Core/Entity/EntityRepository.php:52
[0x044fdb20] loadEntityByUuid() /var/www/html/core/lib/Drupal/Core/Entity/EntityRepository.php:73
[0x044fda08] loadEntityByConfigTarget() /var/www/html/core/lib/Drupal/Core/Entity/EntityManager.php:368
[0x044fd910] loadEntityByConfigTarget() /var/www/html/core/modules/views/src/Plugin/views/area/Entity.php:190
[0x044fd7d8] calculateDependencies() /var/www/html/core/lib/Drupal/Core/Plugin/PluginDependencyTrait.php:35
[0xa55d5b50] calculatePluginDependencies()
[0x044fd688] array_walk() /var/www/html/core/modules/views/src/Plugin/views/display/DisplayPluginBase.php:953
[0x044fd570] calculateDependencies() /var/www/html/core/lib/Drupal/Core/Plugin/PluginDependencyTrait.php:35
[0x044fd450] calculatePluginDependencies() /var/www/html/core/modules/views/src/Entity/View.php:281
[0x044fd2b8] calculateDependencies() /var/www/html/core/lib/Drupal/Core/Config/Entity/ConfigEntityBase.php:346
[0x044fd168] preSave() /var/www/html/core/modules/views/src/Entity/View.php:291
[0x044fd038] preSave() /var/www/html/core/lib/Drupal/Core/Entity/EntityStorageBase.php:434
[0x044fcee8] doPreSave() /var/www/html/core/lib/Drupal/Core/Entity/EntityStorageBase.php:389
[0x044fcdb8] save() /var/www/html/core/lib/Drupal/Core/Config/Entity/ConfigEntityStorage.php:259
[0x044fcc78] save() /var/www/html/core/lib/Drupal/Core/Entity/Entity.php:364
[0x044fcba0] save() /var/www/html/core/lib/Drupal/Core/Config/Entity/ConfigEntityBase.php:637
[0x044fc9f8] save() /var/www/html/core/modules/views/src/EventSubscriber/ViewsEntitySchemaSubscriber.php:190
[0x044fc880] onEntityTypeUpdate() /var/www/html/core/lib/Drupal/Core/Entity/EntityTypeEventSubscriberTrait.php:47
[0x044fc710] onEntityTypeEvent() /var/www/html/core/lib/Drupal/Component/EventDispatcher/ContainerAwareEventDispatcher.php:111
[0x044fc558] dispatch() /var/www/html/core/lib/Drupal/Core/Entity/EntityTypeListener.php:95
[0x044fc440] onEntityTypeUpdate() /var/www/html/core/lib/Drupal/Core/Entity/EntityManager.php:393
[0x044fc328] onEntityTypeUpdate() /var/www/html/core/lib/Drupal/Core/Entity/EntityDefinitionUpdateManager.php:144
[0x044fc208] updateEntityType() /var/www/html/core/modules/system/system.install:1210
[0x044fc078] system_update_8004() /var/www/html/core/includes/update.inc:178
[0xa55d5f20] update_do_one()
[0x044fbdf8] call_user_func_array() /var/www/html/core/includes/batch.inc:252
[0x044fbc28] _batch_process() /var/www/html/core/includes/batch.inc:144
[0x044fba40] _batch_progress_page() /var/www/html/core/includes/batch.inc:72
[0x044fb8a0] _batch_page() /var/www/html/core/modules/system/src/Controller/DbUpdateController.php:186
[0xa55d6210] handle()
[0x044fb720] call_user_func_array() /var/www/html/core/lib/Drupal/Core/Update/UpdateKernel.php:110
[0x044fb5c0] handleRaw() /var/www/html/core/lib/Drupal/Core/Update/UpdateKernel.php:73
[0x044fb4b0] handle() /var/www/html/update.php:19
tacituseu’s picture

StatusFileSize
new326 bytes
catch’s picture

This needs a comment, but very nice find and I think we should commit it to start stabilising things.

xjm’s picture

StatusFileSize
new292 bytes

I queued more runs to demonstrate with reasonable statistical certainty that it is actually improving things over the baseline. Also here is a control patch to compare it to the current stability on CI.

tacituseu’s picture

Only meant the patch as a confirmation of the problem, which is a NULL pointer dereference in garbage collector.

Maybe there is some other solution:
- figure out which update produces so many objects and then either refactor it or force batch to run it in its own request or call gc_disable() only in it ?
- does it warrant raising minimum version required ?

Also one of the postgre fails (https://www.drupal.org/pift-ci-job/567927) is another issue #2157927: Intermittent test fails in LocaleUpdateTest::testUpdateImportSourceRemote(), the others are different but there are no segfaults.

Mixologic’s picture

Looks like that one fail on mysql 5.5 in that last battery on #117 is just another example of a different intermittent failure. #2806697: Random fail for AlreadyInstalledException

xjm’s picture

@tacituseu, really great work tracking this down. With the additional test runs I think we can actually say with some confidence that you've found the source of a bug that has eluded us since last June. (The fails on #117 are other known fails, so we exclude them from the count.)

I actually would consider committing the patch from #117 or something like it as a hotfix. The risk is maybe that update processes on real sites would run out of memory? Could we just disable garbage collection only in update tests somehow?

xjm’s picture

tacituseu’s picture

#122: should be possible by checking User-Agent, from Drupal\simpletest\WebTestBase::curlInitialize():

    curl_setopt($this->curlHandle, CURLOPT_USERAGENT, drupal_generate_test_ua($this->databasePrefix)); 
catch’s picture

Yes we still have drupal_valid_test_ua() and could use that. Given this is a PHP bug, I'd prefer the hotfix to trying to refactor an update to not trigger it - we don't know what contrib updates will do and they can run tests as well.

catch’s picture

StatusFileSize
new579 bytes

Here it is with the user agent check and a comment.

alexpott’s picture

I think it is more likely that this will help real sites rather than cause memory issues. The big question for me is what could be making this an effective fix. I.e. where are we generating so many objects that need cleaning up. Config schema generation might be one such place.

See the following links for numbers on turning of gc in composer...
Issue in composer: https://github.com/composer/composer/pull/3482#issuecomment-65199153
Solution and people replies: https://github.com/composer/composer/commit/ac676f47f7bbc619678a29deae09...

Status: Needs review » Needs work

The last submitted patch, 126: 2828559-126.patch, failed testing.

alexpott’s picture

Basically I think if gc_disable() helps testbot then it is going to help everyone.

alexpott’s picture

We need to call gc_collect_cycles() before calling gc_disable() see https://github.com/composer/composer/blob/b2efcb1078b8abfb9f20ae228dc255...

alexpott’s picture

Status: Needs work » Reviewed & tested by the community
StatusFileSize
new572 bytes

After thinking about this some more I think we should proceed with a quick fix for testbot and then open a follow up to get some real numbers about the affect of this.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed and pushed 528e307 to 8.3.x and e690435 to 8.2.x. Thanks!

  • alexpott committed 528e307 on 8.3.x
    Issue #2828559 by tacituseu, alexpott, mpdonadio, amateescu, hchonov,...

  • alexpott committed e690435 on 8.2.x
    Issue #2828559 by tacituseu, alexpott, mpdonadio, amateescu, hchonov,...
xjm’s picture

Issue tags: +Needs followup
xjm’s picture

penyaskito’s picture

I'm still suffering this issue in the Lingotek module. See https://www.drupal.org/pift-ci-job/571581 as an example of another UpdatePathTestBase based test failing. Configured to use 8.2.x + PHP7.

alexpott’s picture

@penyaskito I don't think those fails are the same - firstly it does not look random. There's no 0 byte request and there is no segmentation fault.

penyaskito’s picture

@alexpott You're right, found my issue. Sorry for the annoyance.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

xjm’s picture