Comment | File | Size | Author
#29 | 2898721-random.patch | 339 bytes | mpdonadio
#14 | 2898721-no-gc.patch | 561 bytes | mpdonadio
#11 | revert-14c2920589c.patch | 747 bytes | xjm
#8 | revert-43c568a71.patch | 12.94 KB | xjm
#8 | baseline-fail.patch | 298 bytes | xjm

Comments

tedbow created an issue. See original summary.

timmillwood’s picture

Issue summary: View changes

Adding https://www.drupal.org/pift-ci-job/728179 which looks to be the same issue.

tedbow’s picture

Issue summary: View changes
tacituseu’s picture

Priority: Normal » Critical
tacituseu’s picture

xjm’s picture

Title: Random failure FileFieldWidgetTest::testMultiValuedWidget » Random segfault currently in FileFieldWidgetTest::testMultiValuedWidget()

It's a segfault. Goody!

xjm’s picture

This is happening at a high rate in HEAD on PHP 5.6 environments:
8.4.x: https://www.drupal.org/pift-ci-job/728200
8.5.x: https://www.drupal.org/pift-ci-job/728316

tacituseu’s picture

xjm’s picture

@tacituseu, good catch. Hmm.

mpdonadio’s picture

Have we gotten a core dump yet? Wonder if #2828559: UpdatePathTestBase tests randomly failing / #2842393: Discover why gc_disable() improves update.php stability are related? That fix only went into update.php. We may want to try a patch where we disable GC for all test runs...

xjm’s picture

So far I have only seen the segfault on:

  • 8.4.x or 8.5.x (not 8.3.x)
  • PHP 5.6

#2891911: Random fail in Drupal\Tests\locale\Functional\LocaleTranslationUiTest::testStringTranslation was backported to 8.3.x though.

mpdonadio’s picture

tacituseu’s picture

@mpdonadio: almost all of them are in gc_collect_cycles().

tacituseu’s picture

https://www.drupal.org/pift-ci-job/728169 was the first failure (in issue testing); it failed 2 hours before the one from HEAD (https://www.drupal.org/pift-ci-job/728146).

The thing is, console logs say they both checked out the same commit:

Git Command: git clone -b 8.4.x --depth 1 git://git.drupal.org/project/drupal.git '/var/lib/drupalci/workspace/jenkins-drupal_patches-24621/source'
Git commit info:
6cec75b (grafted, HEAD, origin/HEAD, origin/8.4.x, 8.4.x) Issue #2849674 by mxh, Lendude, podarok, pingwin4eg, andypost, axel.rutz, catch: Complex job in ViewExecutable::unserialize() causes data corruption

Git Command: git clone -b 8.5.x --depth 1 git://git.drupal.org/project/drupal.git '/var/lib/drupalci/workspace/jenkins-php5.6_mysql5.5-3072/source'
Git commit info:
43c568a (grafted, HEAD, origin/8.5.x, 8.5.x) Issue #2849674 by mxh, Lendude, podarok, pingwin4eg, andypost, axel.rutz, catch: Complex job in ViewExecutable::unserialize() causes data corruption
Checkout complete.

So @xjm might be on to something with #8.
It looks like the 'tested on commit' run isn't really testing each of the triggering commits, but whatever the given branch is at when it starts executing.

tacituseu’s picture

Another weird thing: even the passing runs from #8 (both baseline and revert) contain core dumps (stale data?):
24725/artifacts/simpletest.standard/
24727/artifacts/simpletest.standard/
24732/artifacts/simpletest.standard/

The last submitted patch, 11: revert-14c2920589c.patch, failed testing. View results

Status: Needs review » Needs work

The last submitted patch, 14: 2898721-no-gc.patch, failed testing. View results

catch’s picture

Status: Needs work » Needs review

Given the revert of 43c568a71 is all green, I'm going to go ahead and do that to see if it improves the situation.

If people notice any more segfaults from now on, please post here since that'll mean it's not the culprit.

tacituseu’s picture

Plenty of "Requeued after CI error" and "CI aborted" results; it looks like the testbots ran out of disk space:
https://dispatcher.drupalci.org/job/php-7.1.x-apache_mysql5.5/397/consoleText

PHP Warning: file_put_contents(): Only 0 of 575 bytes written, possibly out of free disk space in /opt/drupalci/testrunner/src/DrupalCI/Build/Build.php on line 481

https://dispatcher.drupalci.org/job/php-5.6-apache_postgres9.1/449/consoleText

BFD: Warning: /var/lib/drupalci/coredumps/core.php.11.1501535143.205 is truncated: expected core file size >= 39734886400, found: 19501174784.

xjm’s picture

Title: Random segfault currently in FileFieldWidgetTest::testMultiValuedWidget() » Frequent segfault currently in FileFieldWidgetTest::testMultiValuedWidget() and CI disk space
Priority: Critical » Major

Agreed on the revert for #2849674: Complex job in ViewExecutable::unserialize() causes data corruption to stop the damage (which catch already did). However, the segfault also happened on 14c2920589cc9, which is the commit immediately before it. Also see #16.

https://www.drupal.org/node/3060/qa is not pretty right now.

Rescoping. I think we should keep this open to continue to discuss the garbage collection angle (we can use the patch from #2849674: Complex job in ViewExecutable::unserialize() causes data corruption to stress-test), but I'll ping Mixologic again about the ongoing issues.

xjm’s picture

Priority: Major » Critical

Still critical really.

tacituseu’s picture

@xjm: I don't think they happened on 14c2920589cc9, see #16 for details, it wasn't actually testing 14c2920589cc9.

Mixologic’s picture

Title: Frequent segfault currently in FileFieldWidgetTest::testMultiValuedWidget() and CI disk space » Random segfault currently in FileFieldWidgetTest::testMultiValuedWidget()

I'm pretty sure these segfaults are related to a bug that upstream does not plan on fixing: https://bugs.php.net/bug.php?id=72286

Re: #17 There is definitely a bug in drupalci where it's not cleaning up after its own core dumps, so the same bot will re-report the existence of core dump files. I'll look into that today. (#2899031: Core dumps need to be cleaned up properly.)

Mixologic’s picture

One way to tell, currently, if the core dump is related to the current run of tests is if there is a line in the console output that says: Cannot access memory at address 0x7ef95e818708

(with varying addresses, of course) -> that indicates that the core dump is from a different docker container, and not related to this test run.
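
That heuristic can be written as a small log filter. This is a minimal sketch, assuming the console output is available as a string. The function name `stale_core_dump_addresses` is hypothetical and not part of DrupalCI:

```python
import re

# The marker line quoted above; the address varies from dump to dump.
_STALE_DUMP = re.compile(r"Cannot access memory at address (0x[0-9a-fA-F]+)")


def stale_core_dump_addresses(console_output):
    """Return the addresses from every 'Cannot access memory' line.

    A non-empty result suggests that at least one reported core dump
    came from a different container than the current test run.
    """
    return _STALE_DUMP.findall(console_output)
```

An empty list means the marker line was not found, so any core dump reported in that output is more likely to belong to the current run.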

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

xjm’s picture

I dug into https://bugs.php.net/bug.php?id=72286 and eventually managed to find its scenario:
https://github.com/lightster/php-circular-reference-segfault/commit/ce85...

The author of the above says they're not sure if it's the same bug as https://bugs.php.net/bug.php?id=71958 or not.

The test result of #14 in https://www.drupal.org/pift-ci-job/728450 is an OOM and shows that we do need and benefit from garbage collection. ;)

Would #2899031: Core dumps need to be cleaned up properly. affect the segfaults at all, or only the subsequent (possibly unrelated) swath of "CI error" test results?

tacituseu’s picture

Is there any reason not to make PHP 7 the default 'on commit' and 'issue testing' environment? The advantages are obvious and it represents the majority of the market.
Otherwise, to support some (imaginary?) minority, we'll be committing to the daunting task of auditing the whole codebase just so as not to step on some obsolete version's toes.
It is one thing to limit yourself to a given version's feature level, but quite another to be limited by its bugs. If there's a distribution that is unwilling to update, it should be its job to patch things up.

Anonymous’s picture

xjm’s picture

Status: Needs review » Fixed

Since #2849674: Complex job in ViewExecutable::unserialize() causes data corruption was reverted, this segfault is no longer happening in HEAD and so we have to work around it in that other issue. So, we can move this issue to fixed and focus discussion over there.

I've also changed the patch testing default environment to PHP 7 (as per #2607222: [policy, no patch] Default to PHP 7 for Drupal core patch testing). So this fail won't explode the entire patch queue if reintroduced.

Lendude’s picture

We've been running some tests on this in #2879048: Ignore: patch testing issue for #2919863 and I'm not sure I would call this 'fixed'. FileFieldWidgetTest is highly unstable, since the mere act of adding an empty test module to the code base can break it; see https://www.drupal.org/node/2879048#comment-12202738

Yeah, the random fails are gone, but any patch that adds a test module can easily bring them back.

Anonymous’s picture

FileFieldWidgetTest is highly unstable

Today I ran the test on Windows without any patches and got a failure:

488 passes, 1 fail, 1 exception, 138 debug messages

Exception:

simplexml_import_dom(): Invalid Nodetype to import
simplexml_import_dom(Object) (Line: 133)
Drupal\simpletest\WebTestBase->parse() (Line: 1036)
Drupal\simpletest\WebTestBase->drupalPostForm('node/1/edit', Array, 'Save') (Line: 405)
Drupal\file\Tests\FileFieldWidgetTest->testPrivateFileComment() (Line: 960)
Drupal\simpletest\TestBase->run() (Line: 430)
_simpletest_batch_operation(Array, '1', Array) (Line: 252)
_batch_process() (Line: 95)
_batch_do() (Line: 77)
_batch_page(Object) (Line: 55)
Drupal\system\Controller\BatchController->batchPage(Object)
call_user_func_array(Array, Array) (Line: 123)
Drupal\Core\EventSubscriber\EarlyRenderingControllerWrapperSubscriber->Drupal\Core\EventSubscriber\{closure}() (Line: 574)
Drupal\Core\Render\Renderer->executeInRenderContext(Object, Object) (Line: 124)
Drupal\Core\EventSubscriber\EarlyRenderingControllerWrapperSubscriber->wrapControllerExecutionInRenderContext(Array, Array) (Line: 97)
Drupal\Core\EventSubscriber\EarlyRenderingControllerWrapperSubscriber->Drupal\Core\EventSubscriber\{closure}()
call_user_func_array(Object, Array) (Line: 153)
Symfony\Component\HttpKernel\HttpKernel->handleRaw(Object, 1) (Line: 68)
Symfony\Component\HttpKernel\HttpKernel->handle(Object, 1, 1) (Line: 57)
Drupal\Core\StackMiddleware\Session->handle(Object, 1, 1) (Line: 47)
Drupal\Core\StackMiddleware\KernelPreHandle->handle(Object, 1, 1) (Line: 99)
Drupal\page_cache\StackMiddleware\PageCache->pass(Object, 1, 1) (Line: 78)
Drupal\page_cache\StackMiddleware\PageCache->handle(Object, 1, 1) (Line: 47)
Drupal\Core\StackMiddleware\ReverseProxyMiddleware->handle(Object, 1, 1) (Line: 50)
Drupal\Core\StackMiddleware\NegotiationMiddleware->handle(Object, 1, 1) (Line: 23)
Stack\StackedHttpKernel->handle(Object, 1, 1) (Line: 656)
Drupal\Core\DrupalKernel->handle(Object) (Line: 19)

Fail:
Confirmed that access is denied for the file without the needed permission.

After a re-run, all good:
492 passes, 0 fails, 0 exceptions, 139 debug messages

Anonymous’s picture

That was on the latest 8.5.x-dev, PHP 7.1.7, MySQL 5.7.19. Maybe I did not clear the cache after applying the patch that removed themes (#2879048-139: Ignore: patch testing issue for #2919863). But it's still suspicious.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Gábor Hojtsy’s picture

Status: Closed (fixed) » Closed (duplicate)

The right status would be closed duplicate given another issue supposedly fixed what this issue said was a problem.

xjm’s picture