Problem/Motivation

After #3031128: Update from 8.6.7 to 8.6.8 warnings - Drupal\Core\Extension\Extension has no unserializer was committed, test run times went from ~51 minutes to ~1h42 minutes.

Not 100% clear what in that commit caused it, but the callback in fixSerializedExtensionObjects() seems a likely culprit.

Proposed resolution

Perhaps this error handler can be turned off during test runs?

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

tim.plunkett created an issue. See original summary.

tim.plunkett’s picture

dww’s picture

Here's a patch that implements the suggested solution from the summary:

Perhaps this error handler can be turned off during test runs?

Early return in UpdateKernel::fixSerializedExtensionObjects() if drupal_valid_test_ua() returns something TRUEish.

Curious both about the bot result, and the timing. ;)

Status: Needs review » Needs work

The last submitted patch, 3: 3032869-3.patch, failed testing. View results

tim.plunkett’s picture

empty.patch (HEAD)

Unit: 26 sec
Kernel: 7 min 10 sec
Simpletest: 55 sec
Functional: 56 min 36 sec
FunctionalJS:
Total: 71 min 21 sec

test.patch

Unit: 24 sec
Kernel: 6 min 56 sec
Simpletest: 57 sec
Functional: 32 min 52 sec
FunctionalJS: 6 min 6 sec
Total: 47 min 15 sec

#3

Unit: 28 sec
Kernel: 7 min 4 sec
Simpletest: 57 sec
Functional: 32 min 51 sec
FunctionalJS: 5 min 42 sec
Total: 47 min 2 sec

Looks like the one failure was for the test of that code...
The HEAD run wasn't _as_ slow as it's been all day, but this still illustrated the problem

dww’s picture

Yup, nice. Thanks for the timing result summary. Very helpful.

Let's see what @alexpott has to say on how to proceed.

Cheers,
-Derek

alexpott’s picture

Status: Needs work » Needs review
FileSize
1.6 KB

ugh.

I think there is some recursion going on.

Lendude’s picture

First alarms raised in #2946294: Fix race conditions in OffCanvasTestBase by @tacituseu. Please consider @tacituseu for credit here.

Status: Needs review » Needs work

The last submitted patch, 7: 3032869-7.patch, failed testing. View results

tacituseu’s picture

I suspect because of #2474909: Allow Simpletest to use the same APC user cache prefix so that tests can share the classmap and other cache objects and 'apcu_ensure_unique_prefix' the affected tests are basically flushing cache from under all other currently running ones, no idea why this doesn't seem to affect 8.6.x to which it was committed too.

Edit: The behaviour was similar when APCu was doing forced expunges because it was running out of memory (see: #2926309-21: Random fail due to APCu not being able to allocate memory).

alexpott’s picture

Status: Needs work » Needs review
FileSize
1.4 KB

Here's another attempt to ensure we only do one clear.

alexpott’s picture

for what it is worth - Functional testing took 57 min 44 sec for #7 so that didn't help at all.

If #11 doesn't help then it's not the recursion that's the problem it's the clearing of the caches which is quite a big concern.

alexpott’s picture

Thinking about this some more regardless of whether #11 makes things faster we still want the check from #7 because atm there are no other errors at this point there might be in the future and the correct action might not be clearing the cache.

alexpott’s picture

So for #11 Functional Test run duration: 44 min 8 sec
And for #13 Functional Test run duration: 44 min 12 sec

So this is much better. It's still longer but getting back towards where we were. This result points to our update path testing only being as fast as it is because of caches and pseudo caches in state which is quite worrying really.

alexpott’s picture

I suspect the initial 44 min runs are part of the natural variation of aws testbots...

Looks like we have two sizes of instances :)

Imo this fix is good.

alexpott’s picture

+++ b/core/lib/Drupal/Core/Update/UpdateKernel.php
@@ -218,7 +218,18 @@ public static function fixSerializedExtensionObjects(ContainerInterface $contain
+      if ($errstr === 'Class Drupal\Core\Extension\Extension has no unserializer') {
+        $clear_caches = TRUE;
+      }

So one thought I've got is - is the check on the error string too fragile? It's not necessary for the improves in test times.

OTOH being specific here might prevent future surprises.

alexpott’s picture

I re-ran the revert patch a few times. Less variation hmmm.... but maybe a time of day thing. Tricky. Not sure why that'd be. Resubmitted some more tests of #13 to see how that's doing now.

Here's the results of re-run the revert:
https://www.drupal.org/pift-ci-job/1200735 00:34:15.642 Test run duration: 26 min 56 sec
https://www.drupal.org/pift-ci-job/1200736 00:36:07.507 Test run duration: 28 min 40 sec
https://www.drupal.org/pift-ci-job/1200737 00:43:50.140 Test run duration: 33 min 19 sec
https://www.drupal.org/pift-ci-job/1200739 00:34:20.779 Test run duration: 25 min 53 sec
https://www.drupal.org/pift-ci-job/1200740 00:45:58.911 Test run duration: 33 min

tacituseu’s picture

Re #15: they all run on the same instances (c4.8xlarge), some time trends:
PHP 7.2 & MySQL 5.5
php-7.2-apache_mysql5.5-buildTimeTrend.png
PHP 7.2.x-dev & MySQL 5.5 (ignore the spike between #2122-#2160 - it is an anomaly)
php-7.2-apache_mysql5.5-buildTimeTrend.png

alexpott’s picture

https://www.drupal.org/pift-ci-job/1200765 Test run duration: 44 min 30 sec
https://www.drupal.org/pift-ci-job/1200762 Test run duration: 27 min 34 sec
https://www.drupal.org/pift-ci-job/1200763 Test run duration: 29 min 12 sec
https://www.drupal.org/pift-ci-job/1200766 Test run duration: 26 min 29 sec
https://www.drupal.org/pift-ci-job/1200767 Test run duration: 34 min 57 sec
https://www.drupal.org/pift-ci-job/1200789 Test run duration: 43 min 45 sec

alexpott’s picture

#19 is from another round of testing for #13/ #11 - still looks much quicker than HEAD.

It'd be great someone can think about #16. I'm unsure and could go either way.

tim.plunkett’s picture

Status: Needs review » Reviewed & tested by the community

This whole method is already really specific and targeted anyway. Relying on $container->get('state')->get('system.theme.data', []); to be the indicator of a needed cache clear is obscure enough that I don't see a problem with hardcoding the expected error message.

tacituseu’s picture

Going over #15,#17,#19 it seems to be affecting only PHP 7.2 and PHP 7.2-dev MySQL environments.

xjm’s picture

Going over #15,#17,#19 it seems to be affecting only PHP 7.2 and PHP 7.2-dev MySQL environments.

That would explain why we weren't seeing such a problem with 8.6.x -- because the default tests for 8.6.x run against 7.0 and 7.2 is just a nightly.

tacituseu’s picture

@xjm: Not really, I did check PHP 7.2 and PHP 7.2.x-dev with 8.6.x for anomalies and not the default one.

  • xjm committed f633f3b on 8.7.x
    Issue #3032869 by alexpott, tim.plunkett, dww, tacituseu, Mixologic: \...

  • xjm committed 32c2145 on 8.6.x
    Issue #3032869 by alexpott, tim.plunkett, dww, tacituseu, Mixologic: \...
xjm’s picture

Version: 8.7.x-dev » 8.6.x-dev
Status: Reviewed & tested by the community » Fixed

Committed to 8.7.x and cherry-picked to 8.6.x. Thanks!

  • xjm committed a1fdae0 on 8.6.x
    Revert "Issue #3032869 by alexpott, tim.plunkett, dww, tacituseu,...
xjm’s picture

Status: Fixed » Reviewed & tested by the community

This has been temporarily reverted from 8.6.x only.

  • xjm committed f11cc39 on 8.6.x
    Issue #3032869 by alexpott, tim.plunkett, dww, tacituseu, xjm, Mixologic...
xjm’s picture

Status: Reviewed & tested by the community » Fixed

Recommitted to 8.6.x. Thanks!

tacituseu’s picture

alexpott’s picture

@tacituseu are you saying that we've not yet fixed the problem? It looks like you are saying we still have a serious regression in PHP 7.2 test performance. I've just had a test run timeout on another issue where functional tests took 54 min 50 sec.

tacituseu’s picture

@alexpott: It's tricky as it depends how you define the problem, all I'm saying is that there's still some time penalty compared to pre-#3031128: Update from 8.6.7 to 8.6.8 warnings - Drupal\Core\Extension\Extension has no unserializer times (and so far only on two particular environments). I don't know what exactly is the mechanism behind it, didn't dig in yet.

It still only seems to affect 8.7.x and PHP 7.2.* combination and for some reason has no effect on 8.6.x on the exact same PHP 7.2.* environments (or any other combination on 8.7.x and PHP 7 I checked for that matter - however the data points there are scarce).

The PHP 7.2 & MySQL 5.5 environment has PHP 7.2.8 and PHP 7.2.x-dev is PHP 7.2.17-dev, so that might account for some (~10 min) of the difference between the two.

Edit: didn't notice any timeouts.

alexpott’s picture

@tacituseu thinking about this has made me realise that we're clearing the cache all the time during 8.7.x updates without fixing it because we no longer use system.theme.data. Opened a new issue to address this - see #3037823: The system.theme.data key remains corrupted in state causing performance issues and is not used in >= 8.7

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.