Problem/Motivation
Drupal\field\Tests\Update\FieldUpdateTest and Drupal\views\Tests\Update\EntityViewsDataUpdateTest fail on randomly on 8.1.x / PHP 5.5 and Drupal\comment\Tests\CommentFieldsTest::testCommentInstallAfterContentModule() fails consistently on 8.1.x PHP 5.6. We though we had fixed the latter with #2757427: Array was modified by the user comparison function in ConfigDependencyManager::getDependentEntities() - but it hasn't :(
All of these tests are fixed if core/lib/Drupal/Core/Config/Entity/ConfigDependencyManager.php is changed in anyway - for example just adding a space to comment!
Drupal\comment\Tests\CommentFieldsTest::testCommentInstallAfterContentModule() fails with
Array was modified by the user comparison function in ConfigDependencyManager::getDependentEntities()
Proposed resolution
No idea
Remaining tasks
User interface changes
N/a
API changes
Data model changes
Comment | File | Size | Author |
---|---|---|---|
#72 | 2762549-4-72.patch | 3.22 KB | alexpott |
#72 | 62-72-interdiff.txt | 3.65 KB | alexpott |
#62 | 2762549-62.patch | 2.75 KB | Fabianx |
#60 | 2762549-60.patch | 3.25 KB | Fabianx |
| |||
#58 | 2762549-58.patch | 3.25 KB | Fabianx |
Comments
Comment #2
alexpottComment #3
alexpottComment #4
alexpottComment #5
alexpottComment #6
alexpottSo what I think it happening is a that a PHP notice is being thrown inside of the uasort... specifically inside
SortArray::sortByKeyString($a, $b, 'name');
Comment #7
alexpottComment #8
alexpottThe fact this doesn't happen on 8.2.x is super confusing.
Comment #9
alexpottComment #10
alexpottComment #11
alexpottLet's see what happens if this is a closure - suggested by @dawehner
Comment #12
alexpottComment #13
alexpottOk making the sort callbacks static and running full tests.
Comment #14
dawehnerWe don't understand why, but basically all versions work, beside the one you would think is the best solution. Maybe should keep that in mind for all our search functions all over the place.
Comment #15
catchCommitted/pushed to 8.2.x and cherry-picked to 8.1.x. Thanks!
Comment #18
alexpottOkay this is weird.. still failing - I think just patching \Drupal\Core\Config\Entity\ConfigDependencyManager fixes the test run - so this looks like something caused by something in the DrupalCI environment... very very odd.
Comment #19
alexpottComment #20
alexpottLet's disable opcache and see...
Comment #21
alexpottI set up DrupalCI to run locally and ran just this test on php 5.6 with mysql 5.5...
So even using the same docker images I can't reproduce the issue - that points to something underlying on the infrastructure DrupalCI is deployed on.
Comment #22
MixologicOkay, I just got this all setup and was able to reproduce it on a bot, but I was *also* able to reproduce it locally with drupalci.
This looks like we're randomly hitting a php bug, not sure why its only manifesting for 5.6, but it sounds exactly like this: http://stackoverflow.com/questions/3235387/usort-array-was-modified-by-t...
As a test I went ahead and put in @uasort on the angry sort in question, and the tests passed. ugly, for sure, and I have no explanation as to why its *only* php5.6 and *only* 8.1.x, but all of that points to bugs in phpland.
Comment #23
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commented#22: PHP 5.6 has a lot of weird bugs here and there.
I also have seen at least three opcode related bugs.
#22: Can you try turning off opcache completely (e.g. not load the module) for your local env and see if that makes any difference?
Comment #24
MixologicI shut off opcache and the error persists, then I shut of apcu, and the error goes away (though there are a couple of other errors that werent there before, but not the same uasort error)
Comment #25
catchDirect link to the PHP bug from that stackoverflow discussion: https://bugs.php.net/bug.php?id=50688
Comment #26
alexpott@Mixologic can you try upping the APCu memory - also can you paste the new errors you got with APCu disabled?
Comment #27
alexpott@Mixologic also can you output your local DrupalCI config so I can see how you reproduced the error?
Comment #28
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commented@all: So the error seems to be when we have reference counting going up due to something allocating memory for it.
If we can give laruence a box with login access and gdb for the test runner we can likely get him to do a quick check for us.
Comment #29
MixologicIm using the vagrant box to run this, so right after I vagrant ssh I execute the following script to get everything setup to run:
After each run I clean everything up with
./drupalci docker-rm containers; sudo rm -rf /var/lib/drupalci/database/* ; sudo rm -rf /var/lib/drupalci/web/jenktest
To look at the results after a drupalci run, I then do:
sqlite3 /var/lib/drupalci/web/jenktest/results/simpletest.sqlite
and
select * from simpletest where status NOT IN ('pass','debug');
To get into the php container so I can adjust settings, I do a
docker ps
Which gives me the web-5.6 container name:
Then I
docker exec -it naughty_stallman /bin/bash
to get into the container.
I can re-run the command inside the container using the same execution command:
cd /var/www/html && sudo -u www-data php /var/www/html/core/scripts/run-tests.sh --url http://localhost/checkout --dburl mysql://drupaltestbot:drupaltestbotpw@drupaltestbot-db-mysql-5-5/jenkins_default_newtest --verbose --color --keep-results --color --concurrency 31 --sqlite /var/www/html/results/simpletest.sqlite --php /opt/phpenv/shims/php --types "Simpletest,PHPUnit-Unit,PHPUnit-Kernel,PHPUnit-Functional" --all
Inside the container the php.ini file is located at
/opt/phpenv/versions/5.6.7/etc/php.ini
The very first line is the apcu.so include, so disabling that, and looking at the test run database (which is now located at
sqlite3 /var/www/html/results/simpletest.sqlite
because we're inside the container) and running antotherselect * from simpletest where status NOT IN ('pass','debug');
Gives us the following:
I upped the apcu memory (located at
/opt/phpenv/versions/5.6.7/etc/conf.d/apcu.ini
) from 2000M to 3000M with apcu enabled and it still gives the same uasort error.Hopefully that's helpful?
Comment #30
alexpottIt appears this is also causing a random fail in
Drupal\field\Tests\Update\FieldUpdateTest
see #2749955-46: Random fails in UpdatePathTestBase testsComment #31
alexpottComment #32
alexpottFleshing out the issue summary to give all known details.
Comment #33
alexpottComment #34
alexpottComment #35
alexpottComment #36
alexpottComment #38
alexpottComment #40
alexpott#38 was meant to be just changing the comment in ConfigDependencyManager... here's the right patch.
Note this patch has failed on php 5.5 / sqlite - see https://www.drupal.org/pift-ci-job/390313
Comment #41
alexpottNow that @catch has committed the unique apc prefix patch re-uploading the fail patch...
Comment #43
MixologicSo I spent a ton of time digging on this today.
Here's what I tried, and what happened:
Hoping that they were segfaults caused by stack overflows, I tried setting the ulimit to something much higher than 8192 - 102400. That didnt seem to work, but I wasnt sure if the way I was setting it was getting through the phpenv->php->phpchild process, so I retried on a bot that was using one of the 'official' php 5.5.23 containers (not based off of phpenv). Still had the same issues.
So, on a hunch I upgraded apcu to 4.0.11, and lo and behold, every single test failed in exactly the same way. I tried both HEAD and the commit before that and both had identical failures.
So I think there may be buggy issues with the beta version of apcu we're using, that may somehow be masking the issues we're trying to sort out.
So, what probably needs to happen is to upgrade the containers to use the 4.0.11 version of apcu, and have another environment set up with that because I have a bad feeling that there might be many, many more failures than just this one test. That way we can make it a "targeted supported environment" and keep the current flaky one as our canonical for the time being.
Thoughts?
Comment #44
alexpottI think this is extremely tricky... @Mixologic have you managed to put APCu instrumentation on a bot to see what's now going on during a test run?
Comment #45
dawehner@Mixologic
It would be especially interesting to see APC fragmentation over time, does that peak somehow right before the crashes?
Comment #46
MixologicOkay, some more data. Running the patch from #41 against current HEAD, and everything passes.
Upgrading apcu from 4.0.7 beta to 4.0.11 stable yields the following for every test:
Drupal\field\Tests\Update\FieldUpdateTest 463 passes 78 fails 8 exceptions
Drupal\field\Tests\Update\FieldUpdateTest 463 passes 78 fails 8 exceptions
Drupal\field\Tests\Update\FieldUpdateTest 463 passes 78 fails 8 exceptions
Drupal\field\Tests\Update\FieldUpdateTest 463 passes 78 fails 8 exceptions
Drupal\field\Tests\Update\FieldUpdateTest 463 passes 78 fails 8 exceptions
Drupal\field\Tests\Update\FieldUpdateTest 463 passes 78 fails 8 exceptions
Drupal\field\Tests\Update\FieldUpdateTest 463 passes 78 fails 8 exceptions
Drupal\field\Tests\Update\FieldUpdateTest 463 passes 78 fails 8 exceptions
etc X100.
The failure messages from the resultant sqlite database can be seen here in this gist:
https://gist.github.com/ryanaslett/d7e524f657156607f0b269343a501685
APCu fragmentation and the like might help explain random failures, but this seems like we've got legitimate bugs in the code that is being masked by the fact that we're on an older APCu.
Comment #47
xjmDid #2754477: Unexpected config entity delete due to dependency calculations introduce the non-random failure on PHP 5.6?
Comment #48
alexpottLet's confirm that a test only patch can cause it...
Comment #49
alexpottSo let's try the space patch again...
Edit: YAY!!! Putting a space in a file doesn't fix. That helps sanity.
Comment #50
alexpottSo maybe what we have here is a memory issue. Since #2754477: Unexpected config entity delete due to dependency calculations, Features had the following problem: #2765911: Features performance issue.
Comment #51
alexpottComment #54
xjm@effulgentsia, @catch, @Cottse, @webchick, @alexpott and I agreed this is critical, because it causes test failures on a supported environment.
The failure of
Drupal\comment\Tests\CommentFieldsTest
is actually non-random; it always happens on PHP 5.6.Comment #55
effulgentsia CreditAttribution: effulgentsia at Acquia commentedDoes this mean it's failing on HEAD in 8.2 and 8.3 as well? #2657684: Refactor BigPipe internals to allow a contrib module to extend BigPipe with the ability to stream anonymous responses and prime Page Cache for subsequent visits is encountering that same failure, I think, but if it's also happening in HEAD, then maybe not that patch's fault?
Comment #56
xjm@effulgentsia, yes, you can see this by visiting:
http://drupal.org/node/3060/qa
https://www.drupal.org/pift-ci-job/442327
You can also subscribe via email there to be notified when HEAD fails. I get emails about CommentFieldTest every single day. :)
Previously, the BigPipe issue was showing different failures that are not this one, or so I understood.
Comment #57
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedNeeds some more work.
I think we should just use array_multisort() instead of uasort() as its way faster and does not have this bug and also stable sorting by default.
Comment #58
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedLets try this.
--
Credit goes to donquixote for his research on #2757207: Introduce "stable sort by weight" functions to sort arrays (and objects) and #2759479: Proof of concept: Replace uasort() with dedicated stable-sort-by-weight functions, where possible, which was very helpful in converting this, even though for this issue I did go with the simpler approach from the first comment on the array_multisort() documentation (http://php.net/manual/de/function.array-multisort.php).
Comment #60
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedIt helps to name variables properly ...
Comment #61
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedComment #62
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedOkay, that works nicely. Lets remove to test only this one test and test them all.
This should be ready for commit, except for Documentation:
- I have no idea how to properly document variable function args (and no time to look right now)
- I don't know right now how we deprecate things for 9.0.0 (i.e. @deprecated or not)
I leave the rest to someone else as the general patch is ready.
Comment #64
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedOkay, it only solves the Drupal\comment\Tests\CommentFieldsTest, the others are failing randomly still (as seen).
Comment #65
Mile23Shouldn't this be on the
Graph
class? Then you can say$graph = $this->getGraph()->sortByKeys($etc);
https://www.drupal.org/coding-standards/docs#types doesn't offer any guidance on arbitrary args to a method.
"@deprecated in Drupal 8.1.x for removal before Drupal 9.0.0 release. Use [whatever] instead."
Edit: Actually I'm not sure you need to deprecate them. They're already
protected
andConfigDependencyManager
doesn't seem to have any subclasses.Also, I'm not really sure why you marked this issue as novice... :-)
Comment #67
xjmYeah not novice. :)
So I think the
...
is correct (we use it elsewhere in core) but it probably needs at leastmixed
as the type.However, it's unclear to me from the docs here which args it is to
array_multisort()
. Is the first parameter still a different array than$graph
? Should we just document the structure we expect a little more explicitly?Basically, I kinda have no idea how to use this method reading the docs or the contents. :P
Comment #68
xjmE.g., something like:
etc.
However is it actually the same arguments as
array_multisort()
? There's a string in there too that I think needs to be documented. And would we accept multiple arrays asarray_multisort()
does, or just one plus flags?Comment #69
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commented#65:
1. The graph is no class, but an array that is returned ...
Also this is a critical and such changes should be as less invasive as possible.
2. Thanks!
3. Thanks! I do think we keep BC even for "protected" methods, because you could subclass them in contrib.
#68: Array multisort is very hard to understand, because it allows arbitrary flags in arbitrary order.
We can also use a different syntax, e.g.:
and instead of array_multisort() we could assert() that for each key there are exactly 2 arguments given.
Of course we could also get fancy and do:
However I think the previous one is actually easier to read and I don't think it hurts to always specify SORT_ASC, SORT_GENERIC instead of some magic the defaults are SORT_ASC and SORT_GENERIC, so you can skip them ...
Let me know what you think!
Thanks!
Comment #71
alexpottThis looks really good to me. The sorts are now quite well tested. If the new method supports all the ways you can use array_multisort then great. If not let's just introduce two methods and hardcode the arguments. That way there is less API surface area. In fact this might be a better approach because Features had a performance issue with the sorting and having less possible sorts makes it easier to statically cache - see #2765911: Features performance issue.
Not quite the right standard. Something like:
@deprecated Will be removed before Drupal 9.0.0 use Blah::sortGraphByKeys instead.
Comment #72
alexpottHere's a patch where we don't have to support all the vagaries of array_multisort().
Comment #73
alexpottComment #74
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedRTBC - ultimatively I would like an easier to understand API, but this is absolutely sufficient for this issue.
Comment #75
catchCommitted/pushed to 8.3.x and cherry-picked to 8.2.x. Thanks!