Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Problem/Motivation
Over in #2494987: [meta-6] Reduce cold cache memory requirements we're tracking fixing instances where we exceed the 64MB memory limit.
I asked if we'd like simpletest to try and help us find such pages.
Proposed resolution
Add a header for peak memory usage when running the child site.
Have the parent runner read it and fail tests when it exceeds 64Mb.
Let tests opt-out.
Beta phase evaluation
Issue category | |
---|---|
Unfrozen changes | Unfrozen because it only changes test code |
Prioritized changes | The main goal of this issue is scalability (preventing memory usage regressions). |
Remaining tasks
Most of it
User interface changes
None
API changes
None
Comment | File | Size | Author |
---|---|---|---|
#171 | 2495411_171.patch | 14.36 KB | Mile23 |
| |||
#167 | 2495411-167.patch | 14.33 KB | lokapujya |
#164 | 2495411-163.patch | 15.88 KB | lokapujya |
#161 | 2495411-3-161.patch | 15.98 KB | alexpott |
#161 | 159-161-interdiff.txt | 837 bytes | alexpott |
Comments
Comment #1
larowlanadapting some existing code from D7
Comment #2
larowlanHere goes
Comment #3
larowlanAnd with the opt-out
Comment #5
tim.plunkettFine to do, but that method name sounds like it is adding the headers itself.
Mismatched name?
Any reason not to use DRUPAL_MINIMUM_PHP_MEMORY_LIMIT? (other than the M)
Just one less place to adjust the number.
wtf why is format_string not @deprecated? either way, SafeMarkup::format please
Comment #6
YesCT CreditAttribution: YesCT commentedComment #8
larowlanFixes #5 1,2,4 as well as php syntax (doh)
3 - cause some tests might want it to be smaller.
Comment #10
larowlannot my day today
Comment #12
larowlanComment #14
larowlanFails:
Drupal\views\Tests\Plugin\DisplayTest
Drupal\views\Tests\ViewExecutableTest
Drupal\user\Tests\Views\HandlerFieldRoleTest
Drupal\simpletest\Tests\SimpleTestBrowserTest
Drupal\simpletest\Tests\SimpleTestTest
Drupal\simpletest\Tests\OtherInstallationProfileTestsTest
Drupal\simpletest\Tests\BrokenSetUpTest
Drupal\simpletest\Tests\MissingCheckedRequirementsTest
Drupal\simpletest\Tests\InstallationProfileModuleTestsTest
Running them locally should get you the urls/conditions.
Comment #15
BerdirLooks like the script was killed at the end, that happens if there's a timeout. The test was running for 1h.
Maybe it was a bot problem, re-testing.
Comment #18
Fabianx CreditAttribution: Fabianx as a volunteer commentedlarowlan++ - Nice work, I just commented on that blog post on planet for memory_profiler that we should be doing that in our tests. You already did :p. Nice!
Comment #19
Fabianx CreditAttribution: Fabianx as a volunteer commentedComment #20
Berdiris the headers_sent() thing really still relevant? ;)
So, the only tests that failed are simpletest tests. I'm wondering if that's really correct but it's quite possible, because I guess we don't have any standard profile tests that submit the module page. Just for testing, can you add that to StandardProfileTest or so? according to the numbers that we saw, that should still happen there right now?
I think it's safe to ignore this for the simpletest test, pretty sure we don't have the requirement to run all tests with 64MB ?
Comment #21
Fabianx CreditAttribution: Fabianx as a volunteer commentedI wonder if it would pass with 128 MB ...
Comment #22
plachCouldn't we merge these two? E.g.
$this->memoryUsageTrigger = FALSE;
could disable the memory usage check.Comment #23
catchTrying 128, might as well bisect this.
Comment #24
BerdirI'm confused, the failure says exactly how much memory it used, no need to try and figure out out? :)
Comment #25
catchYes, yes it does.
Comment #28
catchOK https://qa.drupal.org/pifr/test/1056818 is now just simpletest's own tests of itself.
We're fine having those opt-out of the 64mb check, so I think we should do that then go ahead and commit this.
Comment #29
Fabianx CreditAttribution: Fabianx as a volunteer commentedThe parent being critical, this is major at this stage.
Comment #30
dawehnerDon't we want to call out to memory_get_peak_usage() in case we want to get the peak memory usage ^^ :)
Comment #31
alexpottAddressing #30 and setting back to 64mb
Comment #32
plachTwo things:
- it's still 128M
- I think merging these two properties would be better for DX: IMO it would be easier to use the setting if to disable it we could just use an invalid value, like 0 or -1.
Comment #33
catchUploading a patch with 64M just to get a test run with it.
I do think merging the properties is good - and even memory-hogging tests we could set something like 256M in most cases since going over that is useful information too.
Comment #35
catchRemoved the separate property, and updated simpletest tests.
I get a couple of failures locally that look unrelated, see what the bot says.
Since the bots run with an opcode cache, we should consider bisecting this with lower memory limits a bit and set to the lowest that passes for the default to give a bit of head room.
Comment #36
plachHow is this skipping the check when we have an invalid value? Shouldn't we do something like this?
Minor, we could alse merge the nested if.
Comment #37
catchAddressing both points from #36.
Comment #40
plachFrom https://qa.drupal.org/pifr/test/1070008:
Nice :D
Comment #41
catch#37 was the wrong patch.
That new failure is fun since it was passing a few days ago.
Comment #43
catchOh dear.
Comment #45
catch@alexpott pointed out that InstallUninstall test probably started failing because we switched to memory_get_peak_usage(), which is of course a good change.
I just demoted the critical relating to module installation because a single module install was well under 64M on my system, but I think that test does multiple modules, or it could be PHP 5.4 vs. PHP 5.5.
#2451395: drupal_get_schema()/drupal_get_complete_schema() no longer work as expected; remove them may be enough to get us under. Will post combined patch once that issue has come back green.
Comment #46
catchand/or #2495073: Views feed display plugin has to get all views data on init which was already RTBC but got bot fluked.
Comment #47
catch#2495073: Views feed display plugin has to get all views data on init is in. Sending for re-test.
Comment #49
plachVery minor: not sure this is needed.
Comment #50
xjmThis is great.
Should we add test coverage for the memory check? E.g. test it with a really low limit.
Also, should we file followups for the ones where we are using 128 MB, to remove them when we get the memory usage down (maybe issues exist already in some cases), and then reference those in @todo where we're setting 128 MB?
I have no idea about #49.
Comment #51
alexpottThe 128mb is due to reading all of the test files. There is not really anyway around there other than having less tests.
Adding a low memory test say of a cache page response is out of scope for this issue which is about enforcing our limit.
@plach is correct but there is no harm in the line.
Comment #53
xjm@alexpott I meant test coverage for the functionality added by the patch, which would seem to be in scope for the patch. We're adding a property to tests that is not documented or tested. That should be in scope for this issue, no?
Comment #54
xjm@alexpott and I discussed this in IRC. He is looking into the possibility of adding tests, and also said he found a different problem with the patch (I don't know what).
Comment #55
alexpottI've added assertions to SimpletestTest to ensure we get failures if the limit is exceed. I've also bump the testing that load the Simpletest UI to 140mb because without that I could not get them to run locally.
I'm removing the "need followup" because we can't reduce the memory required for Simpletest UI unless we don't read all the test files to find their description and group.
Comment #56
xjmSo to clarify for the interdiff in #55, I checked with @alexpott in IRC and the 140 MB is not a change for him locally since the previous patch (i.e. not some huge memory regression in HEAD); just a difference between his local and testbot. And obviously we need to let the tests be run locally.
Comment #57
xjmOh, but the comments for these don't match the number now. Hm. Though it is true AFAIK that the simpletest memory requirement is 128 MB (and declared as such in a constant).
Comment #58
alexpott@xjm yeah not sure want to do here. I think we've got a couple of options...
Option 1: the simplest thing is to raise the limit of the testing module - even on the bots when it was 128M it was very close so adding more tests could eventually start breaking this.
Option 2: Perhaps the way to go is to set the limit for these tests to -1 so the limit is not applied and open a followup which profiles on different PHP versions what the Simpletest UI actually takes to display the list. Since testing against drupal's stated memory limit is way more important.
Also another thing I thought - shouldn't we set the default
WebTestBase::$memoryUsageTrigger
toDRUPAL_MINIMUM_PHP_MEMORY_LIMIT
so that all of this is controlled from one place.Doing that will require deprecating DRUPAL_MINIMUM_PHP_MEMORY_LIMIT for Drupal 9.0 and adding
\Drupal::MINIMUM_PHP_MEMORY_LIMIT
which I think is fine.Comment #59
tim.plunkett+1 for Option 2, using -1
+1 for reusing the constant, although one has an M and the other doesn't. (I mentioned this in #5.3 above)
Comment #60
alexpottI've concluded that coupling DRUPAL_MINIMUM_PHP_MEMORY_LIMIT to the default test memory limit is probably a bad idea because we might want to reduce the simpletest default check but leave DRUPAL_MINIMUM_PHP_MEMORY_LIMIT at 64M.
Looking the memory usage of SimpletestTestForm I need 143Mb not in the when not in the test environment so the current value of
SIMPLETEST_MINIMUM_PHP_MEMORY_LIMIT
. It appears that over half the amount is used by form rendering (and mostly not SafeMarkup). Rebuilding the list of testing from a cold cache only costs us about 25Mb. This information makes me lean towards option 2 and either, opening a followup issue to address memory usage of SimpletestTestForm, or, adding more information to the permissions memory usage page because this looks like a similar problem - rendering hundreds of checkboxes is expensive.Comment #61
alexpottImplemented option 2 and improved documentation of the WebTestBase property. Opened #2505877: SimpletestTestForm is very memory expensive to track the issue with
SimpletestTestForm
Comment #62
alexpottFixed a bug in the patch with the reported URL -
$this->getUrl()
cannot work in a curl header callback because$this->url
has not been set yet :) Added test assertions to ensure we don't re-introduce the bug.Also changed
WebTestBase::$memoryUsageTrigger
to accept strings with the format as PHP memory limit because this make conceptual sense to me. Also the header now reports the precise number of bytes which might be more useful in the long if we somehow extract this info from SImpletest and plot it over time.Comment #64
alexpottGenerating the absolute url's in a slightly different way for the test.
Comment #65
dawehnerWhy do you call out to headers_sent() and then add it to the response object? Isn't that kind of a total different thing?
Comment #66
alexpott@dawehner that method is used by the exception listener to - and an exception might occur after the headers sent.
Comment #67
catch100% agreed with this. Ideally we'd want to keep reducing the simpletest limit as far as we can as core makes improvements (it can always be raised if we regress intentionally for some reason). But we'll likely need 64M as the core memory limit for a while due to people running without opcode caches.
Comment #68
almaudoh CreditAttribution: almaudoh commentedDoc block nitpick...
Comment #69
alexpottInteresting english @alexpott :)
Nice spot @almaudoh.
Comment #70
plachMinor: shouldn't @site be something more meaningful, like @usage or @site_usage?
Aside from that looks good again to me :)
Comment #71
alexpott@plach how about this?
Comment #72
plachWay better, thanks :)
Comment #73
BerdirResponse to #60: Last time I checked those page it wasn't so much the checkboxes but just rendering large tables with render arrays.. that results in a ton of render calls and one of the most expensive things there was merging cacheability metadata.. might be worth to check that page in #2471232: Optimize Cache::merge*(), by only accepting 2 instead of N arguments and the related issues about optimizing that.
Comment #74
xjmLooks great now. Only super tiny nitpicks, none of which need to block commit:
Super fussy grammar nitpick: I think this should be something like "Indicates whether headers should be added.: "If" implies that it doesn't return at all if headers should not be added. ;)
Minor: All of these are comma splices. Also, we should probably mention that it's setting no limit. (Same for the rest of the SimpleTest tests.)
Is there any risk of the UID being something other than 2 depending on the testrunner or particular storage etc.? This didn't seem worth blocking the patch on and I assume @alexpott would have thought of that, but asking just in case.
I think we can drop the "Set to" since this is a property.
So here, I'd suggest: this default value. It's confusing on first read.
Minor fixes: parens on hook_requirements() and use the correct "its".
Non-actionable remark: Interesting that this function isn't a part of
Bytes
.I can fix these things when I commit the patch (later this evening), or someone can reroll with them in the meanwhile.
Comment #75
xjmActually holding off on committing now because HEAD is timing out on miscellaneous bots; see #2500067: Repeated HEAD failures on multiple bots with apparent 60 min timeout.
Attached fixes the docs nitpitcks. Leaving at RTBC because docs nitpicks.
Comment #77
alexpottThe fails above are:
Comment #79
xjmDid head regress or is there something wrong with the bots? Is this too fragile?
Comment #80
alexpott@xjm not entirely sure - which is why I documented with tests failed. I was happily surprised originally when ConfigImportAll was not affected by this change.
Comment #81
catchHow likely is it we have a bot with no opcode cache enabled? That could definitely make the difference.
I'm running InstallUninstall test locally - although had to up my max_execution_time to 60 seconds so far since it fails with 30. If I can get it to finish, then I'll see what the memory usage looks like...
Comment #82
catchInstallUninstall and ConfigImportAll both pass on my machine with the patch applied, so I suspect lack of opcode cache.
That would also explain otherwise similar bots taking 60 minutes instead of 30 minutes.
Comment #83
Fabianx CreditAttribution: Fabianx as a volunteer commented#82: Oh, _that_ could indeed be. But why would APC tests pass on those bots then? Or don't we have any tests needing APC(u) anymore?
Comment #84
catchThose tests just don't run at the moment:
Comment #85
alexpottI don't think this is the case. One of the passes here says we have 149 passes in this test - see https://qa.drupal.org/pifr/test/1072383
I think the Amazon infrastructure was choking on IOPs.
Comment #87
Fabianx CreditAttribution: Fabianx as a volunteer commented#85: How does a 21 min run give any indication why the above test-run failed that you documented?
Comment #88
alexpottRe #87 it doesn't - but #2500067: Repeated HEAD failures on multiple bots with apparent 60 min timeout does.
Comment #89
Fabianx CreditAttribution: Fabianx as a volunteer commentedI checked that locally and yes without APC(u) the test chokes ...
Only explanation could be that we trashed APC somehow (e.g. if it just has 32 MB or such in total), re-testing for now.
Comment #92
Fabianx CreditAttribution: Fabianx as a volunteer commented/admin/modules failed now ... Seems that has regressed ...
Comment #111
xjmHoly mothballs.
Comment #114
Fabianx CreditAttribution: Fabianx as a volunteer commentedComment #115
almaudoh CreditAttribution: almaudoh commentedEasy re-roll ;)
Comment #117
xjmHuh, why are the PIFR results so different from the DrupalCI results on #115?
Comment #118
jthorson CreditAttribution: jthorson for Drupal Association commented@xjm: See #2532268: DrupalCI test results show wildly different numbers of passes/fails than PIFT/PIFR ... DrupalCI reports test results, not individual assertion results ... a new paradigm for sure, but it should result in more consistent test counts between individual test runs.
Comment #119
alexpottre #118 that's not the case here - only 1 test class has fails on DrupalCI whereas 6 classes have on testbots-of-yore
Comment #120
MixologicYes, this looks like a case where the new testbots arent actually using enough memory to trigger the fail. Can we add some debug assertions to the patch to see what the actual values are in those tests?
Comment #121
MixologicI wonder if this is related to the fact that the drupalci testbots might be entirely fresh vs the legacy bots that have been around for weeks.
Comment #122
catchThis is probably easiest.
Comment #127
catchComment #129
catchSo 63M looks like what DrupalCi is using. That isn't consistent with PIFR now, but it's very close to what it was when this patch was RTBC in #72 - we were just under at that time.
I'm assuming the PIFR issues are something to do with PHP configuration - although I think we ruled out opcache being full/fragmented already which is the only obvious issue.
However it's also concerning that the SimpletestTest which sets this to 1M doesn't pass - suggests something broken in the running of that test.
Seeing if we can get PIFR to pass at all now.
Comment #131
catchComment #133
alexpottFixed SimpletestTest and I think the differences between DrupalCI and pifr can be explained by DrupalCI being slower and therefore doing less each batch step. We can check memory usage on progressive batches to try to make the environments perform the same.
This also explains why this was working on pifr when it had slower bots.
Comment #135
alexpottLet's go to 60%
Comment #137
catchOr 40%, but I start to wonder if the bigger operations within batches is the only issue.
Comment #141
effulgentsia CreditAttribution: effulgentsia at Acquia commentedRunning a similar experiment as #122: i.e., setting the limit to 32M to compare differences between pifr and DrupalCI.
Comment #143
effulgentsia CreditAttribution: effulgentsia at Acquia commentedRunning these tests locally, I get the same memory numbers for CommentFieldsTest as PIFR (53 MB) when I have opcache disabled (with or without apcu.so enabled) and nearly the same numbers as DCI (I get 30MB whereas DCI shows 32.5MB) when I have opcache enabled (with or without apcu.so enabled). Are we sure that PIFR has a functioning opcache? Or does it only have apcu, which isn't an opcode cache?
Comment #144
MixologicThe PIFR Bots have opcache running and its functional - I ran this from an existing bot, and saved it to a dev site so it could be seen:
https://drupal:drupal@mixologic-ci-drupal.redesign.devdrupal.org/opcache...
Comment #145
catchIs opcode.max_accelerated_files to low?
Is set to 2000.
3,905 scripts are cached.
http://php.net/manual/en/opcache.configuration.php#ini.opcache.max-accel... says the real maximum when set to 2000 will be 3907
This would also explain why the output says it's full, when there is plenty of space.
Comment #146
catchI've opened #2551309: Increase opcache.max_accelerated_files to track this somewhere other than this issue.
Comment #149
catchReviving this now that PIFR is disabled.
Comment #150
catchComment #152
alexpottYou need to change this too... I think we should have a constant so we only have to configure this once... it can't be in Simpletest as that won;t be available in child sites but one constant in ./core won't hurt... maybe on \Drupal
Comment #154
catchGood point. Added the constant.
Comment #157
alexpottFixing the test fail. Also the new constant is only about simpletest and therefore should be named that way.
Comment #158
catchNot sure we should name it simpletest, we could potentially do the same treatment on browser tests. Just TEST something might be fine though.
Comment #159
alexpottHmmm I'm also not sure we're doing the batch limit right - I think we need to be able to send the memory limit to the child site so tests can alter it.
Comment #161
alexpottFixing tests - we still need to make BrowserTestBase tests fail when the memory limit has been exceeded.
Comment #163
lokapujyaComment #164
lokapujyaReroll Attempt.
Comment #166
lokapujyaFail :(
Something to work on at the sprint.
Comment #167
lokapujyaMerging BrowserTestBase conflicts is a little complicated.
Comment #169
Fabianx CreditAttribution: Fabianx as a volunteer commentedToo bad this is not yet in :/.
Comment #170
MixologicComment #171
Mile23Reroll of #167.
Comment #173
Mile23Testing against the wrong core version...