Problem
- If a fatal error happens in many test cases after the test case was successfully set up, then the test(bot) environment runs out of disk space.
Solution
- Clear out any created db/file resources when a test runner child process exits with a fatal error.
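The intended behavior can be sketched as follows. This is a minimal Python model with invented names (`reap_child`, the table list); the real cleanup lives in run-tests.sh, which is PHP and talks to an actual database:

```python
def reap_child(exit_status, db_prefix, tables):
    """Model of the added cleanup step: if the child test runner
    exited with a non-zero status (fatal error), remove every table
    carrying the test's database prefix; otherwise leave teardown to
    the normal Simpletest code paths."""
    if exit_status != 0:
        dropped = [t for t in tables if t.startswith(db_prefix)]
        remaining = [t for t in tables if not t.startswith(db_prefix)]
        return dropped, remaining
    return [], list(tables)

# A fatally-exited child left prefixed tables behind; reap them:
dropped, remaining = reap_child(255, "simpletest42", ["simpletest42node", "node"])
```

On a clean exit (`exit_status == 0`) nothing is dropped, mirroring the fact that Simpletest's own teardown already ran.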
Comments
Comment #2
sun: Added the test-specific error.log to the review log output.
I do not see any output from the new cleanup function in the testbot review log for patch #0, even though this works correctly on my local machine. That possibly means that the simpletest_test_id table does not contain a db prefix on the testbot. (?)
Comment #4
sun: Review log for #2 contains the expected cleanup operations after the fatal error.
However, there's a second fatal error in another test case now.
The $test_id appears to be the same for all test runners? How is that possible? For the Simpletest module, it changes for every test case being executed in the batch.
Comment #5
sun: I fail to see how {simpletest_test_id}.last_prefix can work when tests are executed concurrently.
run-tests.sh calls into simpletest_last_test_get() (like the Simpletest module itself), but that function is not designed for concurrent execution in parallel threads.
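The race being described can be modeled like this. A Python sketch with invented names; the real data lives in the {simpletest_test_id} table and is written by Simpletest during test-site setup:

```python
# Simulated {simpletest_test_id} rows: one last_prefix per test_id,
# overwritten every time a test site is installed under that ID.
last_prefix = {}

def record_prefix(test_id, prefix):
    # Roughly what happens when a test site is set up.
    last_prefix[test_id] = prefix

# Two concurrent runners share the SAME test_id:
record_prefix(42, "simpletestA")  # runner 1 installs its test site
record_prefix(42, "simpletestB")  # runner 2 clobbers runner 1's prefix

# Runner 1 now reads runner 2's prefix and can no longer find
# (or clean up) its own tables.
```

With a single shared `test_id`, the last writer wins, which is exactly why cleanup after a fatal error cannot locate the right prefix under concurrency.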
Comment #7
sun: The concurrency problem with last_prefix cannot be fixed without a database schema change, which inherently means it cannot be fixed without adjusting the entire PIFR stack.
The PIFR stack must work across major versions of Drupal, so it would need a conditional, version-agnostic adaptation all over the place.
I've no interest in pursuing that.
So while this patch fixes a tough problem, one that forces @jthorson to manually reset and clean individual testbots every second day, it's not possible to commit the fix.
Comment #8
sun: Alternatively, let's see what happens if we assign a new test_id for every class. This might blow up though.
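The stop-gap idea amounts to giving every test class its own ID so prefix rows no longer collide. A hedged Python sketch (the counter stands in for what would be an auto-increment key on an INSERT into {simpletest_test_id}; class names are made up):

```python
import itertools

_auto_increment = itertools.count(100)  # stand-in for the serial key
last_prefix = {}                        # test_id -> db prefix

def new_test_id():
    """Simulate inserting a fresh row; the auto-increment key
    becomes the unique test ID."""
    return next(_auto_increment)

for cls in ["NodeTest", "UserTest", "CommentTest"]:
    tid = new_test_id()
    last_prefix[tid] = "simpletest%d" % tid

# Every class got a distinct ID and therefore its own prefix row,
# so concurrent runners no longer overwrite each other.
```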
Comment #10
sun: Looks like that stop-gap fix worked. Not sure about the PIFR implications, but the attached patch completes the fix and properly removes the bogus assumption that there's only one $test_id.
Comment #11
sun
Comment #12
sun: oink. Now the review log contains the additional cleanup debug messages for each test case:
That should only be output in the case of a fatal error. Need to think about how to fix this.
Comment #13
sun: The attached patch simplifies the changes and should resolve the verbose debug output issue for passing tests.
Comment #15
sun: So #13 actually works as intended.
However, as of now, DrupalUnitTestCase only sets and changes the database prefix, but does not update it correctly in Simpletest's test_id table. The patch in #1563620: All unit tests blow up with a fatal error fixes that.
Comment #16
catch: So #13 exposes the unit tests bug that's been hidden for months?
Comment #17
Berdir: Separate issue, but is there a way to extract the actual error and display it somehow? It can currently be very hard to see what the actual error is.
I'm not sure what exactly the state is here: does this need to go in first, or the linked issue? Or doesn't it matter? In any case, this certainly needs a version of the patch without the forced fatal error in node.test.
Comment #18
sun: The actual error is supposed to be captured and inserted into SimpleTest's assertion table. In the case of fatal errors, the actual error is read from the PHP error.log, and this happens after the fact (and is only possible in run-tests.sh, since simpletest.module runs in the same process as the test).
Note that this issue/change is about web/integration tests, not unit tests.
In order to be able to clean up the test environment after a test run with fatal error(s), the test runner needs to know the database prefix that was used to set up the test.
Simpletest normally records the database prefix for each test being run. Simpletest itself attempts to clean up the test environment after each test run. In case of a fatal error, those Simpletest functions are no longer executed.
The run-tests.sh script currently does not contain any cleanup code. This patch adds essentially the same cleanup operations to run-tests.sh, executed whenever a test runner completes.
However, run-tests.sh supports concurrently executing multiple test runner threads in sub-processes, a notion that was only added to run-tests.sh but not to Simpletest. Right now, run-tests.sh uses the same test ID for all tests being executed. This means that, under concurrency, run-tests.sh is not able to retrieve the database prefix that was used to set up the test site, because the concurrently executed test runners overwrite the recorded database prefix. Therefore, this patch changes run-tests.sh to use a separate, unique test ID for each test runner, so each runner can retrieve the database prefix and, in turn, clean up the environment after test execution.
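That mechanism can be modeled end to end. A Python sketch with invented names (`start_runner`, `cleanup_runner`); the actual change is in run-tests.sh and uses real database rows:

```python
import itertools

_ids = itertools.count(1)
test_id_table = {}  # one row per runner: test_id -> db prefix

def start_runner():
    """Each test runner inserts its own row and receives a unique
    test ID, instead of all runners sharing a single ID."""
    tid = next(_ids)
    test_id_table[tid] = "simpletest%d" % tid
    return tid

def cleanup_runner(tid, tables):
    """After the child exits (normally or fatally), look up the
    prefix under the runner's OWN test ID and drop matching tables."""
    prefix = test_id_table[tid]
    return [t for t in tables if not t.startswith(prefix)]

# Two concurrent runners no longer clobber each other's prefix:
a, b = start_runner(), start_runner()
tables = [test_id_table[a] + "node", test_id_table[b] + "users"]
tables = cleanup_runner(a, tables)  # only runner b's table remains
```

Because every runner keys its prefix under a distinct ID, the cleanup step always resolves the correct prefix, regardless of which runner finished last.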
This also implies a functional change -- the configuration option that controls whether the test environment should be cleaned after a test run is ignored.
Unit tests currently do not properly set up a database prefix, even though they are supposed to record errors and fatal errors in the same way as web tests. That's why this patch depends on #1563620: All unit tests blow up with a fatal error.
Comment #19
sun: Note: We're currently testing these patches in #1591812: Sun's Simpletest patches.
Comment #20
sun: Technically, this should work correctly now.
Comment #22
sun: Excellent.
And now, let's try to kill a bot.
Comment #24
sun: No improvement. :( Sorry.
http://qa.drupal.org/pifr/test/135299 is stuck with another, subsequent test now.
The review log still contains MySQL data file I/O errors. I don't know why.
Technically, the concurrency fix for run-tests.sh worked. But at some point, MySQL suddenly starts with I/O errors on data files.
I wonder whether the testbot client itself attempts to perform additional clean-up operations on its own, which might conflict?
EDIT: The only actual improvement is that the patch/test didn't come back as "green" (with 0 passes).
Comment #25
Berdir: That does sound like a full file system (RAM, in our case) issue...
Comment #26
sun: Right. However, that's exactly what this patch tries to prevent.
Comment #27
Berdir: Well, I think it's lying, because it's not actually removing the tables.
I applied the latest patch and started executing all tests with a concurrency of 4, and my table list kept growing and growing despite it telling me that it removed the tables. I ran some SHOW TABLES LIKE 'prefix%'; queries, and the exact number of tables that it claimed to have removed were still there.
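The verification step here amounts to counting leftover prefixed tables after cleanup claimed success. A Python stand-in for `SHOW TABLES LIKE 'prefix%'` (all names and data invented for illustration):

```python
def leftover_tables(all_tables, prefix):
    """Stand-in for `SHOW TABLES LIKE 'prefix%'`: list tables that
    still carry the test prefix after a cleanup pass."""
    return [t for t in all_tables if t.startswith(prefix)]

claimed_removed = ["simpletest99node", "simpletest99users"]
tables_after_run = ["node", "simpletest99node", "simpletest99users"]

# Cleanup claimed to have dropped these, yet they are still present:
stragglers = leftover_tables(tables_after_run, "simpletest99")
```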
Comment #28
Berdir: Ah, I know why.
Guess what's wrong here ;)
Comment #29
jthorson: Nice catch!
Let's try this one. (Same patch, without the 's'.)
Comment #31
jthorson: Nice ... expected failure, many fatals, but no PDO exceptions or "bot-killing" /tmpfs errors this time! That's easily the most exciting 'failed' test I've seen in a LONG time! :)
Comment #32
jthorson: This version strips out the entity_create('foo') and the commented code inside of index.php.
EDIT: Half of a patch which shouldn't have been here in the first place!
I believe that should be enough to run green (and thus represent the patch for actual commit).
Comment #34
jthorson: Doh ... looks like a previous patch snuck into #29.
Here are two patches ... one with the test-bot-killing fatals (#22 with the extra 's' removed), and one without (which should be the 'green' patch for application).
Comment #36
jthorson: Odd ... the without-fatal test failed the first time through, with PDO errors on an image-related test. The re-test run came back clean.
Comment #37
sun: To make sure we're not introducing random test failures, I'm sending this patch for re-testing a few more times... (already done twice)
Comment #38
sun: #34: drupal8.run-tests-fatal-clean.34.without-fatal.patch queued for re-testing.
Comment #39
Dries: I reviewed this patch and it looks good. I'm going to mark it RTBC and give it a bit more time before getting this committed.
Comment #40
Dries: Committed to 8.x. Thanks.
Comment #41
Berdir: Looking at http://qa.drupal.org/pifr/test/279838, which has some fatal errors but also passing tests: the testbot no longer lists which tests failed and which didn't, so you have to go through the detail log to find out what exactly happened. The nice thing is that you actually can do that and see a PHP fatal error there, but there is no nice overview anymore.
Can we do anything anywhere (run-tests.sh, testbot, ...) to improve that?
Comment #42
catch
Comment #43
jthorson: Item in #41 opened as #1615232: Have PIFR provide a results summary even for failed tests.
Comment #44
aspilicious: Patch and a diff to prove the patch is identical to the D8 version.
Comment #45
sun: heh. I just wanted to cry and ask why and how on earth #44 was able to pass, but "fortunately", only the reported test result is misleading: the review log contains fatal errors.
That is because this patch depends on the changes in #1563620: All unit tests blow up with a fatal error, so postponing on that :)
Comment #46
sun: The primary dependency landed: #1541958: Split setUp() into specific sub-methods
The second dependency is: #1563620: All unit tests blow up with a fatal error
After that, we can backport this one (this patch being the most straightforward of the chain).
Comment #47
sun: The primary purpose of backporting these changes was to retain a comparable testing framework between D8 and D7, so as to make future backports easier.
With the D8 feature freeze approaching and the simpletest frameworks of D8 and D7 heavily diverged, it's close to impossible to retain "similarity" between major core versions, and doing so involves a huge risk of breaking things badly in D7.
Therefore, I'm calling out the end to testing framework backports.
For D7, you get to work with the current mess. At the very least, that's "predictable."
Needless to say, actual bug fixes are excluded from that. However, given that there are 30,000+ test assertions that pass against D7, people will have to make pretty good arguments and show solid proof that there is actually a bug somewhere. ;)
Let's focus on the future. (Which isn't Simpletest either way.)
Thanks everyone for working on this and making Drupal awesome! :)