Problem/Motivation
While working with symfony_mailer_test, I noticed random test failures, where an expected email was not received.
E.g. I would run a test 10 times, and randomly around 3 out of 10 test runs would fail.
I created a custom log file to track the calls to:
MailerTestService::postSend(), from the web request.
MailerTestService::destruct(), from the web request.
MailerTestTrait::init(), from a test class.
I found that the order of these calls was not consistent between test runs.
Sometimes the ::destruct() in the web request would run _after_ the ::init() in the test class.
Successful order:
Drupal\symfony_mailer_test\MailerTestService::postSend()
Drupal\symfony_mailer_test\MailerTestService::destruct() - 1 emails
Drupal\symfony_mailer_test\MailerTestTrait::init()
Failing order:
Drupal\symfony_mailer_test\MailerTestService::postSend()
Drupal\symfony_mailer_test\MailerTestTrait::init()
Drupal\symfony_mailer_test\MailerTestService::destruct() - 1 emails
Steps to reproduce
To artificially force the bug to reproduce:
(I did not actually try this yet)
- Create a functional test that uses symfony_mailer_test to check for emails.
- Run the test.
- Insert "usleep(200000);" at the start of MailerTestService::destruct().
- Run the test again.
In the first run, the test will likely be successful.
In the second run, with the usleep(), perhaps not.
I suppose there is a range for the usleep number where the order becomes unpredictable, so that the test will randomly fail at ~50%.
I suspect it can also depend on background activity on the machine where the test runs.
Proposed resolution
Don't wait for MailerTestService::destruct(), write to the state directly in ::postSend().
This said, I would be curious about the conditions that cause a delayed ::destruct() call after a response is already sent.
Remaining tasks
User interface changes
API changes
Data model changes
Possible follow-up tasks
Actually to me the state feels a bit fragile.
It can fail if emails are sent in parallel requests.
Normally this won't happen in a test, unless emails are sent in asset or ajax requests.
A cleaner alternative would be a dedicated db table with one record per email, which can behave like a queue.
But for the scope of this issue it is enough to drop the ::destruct().
Issue fork symfony_mailer-3463648
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
donquixote commentedComment #3
donquixote commentedComment #4
donquixote commentedActually...
this is not a ::__destruct() method, but a ::destruct(), which is called from $kernel->terminate().
We see this in index.php:
So this explains the order :)
Comment #5
donquixote commentedAlso, the current solution with using state assumes that all emails are sent during a single request.
If there are multiple requests that happen before the test can collect the emails, e.g. as a batch process, and each of them is sending emails, then there will be a problem in the __construct() which expects the state collected mails to be empty.
A database table with one row per mail and auto increment would solve this.
There can be a counter in the test trait that contains the last read auto increment id.
In the ::setUp() method the table can be truncated, and the auto increment reset, OR the internal counter variable/property can be set to the current auto increment counter from the db.
We also want this to work with ExistingSite tests, where we cannot rely on the module being freshly installed each time.
Comment #8
donquixote commentedThis is interesting.
The "FAIL" merge request does not actually fail the phpunit test.
But in my local env (with ddev) I do get test failures, even without the usleep().
One thing I suspect is that usleep()/sleep() does not really work in the pipeline.
Comment #9
donquixote commentedIn my local I get these failures with the FAIL branch:
With the 1.x branch I get 1 failure:
With the fix branch, all tests pass.
Comment #12
adamps commentedThanks.
I believe the current code was written for efficiency with a test that sent 1000s of emails. I guess the new code would be slower in that case?? But core TestMailCollector also rewrites the state for every mail, so it probably doesn't matter much.
Please can you raise a follow on issue for the other problems you spotted?