Problem/Motivation
Current implementation of Drupal\Core\Mail\MailManager::doMail() doesn't produce valid From/Sender/Reply-to header field if ()<>[]:;@\,." characters are used in a ASCII string.
On #2717965: Site name is not UTF-8 encoded in email headers we made sure non-ASCII characters are encoded, together with any eventual special characters, by using Unicode::mimeHeaderEncode() helper. But if the string contains only ASCII then specials are left in the string and the mail syntax is not RFC-2822 compliant. See #12 for a full explanation of the problem.
Original Issue Summary:
According to https://www.drupal.org/project/drupal/issues/2717965 - the site name was not properly UTF-8 encoded, and so messages going out from our contact form were being systematically rejected by Google (our email provider).
However, we don't have "special" characters in our site name, but rather, a comma. Our site name is "Redfin Solutions, LLC" (as that's our legal name), but this is rejected by Google with "Messages with multiple addresses in From: 550 5.7.1 header are not accepted."
I can confirm that removing the ", LLC" from our site name allows the messages to be sent.
Proposed resolution
#12 outlines the problem and suggests as possible solution transforming the string in "quoted-string" by quoting the text with double quotes and escaping '"' and '\' accordingly.
Remaining tasks
Identify and document the issue(see #12)Produce a failing test(see #14 and #17)Work on a patch(see #17)- Validate the solution
- Write a Change Record if required.
- Review, Commit.
User interface changes
No UI changes.
API changes
A new Drupal\Component\Utility\Mail component utility class is added. Potentially we can implement other helpers in the future, but currently it just implements a formatDisplayName() static method to format RFC-2822 compliant "display-name" components.
Modules providing mail handling can benefit from this helper.
Data model changes
No changes.
| Comment | File | Size | Author |
|---|---|---|---|
| #39 | 2936032-39.patch | 9.03 KB | gambry |
| #39 | interdiff-34-39.txt | 1.16 KB | gambry |
| #17 | 2936032-17-test-only.patch | 4.74 KB | gambry |
Comments
Comment #2
cilefen commentedIt helps in a case like this to relate the new issue from the older issue, so as to draw attention. I've just done that.
Did #2717965: Site name is not UTF-8 encoded in email headers actually break this for you?
Comment #3
liam morlandDid it used to work? You could use
git bisectto figure out which commit broke it.Comment #4
chrisfromredfinNo, I don't think it has ever worked. We recently discovered that our contact form wasn't sending and went digging as to why. I suspected the comma once I found the mail logs error from Google. But then I got optimistic when I found #2717965 so I upgraded from 8.4.3 to 8.4.4; however, it was still not working.
That is, I *think* that the fix in the related issue is a good fix (was based on upper-ASCII type characters, like e-accent), but may be incomplete?
Comment #5
liam morlandIf a site name has accented characters, it will get a encoded, which would take care of the comma problem. A patch that triggers encoding for a wider range of characters would fix it. It could probably also be fixed by putting the site name in quotes in the email.
Comment #6
chrisfromredfinThat sounds like the right thing, since typically RFC-compliant emails would look like
"Redfin Solutions, LLC" <411@redfinsolutions.com>Comment #7
ibullockI'm having a similar issue, but in my case the character seeming to cause issues is a single quote.
Comment #8
gambryIn my case the issue was with the From header being
Foo's Bar. That's because my site name is "Foo's Bar" and - in my case through Webform email handler - this is encoded and then processed.To be honest I stopped reading the RFC 822, but my understanding is either the RCF822 doesn't allow or Gmail doesn't like
,,&,#and/or;in (From: only?) header - and there be more - unless wrapped in double quotes OR as UTF-8 mime-encoded string.mimeHeaderEncode()doesn't have any effect on processing a string not matching the conditionif (preg_match('/[^\x20-\x7E]/', $string)) {}, so the string is returned as it is.The best solution would be to use
\Drupal\Component\Utility\Html::decodeEntities()right before the condition as well as wrapping the From name bit in double-quotes. Attached a patch with both of them.Of course just wrapping the string with double-quote may solve the issue, but they you may have disgusting things in the email like From: Foo's Bar
Thoughts?
PS: this patch affects and tries to fix the issue in Core, however in my case as well as the other people above the problem seems to be somewhere else (webform for me). All those cases may need to be fixed in a similar way BUT separately, depending by the outcome of this issue.
Comment #9
gambryForgot to add:
- if the string is wrapped with
"we must escape them.- if the string is wrapped with
"I'm not sure mime-encoded text is interpreted correctly.I haven't added these on the patch as that is just a PoC to start the conversation from.
Comment #10
gambryI found this more readable explanation of RFC-822.
and
Comment #12
gambryTL;DR: Current implementation of
Drupal\Core\Mail\MailManager::doMail()doesn't produce valid From/Sender/Reply-to header field if()<>[]:;@\,.characters are used in a ASCII string.The full story
RFC 822 and RFC 2822 define as Originator fields (RFC-2822 section 3.6.2):
Those are "From", "Sender" and "Reply-to" header fields. The label are called "field-name", and after the ":" (colon) we have the "field-body" which for originator fields it can be:
What
mailbox,mailbox-listandaddress-listare is defined in RFC-2822 section 3.4. You can read the whole specification, but in summary:So for example "From" header field can contain comma separated instances of:
- test@example.org (
adds-spec)- <test@example.org> (
angle-addr)- Test <test@example.org> (
display-nameadds-spec)In the last example, Test is the "display-name" component of our "From" field body and can only be a
phraseand it must follow its syntax rules.The phrase, word and atom components
RFC-2822 from section 3.2.4 to section 3.2.6 tells us a
phrasecan be made of 1 or more white-space separated words, and awordis either anatomor aquoted-string.An
atomcan contain any ASCII alpha-numeric characters as well as!#$%&'*+-/=?^_`{|}~.So basically it can NOT contain what the RFC defines as
specials:()<>[]:;@\,.and the double-quote character.Worth mentioning an
atomcan not contain space and CTRL too, which if existing inside it the parser will simply read the spaced atom as a phrase of multiple atoms. So we can say they are not allowed but can be used. However the result may be unexpected (i.e. if using "Q"encoded-wordwith spaces, see below).As an
atomcan not containspecials- i.e. a "," (comma) - the RFC let us use those within awordif represented asquoted-string, which is a string wrapped with double-quotes:The only limitation with
quoted-stringis double-quote, "\" and CTRL can be used if escaped with \", \\ and \[CTRL] specifically.What if our header field-body string has non-ASCII characters
RFC-2822 allows only ASCII text in headers, but RFC 2047 (revision of the original RFC-1342) helps us defining the rules for Non-ASCII Mail headers by introducing Encodings:
And it continues:
[an "encoded-word" can be used] As a replacement for a "word" entity within a "phrase", for example,
one that precedes an address in a From, To, or Cc header.
So a
phrasecan now be a space/CTRL delimited list of words OR encoded-words.Let's come back to our issue
If the
display-namecomponent of a From: header (and similar):- has only ASCII characters and no
specials, can be left as it is.- has only ASCII characters including any
specialsshould be transformed to either aquoted-string(preferred) or a "Q"encoded-word(strongly discouraged, duespecialsstill not allowed!).- has non-ASCII characters, must be encoded either as a single encoded-word or multiple ones, whatever is easier/better. As we "B" encode (Base64 encoding) the string, we don't have problems if the string contains
specials, as they will be safely encoded.Current implementation of
Drupal\Core\Mail\MailManager::doMail()covers perfectly last scenario, but leaves us uncovered with the first two. That explains the bug on this issue: if thedisplay-namewith ASCII characters includesspecialsGMail - and maybe more provider - refuse processing the email due malformed header.Comment #13
gambryI'm going to work on this.
Comment #14
gambryAttached a test-only patch to prove the bug.
The additional test coverage makes sure "display-name" is correctly quoted and escaped if in ASCII context, but left untouched if in a UTF-8 encoded string.
I think the best way to tackle this issue is to create a
Drupal/Component/Utility/Mailclass helper with a static method to massage the value of the string, and either callUnicode::mimeHeaderEncode()and/or quote+escape if needed.This left out the issue with html-encoded characters. I would add this into the helper as well, although I'm not 100% sure that's the best place.
Setting Needs Review only for triggering the bot. This issue still Needs Work.
Comment #16
gambryI'm working on this.
Comment #17
gambryIn this patch:
Mailcomponent utility class is added. Potentially we can implement other helpers, but currently it just implements aformatDisplayName()static method to format RFC-2822 compliant "display-name" components.I've added a new test-only patch, integrating the Mail utility unit test.
Comment #19
gambryTest only fails as expected (additional failures from #14 as the Mail utility class doesn't exist).
Comment #20
gambryUpdating the Issue Summary after discovery done in #12.
Comment #21
cilefen commentedIf the mail is actually sent, would something like the TLDR part of #12 make a more accurate title?
Comment #22
cilefen commentedComment #23
gambry@cilefen currently the mail header is not RFC compliant, so everything can happen. The symptom we discovered was due GMail bad-interpreting a comma as a list delimiter, in a sent mail. But there may be mail systems refusing to send the mail at all.
I left the "cannot send mail" because is what probably as user will look like trying to find an issue related to the problem, and then I added #12 TLDR as issue introduction. Feel free to swap them.
@alexpott suggested to have a look at Swiftmailer and how they try to solve the problem. Switfmailer processes RFC-2822
phraseswith a proper AbstractHeader::createPhrase(), but we Core is not meant to be a perfect mailing tool.It anyway does more or less the same we do - quote and escape if ASCII string with
specials- without the check for avoiding formatting a string twice.I'm temporary setting this to Needs Work while doing a better comparison of our solution and test coverage with Swiftmailer one.
Comment #24
gambryFrom my research it looks like we are doing pretty much the same as swiftmailer. Below the differences and a couple of possible suggestions to be done in follow-ups and/or D9.
TL;DR: I don't see any critical difference, so setting back Needs Review.
Differences between #17 and Swiftmailer
phrase, if not it proceeds with the RFC2822 compliant formatting. We don't have the concept of phrase yet, so we do process any string either if is already compliant or not. This is acceptable, just slightly less efficient. In return we avoid its long and scary regexp.Mail::RFC_2822_SPECIALShas remained in the string (encoding will remove them, so if any still exists the string is ASCII only and needs compliant formatting). I don't see any issue with that, besides we avoid the horrifying regexp/^[\x00-\x08\x0B\x0C\x0E-\x7F]*$/DIMO less readable than our/[^\x20-\x7E]/inUtil/Unicode::mimeHeaderEncode()(although our regexp let encode some ASCII characters like NULL or DEL, which is harmless though).Suggestions for follow-ups
Drupal Core is not meant to be a perfect mailing system. It should be compliant with RFCs but allowing all scenarios and considering all the components (phrases, words, atoms, etc.) will be a massive and worthless work. If we really want that it would be better - and quicker - to add something like swiftmailer as dependency and use it. However few things could be made to improve the mail subsystem:
phrasein Utility/Mail, to process mail headers more easily instead of processing their single parts separately.may become:
Which is going to take care of any components and its quoting, escaping, encoding etc.
Comment #26
gambryChanging the priority to Major because emails may not be sent at all in websites where this bug is triggered .
The workaround would be to change the site name, however that's not always possible due business requirements for example if any specials (
()<>[]:;@\,.") is part of the name of the company/product.Also fixing grammar on "API Changes" section.
Comment #27
damiengr commentedIs there a typo on MailHelper ?
Comment #28
gambry@DamienGR it is a typo. This is embarrassing! :D
This patch fixes that typo and re-roll after #2935255: Remove all usages of drupal_set_message and drupal_get_messages in core lib.
Comment #29
darvanenWith the exception of 6, these are super nit-picky things to do with grammar and readability. Take them or leave them, just providing a point of view.
The code looks good to me, and has had a few eyes over it now. I say once 6 is done we can make this RTBC and get it in front of a maintainer.
I find "RFCs compliant" jarring, suggest rewording to "Provides helpers to ensure emails are compliant with RFCs"
Also, is that the full intended scope of this class? Could it ever be used for something else? If so, should the comment reflect that?
After reading the RFC I can see why you're using the plural of "specials" here but I don't think that's necessary, the plural form is always used in context in the RFC and they do refer to "special characters" in other places.
Personally I would prefer we used the more grammatically correct singular form "special" here, without the quotes.
RFC-2822 special characters.
Whereas, this would remain as the plural form because it makes sense here.
decoded
it can still contain
-s
There should be a newline at the end of the file per Drupal coding standards.
Comment #30
gambryThanks for the review @Darvanen!
They are called "specials" by the RFCs RFC-822 and RFC-2822:
RFC-822, 3.3:
RFC-2822, 3.2.1:
Also swiftmailer use "specials", quoted.
But I kind of see your point, so I can update the comment to read RFC-822 "specials" special characters, as specials is the - unfortunate - name and not an adjective.
Working on the feedback.
Comment #31
dhirendra.mishra commentedwill work on comment from #29
Comment #32
dhirendra.mishra commentedComment #33
nlisgo commentedThis is a re-roll patch. I will follow up with a patch to address #29 feedback.
Comment #34
nlisgo commentedAddresses feedback from #29 but takes in to consideration response in #30.
Comment #35
darvanenSeems there's a solid reason behind using "specials" so ok, I'm certainly not gonna get my knickers in a twist about it :)
#34 looks good to me.
I'm told there's no subsystem maintainer for mail, so I'll remove that tag.
Comment #36
alexpottThis change looks comprehensive. I think we should create a change record to announce the new Mail utility class.
I think we should add something like $safe_display_name === $string - ie. they mimeHeaderEncode has done nothing. Just in case. Yes atm this is not possible but this code feels risky. I think the comments contains an assumption that is not ensured - ie
If string contains only ASCII characters it won't be encodedComment #37
gambry@alexpott I see your point, but what we need is to ensure
$safe_display_namedoesn't contain any "specials".If we add
$safe_display_name === $stringto look like:Then if in any remote and scary future
Unicode::mimeHeaderEncode()alter the original string BUT doesn't encode it, i.e. returns the trim()-ed original value, or add an initial space, etc., the replacing won't run and invalid headers will still be produced.Test coverage should be able to flag this, but I'm wondering if the risk of not having
$safe_display_name === $stringis worth the risk of having it?Comment #38
alexpott@gambry but there's the other risk - if we've encoded the string all the logic inside this if shouldn't fire. If
Unicode::mimeHeaderEncode()alter the original string BUT doesn't encode it - then that is super super odd.Comment #39
gambryI've spoked with @alexpott on slack, this is the summary of the conversation:
If the string is encoded, that logic won’t fire. Encoded words don’t allow “specials” [..] the only way that could change is either:
But the comment is really misleading on this aspect. If
Unicode::mimeHeaderEncode()encodes the string then that will NOT contain any "specials" and so the block code after thepreg_match()won't run. If howeverUnicode::mimeHeaderEncode()does not encode the string - i.e. ASCII only string - then "specials" may still be there and so that code will be triggered.This patch updates that comment, hopefully making this clearer.
Setting Needs Review only to trigger test bot. I'm going to create a Change Record and so until then this is to be considered Needs Work.
Comment #40
gambryChange Record is created, tests are green. Happy Review!
Comment #41
alexpottThanks for improving the comment.
Comment #42
alexpottCreditting @cwells for creating the issue, @cilefen for issue triage, and @alexpott, @Darvanen and @DamienGR for patch reviews.
Comment #43
alexpottEven though this adds a new class and has a change record this is a non-disruptive bugfix so going to backport to 8.6.x.
Committed and pushed 53c24728fb to 8.7.x and 01a729ab2b to 8.6.x. Thanks!
Fixed the above coding standards issues on commit.