Problem/Motivation

Current implementation of Drupal\Core\Mail\MailManager::doMail() doesn't produce valid From/Sender/Reply-to header field if ()<>[]:;@\,." characters are used in a ASCII string.
On #2717965: Site name is not UTF-8 encoded in email headers we made sure non-ASCII characters are encoded, together with any eventual special characters, by using Unicode::mimeHeaderEncode() helper. But if the string contains only ASCII then specials are left in the string and the mail syntax is not RFC-2822 compliant. See #12 for a full explanation of the problem.

Original Issue Summary:

According to https://www.drupal.org/project/drupal/issues/2717965 - the site name was not properly UTF-8 encoded, and so messages going out from our contact form were being systematically rejected by Google (our email provider).

However, we don't have "special" characters in our site name, but rather, a comma. Our site name is "Redfin Solutions, LLC" (as that's our legal name), but this is rejected by Google with "Messages with multiple addresses in From: 550 5.7.1 header are not accepted."

I can confirm that removing the ", LLC" from our site name allows the messages to be sent.

Proposed resolution

#12 outlines the problem and suggests as possible solution transforming the string in "quoted-string" by quoting the text with double quotes and escaping '"' and '\' accordingly.

Remaining tasks

  • Identify and document the issue (see #12)
  • Produce a failing test(see #14 and #17)
  • Work on a patch (see #17)
  • Validate the solution
  • Write a Change Record if required.
  • Review, Commit.

User interface changes

No UI changes.

API changes

A new Drupal\Component\Utility\Mail component utility class is added. Potentially we can implement other helpers in the future, but currently it just implements a formatDisplayName() static method to format RFC-2822 compliant "display-name" components.
Modules providing mail handling can benefit from this helper.

Data model changes

No changes.

Comments

cwells created an issue. See original summary.

cilefen’s picture

It helps in a case like this to relate the new issue from the older issue, so as to draw attention. I've just done that.

Did #2717965: Site name is not UTF-8 encoded in email headers actually break this for you?

liam morland’s picture

Did it used to work? You could use git bisect to figure out which commit broke it.

chrisfromredfin’s picture

No, I don't think it has ever worked. We recently discovered that our contact form wasn't sending and went digging as to why. I suspected the comma once I found the mail logs error from Google. But then I got optimistic when I found #2717965 so I upgraded from 8.4.3 to 8.4.4; however, it was still not working.

That is, I *think* that the fix in the related issue is a good fix (was based on upper-ASCII type characters, like e-accent), but may be incomplete?

liam morland’s picture

If a site name has accented characters, it will get a encoded, which would take care of the comma problem. A patch that triggers encoding for a wider range of characters would fix it. It could probably also be fixed by putting the site name in quotes in the email.

chrisfromredfin’s picture

That sounds like the right thing, since typically RFC-compliant emails would look like "Redfin Solutions, LLC" <411@redfinsolutions.com>

ibullock’s picture

I'm having a similar issue, but in my case the character seeming to cause issues is a single quote.

gambry’s picture

Version: 8.4.4 » 8.6.x-dev
Status: Active » Needs review
StatusFileSize
new1.39 KB

In my case the issue was with the From header being Foo&#039;s Bar. That's because my site name is "Foo's Bar" and - in my case through Webform email handler - this is encoded and then processed.

To be honest I stopped reading the RFC 822, but my understanding is either the RCF822 doesn't allow or Gmail doesn't like ,, &, # and/or ; in (From: only?) header - and there be more - unless wrapped in double quotes OR as UTF-8 mime-encoded string.

mimeHeaderEncode() doesn't have any effect on processing a string not matching the condition if (preg_match('/[^\x20-\x7E]/', $string)) {}, so the string is returned as it is.

The best solution would be to use \Drupal\Component\Utility\Html::decodeEntities() right before the condition as well as wrapping the From name bit in double-quotes. Attached a patch with both of them.

Of course just wrapping the string with double-quote may solve the issue, but they you may have disgusting things in the email like From: Foo's Bar

Thoughts?

PS: this patch affects and tries to fix the issue in Core, however in my case as well as the other people above the problem seems to be somewhere else (webform for me). All those cases may need to be fixed in a similar way BUT separately, depending by the outcome of this issue.

gambry’s picture

Forgot to add:
- if the string is wrapped with " we must escape them.
- if the string is wrapped with " I'm not sure mime-encoded text is interpreted correctly.

I haven't added these on the patch as that is just a PoC to start the conversation from.

gambry’s picture

I found this more readable explanation of RFC-822.

An atom is a sequence of printable ASCII characters except space or any of the following:
()<>@,;:\".[]
Positively speaking, this means that the valid constituents of an atom are the following:

!"#$%&'*+-/0123456789=?
@ABCDEFGHIJKLMNOPQRSTUVWXYZ^_
`abcdefghijklmnopqrstuvwxyz{|}~

and

A quoted string is formed by using normal ASCII quotation marks (") around a string, and it's a way of turning almost any string syntactically to a word. This means for example that a string containing a space (say, Jukka Korpela) becomes acceptable when quoted, in a context where the syntax requires a word. A quoted string may contain any ASCII character, but quotation mark (") or carriage return (CR control code) must be preceded by a reverse solidus (backslash, \), and the reverse solidus itself as a character must be written as doubled (\\).

Status: Needs review » Needs work

The last submitted patch, 8: 2936032-7.patch, failed testing. View results

gambry’s picture

TL;DR: Current implementation of Drupal\Core\Mail\MailManager::doMail() doesn't produce valid From/Sender/Reply-to header field if ()<>[]:;@\,. characters are used in a ASCII string.

The full story

RFC 822 and RFC 2822 define as Originator fields (RFC-2822 section 3.6.2):

The originator fields of a message consist of the from field, the
sender field (when applicable), and optionally the reply-to field.

Those are "From", "Sender" and "Reply-to" header fields. The label are called "field-name", and after the ":" (colon) we have the "field-body" which for originator fields it can be:

from = "From:" mailbox-list CRLF
sender = "Sender:" mailbox CRLF
reply-to = "Reply-To:" address-list CRLF

What mailbox, mailbox-list and address-list are is defined in RFC-2822 section 3.4. You can read the whole specification, but in summary:

mailbox = name-addr / addr-spec
name-addr = [display-name] angle-addr
angle-addr = "<" addr-spec ">"
display-name = phrase

So for example "From" header field can contain comma separated instances of:
- test@example.org (adds-spec)
- <test@example.org> (angle-addr)
- Test <test@example.org> (display-name adds-spec)

In the last example, Test is the "display-name" component of our "From" field body and can only be a phrase and it must follow its syntax rules.

The phrase, word and atom components

RFC-2822 from section 3.2.4 to section 3.2.6 tells us a phrase can be made of 1 or more white-space separated words, and a word is either an atom or a quoted-string.

An atom can contain any ASCII alpha-numeric characters as well as !#$%&'*+-/=?^_`{|}~.
So basically it can NOT contain what the RFC defines as specials: ()<>[]:;@\,. and the double-quote character.
Worth mentioning an atom can not contain space and CTRL too, which if existing inside it the parser will simply read the spaced atom as a phrase of multiple atoms. So we can say they are not allowed but can be used. However the result may be unexpected (i.e. if using "Q" encoded-word with spaces, see below).

As an atom can not contain specials - i.e. a "," (comma) - the RFC let us use those within a word if represented as quoted-string, which is a string wrapped with double-quotes:

The only limitation with quoted-string is double-quote, "\" and CTRL can be used if escaped with \", \\ and \[CTRL] specifically.

What if our header field-body string has non-ASCII characters

RFC-2822 allows only ASCII text in headers, but RFC 2047 (revision of the original RFC-1342) helps us defining the rules for Non-ASCII Mail headers by introducing Encodings:

An "encoded-word" is a sequence of printable ASCII characters that
begins with "=?", ends with "?=", and has two "?"s in between. It
specifies a character set and an encoding method, and also includes
the original text encoded as ASCII characters, according to the rules
for that encoding method.

A mail composer that implements this specification will provide a
means of inputing non-ASCII text in header fields, but will translate
these fields (or appropriate portions of these fields) into encoded-
words before inserting them into the message header.

And it continues:

[an "encoded-word" can be used] As a replacement for a "word" entity within a "phrase", for example,
one that precedes an address in a From, To, or Cc header.

So a phrase can now be a space/CTRL delimited list of words OR encoded-words.

Let's come back to our issue

If the display-name component of a From: header (and similar):
- has only ASCII characters and no specials, can be left as it is.
- has only ASCII characters including any specials should be transformed to either a quoted-string (preferred) or a "Q" encoded-word (strongly discouraged, due specials still not allowed!).
- has non-ASCII characters, must be encoded either as a single encoded-word or multiple ones, whatever is easier/better. As we "B" encode (Base64 encoding) the string, we don't have problems if the string contains specials, as they will be safely encoded.

Current implementation of Drupal\Core\Mail\MailManager::doMail() covers perfectly last scenario, but leaves us uncovered with the first two. That explains the bug on this issue: if the display-name with ASCII characters includes specials GMail - and maybe more provider - refuse processing the email due malformed header.

gambry’s picture

Assigned: Unassigned » gambry

I'm going to work on this.

gambry’s picture

Assigned: gambry » Unassigned
Status: Needs work » Needs review
StatusFileSize
new2.73 KB

Attached a test-only patch to prove the bug.

The additional test coverage makes sure "display-name" is correctly quoted and escaped if in ASCII context, but left untouched if in a UTF-8 encoded string.

I think the best way to tackle this issue is to create a Drupal/Component/Utility/Mail class helper with a static method to massage the value of the string, and either call Unicode::mimeHeaderEncode() and/or quote+escape if needed.

This left out the issue with html-encoded characters. I would add this into the helper as well, although I'm not 100% sure that's the best place.

Setting Needs Review only for triggering the bot. This issue still Needs Work.

Status: Needs review » Needs work

The last submitted patch, 14: 2936032-14_test-only.patch, failed testing. View results

gambry’s picture

I'm working on this.

gambry’s picture

Status: Needs work » Needs review
StatusFileSize
new8.72 KB
new4.74 KB
new5.98 KB

In this patch:

  • A new Mail component utility class is added. Potentially we can implement other helpers, but currently it just implements a formatDisplayName() static method to format RFC-2822 compliant "display-name" components.
  • Unit testing the Mail utility class
  • MailManager now use the Mail::formatDisplayName() instead of Unicode::mimeHeaderEncode()

I've added a new test-only patch, integrating the Mail utility unit test.

Status: Needs review » Needs work

The last submitted patch, 17: 2936032-17-test-only.patch, failed testing. View results

gambry’s picture

Status: Needs work » Needs review

Test only fails as expected (additional failures from #14 as the Mail utility class doesn't exist).

gambry’s picture

Title: Site name with comma in it cannot send mail to GMail » Site name with special characters cannot send mail
Issue summary: View changes

Updating the Issue Summary after discovery done in #12.

cilefen’s picture

If the mail is actually sent, would something like the TLDR part of #12 make a more accurate title?

cilefen’s picture

Title: Site name with special characters cannot send mail » Sites named with special characters cannot send mail
gambry’s picture

Status: Needs review » Needs work

@cilefen currently the mail header is not RFC compliant, so everything can happen. The symptom we discovered was due GMail bad-interpreting a comma as a list delimiter, in a sent mail. But there may be mail systems refusing to send the mail at all.

I left the "cannot send mail" because is what probably as user will look like trying to find an issue related to the problem, and then I added #12 TLDR as issue introduction. Feel free to swap them.

@alexpott suggested to have a look at Swiftmailer and how they try to solve the problem. Switfmailer processes RFC-2822 phrases with a proper AbstractHeader::createPhrase(), but we Core is not meant to be a perfect mailing tool.
It anyway does more or less the same we do - quote and escape if ASCII string with specials - without the check for avoiding formatting a string twice.

I'm temporary setting this to Needs Work while doing a better comparison of our solution and test coverage with Swiftmailer one.

gambry’s picture

Status: Needs work » Needs review

From my research it looks like we are doing pretty much the same as swiftmailer. Below the differences and a couple of possible suggestions to be done in follow-ups and/or D9.

TL;DR: I don't see any critical difference, so setting back Needs Review.

Differences between #17 and Swiftmailer

  1. AbstractHeader::createPhrase() first analyses if the string is already a valid phrase, if not it proceeds with the RFC2822 compliant formatting. We don't have the concept of phrase yet, so we do process any string either if is already compliant or not. This is acceptable, just slightly less efficient. In return we avoid its long and scary regexp.
  2. It then checks if the string contains only ASCII characters, and if yes it escapes and quotes the string, if not it encodes it. We do the other way round: we encode if any UTF-8 character and escape-&-quote if any Mail::RFC_2822_SPECIALS has remained in the string (encoding will remove them, so if any still exists the string is ASCII only and needs compliant formatting). I don't see any issue with that, besides we avoid the horrifying regexp /^[\x00-\x08\x0B\x0C\x0E-\x7F]*$/D IMO less readable than our /[^\x20-\x7E]/in Util/Unicode::mimeHeaderEncode() (although our regexp let encode some ASCII characters like NULL or DEL, which is harmless though).
  3. Swiftmailer doesn't explicitly take in consideration the string may already be escaped. This is probably done by the regexp from point 1, but I would really avoid it because it's totally unreadable. Instead patch on #17 takes a defensive approach assuming any ASCII-only already quoted string may not be compliant, so it resets any escaping and process the string anyway (and we've got test coverage for these scenarios).

Suggestions for follow-ups
Drupal Core is not meant to be a perfect mailing system. It should be compliant with RFCs but allowing all scenarios and considering all the components (phrases, words, atoms, etc.) will be a massive and worthless work. If we really want that it would be better - and quicker - to add something like swiftmailer as dependency and use it. However few things could be made to improve the mail subsystem:

  1. Implementing the concept of phrase in Utility/Mail, to process mail headers more easily instead of processing their single parts separately.
  2. When we've got solution 1 we move any RFC* formatting related code out from MailManager and add to PhpMail, where it's supposed to stay. So this code in PhpMail:
    $mimeheaders = [];
    foreach ($message['headers'] as $name => $value) {
      $mimeheaders[] = $name . ': ' . Unicode::mimeHeaderEncode($value);
    }
    

    may become:

    $mimeheaders = [];
    foreach ($message['headers'] as $name => $value) {
      $mimeheaders[] = $name . ': ' . Mail::formatPhrase($value);
    }
    

    Which is going to take care of any components and its quoting, escaping, encoding etc.

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

gambry’s picture

Priority: Normal » Major
Issue summary: View changes

Changing the priority to Major because emails may not be sent at all in websites where this bug is triggered .

  • Interfere with normal site visitors' use of the site (for example, content in the wrong language, or validation errors for regular form submissions), even if there is a workaround.

The workaround would be to change the site name, however that's not always possible due business requirements for example if any specials (()<>[]:;@\,.") is part of the name of the company/product.

Also fixing grammar on "API Changes" section.

damiengr’s picture

+++ b/core/lib/Drupal/Core/Mail/MailManager.php
@@ -3,7 +3,7 @@
+use Drupal\Component\Utility\Mail as MaiHelper;

Is there a typo on MailHelper ?

gambry’s picture

StatusFileSize
new1.09 KB
new8.74 KB

@DamienGR it is a typo. This is embarrassing! :D

This patch fixes that typo and re-roll after #2935255: Remove all usages of drupal_set_message and drupal_get_messages in core lib.

darvanen’s picture

Status: Needs review » Needs work
Issue tags: +Needs subsystem maintainer review

With the exception of 6, these are super nit-picky things to do with grammar and readability. Take them or leave them, just providing a point of view.

The code looks good to me, and has had a few eyes over it now. I say once 6 is done we can make this RTBC and get it in front of a maintainer.

  1. +++ b/core/lib/Drupal/Component/Utility/Mail.php
    @@ -0,0 +1,62 @@
    + * Provides RFCs compliant helpers for e-mails.
    

    I find "RFCs compliant" jarring, suggest rewording to "Provides helpers to ensure emails are compliant with RFCs"

    Also, is that the full intended scope of this class? Could it ever be used for something else? If so, should the comment reflect that?

  2. +++ b/core/lib/Drupal/Component/Utility/Mail.php
    @@ -0,0 +1,62 @@
    +   * RFC-2822 "specials" characters.
    

    After reading the RFC I can see why you're using the plural of "specials" here but I don't think that's necessary, the plural form is always used in context in the RFC and they do refer to "special characters" in other places.

    Personally I would prefer we used the more grammatically correct singular form "special" here, without the quotes.

    RFC-2822 special characters.

  3. +++ b/core/lib/Drupal/Component/Utility/Mail.php
    @@ -0,0 +1,62 @@
    +  const RFC_2822_SPECIALS = '()<>[]:;@\,."';
    

    Whereas, this would remain as the plural form because it makes sense here.

  4. +++ b/core/lib/Drupal/Component/Utility/Mail.php
    @@ -0,0 +1,62 @@
    +    // processed if decode.
    

    decoded

  5. +++ b/core/lib/Drupal/Component/Utility/Mail.php
    @@ -0,0 +1,62 @@
    +    // If string contains only ASCII characters it won't be encoded and so it
    +    // can still contains forbidden "specials" characters.
    

    it can still contain

    -s

  6. +++ b/core/lib/Drupal/Component/Utility/Mail.php
    @@ -0,0 +1,62 @@
    \ No newline at end of file
    

    There should be a newline at the end of the file per Drupal coding standards.

gambry’s picture

Thanks for the review @Darvanen!

After reading the RFC I can see why you're using the plural of "specials" here but I don't think that's necessary, the plural form is always used in context in the RFC and they do refer to "special characters" in other places.

They are called "specials" by the RFCs RFC-822 and RFC-2822:

RFC-822, 3.3:

specials = [...] Must be in quoted-string, to use within a word.

RFC-2822, 3.2.1:

specials = [...] Special characters used in other parts of the syntax

Also swiftmailer use "specials", quoted.

But I kind of see your point, so I can update the comment to read RFC-822 "specials" special characters, as specials is the - unfortunate - name and not an adjective.

Working on the feedback.

dhirendra.mishra’s picture

Assigned: Unassigned » dhirendra.mishra

will work on comment from #29

dhirendra.mishra’s picture

Assigned: dhirendra.mishra » Unassigned
nlisgo’s picture

Status: Needs work » Needs review
StatusFileSize
new8.75 KB

This is a re-roll patch. I will follow up with a patch to address #29 feedback.

nlisgo’s picture

StatusFileSize
new8.73 KB
new1.48 KB

Addresses feedback from #29 but takes in to consideration response in #30.

darvanen’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: -Needs subsystem maintainer review

Seems there's a solid reason behind using "specials" so ok, I'm certainly not gonna get my knickers in a twist about it :)

#34 looks good to me.

I'm told there's no subsystem maintainer for mail, so I'll remove that tag.

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs change record

This change looks comprehensive. I think we should create a change record to announce the new Mail utility class.

+++ b/core/lib/Drupal/Component/Utility/Mail.php
@@ -0,0 +1,62 @@
+    // If string contains only ASCII characters it won't be encoded and so it
+    // can still contain forbidden "specials" characters.
+    if (preg_match('/[' . preg_quote(Mail::RFC_2822_SPECIALS) . ']/', $safe_display_name)) {

I think we should add something like $safe_display_name === $string - ie. they mimeHeaderEncode has done nothing. Just in case. Yes atm this is not possible but this code feels risky. I think the comments contains an assumption that is not ensured - ie If string contains only ASCII characters it won't be encoded

gambry’s picture

@alexpott I see your point, but what we need is to ensure $safe_display_name doesn't contain any "specials".

If we add $safe_display_name === $string to look like:

if ($safe_display_name === $string && preg_match('/[' . preg_quote(Mail::RFC_2822_SPECIALS) . ']/', $safe_display_name)) {}

Then if in any remote and scary future Unicode::mimeHeaderEncode() alter the original string BUT doesn't encode it, i.e. returns the trim()-ed original value, or add an initial space, etc., the replacing won't run and invalid headers will still be produced.

Test coverage should be able to flag this, but I'm wondering if the risk of not having $safe_display_name === $string is worth the risk of having it?

alexpott’s picture

@gambry but there's the other risk - if we've encoded the string all the logic inside this if shouldn't fire. If Unicode::mimeHeaderEncode() alter the original string BUT doesn't encode it - then that is super super odd.

gambry’s picture

Status: Needs work » Needs review
StatusFileSize
new1.16 KB
new9.03 KB

I've spoked with @alexpott on slack, this is the summary of the conversation:

if we've encoded the string all the logic inside this if shouldn't fire

If the string is encoded, that logic won’t fire. Encoded words don’t allow “specials” [..] the only way that could change is either:

  • The RFC changes the syntax, which is unlikely
  • Drupal doesn’t follow RFCs, which is unlikely too

But the comment is really misleading on this aspect. If Unicode::mimeHeaderEncode() encodes the string then that will NOT contain any "specials" and so the block code after the preg_match() won't run. If however Unicode::mimeHeaderEncode() does not encode the string - i.e. ASCII only string - then "specials" may still be there and so that code will be triggered.

This patch updates that comment, hopefully making this clearer.
Setting Needs Review only to trigger test bot. I'm going to create a Change Record and so until then this is to be considered Needs Work.

gambry’s picture

Issue tags: -Needs change record

Change Record is created, tests are green. Happy Review!

alexpott’s picture

Status: Needs review » Reviewed & tested by the community

Thanks for improving the comment.

alexpott’s picture

Creditting @cwells for creating the issue, @cilefen for issue triage, and @alexpott, @Darvanen and @DamienGR for patch reviews.

alexpott’s picture

Version: 8.7.x-dev » 8.6.x-dev
Status: Reviewed & tested by the community » Fixed

Even though this adds a new class and has a change record this is a non-disruptive bugfix so going to backport to 8.6.x.

Committed and pushed 53c24728fb to 8.7.x and 01a729ab2b to 8.6.x. Thanks!

diff --git a/core/lib/Drupal/Component/Utility/Mail.php b/core/lib/Drupal/Component/Utility/Mail.php
index 623c6d510f..423cfb2668 100644
--- a/core/lib/Drupal/Component/Utility/Mail.php
+++ b/core/lib/Drupal/Component/Utility/Mail.php
@@ -63,4 +63,5 @@ public static function formatDisplayName($string) {
 
     return $safe_display_name;
   }
+
 }
diff --git a/core/lib/Drupal/Core/Mail/MailManager.php b/core/lib/Drupal/Core/Mail/MailManager.php
index 838273e3a0..5cf3c9ea05 100644
--- a/core/lib/Drupal/Core/Mail/MailManager.php
+++ b/core/lib/Drupal/Core/Mail/MailManager.php
@@ -6,7 +6,6 @@
 use Drupal\Component\Render\PlainTextOutput;
 use Drupal\Component\Utility\Html;
 use Drupal\Component\Utility\Mail as MailHelper;
-use Drupal\Component\Utility\Unicode;
 use Drupal\Core\Logger\LoggerChannelFactoryInterface;
 use Drupal\Core\Messenger\MessengerTrait;
 use Drupal\Core\Plugin\DefaultPluginManager;

Fixed the above coding standards issues on commit.

  • alexpott committed 53c2472 on 8.7.x
    Issue #2936032 by gambry, nlisgo, alexpott, cwells, cilefen, Darvanen,...

  • alexpott committed 01a729a on 8.6.x
    Issue #2936032 by gambry, nlisgo, alexpott, cwells, cilefen, Darvanen,...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.