Problem/Motivation

Per i18n, hreflang tags are allowed to have more then one translation point to the same link. However, using the core (includes/common.inc) drupal_add_html_head_link() function, this is not possible. Currently, drupal_add_html_head_link() function considers the same rel and href values to be duplicates and removes all but one, breaking attempts to add multiple hreflang links pointing to the same href.

See Google write-up of this i18n consideration for hreflang uniqueness: https://www.rebelytics.com/multiple-hreflang-tags-one-url/

Proposed resolution

Add hreflang value as part of the uniqueness check (if hreflang attribute is provided). Rather then just checking rel / href, a unique link tag would consider the 3 attribute values of rel / href / hreflang as a unique link.

API changes

Modify the drupal_add_html_head_link() function in includes/common.inc to included hreflang as part of it's check for a unique link tag in order to prevent removal of duplicate link tags that have different hreflang attribute values but the same rel/href attributes.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

ejanus created an issue. See original summary.

ejanus’s picture

Status: Active » Needs review
FileSize
646 bytes
joseph.olstad’s picture

Hmm, yes makes sense to me. Nice work catching this. You must be spending some time doing SEO (search engine optimisation). This is a valuable patch.

ejanus’s picture

Issue summary: View changes
sjerdo’s picture

I suggest adding the hreflang parameter to the functions docblock:

/**
 * Adds a LINK tag with a distinct 'rel' attribute to the page's HEAD.
 *
 * This function can be called as long the HTML header hasn't been sent, which
 * on normal pages is up through the preprocess step of theme('html'). Adding
 * a link will overwrite a prior link with the exact same 'rel' and 'href'
 * attributes.
 *
 * @param $attributes
 *   Associative array of element attributes including 'href' and 'rel'.
 * @param $header
 *   Optional flag to determine if a HTTP 'Link:' header should be sent.
 */
function drupal_add_html_head_link($attributes, $header = FALSE) {
msankhala’s picture

Thanks, @ejanus for the patch and this improvement. Here is updated patch with updated doc block as per #5.

surbz’s picture

Status: Needs review » Reviewed & tested by the community

Thanks @ejanus and @msankhala for the patch.
I can confirm patch #6 applies cleanly on 7.x branch.
The code looks fine and the tests are also passing.

Since #6 contains everything from #2 we would only be needing #6.
Marking --links-2959727.patch as hidden to avoid confusions for anyone who drops in here looking for the fix.

MustangGB’s picture

Issue tags: +Drupal 7.60 target
joseph.olstad’s picture

+1; All my clients use i18n on all of their Drupal sites. Would be helpful to have this core patch.

prashantgajare’s picture

LGTM ... Patch works well!

ejanus’s picture

Issue tags: -Drupal 7.60 target +Drupal 7.61 target

Updated target to 7.61 due to core security patch taking 7.60 release.

joseph.olstad’s picture

joseph.olstad’s picture

ejanus’s picture

joseph.olstad’s picture

joseph.olstad’s picture

MustangGB’s picture

Issue tags: -Drupal 7.68 target +Drupal 7.69 target
MustangGB’s picture

Issue tags: -Drupal 7.69 target +Drupal 7.70 target
akhilavnair’s picture

Hi,
It will be more useful if you merge this core patch to the latest drupal 7 version.

mcdruid’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: -Drupal 7.70 target +Needs tests

Looks good but I'd definitely want to see tests for a change like this.

mfb’s picture

The Drupal 9 counterpart for this bug fix is finally in - #2945033: HtmlHeadLink processing does not allow for duplicated alternate hreflang links - so hopefully we could add a test and land this soon.. :)

mfb’s picture

Ok, added a test. I couldn't actually find any preexisting tests for drupal_add_html_head_link() - maybe I missed it though?

I tweaked the order of attributes in the element name to match that in Drupal 9; also tweaked the docblock as hreflang isn't a required attribute.

Status: Needs review » Needs work
mfb’s picture

Status: Needs work » Needs review

Hrm the test-only patch was intended to fail; restoring status.

mfb’s picture

I found an unrelated issue with this code, which is that the Link HTTP header URL should not be HTML-encoded, but is for some reason, by running it thru check_plain(). The tests I added here actually depend on the buggy behavior :p

mfb’s picture

joseph.olstad’s picture

Status: Needs review » Reviewed & tested by the community

@mfb,
nicely done!

mfb’s picture

ok great! If we need to separate these two bug fixes into two issues we can..

mcdruid’s picture

It's great to see a (really decent) test added with this, thanks!

The removal of check_plain() is a bit tricky though.

AFAICS the code we're modifying was committed aeons ago in #552478: Restrict "self-closing" tags to only void elements in drupal_pre_render_html_tag and there's some discussion in that issue about check_plain().

It seems true that value of the HTTP header should not be encoded, so that makes sense.

Doing a quick test with a malicious href attribute suggests that the (rendered) value is encoded without the check_plain in drupal_add_html_head_link() - for example:

  drupal_add_html_head_link(array(
    'href' => '/foo?bar=baz&baz=false"><script>alert(1);</script>',
    'rel' => 'alternate',
  ), TRUE);

...is rendered as:

<link href="/foo?bar=baz&amp;baz=false&quot;&gt;&lt;script&gt;alert(1);&lt;/script&gt;" rel="alternate" />

FWIW it's not escaped at all in the HTTP response header but that's what we expect:

X-Generator: Drupal 7 (http://drupal.org)
Link: </foo?bar=baz&baz=false"><script>alert(1);</script>>; rel="alternate"

We obviously want to be very careful removing a check_plain(), however it doesn't look like it's introducing a vulnerability here AFAICS.

I'm not convinced that the ampersands in the href attribute should be encoded though?

All of which is making me think it'd be best to split that change out to another issue which verifies it's okay to remove the check_plain() and looks at whether we can fix the encoding in general, as I think it looks broken.

Here's a patch which retains the check_plain (for now) and tweaks the test so that it should still pass (because there's nothing in the attributes which will be encoded).

Note that the encoding by check_plain only breaks the test for the Header, as we're not looking at the HTML in detail, just counting the elements.

In other words, the encoding's as broken after this patch as before, but the patch fixes the issue with multiple variations based on hreflang and the new test should pass.

mcdruid’s picture

FWIW looks like the value of the html atttribute is passed through check_plain() in drupal_attributes() anyway:

https://git.drupalcode.org/project/drupal/-/blob/7.82/includes/common.in...

function drupal_attributes(array $attributes = array()) {
  foreach ($attributes as $attribute => &$data) {
    $data = implode(' ', (array) $data);
    $data = $attribute . '="' . check_plain($data) . '"';
  }
  return $attributes ? ' ' . implode(' ', $attributes) : '';
}

The call stack looks like:

common.inc:2472, drupal_attributes()
theme.inc:2319, theme_html_tag()
theme.inc:1161, theme()
common.inc:6144, drupal_render()
common.inc:6151, drupal_render()
common.inc:355, drupal_get_html_head()
theme.inc:2768, template_process_html()
theme.inc:1125, theme()
common.inc:6159, drupal_render()
common.inc:6006, drupal_render_page()
common.inc:2795, drupal_deliver_html_page()
common.inc:2655, drupal_deliver_page()

That explains why the attribute's still being encoded when the check_plain() is removed by the patch in #26

I think on that basis, we could forget #29 and go back to looking at #26 - in other words I'm marking #26 RTBC again.

Looking at whether the link element's attributes should be encoded etc.. could be a separate issue. I'd think we want to sanitise the value but not encode e.g. ampersands... unless I'm missing something.

mcdruid’s picture

Issue tags: +Needs change record, +Pending Drupal 7 commit

This will need a CR.

mfb’s picture

The encoding is correct - ampersands should always be HTML-encoded in HTML. Of course, browsers are lenient about this since most folks writing HTML manually don't know about this or forget. The problem is just when you HTML-encode something outside of the HTML context.

mfb’s picture

Issue tags: -Needs change record
mcdruid’s picture

Thanks @mfb - looks like you're right about the encoding of ampersands in the attributes.. if I did know that, I'd forgotten.

Draft CR looks good.

One thought; if we're removing a check_plain() it'd be good to add an assertion to the test to ensure that the encoding / sanitisation is applied elsewhere.

Fabianx’s picture

+++ b/includes/common.inc
@@ -2965,12 +2965,12 @@ function drupal_add_html_head_link($attributes, $header = FALSE) {
-  drupal_add_html_head($element, 'drupal_add_html_head_link:' . $attributes['rel'] . ':' . $href);
+  drupal_add_html_head($element, 'drupal_add_html_head_link:' . $attributes['rel'] . ':' . (isset($attributes['hreflang']) ? "{$attributes['hreflang']}:" : '') . $href);

It's a nit, but would prefer:

$hreflang = isset($attributes['hreflang']) ? "{$attributes['hreflang']}:" : '';

$string = return <<<EOF
drupal_add_html_head_link:{$attributes['rel']}:{$hreflang}:{$href}
EOF;

OR shorter:

$hreflang = isset($attributes['hreflang']) ? "{$attributes['hreflang']}:" : '';

$string = "drupal_add_html_head_link:{$attributes['rel']}:{$hreflang}:{$href}";

----

Overall RTBC + 1, but I also would love a test to ensure we don't regress here with the HTML attributes.

Removing check plains makes me nervous as well.

Fabianx’s picture

RTBC for #29 actually, because removing check_plain is neither necessary for this issue nor do we need to add a test for it right now.

It is okay to fix this in it's own issue, but this should not happen as a side effect of a feature request.

So RTBC #29 is fine plus follow-up to remove the check_plain (and need to check D9 as well of if it needs to be fixed there first)

Thanks for everyone that has worked on this issue!

It's great.

When we follow the proper processes we ensure we can get code in faster (less discussions) and that we don't accidentally introduce a security issue in an obscure edge case. We also avoid accidentally diverting from D9 behavior.

Fabianx’s picture

mfb’s picture

  • mcdruid committed a45aec7 on 7.x
    Issue #2959727 by mfb, mcdruid, ejanus, msankhala, Fabianx:...
mcdruid’s picture

Status: Reviewed & tested by the community » Fixed
Related issues: +#3247935: drupal_add_html_head_link(): URL in link HTTP header should not be HTML-encoded

Committed #29 - thank you!

Now that the new tests are committed, we could add to them with an assertion or two for the encoding of the attribute(s) in #3247935: drupal_add_html_head_link(): URL in link HTTP header should not be HTML-encoded.

mcdruid’s picture

Issue tags: -Pending Drupal 7 commit

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.