Problem/Motivation
Per i18n, hreflang tags are allowed to have more then one translation point to the same link. However, using the core (includes/common.inc) drupal_add_html_head_link() function, this is not possible. Currently, drupal_add_html_head_link() function considers the same rel and href values to be duplicates and removes all but one, breaking attempts to add multiple hreflang links pointing to the same href.
See Google write-up of this i18n consideration for hreflang uniqueness: https://www.rebelytics.com/multiple-hreflang-tags-one-url/
Proposed resolution
Add hreflang value as part of the uniqueness check (if hreflang attribute is provided). Rather then just checking rel / href, a unique link tag would consider the 3 attribute values of rel / href / hreflang as a unique link.
API changes
Modify the drupal_add_html_head_link() function in includes/common.inc to included hreflang as part of it's check for a unique link tag in order to prevent removal of duplicate link tags that have different hreflang attribute values but the same rel/href attributes.
Comments
Comment #2
ejanus CreditAttribution: ejanus as a volunteer commentedComment #3
joseph.olstadHmm, yes makes sense to me. Nice work catching this. You must be spending some time doing SEO (search engine optimisation). This is a valuable patch.
Comment #4
ejanus CreditAttribution: ejanus as a volunteer commentedComment #5
sjerdoI suggest adding the hreflang parameter to the functions docblock:
Comment #6
msankhala CreditAttribution: msankhala as a volunteer and at Srijan | A Material+ Company commentedThanks, @ejanus for the patch and this improvement. Here is updated patch with updated doc block as per #5.
Comment #7
surbz CreditAttribution: surbz at Srijan | A Material+ Company commentedThanks @ejanus and @msankhala for the patch.
I can confirm patch #6 applies cleanly on 7.x branch.
The code looks fine and the tests are also passing.
Since #6 contains everything from #2 we would only be needing #6.
Marking --links-2959727.patch as hidden to avoid confusions for anyone who drops in here looking for the fix.
Comment #8
MustangGB CreditAttribution: MustangGB commentedComment #9
joseph.olstad+1; All my clients use i18n on all of their Drupal sites. Would be helpful to have this core patch.
Comment #10
prashantgajare CreditAttribution: prashantgajare as a volunteer and at TATA Consultancy Services commentedLGTM ... Patch works well!
Comment #11
ejanus CreditAttribution: ejanus as a volunteer commentedUpdated target to 7.61 due to core security patch taking 7.60 release.
Comment #12
joseph.olstadComment #13
joseph.olstadComment #14
ejanus CreditAttribution: ejanus as a volunteer commentedComment #15
joseph.olstadComment #16
joseph.olstadComment #17
MustangGB CreditAttribution: MustangGB commentedComment #18
MustangGB CreditAttribution: MustangGB commentedComment #19
akhilavnairHi,
It will be more useful if you merge this core patch to the latest drupal 7 version.
Comment #20
mcdruidLooks good but I'd definitely want to see tests for a change like this.
Comment #21
mfbThe Drupal 9 counterpart for this bug fix is finally in - #2945033: HtmlHeadLink processing does not allow for duplicated alternate hreflang links - so hopefully we could add a test and land this soon.. :)
Comment #22
mfbOk, added a test. I couldn't actually find any preexisting tests for drupal_add_html_head_link() - maybe I missed it though?
I tweaked the order of attributes in the element name to match that in Drupal 9; also tweaked the docblock as hreflang isn't a required attribute.
Comment #24
mfbHrm the test-only patch was intended to fail; restoring status.
Comment #25
mfbI found an unrelated issue with this code, which is that the Link HTTP header URL should not be HTML-encoded, but is for some reason, by running it thru check_plain(). The tests I added here actually depend on the buggy behavior :p
Comment #26
mfbLet's see if tests pass if we also fix the HTML-encoding bug..
Comment #27
joseph.olstad@mfb,
nicely done!
Comment #28
mfbok great! If we need to separate these two bug fixes into two issues we can..
Comment #29
mcdruidIt's great to see a (really decent) test added with this, thanks!
The removal of
check_plain()
is a bit tricky though.AFAICS the code we're modifying was committed aeons ago in #552478: Restrict "self-closing" tags to only void elements in drupal_pre_render_html_tag and there's some discussion in that issue about
check_plain()
.It seems true that value of the HTTP header should not be encoded, so that makes sense.
Doing a quick test with a malicious href attribute suggests that the (rendered) value is encoded without the check_plain in
drupal_add_html_head_link()
- for example:...is rendered as:
FWIW it's not escaped at all in the HTTP response header but that's what we expect:
We obviously want to be very careful removing a
check_plain()
, however it doesn't look like it's introducing a vulnerability here AFAICS.I'm not convinced that the ampersands in the href attribute should be encoded though?
All of which is making me think it'd be best to split that change out to another issue which verifies it's okay to remove the check_plain() and looks at whether we can fix the encoding in general, as I think it looks broken.
Here's a patch which retains the check_plain (for now) and tweaks the test so that it should still pass (because there's nothing in the attributes which will be encoded).
Note that the encoding by check_plain only breaks the test for the Header, as we're not looking at the HTML in detail, just counting the elements.
In other words, the encoding's as broken after this patch as before, but the patch fixes the issue with multiple variations based on hreflang and the new test should pass.
Comment #30
mcdruidFWIW looks like the value of the html atttribute is passed through check_plain() in drupal_attributes() anyway:
https://git.drupalcode.org/project/drupal/-/blob/7.82/includes/common.in...
The call stack looks like:
That explains why the attribute's still being encoded when the check_plain() is removed by the patch in #26
I think on that basis, we could forget #29 and go back to looking at #26 - in other words I'm marking #26 RTBC again.
Looking at whether the link element's attributes should be encoded etc.. could be a separate issue. I'd think we want to sanitise the value but not encode e.g. ampersands... unless I'm missing something.
Comment #31
mcdruidThis will need a CR.
Comment #32
mfbThe encoding is correct - ampersands should always be HTML-encoded in HTML. Of course, browsers are lenient about this since most folks writing HTML manually don't know about this or forget. The problem is just when you HTML-encode something outside of the HTML context.
Comment #33
mfbDraft change record @ https://www.drupal.org/node/3247944
Comment #34
mcdruidThanks @mfb - looks like you're right about the encoding of ampersands in the attributes.. if I did know that, I'd forgotten.
Draft CR looks good.
One thought; if we're removing a
check_plain()
it'd be good to add an assertion to the test to ensure that the encoding / sanitisation is applied elsewhere.Comment #35
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedIt's a nit, but would prefer:
OR shorter:
----
Overall RTBC + 1, but I also would love a test to ensure we don't regress here with the HTML attributes.
Removing check plains makes me nervous as well.
Comment #36
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedRTBC for #29 actually, because removing check_plain is neither necessary for this issue nor do we need to add a test for it right now.
It is okay to fix this in it's own issue, but this should not happen as a side effect of a feature request.
So RTBC #29 is fine plus follow-up to remove the check_plain (and need to check D9 as well of if it needs to be fixed there first)
Thanks for everyone that has worked on this issue!
It's great.
When we follow the proper processes we ensure we can get code in faster (less discussions) and that we don't accidentally introduce a security issue in an obscure edge case. We also avoid accidentally diverting from D9 behavior.
Comment #37
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedComment #38
mfbHere are the issues for the HTML-encoded HTTP header bug:
D9: #3245895: Link HTTP header should not be HTML-encoded
D7: #3247935: drupal_add_html_head_link(): URL in link HTTP header should not be HTML-encoded
Comment #40
mcdruidCommitted #29 - thank you!
Now that the new tests are committed, we could add to them with an assertion or two for the encoding of the attribute(s) in #3247935: drupal_add_html_head_link(): URL in link HTTP header should not be HTML-encoded.
Comment #41
mcdruid