Problem/Motivation

NodeViewController adds all uri relationships it can find as <link>. But according to http://validator.w3.org/, that is not actually valid. Also, those links are printed for all users. Shouldn't we at least check access to those routes/urls?

The situation about what is a valid link relation seems complex see #8 from @Berdir.

What is certain is that having these links present on page for anonymous users is affecting how Google and other service engines index sites. Google will visit each link relation and get many access denieds. This is bad for sites because they are doing unnecessary work to deny access.

Proposed resolution

In NodeViewController if the user is not authenticated (ie anonymous) we should check access. This means we only vary by the user.roles:anonymous cache context. This prevents this change from cause caching issues. And if the links are accessible then Google can index them (as I guess would be expected).

Remaining tasks

User interface changes

None

API changes

Add current_user service to NodeViewController and the user.roles:anonymous cache context to node views.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Berdir’s picture

Issue summary: View changes
Berdir’s picture

Status: Active » Needs review
FileSize
1.59 KB
849 bytes

Wondering if if we have any tests for this.

Terms have the same logic btw, but not any other entity type, not like this. Just doing node for now.

Wim Leers’s picture

Also, those links are printed for all users. Shouldn't we at least check access to those routes/urls?

Those URLs can easily be determined by malicious users with 99% reliability by looking at the Drupal 8 source code. There's no sensitive information disclosure if those URLs, when accessed by users with insufficient permissions, merely return a 403.

Berdir’s picture

Yes, I'm not saying it is a security issue. But we don't display an edit link in an operations element if you don't have edit permissions either?

Status: Needs review » Needs work

The last submitted patch, 2: node-links-2406533-1-access.patch, failed testing.

The last submitted patch, 2: node-links-2406533-1-canonical-only.patch, failed testing.

Wim Leers’s picture

#4: oh, okay. I think it's fine to show inaccessible links, because indeed we take great care to only show links to the user that will work. But in this case, the user is not going to be human, it's going to be a bot (or at least other software). I think it's fine to simply send all link relationships; it can figure out by doing HTTP HEAD requests whether those links are accessible or not.

Berdir’s picture

Did a bit of research, there are different resources, the validator itself documents the following two lists as his source:

https://html.spec.whatwg.org/multipage/semantics.html#linkTypes
http://microformats.org/wiki/existing-rel-values#HTML5_link_type_extensions

The second is the one that defines canonical and shortlink and a lot of other crap (quoting @WimLeers). But no edit-form or delete-form. It does have a EditUrl, though. Apparently used by wordpress, which is big enough to make its own standards :p

Wim Leers’s picture

From the W3C validator's output:

You can register link types on the http://microformats.org/wiki/existing-rel-values#HTML5_link_type_extensions yourself.

/facepalm


Then, at http://microformats.org/wiki/existing-rel-values#non_HTML_rel_values:

"There are markup languages other than HTML that also have a rel attribute, often based upon the HTML rel attribute. It is useful to document some of these other languages and their rel values for both reference purposes, and to provide background research for the possible development and re-use of these values in HTML, as http://microformats.org/wiki/poshformats or http://microformats.org/wiki/microformats"

[…]

See http://www.iana.org/assignments/link-relations.html for more.

That then lists, amongst others, the edit-form relationship.


Conclusion: this is a fustercluck. So I'd say: let's add all the relationships Drupal uses, reference the IANA URL, and wait for the W3C parser to pick them up, and call it a day.

olli’s picture

Extended 1-access.patch to node preview and term page.

olli’s picture

Status: Needs work » Needs review
FileSize
3.65 KB
1.25 KB

Fixed the failing test.

Status: Needs review » Needs work

The last submitted patch, 11: 2406533-11-access.patch, failed testing.

Dom.’s picture

Dom.’s picture

@Wim Leers about #9: did anyone did that ?

let's add all the relationships Drupal uses, reference the IANA URL, and wait for the W3C parser to pick them up, and call it a day.

Dom.’s picture

Status: Needs work » Needs review
FileSize
3.63 KB

Patch #11 does not apply. Just changed it to apply. I did not fix anything more.

Status: Needs review » Needs work

The last submitted patch, 15: access--2406533-15.patch, failed testing.

Wim Leers’s picture

#14: not AFAIK. Pinging @Crell.

Crell’s picture

Huh. I didn't even realize there was a link relation validator...

To understand the intent here, see the issue summary of #2113345: Define a mechanism for custom link relationships (which is postponed to 8.1 for beta-restriction reasons).

In short, if we are going to auto-generate relationships for every ER field, which in the long run I think we will, formally registering those with a standards body is way too involved and they'd probably slap us. Instead, use Curies, which namespaces relations in a sane fashion. We just haven't built out the mechanism for that yet, but if someone wants to then see the linked issue. :-)

Access-restricting the links makes some sense, but honestly I'd ignore the W3C validation for the time being. (Unless someone disagrees with the plan above, in which case let's discuss that.)

Cliff’s picture

The plan—including to ignore the W3C validation for the time being—makes perfect sense.

I am not aware of any values of rel that are used by screen readers or other assistive technologies to offer better accessibility. I can think of ways it could be used, but not of any ways that are recognized and implemented widely.

Wim Leers’s picture

So what are then the concrete next steps here?

Crell’s picture

+++ b/core/modules/node/src/Controller/NodePreviewController.php
@@ -32,6 +32,11 @@ public function view(EntityInterface $node_preview, $view_mode_id = 'full', $lan
+      // Only add links the user is allowed to visit.
+      if (!$node_preview->urlInfo($rel)->access()) {
+        continue;
+      }

So really, the only question is whether we want to do this or not. All of the links exist; all will 403 if accessed by a user who shouldn't view them. But do we omit them or not?

I'm not sure which approach I prefer. I put a question to the HAL-Discuss mailing list, so let's see what they recommend:

https://groups.google.com/forum/#!topic/hal-discuss/nxBWSbPItGA

My knee-jerk inclination if there's no consensus is to show ALL THE LINKS, for greater flexibility and easier implementation, but we'll see if there's a consensus on the list.

mirzu’s picture

I would strongly recommend against putting links into an API response that the user isn't able to use. The only reason to put links at all is to make the Developer Experience better for client developers. This would be like simply showing the whole menu tree to an admin and asking them to click around to find what works and what doesn't.

http://en.wikipedia.org/wiki/HATEOS

The wikipedia example shows an example of a bank account. The API shows a user with money in the bank options like deposit, withdrawal and transfer. The user with 0 balance has only the option to deposit.

I'm sure I can dig up other examples, but imagine how frustrating it would be to use if all users always had all links in the admin menu regardless of permissions.

Crell’s picture

mirzu: Well, the implications of that include:

1) We can't rely on HTTP caching for non-anonymous, because we would need to make REST responses user.permissions cache context aware. All requests still will hit Drupal to check that.
2) Both in REST and in HTML Link tags, we'd need to include permission checks. Doable, but more work, and more things for us to potentially get wrong. :-) We'd want to make sure it can be addressed globally.

Berdir’s picture

1) We need to do that anyway. The output is already permission specific, you might not see some fields based on your permissions.

mirzu’s picture

1) The anonymous API user is the same as the anonymous HTML user. We can add edit links to pages if a user is logged in, and omit those links when it's served to an anonymous user. This is the same idea. Anonymous output is cachable, user specific output isn't. The links for the HTML user and the API user analogous here. IMO.

I did a quick poke around a few APIs and can confirm that it's really frustrating when trying to "discover" available resources and 90% of links lead to a 403.

Crell’s picture

Berdir: Hm. That's a very valid point, so destroys my point 1.

Which leaves only point 2, "if we're going to do it, don't do it ad hoc". We want to (in 8.1 if not sooner) auto-adding as many relevant links as we can to both HTML head and HAL links. If we're going to filter for access control, we should make sure that is done in a consistent and clean fashion for links, rather than one-off in each case, if possible. That or confirm that there's only 2-3 places it even matters, so we don't need to worry about it. For the latter to work, we would need to centralize the link-addition out of NodeController and into generic Entity handling, which frankly is something we should have done a long time ago anyway.

Perhaps the one-off (patches above or similar) for now with @todo statements, and open an 8.1 issue to centralize/automate those, which should simplify a lot of code anyway (and is useful in its own right independently of anything we're discussing here)?

Wim Leers’s picture

1. What if the access result depends on more than just permissions? What if it is per user too? Or worse?

Then we'd be destroying cacheability for the sake of a tiny, tiny, tiny fraction of user agents that actually use this information, that's not even visible to human users.

Let's not forget about that.

mirzu’s picture

What is the utility that these links are adding if they lead to a 403? I would say if the permission/access problem is too hard to solve we should simply not add them to the HTML head and deal with the HAL/REST issue separately if possible.

Berdir’s picture

1. That's why the access result has cacheability metadata? :) Any possible complexity could also exist on the field-level, which we already respect as well. Really no difference...

olli’s picture

willzyx’s picture

rerolled #15 (Tests should fail due the cache issues)

IMHO we should at least add an access check. Provide inaccessible URI relationship links doesn't seems to be a semantic improvement and makes little sense for me

willzyx’s picture

Status: Needs work » Needs review

The last submitted patch, 10: 2406533-10-access.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 31: access--2406533-31.patch, failed testing.

willzyx’s picture

Wim Leers’s picture

Status: Needs review » Needs work
+++ b/core/modules/node/src/Controller/NodePreviewController.php
@@ -29,6 +29,11 @@ public function view(EntityInterface $node_preview, $view_mode_id = 'full', $lan
+      if ($node_preview->isNew() || !$node_preview->urlInfo($rel)->access()) {

+++ b/core/modules/node/src/Controller/NodeViewController.php
@@ -22,7 +22,13 @@ class NodeViewController extends EntityViewController {
+      if (!$node->urlInfo($rel)->access()) {

+++ b/core/modules/taxonomy/taxonomy.module
@@ -93,6 +93,11 @@ function taxonomy_page_attachments_alter(array &$page) {
+      if (!$term->urlInfo($rel)->access()) {

All of these cause the output to vary based on access, so the cacheability associated with the access result needs to be applied to the render arrays as well.

willzyx’s picture

All of these cause the output to vary based on access, so the cacheability associated with the access result needs to be applied to the render arrays as well.

Silly question.. for doing that, we need to change Url::access() method for return optionally the access result object, since currently currently it only returns boolean, right?

RainbowArray’s picture

As a sidenote, these links are proving extremely difficult to remove if somebody wants to do so either via a module or a theme.

  • Extend the NodeViewController and override the view method, so these are never added as html_head_link items and use that extended ViewController in a custom controller. Nope, they still show up somehow.
  • Try to unset them from html_head_link in preprocess_node within a theme. Nope, they still show up.
  • They don't even show up at all in page_attachments_alter.

I don't know what the benefit is of including these in head, but just note that it's extremely difficult to remove them, and in general my belief is that themers/front-end-developers should be able to have this sort of fine-grained control of the markup Drupal outputs.

swentel’s picture

@mdrummond

Fwiw, I've been able to hide them in entity_view_alter

function epsenkaas_entity_view_alter(array &$build, EntityInterface $entity, EntityViewDisplayInterface $display) {
  if ($entity->getEntityType()->id() == 'node') {
    if (!empty($build['#attached']['html_head_link'])) {
      foreach ($build['#attached']['html_head_link'] as $key => $config) {
        if ($config[0]['rel'] != 'canonical' && $config[0]['rel'] != 'shortlink') {
          unset($build['#attached']['html_head_link'][$key]);
        }
      }
    }
  }
}
RainbowArray’s picture

I was finally able to remove them by extending EntityViewController instead of NodeViewController, but I'll try entity_view_alter, which seems cleaner.

Thanks for the help!

This does feel like one of the darker corners of Drupal. This could be really improved from a TX perspective.

flocondetoile’s picture

And for page taxonomy term you can remove them using hook_page_attachments_alter().

mgifford’s picture

Issue tags: +W3C validation
RainbowArray’s picture

What's most confusing is that hook_page_attachments_alter doesn't work for the links added from node. I would think all the head links would be available from there, but they aren't. I don't know if that's a bug or if I just don't fully understand how this works. I would guess others will run into this too.

Crell’s picture

My goal originally was to add all links to all objects in as many places as possible, to build a robust set of hyperlinks automagically in all output formats, including HTML. We didn't quite get that far; IIRC, only nodes do that right now, and use a one-off controller to do so. That is ripe for refactoring for 8.1.

hook_page_attachments_alter(), I believe, is useful only for CSS and JS, not arbitrary links. But I could be wrong about that, as I'm not sure exactly how it ties in...

sachbearbeiter’s picture

Remark: My log is full with entrances by bots, trying to call this links ...
Do i have to place #39 into a module or is the theme (template.php) file fine?

RainbowArray’s picture

As far as I can tell, these links can only be removed or altered in a module. Themes should have full control of any markup Drupal outputs. Something may need to change so that can happen.

Crell’s picture

mdrummond: Disagree. There's no styling impact from link elements in the head. I would probably agree with you for the contents of the body tag, but for the head is all metadata and control data. I don't see any reason for a template file to need to modify a link tag or meta tag, especially one that is not related to CSS.

DamienMcKenna’s picture

@mdrummond: Maybe we can work out how to remove them in Metatag?

@crell: Meta tags are a grey area between configuration and content, but generally they're considered part of the content realm. One of the key wins of D8 has been to put more control into the themer's hands with respect to the entire page's output, so there should be a mechanism to allow these tags to be customized via the theme. Lets consider that as an API flaw rather than a feature and treat is as such. If there is too much work to be done in order to complete the functionality for 8.1 it may be worthwhile simply removing them for now and not reintroducing the feature until it's actually working as intended.

DamienMcKenna’s picture

Status: Needs work » Needs review
FileSize
1.1 KB

This removes the unwanted meta tags. Lets see what the tests say.

Status: Needs review » Needs work

The last submitted patch, 49: drupal-n2406533-49.patch, failed testing.

Crell’s picture

Damien: Removing those links will break a whole crapload of stuff, on a whole crapload of modules. We use those link values to build out the admin UI in many cases.

Referring back to the IS, there's two separate questions here:

1) Having some of those links exposed makes the W3C Validator complain. This is a problem with the W3C validator, not with Drupal. Defining new links is completely legal per modern W3C / hypermedia standards and conventions. We're supposed to be namespacing them, but the Validator appears to just be doing an old whitelist, I imagine. That's its problem, not ours, and thus is won't-fix.

2) Is having some of those URIs exposed for anon users a security issue? I'm not convinced it is. But changing it (as discussed in #26 and following) may have performance/caching implications.

These are NOT visually-relevant links; they're hypermedia metadata. Thus this is NOT an area that themers should be concerned with. Drupal is not designed for themers to have byte-level control from HTML tag to /HTML tag. Inside the Body, sure, but this is all in the HEAD and therefore a separate concern. As I see it, exposing these links more readily *is* working as intended; if anything, we should be generalizing the code that works on nodes now so that it happens to other entities, too, and in non-HTML output formats. This is not an API flaw, it's a deliberate design decision. (And if anyone starts building code that relies on these links, allowing themers to remove them for whatever reason would cause those modules to mysteriously break. Not good.)

flocondetoile’s picture

Is having some of those URIs exposed for anon users a security issue? I'm not convinced it

I agree. But There is too an SEO issue with those URLs. Expose those URLs to bots is not really a best practice for SEO (and it's an euphemism). Bots parse all those urls, and receive 403 errors. One D8 webssite i launch recently was literraly flooded by those crawls.

Remove those URLs for anon users is, in my opinion, necessary.

Wim Leers’s picture

hook_page_attachments_alter(), I believe, is useful only for CSS and JS, not arbitrary links. But I could be wrong about that, as I'm not sure exactly how it ties in...

Anything can be #attached in hook_page_attachments() or hook_page_attachments_alter().

2) Is having some of those URIs exposed for anon users a security issue? I'm not convinced it is.

Agreed. But it pisses off themers/front-end developers a lot: see #2528498: Only emit Quick Edit data- attributes when actually necessary, where this exact annoyance also came up.

Drupal is not designed for themers to have byte-level control from HTML tag to /HTML tag. Inside the Body, sure, but this is all in the HEAD and therefore a separate concern.

That's a fair/solid rule of thumb IMO :) But let's hear what Cottser & co think.

DamienMcKenna’s picture

Seeing as #2528498: Only emit Quick Edit data- attributes when actually necessary has been mentioned, the caching is already taking a hit because of #2493033: Make 'user.permissions' a required cache context, there shouldn't be any reason to not only show them to visitors they're appropriate for?

If there's a problem with the W3 validator can we also get this listed as a "known issue" somewhere (8.0.x release notes maybe) until this issue is resolved?

Wim Leers’s picture

there shouldn't be any reason to not only show them to visitors they're appropriate for?

… except that Quick Edit is always varying by permissions, yet node/entity access can have an arbitrary complexity.

RainbowArray’s picture

Drupal is not designed for themers to have byte-level control from HTML tag to /HTML tag.

Let me be more clear as to the original source of my raising this.

I'm working on the AMP module and AMP theme that allow Drupal to be output as AMP HTML, as per the AMP spec. AMP (Accelerated Mobile Pages) HTML is a subset of HTML to allow for quick-loading mobile pages. Pages must be valid AMP HTML in order to be treated as AMP, and the spec only allows for a few valid options for link elements in head. Some of these link elements do not fit the bill and cause AMP validation to fail.

To get around this, here is in essence what I'm doing with an entity_view_alter hook. (I removed a few bits not relevant for the example.)

    if (!empty($build['#attached']['html_head_link'])) {
      foreach ($build['#attached']['html_head_link'] as $key => $config) {
        if ($config[0]['rel'] != 'canonical' && $config[0]['rel'] != 'shortlink') {
          unset($build['#attached']['html_head_link'][$key]);
        }
      }
    }

This is perfectly possible to do in a module. It's not possible to do in a theme, because the full set of html head links are not available in preprocess, and I have no idea why. Somehow they are being pulled out before preprocess or something like that, and it seems like that is something that should be fixable. In this case, we have a module and theme working together, so that's fine. But for somebody developing a theme on their own that runs into this, they'd be out of luck.

I say this not because I think Drupal 8 should be bent to fit the AMP spec, but because this is one example of why a themer may well need to have control of elements within the head element. It's HTML. That's something that every front-end developer cares about. There's all sorts of things that may need to be monkeyed with in head in order to improve page rendering performance.

Themers and front-end developers have access to change html in head through html.html.twig, and through many preprocess functions. I think it seems rather inexplicable as to why that's not possible with this weird edge case. And indeed I've had several people send me private messages since I posted this asking for suggestions on how to work with these link elements. So I certainly think I'm not alone as a front-end developer/themer that wants access to these elements.

For the record, I'm fine if they're in there by default even if it isn't valid as per the spec. I like it when things are valid if possible. There seems to be a useful need for these. If we can only show them when permissions are available to make use of them, that makes sense to me. I just want to be able to work with these bits when necessary because the front-end world is weird, and sometimes you have to do weird things because reasons.

Crell’s picture

I think the ideal case would be to generalize the code that nodes use now so that they apply to all entities, and then apply normal access checks to them just like we do for menu links, etc. That way anything that is accessible will get shown, and anything that isn't won't, and it's consistent across all entities automatically.

There's probably an argument that a link that leads to a 403 is an indicator that a user agent needs to apply to escalate its privileges (eg, needs to login), but I think that's a sufficiently small edge case that reducing the spidering of 403 pages is a good trade-off.

richmaniac’s picture

Hi mdrummond,

if this works as a module - can't you just realise it as a module? I am not really a developer and these are my first steps with drupal 8 ;-(

DamienMcKenna’s picture

FYI I've opened an issue for Metatag to remove these meta tags: #2684991: Remove edit-form, delete-form, etc link tags

swentel’s picture

FYI - the code in #39 doesn't work when using panelizer - digging.

koddr’s picture

I write working solution — module Unset HTML head link.

Crell’s picture

Are there any actual references that having more LINK elements in the header harms SEO? The module linked in #61 claims that removing them is good for SEO, but I've never seen any actual evidence of that claim. At all. Yes I know Google doesn't share its PageRank algorithm, which means any such claims are total guesswork without actual guidance from Google.

koddr’s picture

@Crell: this is actual problem for Yandex search engine (RU).

Crell’s picture

Source? And is it just that search engine, or are there others that are similarly affected? (Conceptually it seems like it should be the other way around; more metadata is better.)

[Edited to add more questions.]

koddr’s picture

@Crell: hmmm.. no source. This observation of my dev/seo team.

Our story: after Drupal 8.0.0 release, we make test site (with some unique content pages). Yandex indexed this site for 2 weeks. On webmaster.yandex.ru (official statistics for domain in Yandex), I see wrong pages. It was a pages that have been specified in head (edit-form, delete-form etc. <link> tags).

As a result, in SERP, we saw a lot of pages that led to the 403 error. It was regarded (by Yandex) as a search engine spam and can impose sanctions on website.

...more metadata is better.

Source?

Crell’s picture

Version: 8.0.x-dev » 8.1.x-dev

OK, so it's not the presence of the links that seems to be an issue but links pointing at 403s. So we don't need to remove them, just access check them, per #51/#57.

We still need a patch for that, which should resolve the issues to date.

Wim Leers’s picture

#66 sounds like an excellent way to destroy performance & cacheability… Access to those routes could be based on arbitrarily complex access checks. It could be per user. It can be time-based.

DamienMcKenna’s picture

@Wim: Why would adding permissions to those links be any different to what was done in #2528498: Only emit Quick Edit data- attributes when actually necessary?

Crell’s picture

Or any menu link, action link, local task (tab), etc. We already access check a hundred links per page. I don't see how adding 4 more to the list would be a problem, unless it's the fact that they're so "high" in the render hierarchy?

RainbowArray’s picture

I think putting access checks on those links makes sense. I'm not clear on why that would be bad for performance. I will say, if "good performance" requires providing access to things people shouldn't have access to, that seems like a problem.

I still would like to better understand why these particular links aren't available in preprocess. It seems very strange that only modules have the ability to affect these head links. Why aren't they available to preprocess like other items?

I'm not opposed to these being in head if they're access protected, and there's a good reason for them to be there. I do think themes should be able to pull them out if they have reason to do so without having to create a module to do that.

Crell’s picture

mdrummond: I suspect it's because they're added much further down the render array tree, not at the page level, and they bubble up after the alter hook runs. I've never liked that those links are added via a custom controller that has no purpose other than to add those links; I'd much rather we add them for all entities in a more consistent fashion. That may end up making them accessible in the alter, or not. It will depend on the details of how we do that. (I'm not sure off the top of my head where to hook in to do so globally, or I would have done it a long time ago. :-) ) However, I think the intent of that hook is more for adding stuff than removing it, so it was also never a use case that was really considered.

Wim Leers’s picture

Issue tags: +needs profiling

#68: because #2528498: Only emit Quick Edit data- attributes when actually necessary is indeed checking a permission (and hence the output varies by the user.permissions cache context, which causes not a whole lot of variations). This is checking entity access. Entity access can vary by just about anything. Very often it varies by user. Which is terrible for performance.

#69: Same as the above, the vast majority of links on a page vary by permissions.

#70: … I don't disagree with that statement, but we made it so that access checking can easily vary by user (think edit own X permissions). That flexibility comes with a cost.

I suspect that out of the box the cost wouldn't be too bad, because out of the box our entity access checking is fairly simple. But we should do profiling with a case where the access result is A) varying by the user cache context and B) one where the access result is not at all cacheable. I'd love to be wrong. But I think it's my task to call out such statements/directions that completely disregard cacheability. That's the D7/D6/D5 way of doing things, and look where that got us. It's just as bad as using globals everywhere. Cacheability is not only about caching impact, it's also about actually realizing how ridiculously dynamic we make some things, and being aware about the consequences.

Berdir’s picture

Yeah. The main problem with those links and cacheablity is that they are *not* a block or any other contained unit that is cacheable on it's own. It's part of the basic top-level structure.

That means (Wim, correct me if I'm wrong there), that we can *not* use lazy-rendering/placeholdering with dynamic page cache and/or big_pipe, it has to be part of the initial response.

And yes, by default, node access for anyone except users with bypass node access is per-user. because of the mentioned edit own permissions. Dynamic page cache is disabled if the page itself has a user cache context. And combined, that would mean that we have no dynamic page cache anymore on node pages. Which would obviously be bad.

Wim, assuming that is correct, is there any chance we can placeholder those links? I fear that we can't do the access check thing without that...

Wim Leers’s picture

Off of the top of my head, #73 is indeed what I fear. This is why we used to have #post_render_cache, and not "placeholders" and #lazy_builders. Because it was able to also dynamically generate such attachments (e.g. highly conditional CSS/JS, or like here, <link> tags).

But, look at html.html.twig::

<head-placeholder token="{{ placeholder_token|raw }}">

That's where <link> tags eventually end up.

So, what we maybe could do, is modify HtmlResponseAttachmentsProcessor::processHtmlHeadLink() allow attached html_head, html_head_link etc to have an associated access check. Then the access check could run at the very very last moment. But, it's almost 1 AM here, so I may not be thinking all the consequences through.

DamienMcKenna’s picture

Another idea. Would it be possible to simplify this so that the links are only added if the normal permissions would allow it?

The general problem is that these links are being exposed to anonymous visitors, thus search indexes, etc, it isn't really a problem if authenticated users see extra meta tags that show them inaccessible links. On the vast majority of sites the URLs would result in 403 errors for anonymous visitors, therefore they serve no value for such users, they are only of value for authenticated users who have a specific role or fit other access requirements defined via other services, hooks, yadda.

So, could we cheat slightly with the following rules?

  • Never show the links if there's no possible way the visitor's normal permissions would allow them to access those paths.
  • Always show the links if it would be possible the visitor's normal permissions might allow them to access those paths, ignoring other permission checks.

Specific scenarios:

  • 99.9% of sites, anonymous visitors would never have the "Edit any content" or "Edit own content" permissions for a content type: no links
  • Authenticated user with a role that gives them the "Edit own content" permission and they are viewing a node they own: show links
  • Authenticated user with a role that gives them the "Edit own content" permission but they are not viewing a node they own: no links
  • Authenticated user with a role that gives them the "Edit any content" permission but there's an access requirement that blocks them for other reasons: show links

Basically I think that using the system permissions to hiding the links for the 99.9%-accurate use cases would suffice, and then always showing the links for the other people who might have another rule preventing them seeing the links would not be a big deal.

Crell’s picture

Re #75: Hm, that would remove all of the 403s from search engines. I would be OK with that compromise, iff we can figure out a cheap way to determine if a given link is "anon-friendly" or not. Other than hard-coding a list of probably-anon-friendly links by name (or some mutable definition array somewhere, blech) I'm not certain how we can detect that automatically.

RainbowArray’s picture

Got yet another theming question from somebody about how to remove these link elements. This is one of the bigger theming WTFs right now.

It looks like most of the action for how these get processed is happening in HtmlResponseAttachmentsProcessor. Specifically, processHtmlHeadLink takes all the html_head_link elements and then puts them in with the rest of the html_head elements, which then gets placeholdered in html.html.twig as head-placeholder.

So since there is already placeholdering going on, it seems like we should be able to do some access checks. The point of placeholdering is to allow for more expensive operations without holding up the rest of the page, right?

That said, I still don't understand why some of these link elements are available in preprocess and others aren't. I'm not interested in any arguments about "these are things that themers shouldn't care about." If it is HTML that appears on a page, themers are going to care about it. Period.

There's something going on that's preventing these link elements from seemingly being available at any stage of preprocess as far as I can tell, and I can't for the life of me figure out why.

alexpott’s picture

One option for search engines is to add nofollow to the rel. According to Mozilla's docs for link about rel attribute...

This attribute names a relationship of the linked document to the current document. The attribute must be a space-separated list of the link types values.

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/link

catch’s picture

Variation on @mdrummond's idea would be to access check the links for anonymous users but not for auth users. That won't affect cacheability for auth users at all but will hide the links to anonymous users. Should be rare that anonymous access checks are expensive.

Doing a permissions check but not any other kind of access check is interesting and probably more accurate for auth users on most sites, but not sure we actually have an API for that.

alexpott’s picture

Priority: Normal » Major

This is at least a major bug. I'm seeing a lot of traffic from google's indexing services resulting in access denied because of this.

alexpott’s picture

Here's a patch that follows @catch's idea in #79.

alexpott’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 81: 2406533-81.patch, failed testing.

alexpott’s picture

Status: Needs work » Needs review
FileSize
2.33 KB
6.24 KB

Now with added test goodness and fixes for failed tests.

The last submitted patch, 81: 2406533-81.patch, failed testing.

Berdir’s picture

+++ b/core/modules/node/src/Controller/NodeViewController.php
@@ -11,27 +15,66 @@
+  /**
+   * Creates an EntityViewController object.
+   *

that's a lie :)

constructor changes are always a bit tricky in case someone was overriding this with their own. As a BC meausure, we could make the argument optional and default to \Drupal::currentUser(). Might be easier to get it in as a bugfix into 8.2 and older.

See comment from @larowlan in #306662: Add redirect option to site-wide contact forms.

But unlike in that example, I couldn't find any existing subclasses of this controller. amp.module used to have one, but doesn't anymore.

alexpott’s picture

Thanks for the review @Berdir.

I guess there is no harm of a BC layer. OTOH I thought controller constructors were not API. In general though I think I agree that we should implement \Drupal fallbacks in our constructors - it just makes everyone's life easier.

I've also added a cache context for complete correctness and explained why even though we are doing access checks we are not adding the access cacheability data in.

Status: Needs review » Needs work

The last submitted patch, 87: 2406533-87.patch, failed testing.

alexpott’s picture

Status: Needs work » Needs review
FileSize
1.89 KB
8.69 KB

Nice there's test coverage :)

alexpott’s picture

Crell’s picture

A few minor nits:

  1. +++ b/core/modules/node/src/Controller/NodeViewController.php
    @@ -11,33 +15,83 @@
    +    if ($current_user) {
    +      $this->currentUser = $current_user;
    +    }
    +    else {
    +      $this->currentUser = \Drupal::currentUser();
    +    }
    

    Purely stylistic: This could easily be collapsed to a ternary. Heck, in PHP 7 it would be a coalesce, but... :-)

  2. +++ b/core/modules/node/src/Controller/NodeViewController.php
    @@ -11,33 +15,83 @@
    +        $build['#attached']['html_head_link'][] = array(
    +          array(
    +            'rel' => $rel,
    +            'href' => $url->toString(),
    +          ),
    +          TRUE,
    

    short-arrays, please.

Otherwise I'm good with this approach.

My question, though, is in light of #2113345: Define a mechanism for custom link relationships, it would be wise for us to generalize the node-link-adding code to apply to all entities, like that issue does for REST. Of course, we'd want to do the same access checks there, too. Is there a way that we can extract that code out to a common routine that we can apply to all entities easily, to get the same access check benefits? (That would allow us to eliminate the Node custom controller entirely, I believe.)

alexpott’s picture

I don't think this issue should try to address the task of generalising this. We need to fix the bug and then we can think about the changes we want for 8.3.x. Given what I'm seeing in the logs for real Drupal 8 sites I'm very close to making this a critical bug.

  1. Indeed the ternary looks nice.
  2. +++ b/core/modules/node/src/Controller/NodeViewController.php
    @@ -11,33 +15,83 @@
    -      $build['#attached']['html_head_link'][] = array(
    -        array(
    -          'rel' => $rel,
    ...
             $build['#attached']['html_head_link'][] = array(
               array(
    

    If I convert the new stuff to short arrays then I'd have to convert the entire file and end up with a tonne of out-of-scope changes.

Crell’s picture

Then I think a follow-up to generalize is appropriate. If we add that then I think this is RTBC.

alexpott’s picture

Crell’s picture

Status: Needs review » Reviewed & tested by the community

Then I think we're good here. Thanks, Alex!

xjm’s picture

This issue makes internal API additions in the service of fixing a major bug. That's acceptable given impact of fixing the bug vs. the small chance for disruption for something overriding the NodeViewController, but if we do backport this to 8.1.x we should mention it in the release notes and maybe have a CR?

I do not believe the issue is critical, though.

DamienMcKenna’s picture

Should it be punted to 8.2?

alexpott’s picture

On the criticality or not of this issue. @Berdir has been running his earlier patch to prevent performance issues on his large D8 site and I'm seeing an unbelieveable about of unnecessary traffic from Google on another site. There's no work around and this has a real affect on the load on your server if google decides to index your site. I think we need a new tag of 'berdir-has-this-applied-in-production' because for me real world issues trump some of our more theoretical issues.

alexpott’s picture

sylus’s picture

Thanks for the work everyone, works great and most w3c issues are gone. I did notice that the following still remains though:

Bad value revision for attribute rel on element link: The string revision is not a registered keyword.

<link rel="revision" href="/node/1" />

Shouldn't the above link rel have been affected as well to not appear?

alexpott’s picture

@sylus nope - the anonymous user has access to that. We've not definitively determined what really are valid link relationships - see earlier discussion in the issue.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 92: 2406533-92.patch, failed testing.

alexpott’s picture

Status: Needs work » Reviewed & tested by the community

Random DrupalCI error before the patch was even tested...

catch’s picture

+++ b/core/modules/node/src/Controller/NodeViewController.php
@@ -11,33 +15,78 @@
+      $url = $node->toUrl($rel);
+      // Add link relationships if the user is authenticated or if the anonymous
+      // user has access. This means that the page will vary by
+      // user.permissions. We also rely on the access checking fallback to
+      // ensure the correct cacheability metadata if we have to check access.
+      if ($this->currentUser->isAuthenticated() || $url->access($this->currentUser)) {

This could use an explanation as to why we do the anon vs. auth trick. That was my suggestion and I don't think core uses it anywhere, I've used that pattern or similar as a last resort on a client project iirc but haven't seen it around much.

Something like:

Access checking must be done for anonymous users to avoid traffic to inaccessible pages from web crawlers. For authenticated users, showing the links in HTML head does not impact user experience or security, since the routes are access checked when visited and only visible via view source. This prevents doing potentially expensive and hard to cache access checks on every request.

alexpott’s picture

That comment looks very clear to me.

dawehner’s picture

+++ b/core/modules/node/src/Controller/NodeViewController.php
@@ -11,33 +15,78 @@
+      if ($this->currentUser->isAuthenticated() || $url->access($this->currentUser)) {

I don't see where $url->access() ads the required cacheable metadata.

dawehner’s picture

  • catch committed 182f031 on 8.3.x
    Issue #2406533 by alexpott, willzyx, olli, Berdir, DamienMcKenna, Dom.,...

  • catch committed 1dc8309 on 8.2.x
    Issue #2406533 by alexpott, willzyx, olli, Berdir, DamienMcKenna, Dom.,...
Berdir’s picture

@dawehner: It doesn't, but since we only do this for anymous users, it doesn't really vary for them anyway? And the user.permissions cache context is added by default.

dawehner’s picture

Fair point

  • catch committed a2c42b9 on 8.1.x
    Issue #2406533 by alexpott, willzyx, olli, Berdir, DamienMcKenna, Dom.,...
catch’s picture

Status: Reviewed & tested by the community » Fixed
Issue tags: +Performance

Committed/pushed to all three 8.x branches. Agreed this is a severe bug, and it's likely causing performance issues on sites that haven't diagnosed it yet.

Wim Leers’s picture

+++ b/core/modules/node/src/Controller/NodeViewController.php
@@ -11,33 +15,78 @@
+      // user has access. This means that the page will vary by
+      // user.permissions. We also rely on the access checking fallback to
...
+    $build['#cache']['contexts'][] = 'user.roles:anonymous';

We document user.permissions, yet we add user.roles:anonymous. We actually need both, for correctness. In practice, user.permissions is added automatically, so we're safe. It's just a nit that should be fixed in a follow-up.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

m-p-j’s picture

I've noticed same thing is still happening on /taxonomy/term/ pages.
Running a search it might be happening in taxonomy_page_attachments_alter().

ksi’s picture

And so do I get still that error with a article node page with drupal version 8.3.6 running validator from validator.w3.org:

Error: Bad value revision for attribute rel on element link: The string revision is not a registered keyword.

The line in question ist
<link rel="revision" href="/article/some-headline" />

keszthelyi’s picture

I can confirm both #116 and #117 on Drupal 8.3.7

I think this issue should be opened again.

keszthelyi’s picture

I can confirm both #116 and #117 on Drupal 8.3.7

I think this issue should be re-opened.

plach’s picture

Wim Leers’s picture

Anybody’s picture

Important major follow-up for taxonomy terms and perhaps other entities:
#2821635: edit-form, delete-form etc. <link> tags added on /taxonomy/term/{taxonomy_term} are invalid according to W3C Validator

It would be nice if the people who fixed this bug could have a short look there to decide how to proceed and perhaps port the patch?