When viewing a node's page in Drupal, the <link rel="canonical" ...> and <link rel="shortlink" ...> tags are included to assist with SEO for pages that have multiple URLs. See the node_page_view() function for an example.

When looking at a node via the URL provided by node_symlinks, these tags are not present. It would be helpful to have these pointing to the base URL for the original node. Regardless of whether you view the node via node/1 (or the node's url alias), or whether you view it via node/1/mid/2 (or the symlink's alias) the canonical and shortlink tags should be the same.

Comments

simon georges’s picture

I don't think the node page should have a canonical URL, I think only the symlinks pages should have it. Do you have some kind of Google FAQ explaining the process?

erik.erskine’s picture

Status: Active » Needs review
File status: new, size: 665 bytes

Patch attached - this changes the nodesymlinks_page() menu callback to add the two links in the same way that node_page_view() does.

erik.erskine’s picture

Simon - thanks for the quick response!

As for an explanation on this, please see https://support.google.com/webmasters/answer/139394?hl=en

Core does provide the canonical url when you view a node. I'm guessing the symlink pages should include the same tag, so that the symlink page (and url) is properly treated as a copy of the original node.

simon georges’s picture

Why is there a difference between the canonical and the shortlink? Shouldn't both be the same?

erik.erskine’s picture

If available, the url alias is used for the canonical link (this is after all the one you want search engines to index), and the unaliased node/1 form for the shortlink. If the node doesn't have a url alias, then both are the same.

simon georges’s picture

I'm wondering if Google (or other search engine bots) will follow the node/$nid link and count it as duplicate content of the aliased node, therefore forcing us to install "global redirect" as well. What do you think?

erik.erskine’s picture

The canonical tag should be enough for the search engine to know that something is duplicate content, and crucially, what the "real" indexable url is.

I don't think search engines will follow the shortlink - it's intended as an alternative to URL shorteners. Even if they do, I'm not sure it's a problem - as long as the page that's reached by following the shortlink has a proper canonical URL, we should be ok. Core already exposes node/$nid this way, so we are not doing anything new here.

Having thought about it a bit more I think the shortlink shouldn't take you to the original node. As it's supposed to be a shorter alternative url then it should probably be the unaliased version of the symlink path. I've done a new patch to reflect this.
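
To make the rule concrete, here is a rough model of it in Python. This is only a sketch of the behaviour described above, not the patch and not Drupal code: the alias table, the example.com base URL and the function names absolute_url() and head_links() are all made up for illustration.

```python
# Hypothetical alias table: internal path -> URL alias (made up for illustration).
ALIASES = {
    "node/1": "about-us",
    "node/1/mid/2": "products/about-us",
}
BASE_URL = "https://example.com"  # assumed site base URL


def absolute_url(path, use_alias=True):
    """Build an absolute URL, optionally swapping in the path's alias."""
    if use_alias:
        path = ALIASES.get(path, path)
    return f"{BASE_URL}/{path}"


def head_links(nid, mid=None):
    """Return the canonical and shortlink hrefs for a node or symlink page."""
    node_path = f"node/{nid}"
    # The page actually being viewed: the node itself, or one of its symlinks.
    current_path = node_path if mid is None else f"{node_path}/mid/{mid}"
    return {
        # Canonical always points at the original node, aliased if possible.
        "canonical": absolute_url(node_path),
        # Shortlink is the unaliased form of the page being viewed.
        "shortlink": absolute_url(current_path, use_alias=False),
    }
```

With this model, head_links(1) and head_links(1, mid=2) share the same canonical URL, while the shortlink follows whichever page is being viewed.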

simon georges’s picture

Status: Needs review » Fixed

You've convinced me ;-)
Committed, thanks!

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Sethie’s picture

Issue summary: View changes
Status: Closed (fixed) » Active

Sorry for digging up this issue, but wouldn't it be better if the patch also contained code to remove the robots metatag? I don't see the point in having the canonical, shortlink and noindex metatags all on the same page.

simon georges’s picture

There could be no point, but better safe than sorry; I don't know how every search engine implements canonicals, so...

Sethie’s picture

Thanks for the quick reply!
But maybe these rules are conflicting? I have no idea how search engines index a site, but I can imagine a scenario where, if they encounter a noindex, they stop scanning the page, rendering the canonical and shortlink useless because they never get scanned/indexed.

Anyway, this blog post confirms that all 3 major search engines (Google, Bing and Yahoo) use canonicals now:
http://searchengineland.com/canonical-tag-16537

So I think it's ok to remove the noindex because canonicals have completely taken over the goal of using noindex.
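
For anyone who wants to check their own symlink pages for this combination, here is a small Python sketch using only the standard library. It just reports whether a page carries both a canonical link and a noindex robots directive; it says nothing about how any search engine actually handles that mix. The names HeadTagScanner and has_mixed_signals() are made up for illustration.

```python
from html.parser import HTMLParser


class HeadTagScanner(HTMLParser):
    """Collect the canonical href and robots directives from a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # Directives are comma separated, e.g. "noindex, follow".
            self.robots.update(part.strip().lower() for part in content.split(","))


def has_mixed_signals(html):
    """Return True if the page has both a canonical link and a noindex directive."""
    scanner = HeadTagScanner()
    scanner.feed(html)
    return scanner.canonical is not None and "noindex" in scanner.robots
```

Feed it the HTML source of a symlink page; a True result means the page still sends both signals.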

  • Commit 44666e8 on 7.x-1.x by Simon Georges:
    Issue #2042953 by Sethie: Don't mix canonical and noindex -> remove...
simon georges’s picture

Status: Active » Fixed

Alright. I removed the "noindex" robots metatag.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

mandus.cz’s picture

The module does not generate a canonical URL if the Metatag module is installed.

efruin’s picture

I can confirm that #16 is correct. When the Metatag module is installed, it overrides the canonical tag using the token [current-page:url:absolute], which grabs the URL of the symlink page rather than the original node, and that is not the right thing. I tried going into the Metatag settings to use a token that generates the absolute URL of the original page instead, but no such token was in the list of available tokens, so I'm guessing it doesn't exist. To fix this issue, one or more tokens would need to be created that generate the URL of the original page, so they can be put in the Metatag configuration and the proper URL is inserted. I know there are some tokens available to Pathauto for generating stuff for the symlinks, but they don't seem to be exposed in a way that Metatag can grab them. I tried copying the ones from the Pathauto config page, but they were rejected. I'm not much of a coder, so I'm not really sure where to begin on making this happen, but hopefully it would be easy for a more experienced dev to handle.

I thought a workaround for this would be to just enter the proper URL in the Metatag settings for each page manually, but this didn't work. It still uses the incorrect URL for the symlinked page.

ecvandenberg’s picture

By coincidence I ran into this very issue. #17 describes the problem correctly. But I don't understand #16. That would let Metatag define the wrong canonical link.

I tried the latest dev (16 Apr 2015) and this issue still exists when the Metatag module is used. With Metatag disabled, the canonical link is fine. So this issue is about the combination of NodeSymlink and Metatag.

There is an advanced setting in Metatag called "Output meta tags even if only global settings apply". If you disable this, then there are no metatags generated on the symlink (duplicate) pages. Even the canonical link from the module NodeSymlink is removed.

Obviously the two modules hate each other...

ecvandenberg’s picture