The canonical link tag is now available to avoid duplicate content.

There is an unofficial patched version released which you can find here.

I took this patched version out for a spin. I think it would be nice to implement this in a future release. You should be able to mark one alias as the preferred URL, though (in case multiple aliases point to one node).

Comment   File              Size      Author
#4        canonical.patch   1.85 KB   RobLoach

Comments

nicholasThompson’s picture

It's going to be in a future release and I've been speaking with Sotak himself about this :)

We're Already On it :)

RobLoach’s picture

More information about canonical links is available from the Google Webmaster Central Blog: Specify your canonical.

I'm not sure how I feel about the unofficial patched version. It cancels the 301 redirect, and just sticks in the canonical link instead. Shouldn't it do both?

nicholasThompson’s picture

Couldn't agree more. 301s to tidy up stuff like excess slashes, or a system URL being used when an alias exists... And then a canonical tag on the page, which clarifies whether or not things like the query string should be taken into account.

One issue we will need to look into is pages with pagers on them.. Especially panels with multiple pagers!!
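
To make the "do both" idea concrete, here is a minimal, language-neutral sketch in Python (not the module's actual PHP). It assumes a hypothetical alias table and hypothetical helper names `tidy_path` and `respond`: a path that needs tidying gets a 301, and a path that is already tidy is served with its canonical path advertised.

```python
import re
from urllib.parse import urlsplit

# Hypothetical alias table (an assumption): system path -> preferred alias.
ALIASES = {"/node/23": "/about-us"}


def tidy_path(path):
    """Collapse repeated slashes, drop a trailing slash, and prefer an alias."""
    cleaned = re.sub(r"/{2,}", "/", path)
    if len(cleaned) > 1:
        cleaned = cleaned.rstrip("/")
    return ALIASES.get(cleaned, cleaned)


def respond(url):
    """Return (301, location) when the path needs tidying, else (200, canonical path)."""
    parts = urlsplit(url)
    target = tidy_path(parts.path)
    if target != parts.path:
        # The path itself is wrong: redirect, keeping the query string intact.
        location = target + ("?" + parts.query if parts.query else "")
        return 301, location
    # The path is already tidy: serve the page and advertise the canonical path.
    return 200, target
```

In this sketch the query string never triggers a redirect; deciding which parameters belong in the canonical URL is left to the canonical tag.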

RobLoach’s picture

Status: Active » Needs review
Attached file: canonical.patch (1.85 KB)
no2e’s picture

subscribed

OnkelTem’s picture

Please please please... patch for 5.x :)

JStarcher’s picture

Rob, what are those patches?

OnkelTem’s picture

Dave Reid’s picture

Interesting new feature. Subscribing!

nicholasThompson’s picture

RobLoach - in your patch in #4, what happens if you're on (for example) a paged taxonomy term page? So you'd have something like:
http://example.com/taxonomy/term/123
http://example.com/taxonomy/term/123?page=1

Does the url() function also pick up the arguments? What about other irrelevant arguments... e.g.:
http://example.com/taxonomy/term/123?page=1&flibble=blooblah

How do we know "flibble" is irrelevant? Is it?

Also.... multilingual sites... Does it work in this situation?

rickvug’s picture

Subscribing. Also, wanted to note that this patch got a shout out in our blog at http://imagexmedia.com/blog/2009/2/what-canonical-url-module-drupal-help....

askibinski’s picture

@#10
the unofficial patch doesn't work in multilingual sites. Maybe the patch in #4 does, but I haven't tested that one.

@#2
No, it shouldn't do both. Canonical is simply another way to tell search engines about duplicate content. And there could be circumstances where you would want to have two pages with the same content but a slightly different header, for example.

gthing’s picture

Version: 6.x-1.x-dev » 5.x-1.x-dev

Second for 5.x version!

bejam’s picture

subscribing

jwuk’s picture

Interesting. Sub'ing. Thanks for acting on this so quickly.

open-keywords’s picture

I would suggest a strategy:
When we clearly know which URL is the canonical URL for the given request, we insert the tag in the HEAD with the proper canonical URL. Fine.

When we are not sure of the very best canonical URL, we do NOT insert this LINK REL tag in the HEAD, and let the crawler deal with it as it has until now: trying to figure it out by itself.

This way we don't take the risk of misleading the crawler, but for every URL without a risk, we can already help the crawler.

What do you think ?

Thanks for your attention
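
A minimal sketch of this strategy, in Python rather than the module's PHP. The rule used for "clearly know" is only an assumption for illustration (no query string means we are sure), and `canonical_tag` is a hypothetical name:

```python
from html import escape
from typing import Optional
from urllib.parse import parse_qsl, urlsplit


def canonical_tag(url: str) -> Optional[str]:
    """Return a canonical <link> tag, or None when the right target is unclear."""
    parts = urlsplit(url)
    if parse_qsl(parts.query, keep_blank_values=True):
        # Assumption: a query string may select different content,
        # so emit nothing and leave the decision to the crawler.
        return None
    # No query string: the URL without its fragment is taken as canonical.
    href = f"{parts.scheme}://{parts.netloc}{parts.path or '/'}"
    return f'<link rel="canonical" href="{escape(href)}" />'
```

Returning None rather than a guess is the point: the caller simply adds nothing to the HEAD in that case.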

open-keywords’s picture

some further details about when and how to use this recommendation (google crawling official blog)

http://googlewebmastercentral.blogspot.com/2009/02/canonical-link-elemen...

open-keywords’s picture

We should take care of using the exact same URL for a node as in the one being used in the XML / Google sitemap module.
It is key that the 2 systems recommend the same URL for the same content/object/node.
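
One way to check this could look like the sketch below (Python, for illustration only). Both dictionaries are hypothetical inputs mapping a node id to the URL each system recommends; how they would be collected from the two modules is not covered here.

```python
def find_mismatches(canonical_by_node, sitemap_by_node):
    """Map node id -> (canonical URL, sitemap URL) wherever the two disagree."""
    shared = canonical_by_node.keys() & sitemap_by_node.keys()
    return {
        nid: (canonical_by_node[nid], sitemap_by_node[nid])
        for nid in shared
        if canonical_by_node[nid] != sitemap_by_node[nid]
    }
```

An empty result means both systems recommend the same URL for every node they have in common.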

open-keywords’s picture

See http://drupal.org/node/389380 for tracking consistency issues with XMLSITEMAP

natrio’s picture

Subscribing...any new updates on the patch?

fiLi’s picture

Subscribing. Interested in this.

dkruglyak’s picture

Version: 5.x-1.x-dev » 6.x-1.x-dev

I think the patch could be committed even in the current form. Given that writing out a canonical tag is merely an option, it would do no harm.

FYI, a similar feature has been committed to the Meta Tags module: http://drupal.org/node/374049

We should consider if there is any conflict / problem setting the tag in two places.

sedmi’s picture

Is there a valid patch for Drupal 6?

I applied patch #4 and it didn't add a field where the canonical URL can be entered. It takes the URL specified under the URL alias and places it in the canonical tag. So the result is (if you use Global Redirect for URL aliases) that the page specifies its own URL as canonical.

EvanDonovan’s picture

Subscribing. I tested the patch in #4 and it is working great for me. I think this could be committed to the module "as is". It works for stripping off query strings, etc., but it wouldn't work for the use cases that the Canonical module or the Nodewords feature cover (where you specify the canonical tag on a page-by-page basis).

That's fine with me though. I don't need or want that kind of granular control. If I wanted it at all, I'd like it as an override, having an interface similar to URL aliases or URL redirects module. But that should be a follow up patch, I think, after this gets committed.

RobLoach’s picture

Status: Needs review » Closed (works as designed)

Let's move this discussion over to the issue queue in http://drupal.org/project/canonical_url and improve that instead of having druplication going on here.

EvanDonovan’s picture

Status: Closed (works as designed) » Active

That sounds good. But does that module insert the tag automatically, or do you have to add it for each node?

(Feel free to set back to "by design" later. I just wanted to make sure this reply would show up in people's issue tracker first.)

Dave Reid’s picture

Well, since that module is now deprecated in favor of nodewords, I think we should consider adding this feature as a 'lightweight' alternative to nodewords. It's a pretty heavy module package.

EvanDonovan’s picture

I think that would be a great idea. Nodewords has way more than most people might want.

not_Dries_Buytaert’s picture

Global Redirect (isn't deprecated and) redirects similar links to one and the same (canonical) link.
Running another module for just such a basic feature costs a lot of initial and repeated effort (downloading, installing, configuring and testing) and negatively affects response times too. So, please DO add the feature to the Global Redirect module.

nicholasThompson’s picture

Status: Active » Patch (to be ported)

The patch from #5 has been applied to 6.x-1.x-dev.

Needs porting to 7.x and 5.x

nicholasThompson’s picture

Actually.. Needs removing for 7.x... Looks like D7 already does it!
http://api.drupal.org/api/function/node_page_view/7

nicholasThompson’s picture

Version: 6.x-1.x-dev » 7.x-1.x-dev
Status: Patch (to be ported) » Needs work

Actually, only Node Pages do it... So I've left the code in and a check is needed.

bdunwood’s picture

Current behavior with D7 is that if you turn on the canonical tag setting in Global Redirect you get two canonical tags in the head of node pages and one of these tags in other pages (supplied by Global Redirect). I guess this is the check that still needs to be implemented.

Separately, the canonical tag from D7 core is site root relative, while the one from Global Redirect is absolute. Either is acceptable, from what I read, but it seems odd that Global Redirect is not consistent with D7 core. It might make sense to bring Global Redirect in line with core, or vice versa.

Bence’s picture

I can confirm this! Drupal 7 core generates canonical tags for nodes, see: http://api.drupal.org/api/drupal/modules--node--node.module/function/nod...

But only for nodes. So Global Redirect 1) must not generate canonical tags for nodes, 2) must generate canonical tags only for non-node URLs, and 3) should use the relative path in the canonical tag (like /node/2, NOT http://example.com/node/2), because this is the standard used in Drupal core.

And the canonical tag is missing from the front page! And from the ?page=x URLs as well.
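
The three rules could be sketched roughly like this (Python for illustration, not the module's PHP). The `/node/<id>` pattern and the name `module_canonical` are assumptions made for the example:

```python
import re
from typing import Optional

# Assumption: node pages are recognised by their system path, /node/<id>.
NODE_PATH = re.compile(r"^/node/\d+$")


def module_canonical(system_path: str, alias: Optional[str] = None) -> Optional[str]:
    """Return a root-relative canonical tag for non-node pages, None for node pages."""
    if NODE_PATH.match(system_path):
        # Rule 1: core already emits the tag for nodes, so emit nothing here.
        return None
    # Rules 2 and 3: non-node page, root-relative href, alias preferred if present.
    href = alias or system_path
    return f'<link rel="canonical" href="{href}" />'
```

The front page and ?page=x cases are deliberately left out of this sketch.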

Bence’s picture

Another problem with the canonical tag: it points to the wrong URL, when there is a parameter in the URL, like this:

http://example.com/node/23?gclid=CJyDgIrL0o8CFR0oTAodTR6FCg

In this case, the canonical tag must look like this:

<link rel="canonical" href="http://example.com/node/23" />

But currently the canonical tag has the parameter, which is wrong:

<link rel="canonical" href="http://example.com/node/23?gclid=CJyDgIrL0o8CFR0oTAodTR6FCg" />

The gclid is the Google Adwords tracking parameter, so this is a real world example.

However, be careful, because Drupal uses the page parameter on paginated content, like the front page:

http://example.com/?page=3

In this case, the canonical tag must point to http://example.com/?page=3, not to http://example.com/.
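
A small sketch of that behaviour, in Python for illustration. The allowlist containing only `page` is an assumption (other parameters may change content too), and `canonical_url` is a hypothetical name:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only these query parameters select different content.
KEEP_PARAMS = {"page"}


def canonical_url(url):
    """Drop every query parameter not in KEEP_PARAMS, and drop any fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in KEEP_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

An allowlist is used instead of a blocklist of tracking parameters because unknown parameters then default to being stripped.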

EvanDonovan’s picture

@Bence: I think there are more complexities to this, though, since there are lots of cases potentially where a query string indicates a different set of content. Faceted Search and core Search are just two examples.

Bence’s picture

Search URLs are blocked via the default robots.txt file in Drupal 7. (So maybe the module should check whether the URL is blocked from search engines, or not?) Any other examples where a query string is important? I only know of ?page=x style parameters which are different content.

bdunwood’s picture

Just a little nudge on this issue. Right now, if you use Bence's example, a node-based page can have two conflicting canonical tags, like this:

<link rel="canonical" href="/content/simple-article-test-1" />
<meta name="Generator" content="Drupal 7 (http://drupal.org)" />
<link rel="canonical" href="http://d7/content/simple-article-test-1?gclid=CJyDgIrL0o8CFR0oTAodTR6FCg" />

It seems like it might be a pretty high priority to resolve this conflicting tag issue. For the moment I don't think anyone would want to run Global Redirect's canonical functionality in a production website.
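
For anyone who wants to check their own pages for this, here is a minimal sketch using Python's standard `html.parser`. The class and function names are hypothetical; it only collects the `href` of every canonical link tag in a piece of markup.

```python
from html.parser import HTMLParser


class CanonicalCollector(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also arrive here by default.
        attr = dict(attrs)
        if tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            self.hrefs.append(attr.get("href"))


def canonical_hrefs(html):
    """Return all canonical hrefs found in the given markup, in document order."""
    collector = CanonicalCollector()
    collector.feed(html)
    return collector.hrefs
```

More than one entry in the returned list, or entries that differ, indicates the conflict described above.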

silkogelman’s picture

@Bence
Another case is internal anchor links like /node/1#bottom (canonical: /node/1), but that seems to be working fine.
?page=x style parameters are definitely important.
A common example would be tagging URLs for Google Analytics tracking:
http://example.com/node/1?utm_source=drupal&utm_medium=cpc&utm_content=2...

URLs can be generated here:
http://www.google.com/support/analyticshelp/bin/answer.py?answer=1033867

wizonesolutions’s picture

Status: Needs work » Postponed (maintainer needs more info)

Doesn't the Meta Tags module take care of canonical URLs now? Do we still need this functionality in Global Redirect?

bburg’s picture

Just an observation on canonical URL tags with the Front page redirect handler option in Global Redirect (maybe this should be a separate issue?). If the front page at the site root provides a canonical tag for /home, but Global Redirect (or any redirection, for that matter) sends a 301 to /, then this leads to a confusing discrepancy, as the intent of a canonical tag is to convey the preferred URL of the homepage. This is mainly an issue for bot traffic, which will attempt to request the homepage defined by the canonical URL.
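
A rough way to detect this kind of discrepancy, sketched in Python with a hypothetical redirect table (path to 301 target): follow the redirects from the canonical path and see whether you end up where you started.

```python
def final_target(path, redirects):
    """Follow 301s from path until no redirect applies (or a loop is detected)."""
    seen = set()
    while path in redirects and path not in seen:
        seen.add(path)
        path = redirects[path]
    return path


def canonical_is_stable(canonical_path, redirects):
    """True when requesting the canonical path does not redirect elsewhere."""
    return final_target(canonical_path, redirects) == canonical_path
```

With a redirect from /home to /, a canonical tag pointing at /home would fail this check, while one pointing at / would pass.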