Can you add support for the link tag with rel="canonical"? If you need more info on how it works - http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonica...

Basically I could see this working by having a textbox that accepts the source URL. If that URL is filled in then the
element is created in the <HEAD> of the node otherwise nothing is created for the node.

Here's an example of how this tag will be beneficial both in Drupal and for site navigation.
example.com/jobs (Corporate Theme)
example.com/jobs/cityname (Corporate Theme)
example.com/cityname (City Theme)
example.com/cityname/jobs (City Theme) <- link rel="canonical" added here to point to example.com/jobs/cityname

This lets you keep the city theme and the city secondary navigation/breadcrumbs without suffering from duplicate content penalties.

Comments

Open Social

Check out this Drupal module from Joost de Valk:

http://yoast.com/canonical-url-links/

BradleyT

I did check it out. All it does is put the tag on pages like example.com/node/31, pointing to example.com/this-is-my-real-url. That is a good idea, but it isn't a solution to what I was asking for.

However, by looking at his code I see that adding the link to the header can be accomplished in just one line of code, so I might write this myself once I get further into my Developing Modules for Drupal 6 book and figure out how to add the textbox to the node/CCK creation pages.

wflorian

+subscribe

no2e

subscribed

natrio

I added a quick line to node.tpl.php using the code below:

	<?php drupal_set_html_head('<link rel="canonical" href="'.url('node/' . $node->nid, array('absolute' => TRUE)).'" />') ?>

It seems to work, at least for nodes. Is there any side effect of using this code to generate the canonical link URL?

BradleyT

natrio,

That is basically what Joost's module does. If Google happens to see example.com/node/215, then your tag tells it the content is really from example.com/the-best-page-on-my-site, which is good and solves a problem a lot of websites have.

However, there are some situations where both /node/215 and /the-best-page-on-my-site need to point the canonical link to /tribute.

JeremyL

I agree, this is a tag that goes in the head and should be supported by the meta tags module. It should also be editable so we can set it to a custom location, not just what the path alias says it should be.

apaderno

Version: master » 6.x-1.x-dev

I am going to implement this in the version 6.x-1.x-dev.

apaderno

Title: LINK Tag Rel="canonical" » Add the support for <link rel="canonical">
Status: Active » Fixed

This has been implemented in version 6.x-1.x-dev.

sedmi

I moved to 6.x-1.x-dev to be able to use the canonical URL, but I don't like the way it works. It outputs a canonical URL by default, and if none is specified while editing a node, the canonical URL is the front page. If I don't want that, I have to edit every single existing page and specify that page's own URL as canonical (on page123 the canonical URL should be page123, to cancel the default canonical URL of the homepage).

The good behavior would be:
- when a canonical URL is specified, it is output in the head
- if a canonical URL is not specified, the canonical meta tag is not present in the output

apaderno

Status: Fixed » Active

I guess that is caused by the fact that an empty string is passed to url(), which returns the absolute URL of the front page.
That is not the desired behavior, as an empty string should be interpreted as no canonical URL; the front page can still be specified explicitly by using <front> (or something similar that is correctly interpreted by the function).
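
A minimal sketch of that guard, not the module's actual code. Both function names and the example.com base URL are made up for illustration, and sketch_url() is only a stand-in that imitates how url() treats an empty path:

```php
<?php
// Hypothetical stand-in for Drupal's url(): an empty path or '<front>'
// both map to the site root. The base URL is an assumption.
function sketch_url($path, $base = 'http://example.com/') {
  if ($path === '' || $path === '<front>') {
    return $base;
  }
  return $base . ltrim($path, '/');
}

// Build the canonical tag, or return an empty string when no canonical
// URL was entered, so that an empty value never reaches sketch_url().
function sketch_canonical_tag($canonical) {
  $canonical = trim($canonical);
  if ($canonical === '') {
    return '';
  }
  return '<link rel="canonical" href="' . htmlspecialchars(sketch_url($canonical)) . '" />';
}
```

With this shape, an empty textbox produces no tag at all, while entering <front> still gives a tag that points to the front page.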

asak

+1

BradleyT

Issue tags: +canonical

I said this in the original request -

Basically I could see this working by having a textbox that accepts the source URL. If that URL is filled in then the element is created in the <HEAD> of the node otherwise nothing is created for the node.

I now think that if nothing is entered, the default behavior should be for the page to point to itself. When I posted this request I didn't know whether pointing to itself would hurt a page, but it's been written that it does not.

So if I create example.com/new-node and don't enter anything here's what we end up with -

example.com/new-node
example.com/new-node/
example.com/node/315
All point to -> example.com/new-node

apaderno

If the canonical URL is equal to the actual URL, there is no reason to add the canonical meta tag.

The problem in the code is that it passes the empty string to the url() function, which returns the URL for the front page; that is not desired, and it must be changed. Once I am able to commit the code I changed, the problem will be fixed.

apaderno

Status: Active » Fixed

I committed the changed code; if the canonical URL is an empty string, it will not be passed to url(), so the front page URL is no longer returned.

I think the feature request can be set to fixed, now. Thanks to everyone for making me notice the code problem.

sedmi

Status: Fixed » Active

I tried it now. There is nothing to select under "Tags to show on edit form" and under "Tags to output in html head".

apaderno

Status: Active » Fixed

Actually, there is nothing under 'Global meta tags' either. It seems something got broken in my last commit.
I must verify that commit, as I made it from a different PC, and this could have caused some problems.

Anyway, this is a different issue. I will open a new bug report.

apaderno

See #497580: Settings page doesn't list any meta tag; the problem was caused by using a wrong name for the hook function that populates the meta tags list. I fixed the code and committed it to CVS. You have to wait until 12:00 PM GMT for the new tarball archive.

Thanks for your report.

dkruglyak

Status: Fixed » Active

Thanks for fixing the "empty canonical" issue, but I am not sure dropping the tag completely is the best approach.

Why not have default behavior return the page's URL, cleaned up and stripped of junk, like extra parameters (e.g. page, destination, etc)?

In most cases the canonical URL can and should be generated automagically.

apaderno

Status: Active » Postponed (maintainer needs more info)

I am not sure it's possible to use a default value that is valid for all the cases.
What do you think the default canonical URL used for nodes, taxonomy terms, views, panels, and the front page should be (one default value for each of the page types)?

dkruglyak

Status: Postponed (maintainer needs more info) » Needs work

I think we could have a special option to set "canonical default" with several possibilities:

1) Do not output the tag (existing behavior)
2) Simply cut off ALL query parameters; that would solve 90% of the problems with indexing pages like mydomain.com/my_node_list?page=15
3) Set up more elaborate options per entity type, whatever we figure out they should be. Maybe configurable with regexp or something...

Just adding Option 2 to Option 1, which you already implemented, should be a huge leap forward.
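
Option 2 could be sketched roughly as below. This is only an illustration built on PHP's parse_url(); the function name is made up, and it assumes clean URLs are enabled, since with paths like /?q=node/237 cutting off the query string would also cut off the page itself:

```php
<?php
// Drop the query string and fragment from a URL, keeping the scheme,
// host, port and path when they are present.
// Assumes clean URLs (the page path is not carried in ?q=...).
function sketch_strip_query($url) {
  $parts = parse_url($url);
  if ($parts === FALSE) {
    // Seriously malformed URL: leave it untouched.
    return $url;
  }
  $result = '';
  if (isset($parts['scheme'])) {
    $result .= $parts['scheme'] . '://';
  }
  if (isset($parts['host'])) {
    $result .= $parts['host'];
  }
  if (isset($parts['port'])) {
    $result .= ':' . $parts['port'];
  }
  $result .= isset($parts['path']) ? $parts['path'] : '/';
  return $result;
}
```

For example, sketch_strip_query('http://mydomain.com/my_node_list?page=15') gives 'http://mydomain.com/my_node_list'.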

apaderno

Status: Needs work » Postponed (maintainer needs more info)

There is also the possibility to use the path alias as canonical URL; this could be an option more.

BradleyT

"There is also the possibility to use the path alias as canonical URL; this could be an option more."
That should be the default behavior IMO. Or if not default then definitely an option.

If you create /new-node think of all the ways that page can be accessed -
/new-node
/new-node/
/node/237
/?q=node/237

All of those need to be pointing to /new-node - that's the whole point of this tag/element.

apaderno

Title: Add the support for <link rel="canonical"> » Add a default value for the node canonical URL

My idea is to follow this order:

  • Verify if there is a path alias; if there is, use it as default value for the canonical URL.
  • If there isn't a path alias, remove any arguments passed on the URL, and use the resulting URL as canonical URL.

The procedure is followed for all the nodes for which there isn't a canonical URL set by the user; when the user explicitly sets the canonical URL, that value will be used instead.

The reported cases are automatically handled by Drupal, which will display the node with ID 237 in all the four cases (supposing that "new-node" is the path alias for the node with ID equal to 237). It's then enough to set the canonical URL for that node.
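
The order described above could look roughly like the following sketch. It is not the committed code: the function name is made up, the alias table is a plain array standing in for Drupal's path alias lookup, and NULL is assumed to mean "the user did not set a canonical URL":

```php
<?php
// Pick the canonical URL for a page.
//   $user_value     canonical URL entered by the user, or NULL if not set
//   $requested_path internal path as requested, possibly with arguments
//   $aliases        hypothetical map of internal path => path alias
function sketch_default_canonical($user_value, $requested_path, array $aliases) {
  // A value set explicitly by the user always wins.
  if ($user_value !== NULL) {
    return $user_value;
  }
  // Remove any arguments passed on the URL.
  $parts = explode('?', $requested_path, 2);
  $path = $parts[0];
  // Use the path alias when there is one, else the bare path.
  return isset($aliases[$path]) ? $aliases[$path] : $path;
}
```

With array('node/237' => 'new-node') as the alias table, a request for node/237 resolves to new-node, while a user-entered value such as tribute is returned unchanged.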

apaderno

Status: Postponed (maintainer needs more info) » Fixed

The code to generate a default canonical URL for user profile pages, nodes, taxonomy terms, and the front page has been added.
It's not possible to generate a default canonical URL for other cases because the module doesn't have settings for a generic path.

I am setting this report as fixed. If you need to have a settings page for a generic path, see #236833: Add more settings pages for the global meta tags.

dkruglyak

Status: Fixed » Needs review

Have you tested this with frontpage? I briefly looked at the code and do not think <front> will resolve to the correct path.

Confirmed by running dpm(drupal_get_path_alias('<front>'));.

apaderno

Status: Needs review » Fixed

The code does not perform a drupal_get_path_alias('<front>'). The executed code is the following:

      // ...
      case 'page':
        if (count($ids) == 1 && $ids[0] == '') {
          $value = '<front>';
        }
        break;
      // ...

The code does not change $path, in that case, but $value, which is the value used for the meta tags, and that is passed to url() (see the following code).

function nodewords_canonical_alter(&$tags, $type, $ids, $settings) {
  if (isset($tags['canonical'])) {
    $canonical_url = $tags['canonical'];
    
    if ($canonical_url[0] == '/') {
      $canonical_url = drupal_substr($canonical_url, 1);
    }
    
    $tags['canonical'] = !empty($canonical_url) ? check_url(url($canonical_url, array('absolute' => TRUE))) : '';
  }
}

In some Drupal installations, url('<front>') could return a wrong URL for the front page (or a value that is not desirable for that purpose); in that case, the administrator user can always set a different value, and the module will not overwrite the value set by the user. In fact, the first code snippet I reported is executed only if there isn't a value already set for the meta tag.

function nodewords_canonical_prepare($type, $ids, $value, $settings) {
  if ($value == '<none>') {
    $path = '';
    
    switch ($type) {
      case 'node':
        // ...

'<none>' is the value set when the meta tag is not defined; in contrast, when the user sets an empty string for the meta tag, the function will receive an empty string.
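
Put another way, the three cases can be told apart as in this small sketch. The function name and the $default parameter are made up for illustration; in the module the default is computed per page type inside nodewords_canonical_prepare():

```php
<?php
// Resolve the value later handed to url().
//   '<none>'        the meta tag was never defined: fall back to $default
//   ''              the user cleared the field: stays empty, no tag output
//   anything else   the user's own canonical URL, used unchanged
function sketch_resolve_canonical($value, $default) {
  if ($value === '<none>') {
    return $default;
  }
  return $value;
}
```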

apaderno

Issue tags: -canonical +6.x-1.0

I am adding a tag so I can retrieve the features added after version 6.x-1.0 has been created.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

DamienMcKenna

Thanks for adding this feature, you've basically merged in the Canonical_URL module and fixed its deficiencies in one update :-)

apaderno

Whoops... It seems I have added features that were present in three different modules. In one case, the feature was added to replace the support for Views and Panels; in the other two cases, I added the feature because there was a feature request.

DamienMcKenna

I think it fits better in this module, as it is focused on managing all META and related tags. Turning Nodewords into a one-stop shop for all custom HTML HEAD tags isn't too far-fetched to me, and it ultimately reduces the number of modules needed to run a site.

wflorian

Will the canonical URL feature be added to the D5 build of this module? I am really looking forward to having this implemented! Would be awesome!