1. Set up a site with two languages, e.g. German and English, with German as the default.
2. Create a language-neutral node.
3. View the node in German: the canonical URL is de/content/foo.
4. Switch to English: the canonical URL is en/content/foo. BUG: this is duplicate content. The canonical URL must be [default site language]/content/foo, e.g. de/content/foo.
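The rule requested in step 4 can be sketched as a plain PHP function. This is a hypothetical helper, not Drupal core or Metatag code. It assumes path-prefix language negotiation where each prefix equals the language code, and it uses 'und', Drupal 7's code for language-neutral content (LANGUAGE_NONE). The branch for language-specific nodes is an assumption added for completeness.

```php
<?php
/**
 * Hypothetical helper: returns the canonical path for a node.
 *
 * Language-neutral nodes ('und') always point at the default-language copy,
 * so de/content/foo and en/content/foo share one canonical URL.
 * Assumption: language-specific nodes use their own language prefix.
 */
function example_canonical_path($node_langcode, $default_langcode, $path) {
  if ($node_langcode === 'und') {
    // The viewing language is irrelevant; use the site default.
    return $default_langcode . '/' . $path;
  }
  return $node_langcode . '/' . $path;
}

echo example_canonical_path('und', 'de', 'content/foo'), "\n";
```

Note that the current interface language is deliberately not a parameter: the whole point of the bug report is that the canonical URL of a language-neutral node should not change when the visitor switches languages.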

Comments

drnugent

Is this specific to Meta tags? This is how Drupal core generates canonical URLs.

http://api.drupal.org/api/drupal/modules%21node%21node.module/function/n...

You can override the behavior:

http://drupal.org/node/1068562

hass

I don't know. I only know that it is completely wrong, as it does not solve the duplicate content issues it is meant to address.

drnugent

True, but that doesn't have much to do with this module. The canonical tag is generated the same way whether or not this module is enabled.

colan

Status: Active » Closed (works as designed)

hass

Status: Closed (works as designed) » Active

Bug is not fixed.

willieseabrook

I'm currently launching a multi-country site, and the issue is more complicated than just the canonical URL.

Google actually supports several different kinds of markup for different types of multilingual and multiregional websites.

See: http://googlewebmastercentral.blogspot.fr/2011/12/new-markup-for-multili...

Also:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
http://googlewebmastercentral.blogspot.fr/2010/09/unifying-content-under...
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=182192 (the bottom of this page says canonical isn't meant for multilingual sites)

DamienMcKenna

Version: 7.x-1.0-alpha5 » 7.x-1.x-dev

DamienMcKenna

Status: Active » Postponed (maintainer needs more info)

Metatag now works with Entity Translation. Please review the current functionality and let me know if the problems persist.

Kristen Pol

This is still a problem with:

Entity Translation - 7.x-1.0-beta2+17-dev (2013-01-27)
Meta Tags - 7.x-1.0-beta4+17-dev (2012-12-04)

DamienMcKenna

Please try the latest -dev release. Thanks.

q0rban

According to Google, we shouldn't use canonical on multilingual sites. Instead, each language version should get a link with rel="alternate" and hreflang set to that language. So if you have 5 languages, you'd have 5 links, one for each language. I've done that on my site with the following code:

<?php
/**
 * Implements hook_html_head_alter().
 */
function example_html_head_alter(&$elements) {
  // Unset the Metatag canonical URL if it exists. See lb.cm/mcQ.
  unset($elements['metatag_canonical']);

  // Create a list of alternate URLs, one for each enabled language.
  foreach (language_list() as $langcode => $language) {
    // language_list() also returns disabled languages; skip them.
    if (empty($language->enabled)) {
      continue;
    }
    // Make sure the path is absolute and the language is set.
    $options = array('absolute' => TRUE, 'language' => $language);
    // Generate the URL from the current path ($_GET['q']).
    $href = url($_GET['q'], $options);
    // Create a key in the elements array for this language.
    $key = "example_rel_link_$langcode";
    // Add the link; it is rendered by theme_html_tag().
    $elements[$key] = array(
      '#type' => 'html_tag',
      '#tag' => 'link',
      '#attributes' => array(
        'rel' => 'alternate',
        'hreflang' => $langcode,
        'href' => $href,
      ),
    );
  }
}
?>
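Because the hook above only runs inside Drupal, here is a Drupal-independent sketch of the markup it is meant to produce: one `<link rel="alternate" hreflang="...">` per language. The function name, base URL and language list are hypothetical, and it assumes path-prefix negotiation where each prefix equals the language code.

```php
<?php
/**
 * Hypothetical sketch: build one alternate hreflang <link> per language.
 * Assumes URLs of the form <base>/<langcode>/<path>.
 */
function example_hreflang_links($base_url, array $langcodes, $path) {
  $links = array();
  foreach ($langcodes as $langcode) {
    // Join base, language prefix and path without doubled slashes.
    $href = rtrim($base_url, '/') . '/' . $langcode . '/' . ltrim($path, '/');
    // Escape attribute values before printing them into HTML.
    $links[] = sprintf(
      '<link rel="alternate" hreflang="%s" href="%s" />',
      htmlspecialchars($langcode, ENT_QUOTES),
      htmlspecialchars($href, ENT_QUOTES)
    );
  }
  return $links;
}

echo implode("\n", example_hreflang_links('https://example.com', array('de', 'en'), 'content/foo')), "\n";
```

For a German/English site this prints two tags, pointing at https://example.com/de/content/foo and https://example.com/en/content/foo, which is the "5 languages, 5 links" pattern described above.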

More conversation:

Carlos Miranda Levy

This issue persists.

DamienMcKenna

Status: Postponed (maintainer needs more info) » Active
Parent issue: » #2175021: META: Plan for Metatag 7.x-1.0-rc1 release

Let's try to fix this for 1.0-rc1.

DamienMcKenna

So...

This needs some custom handling, as the usual combination of tokens would not suffice.

Is it worth adding a custom submodule for this, or should we copy the code into metatag_html_head_alter() and load it only if the Locale module is enabled?

DamienMcKenna

Crazy idea - should this be added by Entity Translation?

DamienMcKenna

Question: if there is no translation of a node for the site's default language, should the canonical tag still link to [defaultlanguage]/the/node/alias?

If the canonical tag should be excluded completely, this could simply be handled in hook_metatag_metatags_view_alter(), which would add the new language tags and remove the canonical tag.

Thoughts?

BTW, I'm removing this from the 1.0-rc1 release.

DamienMcKenna

I'm taking this off the list for 1.0. Yes, it's really important for multilingual sites, but I need some feedback on comment #16, or a patch that fixes All The Things, before proceeding.

Charles Belov

I have a concern about this. The two pages would not quite be the same.

de/content/foo would contain <html lang="de">
en/content/foo would contain <html lang="en">

Additionally, on en/content/foo any translated strings on the page would be in English, not German.

Since the page is language-neutral, one of these language markings would be wrong, presumably the one for en/content/foo for a default German site. But there is no guarantee that a search engine would index the correct language marking.

That is, if Google (or whoever, but at the present time Google gives us the majority of visits) last visited en/content/foo, then de/content/foo would be marked as being in English, so that de/content/foo would not show up as a result in searches restricted to German (wrong) and would show up in searches restricted to English (also wrong).

The solution I implemented [yesterday] was to leave the canonical URL as it is and, on language-neutral pages, to mark the non-default languages with a robots noindex meta tag.

That is, in the context of the current issue:
de/content/foo has canonical de/content/foo
en/content/foo has canonical en/content/foo AND has robots noindex

(Eventually, I'll change that to noindex, nofollow, but I have to wait until at least Google no longer has any of the pages that were indexed under the wrong languages.)
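The workaround described above can be sketched as a small decision function. This is a hypothetical illustration, not the poster's actual code: the function name and the returned array shape are assumptions, and 'und' is Drupal 7's code for language-neutral content (LANGUAGE_NONE).

```php
<?php
/**
 * Hypothetical sketch of the workaround: the canonical URL stays
 * self-referencing, and language-neutral pages viewed in a non-default
 * language also get robots "noindex".
 */
function example_head_tags($node_langcode, $current_langcode, $default_langcode, $path) {
  $tags = array(
    // Canonical points at the URL actually being viewed, e.g. en/content/foo.
    'canonical' => $current_langcode . '/' . $path,
  );
  // Only language-neutral pages in a non-default language are de-indexed.
  if ($node_langcode === 'und' && $current_langcode !== $default_langcode) {
    $tags['robots'] = 'noindex';
  }
  return $tags;
}

print_r(example_head_tags('und', 'en', 'de', 'content/foo'));
```

With German as the default, de/content/foo gets only a self-referencing canonical, while en/content/foo gets its own canonical plus robots noindex, matching the two lines listed above.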

Additionally, if the user is authenticated (staff), we do the equivalent of redirecting en/content/foo to de/content/foo. That would, of course, have to be done by role if the site allowed the public to have user accounts.

Outside the scope of the current issue, but related: #1518224 would not be appropriate for us.

If #1518224 is implemented, it needs to be a setting, not a given. That is, I see it as potentially problematic if:

de/content/foo has canonical de/content/foo
en/content/foo has canonical de/content/foo AND has robots noindex

in that Google (and other search engines) might inadvertently remove de/content/foo from the index due to the robots noindex tag.

DamienMcKenna

I think we should promote the Alternative hreflang module instead. FYI, I've submitted a patch that changes its language selection to use LANGUAGE_TYPE_CONTENT instead of LANGUAGE_TYPE_INTERFACE.

hass

That is not the same.

DamienMcKenna

@hass: What's not the same?

hass

This issue is about the canonical URL, not hreflang. These are completely different things.