I realized that using both xmlsitemap_menu and xmlsitemap_node can result in duplicate items in the sitemap for nodes that also have a menu link. A quick search of the issue queue shows that this has bothered a few people, and that other submodules can cause the same problem:

https://drupal.org/node/596008
https://drupal.org/node/631218
https://drupal.org/node/1999958

My proposed solution would be to add an option to always exclude duplicate entries from the sitemap. I'm waiting for some feedback/opinions before actually starting to work on a patch.

Comments

spidersilk’s picture

I very much second this request! There is no valid reason I can think of for allowing duplicate entries in an XML sitemap. Before adding any URL to the sitemap, the module should check that it isn't a duplicate of one that's already there.

Also, I noticed in trying to investigate the issue on a client's site that the entries added by XML sitemap menu don't have any last modified date - that doesn't seem good!

I was about to disable XML sitemap menu to resolve this on that site, when I realized that the front page is only indexed as a menu link, not a node! So it looks like if I disable that module, the front page will no longer be indexed by search engines, which is a big problem! The node that acts as the front page would still be indexed, but not as the front page (i.e. it would be indexed as www.example.com/this-is-the-node-title, not as www.example.com).

Conversely, if I disable XML sitemap node instead, then none of the site map entries will have dates on them, which I expect is likely to cause problems. And if I don't disable either of them, then everything will keep on being indexed twice, which is also likely to cause problems (since I expect the search engines will see that as spamming, and penalize the site, or remove it entirely from their listings!).

There's also a support issue, #1999958: XML Sitemap Custom allows duplicate entries, which includes a patch to stop XML sitemap custom from creating duplicate entries - maybe it could be adapted to stop any duplicate entries from being indexed, regardless of which submodule is to blame? Then these two support issues could be merged...

gapple’s picture

Status: Active » Needs review
FileSize
1.06 KB

Here is my approach, which causes a sitemap entry created by xmlsitemap_node to take priority over one from xmlsitemap_menu.

alex.skrypnyk’s picture

@gapple
The proposed patch loads the node for each link, which is quite an expensive operation.

Instead, hook_xmlsitemap_link_alter() can be used. It runs only during sitemap generation.

/**
 * Implements hook_xmlsitemap_link_alter().
 */
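function YOURMODULE_xmlsitemap_link_alter(array &$link, array $context) {
  // Locations already seen during this request, keyed by 'loc'.
  static $links = array();

  if (isset($links[$link['loc']])) {
    // A link to this location was already processed; hide the duplicate.
    $link['access'] = FALSE;
  }
  else {
    $links[$link['loc']] = TRUE;
  }
}

mhmhartman’s picture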

#3 did not work for me. Everything is still being duplicated.
#2 did work perfectly. @alex - 7000 links were generated in 1 minute, doesn't seem to be that expensive in my opinion.

Yaron Tal’s picture

Both options rely on having the xmlsitemap_node module run before the xmlsitemap_menu module runs. In my case they ran in the other order.

For me this seems to fix duplicate items:

/**
 * Implements hook_xmlsitemap_link_alter().
 */
function YOURMODULE_xmlsitemap_link_alter(array &$link, array $context) {
  // Load every stored link that points to the same location.
  $links = xmlsitemap_link_load_multiple(array('loc' => $link['loc']));
  if ($links) {
    foreach ($links as $other_link) {
      // A visible link of a different type already covers this location.
      if ($other_link['type'] != $link['type'] && $other_link['access'] && $other_link['status']) {
        $link['access'] = FALSE;
      }
    }
  }
}

Gomez_in_the_South’s picture

#5 causes an issue for me: it sets $link['access'] to FALSE for valid entries on my multilingual site, which leaves me with a sitemap that is missing translations. I'm using language_hierarchy, but you may want to check before using it on other multilingual sites as well.

Yaron Tal’s picture

There should be some kind of weight in there. At the moment it is impossible to choose which link should be visible to the user and which is a duplicate. We now use the following code to always use the link from the node, and only use others when there is no version available from the node module. This way you get all the data from the node module, and no duplicates.

The best solution would be to merge data from multiple sub-modules and make 'loc' the unique key, but I'm guessing that won't make it into the 7.x version of xmlsitemap.

/**
 * Implements hook_xmlsitemap_link_alter().
 */
function YOURMODULE_xmlsitemap_link_alter(array &$link, array $context) {
  // Check if links to the same location already exist.
  $links = xmlsitemap_link_load_multiple(array('loc' => $link['loc']));
  if ($links) {
    foreach ($links as $other_link) {
      // If we're not updating an existing link and the other link is accessible.
      if ($other_link['type'] != $link['type'] && $other_link['access'] && $other_link['status']) {
        // The other link is a node, remove the current link.
        if ($other_link['type'] == 'node') {
          $link['access'] = FALSE;
        }
        // Neither link is a node type, remove the current (best performance).
        elseif ($link['type'] != 'node') {
          $link['access'] = FALSE;
        }
        // The current link is of the node type, so we need to hide the other link (the existing link).
        else {
          $other_link['access'] = FALSE;
          drupal_write_record('xmlsitemap', $other_link, array('id', 'type'));
        }
      }
    }
  }
}
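
As a rough illustration of the "make 'loc' the unique key" idea mentioned above, the merge could look something like the sketch below. It is plain PHP over arrays, not xmlsitemap API and not the patch that was eventually committed: the function name example_merge_links_by_loc(), the $priority map and the sample data are all made up for the example. It assumes each link is an array with at least 'type' and 'loc', and that a lower weight means a more preferred type.

```php
<?php

/**
 * Merges sitemap link records so that each 'loc' appears only once.
 *
 * Hypothetical helper, not part of the xmlsitemap module. $priority maps a
 * link type to a weight; for duplicates, the type with the lowest weight
 * wins, and keys the winner does not have are copied from the other record.
 */
function example_merge_links_by_loc(array $links, array $priority) {
  $merged = array();
  foreach ($links as $link) {
    $loc = $link['loc'];
    if (!isset($merged[$loc])) {
      $merged[$loc] = $link;
      continue;
    }
    $existing = $merged[$loc];
    // Types missing from the priority map are treated as least preferred.
    $existing_weight = isset($priority[$existing['type']]) ? $priority[$existing['type']] : PHP_INT_MAX;
    $new_weight = isset($priority[$link['type']]) ? $priority[$link['type']] : PHP_INT_MAX;
    // Array union keeps the left operand's values and only adds missing keys.
    $merged[$loc] = ($new_weight < $existing_weight) ? $link + $existing : $existing + $link;
  }
  return array_values($merged);
}

// Made-up sample data: node/1 is present both as a menu link and as a node.
$links = array(
  array('type' => 'menu_link', 'id' => 12, 'loc' => 'node/1'),
  array('type' => 'node', 'id' => 1, 'loc' => 'node/1', 'lastmod' => 1400000000),
  array('type' => 'menu_link', 'id' => 13, 'loc' => 'contact'),
);
$result = example_merge_links_by_loc($links, array('node' => 0, 'menu_link' => 10));
```

With the sample data, the node record wins for node/1 regardless of the order in which the two records arrive, which is the order dependency raised earlier in the thread. Note that the union only fills in keys that are absent from the winner; a key that is present with an empty value is left alone.
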
JAINV18’s picture

#2 works for me. Thanks!

odrzutowiec’s picture

Hello, the solution proposed by gapple (the first one) worked perfectly for me, thanks!

giupenni’s picture

#2 works for me

darrenwh’s picture

Status: Needs review » Needs work
+++ b/xmlsitemap_menu/xmlsitemap_menu.module
@@ -262,6 +262,17 @@ function xmlsitemap_menu_create_link(array $menu_item) {
+  // Exclude menu items created for nodes that are added to the sitemap by xmlsitemap_node

Very minor DCS thing: the comment is missing its trailing period (.)

Diego_Mow’s picture

Status: Needs work » Needs review
Issue tags: +ciandt-contrib
FileSize
1.04 KB

Hi.

I'm uploading patch #12.
It fixes the coding standards issue in patch #2.

Please review and check.

contentsuit’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: -ciandt-contrib

Patch 12 worked for me.
RTBC.

renatog’s picture

Hi, guys.

I applied patch #12 and it works well for me too.

Thank you very much for the contribution, @Diego_Mow.

Good Work.

Regards.

  • RenatoG committed 8a34de3 on 7.x-2.x authored by Diego_Mow
    Issue #2257191 by Diego_Mow, gapple, Yaron Tal, mhmhartman, alex....
renatog’s picture

Status: Reviewed & tested by the community » Fixed

Fixed.

Committed to the dev branch.

Thank you all for the contributions.

Regards.

JAINV18’s picture

Hi, guys,

I applied patch #12 and it works well for me too.

Thank you very much for the contribution, @Diego_Mow.

Good Work.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.