I've enabled the "Include a stylesheet in the sitemaps for humans" configuration option. Each entry gets a "nofollow" generated at line 110 of xmlsitemap.xsl:

    <a href="{$sitemapURL}" ref="nofollow"><xsl:value-of select="$sitemapURL"></xsl:value-of></a>

Two questions:

1. Isn't the attribute name wrong? Shouldn't it be "rel", not "ref"?
2. More importantly, why? I'd like to set up a Google Search Appliance to crawl my site. But won't it fail to index links with nofollow, defeating the utility of the sitemap for indexing?

Comments

rt_davies

After a bit more investigating:
Most crawlers, I assume, are expected to use the XML file as is (without applying the XSL transformation that generates the HTML). This module properly adds the HTTP header "X-Robots-Tag: noindex, follow" to the XML file, telling crawlers to follow the URLs found in the XML.
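
To make the header semantics concrete, here is a minimal Python sketch of how a crawler might read that value. The function names `parse_robots_tag` and `may_follow_links` are hypothetical, and the sketch ignores details a real crawler handles, such as user-agent prefixes.

```python
def parse_robots_tag(value):
    """Split an X-Robots-Tag header value into a set of lowercase directives."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


def may_follow_links(value):
    """Links may be followed unless 'nofollow' or 'none' is present."""
    directives = parse_robots_tag(value)
    return not ({"nofollow", "none"} & directives)


print(sorted(parse_robots_tag("noindex, follow")))  # ['follow', 'noindex']
print(may_follow_links("noindex, follow"))          # True
```

So with "noindex, follow" the sitemap file itself stays out of the index, while the URLs listed in it remain eligible for crawling.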

So this just seems to reconfirm my suspicion about rel="nofollow" in the HTML version. If the HTML is meant only for human consumption, then including the nofollow relationship is meaningless. But if we assume that some crawlers might use the HTML somehow, then wouldn't we want them to follow the links?

Dave Reid

You're seeing the pretty HTML output generated by the stylesheet. The search engines and crawlers *should* only be seeing the raw XML output, which does not have any <a> tags and follows the sitemap standard.
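
As a rough illustration of what a consumer of the raw XML works with, here is a small Python sketch that reads the `<loc>` entries directly and never applies the stylesheet. The sample document, the example.com URLs, and the `extract_locs` helper are made up for illustration; only the namespace comes from the sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap; the stylesheet reference is ignored by the parser.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="sitemap.xsl"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>
"""


def extract_locs(xml_text):
    """Return every <loc> URL from a sitemap, without any XSL transformation."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


print(extract_locs(SAMPLE))  # ['https://example.com/', 'https://example.com/about']
```

There are no anchor elements anywhere in this input, so a nofollow on the transformed links never comes into play for a parser like this.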

Dave Reid

Status: Active » Postponed (maintainer needs more info)

rt_davies

Thanks for the reply, Dave. I think question number one still stands. AFAIK, there is no "ref" attribute on an HTML anchor tag.

Gomez_in_the_South

Apologies for digging up a 10-year-old discussion, but this question came up for a website I work with.
This ref='nofollow' entry in xmlsitemap.xsl is still present in the D9 version of this module.

I agree with rt_davies in that:
a. 'ref' is meaningless, and this was meant to be 'rel'.
b. I don't see the value in having 'rel=nofollow' present, even if only in the XSL transformed version.

I'd be happy to push a patch to remove this, if a maintainer agrees. The nofollow comes from this line:

<a href="{$sitemapURL}" ref="nofollow"><xsl:value-of select="$sitemapURL"></xsl:value-of></a>

in /xsl/xmlsitemap.xsl
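
For anyone who wants to check their own copy, here is a small Python sketch that flags anchor elements carrying a `ref` attribute. The embedded snippet is a cut-down stand-in for the stylesheet, not the real file, and the helper name `anchors_with_ref` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for the stylesheet; the real file is much longer.
SNIPPET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <a href="{$sitemapURL}" ref="nofollow"><xsl:value-of select="$sitemapURL"/></a>
  </xsl:template>
</xsl:stylesheet>
"""


def anchors_with_ref(xsl_text):
    """Return the attributes of every <a> element that has a 'ref' attribute."""
    root = ET.fromstring(xsl_text)
    found = []
    for el in root.iter():
        # Compare the local name so a namespaced <a> would also match.
        local_name = el.tag.rsplit("}", 1)[-1]
        if local_name == "a" and "ref" in el.attrib:
            found.append(dict(el.attrib))
    return found


for attrs in anchors_with_ref(SNIPPET):
    print(attrs)  # {'href': '{$sitemapURL}', 'ref': 'nofollow'}
```

To scan an actual checkout, the text of the .xsl file could be read and passed to the same function.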