Problem/Motivation

The module sends a noindex directive in the X-Robots-Tag response header, which prevents the XML-formatted sitemaps from being indexed.

Steps to reproduce

Try to index the sitemap.xml file(s) using Google Search Console.

Proposed resolution

The categorical X-Robots-Tag "noindex" response in this sitemap module goes against best practices and the very purpose of XML sitemaps, which is to provide all the pages of a site to (search) bots programmatically and efficiently. The noindex response at the server level blocks the majority of (search) crawlers from accessing the XML sitemaps and therefore makes them obsolete, or inefficient at the very minimum. For example, Google eventually treats pages annotated with noindex as nofollow (https://www.seroundtable.com/google-long-term-noindex-follow-24990.html), which defeats the purpose of the XML sitemap.

Currently, the only way to circumvent the issue is to submit the XML sitemaps in Search Console for crawling and indexing, yet not all webmasters use or control this feature.

The correct technical treatment is to let (search) bots access the XML files without the noindex response in the X-Robots-Tag header, either by removing the header altogether or by changing the annotation to index, follow. Although this allows the XML sitemap files to be found in search indices, there is nothing inherently wrong with this approach: allowing (search) bots to access the XML sitemaps is aligned with their purpose. Webmasters who want to further limit access for users or bots have other means to do so (e.g. the robots.txt file, encryption, etc.).
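To make the header semantics discussed here concrete, the following is a minimal sketch in plain PHP (not the module's code) of how an X-Robots-Tag value could be checked for directives that block indexing. The function names robots_directives() and blocks_indexing() are hypothetical, and bot-specific prefixes such as "googlebot: noindex" are deliberately not handled.

```php
<?php

/**
 * Splits an X-Robots-Tag value into trimmed, lowercase directives.
 *
 * Simplification (assumption): bot-specific prefixes such as
 * "googlebot: noindex" are not parsed.
 */
function robots_directives(string $headerValue): array {
  $parts = array_map('trim', explode(',', strtolower($headerValue)));
  // Drop empty entries left by stray commas, then reindex the array.
  return array_values(array_filter($parts, static function (string $part): bool {
    return $part !== '';
  }));
}

/**
 * Tells whether a header value asks crawlers not to index the response.
 *
 * "none" is treated as shorthand for "noindex, nofollow".
 */
function blocks_indexing(?string $headerValue): bool {
  if ($headerValue === null) {
    // No X-Robots-Tag header at all: indexing is allowed by default.
    return false;
  }
  $directives = robots_directives($headerValue);
  return in_array('noindex', $directives, true)
    || in_array('none', $directives, true);
}
```

With a value such as 'noindex, follow' the check reports that indexing is blocked; with 'index, follow', or with no header at all, it does not.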

Remaining tasks

Remove noindex X-robots from the sitemap.xml response. This bug was introduced in https://www.drupal.org/project/simple_sitemap/issues/2878547#comment-121...

Comments

eldrupalista created an issue. See original summary.

gbyte’s picture

Version: 8.x-3.8 » 8.x-3.x-dev
Category: Bug report » Support request

"Remove noindex X-robots from the sitemap.xml response. This bug was introduced in..."

It's not a bug, it's a feature ;) Albeit probably a controversial one.

Unlike its content (the links themselves), the sitemap page itself is *not* supposed to be indexed. If it were, you would get a link to the sitemap when googling for content. Without the noindex directive this is an issue for e.g. sitemaps that contain images: the images on the sitemap get indexed instead of their links.

The practice of setting the sitemap to noindex was blessed by John Mueller (webmaster trends analyst at Google).

If you find sources close to Google which dispute this assertion, let me know.

joshi.rohit100’s picture

We also have a requirement to set "X-Robots-Tag" to "index". I think we can introduce a setting that determines whether sitemap.xml should be indexed or not, rather than hardcoding it.

The default value can be FALSE to maintain the existing behavior.
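A rough sketch of that idea in plain PHP, outside of Drupal: the function name sitemap_response_headers() and the $indexSitemap flag are hypothetical, and the header values 'application/xml; charset=utf-8' and 'noindex, follow' are assumptions about what the module currently sends, not taken from its code.

```php
<?php

/**
 * Builds the headers for a sitemap response (hypothetical helper).
 *
 * $indexSitemap stands in for the proposed 'index_sitemap' setting.
 * FALSE, the default, keeps the existing behavior of sending noindex.
 */
function sitemap_response_headers(bool $indexSitemap = false): array {
  // Assumed content type of the sitemap response.
  $headers = ['Content-Type' => 'application/xml; charset=utf-8'];
  if (!$indexSitemap) {
    // Assumed current header value; only sent when indexing is not wanted.
    $headers['X-Robots-Tag'] = 'noindex, follow';
  }
  return $headers;
}
```

Omitting the header when the setting is enabled, instead of sending "index", relies on crawlers indexing by default when no X-Robots-Tag is present.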

manisha-acheryya’s picture

File status: new, size: 2.76 KB
joshi.rohit100’s picture

  1. +++ b/src/Form/SimplesitemapSettingsForm.php
    @@ -184,6 +184,13 @@ class SimplesitemapSettingsForm extends SimplesitemapFormBase {
    +    $form['simple_sitemap_settings']['advanced']['index_sitemap'] = [
    +      '#type' => 'radios',
    +      '#title' => t('Index / Unindex XML'),
    +      '#default_value' => $this->generator->getSetting('index_sitemap', FALSE),
    +      '#options' => [t('Index'), t('Unindex')],
    +    ];
    +
    

    Assuming this is a new key, the schema entry is missing.

  2. +++ b/src/Form/SimplesitemapSettingsForm.php
    @@ -184,6 +184,13 @@ class SimplesitemapSettingsForm extends SimplesitemapFormBase {
    +      '#title' => t('Index / Unindex XML'),
    

    Use $this->t() instead of t().

  3. +++ b/src/Form/SimplesitemapSettingsForm.php
    @@ -184,6 +184,13 @@ class SimplesitemapSettingsForm extends SimplesitemapFormBase {
    +      '#options' => [t('Index'), t('Unindex')],
    

    Use $this->t() here as well.

gbyte’s picture

Status: Active » Fixed

I will treat this issue as a support request; the question of why the sitemap itself is set to 'noindex' has been answered. Furthermore, I'm not keen on adding a setting that (IMO) will accomplish nothing. Esoteric (and in this case incorrect) requirements can be implemented programmatically by overriding part of the code.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.