There appears to be a need for an index of all sitemap variants. This is different to a sitemap index that indexes all pages of a single variant (currently implemented). The sitemap protocol supports this idea.

For 3.x, please collaborate on this functionality in the simple_sitemap_index issue queue.

Let's focus on getting this feature cleanly into 4.x.

Comments

michele.lucchina created an issue. See original summary.

michele.lucchina’s picture

Status: Active » Needs review
FileSize
11.27 KB

I created this patch to facilitate the creation of the submodule as proposed in this previous conversation

gbyte’s picture

Title: Submodule for create sitemapindex » Sitemap variant index functionality
Issue summary: View changes

This module already creates an index for each sitemap instance (variant), dependent on the module settings. Also, we don't adhere to the sitemaps protocol; instead we adhere to Google's hreflang sitemap standard.

I believe what you are trying to achieve is to create an index of all the sitemap variants. I took the liberty of updating the issue description and title. Will take a look at your patch soon.

Does this borrow code from https://gist.github.com/bdlangton/aea9673cc640e2dfc58466f985a3284c ?

gbyte’s picture

Status: Needs review » Needs work

Looks quite good already. I think we should include it as plugins in the main module (no submodule) and we should change the naming from sitemapindex to variant_index like so:

  • VariantIndexSitemapType, plugin ID variant_index
  • VariantIndexSitemapGenerator, plugin ID variant_index
  • VariantIndexUrlGenerator, plugin ID variant_index

As soon as this is in, I will take a closer look at the code and documentation and add some final touches.

Thank you for looking into it!

michele.lucchina’s picture

Status: Needs work » Needs review
FileSize
11.09 KB

Here is the modification of the main module with your naming.
I have integrated a small modification to the forms to avoid inserting content in the variant index since this variant can only contain other sitemaps.

PS: yes I started my work from the bdlangton code

I hope my work is appreciated ;)

gbyte’s picture

@michele.lucchina Sorry for not making progress on this; I will make sure to test it thoroughly once I have more time on my hands.

cgmonroe’s picture

The current patch will not apply to the latest dev / release. Here is a re-rolled version that will.

cgmonroe’s picture

This is an update to the current patch that includes a route for sitemap_index.xml.

The changes to the code are:

Adds a /sitemap_index.xml route.

Modifies the controller to include the config.factory service via injection and a getSitemapIndex(Request) function. The function looks for the variant_index config info. If found it calls the existing getSitemap(Request, variant) code with that key. If no config is found it returns a 404.

This allows the default sitemap setting to point to the main sitemap and the index to be retrieved with the suggested sitemap_index.xml filename.

Example, with the Variant Setup:

default | default_hreflang | Default
blog | default_hreflang | blog
index | variant_index | sitemapindex

Available URLs:

/sitemap_index.xml (includes /sitemap.xml and /blog/sitemap.xml )
/sitemap.xml (default variant)
/blog/sitemap.xml (blog variant)
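
The lookup described above (find the variant_index entry, otherwise answer with a 404) could look roughly like the following framework-free sketch. It is not the code from the patch: the function name example_find_index_variant() and the array shape (variant name as key, with 'type' and 'label') are assumptions that simply mirror the "name | type | label" setup, and the real controller reads this from Drupal config instead.

```php
<?php

/**
 * Finds the variant that should be served at /sitemap_index.xml.
 *
 * Hypothetical helper: $variants is keyed by variant name and each entry
 * carries a 'type' and a 'label', mirroring the "name | type | label" setup.
 *
 * @return string|null
 *   The name of the first variant of type variant_index, or NULL if there is
 *   none (a controller would answer with a 404 in that case).
 */
function example_find_index_variant(array $variants): ?string {
  foreach ($variants as $name => $definition) {
    if (($definition['type'] ?? NULL) === 'variant_index') {
      return (string) $name;
    }
  }
  return NULL;
}

// The variant setup from the example, expressed as an array (assumed shape).
$variants = [
  'default' => ['type' => 'default_hreflang', 'label' => 'Default'],
  'blog' => ['type' => 'default_hreflang', 'label' => 'blog'],
  'index' => ['type' => 'variant_index', 'label' => 'sitemapindex'],
];
$index_variant = example_find_index_variant($variants);
```

With this setup the helper picks the "index" variant; with no variant_index entry it returns NULL, which corresponds to the 404 case.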

a.milkovsky’s picture

#7 works for me. #9 did not generate the sitemap somehow.

Question: how to proceed with multilingual websites in this case?
What if I have example.de and example.ch websites. Should both domains be presented in the "index sitemap"?
Should we use hreflang?

Worth mentioning:

It looks like Google does not require a sitemap index file:

> You can optionally create a sitemap index file (a file that points to a list of sitemaps) and submit that single index file to Google. You can submit multiple sitemaps and/or sitemap index files to Google.

cgmonroe’s picture

By #9 you mean the patch in #8? FWIW, I have noticed that both patches sometimes require the sitemap to be built twice initially for it to work, mostly after adding variants or making changes. I think it has to do with the variant not being built before the sitemap index is built. Once the 'pump is primed', I have not seen any problems.

Yes, the sitemap_index is 'optional'... but try explaining that to your SEO consultant / department... and then winning the fight against "but it's best for SEO...". :)

Anyway, having the module support the creation of separate language sitemaps should be a different issue. And doing it at the module level might be a bit tricky due to variants not being totally integrated.

That said, here's what I did to support this:

First, add some code similar to this in a custom module / theme file. Note the static variant to language lookup array in the filter code.

/**
 * Implements hook_simple_sitemap_links_alter().
 */
function my_module_simple_sitemap_links_alter(&$links, $variant) {
  // Enable support for splitting languages into separate files.
  static $split_languages = TRUE;

  // This is one of several sitemap link filters we use. Iterate over the keys
  // only, because the filter may unset entries of $links.
  foreach (array_keys($links) as $key) {
    if ($split_languages && !my_module_language_variant_filter($links, $key, $links[$key], $variant)) {
      // The link was removed; skip any further filters for it.
      continue;
    }
  }
}

/**
 * Filters out links based on the variant being processed.
 *
 * Currently languages are handled by language variants, so any link that
 * is not in the language of the variant is removed from the links array.
 * Also, all alternate language information is removed (it should be in the
 * page header).
 *
 * @param array $links
 *   Array of links being added to the sitemap variant.
 * @param int|string $key
 *   Index of the item in $links being processed.
 * @param array $link
 *   Link being processed.
 * @param string $variant
 *   The sitemap variant being processed.
 *
 * @return bool
 *   TRUE if the link is allowed, FALSE if it has been rejected.
 */
function my_module_language_variant_filter(&$links, $key, $link, $variant) {
  // The variants that are processed, mapped to their language code.
  static $language_variants = [
    'default' => 'en',
    'de' => 'de',
    'es' => 'es',
    'fr' => 'fr',
    'it' => 'it',
    'pt-br' => 'pt-br',
    // Some blog landing pages have a 'pseudo translation'.
    'blog' => 'en',
  ];
  if (isset($language_variants[$variant]) && isset($link['langcode'])) {
    if ($link['langcode'] != $language_variants[$variant]) {
      unset($links[$key]);
      return FALSE;
    }
  }
  if (isset($link['alternate_urls'])) {
    unset($links[$key]['alternate_urls']);
  }
  return TRUE;
}

Then set up your language variants like:

[lang code] | default_hreflang | [lang code]

e.g. de | default_hreflang | de

The downside is that for each language you will have to process all your URLs again, so build time is a bit longer.

You might do it a bit faster by building a custom variant for each language and then overriding the common SimpleSitemap (a bit complex but doable), and then filtering by language when the entities are loaded to create the links list.

I chose to accept a slightly slower build time and use the simple method.

a.milkovsky’s picture

Hi @cgmonroe,

yes, sorry, I meant the patch in #8. Thank you for posting your solution.

I am also currently looking for a solution for the index sitemap for multiple languages.
As I understand it, it is not necessary to split the sitemaps into separate languages, and this module tries to avoid this separation.

Have a look at #3033283: Generate per language sitemap:

> this module aims to be a hreflang sitemap generator out of the box, which means a newer standard and all languages in one sitemap.

From Google:

> You can use a Sitemap to tell Google all of the language and region variants for each URL. To do so, add a <loc> element specifying a single URL, with child <xhtml:link> entries listing every language/locale variant of the page including itself. Therefore if you have 3 versions of a page, your sitemap will have 3 entries, each with 3 identical child entries.
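
To make the "3 entries, each with 3 identical child entries" rule concrete, here is a minimal plain-PHP sketch. It is not how simple_sitemap generates its output; the function name example_build_hreflang_entries() and the example.com URLs are made up for illustration. The result is only a fragment: it would still need to be wrapped in a urlset element that declares the xhtml namespace.

```php
<?php

/**
 * Builds hreflang <url> entries for one page that exists in several languages.
 *
 * Hypothetical helper for illustration only; $translations maps a language
 * code to the absolute URL of that translation.
 */
function example_build_hreflang_entries(array $translations): string {
  // The same list of alternates is repeated inside every <url> entry.
  $alternates = '';
  foreach ($translations as $langcode => $url) {
    $alternates .= sprintf(
      '    <xhtml:link rel="alternate" hreflang="%s" href="%s"/>' . "\n",
      htmlspecialchars((string) $langcode, ENT_XML1 | ENT_QUOTES, 'UTF-8'),
      htmlspecialchars($url, ENT_XML1 | ENT_QUOTES, 'UTF-8')
    );
  }

  // One <url> entry per translation, each listing all alternates.
  $xml = '';
  foreach ($translations as $url) {
    $xml .= "  <url>\n";
    $xml .= '    <loc>' . htmlspecialchars($url, ENT_XML1 | ENT_QUOTES, 'UTF-8') . "</loc>\n";
    $xml .= $alternates;
    $xml .= "  </url>\n";
  }
  return $xml;
}

$entries = example_build_hreflang_entries([
  'en' => 'https://example.com/page',
  'de' => 'https://example.com/de/seite',
  'fr' => 'https://example.com/fr/page',
]);
```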

gbyte’s picture

Regarding the last comment: I really hope it is clear by now that this module does everything a modern multilingual sitemap needs to be doing. This here issue is academic for people that need this to be working the old multilingual way 'because'. But seriously, if you are one of these people, you will be better off with xmlsitemap which does this instead of hreflang.

Edit: Sorry for the confusion, I am speaking of using variants/sitemap index to mimic the old multilingual way where you had one sitemap per language. This is academic and should be avoided. Having a sitemap index as proposed here is something different and probably a good feature to include.

a.milkovsky’s picture

Hey @gbyte, thank you for the answer and your contributions!

Regarding "This here issue is academic for people that need this to be working the old way 'because'":

I do not completely agree with this statement. In my use case I have separate sitemaps, but they are separated by content type rather than by language. As a result, it may make sense to have an index sitemap that collects all of the distributed sitemaps. It is not an outdated concept. From the Google documentation:

> You can optionally create a sitemap index file (a file that points to a list of sitemaps) and submit that single index file to Google. You can submit multiple sitemaps and/or sitemap index files to Google.

In my current project I have 15 content types (this is a large media portal with a lot of content) and 2 languages.
This module allows separating sitemaps by content type, which is why I decided to use it instead of the Xmlsitemap module. In addition, this module provides hreflang integration, which also works perfectly.

I have currently generated 15 sitemaps, and I am looking for an option to generate an index sitemap (ideally with hreflangs).

I hope my use case is clear. Looking forward to your feedback!

cgmonroe’s picture

Hey @gbyte,

Totally agree, this module is SimpleSitemap for a reason. It does the core job easily with minimal setup.

It is also flexible enough to meet local needs with some fairly simple site-specific coding. This is very important, as SEO people are paid to "improve" site SEO, which means they will always find things to change. I have been through SEO consultant changes where the new consultant suddenly asks "why are we doing that?", hears "because the old consultant said to", and answers "no, no, rip it out".

Love the new variant plugin setup btw. Using it to keep nodes marked with noindex tags from getting into the sitemap. Bottom line is that we have met every SEO change challenge over the last 4+ years with this module. Great track record.

That said, I hope this hasn't distracted from the benefits of adding sitemap_index.xml support to the module.

TIA.

gbyte’s picture

Sorry for the confusion, I am speaking of using variants/sitemap index to mimic the old multilingual way where you had one sitemap per language. This is academic and should be avoided. Having a sitemap index as proposed here is something different and probably a good feature to include.

s_leu’s picture

Re-rolled the patch against current 8.x-3.x

donaldinou’s picture

Hi,

Thanks everyone for the great job.
I really need this functionality fast, so I've made a module:
https://www.drupal.org/project/simple_sitemap_index

Feel free to contribute or include it as a submodule to simple_sitemap.

gbyte’s picture

Status: Needs review » Active

@donaldinou I installed and tested it, seems to be working fine. I'm sure people will appreciate it until this issue is fixed.
Regarding this issue, can we convert the newest patch to an issue fork so we can collaborate more effectively? This is one of the features I'd like to merge before starting the work on 4.x. Thanks to everyone who contributed!

daniel.bosen made their first commit to this issue’s fork.

daniel.bosen’s picture

@gbyte I created a Fork and MR from the latest patch. This looks all good to me. What is left to be done, to get it in?

Oscaner’s picture

fago’s picture

How does that work in terms of URL handling?

We have created a custom module that added a sitemap index as variant as well - it should have been posted and communicated here earlier - :-/ https://github.com/drunomics/simple-sitemap-extensions
Anyway, it does not cope with the sitemap requirements of ensuring that sub-sitemaps live within the same "folder" of the sitemap index:

> Sitemaps that are referenced in the sitemap index file must be in the same directory as the sitemap index file, or lower in the site hierarchy. For example, if the sitemap index file is at https://example.com/public/sitemap_index.xml, it can only contain sitemaps that are in the same or deeper directory, like https://example.com/public/shared/.... You can submit up to 500 sitemap index files for each site in your account.

https://developers.google.com/search/docs/advanced/sitemaps/large-sitemaps

So that's something which should be considered when designing a proper solution I suppose.

gbyte’s picture

Version: 8.x-3.x-dev » 4.x-dev
Status: Active » Needs work

> Anyway, it does not cope with the sitemap requirements of ensuring that sub-sitemaps live within the same "folder" of the sitemap index

@fago If the sitemap variant index is set as default sitemap, its URL becomes /sitemap.xml which is in the site root, hence all sitemaps are located in subfolders fulfilling that requirement.
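
The location rule under discussion can be checked mechanically. The following is a minimal sketch in plain PHP, not code from the module or from any of the patches; the function name example_sitemap_within_index_scope() and the example.com URLs are hypothetical, and it only compares host and directory prefix.

```php
<?php

/**
 * Checks whether a sitemap lives in the index's directory or below it.
 *
 * Hypothetical helper for illustration; it ignores scheme and port and only
 * compares the host and the directory part of the path.
 */
function example_sitemap_within_index_scope(string $index_url, string $sitemap_url): bool {
  $index = parse_url($index_url);
  $sitemap = parse_url($sitemap_url);
  if (!isset($index['host'], $sitemap['host'])
    || strcasecmp($index['host'], $sitemap['host']) !== 0) {
    return FALSE;
  }

  // Directory of the index file, always ending in "/".
  $index_path = $index['path'] ?? '/';
  $index_dir = substr($index_path, 0, strrpos($index_path, '/') + 1);

  // The sitemap path must start with the index directory.
  return strpos($sitemap['path'] ?? '/', $index_dir) === 0;
}

// An index at the site root covers every variant sitemap.
$root_ok = example_sitemap_within_index_scope(
  'https://example.com/sitemap.xml',
  'https://example.com/blog/sitemap.xml'
);
// An index served from its own "folder" does not cover sibling folders.
$sibling_ok = example_sitemap_within_index_scope(
  'https://example.com/index/sitemap.xml',
  'https://example.com/blog/sitemap.xml'
);
```

Here $root_ok is TRUE and $sibling_ok is FALSE, which is why the index satisfies the rule when it is the default sitemap at /sitemap.xml but not when it is served from /index/sitemap.xml.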

> @gbyte I created a Fork and MR from the latest patch. This looks all good to me. What is left to be done, to get it in?

Things have changed since my last update:

  • 4.x passes tests, has an upgrade path and has been deemed good enough for non-API-dependent use cases (I encourage using it now and will be tagging beta/stable as soon as I'm happy with the API and the UX).
  • 3.x is frozen for new features and the solution above slightly goes above just adding module plugins (e.g. here), which is understandable in this case, but I'm not comfortable with altering the API of a module that I want to deprecate soon (I do realize the linked change is non-breaking, but still).
  • As of now, I think we should stick with the contributed simple_sitemap_index module for 3.x and focus on getting this feature into 4.x. I realize this will step on some toes, but bear with me, 4.x will be worth it.

@fago @donaldinou
Can you guys collaborate together on the drupal.org simple_sitemap_index module for 3.x?

Thank you all for your input.

gbyte’s picture

Issue summary: View changes
gbyte’s picture

Issue summary: View changes
fago’s picture

> @fago If the sitemap variant index is set as default sitemap, its URL becomes /sitemap.xml which is in the site root, hence all sitemaps are located in subfolders fulfilling that requirement.

Yes, that works if you add only one sitemap index, but not if you add multiples. We need a solution that is capable of that and solved it now by adding some more URL processing at https://github.com/drunomics/simple-sitemap-extensions

> Can you guys collaborate together on the drupal.org simple_sitemap_index module for 3.x?

It's too late, we already have two alternative solutions here and it does not make sense for us to re-build our working solution now.

I think we should aim at 4.x now and collaborate on a good solution to get into 4.x

donaldinou’s picture

> It's too late, [...]

Such a shame.

@gbyte
I am willing to help the community as much as I can and would be happy to collaborate with anyone who wants to improve this module.

Li Qing’s picture

FileSize
14.37 KB

Remove "priority" tag as that is not valid XML.
See: https://www.sitemaps.org/protocol.html#index
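
For reference, per the linked protocol page each sitemap entry of an index carries only a loc element and an optional lastmod element. A minimal sketch of producing such a document in plain PHP follows; this is not the generator from the patch, and the function name example_build_sitemap_index() and the example.com URLs are made up for illustration.

```php
<?php

/**
 * Builds a sitemaps.org sitemap index document from a list of sitemap URLs.
 *
 * Hypothetical helper for illustration; not part of simple_sitemap. Each
 * entry gets a <loc> and, if given, a <lastmod>; nothing else is emitted.
 */
function example_build_sitemap_index(array $sitemap_urls, ?string $lastmod = NULL): string {
  $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
  $xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
  foreach ($sitemap_urls as $url) {
    $xml .= "  <sitemap>\n";
    // Escape characters such as "&" so the URL is valid inside XML.
    $xml .= '    <loc>' . htmlspecialchars($url, ENT_XML1 | ENT_QUOTES, 'UTF-8') . "</loc>\n";
    if ($lastmod !== NULL) {
      $xml .= '    <lastmod>' . htmlspecialchars($lastmod, ENT_XML1 | ENT_QUOTES, 'UTF-8') . "</lastmod>\n";
    }
    $xml .= "  </sitemap>\n";
  }
  $xml .= "</sitemapindex>\n";
  return $xml;
}

$index_xml = example_build_sitemap_index([
  'https://example.com/sitemap.xml',
  'https://example.com/blog/sitemap.xml?page=1&lang=en',
]);
```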

gbyte’s picture

Assigned: Unassigned » gbyte

gbyte’s picture

Status: Needs work » Active

I have started implementing this in 4.x.

Questions to all you SEO gurus:

  • Would it make sense for new installations of the module to always create an index of all sitemaps (even if there is only one) and make that index the default sitemap available under /sitemap.xml?
  • If we do not use the index by default, would it still make sense to create the SimpleSitemap 'index' entity on install and have the sitemap index be available under /index/sitemap.xml? Obviously the user would be able to delete the index entity or set it as default to have it available under /sitemap.xml.
  • Last but not least, for existing installations: Does it make sense to create the SimpleSitemap 'index' entity in an update hook on top of the SimpleSitemapType 'index' entity? Or do we trust site builders with that task?

Please speak up now - that includes my Thunder Genossen who apparently are still on 3.x. :)

gbyte’s picture

Assigned: gbyte » Unassigned
Status: Active » Postponed (maintainer needs more info)

That's your cue, guys.

marcoka’s picture

Not sure if I understand it correctly, but I used it this way.
I have the domain https://www.kopfhoerer-berater.de/sitemap.xml
And on that page I have the separate sitemaps for the content types.

On a small site that may not be necessary by default.

gbyte’s picture

Assigned: Unassigned » gbyte
Status: Postponed (maintainer needs more info) » Active
Related issues: +#3269333: Add ability to disable sitemap variants
  • Would it make sense for new installations of the module to always create an index of all sitemaps (even if there is only one) and make that index the default sitemap available under /sitemap.xml?
  • If we do not use the index by default, would it still make sense to create the SimpleSitemap 'index' entity on install and have the sitemap index be available under /index/sitemap.xml? Obviously the user would be able to delete the index entity or set it as default to have it available under /sitemap.xml.
  • Last but not least, for existing installations: Does it make sense to create the SimpleSitemap 'index' entity in an update hook on top of the SimpleSitemapType 'index' entity? Or do we trust site builders with that task?

These questions become moot once I implement #3269333: Add ability to disable sitemap variants.

chr.fritsch’s picture

@gbyte Thx, for working on this. Awesome. Let me know if you need a review or any help.

gbyte’s picture

Assigned: gbyte » Unassigned
Status: Active » Needs review

@chr.fritsch Thanks and yes, I wouldn't mind one of you guys adding XSL to the sitemap index generator.

Use \Drupal\simple_sitemap\Plugin\simple_sitemap\SitemapGenerator\SitemapIndexGenerator::getXslContent analogous to \Drupal\simple_sitemap\Plugin\simple_sitemap\SitemapGenerator\DefaultSitemapGenerator::getXslContent.

Alternatively, remove the method \Drupal\simple_sitemap\Plugin\simple_sitemap\SitemapGenerator\SitemapIndexGenerator::getXslContent and alter xsl/simple_sitemap.xsl to accommodate the sitemap index.

Feel free to introduce any other changes concerning the index functionality - will happily review. Thanks in advance.

gbyte’s picture

Status: Needs review » Needs work
chr.fritsch’s picture

I am not an expert on XML/XSL stuff. What should the XSL look like?

Can I do something similar to WordPress? https://developer.wordpress.org/reference/classes/wp_sitemaps_stylesheet...

gbyte’s picture

> I am not an expert on XML/XSL stuff. What should the XSL look like?

I can't prioritize learning the structure ATM, hence I'm asking for support. Not sure about your question though, XSL is already integrated for the default sitemaps; right now it's only about adjusting it to fit the index of sitemaps. See my comment.
Anyone feel free to grab this.

chr.fritsch’s picture

Status: Needs work » Needs review

I fixed the sitemap index XML. And now the XSL is applied correctly as far as I can see.

gbyte’s picture

Status: Needs review » Needs work

Is the index of unrelated sitemaps on the site (new functionality) the same as splitting up one sitemap into chunks in terms of sitemap structure (old functionality)? If so how did we miss this?

In this case, SitemapIndexGenerator::getChunkContent should use SitemapGeneratorBase::$indexAttributes instead of calling DefaultSitemapGenerator::addSitemapAttributes() I believe, similar to what SitemapGeneratorBase::getIndexContent does.

marcoka’s picture

Installed the latest dev version today for testing. What I would expect is that
http://dev9.test.de/sitemap.xml would show an index that lists all the sub-indexes I created. In my case:

- One for content type Article
- One for content type Site
- One for content type Product

What I get is that http://dev9.test.de/sitemap.xml shows the first entry, in my case the content of "http://dev9.test.de/artikel/sitemap.xml".

gbyte’s picture

@marcoka

/sitemap.xml lists your default sitemap, as set in your settings - this is expected behavior.

If you need an index of all sitemaps, be patient or use the above branch/patch. It's more or less finished, just needs some love.

  • gbyte committed dd1c22a on 4.x
    Issue #3109090 by gbyte, cgmonroe, michele.lucchina, chr.fritsch,...
gbyte’s picture

Status: Needs work » Fixed

Thanks for your input; that's in dev now and will (hopefully) be released mid June alongside D10 support and a few other niceties.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.