Problem/Motivation

I'm getting multiple custom link entries in sitemaps on a multilingual site with no URL detection method enabled. Steps to reproduce from a fresh D8 installation with the standard profile:

  1. enable simple_sitemap, core content / config translation modules and their dependencies
  2. add one or more languages in addition to the default (English) at /admin/config/regional/language
  3. disable the URL detection method at /admin/config/regional/language/detection
  4. regenerate the sitemap at /admin/config/search/simplesitemap

The resulting sitemap will look like:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/sitemap.xsl"?>
<!--Generated by the Simple XML Sitemap Drupal module: https://drupal.org/project/simple_sitemap.-->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
 <url>
  <loc>http://localhost:8080/</loc>
  <xhtml:link rel="alternate" hreflang="en" href="http://localhost:8080/"/>
  <xhtml:link rel="alternate" hreflang="es" href="http://localhost:8080/"/>
  <xhtml:link rel="alternate" hreflang="de" href="http://localhost:8080/"/>
  <changefreq>daily</changefreq>
  <priority>1.0</priority>
 </url>
 <url>
  <loc>http://localhost:8080/</loc>
  <xhtml:link rel="alternate" hreflang="en" href="http://localhost:8080/"/>
  <xhtml:link rel="alternate" hreflang="es" href="http://localhost:8080/"/>
  <xhtml:link rel="alternate" hreflang="de" href="http://localhost:8080/"/>
  <changefreq>daily</changefreq>
  <priority>1.0</priority>
 </url>
 <url>
  <loc>http://localhost:8080/</loc>
  <xhtml:link rel="alternate" hreflang="en" href="http://localhost:8080/"/>
  <xhtml:link rel="alternate" hreflang="es" href="http://localhost:8080/"/>
  <xhtml:link rel="alternate" hreflang="de" href="http://localhost:8080/"/>
  <changefreq>daily</changefreq>
  <priority>1.0</priority>
 </url>
</urlset>
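
To confirm the symptom on a generated sitemap, something like the following Python sketch could be used. It is not part of the module (which is PHP); it only parses a sitemap and reports `<loc>` values that occur more than once. The function name `duplicate_locations` and the `/node/1` entry in the sample are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by the <urlset>, <url> and <loc> elements of a sitemap.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def duplicate_locations(sitemap_xml: str) -> dict:
    """Return every <loc> value that appears more than once, with its count."""
    root = ET.fromstring(sitemap_xml)
    counts = Counter(
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    )
    return {loc: n for loc, n in counts.items() if n > 1}


# Trimmed-down sample: three identical entries as in the report above,
# plus one hypothetical unique entry (/node/1) for contrast.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url><loc>http://localhost:8080/</loc></url>
 <url><loc>http://localhost:8080/</loc></url>
 <url><loc>http://localhost:8080/</loc></url>
 <url><loc>http://localhost:8080/node/1</loc></url>
</urlset>"""

print(duplicate_locations(SAMPLE))  # {'http://localhost:8080/': 3}
```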

Proposed resolution

Prevent duplication of custom links with the same location.
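
As a rough illustration of the idea only, here is a minimal Python sketch of deduplicating link entries by location. The module itself is PHP, and the dictionary layout and the name `dedupe_by_location` are assumptions made for this example, not the module's data model.

```python
def dedupe_by_location(links):
    """Keep the first entry for each 'loc' value, preserving input order."""
    seen = set()
    unique = []
    for link in links:
        loc = link["loc"]
        if loc in seen:
            continue  # An entry with this location was already emitted.
        seen.add(loc)
        unique.append(link)
    return unique


# Three entries with the same location, as in the sitemap above.
links = [
    {"loc": "http://localhost:8080/", "priority": "1.0"},
    {"loc": "http://localhost:8080/", "priority": "1.0"},
    {"loc": "http://localhost:8080/", "priority": "1.0"},
]

print(dedupe_by_location(links))
# [{'loc': 'http://localhost:8080/', 'priority': '1.0'}]
```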

Remaining tasks

Add fix.

User interface changes

N/A

API changes

N/A

Data model changes

N/A

Release notes snippet

TBD.

Comments

manuel.adan created an issue. See original summary.

gbyte’s picture

Title: Duplication of custom links in multilingual sites with no URL detection method enabled » Treat multilingual site with no URL detection as monolingual
Category: Bug report » Feature request

If I understand the problem correctly, I wouldn't say it is a bug; rather an undesired result of a misconfiguration. To mitigate it one can exclude the language from the sitemap in the module settings or enable detection.

Would you say we should treat a multilingual site with no detection method as monolingual?

manuel.adan’s picture

I have a site with two languages, English and Spanish. For configuration reasons, English is the default language and it is used only for administration tasks, so the site is multilingual but from the public point of view it is monolingual. AFAIK, the default language cannot be excluded from sitemap generation, so I cannot exclude the English language in this case.

A possible solution would be to allow excluding any language, even the default one, but I'm not sure about the implications. For testing purposes, I tried to set it directly in simple_sitemap.settings (adding "en: en" to the excluded_languages configuration entry), but it doesn't work.

This is one particular use case that ends in duplicates, but there might be others. For this reason, I proposed adding some kind of duplicate prevention to the processing of custom link entries.

gbyte’s picture

Status: Active » Postponed (maintainer needs more info)

What you see there has nothing to do with duplicates or custom links. I believe it's Drupal failing to produce different URLs for different languages because there is no way to determine the language per URL (no detection method enabled). Hence my question again:

Would you say we should treat a multilingual site with no detection method as monolingual?

I would like others from the community to chime in on this.

manuel.adan’s picture

Title: Treat multilingual site with no URL detection as monolingual » Allow to exclude the default language from sitemap generation on multilingual sites
Status: Postponed (maintainer needs more info) » Active

Would you say we should treat a multilingual site with no detection method as monolingual?

That's not exactly the case. The detection method is based on the user session, with fallback to Spanish. It's a multilingual site: registered users can choose between the enabled languages, but anonymous users always browse the site in one language. An option to exclude the default language would solve it, so I think we could focus this issue on that.

gbyte’s picture

Title: Allow to exclude the default language from sitemap generation on multilingual sites » Treat multilingual site with no URL detection as monolingual

@manuel.adan My question about treating a multilingual site as monolingual when there is no detection referred only to the sitemap.

When URL detection is disabled, there is no such thing as a link to a non-default language. The sitemap will only produce the default links. Right now you get what you called 'duplicate links' because it currently produces links for every enabled language, but there is no URL detection, so all links are the same. This is why I think my title is the better fit for your problem. ;)
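
To illustrate the reasoning above, here is a small Python sketch under stated assumptions. It is not the module's code or the committed fix; the function names and the path-prefix URL scheme (`/es/...`) are hypothetical stand-ins for whatever URL detection would produce.

```python
def build_url(base, path, langcode, default, url_detection_enabled):
    """Hypothetical URL builder: prefix non-default languages only when
    URL detection is enabled; otherwise every language gets the same URL."""
    if url_detection_enabled and langcode != default:
        return f"{base}/{langcode}{path}"
    return f"{base}{path}"


def sitemap_languages(enabled, default, url_detection_enabled):
    """Languages to generate links for.

    Without URL detection all languages resolve to the same URL, so only
    the default language is kept (the site is treated as monolingual).
    """
    if not url_detection_enabled:
        return [default]
    return list(enabled)


enabled = ["en", "es", "de"]
base = "http://localhost:8080"

# Detection off, one link per enabled language: three identical URLs.
naive = [build_url(base, "/", lc, "en", False) for lc in enabled]
print(naive)

# Detection off, site treated as monolingual: a single link.
mono = [
    build_url(base, "/", lc, "en", False)
    for lc in sitemap_languages(enabled, "en", False)
]
print(mono)
```

With detection enabled, the same helpers would yield one distinct URL per language, so nothing needs to be collapsed in that case.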

I hope this makes sense, if not, feel free to continue the discussion.

An option to exclude the default language would solve it, so I think we could focus this issue on that way.

You are describing a different issue. If you feel strongly about it, it should go into a separate issue. Feel free to create it as a feature request and let's discuss its viability over there. Right now, however, excluding the default language seems like an edge case / workaround for a misconfiguration to me.

  • gbyte.co committed 75017cd on 8.x-3.x
    Issue #3107818 by gbyte.co, manuel.adan: Treat multilingual site with no...
gbyte’s picture

Status: Active » Fixed

@manuel.adan Please test and get back to me.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

PieterDC’s picture

FYI, on one of my sites I exclude the default language with the help of the Disable Language module https://www.drupal.org/project/disable_language

It has support for this Simple Sitemap module. It excludes links in disabled languages from the sitemap.
Its main purpose though is to hide certain language(s) (from certain user roles).

Not sure if that covers @manuel.adan's use case, but it's worth mentioning ;-)

gbyte’s picture

@PieterDC

Good to know, but you can exclude languages in settings of this module as well.

PieterDC’s picture

Indeed, if they're not disabled by Disable Language, you can exclude them from the sitemap with the settings of this module.
Good that you mention it.