Problem/Motivation

Background and context:

Problem statement:

  • The index sitemap.xml available at http://localhost/sitemap.xml contains links to the paginated sitemaps, as illustrated below.
  • Note that the loc elements contain the language prefix 'fi', for example http://localhost/fi/sitemap.xml?page=1
  • When I follow such a link with the language prefix, I get an HTTP 404 (page not found) response.
  • If I manually remove the language prefix from the URL, i.e. I access http://localhost/sitemap.xml?page=1, the paginated sitemap works as expected.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/sitemap_generator/default/sitemap.xsl"?>
<!--Generated by the Simple XML Sitemap Drupal module: https://drupal.org/project/simple_sitemap.-->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <sitemap>
  <loc>http://localhost/fi/sitemap.xml?page=1</loc>
  <lastmod>2023-06-25T09:45:45+03:00</lastmod>
 </sitemap>
 <sitemap>
  <loc>http://localhost/fi/sitemap.xml?page=2</loc>
  <lastmod>2023-06-25T09:45:45+03:00</lastmod>
 </sitemap>
 <sitemap>
  <loc>http://localhost/fi/sitemap.xml?page=3</loc>
  <lastmod>2023-06-25T09:45:45+03:00</lastmod>
 </sitemap>
 <sitemap>
  <loc>http://localhost/fi/sitemap.xml?page=4</loc>
  <lastmod>2023-06-25T09:45:45+03:00</lastmod>
 </sitemap>
 <sitemap>
  <loc>http://localhost/fi/sitemap.xml?page=5</loc>
  <lastmod>2023-06-25T09:45:45+03:00</lastmod>
 </sitemap>
</sitemapindex>
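
A quick way to check an index for this symptom is to parse it and list every loc whose path is not exactly /sitemap.xml. The following is only a minimal sketch in plain PHP (no Drupal required); the function name find_prefixed_locs() and the shortened sample index are made up for illustration and are not part of the module.

```php
<?php

/**
 * Returns the <loc> URLs of a sitemap index whose path differs from the
 * expected sitemap path, e.g. "/fi/sitemap.xml" instead of "/sitemap.xml".
 *
 * Hypothetical helper for illustration only.
 */
function find_prefixed_locs(string $indexXml, string $expectedPath = '/sitemap.xml'): array {
  $index = new SimpleXMLElement($indexXml);
  $prefixed = [];
  foreach ($index->sitemap as $sitemap) {
    $loc = (string) $sitemap->loc;
    // Compare only the path component; the ?page=N query is ignored.
    if (parse_url($loc, PHP_URL_PATH) !== $expectedPath) {
      $prefixed[] = $loc;
    }
  }
  return $prefixed;
}

// Shortened sample index: one prefixed and one unprefixed chunk URL.
$indexXml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <sitemap>
  <loc>http://localhost/fi/sitemap.xml?page=1</loc>
 </sitemap>
 <sitemap>
  <loc>http://localhost/sitemap.xml?page=2</loc>
 </sitemap>
</sitemapindex>
XML;

// Only the first loc (the one with the /fi prefix) is reported.
print_r(find_prefixed_locs($indexXml));
```

For a sitemap variant served under a different path, the expected path would have to be passed as the second argument.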

Steps to reproduce

See context above.

Proposed resolution

When sitemap.xml contains links to paginated sub-pages, ensure that the loc elements do not contain language prefixes.

Remaining tasks

Investigate where the language prefixes are coming from.
Fix it.

User interface changes

API changes

Data model changes

Comments

masipila created an issue. See original summary.

masipila’s picture

Title: Unexpected language prefixes on multilingual sites » Unexpected language prefixes on sitemap index
masipila’s picture

Note to self: the index generation seems to happen in src/Plugin/simple_sitemap/SitemapGenerator/SitemapGeneratorBase.php

  public function getIndexContent(): string {
    [...]
    // Add sitemap chunk locations to document.
    for ($delta = 1; $delta <= $this->sitemap->fromUnpublished()->getChunkCount(); $delta++) {
      $this->writer->startElement('sitemap');
      // THE URL TO THE CHUNK IS CREATED HERE
      $this->writer->writeElement('loc', $this->sitemap->toUrl('canonical', ['delta' => $delta])->toString());
      // @todo Should this be current time instead?
      $this->writer->writeElement('lastmod', date('c', $this->sitemap->fromUnpublished()->getCreated()));
      $this->writer->endElement();
    }
  }
masipila’s picture

Okay, so getIndexContent() from the snippet in my previous comment calls SimpleSitemap::toUrl(), which looks like this (in src/Entity/SimpleSitemap.php):

  public function toUrl($rel = 'canonical', array $options = []) {
    if ($rel !== 'canonical') {
      return parent::toUrl($rel, $options);
    }

    $parameters = isset($options['delta']) ? ['page' => $options['delta']] : [];
    unset($options['delta']);

    if (empty($options['base_url'])) {
      /** @var \Drupal\simple_sitemap\Settings $settings */
      $settings = \Drupal::service('simple_sitemap.settings');
      $options['base_url'] = $settings->get('base_url') ?: $GLOBALS['base_url'];
    }

    $options['language'] = $this->languageManager()->getLanguage(LanguageInterface::LANGCODE_NOT_APPLICABLE);

    return $this->isDefault()
      ? Url::fromRoute(
        'simple_sitemap.sitemap_default',
        $parameters,
        $options)
      : Url::fromRoute(
        'simple_sitemap.sitemap_variant',
        $parameters + ['variant' => $this->id()],
        $options);
  }

Looking at the routing file, the route normalizer is already disabled (suggested here: https://drupal.stackexchange.com/questions/246572/disabling-language-pre...)

simple_sitemap.sitemap_default:
  path: '/sitemap.xml'
  defaults:
    _controller: '\Drupal\simple_sitemap\Controller\SimpleSitemapController::getSitemap'
    _disable_route_normalizer: 'TRUE'
  requirements:
    # Sitemaps are accessible for everyone.
    _access: 'TRUE'

What would be the Right Way to ensure that the URL does not have a language prefix?

(For the time being I wrote a small patch for myself that is included in my composer.json which has a hard coded "remove /fi/ prefix" logic but that's obviously not the correct way to handle this...)

Cheers,
Markus

kala4ek’s picture

Unfortunately, this is not entirely a Simple XML Sitemap problem: it was broken at the core level by the issue #2883450: Missing url prefix on language neutral content.

jjcarrion’s picture

Hi,

I'm facing the same problem after updating core to 10.1.2.

It seems that the root cause will take some time to be fixed (https://www.drupal.org/project/drupal/issues/2883450#comment-15218088), so I have applied a hack for now. I agree with @masipila that this is not the way to go, but until we find a better solution I'm uploading the hacky patch in case anyone finds it useful. I'm not even using dependency injection, but as I said, this is not the right solution.

Thanks!

jheinon_finland’s picture

Greetings,

I studied the issue and tried the patch on our Drupal project's sitemap, but it did not remove the language prefix as desired. The project I'm working on runs core 10.1.2 and module version 4.1.x. While reading the related core issue, I found a similar issue in the OpenID Connect / OAuth client module: https://www.drupal.org/project/openid_connect/issues/3383036 (Redirect URI has the language prefix in it (in D10)).

In that patch, setting the language for the URL is replaced with `path_processing` set to false. The patch I'm attaching to this comment proposes the same solution here, together with an interdiff against the prior patch.
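
As a rough sketch of that idea (not the attached patch itself), the chunk URL can be built with `path_processing` disabled. This assumes the code runs inside a bootstrapped Drupal site with Simple XML Sitemap installed; the route name and the `page` parameter are taken from the snippets earlier in this issue, while the page number 1 and the use of `absolute` instead of the module's `base_url` handling are my own simplifications.

```php
<?php

use Drupal\Core\Url;

// Sketch only: needs a bootstrapped Drupal site where the
// 'simple_sitemap.sitemap_default' route is defined.
$url = Url::fromRoute(
  'simple_sitemap.sitemap_default',
  // 'page' is not part of the route path, so it ends up in the query string.
  ['page' => 1],
  [
    'absolute' => TRUE,
    // Skip outbound path processing, which, as far as I understand, is the
    // step where the language prefix gets added.
    'path_processing' => FALSE,
  ]
);

print $url->toString();
```

Note that disabling path processing skips all outbound path processors, not only the language one, which should be harmless for a fixed path like /sitemap.xml.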

heikkiy’s picture

Status: Active » Needs review

Encountered the same issue in OpenID Connect and Simple XML Sitemap. The patch from #3369919-7: Unexpected language prefixes on sitemap index seems to fix the issue. I'll move this to Needs review, but it is probably RTBC-ready as well.

Shreya Shetty made their first commit to this issue’s fork.

sascha_meissner’s picture

+1. Having the same (drastic) issue; the patch from #7 fixes it for me.

gbyte’s picture

Version: 4.1.6 » 4.x-dev
Status: Needs review » Fixed

Thank you, this fix I can live with. Tests are gone ATM as I need to set up GitLab CI. Can you guys meanwhile test the dev version for me and tell me whether anything broke?

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.