Problem/Motivation
Background and context:
- I have a D10.1.0 site with Simple XML Sitemap 4.1.6.
- The site has two languages: FI and EN. All URLs have a language prefix like http://localhost/fi/foo and http://localhost/en/bar.
- I have about 10k URLs to be included in the sitemap on my site. In other words, the sitemap will contain multiple pages like http://localhost/sitemap.xml?page=1, http://localhost/sitemap.xml?page=2 and so on.
- When the Simple XML sitemap is generated, it is available as expected at http://localhost/sitemap.xml (note that there is no language prefix here, which is correct as far as I understand).
Problem statement:
- The index sitemap.xml available at http://localhost/sitemap.xml contains links to the paginated sitemaps, as illustrated below.
- Observe that the loc elements contain the language prefix 'fi', like http://localhost/fi/sitemap.xml?page=1
- When I follow such a link, which contains a language prefix, I get an HTTP 404 (page not found) response.
- If I manually remove the language prefix from the URL, i.e. I access http://localhost/sitemap.xml?page=1, the paginated sitemap works as expected.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/sitemap_generator/default/sitemap.xsl"?>
<!--Generated by the Simple XML Sitemap Drupal module: https://drupal.org/project/simple_sitemap.-->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://localhost/fi/sitemap.xml?page=1</loc>
<lastmod>2023-06-25T09:45:45+03:00</lastmod>
</sitemap>
<sitemap>
<loc>http://localhost/fi/sitemap.xml?page=2</loc>
<lastmod>2023-06-25T09:45:45+03:00</lastmod>
</sitemap>
<sitemap>
<loc>http://localhost/fi/sitemap.xml?page=3</loc>
<lastmod>2023-06-25T09:45:45+03:00</lastmod>
</sitemap>
<sitemap>
<loc>http://localhost/fi/sitemap.xml?page=4</loc>
<lastmod>2023-06-25T09:45:45+03:00</lastmod>
</sitemap>
<sitemap>
<loc>http://localhost/fi/sitemap.xml?page=5</loc>
<lastmod>2023-06-25T09:45:45+03:00</lastmod>
</sitemap>
</sitemapindex>
Steps to reproduce
See context above.
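The check described in the problem statement can also be scripted. Below is a minimal sketch in Python (not part of the module) that parses a sitemap index and reports every loc element whose path starts with a language prefix. The names `index_locs`, `language_prefix`, `without_language_prefix` and `SAMPLE_INDEX` are hypothetical, and the prefix list assumes the two languages from this report (fi and en). It only detects the symptom and shows the URL that works once the prefix is removed by hand; it does not fix the index generation.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional
from urllib.parse import urlsplit, urlunsplit

# Namespace used by sitemap and sitemap index documents.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# Assumption: the site languages from this report.
LANG_PREFIXES = ("fi", "en")

# Shortened copy of the index shown in the issue summary.
SAMPLE_INDEX = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://localhost/fi/sitemap.xml?page=1</loc>
<lastmod>2023-06-25T09:45:45+03:00</lastmod>
</sitemap>
<sitemap>
<loc>http://localhost/sitemap.xml?page=2</loc>
<lastmod>2023-06-25T09:45:45+03:00</lastmod>
</sitemap>
</sitemapindex>
"""


def index_locs(index_xml: str) -> List[str]:
    """Return every <loc> URL listed in a sitemap index document."""
    root = ET.fromstring(index_xml.encode("utf-8"))
    return [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]


def language_prefix(url: str) -> Optional[str]:
    """Return the language prefix of the URL path, or None if there is none."""
    first_segment = urlsplit(url).path.lstrip("/").split("/", 1)[0]
    return first_segment if first_segment in LANG_PREFIXES else None


def without_language_prefix(url: str) -> str:
    """Return the URL with a leading language prefix removed from its path."""
    prefix = language_prefix(url)
    if prefix is None:
        return url
    parts = urlsplit(url)
    new_path = parts.path[len("/" + prefix):] or "/"
    return urlunsplit(parts._replace(path=new_path))


if __name__ == "__main__":
    for loc in index_locs(SAMPLE_INDEX):
        if language_prefix(loc) is not None:
            print("prefixed:", loc, "->", without_language_prefix(loc))
```

To run the same check against a real site, the index can be downloaded first (for example with curl) and its contents passed to `index_locs()`.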
Proposed resolution
When sitemap.xml contains links to paginated sub-pages, ensure that the loc elements do not contain language prefixes.
Remaining tasks
Investigate where the language prefixes are coming from.
Fix it.
User interface changes
API changes
Data model changes
| Comment | File | Size | Author |
|---|---|---|---|
| #7 | interdiff_6-7.txt | 1.46 KB | jheinon_finland |
| #7 | simple_sitemap-unexpected-language-3369919-7.patch | 576 bytes | jheinon_finland |
| #6 | simple_sitemap-unexpected-language-3369919-6.patch | 1.22 KB | jjcarrion |
Comments
Comment #2
masipila commented
Comment #3
masipila commented
Note to self: the index generation seems to happen in src/Plugin/simple_sitemap/SitemapGenerator/SitemapGeneratorBase.php
Comment #4
masipila commented
Okay, so getIndexContent() from the snippet mentioned in my previous comment calls SimpleSitemap::toUrl(), which is this (in src/Entity/SimpleSitemap.php)
Looking at the routing file, the route normalizer is already disabled (suggested here: https://drupal.stackexchange.com/questions/246572/disabling-language-pre...)
What would be the Right Way to ensure that the URL does not have a language prefix?
(For the time being I wrote a small patch for myself, applied via my composer.json, which has hard-coded "remove the /fi/ prefix" logic, but that's obviously not the correct way to handle this...)
Cheers,
Markus
Comment #5
kala4ek
Unfortunately, this is not entirely a Simple XML Sitemap problem: it was broken at the core level by the issue #2883450: Missing url prefix on language neutral content.
Comment #6
jjcarrion
Hi,
I'm facing the same problem after updating core to 10.1.2.
It seems that the root cause will take some time to be fixed (https://www.drupal.org/project/drupal/issues/2883450#comment-15218088), so I have applied a hack for now. I agree with @masipila that this is not the way to go, but until we find a better solution I'm uploading the hacky patch in case anyone finds it useful. I'm not even using dependency injection, but as I said, this is not the right solution.
Thanks!
Comment #7
jheinon_finland commented
Greetings,
I studied the issue and tried the patch on our Drupal project's sitemap, but it did not produce the desired result of removing the language prefix. The project I'm working on runs core 10.1.2 and module version 4.1.x. While studying the related core issue, I found a similar issue in the OpenID Connect / OAuth client module: https://www.drupal.org/project/openid_connect/issues/3383036 (Redirect URI has the language prefix in it (in D10)).
In that patch, the language for the URL is disabled and replaced with `path_processing` set to false. The patch I'm attaching to this comment proposes the same solution, and I'm also including an interdiff against the prior patch.
Comment #8
heikkiy commented
Encountered the same issue in OpenID Connect and Simple XML Sitemap. The patch from #3369919-7: Unexpected language prefixes on sitemap index seems to fix the issue. I'll move this to Needs review, but it is probably RTBC-ready as well.
Comment #10
sascha_meissner
+1. I'm having the same (drastic) issue, and the patch from #7 fixes it for me.
Comment #12
gbyte
Thank you, this fix I can live with. Tests are gone ATM as I need to set up GitLab CI. Can you guys meanwhile test the dev version for me and tell me whether anything broke?