Problem/Motivation

I noticed by chance that after the update to Drupal 9.5 the module generates wrong URLs. The site itself still works because a redirect takes place. However, Google does not like this at all and has dropped around 15,000 pages from its index.

Sitemap URLs are generated according to this scheme:

Example:
Node -> domain.tld/index.php/channel/node-title-23253
Group -> domain.tld/index.php/grouptype/group-name

An /index.php is inserted after the domain name in every URL, which causes a 301 redirect. For Google this is an indicator that something is wrong with the page.

Steps to reproduce

Proposed resolution

* Provide a patch, fix the problem.

Remaining tasks

User interface changes

API changes

Data model changes

Comment | File | Size | Author
#4 | Xnip2023-03-23_01-50-30.jpg | 193.9 KB | zcht

Comments

zcht created an issue. See original summary.

zcht’s picture

Issue summary: View changes

walkingdexter’s picture

Version: 4.1.3 » 4.x-dev
Priority: Critical » Normal
Status: Active » Closed (cannot reproduce)

@zcht I can't reproduce the described problem. It is very likely that the problem is related to the environment settings.

zcht’s picture

Priority: Normal » Major
Status: Closed (cannot reproduce) » Active
Attached file: new, 193.9 KB

Unfortunately I have to reopen this issue. The problem occurs mainly with the additional sitemaps that I have configured. Attached is a screenshot from Google Search Console. As the screenshot shows, URLs are indexed when they are correct. But as soon as a URL like domain.tld/index.php/channel/node-title-23253 appears, it drops out of the index.

In the meantime the regular sitemap is affected as well. It does not help the project if Google keeps receiving different URLs. As soon as a URL contains index.php, Google detects a redirect and removes the page.

Could the problem be that I have many nodes and the internal generation batch is simply overloaded? Should I perhaps set up a manual cron job for the sitemaps?

I would be very grateful for any help.

gbyte’s picture

Category: Bug report » Support request
Priority: Major » Normal

This module does not generate the URLs; it asks Drupal core to do it. From searching for the problem, it appears to be something people run into, probably due to erroneous environment configuration. Please check your Apache .htaccess or nginx configuration.

Maybe scanning through this issue will help: #3050261: index.php randomly appears in friendly URLs. Good luck!

zcht’s picture

It may partly be a core problem, but it is also a problem of XML Sitemap.

We also use the Redirect module, and the option 'Enforce clean and canonical URLs' has always been active.

The problem does not appear on the pages themselves or in links within articles.

Before the XML Sitemap module we used the Sitemap module, but we migrated to XML Sitemap because of IndexNow. Since then, the sitemaps have repeatedly been generated with this structure: about 80% of the URLs are correct, while the remaining 20% are generated with index.php in the URL.

Our nginx rules are set correctly, matching those in the linked issue.

I do not know whether nobody has noticed this so far, because when you configure the module and test it for the first time, the URLs look normal. Only later, every now and then, are they generated incorrectly.

gbyte’s picture

zcht wrote: "I do not know whether nobody has noticed this so far, because when you configure the module and test it for the first time, the URLs look normal. Only later, every now and then, are they generated incorrectly."

Trust me, 100k+ users do notice these things, especially when it impacts their SEO.

I don't know how to debug from here, as I have neither encountered nor heard of this issue. After reverting the server environment settings to defaults, I'd disable contrib and custom modules one by one to see if the problem disappears.

Suggestions:

  • What happens if you manually set 'Default base URL' under /admin/config/search/simplesitemap/settings?
  • Take a look at this comment: https://www.drupal.org/project/drupal/issues/3050261#comment-13080063
  • If this problem really only occurs in the sitemap (which frankly doesn't make sense), you might work around this problem by implementing hook_simple_sitemap_links_alter() to remove index.php from the URLs.
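The last suggestion could look roughly like the sketch below. It is a workaround only, not a fix for the underlying cause. Assumptions: the module name mymodule and the helper mymodule_strip_index_php() are hypothetical, and each link is assumed to carry its URL as a string under 'url', with optional per-language strings under 'alternate_urls', as in the examples in the module's simple_sitemap.api.php. The second hook parameter is left untyped here so the snippet stays self-contained; check the hook signature of your installed module version.

```php
<?php

/**
 * Hypothetical helper: removes a leading "/index.php" path segment.
 *
 * Only strips "index.php" when it is the first path segment, so paths such
 * as "/channel/index.php-guide" are left untouched.
 */
function mymodule_strip_index_php(string $url): string {
  // Group 1 captures an optional scheme and host; it is kept in the result.
  $pattern = '#^((?:https?://[^/]+)?)/index\.php(?=/|$)#';
  return preg_replace($pattern, '$1', $url, 1) ?? $url;
}

/**
 * Implements hook_simple_sitemap_links_alter().
 *
 * Assumed link structure: $link['url'] and $link['alternate_urls'][langcode].
 */
function mymodule_simple_sitemap_links_alter(array &$links, $sitemap) {
  foreach ($links as $key => $link) {
    if (isset($link['url']) && is_string($link['url'])) {
      $links[$key]['url'] = mymodule_strip_index_php($link['url']);
    }
    foreach ($link['alternate_urls'] ?? [] as $langcode => $alternate_url) {
      if (is_string($alternate_url)) {
        $links[$key]['alternate_urls'][$langcode] = mymodule_strip_index_php($alternate_url);
      }
    }
  }
}
```

After adding such a hook, the caches would need to be rebuilt and the sitemaps regenerated before the change shows up in the generated XML.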

In any case let us know if you find the source of the problem.

gbyte’s picture

Status: Active » Fixed

Closing due to lack of activity.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.