Problem/Motivation
I noticed by chance that, after the update to Drupal 9.5, the module generates wrong URLs. The functionality itself is not affected because a redirect takes place. However, Google does not like this at all and has dropped around 15,000 pages from its index.
Sitemap URLs are generated according to this scheme:
Example:
Node -> domain.tld/index.php/channel/node-title-23253
Group -> domain.tld/index.php/grouptype/group-name
An /index.php is inserted after the domain name in every URL, so each request results in a 301 redirect, which Google treats as an indicator that something is wrong with the page.
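To make the pattern above concrete, here is a minimal Python sketch that detects the /index.php prefix in a URL and derives the clean URL the redirect would presumably lead to. The function names are hypothetical, and the https:// scheme is an assumption (the examples above omit it); this is not code from the module or from Drupal core.

```python
from urllib.parse import urlsplit, urlunsplit

# The path prefix described in this report.
FRONT_CONTROLLER = "/index.php"


def has_front_controller_prefix(url: str) -> bool:
    """Return True if the URL path starts with /index.php as a full segment."""
    path = urlsplit(url).path
    return path == FRONT_CONTROLLER or path.startswith(FRONT_CONTROLLER + "/")


def strip_front_controller(url: str) -> str:
    """Return the URL without a leading /index.php path segment."""
    if not has_front_controller_prefix(url):
        return url
    parts = urlsplit(url)
    # Drop the prefix; fall back to "/" if nothing is left of the path.
    path = parts.path[len(FRONT_CONTROLLER):] or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

For example, strip_front_controller("https://domain.tld/index.php/channel/node-title-23253") yields the URL without the /index.php segment, while URLs that are already clean are returned unchanged.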
Steps to reproduce
Proposed resolution
* Provide a patch, fix the problem.
Remaining tasks
User interface changes
API changes
Data model changes
Comments
Comment #2
zcht commented
Comment #3
walkingdexter commented
@zcht I can't reproduce the described problem. It is very likely that the problem is related to the environment settings.
Comment #4
zcht commented
Unfortunately, I have to reopen this issue. The problem occurs primarily with the additional sitemaps that I have configured. Attached is a visual example from Google Search Console. As the screenshot shows, URLs are indexed as long as they are correct. But as soon as a URL like domain.tld/index.php/channel/node-title-23253 is submitted, it drops out of the index.
Meanwhile, this also happens with the regular sitemap. It does not help the project if different URLs keep being submitted to Google: as soon as a URL contains index.php, Google detects a redirect and removes the page.
Could the problem be that I have many nodes and the internal generation batch is simply overloaded? Should I maybe set up a manual cron job for the sitemaps?
I would be very grateful for help.
Comment #5
gbyte
This module does not generate the URLs; it asks Drupal core to do it. Searching for the problem suggests that people who experience this issue probably have an erroneous environment configuration; please check your Apache .htaccess or nginx configuration.
Maybe scanning through this issue will help: #3050261: index.php randomly appears in friendly URLs. Good luck!
Comment #6
zcht commented
It may partly be a core problem, but it is also a problem of XML Sitemap.
We also use the Redirect module, and its option 'Enforce clean and canonical URLs' has always been active.
The problem does not appear on the pages themselves or in linked articles.
Before XML Sitemap we used the Sitemap module, but we migrated to XML Sitemap because of IndexNow. Since then, the sitemaps are generated with this structure again and again: about 80% of the URLs are correct, while the remaining 20% are generated with index.php in the URL.
Our nginx rules are set correctly, as described in the linked issue.
I do not know whether nobody has noticed this so far: you configure the module, test it at first, and the URLs look normal. Only later, every now and then, are they not generated correctly.
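Since the problem is described as intermittent (roughly 80% correct, 20% affected), it may help to measure the share of affected URLs in a generated sitemap rather than spot-checking. The following Python sketch counts the loc entries whose path starts with /index.php/. It assumes the sitemap uses the standard sitemaps.org namespace and a flat urlset; the function name is hypothetical, and fetching the sitemap is left out.

```python
import xml.etree.ElementTree as ET
from typing import Tuple
from urllib.parse import urlsplit

# Namespace of the sitemaps.org protocol (assumed to be what the sitemap uses).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def affected_share(sitemap_xml: str) -> Tuple[int, int]:
    """Return (affected, total) for the <loc> URLs of a urlset sitemap."""
    root = ET.fromstring(sitemap_xml)
    locs = [
        el.text.strip()
        for el in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if el.text
    ]
    # A URL counts as affected if its path begins with the /index.php/ segment.
    affected = [u for u in locs if urlsplit(u).path.startswith("/index.php/")]
    return len(affected), len(locs)
```

Running this against each sitemap variant after every regeneration would show whether the affected share changes between runs, which could help narrow down when the wrong URLs are produced.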
Comment #7
gbyte
Trust me, 100k+ users do notice these things, especially when it impacts their SEO.
I don't know how to debug from here, as I have never encountered or heard of this issue. After reverting the environment (server) settings to defaults, I'd disable contrib and custom modules one by one to see if the problem disappears.
Suggestions:
/admin/config/search/simplesitemap/settings?
In any case, let us know if you find the source of the problem.
Comment #8
gbyte
Closing due to lack of activity.