Problem/Motivation

I would like to be able to add multiple URLs with the same path but different url query strings to the sitemap.xml file.

Steps to reproduce

  1. Create a basic page with the url of /testpage.
  2. Go to /admin/config/search/simplesitemap/custom and in the custom url area put in /testpage?id=111 and /testpage?id=222. Check the “regenerated sitemap” as you save the page.
  3. Validate that the sitemap has only the first URL listed from above and not the other.

Proposed resolution

  1. Make a patch that fixes it to accept multiple URLs with the same path but different url query strings. Treat those urls as unique.

Remaining tasks

  1. Create the patch
Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

minorwm created an issue. See original summary.

minorwm’s picture

minorwm’s picture

rgristroph’s picture

I tested this patch. I applied it, created a basic page node with the url "/aaaaa", and then I went to the custom URL portion of the sitemap settings, and I added "/aaaaa?a=aaaa" and "/aaaaa?a=bbbb" and then I regenerated the sitemap. The checkbox for checking for duplicates was not checked. After re-generation, the urls "/aaaaa?a=aaaa" and "/aaaaa?a=bbbb" and "/aaaaa" were all in the sitemap.

If the checkbox for checking for duplicates is checked, then "/aaaaa?a=aaaa" and "/aaaaa?a=bbbb" are there, and "/aaaaa" with no url arguments is not there.

rgristroph’s picture

Reading the patch and thinking through this further, I have a couple of thoughts:

* It could check the url argument for equality instead of just the presence of the "?"
* There could be a test
* We could make this behavior configurable
* But for my use case this is working well, I need to allow a site admin to enter custom URLs that are the same except for different arguments, due to how a custom embed on some of our pages works, and then make that appear in the sitemap for google search indexing reasons, and this solves our problem

gbyte’s picture

Title: Fix simple_sitemap to not de-duplicate Custom URLs with different query strings. » Don't de-duplicate URLs with different query strings
Version: 8.x-3.7 » 8.x-3.x-dev
Assigned: minorwm » Unassigned
Category: Bug report » Feature request
Status: Active » Needs work

Good point, two URLs with different query parameters should not be seen as duplicates. This has less to do with custom links - de-duplication is used with all URL generators. I agree that #3 is not enough, but I wouldn't like to make the de-duplication feature much heavier - either this can be implemented cleanly, or a warning should be shown underneath the de-duplication checkbox informing that the feature does not work with query parameters.

WalkingDexter’s picture

Version: 8.x-3.x-dev » 4.x-dev
Status: Needs work » Needs review
FileSize
3.26 KB
WalkingDexter’s picture

Also fix a de-duplication bug for the case when there is more than one entity in the data-set (EntityUrlGenerator::processDataSet()).

WalkingDexter’s picture

Assigned: Unassigned » gbyte
gbyte’s picture

Assigned: gbyte » WalkingDexter

@WalkingDexter Thanks for the patch. I propose the following change:
https://git.drupalcode.org/issue/simple_sitemap-3178229/-/commit/8767f63...

Reason for using this in other generators is: Menu links can be saved with query strings as well and we don't know what contrib modules may do to content entity canonical URLs.

Reason for storing the path with the query in ['meta']['path'] instead of using the additional ['meta']['query'] is just general simplicity. If you feel strongly about this being a bad idea, let me know why.

  • gbyte committed 60a5381 on 4.x
    Issue #3178229 by gbyte, minorwm, WalkingDexter: Don't de-duplicate URLs...
gbyte’s picture

Assigned: WalkingDexter » Unassigned
Status: Needs review » Fixed

Merging it for now; feel free to object before June where I intend to release.

WalkingDexter’s picture

Assigned: Unassigned » gbyte
Status: Fixed » Needs work

Current code breaks the URL variants feature, because variants have the same system path. This problem was fixed in my original patch. However, I'm not sure about that solution. Maybe it would be better to check the final URL string?

protected function removeDuplicates(array &$results): void {
  if ($this->generatorSettings['remove_duplicates'] && !empty($results)) {
    foreach ($results as $key => $result) {
      $url = $result['url'];

      if (isset($this->processedUrls[$url])) {
        unset($results[$key]);
      }
      else {
        $this->processedUrls[$url] = TRUE;
      }
    }
  }
}

Another thing - duplicate lines (364 and 367) in SimplesitemapTest::testRemoveDuplicatesSetting().

P.S. Tests are failing (not related to this issue). I tested locally with Drupal 9.3.13.

There were 2 errors:

1) Drupal\Tests\simple_sitemap\Functional\SimplesitemapTest::testEmptySitemap
Behat\Mink\Exception\ExpectationException: Current response status code is 200, but 404 expected.

2) Drupal\Tests\simple_sitemap\Functional\SimplesitemapTest::testSetEntityInstanceSettings
Behat\Mink\Exception\ExpectationException: The string "node/2" appears in the HTML response of this page, but it should not.

There were 2 failures:

1) Drupal\Tests\simple_sitemap\Functional\SimplesitemapTest::testGenerationResume with data set #1 (1000, 500, 3, array('de'))
Failed asserting that 667 matches expected 334.

2) Drupal\Tests\simple_sitemap\Functional\SimplesitemapTest::testGenerationResume with data set #2 (1000, 500, 5, array('de', 'es'))
Failed asserting that 600 matches expected 200.

gbyte’s picture

Assigned: gbyte » WalkingDexter

Good point about URL variants. So I reverted removeDuplicates() to what it was before and it is working correctly. I don't know why tests are failing now; it appears 509172800164a0467c1f9c532b9cf5c4d61d07da broke them - maybe you can have a look?

gbyte’s picture

@WalkingDexter

Thanks for looking into the tests. Any chance you can check why testRemoveDuplicatesSetting() is still failing? I found no problems when testing manually.

  • WalkingDexter committed 7c9b78a on 4.x
    Issue #3178229 by gbyte, WalkingDexter, minorwm: Don't de-duplicate URLs...
WalkingDexter’s picture

Status: Needs work » Needs review

@gbyte I fixed the tests and improved the removeDuplicates() method, please take a look.

gbyte’s picture

Status: Needs review » Fixed

Thank you, that works.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.