Hi!

We have had a major regression in our Drupal instance when generating sitemap.xml. On our site, the xmlsitemap module makes about 50k calls to Url::fromUri() when generating a new sitemap.

We run this as a cron job, and it would take over 5 hours to complete for about 6k links. Because it takes so long, the job normally hits the PHP timeout or gets killed by Kubernetes. Working back through the call stack, we found that PathProcessorSearchApiPage::processInbound() (and the calls beneath it) was generating about 98% of our cache hits while this process was running. With the search_api_page module disabled, sitemap.xml was generated in about 6 minutes.

We do have a patch running against our instance that fixes this performance issue, but I don't think the approach is the best one. It simply compares the internal URI against the Search API pages stored in the database and, if there is no match, skips the PathProcessorSearchApiPage::processInbound() stack. Skipping this path processing may cause regressions we are not yet aware of. However, I am happy to add this patch to this issue if you would like.
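The patch itself is not attached, so the following is only a minimal sketch of the early-exit idea, written in Python rather than Drupal PHP. `InboundPathFilter`, `search_page_paths` and `expensive_process` are hypothetical names. The sketch assumes the stored Search API page paths are plain path prefixes such as "/search". It skips the expensive processing for any path that cannot belong to one of those pages.

```python
from typing import Callable, Iterable


class InboundPathFilter:
    """Hypothetical sketch: run the expensive inbound processing only for
    paths that could belong to a known search page."""

    def __init__(self, search_page_paths: Iterable[str],
                 expensive_process: Callable[[str], str]) -> None:
        # Normalise the stored page paths once, e.g. "/search/" -> "/search".
        self._page_paths = tuple(p.rstrip("/") for p in search_page_paths)
        self._expensive_process = expensive_process

    def _may_match(self, path: str) -> bool:
        # Match the page path itself or any sub-path, e.g. "/search/foo"
        # for the page "/search"; "/searching" does not match.
        return any(path == p or path.startswith(p + "/")
                   for p in self._page_paths)

    def process_inbound(self, path: str) -> str:
        if not self._may_match(path):
            # Cheap exit: ordinary sitemap links such as "/node/42" never
            # reach the expensive lookup.
            return path
        return self._expensive_process(path)
```

This also shows where the regression risk mentioned above comes from. Any path that genuinely needs processing but is missing from the stored list is passed through unchanged.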

Cheers.

Comments

max-ar created an issue. See original summary.

max-ar

Issue summary: View changes
damienmckenna

Status: Active » Needs review
Attached file (status: new, size: 2.03 KB)

Would this achieve the same results? It skips a lot of the logic if clean URLs is disabled on the pages.

Status: Needs review » Needs work

The last submitted patch, 3: search_api_page-n3118168-3.patch, failed testing.

damienmckenna

neclimdul

For what it's worth, this looks like a known performance bug in xmlsitemap: #3132913: Allow different uri protocols for more effecient url generation. I think the fix is pretty straightforward, but it's complicated enough that it has been stalled for a couple of years.

swentel

Attached file (status: new, size: 716 bytes)

We were bitten by this one too. I added a static variable in getSearchApiPagePathsUsingCleanUrls(). This saved us around 7000 cache calls on a (granted, quite expensive) page. :)
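In PHP, a `static` variable inside a function keeps its value for the rest of the request. The page-path lookup therefore runs once instead of once per processed path. The Python sketch below imitates that with a memoised function. `load_search_page_paths` is a hypothetical stand-in for the cache or database lookup, and the real getSearchApiPagePathsUsingCleanUrls() may differ.

```python
from functools import lru_cache

LOOKUPS = {"count": 0}  # counts how often the expensive lookup really runs


def load_search_page_paths() -> tuple[str, ...]:
    """Hypothetical stand-in for the cache/database lookup of page paths."""
    LOOKUPS["count"] += 1
    return ("/search", "/docs/search")


@lru_cache(maxsize=None)
def get_search_page_paths_using_clean_urls() -> tuple[str, ...]:
    # Like a PHP `static` variable: computed on the first call, then reused.
    return load_search_page_paths()
```

Calling `get_search_page_paths_using_clean_urls()` thousands of times triggers the underlying lookup only once. There is one difference from PHP. A PHP static lives only for the current request, while `lru_cache` lives for the whole process, so a long-running worker would need `get_search_page_paths_using_clean_urls.cache_clear()` after the pages change.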

karlshea made their first commit to this issue’s fork.

karlshea

Does this work? It incorporates both ideas.

swentel

Status: Needs review » Reviewed & tested by the community

works for me

  • karlshea committed c51ae4ef on 8.x-1.x
    feat: #3118168 Path processor poor performance with xmlsitemap module....
karlshea

Status: Reviewed & tested by the community » Fixed

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.