Hi!
We have had a major performance regression in our Drupal instance when generating sitemap.xml. On our site, the xmlsitemap module makes about 50k calls to Url::fromUri when generating a new sitemap.
We run this as a cron job, and the job would take over 5 hours to complete for about 6k links. Because it takes so long, the job normally hits the PHP timeout or gets killed by Kubernetes. Working back through the call stack, we found that PathProcessorSearchApiPage::processInbound() (and its call stack) was generating about 98% of our cache hits while this process was running. We disabled the search_api_page module and the sitemap.xml was generated in about 6 minutes.
We do have a patch that we are running against our instance which fixes this performance issue, but I don't think the approach is the best. It simply compares the internal URI against the Search API pages stored in the database, and if there is no match it skips the PathProcessorSearchApiPage::processInbound() stack. Skipping this path processing may cause regressions that we are unaware of at this point. I am happy to add the patch to this issue if you would like.
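To make the idea concrete, here is a minimal, language-neutral sketch of the early-exit check described above. It is written in Python rather than the module's PHP and is not the actual patch: `matches_search_page`, `process_inbound`, and `rewrite` are hypothetical names, and the prefix-matching rule is an assumption about how a path would be compared against the stored page paths.

```python
from typing import Callable, Sequence


def matches_search_page(path: str, page_paths: Sequence[str]) -> bool:
    """Return True if path equals a stored page path or is a sub-path of one."""
    return any(path == p or path.startswith(p + "/") for p in page_paths)


def process_inbound(
    path: str,
    page_paths: Sequence[str],
    rewrite: Callable[[str], str],
) -> str:
    """Run the expensive rewrite only for paths that can belong to a search page."""
    if not matches_search_page(path, page_paths):
        # Cheap string comparison only; the expensive stack is skipped.
        return path
    return rewrite(path)
```

With a check like this, the 6k sitemap links that are not search pages would return after a few string comparisons. The risk mentioned above remains: any path that needs processing but fails the comparison is skipped as well.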
Cheers.
| Comment | File | Size | Author |
|---|---|---|---|
| #7 | 3118168-search-api-page-processor.patch | 716 bytes | swentel |
| #3 | search_api_page-n3118168-3.patch | 2.03 KB | damienmckenna |
Issue fork search_api_page-3118168
Comments
Comment #2
max-ar commented
Comment #3
damienmckenna
Would this achieve the same results? It skips a lot of the logic if clean URLs are disabled on the pages.
Comment #5
damienmckenna
The tests fail because of #3247781: Database update fails, database schema problem on latest release.
Comment #6
neclimdul
For what it's worth, this looks like a known performance bug in xmlsitemap: #3132913: Allow different uri protocols for more effecient url generation. The fix is pretty straightforward, I think, but it's complicated enough that it's been stalled for a couple of years.
Comment #7
swentel commented
We were bitten by this one too. Added a static variable in getSearchApiPagePathsUsingCleanUrls(). This saved us around 7000 cache calls on a (granted, quite expensive) page .. :)
Comment #10
karlshea
Does this work? It incorporates both ideas.
Comment #11
swentel commented
Works for me.
Comment #13
karlshea