We split our sitemaps into different variants (content that is added frequently vs. "stale" content). We have a lot of "stale" content: about 90% of all content.

So what we'd like to do is regenerate the sitemaps for frequently added content daily and the others maybe just once a week or so, because it just takes too long to regenerate all the sitemaps for 290k URLs.

A variant option for the drush ssg command would be great.

And in general: are there any tips on how to configure mysql/php/module-settings to increase the speed of sitemap generation?
Currently the generation has been running for 2 hours and we're only 14% done...


Comments

alex0412 created an issue.

gbyte’s picture

I fully support this request and I believe there is a todo in code to implement this. It is not too high on my priority list though, as I don't have a use case ATM. Feel free to submit a patch to speed up this issue.

"And in general: are there any tips on how to configure mysql/php/module-settings to increase the speed of sitemap generation?"

The only thing I can think of is to increase the PHP memory limit and to max out the "Sitemap generation max duration" setting, as it also applies to CLI generation. If you try it out, make sure to let me know how much it helps, as I haven't done any speed tests in a long time.

alex0412’s picture

Great to hear!
Regarding the settings optimization:
- we had to increase the value of "max_allowed_packet" in the MySQL settings. Otherwise it led to errors like "MySQL server has gone away" and others which I don't remember right now.
- experiment with the "MAXIMUM LINKS IN A SITEMAP" setting. We started out with 2k, but wanted to increase the number so that we don't have that many sub-sitemaps. Then we tried 20k, but that was way too slow. In the end we settled on 7.5k, which is just as fast as 2k. In our case the generation got very slow after 8k: beyond that it generated literally only 5-10 elements per batch iteration.
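
For a sense of the trade-off behind that setting: the number of sub-sitemaps is roughly the URL count divided by the limit, rounded up. A quick sketch for the 290k URLs from the issue summary, assuming for simplicity that they all sit in a single variant (`chunk_count` is just an illustrative helper name, not part of the module):

```shell
#!/bin/sh
# Illustrative helper (not part of Simple Sitemap): ceiling division of
# a URL count by the "maximum links in a sitemap" limit.
chunk_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Compare the three limits mentioned above for 290000 URLs.
for max_links in 2000 7500 20000; do
  echo "$max_links links per sitemap: $(chunk_count 290000 "$max_links") chunks"
done
```

This prints 145, 39 and 15 chunks respectively, so 7.5k already cuts the number of sub-sitemaps to about a quarter of what 2k produces.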

Thanks for the "SITEMAP GENERATION MAX DURATION" tip, will try that too!

Right now we're at about 2h 30m, which I'd say is totally OK for the number of URLs.

gbyte’s picture

Sounds good, let me know how much you can squeeze out of it. Actually this should be the setting with the most effect.

alex0412’s picture

I've added a simple patch for generating sitemaps by variant with Drush 9.

Usage Example for generating sitemaps by variant:
drush ssr --variants=default && drush ssg

Explanation:
The easiest way to implement this functionality is to allow rebuilding the queue by variant. At first I tried to add the option to the generate command itself, but then it gets quite complicated.
So the approach is: first you build the desired queue and then you just generate like you'd usually do.

If no options are passed, then everything works just like before (all variants are queued and generated).

What do you think?

alex0412’s picture

Okay I've encountered a flaw.

Assume you have variants A and B:
- A is generated
- you now want to generate B, but drush ssr deletes A's sitemaps no matter which parameters are passed.

This is of course not what we want. We want to be able to generate sitemaps by variant without affecting the other already generated sitemaps.

alex0412’s picture

This patch should fix the issue described in #6.

gbyte’s picture

Title: Allow regenerating sitemaps by variant » Drush regeneration on a per variant basis

Looking through your patch, you seem to be duplicating existing functionality.

Regenerating specific variants is already implemented when using the module API. Try this:

\Drupal::service('simple_sitemap.generator')
  ->setVariants(['variant_name'])
  ->rebuildQueue()
  ->generateSitemap();

Simplesitemap::setVariants() is used throughout the code base and is the cleaner approach.

I will fix the problem you mentioned with deleting existing instances during queue rebuilding in another issue: #3090249: Do not remove irrelevant sitemap instances when rebuilding queue

This issue should only implement existing API functionality as a drush command.

gbyte’s picture

#3090732: "Allow adding variants to existing queue without rebuilding it" is now also implemented, along with the new queue() method, which can be used in place of rebuildQueue().

This allows for doing the following:

\Drupal::service('simple_sitemap.generator')
  ->setVariants(['variant1'])
  ->queue()
  ->setVariants(['variant2'])
  ->queue()
  ->generateSitemap();

It is now possible to add variants to an existing queue without rebuilding the whole queue. This should make the Drush implementation much more flexible.

alex0412’s picture

Okay, I could simplify the patch even further by adjusting only the "ssr" command.

kyuubi’s picture

Very interested in this feature.
I'll test the patch ASAP and report back.

kyuubi’s picture

Status: Active » Reviewed & tested by the community

Hi guys,
This seems to be working just fine (so long as I use the latest dev version so that #6 doesn't happen).
This means we would need a new release of Simple Sitemap before we can start using this.
@gbyte.co any idea when that might happen?
As for the patch, it seems to be a simple wrapper around the existing API, and I found no issues, so marking RTBC.
Cheers

gbyte’s picture

@kyuubi You don't need a release, you can require a specific dev version of this module plus the above patch with composer.
Regarding when this will land in dev, I will take some time in the next few days to decide if this is enough or if we need new drush commands instead/on top.

kyuubi’s picture

Hi @gbyte.co,

I know that, and the patch is fine. My only concern is the module codebase.

I don't like to push dev versions to composer; I prefer to use the stable version and apply patches. However, the linked issue that solves #6 doesn't have a patch available.

So I'm fine with keeping this as a separate patch, just wondering when the dev version will be released (or if there is a patch somewhere for that change alone that I could use).

gbyte’s picture

@kyuubi Your reasoning does not make sense to me.

If you require a specific hash of the dev version, composer will not update the module until a new release is available.
Using the snapshot of a dev version which has all the changes you want is much cleaner than using a release and a bunch of patches on top which will break with a subsequent release. Also dev versions of this here module are quite stable.

But maybe your project has specific requirements that elude me.

  • gbyte.co committed 9ec6d36 on 8.x-3.x
    Issue #3086749 by alex0412, gbyte.co: Drush regeneration on a per...
gbyte’s picture

Status: Reviewed & tested by the community » Fixed

I've fixed the code to actually generate all variants if none were provided, added validation of the provided variants, and made it possible to pass variants as a comma-separated list.

Usage:
drush ssr --variants=default,test && drush ssg

Please go through the code and double check if the implementation is sound as I am not too familiar with drush command implementations.
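
For anyone who wants to script this, e.g. to regenerate frequently changing variants daily and the rest weekly as described in the issue summary, here is a rough sketch of a wrapper around the two commands above. The function names and the `DRUSH` override (handy for dry runs) are assumptions made up for illustration; only `drush ssr --variants=...` and `drush ssg` come from this issue.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of the module. DRUSH can be overridden,
# e.g. DRUSH="echo drush" prints the commands instead of running them.
DRUSH="${DRUSH:-drush}"

# Join all arguments with commas: "default test" -> "default,test".
# The subshell body keeps the IFS change local to this function.
join_variants() (
  IFS=,
  printf '%s' "$*"
)

# Queue only the given variants, then generate whatever is queued.
regenerate_variants() {
  if [ "$#" -eq 0 ]; then
    echo "usage: regenerate_variants VARIANT..." >&2
    return 1
  fi
  $DRUSH ssr --variants="$(join_variants "$@")" && $DRUSH ssg
}
```

Calling `regenerate_variants default test` then runs the same two commands as in the usage line above. Note that, as mentioned in #6, this only leaves the other variants' sitemaps untouched on a version that includes the fix from #3090249.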

kyuubi’s picture

Hi @gbyte.co,

Apologies if it doesn't make sense, let me clarify.

The stable version has been in production for a while, hence it's considered stable.

When there is a fix for a particular issue that we want (like this one) I can add it to the stable release and know exactly what is changing.
This makes risk management much easier: if there are problems with the module, they are easier to identify. If we use dev (and of course we can pin a particular hash), then we are introducing far more changes than just the one we need, which opens us up to risk.

This is because "dev" versions are by their nature not stable and are usually treated as integration branches. That might not be the case for Simple Sitemap (maybe you put as much effort into quality assurance and testing on every dev release as you put into a stable release?), but that's not the most common scenario.

This is why people usually apply granular patches until those are released in a new version, at which point you upgrade to the stable version, which is assumed to be more production-ready. Again, this might not be the case for Simple Sitemap, but it's usually the case for most modules.

Hope that makes sense and thanks again for the effort put into this.

Cheers

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.