We are facing a PHP fatal error, "Allowed memory size of 536870912 bytes exhausted",
when we click "Rebuild" on our site.
In our current setup, we use groups and each group has its own sitemap.xml, created with a custom variant. We currently have 39 variants; some have 30k URLs and some fewer than 10k.
The site crashes with the allowed-memory-size error when the queue size is around 465k-475k items.
To work around this issue we considered using drush to rebuild the variants (10 per run) and then generate the items using the UI (batch) or drush.
However, every time that we call drush simple-sitemap:rebuild-queue --variants=group_x,...,group_y
the queue is reset.
So, the workaround for the workaround is to rebuild-queue for some variants, generate sitemaps for these items, rebuild for other variants, generate again...
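For reference, the alternating process looks roughly like this (variant names are placeholders for our real group variants; chunk size would be tuned to the memory limit):

```shell
# Rebuild the queue for one chunk of variants, then drain it, then repeat.
drush simple-sitemap:rebuild-queue --variants=group_1,group_2,group_3
drush simple-sitemap:generate

drush simple-sitemap:rebuild-queue --variants=group_4,group_5,group_6
drush simple-sitemap:generate
```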
Is this expected? Why does the queue get reset? Shouldn't a rebuild be added on top of other rebuild requests?
| Comment | File | Size | Author |
|---|---|---|---|
| #27 | 3180776-27-rebuild-queue-decouple.patch | 1022 bytes | kyuubi |
| #23 | 3180776-23-rebuild-batch-support.patch | 5.94 KB | kyuubi |
| #18 | 3180776-16-rebuild-queue-decouple.patch | 1020 bytes | kyuubi |
| #17 | 3180776-16-rebuild-queue-decouple.patch | 945 bytes | kyuubi |
| #8 | simple_sitemap-chunked_rebuild-3180776-8.patch | 7.69 KB | maximpodorov |
Issue fork simple_sitemap-3180776
Comments
Comment #2
gbyte CreditAttribution: gbyte as a volunteer and at gbyte commented
This is the first time I hear of the rebuild process failing due to memory exhaustion. I believe the solution to this problem will be to implement a batch process for the queuing, similar to how the generation process is implemented.
Regarding your first workaround: This is by design - it's called 'rebuild queue' because it rebuilds the queue. Most of the time this is what you want - rebuilding the queue for all or specific variants and generating the sitemap. If you want to queue elements on top of queued elements, you will need to utilize
\Drupal\simple_sitemap\Simplesitemap::queue
instead of
\Drupal\simple_sitemap\Simplesitemap::rebuildQueue
The former, however, is not wrapped in a drush command yet (because there was no use case until yours). Feel free to create a feature request to add that drush command; until then you can use
drush php-eval "\Drupal::service('simple_sitemap.generator')->setVariants(['default', 'test'])->queue();"
Comment #3
kyuubi CreditAttribution: kyuubi commented
Hi @gbyte,
I think you're right, the rebuild process should be batched.
However I disagree with this being treated as a feature request.
Unless it's clearly stated this module is designed with limitations around a specific number of items, this should be considered a bug.
Looking at the implementation of the rebuild process, there is a clear cap, given that it loads everything into memory before pushing it to the queue for later generation (correct me if I'm wrong).
Regarding the rebuild queue, I think that makes sense, but I would suggest (as a feature request) adding a total counter for indexed items to the UI. The current one can become confusing when per-variant rebuilds are done. Now that you've clarified things it makes perfect sense, but I was personally confused to see that counter constantly change with per-variant rebuilds.
Thanks again for the great module and responsive support!
Comment #4
carolpettirossi CreditAttribution: carolpettirossi at Prosple commented
@gbyte here's a patch to make it possible to rebuild the queue using batch. It uses the same approach as generateSitemap.
However, rebuilding the queue on cron won't use batch, due to cron limitations. This means that for cases like ours, where the site has a large number of variants with lots of URLs, "Regenerate the sitemaps during cron runs" won't work.
We decided to:
1. use
\Drupal::service('simple_sitemap.generator')->setVariants(['default'])->queue();
2. use
\Drupal::service('simple_sitemap.generator')->generateSitemap(QueueWorker::GENERATE_TYPE_CRON)
However, when calling
generateSitemap
, if the queue is already processed/completed, it will also call rebuild, due to the following code in QueueWorker::generateSitemap.
A function that generates the sitemap should only take care of generation, not generation and rebuild. I imagine this was added because generating the sitemap on cron runs needs to also rebuild, so I moved the code to
Simplesitemap::generateSitemap()
Could you please review the attached patch and give us feedback?
Really appreciate your time building this great module.
Thank you.
Comment #5
carolpettirossi CreditAttribution: carolpettirossi at Prosple commented
Adding the missing 'static' to
doBatchQueue
Comment #6
gbyte CreditAttribution: gbyte as a volunteer and at gbyte commented
Thanks for the patch.
That's why we might need an approach similar to what URL generation does - a queue that will be used by cron and batch:
That queue can be used by cron as well as by the batch process. This solution is not trivial though, as parts of the UI and some logic will have to be updated: right now, a sitemap is defined as regenerating when there are still queued elements to be processed. After implementing the above, it should also be defined as regenerating while the first queue (the one that queues each variant) is still being processed.
If you feel you want to tackle this, go ahead; I don't think I will be tackling this until there is a plan for 4.x.
Comment #7
maximpodorov CreditAttribution: maximpodorov commented
Even a single variant can be huge: imagine millions of nodes of the same bundle and a low memory limit. I hit the "memory exhausted" problem with 1 million nodes and 256MB of memory. The method that calculates data sets (getDataSets) should be able to return chunks, not the whole data set array. It would be possible to pass a $chunk_no parameter to it, but this would break BC, so it's better to add new methods:
supportsChunkedDataSets()
getDataSetsChunk($chunk_no)
QueueWorker::queue() will check whether the URL generator supports chunks and then either call getDataSets() as before, or call getDataSetsChunk() repeatedly until it returns an empty set.
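A sketch of what the proposed additions could look like (method names as suggested above; the chunk size, query details, and the queuing loop are illustrative assumptions, not the actual patch):

```php
<?php

// In a URL generator plugin (e.g. the entity URL generator) - sketch only:

public function supportsChunkedDataSets(): bool {
  return TRUE;
}

public function getDataSetsChunk($chunk_no) {
  $chunk_size = 5000; // Hypothetical chunk size.
  return \Drupal::entityQuery('node')
    ->accessCheck(FALSE)
    ->range($chunk_no * $chunk_size, $chunk_size)
    ->execute(); // An empty result signals the last chunk was reached.
}

// QueueWorker::queue() would then loop until a chunk comes back empty:
// $chunk_no = 0;
// while ($data_sets = $url_generator->getDataSetsChunk($chunk_no++)) {
//   // ... queue $data_sets as before ...
// }
```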
Comment #8
maximpodorov CreditAttribution: maximpodorov commented
Something like this can be used to get data sets in chunks.
Comment #9
gbyte CreditAttribution: gbyte as a volunteer and at gbyte commented
Please create a merge request against the newest dev and let's collaborate on this using the new tools.
Comment #11
maximpodorov CreditAttribution: maximpodorov commented
https://git.drupalcode.org/project/simple_sitemap/-/merge_requests/2/diffs
Comment #12
kyuubi CreditAttribution: kyuubi commented
Hi all,
I wasn't able to make @maximpodorov's patch work, in the sense that rebuilding the queue still ends in an OOM.
@carolpettirossi's approach works fine, however it has the issue of the counter: with this approach the counter constantly increments because rebuildQueue is never explicitly called. I think I can solve this by calling the `deleteQueue` method in my code (the one that prepares all the variants to be rebuilt). This method sets `simple_sitemap.queue_items_initial_amount` to zero, which is necessary as the subsequent `queue` commands will add items to that counter.
The process then is basically like this:
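Pieced together from #4 and this comment, a rough sketch of the process (variant names, the chunking, and the 'simple_sitemap.queue_worker' service name are assumptions):

```php
<?php

use Drupal\simple_sitemap\Queue\QueueWorker;

/** @var \Drupal\simple_sitemap\Simplesitemap $generator */
$generator = \Drupal::service('simple_sitemap.generator');

// Reset the queue and the simple_sitemap.queue_items_initial_amount counter
// once, before queuing anything.
\Drupal::service('simple_sitemap.queue_worker')->deleteQueue();

// Queue variants in small chunks instead of rebuilding everything at once.
foreach ([['group_1', 'group_2'], ['group_3', 'group_4']] as $chunk) {
  $generator->setVariants($chunk)->queue();
}

// Process the queue; with the patch applied, this no longer triggers a
// full rebuild when the queue is empty.
$generator->generateSitemap(QueueWorker::GENERATE_TYPE_CRON);
```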
I'm still running some tests but this seems to run just fine in a very large scale platform that powers hundreds of sites with several millions of records.
The only changes to the sitemap module codebase are the ones provided in @carolpettirossi's patch.
Obviously this requires you to use this module more as an API, which I think is both advisable and showcases the powerful nature of this module (congrats again on this @gbyte, really solid work).
I see there are some conflicts with the new release, so I'll need to roll a new patch once I've tested this more thoroughly.
Thoughts and comments most welcome.
Cheers
Comment #13
kyuubi CreditAttribution: kyuubi commented
Also, I should add that I didn't test the batch UI process in that patch.
In fact, I think that should probably be a separate patch; the other changes are more critical in allowing developers to leverage the simple_sitemap API to get something like the above done for large-scale sites (and I don't see any major downsides).
This could allow you to tackle the rebuilding process itself in a separate issue for 4.x without tackling it now.
Effectively, the only thing we change is moving this code
out of generateSitemap in the QueueWorker (which is not ideal for the reasons mentioned above) and into generateSitemap in Simplesitemap.php (exactly as the existing patch does), so it's a very small change, easy to review, and should have minimal impact.
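A simplified sketch of what that decoupling amounts to (not the literal patch; the `generationInProgress()` check and method internals are illustrative):

```php
<?php

// Before the change, QueueWorker::generateSitemap() rebuilt the queue itself
// whenever there was nothing left to process. After the change, QueueWorker
// only drains the queue, and the rebuild fallback lives in the high-level
// API instead.

// In Simplesitemap (sketch - real signature and state checks may differ):
public function generateSitemap($from = 'form') {
  // Moved here from QueueWorker: only the high-level entry point decides
  // whether an empty queue should trigger a rebuild first.
  if (!$this->queueWorker->generationInProgress()) {
    $this->rebuildQueue();
  }
  $this->queueWorker->generateSitemap($from);
  return $this;
}
```

This keeps callers that manage the queue themselves (via queue()/deleteQueue()) free to call generation without an implicit rebuild.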
Let me know what you think!
Comment #17
kyuubi CreditAttribution: kyuubi commented
Submitted a merge request and adding the patch here for those who need it.
Basically I'm just moving that call out, as explained above. The end result is the same but less coupled, which allows for greater flexibility.
Comment #18
kyuubi CreditAttribution: kyuubi commented
Sorry, please ignore the previous patch.
Gitlab is confusing. When I created a fork branch from this issue, for some reason it missed a bunch of commits, so the diff in the patch above is completely incorrect. The merge request should still be fine.
Nevertheless here is a patch produced the old school way.
Comment #19
kyuubi CreditAttribution: kyuubi commented
Comment #23
kyuubi CreditAttribution: kyuubi commented
Adding the patch here for the batch queue rebuild only.
This allows rebuilding the queue via batch in both the UI and drush, and fixes an issue from comment #5 where the variants being provided were ignored.
Comment #24
kyuubi CreditAttribution: kyuubi commented
Just to provide a final summary:
Comment #25
maximpodorov CreditAttribution: maximpodorov commented
This is not enough, see #7. Even a single variant can be too huge to fit in memory.
Comment #26
kyuubi CreditAttribution: kyuubi commented
Should we combine the batch process with the chunked data to mitigate all scenarios? The patch you provided isn't enough either: with a lot of variants, we still go OOM.
Comment #27
kyuubi CreditAttribution: kyuubi commented
Updating the patch to pass the QueueWorker GENERATE_TYPE_FORM constant to the downstream rebuildQueue() method.
Comment #28
dmitry.korhov commented
Even with the 3 patches (#8, #23, #27) applied via composer-patches, the issue still occurs:
What is strange is the fact that memory consumption increases after each batch: 215 => 219 => 226 => 227 => 250, until 256M is reached.
It seems we need to unset some variables to free the memory, or the functionality should be refactored, e.g. to use generators instead of loading the whole array into memory.
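As an illustration of the generator idea (plain PHP, not module code; the chunk size and the queueChunk() helper are made up):

```php
<?php

// Yield IDs chunk by chunk instead of materializing one huge array, so peak
// memory is bounded by the chunk size rather than the total row count.
function dataSets(int $total, int $chunkSize = 5000): \Generator {
  for ($offset = 0; $offset < $total; $offset += $chunkSize) {
    // In the module this would be an entity query using
    // ->range($offset, $chunkSize) rather than range().
    yield range($offset, min($offset + $chunkSize, $total) - 1);
  }
}

function queueChunk(array $ids): void {
  // Placeholder for pushing one chunk of IDs into the sitemap queue.
}

foreach (dataSets(1000000) as $ids) {
  // Each chunk array becomes collectable before the next iteration,
  // instead of a single million-element array living for the whole run.
  queueChunk($ids);
}
```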
Consumption of memory just by bootstrap and drush:
upd:
Created separate issues, as the problem I found is not related to variant chunks:
Comment #29
gbyte CreditAttribution: gbyte as a volunteer and at gbyte commented
3.x is at feature freeze and 4.x has been deemed good enough for non-API-dependent use cases. Let's test the necessity of this in 4.x and implement it if necessary.
Comment #30
huzooka commented
Re #7:
This is the issue we ran into (we have more than 5 million nodes after a D7 migration).
In our case, even the query cannot be executed because the statement itself consumes all the world's memory.