Problem/Motivation

Hi,

I have the latest Drupal 8, with Purge module & Cloudflare module.

Suddenly I get this error. I tried everything I could find online to fix it, but with no result:

Purge: Queue size
157998
Your queue exceeded 100 000 items! This volume is extremely high and not sustainable at all, so Purge has shut down cache invalidation to prevent your servers from actually crashing. This can happen when no processors are clearing your queue, or when queueing outpaces processing. Please first solve the structural nature of the issue by adding processing power or reducing your queue loads. Empty the queue to unblock your system.

How can I fix this, please?

Thanks

Steps to reproduce

- Have a queue exceed the 100K limit

Proposed resolution

- [x] Implement an override using the state system (drush sset purge.dangerous TRUE; commit 70b34944)
- [ ] Make the 30K and 100K queue limits configurable
- [ ] Don't create duplicate entries #2851893: Deduplicate Queued Items

Comments

Ananda created an issue. See original summary.

emersonreis.dev’s picture

Hi Ananda,

You can use the following drush command to empty the queue:

drush p:queue-empty

That removes all the items from the queue. I'd suggest purging everything on Cloudflare manually after doing that, as you probably needed the queued tags purged.

Then you need to find the actual reason the queue got so big. First question: are you processing the queue? There are a few ways to do that, which you should know at this point; if not, let us know.

The second thing to consider: do you have a content type or entity type that is updated a lot? I believe this is my case, but I don't know yet whether I can exclude certain entity/content types from purging.

emersonreis.dev’s picture

Status: Active » Postponed (maintainer needs more info)
ananda’s picture

Hi,

Thanks for the quick reply.

Drush itself works perfectly, but when I enter that command it doesn't work, and I can't clear the queue.

It gives this error:

The drush command 'p:queue-empty' could not be found.  Run `drush    [error]
cache-clear drush` to clear the commandfile cache if you have
installed new extensions.
emersonreis.dev’s picture

Have you tried the command that the error message suggests? That usually works. If it doesn't, you need to check whether you have the purge_drush module installed.

ananda’s picture

You GENIUS,
thank you so much.

I upgraded drush from 8 to 9 with this command:

php composer.phar require drush/drush:^9.0

Then I used your awesome command to clear the queue:
drush p:queue-empty

Now the problem is solved, it was my fault due to outdated drush :)

emersonreis.dev’s picture

That is good news, but you need to monitor the queue size to make sure it does not grow to more than 100 000 items again.

There are many ways to do this. You could start by checking whether any entity saves can be skipped: if nothing has changed, there is no need to save the entity, because every time you save an entity/content a few tags get added to the queue. If 1000 entities/content are updated every day, you will end up with 100 000 items again in a few weeks.
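
To put rough numbers on that estimate, here is a back-of-the-envelope sketch. The figure of 4 tags per save is an assumption for illustration only (the exact number depends on the site), and it assumes nothing is processing the queue in the meantime.

```shell
#!/bin/sh
# Rough estimate of how long an unprocessed queue takes to reach the limit.
# Assumptions (not taken from the Purge module): 4 tags queued per entity
# save, and no processor removing items in the meantime.

days_until_limit() {
  saves_per_day=$1   # entity saves per day
  tags_per_save=$2   # cache tags queued per save (assumed value)
  limit=$3           # queue size at which Purge shuts down
  echo $(( limit / (saves_per_day * tags_per_save) ))
}

days_until_limit 1000 4 100000   # prints 25
```

With these assumed numbers the limit is reached after about 25 days, which matches the "few weeks" estimate.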

You could potentially debug the method invalidateTags from the file purge/modules/purge_queuer_coretags/src/CacheTagsQueuer.php to see the tags that are being added to the queue.

Another thing to check is whether you are actually processing the queue. If you run Drupal cron often, the queue gets processed as long as the Cron processor is enabled at /admin/config/development/performance/purge

You can also run the drush command drush p:queue-work as a cron job, for example every 5 minutes.
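
As a rough sketch of the cron approach, the script below wraps drush p:queue-work so it can be called from a crontab. The script name, the drush path, and the "run" argument are placeholders chosen for illustration, not anything shipped with Purge. The purge_drush module has to be installed for the command to exist, and its Drush processor may need to be enabled as well.

```shell
#!/bin/sh
# Hypothetical cron wrapper; the path below is a placeholder, not a default.
DRUSH="${DRUSH:-/var/www/example/vendor/bin/drush}"

work_purge_queue() {
  # Ask the Purge Drush module to process queued invalidations.
  "$DRUSH" p:queue-work
}

# Run only when invoked as "purge-queue-work.sh run", so the file can also
# be sourced without side effects.
if [ "${1:-}" = "run" ]; then
  work_purge_queue
fi

# Example crontab entry (every 5 minutes), assuming the script is saved as
# /usr/local/bin/purge-queue-work.sh:
# */5 * * * * /usr/local/bin/purge-queue-work.sh run
```

Keeping the drush path in a variable makes it easy to point the same script at a different site or binary.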

avpaderno’s picture

Issue tags: -massive queue, -100000 items, -queue exceeded 100000, -cache invalidation shut down
simohell’s picture

Just to note that some versions of drush use a colon instead of a dash in the command name:
drush p:queue:empty

vuil’s picture

Updating Drush to 9.x (from 8.x) resolved the problem of the drush p-queue-work command not executing.

louisnagtegaal’s picture

Hello,

Is there any progress on the underlying cause of this? Using drush to empty the queue is of course a kind of solution, but I have the same problem (with Varnish). When I query the database, I see that every few seconds a large number of items are added to the queue, which results in more items being added than are cleared on cron.

I understand that, due to the complexity of this and the large range of possible configurations, it may not be possible to give a generic solution, but any pointers on how to debug this are more than welcome.

vuil’s picture

Please test whether drush p-queue-work (or a similar command) empties the queue. You have to add an "automatic cron job" to execute it regularly.

didebru’s picture

None of the commands worked for me.

 drush p:queue-empty

                                   
  The namespace "p" is ambiguous.  
  Did you mean one of these?       
      php                          
      pm                           
      pathauto.                    
                                   

I'm on drush 11.1

Ah, you have to install the purge_drush module. That worked for me, cheers!

fabianfiorotto’s picture

You have to enable "Purge Drush" module.

kyberman’s picture

Version: 8.x-3.0-beta9 » 8.x-3.x-dev
Attached patch (status: new, size: 4.37 KB)

Hi everybody!

For a very specific use case (e.g. a lot of nodes being updated at once), shutting down queue processing could cause more trouble than leaving it running and letting it process at least part of the queue, I would say. In my case, the CloudFlare API allows purging 30 items per request, with up to 2000 requests per day. If the queue quickly grows to over 100 000 items, the purging process is stopped immediately. That means there are potentially around 60 000 items that could have been processed before the CloudFlare API limit is exhausted.
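
The arithmetic behind that figure can be sketched as follows. The 30 items per request and 2000 requests per day are the limits quoted in this comment, the queue size of 157998 is the one from the issue summary, and the drain estimate assumes nothing new is queued in the meantime.

```shell
#!/bin/sh
# Daily purge capacity under the limits quoted above
# (30 items per request, 2000 requests per day).
daily_capacity() {
  items_per_request=$1
  requests_per_day=$2
  echo $(( items_per_request * requests_per_day ))
}

# Whole days needed to drain a queue of a given size, rounded up,
# assuming no new items are queued while it drains.
days_to_drain() {
  queue_size=$1
  capacity=$2
  echo $(( (queue_size + capacity - 1) / capacity ))
}

capacity=$(daily_capacity 30 2000)
echo "$capacity"                    # prints 60000
days_to_drain 157998 "$capacity"    # prints 3
```

So even a queue well above the 100 000 limit could, under these assumptions, be worked off in a few days rather than being blocked entirely.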

The idea of this patch is to never stop queue processing. Instead, an error is logged once the queue grows to 30 000 items, so there is time to recognize and fix the underlying issue. Could you please review and comment on this?

Alternatively, there could be a setting/config/state/hook to override the default limit of 100 000 items.
Another idea is to enqueue an item only if it isn't in the queue yet. Any thoughts?

Thank you
Vit

achap’s picture

Status: Postponed (maintainer needs more info) » Active

Just want to chime in and say this issue has affected me too when running migrations. We often run migrations that can take a few hours or more. I stepped through the code and think I discovered what's happening. In Drupal\purge\Plugin\Purge\Queue\QueueService::add, invalidation tags are not added to the queue straight away but rather to an internal buffer. Then at the end of the request (for example, a long-running cron job or drush script), in Drupal\purge\Plugin\Purge\Queue\QueueService::destruct, it looks like the items from the buffer are finally committed to the queue.

The problem is, during the whole time the migration is running none of the invalidation tags that are generated by the migration can be processed by any of the purge processors. They are all dumped at once at the end of the migration which usually results in being over the 100k limit.

Not sure what the fix is but that seems to be the root cause of the issue at least for us.

achap’s picture

Our workaround for the above was to re-architect our migration using the Queue API to process 1 item at a time, and give our queue worker a cron lease time of 1 hour (same as cron run interval). This way the buffer is emptied once per hour at least and it doesn't overwhelm the purge queue. Hope that helps someone.

rbrownell’s picture

This error baffles me. I understand that architecture is the normal solution, but it can't be if the business requirements of the project require timely and rapid updating of a large volume of nodes/pages.

Please correct me if I am wrong, but it is my understanding that queues are supposed to help prevent server crashes by regulating the volume of data being sent to whatever system is receiving it. This would presumably occur in smaller batches instead of all at once. The fact that the queue stops processing after reaching a certain threshold suggests to me that the queue is not really a proper queue that processes things in smaller batches, but rather a dumping ground whose contents are then sent all at once. There's got to be a better way of handling this than just stopping everything. There are mechanisms that can be added to reduce the risk of servers crashing under high data volume.

japerry’s picture

Status: Active » Needs review

This error typically occurs if cron is misconfigured (or not configured), or during a migration or other process where lots of invalidations happen at once.

To counter this edge case, I added a new flag to the state system called purge.dangerous. If you set this in settings or with drush sset purge.dangerous TRUE, the purger should be able to run with over 100,000 items in the queue.

sravalji’s picture

To clear the purge queue:
drush p-queue-empty

To add the processor:
drush p:processor-add drush_purge_queue_work

xurizaemon’s picture

That change mentioned in #19 should be available as of 8.x-3.5. The commit doesn't show in this issue as the commit omits the "Issue #3132524" subject. Looks like the fix was in 70b34944.

xurizaemon’s picture

We have a site that is periodically affected by this issue. When investigating, we observe that a single entry in the purge_queue table has spiked beyond the 100K limit, which blocks all but manual queue flushes from then on.

sql> select distinct count(*) as count, data from purge_queue group by data order by count desc limit 10

+---------+------------------------------------------------------------------------------------------+
| count   | data                                                                                     |
+---------+------------------------------------------------------------------------------------------+
| 1305886 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:31:"config:views.view.media_library";i:3;a:0:{}}      |
|    4945 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:36:"simple_sitemap:example-org-sitemap";i:3;a:0:{}}   |
|    4945 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:36:"simple_sitemap:example-com-sitemap";i:3;a:0:{}}   |
|    4687 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:34:"simple_sitemap:example-net-sitemap";i:3;a:0:{}}   |
|    4382 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:26:"simple_sitemap:example-biz-sitemap";i:3;a:0:{}}   |
|    4276 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:31:"simple_sitemap:example-xxx-sitemap";i:3;a:0:{}}   |
|     188 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:9:"node_list";i:3;a:0:{}}                             |
|     176 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:19:"config:webform_list";i:3;a:0:{}}                  |
|     176 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:23:"webform_submission_list";i:3;a:0:{}}              |
|     131 | a:4:{i:0;s:3:"tag";i:1;a:0:{}i:2;s:9:"file_list";i:3;a:0:{}}                             |
+---------+------------------------------------------------------------------------------------------+
10 rows in set (20.22 sec)

The formatting above is not easy to scan, but the critical detail (to me) is that there are 1305886 entries in the purge queue to flush the single data value config:views.view.media_library. I'm not sure it makes sense to have more than one entry for any given data value in this table.

If others are observing this issue, I'm interested to know whether executing the query above on their site reveals a similar profile, i.e. that when grouped by the data column, the entries in purge_queue are heavily dominated by a single value of data.

avpaderno’s picture

Title: your queue exceeded 100 000 items ! Purge shut down » "Your queue exceeded 100 000 items! Purge shut down"
o'briat’s picture

Have a look at the patch in the "Deduplicate Queued Items" issue (#2851893).

mlncn’s picture

As for the error message, more specific problems could possibly be surfaced directly. Presumably an unconfigured or misconfigured connection to the external CDN can be the cause of this?

In a recent example, spelunking down to the "Purge queue browser" (by following one of the options for "Database" at the bottom of /admin/config/development/performance/purge) showed seven pages of "Failed" URLs and 115,820 pages of "New" URLs. This situation of zero successes (if that is a correct reading) probably warrants a clear message of its own?

o'briat’s picture

The problem could also legitimately occur when massive import/update batches are executed regularly.

The module could provide an option that purges all caches (drush p:invalidate everything -y) and empties the queue (drush p:queue-empty)?
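
Until such an option exists, the manual equivalent could be sketched roughly like this. It only chains the two commands mentioned above; the function name and the DRUSH variable are made up for illustration, and it assumes the configured purger supports the "everything" invalidation type.

```shell
#!/bin/sh
# Sketch only: chains the two commands mentioned above. DRUSH is a
# placeholder so the drush binary can be overridden.
DRUSH="${DRUSH:-drush}"

reset_purge() {
  # Drop every queued item first, then invalidate everything, so that
  # anything queued in between is not thrown away by the empty step.
  "$DRUSH" p:queue-empty -y && "$DRUSH" p:invalidate everything -y
}
```

Calling reset_purge after a large import would clear the backlog in one go, at the cost of a full cache flush on the external cache.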

xurizaemon’s picture

Issue summary: View changes
o'briat’s picture

The current "processing rate" could also be displayed to make the message clearer, e.g.:

"This can happen when no processors are clearing your queue, or when queueing outpaces processing. Please first solve the structural nature of the issue by adding processing power or reducing your queue loads. Empty the queue to unblock your system. The current processing rate (@current_rate clearing requests/s) is lower than the queue growth rate (@growth_rate new items to clear/s)."

bceyssens’s picture

Reapplied patch #15 to version 3.6