If I edit a page multiple times before the purge queue has been processed, I can end up with duplicate invalidations in the queue.
When an item is enqueued, the queue should be checked for duplicates. If a duplicate exists, the item should not be enqueued.
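As a rough illustration of the requested behaviour (not Purge's actual queue code), here is a minimal in-memory sketch. The `DedupQueue` class and its method names are hypothetical, and it assumes an invalidation can be identified by its type and expression:

```python
from collections import deque


class DedupQueue:
    """FIFO queue that refuses items already waiting to be processed.

    Hypothetical stand-in, not part of the Purge module.
    """

    def __init__(self):
        self._items = deque()
        self._pending = set()  # keys of items currently in the queue

    def enqueue(self, inv_type, expression):
        key = (inv_type, expression)
        if key in self._pending:
            return False  # duplicate: already queued, skip it
        self._pending.add(key)
        self._items.append(key)
        return True

    def claim(self):
        """Pop the oldest item; the same key may be queued again afterwards."""
        key = self._items.popleft()
        self._pending.discard(key)
        return key

    def __len__(self):
        return len(self._items)


queue = DedupQueue()
# Editing the same page three times before the queue is processed.
for _ in range(3):
    queue.enqueue("url", "https://example.com/node/1")
queue.enqueue("tag", "node:1")
print(len(queue))
```

Once an item has been claimed for processing, the same invalidation can be queued again, so later edits are not lost.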
| Comment | File | Size | Author |
|---|---|---|---|
| #14 | 2851893-14.patch | 3.02 KB | hchonov |
| #14 | interdiff-13-14.txt | 967 bytes | hchonov |
| #13 | 2851893-13.patch | 3.06 KB | hchonov |
| #12 | duplicate_purge_tags.patch | 3.86 KB | santhoshkumar |
Comments
Comment #2
pedrop commented
+1 for this; we are struggling with overly long queues and this would be a significant help. I'm seeing the same URLs many times in the queue.
Comment #3
nielsvm commented
This is not a `purge_queuer_url` problem, as all it does is consume Purge's APIs; it also has no way of finding out whether something was queued before. This makes it a generic `purge` problem, which may be the only place to fix it, but there are serious performance risks if we precheck the queue for each set of (group) inserts we make.
Moving to the purge project, I'll look at a potential fix later.
In the meantime, for users annoyed by this: process your queues more often, for instance with the late runtime processor!
Niels
Comment #4
rbayliss commented
+1 for this, although I can certainly see how it would be difficult to implement properly.
Comment #5
wim leers
Hah, I talked to @nielsvm in chat yesterday about exactly this!
I did something similar for Fileconveyor: https://github.com/wimleers/fileconveyor/issues/68 -> https://github.com/wimleers/fileconveyor/blob/master/fileconveyor/arbitr....
Comment #6
jonhattan
I've created a module that provides a database queue that avoids creating duplicate items - https://www.drupal.org/project/purge_queues
Comment #7
hanoii
Interesting that this hasn't come up more often. I also went looking for this with purge_queuer_url, as it adds a lot of URLs to the queue. The module from #6 seems to work and I am currently using it. Thanks, although I would push for something like this to be added to the module itself.
Comment #8
inversed commented
Could this be related to issue #3034525 "Clean up duplicate cache tags created by invalidation tokens"? Note that there's also the #2952277: Minify the cache tags sent in the header issue.
Comment #9
rosk0
Thanks a lot for the `purge_queues` module, Jonathan!
That's a real game changer! My queue was growing into the millions, outpacing the purge cron job that runs every minute. Local tests look great; will see what it looks like on prod.
I believe the `purge_queues` module could be a great addition to `purge` itself.
Comment #10
ericgsmith commented
We have been investigating performance issues caused by duplicate items when using purge in combination with the `purge_queuer_url` module.
We have encountered issues in two areas: 1. duplicates in the buffer and 2. duplicates in the queue.
Duplicate items in the buffer
I can see that when an invalidation is created in the `InvalidationsService`, it uses an `instanceCounter` to generate a unique integer ID for the invalidation object. When an invalidation is added to the buffer, the buffer calls `has` to see whether that ID has already been added. Because every new object gets a fresh ID, that check does not catch two invalidations for the same expression.
Queuers seem to make some attempt to reduce duplicates, e.g. by filtering out previously requested tags, but certain situations such as config imports can push thousands of duplicates into the buffer, which can lead to high memory consumption.
While I have been looking at this in the context of just the url/path queuer, I wonder whether the queuers themselves could set either an ID or another property on the invalidation that can be used to dedupe it. E.g. the URL registry maintains a list of URLs, so the URL ID could be considered unique. Individual cache tags could also be considered unique. Other plugins may have difficulty determining their uniqueness, but allowing an ID to be set, with a fallback to the instance counter, could help plugins where this is problematic (e.g. the URL queuer) to be more efficient.
Without having looked through all the code, I would be interested in the maintainers' thoughts, as it appears `getId` on the invalidation plugin is (according to my IDE) mainly used by the buffer and tests.
Would there be any reasons against
That would then allow queuers to make changes to provide a unique value when creating an invalidation, and the existing buffer deduping code may not need to change.
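To make the buffer idea concrete, here is a minimal sketch. The `Invalidation` and `Buffer` classes below are simplified, hypothetical stand-ins rather than Purge's real classes, and the `unique_id` parameter is an assumption about how a queuer might supply its own identifier:

```python
import itertools


class Invalidation:
    """Simplified stand-in for an invalidation object (hypothetical)."""

    _instance_counter = itertools.count(1)

    def __init__(self, inv_type, expression, unique_id=None):
        self.type = inv_type
        self.expression = expression
        # Assumption: a queuer-supplied id wins; otherwise fall back to
        # a per-instance counter, as described for the current behaviour.
        if unique_id is not None:
            self._id = unique_id
        else:
            self._id = next(self._instance_counter)

    def get_id(self):
        return self._id


class Buffer:
    """Simplified buffer that only checks ids, never expressions."""

    def __init__(self):
        self._by_id = {}

    def has(self, invalidation):
        return invalidation.get_id() in self._by_id

    def set(self, invalidation):
        if self.has(invalidation):
            return False
        self._by_id[invalidation.get_id()] = invalidation
        return True

    def __len__(self):
        return len(self._by_id)


# Counter-only ids: the same URL is buffered twice.
buffer = Buffer()
buffer.set(Invalidation("url", "https://example.com/a"))
buffer.set(Invalidation("url", "https://example.com/a"))
counter_only = len(buffer)

# Queuer-supplied ids: the unchanged has() check now dedupes.
buffer = Buffer()
buffer.set(Invalidation("url", "https://example.com/a", unique_id="url:42"))
buffer.set(Invalidation("url", "https://example.com/a", unique_id="url:42"))
with_ids = len(buffer)

print(counter_only, with_ids)
```

The point of the sketch is that the buffer's `has` check stays exactly the same; only the source of the ID changes.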
Duplicate items in the queue
We are using the module @jonhattan provided, but the checks for duplicate items can be problematic for repeated large updates (e.g. in our case it was multiple batch calls that each invalidated the `media_list` tag).
@RoSk0 raised an idea (offline) of storing a unique identifier for a queued item so that a database queue can use upsert queries instead of insert queries. We have a proof of concept that does this by hashing the type and expression value of the data, but it would be easier with an enforced / persisted unique ID for an invalidation item. We would be interested in any thoughts on this approach.
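The proof of concept itself is not shown in this thread. As a minimal sketch of the general hash-plus-upsert idea, the following uses SQLite (3.24 or later, for `ON CONFLICT ... DO UPDATE`) instead of Drupal's database API; the table layout, the `hash` column and the helper names are all assumptions made for illustration:

```python
import hashlib
import sqlite3


def item_hash(inv_type, expression):
    """Derive a stable key from the invalidation type and expression."""
    raw = f"{inv_type}\n{expression}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


conn = sqlite3.connect(":memory:")
# Hypothetical queue table; the UNIQUE constraint is what enables upserts.
conn.execute(
    """
    CREATE TABLE purge_queue (
        item_id INTEGER PRIMARY KEY AUTOINCREMENT,
        hash TEXT NOT NULL UNIQUE,
        type TEXT NOT NULL,
        expression TEXT NOT NULL,
        created INTEGER NOT NULL
    )
    """
)


def enqueue(inv_type, expression, created):
    """Insert the item, or refresh the existing row if it is already queued."""
    conn.execute(
        """
        INSERT INTO purge_queue (hash, type, expression, created)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(hash) DO UPDATE SET created = excluded.created
        """,
        (item_hash(inv_type, expression), inv_type, expression, created),
    )


# Three batch calls that each invalidate the same tag.
for batch in range(3):
    enqueue("tag", "media_list", created=batch)
enqueue("tag", "node:1", created=3)

rows = conn.execute(
    "SELECT type, expression, created FROM purge_queue ORDER BY item_id"
).fetchall()
print(rows)
```

With this shape there is no separate lookup query before each insert; the database's unique index does the duplicate check.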
Comment #11
o'briat
I confirm that duplicate invalidations occur when Drupal regularly imports or updates large volumes of content.
A simple solution could be to delete all items with identical "data" when purging an item.
Or just add a global duplicate deletion at the end of every purge; here's some pseudo code:
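The pseudo code from the original comment is not preserved in this copy of the thread. As a rough sketch of what a global duplicate deletion could look like, assuming a hypothetical `purge_queue` table with `item_id` and `data` columns and using SQLite for illustration rather than Drupal's database API:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified queue table.
conn.execute(
    "CREATE TABLE purge_queue (item_id INTEGER PRIMARY KEY, data TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO purge_queue (data) VALUES (?)",
    [
        ("tag:node:1",),
        ("tag:media_list",),
        ("tag:node:1",),
        ("tag:media_list",),
        ("tag:node:2",),
    ],
)


def delete_duplicates(connection):
    """Keep the oldest row for each distinct data value, delete the rest."""
    cursor = connection.execute(
        """
        DELETE FROM purge_queue
        WHERE item_id NOT IN (
            SELECT MIN(item_id) FROM purge_queue GROUP BY data
        )
        """
    )
    return cursor.rowcount


removed = delete_duplicates(conn)
remaining = [
    row[0]
    for row in conn.execute("SELECT data FROM purge_queue ORDER BY item_id")
]
print(removed, remaining)
```

Unlike a check on every insert, this cleans up in one pass, at the cost of letting duplicates sit in the queue until the next purge finishes.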
Comment #12
santhoshkumar commented
We have identified a similar issue when using the purge_queuer_coretags module. We identified 2 issues, as below:
To fix the issue we added the patch duplicate_purge_tags.patch. This patch does a DB lookup before inserting into purge_queue and also keeps the results in a static array to prevent multiple database calls for the same tag.
Comment #13
hchonov
Re-roll.
Comment #14
hchonov
Fixed an issue in the query logic: the search for the cache tag "paragraph_list" was also returning items like "paragraph_list:text". After testing this I can confirm that no duplicate queue items are created anymore, which drastically reduces the queue length for us.
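The patch's actual query is not reproduced in this thread. Assuming the false match came from a pattern-style lookup, the difference between a loose and an exact check can be sketched like this (SQLite, with a hypothetical `tag` column and hypothetical helper names):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified queue table with one tag per row.
conn.execute(
    "CREATE TABLE purge_queue (item_id INTEGER PRIMARY KEY, tag TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO purge_queue (tag) VALUES (?)",
    [("paragraph_list:text",), ("node:5",)],
)


def is_queued_loose(tag):
    """Prefix match: also hits longer tags that merely start with `tag`."""
    row = conn.execute(
        "SELECT 1 FROM purge_queue WHERE tag LIKE ? LIMIT 1", (tag + "%",)
    ).fetchone()
    return row is not None


def is_queued_exact(tag):
    """Exact match: only hits rows whose tag is identical."""
    row = conn.execute(
        "SELECT 1 FROM purge_queue WHERE tag = ? LIMIT 1", (tag,)
    ).fetchone()
    return row is not None


loose = is_queued_loose("paragraph_list")
exact = is_queued_exact("paragraph_list")
print(loose, exact)
```

With the loose check, "paragraph_list" would wrongly be treated as already queued and skipped; the exact check lets it through.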
Comment #15
hchonov
Turns out the patch does not work for new site installations, as it queries the database purge_queue table before it is created. We are simply switching to the unique queues provided by https://www.drupal.org/project/purge_queues.