Problem/Motivation
When processing a queue, it's common to run into situations where a single item cannot be processed. For example, perhaps an upstream URL is 404'ing, while other URLs in other queue items work fine. If the worker throws a SuspendQueueException, or nothing at all, all items end up blocked on the single failing item.
In the HTTP 404 case, the error is often transient and will solve itself, in which case trying again later provides a near seamless experience. In other cases (such as a broken queue item), at least other items in the queue won't be blocked from processing.
Proposed resolution
Let's allow queue processors to recreate items at the bottom of a queue, and allow queue workers to throw an exception indicating that the individual queue item is broken and should be recreated.
Remaining tasks
User interface changes
None.
API changes
reCreateItem
becomes a new interface method required for queue implementations. This doesn't violate our b/c policy today as the interface isn't marked as @api
, but if that was a concern we could create a new RecreateQueueInterface
for the new method.
Data model changes
None.
Comment | File | Size | Author |
---|---|---|---|
#80 | interdiff-79-80.txt | 640 bytes | omarlopesino |
#80 | 1832818.80-requeue.patch | 10.5 KB | omarlopesino |
#63 | 1832818.63-requeue.patch | 6.91 KB | deviantintegral |
#59 | 1832818.59-requeue.patch | 9.14 KB | deviantintegral |
#58 | 1832818.58-requeue.patch | 6.18 KB | deviantintegral |
Comments
Comment #2
Berdirqueue_postpone.patch queued for re-testing.
Comment #3
neclimdulmoveToTheEnd is consistent with the implementations given but is it the correct terminology for what a generic queue would do? it seems more in line with the reliable interface.
Very cool though and a useful feature.
Comment #5
chx CreditAttribution: chx commentedAn unreliable queue could do what the memory queue does and what the comments indicate: delete and requeue. Because it's unreliable if the item gets lost, oh well.
Comment #6
chx CreditAttribution: chx commentedI think this will do it.
Comment #7
chx CreditAttribution: chx commentedFiled #1837118: UPDATE foo SET bar=(SELECT...) is not supported for the database part.
Comment #8
chx CreditAttribution: chx commentedThis is using the RTBC version from the DB issue. Should be ready.
Comment #9
neclimdulLooks pretty solid and with tests. I like the explicit weight instead of the implicit weighting by date a lot. Lets bump it up.
Comment #10
BerdirI agree with RTBC but note that this depends on #1837118: UPDATE foo SET bar=(SELECT...) is not supported, which has been updated in the meantime. The easiest way might be to RTBC and commit the other issue and then re-roll this one.
Comment #11
neclimdulFair point, the old version of that fix is included. per IRC discussion postponing this but the relevant code is RTBC by everyone involved pending the commit of the listed fix.
Comment #12
neclimdulwoot, back in action. needs reroll
Comment #13
patrickd CreditAttribution: patrickd commentedreroll here
Comment #14
BerdirPatch in #13 is identical to the one in #8 minus the already commited UpdateQuery related change. Back to RTBC.
Comment #15
chx CreditAttribution: chx commentedReuploading the patch -- it passed but it didn't get updated to green so let's be clear.
Comment #16
catchI don't understand why we have to do all these tricky queries instead of just SELECT then UPDATE, and the code comments don't explain why either (they explain why we can't do other tricky queries but that's not the point). If the issue is reliability then why isn't a transaction and SELECT .. FOR UPDATE enough?
Comment #17
chx CreditAttribution: chx commentedOK, I have rewritten the patch from ground up basically. Oh well. At least we added some useful stuff to DBTNG.
Edit: http://stackoverflow.com/questions/13605874/is-this-query-race-condition...
Comment #19
chx CreditAttribution: chx commentedEh, the update.inc has was not supposed to be there.
Comment #21
chx CreditAttribution: chx commentedComment #23
chx CreditAttribution: chx commentedType in update function.
Comment #24
Lars Toomre CreditAttribution: Lars Toomre commentedif this gets re-rolled again, can you please add a type hint to '@param $item' in the interface class? Thanks.
Other minor documentation issue is that there needs to be a '@throws' directive in any method docblock that throws an exception.
Comment #25
Grayside CreditAttribution: Grayside commentedI think of moveToTheEnd as reEnqueue. Then you can go implement a "LIFO/Stack" queue without confusion.
Comment #26
colinafoley CreditAttribution: colinafoley commented#23: 1832818_23.patch queued for re-testing.
Comment #28
colinafoley CreditAttribution: colinafoley commentedThis is my first reroll. Because of this I wasn't comfortable changing moveToTheEnd to reEnqueue or even adding any documentation. FWIW, I agree that moveToTheEnd should be renamed to something like reEnqueue.
Comment #30
colinafoley CreditAttribution: colinafoley commentedFixed syntax error within previous patch.
Comment #32
colinafoley CreditAttribution: colinafoley commented#30: 1832818-allow-queue-item-postponement_28.patch queued for re-testing.
Comment #33
colinafoley CreditAttribution: colinafoley commentedRenamed moveToTheEnd() to reEnqueue().
Comment #34
xjmThanks @colinafoley! I found a few minor things in the patch that we can clean up.
Nowadays we can replace these with
{@inheritdoc}
.Super minor nitpick: This should be "Moves" rather than "Move".
We need a data type for
$item
here (and, if it's an object or an array, an appropriate typehint in the method and its implementations).Tagging novice for those cleanups.
Also, we'll need some test coverage, so tagging for that.
Comment #35
msonnabaum CreditAttribution: msonnabaum commentedreEnqueue doesn't make sense if we don't have en enQueue method. We should use reCreateItem unless we want to rename the other method.
We also need better documentation for this method. I look at it and I wonder why we're making an API addition to save the runner two commands. I'm guessing it's for transactional reasons, which we should make clear if so.
Comment #36
Jerome7 CreditAttribution: Jerome7 commentedComment #37
neclimdulComment #38
robmc CreditAttribution: robmc commentedUpdating the patch with issues noted in #34
I also found a few other places {@inheritdoc} seemed appropriate so I have adjusted those as well as another doc issue I found.
Comment #39
robmc CreditAttribution: robmc commentedComment #40
robmc CreditAttribution: robmc commentedWow, sorry about the attach/multi post mess - not used to DO D7 yet =(
Comment #41
Shaika CreditAttribution: Shaika commentedassigned to myself
Comment #42
Shaika CreditAttribution: Shaika commentedHi this is my first patch ever. I'm new to Drupal (discovered it 2 days ago!) and PHP in general (discovered last week! :D)
Attached is a re-roll of the patch from comment 38 with one change (see interdiff.txt)
Comment #44
wiifm42: 1832818-allow-queue-item-postponement_42_all.patch queued for re-testing.
Comment #46
Shaika CreditAttribution: Shaika commentedIntegrated feedback from @msonnabaum and changed reEnqueue to reCreateItem and added comment to describe why this was added in the interface.
Also inserted a \stdClass before ($item).
Comment #47
wiifmThis is great work @Shaika, your patch addresses the concerns that @msonnabaum had at #35, and adds an API addition that is sorely missed in the queueing system.
The docblock tidy ups are also great to see, and I can confirm there are no leftover "Implements" in the concrete classes.
All tests pass, happy to RTBC this (would be good to see this backported to D7 if committed to D8).
Comment #48
neclimdulI like it. Welcome to the community @shaika and thanks for the work on this, you nailed it!
Talked to mark and I think we came to a understanding of what he was asking for and it was a little bit more technical which isn't very novice. At the same time, I had some questions about why we _needed_ this and if it was just a special construct of abstracting our database implementation. With a little research it seems some bigger implementations don't explicitly support this but there might be ways to handle like returning failure in some cases. So I think mark may still be thinking about it but I'm happy with this.
Comment #49
wiifmHey @neclimdul, thanks for taking the time to talk to @msonnabaum, the new comment you added looks like a positive step forward.
I have a +1 for this to be RTBC, I think it would be best if @msonnabaum gave it the RTBC ;)
Comment #50
wiifmRe-marking as RTBC as no-one seems to have any issues with this, removing tag "needs tests" as this patch has tests, also adding tag "api addition" as this is only and API addition (not a change).
Comment #58
deviantintegral CreditAttribution: deviantintegral at Lullabot commentedHere is a reroll bringing this up to 8.7.x. The biggest difference is that the update hook only updates the indexes if the queue table exists. I'm unclear on why we need to change the indexes at all. I'm wondering if it's stale from when this was a
moveToEnd
implementation?Comment #59
deviantintegral CreditAttribution: deviantintegral at Lullabot commentedThis patch actually uses the new
reCreateItem()
method when processing the cron queue. I can split it off into a new issue, but it ended up being so simple that I think it's OK to include here.Comment #60
deviantintegral CreditAttribution: deviantintegral at Lullabot commentedI've updated the issue summary as the prior notes were stale given this issue hadn't been fixed for them.
Comment #63
deviantintegral CreditAttribution: deviantintegral at Lullabot commentedThe test failures were caused by the update hook (which I hadn't run yet), and given that it's not clear the index changes are needed I've rolled them back.
Comment #64
neclimdulI also have questions about the interface changes(addition of type hint). As they seem minimally connected to the patch and I'm pretty sure not BC with existing implementations for most of our supported php versions, they aren't allowed. I think this was targeting even pre-beta requirements so after 5 years this probably needs a really solid review to make sure it really fits what's expected to get into core.
Comment #65
deviantintegral CreditAttribution: deviantintegral at Lullabot commented> I also have questions about the interface changes(addition of type hint)
I was thinking the same thing. It should be fine for PHP 5.5, but I'd be much happier to roll those back and get this issue finished. I'll do that as I fix the current test fails.
Comment #66
deviantintegral CreditAttribution: deviantintegral at Lullabot commentedComment #67
deviantintegral CreditAttribution: deviantintegral at Lullabot commentedComment #68
deviantintegral CreditAttribution: deviantintegral at Lullabot commentedRemoves a spurious newline.
Comment #69
deviantintegral CreditAttribution: deviantintegral at Lullabot commentedOne issue (that I've had to work with every time in site code but forgot in the above patches) is handling when a short queue has failing items in it. #68 will repeatedly try to process the item, even though odds are high it will still fail. This causes an effective while true loop on both the queue to recreate the item, and any database or remote API calls.
This patch fixes that by keeping track of the IDs of items that have been recreated, and if they have been seen in the current cron run leaving them locked. As well, this fixes some missing parameter type docs and returns the newly created item ID in recreateItem.
I wonder if we should rename
reCreateItem
topostpone
. Recreate makes sense for the database queues, but a queue in SQS might want to set a visibility timeout instead. And in those cases, it would make sense for queue workers to indicate how long that unavailable time should be, but I'm hesitant to add that as a method to an exception.Anyways, let's see if this patch passes.
Comment #70
deviantintegral CreditAttribution: deviantintegral at Lullabot commentedAfter discussing with a colleague, I decided to go ahead and rename recreate to postpone. Of course, it's the title of the issue so I think it's clear postponed is a term that will be widely understood.
Comment #71
neclimdulI believe I briefly had a discussion with chx about this when he was posting the patch and the actual reason for re-queue is that the method doesn't provide any actual promise of postponing(delaying the run) of a item. It moves it to the end of the queue but with a queue depth of 0 you'll likely end up with the same loop on the item. This doesn't mean its less useful, only that postpone promises a behavior we're not documenting where "requeue" explicitly does.
Comment #73
m4oliveiThe way postpone is implemented in #70, doesn't that make throwing
PostponeItemException
and any otherException
(that isn'tRequeueException
orSuspendQueueException
) functionally equivalent?At least for Database and Memory queues. They both will set a non-zero expiry value in their
claimItem
method. That expiry will be left alone in the case of any other exception from the queue worker (https://git.drupalcode.org/project/drupal/blob/8.8.x/core/lib/Drupal/Cor...). That effectively excludes them from being picked up again until the expiry passes (reset to 0 bysystem_cron
). Meaning that if a queue worker wishes to postpone, it can just throw a custom exception and ensure that a reasonable value forcron.time
is set in it's annotation. With the caveat that drush currently doesn't have the final catch that core cron has, so if your runningdrush queue:run
it won't work. But that's a bug (https://github.com/drush-ops/drush/issues/4032) and if you want to usedrush queue:run
with this patch, you'd need Drush patched to support the samePostponeItemException
.Comment #74
joelstein CreditAttribution: joelstein at On Fire Media commentedI have a scenario where a postponed queue item would be ideal, but not the in the way this issue describes it.
For example, we have a system which sends text messages to users, but we don't want these notifications to be sent during sleeping hours, even if the action which triggers the notification happens during those hours.
I wish there was a way for the Queue API to simply ignore queue items until a specific date in the future. These would not be processed until the postponed date.
Is there any consideration for something like this? If not, I suppose I could look at the [Advanced Queue](https://www.drupal.org/project/advancedqueue) module, which seems to have a delayed processing feature.
Comment #75
deviantintegral CreditAttribution: deviantintegral at Lullabot commentedI think this is something you'd want to handle before outside of the queue system. For example, while SQS has support for delayed sending, it's only a maximum of 15 minutes: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGui...
If your case doesn't involve time zones, you perhaps you could suspend processing of the whole queue (but again, different backends may not support that).
Comment #76
joelstein CreditAttribution: joelstein at On Fire Media commentedAh yes, I forgot to mention why I don't want to throw a SuspendQueueException. When you throw a SuspendQueueException, it logs an error in the database log. If I have cron running every minute, it would quickly write lots of noisy logs for what is basically not an error. :(
Comment #78
thtas CreditAttribution: thtas commentedRe-rolling patch against 8.9
Comment #79
omarlopesinoHello. Thanks for the patch! I also need this.
I had just one problem with patch #78. When I throw PostponeItemException the item is expired in queue. So when I run cron again the item is not processed anymore.
The problem is produced by the $queue->claimItem() behaviour:
Even we postpone this item in the bucle, as we have claimed it, the item is already expired and it won't be processed again.
The solution to the problem is postponing the item, not when the exception is captured, but after all the queue items have been claimed. Attaching a patch that does it.
Please review, Thanks!
Comment #80
omarlopesinoAnother patch to fix tests. It was being check that the next item is false after postpone exceptions, but now it will contain an item, that will be processed in the next cron execution.
Comment #84
halthHasn't this been fixed here? https://www.drupal.org/project/drupal/issues/3116478
Not super sure if it addresses the same use case, so could anyone confirm?
Comment #85
BR0kENThis is a duplicate of #3116478: Add a way to silently keep an item locked when processing a queue via cron from my perspective. The committed implementation is also better in perf as doesn't store items in memory.
Comment #86
BR0kENremoved
Comment #87
BR0kEN