queue cron workers can't signal a broken queue [#1524550]

Comment	File	Size	Author
#70	1524550-70.drupal.cron-queue-worker-skip-queue-exception.patch	6.25 KB	joachim
#66	1524550-66.drupal.cron-queue-worker-skip-queue-exception.patch	5.95 KB	joachim
#64	1524550-64.drupal.cron-queue-worker-skip-queue-exception.patch	5.92 KB	joachim
#54	1524550-54.drupal.cron-queue-worker-skip-queue-exception.patch	3.02 KB	joachim
#48	1524550-48.drupal.cron-queue-worker-skip-queue-exception.patch	2.03 KB	joachim
#45	1524550.drupal.cron-queue-worker-skip-queue-exception.patch	1.68 KB	joachim
#40	skip_cron_item-1524550-40.patch	11.8 KB	marthinal
#40	interdiff-1524550-38-40.txt	3.46 KB	marthinal
#38	skip_cron_item-1524550-38.patch	8.66 KB	David Hernández
#36	skip_cron_item-1524550-36.patch	10.85 KB	David Hernández
#36	interdiff.txt	883 bytes	David Hernández
#34	skip_cron_item-1524550-34.patch	10.84 KB	David Hernández
#25	skip_cron_item-1524550-25.patch	10.77 KB	socketwench
#20	skip_cron_item-1524550-19.patch	10.77 KB	subson
#16	skip_cron_item-1524550-16.patch	13.33 KB	superspring
#14	skip_cron_item-1524550-14.patch	13.33 KB	superspring
#12	skip_cron_item-1524550-12.patch	13.37 KB	superspring
#11	skip_cron_item-1524550-11.patch	8.82 KB	superspring
#10	skip_cron_item-1524550-10.patch	7.06 KB	superspring
#7	skip_cron_item-1524550-5.patch	4.25 KB	superspring
#5	skip_cron_item-1524550-4.patch	2.28 KB	superspring
#4	skip_cron_item-1524550-3.patch	2.28 KB	superspring
#2	skip_cron_item-1524550-2.patch	2.28 KB	superspring

Comment #2

superspring CreditAttribution: superspring commented 30 October 2012 at 02:20

Version:	7.x-dev	» 8.x-dev
Status:	Active	» Needs review

File	Size
skip_cron_item-1524550-2.patch	2.28 KB

Here is a patch which implements what is talked about above.

Comment #3

chx CreditAttribution: chx commented 1 November 2012 at 02:43

Status:	Needs review	» Needs work
Issue tags:		+Needs tests

I like this patch a lot, it's going to be super useful for the new batch API, but it needs a test. And nos is a typo, it needs to be noes :)

Comment #4

superspring CreditAttribution: superspring commented 1 November 2012 at 02:45

Status:	Needs work	» Needs review

File	Size
skip_cron_item-1524550-3.patch	2.28 KB

Same patch with 'noes' spelt correctly.

I've temporarily put it to needs review for the test bot.

Comment #5

superspring CreditAttribution: superspring commented 1 November 2012 at 02:48

File	Size
skip_cron_item-1524550-4.patch	2.28 KB

Spell check again

Comment #6

1 November 2012 at 08:40

Status:	Needs review	» Needs work

The last submitted patch, skip_cron_item-1524550-4.patch, failed testing.

Comment #7

superspring CreditAttribution: superspring commented 2 November 2012 at 01:08

Status:	Needs work	» Needs review

File	Size
skip_cron_item-1524550-5.patch	4.25 KB

Added a simple test to prove its functionality.

Comment #8

chx CreditAttribution: chx commented 2 November 2012 at 04:37

There was no second queue before. Why did that appear?

Comment #9

2 November 2012 at 05:18

Status:	Needs review	» Needs work

The last submitted patch, skip_cron_item-1524550-5.patch, failed testing.

Comment #10

superspring CreditAttribution: superspring commented 5 November 2012 at 05:17

Status:	Needs work	» Needs review

File	Size
skip_cron_item-1524550-10.patch	7.06 KB

The ordering of items being added _back_ to the queue after they've been tried is important. The whole idea fails if an item is claimed, tested, unclaimed and then the same item is claimed again straight away. The second queue ensures that any item that has been claimed will be added back to the end of the original queue.
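
To make the ordering point concrete, here is a minimal in-memory sketch. It is not the patch's code and does not use the Drupal queue API: the `SplQueue` and the `$worker` closure are stand-ins chosen for illustration. A failed item is put back at the end, so one pass tries every item once instead of retrying the same item repeatedly.

```php
<?php
// Hypothetical in-memory illustration; not the Drupal queue API.
$queue = new SplQueue();
foreach (['a', 'b', 'c'] as $data) {
  $queue->enqueue($data);
}

// Hypothetical worker: fails on 'a', succeeds on everything else.
$worker = function (string $data): bool {
  return $data !== 'a';
};

$processed = [];
// One pass: try each item that was in the queue at the start exactly once.
for ($i = 0, $n = count($queue); $i < $n; $i++) {
  $data = $queue->dequeue();
  if ($worker($data)) {
    $processed[] = $data;
  }
  else {
    // Requeue at the end so the same item is not claimed again immediately.
    $queue->enqueue($data);
  }
}
```

After the pass, `$processed` holds the two successful items and the failing item is the only one left in the queue.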

Here is another patch with more commenting, a speed up and more testing code.

Comment #11

superspring CreditAttribution: superspring commented 16 November 2012 at 00:41

File	Size
skip_cron_item-1524550-11.patch	8.82 KB

This patch is an improvement to the System queue to remove the need of having a 'second queue' as per @chx and @fiasco's reviews.

Comment #12

superspring CreditAttribution: superspring commented 19 November 2012 at 02:28

File	Size
skip_cron_item-1524550-12.patch	13.37 KB

This is a new patch using this issue's (#1832818: Allow a queue item to be postponed) patch for the queue's ordering guarantee.
Otherwise no changes.

Comment #13

chx CreditAttribution: chx commented 21 November 2012 at 02:57

This will be rtbc a) once the other patch is in b) the superfluous releaseItem is removed.

Comment #14

superspring CreditAttribution: superspring commented 21 November 2012 at 03:00

File	Size
skip_cron_item-1524550-14.patch	13.33 KB

As per @chx's review.

Comment #15

21 November 2012 at 03:05

Status:	Needs review	» Needs work

The last submitted patch, skip_cron_item-1524550-14.patch, failed testing.

Comment #16

superspring CreditAttribution: superspring commented 21 November 2012 at 03:59

Status:	Needs work	» Needs review

File	Size
skip_cron_item-1524550-16.patch	13.33 KB

Same patch as above with diff fix.

Comment #17

kerasai CreditAttribution: kerasai commented 24 May 2013 at 08:46

Issue tags:	-Needs tests

#16: skip_cron_item-1524550-16.patch queued for re-testing.

Comment #18

24 May 2013 at 08:49

Status:	Needs review	» Needs work
Issue tags:		+Needs tests

The last submitted patch, skip_cron_item-1524550-16.patch, failed testing.

Comment #19

rbunch CreditAttribution: rbunch commented 24 May 2013 at 18:43

In progress... Drupalcon sprint, May 24, 2013

Comment #20

subson CreditAttribution: subson commented 24 May 2013 at 19:23

Status:	Needs work	» Needs review

File	Size
skip_cron_item-1524550-19.patch	10.77 KB

re-rolling the new patch.

@rbunch - I was working on this issue, re-rolled the new patch. Sorry I forgot to assign it to myself before starting on it.

Comment #21

25 May 2013 at 01:01

Status:	Needs review	» Needs work

The last submitted patch, skip_cron_item-1524550-19.patch, failed testing.

Comment #22

subson CreditAttribution: subson commented 25 May 2013 at 19:31

Status:	Needs work	» Needs review

trying to run tests again.

Comment #23

subson CreditAttribution: subson commented 27 May 2013 at 17:38

Issue tags:	-Needs tests

#20: skip_cron_item-1524550-19.patch queued for re-testing.

Comment #24

27 May 2013 at 19:21

Status:	Needs review	» Needs work
Issue tags:		+Needs tests

The last submitted patch, skip_cron_item-1524550-19.patch, failed testing.

Comment #25

socketwench CreditAttribution: socketwench commented 28 May 2013 at 03:02

Status:	Needs work	» Needs review

File	Size
skip_cron_item-1524550-25.patch	10.77 KB

Reroll.

Comment #26

28 May 2013 at 04:36

Status:	Needs review	» Needs work

The last submitted patch, skip_cron_item-1524550-25.patch, failed testing.

Comment #27

star-szr CreditAttribution: star-szr commented 26 June 2013 at 17:17

Issue tags:		+Needs reroll

This needs another reroll.

Comment #28

xjm CreditAttribution: xjm commented 31 July 2013 at 17:48

Assigned:	Unassigned	» chx
Category:	feature	» bug
Priority:	Major	» Normal

So, from the summary, it sounds like this might be a bug (as @colinafoley pointed out on IRC). Assigning to chx to get his confirmation on that point.

If it is a bug, the next step in this issue is to add a failing test for the bug that will pass when combined with a fix: https://drupal.org/contributor-tasks/write-tests

Since this issue depends on #1832818: Allow a queue item to be postponed, it would be good to also provide a patch named whatever-do-not-test.patch alongside the full one that just shows the changes this issue will add on top of that one. We'll want that issue to go in first, but we have more work we can do here in the meanwhile.

Comment #29

Crell CreditAttribution: Crell commented 31 July 2013 at 19:03

Comment #30

chx CreditAttribution: chx commented 31 July 2013 at 21:06

Assigned:	chx	» xjm

This functionality didn't exist before -- although it should have been -- but it's definitely an API addition. Not sure whether "it should've been" makes it a bug or a feature request. I will let xjm decide.

Comment #31

chx CreditAttribution: chx commented 31 July 2013 at 21:54

Title:	drupal_cron_run should respect the return value of the 'worker callback'	» queue cron workers can't signal an uncompleted job
Assigned:	xjm	» Unassigned

And by that title it's a bug, although one that can only be solved by an API addition; however, with the exception idea it's not an API change but an addition, so it's a go.

Comment #32

xjm CreditAttribution: xjm commented 31 July 2013 at 21:55

Issue tags:		+API addition

Works for me.

Comment #33

Crell CreditAttribution: Crell commented 31 July 2013 at 22:09

By #31, do you mean using an exception to signal "worker fail, try again later" makes this not an API break? (I hadn't intended that with the other issue, but I'm fine with it as that's how update hooks work now.)

Comment #34

David Hernández CreditAttribution: David Hernández commented 29 September 2013 at 11:54

Status:	Needs work	» Needs review

File	Size
skip_cron_item-1524550-34.patch	10.84 KB

Quick reroll. Updated some deprecated functions (variable_get/set) and fixed some minor coding standard errors. I may have missed some deprecated functions.

Comment #35

29 September 2013 at 11:56

Status:	Needs review	» Needs work

The last submitted patch, skip_cron_item-1524550-34.patch, failed testing.

Comment #36

David Hernández CreditAttribution: David Hernández commented 29 September 2013 at 12:04

Status:	Needs work	» Needs review

File	Size
interdiff.txt	883 bytes
skip_cron_item-1524550-36.patch	10.85 KB

Fixed info.yml file

Comment #37

29 September 2013 at 13:49

Status:	Needs review	» Needs work

The last submitted patch, skip_cron_item-1524550-36.patch, failed testing.

Comment #38

David Hernández CreditAttribution: David Hernández commented 29 September 2013 at 18:30

Status:	Needs work	» Needs review

File	Size
skip_cron_item-1524550-38.patch	8.66 KB

Fixing 3 out of 4 tests was easy, but I've been trying to resolve the last one for a couple of hours. Unfortunately, my time is up and I have to leave the DrupalCon Prague sprint. I've tracked the issue to the common_test_cron_exception_helper. It looks like the function common_test_cron_exception_helper_callback is not being called, and I wasn't able to find out why.

I've attached my last version, so maybe someone can fix it for me.

Marked needs review to show the improvements and have the test report.

Comment #39

29 September 2013 at 19:55

Status:	Needs review	» Needs work

The last submitted patch, skip_cron_item-1524550-38.patch, failed testing.

Comment #40

marthinal CreditAttribution: marthinal commented 23 October 2013 at 16:50

Status:	Needs work	» Needs review

File	Size
interdiff-1524550-38-40.txt	3.46 KB
skip_cron_item-1524550-40.patch	11.8 KB

1) Each time we do this:

while (time() < $end && ($item = $queue->claimItem())) {

we select rows whose expire is 0. I ran a few manual tests, and whenever I call the moveToTheEnd() method the other rows never seem to be accessed.

I have fixed the problem by setting expire back to 0 once the while loop has ended. But I really don't know the best way...

2) About QueueTest: if we apply the patch, we add a different weight per row. So the current test should verify the situation with the weight, right?

3) If we increase the value of the weight each time, we'll end up with very high values in the case where we always get an exception. I don't know if this could be a problem.

Hope it helps.

Comment #41

joachim CreditAttribution: joachim commented 30 January 2014 at 08:46

Issue summary:	View changes
Status:	Needs review	» Needs work

#2021933: Catch exceptions from queue workers (followup: tests don't work) is now in.

Either this needs a reroll, or it could be considered a duplicate, as it's now possible to signal that a job wasn't completed by throwing an exception.

Comment #42

joachim CreditAttribution: joachim commented 30 January 2014 at 13:02

One thing that might be interesting as an addition to just the worker throwing an exception would be to have the worker able to signal that the *whole queue* is currently fubar and should be skipped.

Example: queued items are data to be fetched or pushed to a remote site. If one worker fails because the remote site is down there is no point in cron trying to work on more items from this queue.

If the worker callback is capable of knowing that this is indeed the reason for the failure, it could signal to system_cron() that the current queue should be skipped completely on this cron run.

Comment #43

Crell CreditAttribution: Crell commented 31 January 2014 at 04:45

The obvious way to do that would be to allow a special exception to be thrown to indicate the entire queue should be paused. Any other exception just requeues that one item and on we go to the next one.

How the queue system would mark an entire queue as "paused", I have no idea. :-) But that would be the obvious API for workers.
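
A rough, self-contained sketch of that worker-facing API, under loud assumptions: `SuspendQueueSketch` and `run_queues_sketch()` are hypothetical names, the queues are plain arrays, and nothing here is Drupal code. The special exception stops work on the current queue for this run; any other exception only leaves the one item behind.

```php
<?php
// Hypothetical stand-in for the special exception discussed above.
class SuspendQueueSketch extends Exception {}

/**
 * Runs each queue's worker; returns the items left unprocessed per queue.
 *
 * $queues maps a queue name to ['items' => array, 'worker' => callable].
 */
function run_queues_sketch(array $queues): array {
  $remaining = [];
  foreach ($queues as $name => $info) {
    $remaining[$name] = [];
    $items = $info['items'];
    while ($items) {
      $item = array_shift($items);
      try {
        ($info['worker'])($item);
      }
      catch (SuspendQueueSketch $e) {
        // Whole queue is unusable: keep this item and all later ones,
        // then move on to the next queue.
        $remaining[$name] = array_merge($remaining[$name], [$item], $items);
        continue 2;
      }
      catch (Exception $e) {
        // Only this item failed: keep it and carry on with the next item.
        $remaining[$name][] = $item;
      }
    }
  }
  return $remaining;
}

$result = run_queues_sketch([
  'remote' => [
    'items' => [1, 2, 3],
    'worker' => function ($item) {
      if ($item === 2) {
        throw new SuspendQueueSketch('Remote site is down.');
      }
    },
  ],
  'local' => [
    'items' => ['x', 'y'],
    'worker' => function ($item) {
      if ($item === 'x') {
        throw new RuntimeException('Bad item.');
      }
    },
  ],
]);
```

Here the 'remote' queue stops at its second item and keeps items 2 and 3, while the 'local' queue keeps only the one bad item and processes the rest.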

Comment #44

joachim CreditAttribution: joachim commented 31 January 2014 at 08:33

I think this could be easily done in https://api.drupal.org/api/drupal/core!lib!Drupal!Core!Cron.php/function... at least:

    foreach ($queues as $queue_name => $info) {
      if (isset($info['cron'])) {
        $callback = $info['worker callback'];
        $end = time() + (isset($info['cron']['time']) ? $info['cron']['time'] : 15);
        $queue = $this->queueFactory->get($queue_name);
        while (time() < $end && ($item = $queue->claimItem())) {
          try {
            call_user_func_array($callback, array($item->data));
            $queue->deleteItem($item);
          }
          catch (Exception $e) {
            // In case of exception log it and leave the item in the queue
            // to be processed again later.
            watchdog_exception('cron', $e);
          }
          catch (\Drupal\Core\Queue\SuspendQueue $e) {
            // If the worker indicates there is a problem with the whole queue,
            // skip it.
            watchdog_exception('cron', $e);

            continue 2;
          }
        }
      }
    }

Comment #45

joachim CreditAttribution: joachim commented 31 January 2014 at 08:50

Title:	queue cron workers can't signal an uncompleted job	» queue cron workers can't signal a broken queue
Status:	Needs work	» Needs review

File	Size
1524550.drupal.cron-queue-worker-skip-queue-exception.patch	1.68 KB

Here's a patch that does that.

Comment #46

Crell CreditAttribution: Crell commented 4 February 2014 at 22:51

+++ b/core/lib/Drupal/Core/Cron.php
@@ -142,6 +143,14 @@ public function run() {
+          catch (SuspendQueueException $e) {

The more specific catch statement must come first; as is, the first Exception reference will catch SuspendQueueException as well and this block will never be called.

This will bypass that queue for this cron run. Is that sufficient? Probably. For a waiting queue, though (which is what we ought to be using rather than a polling queue), we'd need something more robust. Probably the exception should indicate how long the queue should be paused for. That may be follow-up material, though.
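
For anyone following along, the catch-order point can be checked in isolation. This is a standalone sketch, not the patch: `SuspendQueueDemoException` is a hypothetical stand-in for the new class, and the two functions only report which catch block ran. PHP tries catch blocks top to bottom and uses the first one whose type matches.

```php
<?php
// Hypothetical stand-in for the exception class added by the patch.
class SuspendQueueDemoException extends Exception {}

// Generic catch first: the specific block below it is never reached.
function handled_generic_first(Exception $e): string {
  try {
    throw $e;
  }
  catch (Exception $caught) {
    return 'generic';
  }
  catch (SuspendQueueDemoException $caught) {
    return 'suspend';
  }
}

// Specific catch first: each exception lands in the intended block.
function handled_specific_first(Exception $e): string {
  try {
    throw $e;
  }
  catch (SuspendQueueDemoException $caught) {
    return 'suspend';
  }
  catch (Exception $caught) {
    return 'generic';
  }
}

$wrong_order = handled_generic_first(new SuspendQueueDemoException());
$right_order = handled_specific_first(new SuspendQueueDemoException());
$other = handled_specific_first(new RuntimeException());
```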

Comment #47

joachim CreditAttribution: joachim commented 5 February 2014 at 12:54

> The more specific catch statement must come first; as is, the first Exception reference will catch SuspendQueueException as well and this block will never be called.

Ok, will reroll.

I'll see if I have time to add a test.

What would be the best way for the queue worker to indicate to the test case how many times it has been called? All I can think of is setting a system variable each time it's called.

Comment #48

joachim CreditAttribution: joachim commented 1 March 2014 at 14:14

File	Size
1524550-48.drupal.cron-queue-worker-skip-queue-exception.patch	2.03 KB

Fixes the exception catching. Not had time to write tests for this, I'm afraid.

Comment #49

joachim CreditAttribution: joachim commented 2 March 2014 at 10:34

Issue tags:	-Needs reroll

Depending on status of #2208649: document queue worker callback, this will either need to update the documentation that issue adds, or reroll the patch at that issue.

Comment #50

Crell CreditAttribution: Crell commented 3 March 2014 at 03:00

joachim: Both issues look good to me. To avoid rerolling too much, please fold them both into one patch here and I can RTBC it. :-) (As is I'd RTBC both, but...)

Comment #51

joachim CreditAttribution: joachim commented 3 March 2014 at 07:16

Thanks for the review.
Though I think I'd rather keep them separate, as the docs issue will need backporting to 7.
Plus I'm not sure it would actually save us a reroll -- I'd reroll now, rather than reroll whichever of the two patches doesn't get in first, if you see what I mean.

Comment #52

Crell CreditAttribution: Crell commented 3 March 2014 at 17:02

Status:	Needs review	» Reviewed & tested by the community

Eh, whatever. I defer to the maintainers on the logistics.

Comment #53

joachim CreditAttribution: joachim commented 4 March 2014 at 17:01

Status:	Reviewed & tested by the community	» Needs work

The patch at #2208649: document queue worker callback won the race, therefore I'll reroll this with the documentation changes that it incurs.

Comment #54

joachim CreditAttribution: joachim commented 4 March 2014 at 21:54

Status:	Needs work	» Needs review

File	Size
1524550-54.drupal.cron-queue-worker-skip-queue-exception.patch	3.02 KB

Added mention of this to the callback docs.

Comment #55

Crell CreditAttribution: Crell commented 5 March 2014 at 00:29

Status:	Needs review	» Reviewed & tested by the community

And we're back.

Comment #56

catch CreditAttribution: catch commented 10 March 2014 at 14:08

Status:	Reviewed & tested by the community	» Needs work
Issue tags:		+Needs change record

The new exception should be documented in a draft change notice for drush/waiting_queue and other queue runners so they can update to use it correctly.

Could also do with a (short) issue summary update since the original use case presented doesn't really match what we've ended up with.

Comment #57

joachim CreditAttribution: joachim commented 10 March 2014 at 15:28

Issue summary:	View changes

Updated issue summary.

Comment #58

joachim CreditAttribution: joachim commented 10 March 2014 at 16:11

Change notice: https://drupal.org/node/2214873

Comment #59

joachim CreditAttribution: joachim commented 10 March 2014 at 16:12

Status:	Needs work	» Needs review

Comment #60

Crell CreditAttribution: Crell commented 11 March 2014 at 16:37

Status:	Needs review	» Reviewed & tested by the community
Issue tags:	-Needs change record

I tweaked the change notice a little. Back to RTBC.

Comment #61

joachim CreditAttribution: joachim commented 11 March 2014 at 17:21

> It is up to the runner to determine when it is safe to try that queue again.

I'm not sure a runner would have the means to do that!

Comment #62

Crell CreditAttribution: Crell commented 11 March 2014 at 23:21

In most cases it's probably some sort of timeout-retry. That's effectively what the cron runner does, ie, try again on the next cron run.

Comment #63

alexpott CreditAttribution: alexpott commented 23 March 2014 at 23:22

Status:	Reviewed & tested by the community	» Needs work

As of #3 we need a test - but we don't appear to have one.

Comment #64

joachim CreditAttribution: joachim commented 10 May 2014 at 15:08

Status:	Needs work	» Needs review

File	Size
1524550-64.drupal.cron-queue-worker-skip-queue-exception.patch	5.92 KB

Added a test.

Comment #65

10 May 2014 at 15:59

Status:	Needs review	» Needs work

The last submitted patch, 64: 1524550-64.drupal.cron-queue-worker-skip-queue-exception.patch, failed testing.

Comment #66

joachim CreditAttribution: joachim commented 10 May 2014 at 17:57

Status:	Needs work	» Needs review

File	Size
1524550-66.drupal.cron-queue-worker-skip-queue-exception.patch	5.95 KB

Whoops.

Comment #67

10 May 2014 at 18:50

Status:	Needs review	» Needs work

The last submitted patch, 66: 1524550-66.drupal.cron-queue-worker-skip-queue-exception.patch, failed testing.

Comment #68

joachim CreditAttribution: joachim commented 10 May 2014 at 19:39

Ah I get it:

    // Run cron; the worker for this queue should process as far as the crashing
    // item.
    $this->cronRun();

    // Only one item should have been processed.
    $this->assertEqual($queue->numberOfItems(), 2, 'Failing queue stopped processing at the failing item.');

    // Check the items remaining in the queue.
    $item = $queue->claimItem();
    $this->assertEqual($item->data, 'crash', 'Failing item remains in the queue.');
    $item = $queue->claimItem();
    $this->assertEqual($item->data, 'ignored', 'Item beyond the failing item remains in the queue.');

The queue item that threw the exception is still claimed, hence the test code can't claim it.

I don't see an API for releasing ALL of a queue's leases.

Should we:

a) have cron run release the item that throws the exception, which would also fix the test
b) change the test to inspect the queue's DB table directly rather than go via the API?

I'm really not sure which is best. Any thoughts?

Comment #69

Crell CreditAttribution: Crell commented 11 May 2014 at 22:17

I think A is best. In practice it shouldn't change the production-time effect for most queues, but seems more robust. Ie, if we know that an item is unprocessable we should explicitly say as much, since we're releasing the whole queue in this case anyway.

Also, coupling the tests to the DB implementation seems like a horrible idea, as this is testing the queue system, not the DB implementation thereof.
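
To illustrate option A without touching the DB backend, here is a small in-memory sketch. `ClaimQueueSketch` is a hypothetical class invented for this comment; only its method names echo the queue API, and its claims never expire. It shows why the test could not claim the failing item while it was still claimed, and why releasing it makes it claimable again.

```php
<?php
// Hypothetical in-memory queue with claims; not Drupal's queue interface.
class ClaimQueueSketch {
  private $items = [];
  private $claimed = [];

  public function createItem(string $data): void {
    $this->items[] = $data;
  }

  // Claims the first unclaimed item and returns its id, or NULL if none.
  public function claimItem(): ?int {
    foreach ($this->items as $id => $data) {
      if (empty($this->claimed[$id])) {
        $this->claimed[$id] = TRUE;
        return $id;
      }
    }
    return NULL;
  }

  // Drops the claim so the item can be claimed again.
  public function releaseItem(int $id): void {
    unset($this->claimed[$id]);
  }

  public function data(int $id): string {
    return $this->items[$id];
  }
}

// Case 1: the runner claims the failing item and never releases it.
$held = new ClaimQueueSketch();
$held->createItem('crash');
$held->createItem('ignored');
$held->claimItem();
$seen_without_release = $held->data($held->claimItem());

// Case 2: the runner releases the failing item before giving up on the queue.
$released = new ClaimQueueSketch();
$released->createItem('crash');
$released->createItem('ignored');
$released->releaseItem($released->claimItem());
$seen_with_release = $released->data($released->claimItem());
```

In the first case the next claim skips straight to 'ignored'; in the second it gets 'crash' back, which is what the test expects.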

Comment #70

joachim CreditAttribution: joachim commented 12 May 2014 at 07:31

Status:	Needs work	» Needs review

File	Size
1524550-70.drupal.cron-queue-worker-skip-queue-exception.patch	6.25 KB

Done.

Fixed the docs for the new exception class too, which didn't have a proper 1 line summary.

Comment #71

Crell CreditAttribution: Crell commented 12 May 2014 at 17:05

Status:	Needs review	» Reviewed & tested by the community

Seems reasonable to me.

Comment #72

alexpott CreditAttribution: alexpott commented 23 May 2014 at 13:47

Status:	Reviewed & tested by the community	» Fixed

Committed 082bf59 and pushed to 8.x. Thanks!

Comment #73

23 May 2014 at 16:23

Commit 082bf59 on 8.x by alexpott:

Issue #1524550 by superspring, joachim, David Hernández, marthinal,...

Comment #74

6 June 2014 at 16:30

Status:	Fixed	» Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.
