When using the queue to process notifications it looks like it's possible under certain circumstances to inadvertently create a barrage of fatal errors during cron. One way to replicate this is to trigger a message via a standard call to message_subscribe_send_message(), but then delete the message (manually or via the message UI) before the newly queued notifications are processed out of the queue.

When processing a queue item message_subscribe_queue_worker() does a message_load() and then passes the resulting $message to message_subscribe_send_message(). Because message_subscribe_send_message() type hints the $message variable, if $message was not a properly loaded for any reason, fatal errors will occur. This can totally cripple cron until the queue items can be manually cleaned up.

Though the replication scenario above may not be all that common, it has bitten us a few times. There are also probably other ways to trigger this problem. This would probably be quite easy to rectify with some basic error checking inside message_subscribe_queue_worker().

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

rjacobs’s picture

Status: Active » Needs review
FileSize
1014 bytes

Attached is a patch.

I initially thought a try/catch could be wrapped around message_subscribe_queue_worker(), but that convention does not seem to be used elsewhere in the module (and is probably overkill). I also thought that perhaps an error should be logged when the message can't be loaded correctly, but that seems dangerous for situations where lots of notification items are queued in relation to the same (missing) message.

So this simple patch just adds a check to see that $message is not empty and is an object before processing the item with message_subscribe_send_message().

amitaibu’s picture

Status: Needs review » Needs work
+++ b/message_subscribe.module
@@ -701,6 +701,10 @@ function message_subscribe_queue_worker($data, $end_time = FALSE) {
   $message = message_load($mid);

You could:

if (!$message = message_load($mid)) {
  return;
}

Bonus points for adding this to the test in \MessageSubscribeAdvancedQueueTest::testAdvancedQueue

RoySegall’s picture

FileSize
6.13 KB

Manage to reproduce but the queue worker doesn't reset back to 0:

rjacobs’s picture

I'd be happy to revise this a bit and submit a new patch. Maybe in a day or two (I'm swamped with some projects at the moment).

@Amitaibu, I'm generally not a big fan of setting variables inside of conditionals. That's just a matter of taste though. If doing so is a practice you all use in this project, that's fine with me. I certainly won't split hairs about that.

@RoySegall, Are you saying that the "bad" items aren't cleared from the queue even with the patch from #1? I thought that the item was removed from the queue so long as the worker returned (and did not return FALSE). I did a quick scan of the docs on this, but didn't find a whole lot about the ins-and-outs of the worker callback. This is also confusing as under normal circumstances message_subscribe_queue_worker() does not return an explicit value, which makes me think that message_subscribe_send_message() must somehow be manually removing the item from the queue after sending? I can't find any indication of that though. However, I do see this (~line 129):

  if (empty($uids) && !$uids = message_subscribe_get_subscribers($entity_type, $entity, $message, $subscribe_options, $context)) {
    // If we use a queue, it will be deleted.
    return $message;
  }

I think this is checking to see if there are any subscribers to send to, and if not it's just aborting. The comment indicates that if this is also a case where a queue item is being processed "it will be deleted". But how? I assume automatically by the nature that the queue worker will have "finished" just after the return?

Anyway, I'm kinda new to the queue, so if anyone has any pointers about how the items are removed from the queue (and if this has to be done manually in an error case), I'd be grateful.

amitaibu’s picture

> That's just a matter of taste though

I agree, but we prefer it that way, as it saves a line.

> Are you saying that the "bad" items aren't cleared from the queue even with the patch from #1?

Clearing the items happens in message_subscribe_send_message() when we $queue = DrupalQueue::get('message_subscribe');, so if that code won't be called it won't be cleared. So better move your check after that.

rjacobs’s picture

Thanks for the comments.

In #5 you noted:

Clearing the items happens in message_subscribe_send_message() when we $queue = DrupalQueue::get('message_subscribe');, so if that code won't be called it won't be cleared.

However, I'm not sure I understand what you mean. A look at class DrupalQueue reveals that it's just a factory, and that the get() method just fetches a queue object by either instantiating one of the correct class or getting it from a static cache. This method does not appear to do any CRUD operations on any queue (such as clearing).

So I looked at bit deeper at the "queue worker" logic in core, and I think I've validated my initial thinking, namely that the queue item is cleared automatically after the worker returns. This can further be verified with a look inside drupal_cron_run() where $queue->deleteItem($item) is called just after the worker runs for any given queue item. So I believe that no special queue clearing effort should be needed.

So what RoySegall noticed in #3 is exactly what I would expect to see without the patch applied (the worker triggers fatal error, does not return, drupal_cron_run() dies, and items stay in queue). If he saw this behavior with the patch installed, then, well, more investigation is needed. As long as the patch is applied I cannot personally replicate any issues with items getting stalled in the queue.

I'm still going to have a look at MessageSubscribeAdvancedQueueTest::testAdvancedQueue, but am attaching the shorter version of the patch here in the meantime. Will officially toggle to "needs review" again after test changes can be incorporated.

amitaibu’s picture

Status: Needs work » Needs review

Setting correct status.

amitaibu’s picture

Status: Needs review » Needs work

> Will officially toggle to "needs review" again after test changes can be incorporated.

Sorry, missed that - setting back to needs work

amitaibu’s picture

+++ b/message_subscribe.module
@@ -700,7 +700,10 @@ function message_subscribe_queue_worker($data, $end_time = FALSE) {
+    return;

abort won't be enough. You'll need to $queue->deleteItem() as-well.

rjacobs’s picture

abort won't be enough. You'll need to $queue->deleteItem() as-well.

Are you sure? If this really is the case can you let me know what specific part of my analysis in comment #6 is incorrect? From what I can see the item will automatically be cleared from the queue via logic in drupal_cron_run() as long as the queue worker returns without dying. The current problem is that we can have a situation where the worker can die, and drupal_cron_run() can't cleanup.

rjacobs’s picture

Attached is a TESTS ONLY patch which adds some test coverage for this case (e.g. add item to queue, delete related message, run cron). These tests also check whether-or-not queue items are cleared upon running cron.

rjacobs’s picture

Ok, attached is the tests-only patch along with a patch the includes both the tests and the proposed fix from #6.

Hopefully the test-only patch will fail d.o. automated tests, and the combined test+fix patch will pass....

amitaibu’s picture

Awesome! :)

  1. +++ b/message_subscribe.test
    @@ -441,5 +441,23 @@ class MessageSubscribeAdvancedQueueTest extends DrupalWebTestCase {
    +    // Flush queue so we can proceed with additional queue handling cases.
    

    I think this could move to its own test case (testCron()?)

  2. +++ b/message_subscribe.test
    @@ -441,5 +441,23 @@ class MessageSubscribeAdvancedQueueTest extends DrupalWebTestCase {
    +    ¶
    

    Remove trailing space.

rjacobs’s picture

Ok. Yes, I had considered a new test case, but I was trying to be as low impact as possible given that most of the setup and logic needed was already in place from the existing queue test. I can alter the patch shortly for this...

Also, as an aside, I'd suggest adding a bit more clarity/detail to some of the module's code comments going forward. This could best be done retroactively independent of this particular issue, but I figured it would be worth making a note here. Specifically I had a hard time unraveling some intended behavior within message_subscribe_send_message() (there's a lot going on in there and the behavior is very state and call-stack dependent). Slightly more verbose comments in such places may help 3rd parties like me get involved a bit quicker. Now that I've wrapped my head around things I could capture some comment additions in a separate patch if that would help, but I'll leave that for another issue.

More to come....

rjacobs’s picture

Here's an alternate version that breaks out these new tests into a separate test method. Given that most of the setup tasks are the same for both these new tests and the existing queue tests I figure a completely separate test case could be overkill. Just let me know what you think. So ultimately this is just a bit of refactoring on the MessageSubscribeAdvancedQueueTest class.

I also changed things so the test uses the native message API to delete the message (and check that it's deleted), as opposed to calling entity_delete and entity_load directly.

amitaibu’s picture

Looks good - minor:

  1. +++ b/message_subscribe.test
    @@ -381,21 +381,24 @@ class MessageSubscribeAdvancedQueueTest extends DrupalWebTestCase {
    +  function testAdvancedQueue() {
    

    If we are already here, lets change it to testQueue (we already dropped the dependency on advanced queue)

  2. +++ b/message_subscribe.test
    @@ -442,4 +445,31 @@ class MessageSubscribeAdvancedQueueTest extends DrupalWebTestCase {
    +    message_delete_multiple(array($message->mid));
    

    You can $message->delete() instead.

> I'd suggest adding a bit more clarity/detail to some of the module's code comments going forward. This

Understood -- patches would be mostly welcome :)

rjacobs’s picture

Ok, new patch attached.

If we are already here, lets change it to testQueue (we already dropped the dependency on advanced queue)

I assume you are just referring to the naming conventions used for the methods? I've now dropped the "Advanced" part in both methods as well as the test case class name.

You can $message->delete() instead.

Ah yes, thanks. I keep forgetting that entities using the contrib entity API don't even need to be wrapped to provide useful methods like that.

amitaibu’s picture

Status: Needs review » Fixed

Committed, thanks!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.