Discussion with Queue system maintainer Mark Sonnabaum while trying to use the core queue API for more complex tasks, I'm running into a couple bugs/flaws.

The more annoying is that the createItem() method returns a boolean instead of the ID of the queue item, for no useful reason. This makes it hard to create an item in the queue an watch for it's appearance again in a worker function, for example.

All likely back-ends (SQL, redis, beanstalk) support returning an ID, and the change from bool to int/FALSE shouldn't break any existing code.
https://github.com/kr/beanstalkd/blob/master/doc/protocol.txt#L155
http://drupalcode.org/project/redis_queue.git/blob/refs/heads/7.x-1.x:/r...

The other problem is that the properties of the $item fro claimItem are not clearly specified or documented. It's not even documented that the data is in $item-data! We should document a minimal set of properties as part of the APi so code can know what to expect to be present in all cases.

These changes should be back-portable to 7.x

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

msonnabaum’s picture

This seems pretty reasonable to me. Interested to hear what others think.

pwolanin’s picture

Status: Active » Needs review
FileSize
2.2 KB

turns out the memory queue didn't even respect the existing interface definition.

Minimal changes (2 LOC) plus code comments.

pwolanin’s picture

FileSize
2.2 KB

typo fix.

pwolanin’s picture

D7 backport.

neclimdul’s picture

Looking at the original issue the return value wasn't really discussed past not returning the item so you couldn't try to do work on the item in the queue. I don't see a problem with returning the ID because it still matches the intent of the interface.

As far as backporting, I assume no queue system would return 0 or an empty string or something as a valid identifier so it shouldn't break any existing code creating items, however there are existing implementations outside of core and this would be an API change. As long as d7 maintainers are aware and OK with this seems fine.

chx’s picture

Status: Needs review » Reviewed & tested by the community

Go for it.

tsphethean’s picture

Status: Reviewed & tested by the community » Needs review

Not sure I understand why this would be done as a modification to Queue API. Queues with a STOMP interface (i.e. AcitveMQ, RabbitMQ etc) will not return an ID AFAIK. Do we also propose a means of retrieving an item by its ID (not a queue)?

Feels like something which would live in contrib as a wrapper or extension to the createItem() method in case it is supported. If something is being put on a queue, then the ID of the queue item is irrelevant. The ID of the "thing" going on to the queue is important (I often use the id of the data item to form a queue specific lock to cover race conditions etc)

+1 on more documentation though :-)

pwolanin’s picture

I didn't propose retrieving an item by ID, since that clearly breaks the queue paradigm.

Is there a Drupal implementation of this interface on top of a STOMP interface? Looking at the ActiveMQ docs quicks, I see reference to a message ID, but looking at the Drupal RabbitMQ queue implementation, I don't see that it could get an ID back.

In any case, my goal here was to make it easy to use the unique ID that's typically generated by back-ends instead of generating and adding an additional ID to the data itself. An interface to a back-end that doesn't return an ID could create an ID via uniqid() or a uuid call, though I'd agree that's not ideal either.

tsphethean’s picture

I created a Stomp module, www.drupal.org/project/stomp but have only implemented on activemq and it's only a youngish module so not perfect but so far as I can see the php stomp library has no way to return an id.

My point regarding retrieve by item id was really to illustrate that it seemed off for queue API to return an id when the API has no use for it. That was why it felt more like a contrib use case

The point of the queue should be fire an forget in most cases shouldn't it?

pwolanin’s picture

Well, the APi is a little wonky in that e.g. fore releaseItem() or deleteItem() you have to pass the object back with the IS property unmolested or the API doesn't work.

As above, if you need to insure that a piece of work is eventually done, it's not necessarily "fire and forget". As I said, I was looking at this as a way to make efficient use of the IDs being generated. Plus some queues (mysql, redis, etc) do potentially offer a way to retrieve by ID (not part of the official API), so one can imagine a use case.

neclimdul’s picture

Yeah, I was wondering about this too. The point would be in cases where the queue allows introspection to allow a UI track the state of completion. So it wouldn't be fetching specific items to work on but to check that jobs had been completed or they are still waiting to be run. This is the sort of thing you'd want when you're batching a process and have a user that would like to see its state of completion.

If we can't support this, there is the possibility that such a workflow could be handled with a database and having the worker code report back things like "I've been claimed and am running" "i finished" to a shared database or something.

Going to continue thinking on this, thanks for catching the STOMP issue tsphethean.

msonnabaum’s picture

Compatibility with STOMP is worth discussing, since that's more geared towards message queues than job queues.

I'm of the opinion that the core queue api should focus more on being compatible with popular job queues, beanstalkd being the least common denominator in terms of introspect-ability.

That said, even though we dont have any methods that deal with ids, we do have methods that need something like an id to function (deleteItem, releaseItem), they just take the entire queue item and its expected to have an id.

Maybe instead of returning the id we should return the whole queue item?

pwolanin’s picture

in #5 neclimdul indicates that not returning the whole item was a deliberate design decision.

re: methods that taking an item expecting an ID, I think think that's why documenting the presence of that property is useful, though clearly a back-end might not use the ID?

tsphethean’s picture

Agh wrote a long post on the train then my mobile browser lost it :(

What I think I was saying:

#11 - neclimdul - "to check that jobs had been completed or they are still waiting to be run" isnt this replacing the job of whatever backend was selected? i.e. if we're using ActiveMQ, then AMQ provides all the interfaces for inspecting queue progress, jobs left to run, etc. I imagine Redis and Beanstalkd provide the same? The Queue UI module aims to do this with Drupal DB queues, and future versions are planned to allow deeper introspection into the queue. I'd be nervous of making provision for functionality which should lie outside of Drupal.

I can see the value re: batching, but still not sure this is Drupal's responsibility.

#12 - msonnabaum - "Maybe instead of returning the id we should return the whole queue item?" I like this idea but thinking with a STOMP hat on, I dont think this is feasible, as again the interface just returns success/failure, so to return the item you have to "claim" it and then immediately release it. Would work, but not sure on the performance impact...

In general, I think contructing the queue item object within the Queue interface with some defined required properties such as id, data etc would help with implementing backends. The STOMP module implements (for example):

public function deleteItem($item) {
  $this->stomp->ack($item->headers['message-id']);
}

because on claimItem() the whole object returned by the STOMP lib is assigned to $item (clunky but...). Better object construction would make objects more consistent and easier to work with. Unfortunately you can only get at this message-id when reading a frame from the stomp queue.

tsphethean’s picture

Thinking about the comment in #12 of

more geared towards message queues than job queues

makes me wonder whether we need an alternative (extended?) messagequeue interface which might get us round some of these objections? Job queues will be more geared around doing things to specific items, whilst message queues just see another message. Might give us a reason to work in #1913504: Queue API should support item prioritisation since prioritisation is more message-queue-esque too :-)

neclimdul’s picture

Yes it was a deliberate decision. I believe it was dww that insisted. reference: http://drupal.org/node/391340#comment-1552618 As he put it in the linked comment,

we don't return the $item in createItem() since you shouldn't do anything once it's in the queue unless you claim it first.

I believe you're right pwolanin. The metadata of the item object could be dependent on the implementation so it wasn't documented further. Its expected that what ever claimItem returned would be enough to satisfy the needs of delete and release what ever those needs might be.

Programming for the least common denominator has always been a bit of a hurdle for the QueueAPI though and made it sometimes under featured so I'm by no means suggesting we stop the discussion. I'm still kinda thinking about this.

neclimdul’s picture

@tsphethean yeah, not the entire queue. Say I'm provisioning a site and that has 5 tasks, each of which are fired into a queue to be handled by an external process. I've got a graphic that shows the state of those tasks so I can see when my site will be ready. Inspecting the entire queue will not help with this but some queues give the ability to inspect the status of the individual job so would could support this interface. I hope that's a more clear explanation.

tsphethean’s picture

@neclimdul - I see the value of the use case, just not convinced that's a task for queue api.

It feels more like an extension, or improvement, on batch api. Alternatively you're looking at populating a table (or other store) of tasks, each of which is a task Id, task status etc, and then each item in the queue contains the task id so that as items are processed the status of the task can be updated. Thats not uncommon, and is close to a scenario I've built, but is still a bespoke use of a queue rather than something a generic api should be built around IMHO.

Nick_vh’s picture

So I stumbled upon this exact issue when trying out a queue implementation. There is no point in only returning a bool if there is more information that could be used. Perhaps it could be documented what the pro's and con's are when programming against the Queue API while using this return value.

For me this can go in sooner rather than later.

tsphethean’s picture

if there is more information that could be used
I still don't think the queue item ID is the thing to be used though....

As I said above, I can understand the scenario where the data in the queue item is something you might want to keep track of (maybe you're processing an order and you want to know when that order is done, or you're publishing a series of nodes, and you want to know what the state of a particular node is) but that all sits outside Queue API.

Absolutely agree that if there is a need for a system of keeping tracking things that are put on to a queue then we can build that and return an id to whatever calls it, but when we can't guarantee that the thing we're going to queue into is going to give us back any kind of ID, let alone a unique one, I think this has potential to be really dangerous.

msonnabaum’s picture

Absolutely agree that if there is a need for a system of keeping tracking things that are put on to a queue then we can build that and return an id to whatever calls it, but when we can't guarantee that the thing we're going to queue into is going to give us back any kind of ID, let alone a unique one, I think this has potential to be really dangerous.

That argument doesn't really hold, because we already have deleteItem and releaseItem, and they need *some* kind of id within the queue object. We just don't say where it has to go.

I'm not sure I agree with dww's reasoning for killing that return value in the original patch. You can argue that you shouldn't do anything with the queue item after you create it…but that's kind of obvious? You just passed in the data, the only thing you're getting back is the data you put in plus an id. The only thing we're doing there is explicitly hiding the id from you, even though both beanstalkd and SQS return it. That hardly seems valuable.

tsphethean’s picture

The difference there though is that before you release the item or delete it, you must have claimed it.

Maybe an alternative would be Drupal generating some kind of unique id which is passed as part of the queue item data to the queue backend, then we can return that? That would then be backend agnostic and still allow what we're trying to do?

pwolanin’s picture

FileSize
2.95 KB

@tsphethean - my thought was that if the back-end doesnt' generate a unique ID, createItem could still do so. i.e. we make it a clear responsibility of the implementation.

We also noticed that created is not loaded when the item is claimed, so adding that to make the object match the proposed docs.

pwolanin’s picture

here's a D7 version

tsphethean’s picture

Ok, so my last thought on this is that we need a decision from @msonnabaum and @chx as queue maintainers about the direction of queue Api.

We need to know if the objective of queue api is to be designed to provide a minimum set of functionality that can then be extended upon In custom class implementations, or whether we're aiming to provide as much functionality as possible in the base classes whilst giving individual implementations the ability to not fully implement functionality if not available to the particular backend or use case.

If the latter, then this issue should go ahead, and I think #1913504 should be reconsidered, as this can be easily ignored by backends which don't support prioritisation, in the same way we're saying we would cover backends that can't return a unique Id.

Otherwise this is a job for contrib...

chx’s picture

Status: Needs review » Reviewed & tested by the community

It's fine for D8, I can't imagine how can you call this backportable but as I said on IRC, we know that certain changes like the hash function change tend to disregard process completely so I will leave the tag on it to avoid a tug-of-war and hope that the committers will be more sane this time.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to 8.x.

pwolanin’s picture

Version: 8.x-dev » 7.x-dev
Status: Fixed » Needs review

for consideration for D7. The changes would not break any existing consumers of the queue, but would suggest non-SQL implementors (like redis, etc) need to be updated to match, and also has the wrinkle that modules that need the (minor) new capabilities would have to check the core or contrib version.

mgifford’s picture

Issue summary: View changes
FileSize
2.61 KB

Here's a port to get this moving.

Status: Needs review » Needs work

The last submitted patch, 29: 1928056-D7-29.patch, failed testing.

mgifford’s picture

Status: Needs work » Needs review
FileSize
3.36 KB

Fixing Fatal error: Call to a member function queryRange().

ShaxA’s picture

FileSize
3.37 KB

Guys i have refactored the patch to work with 7.44 version.

mgifford’s picture

@ShaxA what was the change? Can you provide an interdiff?

neclimdul’s picture

Looks like just a straight reroll around conflicts from the query here #2356983: Fix SystemQueue::claimItem() returns items in the wrong order (resolves locale test failures on PostgreSQL) and a default.settings.php change here
#1930960: Block caching disable hardcoded on sites with hook_node_grant() causes serious performance troubles when not necessary. Basically the difference in the patches is just the sort change from #2346983: Path of module settings page.

+++ b/sites/default/default.settings.php
@@ -435,6 +435,16 @@ ini_set('session.cookie_lifetime', 2000000);
 /**
+ * HTTP Request Buffer
+ *
+ * When drupal requests a file from another webserver, it periodically checks
+ * for end-of-file and server timeouts.  The default is to read a maximum of
+ * 1024 bytes between each check.  Increasing this limit may help performance
+ * when reading large files from the network.
+ */
+# $conf['http_request_buffer'] = 1024;
+
+/**

What's this?

mgifford’s picture

Status: Needs review » Needs work
ntsekov’s picture

FileSize
3.52 KB

@ShaxA I have updated your patch to work with 7.50 version of Drupal core

  • Dries committed 7cfb115 on 8.3.x
    Issue #1928056 by pwolanin: Queue system createItem() should return ID,...

  • Dries committed 7cfb115 on 8.3.x
    Issue #1928056 by pwolanin: Queue system createItem() should return ID,...

  • Dries committed 7cfb115 on 8.4.x
    Issue #1928056 by pwolanin: Queue system createItem() should return ID,...

  • Dries committed 7cfb115 on 8.4.x
    Issue #1928056 by pwolanin: Queue system createItem() should return ID,...