This is a follow-up issue to #391340: Job queue API. In that issue, some people (including me) were looking for a scheduler, while the intention behind the queue module is merely to provide a way of working off large tasks - not necessarily only scheduled tasks but also e.g. large uploads.

A job scheduler could be of great value in Drupal though: we could funnel all cron tasks through it and start to get some awareness of how cron jobs are performing in Drupal (think: timeouts between search and aggregator using up too much time).

This is a list of all cron jobs in Drupal. In my mind they could all be worked off with a job scheduler:

aggregator_cron() // download new items
dblog_cron() // prune watchdog table
filter_cron() // Delete outdated filter cache entries
node_cron() // prune history
poll_cron() // closes polls that have exceeded their allowed runtime
search_cron() // update index - module_invoke_all('update_index');
statistics_cron() // reset day counts, prune access logs
system_cron() // prune flood and batch tables, prune temporary file log, prune caches
trigger_cron() // trigger cron actions
update_cron() // check for updates

Will follow with a patch soon.


Comments

alex_b’s picture

FileSize
2.09 KB

This is a first proof of concept. The scheduler creates a queue (see #391340: Job queue API) per time slot and works these queues off one after the other on system_cron(). It uses queue module's $process_time property for reserving the item until it is supposed to be called the next time.

This approach makes do with the interface that queue module #391340: Job queue API offers. Thus it uses the queue name for storing the minimum time period between callbacks (we need to know the time to reserve the item at the moment we take it from the queue).
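
To make the idea concrete, here is a minimal sketch of the "one queue per period, lease the item until the next run" approach. It is not the patch: LeasedQueue, scheduler_run() and example_job() are hypothetical names, the queue is a plain in-memory stand-in rather than the real queue API, and the current time is passed in so the behaviour is easy to follow.

```php
<?php
// Hypothetical in-memory stand-in for a leased queue; NOT the real queue API.
class LeasedQueue {
  private $items = array();

  public function add($data) {
    // expire = 0 means the item can be reserved immediately.
    $this->items[] = array('data' => $data, 'expire' => 0);
  }

  // Returns the first item whose lease has run out and leases it again.
  public function reserve($lease_time, $now) {
    foreach ($this->items as $key => $item) {
      if ($item['expire'] <= $now) {
        $this->items[$key]['expire'] = $now + $lease_time;
        return $item['data'];
      }
    }
    return FALSE;
  }
}

// One queue per period; the period (in seconds) is encoded in the queue name.
function scheduler_run(array $queues, $now) {
  $called = array();
  foreach ($queues as $name => $queue) {
    $period = (int) substr($name, strlen('scheduler_'));
    while (($callback = $queue->reserve($period, $now)) !== FALSE) {
      // The item is never deleted, so it can be reserved again once the
      // lease has expired.
      $called[] = call_user_func($callback);
    }
  }
  return $called;
}

function example_job() {
  return 'example_job ran';
}

$queues = array('scheduler_3600' => new LeasedQueue());
$queues['scheduler_3600']->add('example_job');

$first = scheduler_run($queues, 1000);   // Runs as soon as possible.
$second = scheduler_run($queues, 2000);  // Still leased, nothing runs.
$third = scheduler_run($queues, 4600);   // Lease from the first run is over.
```

Note that the first call happens immediately and only later calls are spaced by the period, which is the limitation discussed further down in this issue.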

If this approach proves popular we could mature it and move Drupal cron jobs to the scheduler.

Status: Needs review » Needs work

The last submitted patch failed testing.

chx’s picture

  if (!in_array($minimum_period, system_scheduler_get_schedules())) {
    return FALSE;
  }

We never do this. It's documented; use another minimum_period at your own peril. Other than that I like this. One small tidbit: the idea of encoding the time in the queue name is mine.

alex_b’s picture

"The idea of encoding the time in the queue name is mine."

Credit where credit is due, of course. Sorry for the omission.

drewish’s picture

subscribing.

kbahey’s picture

Subscribing.

chx’s picture

Instead of getting all items you really should use a time limit. It's not a good idea at all to try to dequeue all.

moshe weitzman’s picture

Hmmm. Looks different from the famously useful job_queue module. Alex - take a look there if you haven't already.

drumm’s picture

There needs to be a really simple way to simply add a function call with arguments to the queue. I don't care which queue or priority, just add it to the pile. No scheduling, recurrence, or anything fancy. Just function name and argument array. It gets called once when it reaches the top of the queue.
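
A rough sketch of what such a minimal "function name plus argument array" pile could look like. The helper names simple_queue_add() and simple_queue_run() are made up for illustration, and a plain PHP array stands in for whatever queue backend would actually hold the jobs.

```php
<?php
// Hypothetical helpers; a plain array stands in for the queue backend.
function simple_queue_add(array &$queue, $function, array $arguments = array()) {
  $queue[] = array('function' => $function, 'arguments' => $arguments);
}

// Calls every queued function exactly once, oldest first, and empties the pile.
function simple_queue_run(array &$queue) {
  $results = array();
  while ($job = array_shift($queue)) {
    $results[] = call_user_func_array($job['function'], $job['arguments']);
  }
  return $results;
}

$queue = array();
simple_queue_add($queue, 'str_repeat', array('ab', 2));
simple_queue_add($queue, 'max', array(3, 7));
$results = simple_queue_run($queue);
```

There is no scheduling, priority or recurrence here: each job is called once when it reaches the top and is then gone.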

alex_b’s picture

FileSize
3.35 KB

I did another iteration over this patch. This is still in a proof of concept stage and not tested. My main goal is still to play out how #391340: Job queue API could be used for building a job scheduler.

First my responses to the comments above:

#3 @chx - for this particular implementation, this test for existing schedules is essential: otherwise API users who add a callback to an invalid schedule won't be able to remove the callback without going around the API (see system_scheduler_remove).
#8 @moshe: I did have a look, this approach is so different because it's based on #391340: Job queue API. I am not entirely familiar with job queue though, that's why I am glad about any input from people who are.

This patch adds timeouts on cron (#7), one-off tasks (#9) and removal of items. There are three limitations with the proposed functionality that I'd like to point out:

1) Does not support scheduling tasks at a certain time of the day.
2) The first invocation of $callback won't be timed but will occur as soon as possible. This may be a problem for certain use cases, especially if (1) is desired. Timing the first invocation could be easily achieved by adding a $process_time (expire) parameter to queueQueue::add() - @chx, would you be open to that?
3) Removing an item from the job scheduler system_scheduler_remove($item) requires passing in the return value of system_scheduler_add($callback) - this limitation is due to the fact that there is no other way of removal supported by queueQueue class. In my mind, system_scheduler_remove($callback) would be ideal. This problem could be solved by either adding an external id field to the queueQueue class that we would use for storing the callback (I know that chx is not in love with this idea) or by creating our own drupalQueue implementation.

Creating our own drupalQueue implementation for the job scheduler could also take care of 1). I'm more and more leaning towards this approach. However, this patch still builds fully on queueQueue class.

chx’s picture

while ($item = queue_reserve('scheduler_' . $time, $time)) {
while (($item = queue_reserve('scheduler_' . $time, $time)) && time() < $when_we_should_finish)

if (!in_array($minimum_period, system_scheduler_get_schedules()))

$schedules = system_scheduler_get_schedules();
$n = count($schedules);
for ($i = 0; $i < $n - 2; $i++) {
  if ($schedules[$i] <= $minimum_period && $minimum_period < $schedules[$i + 1]) {
    break;
  }
}

andrewlevine’s picture

subscribing

dww’s picture

FYI: system.queue.inc has changed *a lot* since #10 was rolled... we're in the very final stages of tweaking before RTBC and commit over at #391340: Job queue API. Some things in #10 are definitely not possible any more (not sure they ever were, in fact).

A big example is queue_add() (which is now $queue->createItem($data)) does not return the $item, it just returns a bool if the item was successfully added to the queue or not. I have some misgivings about this aspect of the job queue API, but b/c job queue is pluggable, many possible backends are going to have a very hard time returning the $item in a reasonable way. So, system_scheduler_add() isn't going to be able to return the job queue $item -- at best it could return the $data (what #10 calls $item -- this part of the job queue API used to be a complete mess of confusion -- it's a lot cleaner now).

Also, re: a $lease_time on createItem() -- I don't see that happening. That really makes no sense from the point of view of the job queue API itself. The only timing the queue cares about is the lease for claiming items. The existing queue API has no notion of scheduling tasks (which chx punts to this issue), so there's no concept of delayed claim, submitting tasks "on hold" to be "released" later once they should become available for claim, etc. chx doesn't want to complicate #391340 with stuff like this -- he points me here, but then we can't actually implement that stuff here without API changes over there...

Basically, this issue brings up a lot of the other problems of trying to use the pluggable job queue API for any kind of actual scheduling system. :( Job queue API is great for what it's currently designed for: a pluggable fill/drain queue that's optimized for performance and throughput. But, it's pretty useless for a scheduling system, to be honest. I suspect that to build the kind of flexible scheduling system we want for things like the cron tasks and the other things discussed in this issue, we're probably going to need to do one of the following:

  • Implement our own queue backend and extend the interface for our needs -- at which point we lose the benefit of the pluggability. Not so clear what we gain from sticking with job queue API in this case, really.
  • Implement the scheduler class as its own pluggable interface with a richer API than job queue which will be easier to use for all of this, and harder to implement a backend for if you're using a distributed queue, etc... basically, give up trying to depend on job queue API at all for job scheduler API.

Neither of those sits very well with me. But, I clearly see the value in having a simple fill/drain queue that's not taking on all the complexity of a robust scheduling system. To handle a more complicated thing (scheduling) we need to make more specific assumptions than we need for a generic "pile of tasks" system. Carrying around the assumptions we need for scheduling inside every job queue in core would be a mistake, and make it much less possible to plug in other job queue backends.

chx’s picture

Derek, huh? We do claim an item for a time and although it's a bit of abuse you definitely can make it so that the task repeats after the lease expires. That sounds enough for cron.

neclimdul’s picture

Got some ideas for this. Interested in the next step :)

dww’s picture

@chx: but you can't say "never let this item become available for claim until time X". So, the first time a task gets claimed is ASAP. Thereafter, you can play tricks with leases. It is an interesting approach for (ab)using the leases, but it's definitely a hack. It also depends on promptly resetting items when their lease expires. Currently SystemQueue doesn't do a great job of that, since it uses REQUEST_TIME not time() and only tries to reclaim tasks when hook_cron() runs, not on-demand while trying to claim items. More importantly, without a LOT of churning, I don't see how you can use this system to semi-reliably schedule things at specific times of the day. You have to specify the $lease_time to claim it, but once you've got the item, in general, you'll find that to best fit the schedule, you need to either shorten or extend the lease. job queue API has no methods for manipulating leases.

About the only way to do this given the existing API is to have a *lot* of different queues with most possible options for "how soon from now to start this task?" so that the job scheduler can independently specify "when to first run this" and the period for how often to repeat (if any). E.g. simple example: if it's currently 3pm, and a caller wants to add a task to the schedule that runs every 12 hours at 5(pm|am), you need to first stick it in the "2 hours from now" queue, and when it's done, move it to the "12 hours from now" queue. The granularity of your schedule is directly proportional to N queues you need to create. You might not even need tricks with the lease_time if you did it this way -- you just have to be smart about which queue to put any given item in, and be smart about which queue(s) to try to drain at any given time. However, it requires a lot of queues and a fair bit of complexity. Plus, the accuracy of your schedule will be dependent on the responsiveness of your queue backend -- queues that have higher latency will tend to distort the schedule...
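
To illustrate the arithmetic in the 3pm example, here is a small sketch. Everything in it is an assumption made for illustration: the helper names, the list of delay buckets, the 'scheduler_' naming, and the choice to round down to the largest bucket that does not exceed the wanted delay (so a task may start early rather than late). Times are treated as UTC to keep the numbers readable.

```php
<?php
// Hypothetical helper: seconds from $now until the clock next shows $hour:00
// within a repeating cycle of $period seconds (UTC for simplicity).
function seconds_until_first_run($now, $hour, $period) {
  $offset = ($hour * 3600 - $now) % $period;
  // PHP's % keeps the sign of the dividend, so shift negatives into range.
  return $offset <= 0 ? $offset + $period : $offset;
}

// Hypothetical helper: picks the largest delay bucket that does not exceed
// $delay, falling back to the smallest bucket.
function pick_delay_queue($delay, array $buckets) {
  sort($buckets);
  $chosen = $buckets[0];
  foreach ($buckets as $bucket) {
    if ($bucket <= $delay) {
      $chosen = $bucket;
    }
  }
  return 'scheduler_' . $chosen;
}

// Assumed buckets: 15 minutes, 1 hour, 2 hours, 6 hours, 12 hours.
$buckets = array(900, 3600, 7200, 21600, 43200);

// 3pm on 1 January 1970 UTC, chosen only to keep the arithmetic visible.
$now = 15 * 3600;

// The task should run every 12 hours at 5 (am and pm).
$delay = seconds_until_first_run($now, 5, 12 * 3600);
$first_queue = pick_delay_queue($delay, $buckets);
$repeat_queue = pick_delay_queue(12 * 3600, $buckets);
```

With these assumed buckets the task first goes into the two-hour queue and afterwards into the twelve-hour queue. A finer schedule needs more buckets, which is exactly the cost described above.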

chx’s picture

Title: Job scheduler » New cron framework based on queue API

Renamed the issue to better reflect what we want. If we can do a 'do foo every X minutes' that's enough. And if it's not precise, oh well.

chx’s picture

Title: New cron framework based on queue API » Job scheduler

Renaming back. Actually the cron framework might be a different issue and scheduling at specific points of time might be a valid issue. No one is forcing us to use the queue API. This issue started as one because I was harassed for a scheduler endlessly in the queue issue, but in the end we might not use the queue API but write our own.

neclimdul’s picture

Finally downloaded the patch. Going to try and give it an update this weekend.

hass’s picture

+

sun.core’s picture

Version: 7.x-dev » 8.x-dev

EvanDonovan’s picture

In #131536: Make cron watchdog more granular and informative, it was suggested in #55 to make the cron system pluggable and the issue was marked as "won't fix". Would this be a good issue to discuss that, or should a separate issue be opened, as per #18 of this issue?

alex_b’s picture

I am currently working on breaking out scheduling functionality of Feeds - Job Scheduler is the resulting API module. Related issue on Feeds: #908964: Break out job scheduler.

Job Scheduler may be a start for a scheduling API in core - if we convince ourselves that it is indeed a good thing to have one in core.

What I know for now is that it can sure use more polish - feedback appreciated.

sepla’s picture

Subscribing.

mitchell’s picture

Status: Needs work » Closed (duplicate)

I propose we move this to #1088048: Add ability to schedule actions, since that will be easier to follow and is specific to 8.x. The work here should be moved to that issue summary.