Currently the scheduler is last refresh time based. The next scheduled time is calculated at the time of querying the schedule with:

SELECT all sources WHERE last_refreshed_time + importer.update_period > NOW

The big advantage of this approach is that as soon as the importer's update period changes, the schedule for all sources changes. This is a very desirable behavior as it removes any lags in reaction to settings changes. Such lags are very often considerable sources of confusion.

However, this approach has a big disadvantage: per design it does not allow different speeds for sources of the same importer. The update period is defined on the importer level and used as a query parameter. More specifically, there are two features that we cannot implement with last refresh time based scheduling:

A) slower refresh times for feeds that are subscribed to pub sub services (#617054: PubSubHubbub support)
B) schedules such as "Retrieve at Monday, 12PM"

In order to enable features like A or B, we need to switch to a next scheduled time based approach: Every time a source is imported the next time a source should be imported is set. This approach allows us to query looking for the scheduled time:

SELECT all sources WHERE scheduled_time > NOW 

Thus we can set the scheduled time on a per source basis. This will allow us to ask a source for its next scheduled time every time it is updated.

Of course, using next scheduled time basedscheduling will make it hard to update all sources immediately when an importer's update period setting changes. I need to think more on how to build this smartest.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

alex_b’s picture

alex_b’s picture

Other depending issue: #662110: Pause single feed nodes

alex_b’s picture

Status: Active » Needs review
FileSize
11.32 KB

First shot, all tests passing.

alex_b’s picture

- Reschedule when changing import_period settings in the basic settings. All tests passing.

alex_b’s picture

- Fix rescheduling message - don't show if there's no subscription to be changed.

alex_b’s picture

Reroll after latest patches.

andrewlevine’s picture

subscribing to review

alex_b’s picture

All tests passing. There were actually no problems with the tests, I had forgotten to place simplepie.inc into the libraries/ directory, of course that would make tests fail :P

alex_b’s picture

Name the new field 'next_scheduled_time' instead of 'next_schedule_time' - rolls a little better and we still have a choice.

alex_b’s picture

While #9 is fine, I now realize that this approach won't help for these tasks:

#867892: PubSubHubbub: slow down import frequency of feeds that are subscribed to hub
#662110: Pause single feed nodes

(However, a next scheduled time would help for #644588: Schedule import not before a fixed date.).

What's needed for PubSub and pausing is merely an override of the scheduling period. That's something that can be accomplished without changing the last scheduled time/next scheduled time base of the scheduler.

I opened a separate issue to focus on overriding the schedule period only #867910: Override schedule period.

AntiNSA’s picture

subscribe

andrewlevine’s picture

I think this functionality is important and I like your approach. A few things:

  1. It seems odd that when changing from last refreshed based to schedule based that our main schedule API function would be based on the last executed time. Why don't we:
    • Allow schedule() to take an arbitrary time - That would let us say schedule this feed for Aug 1 at 12AM. By default (without the parameter) it would still schedule immediately. It seems to make logical sense because in the UPDATE query in schedule(), 'scheduled' is the only non-settable setting.
    • Make reschedule() take a next_scheduled_time instead of a period. We could also implement rescheduleRelative() or add another parameter to reschedule() which would have the current functionality
  2. Why isn't finished() calling reschedule()?
  3. One more idea: getSchedulePeriod() and getScheduleCallbacks() are indeed awkward. I don't see why FeedsScheduler.inc should have any concept of a period. Maybe we should localize the concept of the period and the call of these functions to just FeedsScheduler::cron (which should be a static method) or just move cron() to the .module file.

Let me know what you think and we can work on a patch.

alex_b’s picture

Thank you for the review. This is really helpful.

1. I think there's a misunderstanding. schedule() is not actually scheduling a job for some future execution, it rather dispatches a job for ASAP execution. schedule() should probably more aptly be called dispatch(), unschedule() should be called resetDispatchFlag() or similar. I've opened a ticket for that #868610: Clean up dispatching in feeds_schedule
2. Same misunderstanding I think, see #868610
3. Agreed. getSchedulePeriod() and getScheduleCallbacks() are awkward. Mostly because this level of inspection of the users of the scheduler shouldn't be required. My best suggestion to do away with them so far is: #867910: Override schedule period

alex_b’s picture

Status: Needs review » Postponed

I am postponing this issue as I don't plan to move on it in the foreseeable future. #867910: Override schedule period will address the issues that led to this thread: pausing single feed nodes and slowing pubsub feeds. Scheduling at certain dates has different requirements altogether. Either way, making the scheduler next scheduled time based doesn't help us as expected (see #10)

alex_b’s picture

Status: Postponed » Closed (fixed)

This is basically implemented now with #908964: Break out job scheduler