Currently the scheduler is last refresh time based. The next scheduled time is calculated at the time of querying the schedule with:
SELECT all sources WHERE last_refreshed_time + importer.update_period > NOW
The big advantage of this approach is that as soon as the importer's update period changes, the schedule for all sources changes. This is a very desirable behavior as it removes any lags in reaction to settings changes. Such lags are very often considerable sources of confusion.
However, this approach has a big disadvantage: per design it does not allow different speeds for sources of the same importer. The update period is defined on the importer level and used as a query parameter. More specifically, there are two features that we cannot implement with last refresh time based scheduling:
A) slower refresh times for feeds that are subscribed to pub sub services (#617054: PubSubHubbub support)
B) schedules such as "Retrieve at Monday, 12PM"
In order to enable features like A or B, we need to switch to a next scheduled time based approach: Every time a source is imported the next time a source should be imported is set. This approach allows us to query looking for the scheduled time:
SELECT all sources WHERE scheduled_time > NOW
Thus we can set the scheduled time on a per source basis. This will allow us to ask a source for its next scheduled time every time it is updated.
Of course, using next scheduled time basedscheduling will make it hard to update all sources immediately when an importer's update period setting changes. I need to think more on how to build this smartest.
Comment | File | Size | Author |
---|---|---|---|
#9 | 721428-9_next_schedule_time.patch | 7.2 KB | alex_b |
#6 | 721428-6_next_schedule_time.patch | 7.19 KB | alex_b |
#5 | 721428-5_next_schedule_time.patch | 7.25 KB | alex_b |
#4 | 721428-4_next_schedule_time.patch | 7.16 KB | alex_b |
#3 | 721428-3_next_schedule_time.patch | 11.32 KB | alex_b |
Comments
Comment #1
alex_b CreditAttribution: alex_b commentedOther depending issue: #644588: Schedule import not before a fixed date.
Comment #2
alex_b CreditAttribution: alex_b commentedOther depending issue: #662110: Pause single feed nodes
Comment #3
alex_b CreditAttribution: alex_b commentedFirst shot, all tests passing.
Comment #4
alex_b CreditAttribution: alex_b commented- Reschedule when changing import_period settings in the basic settings. All tests passing.
Comment #5
alex_b CreditAttribution: alex_b commented- Fix rescheduling message - don't show if there's no subscription to be changed.
Comment #6
alex_b CreditAttribution: alex_b commentedReroll after latest patches.
Comment #7
andrewlevine CreditAttribution: andrewlevine commentedsubscribing to review
Comment #8
alex_b CreditAttribution: alex_b commentedAll tests passing. There were actually no problems with the tests, I had forgotten to place simplepie.inc into the libraries/ directory, of course that would make tests fail :P
Comment #9
alex_b CreditAttribution: alex_b commentedName the new field 'next_scheduled_time' instead of 'next_schedule_time' - rolls a little better and we still have a choice.
Comment #10
alex_b CreditAttribution: alex_b commentedWhile #9 is fine, I now realize that this approach won't help for these tasks:
#867892: PubSubHubbub: slow down import frequency of feeds that are subscribed to hub
#662110: Pause single feed nodes
(However, a next scheduled time would help for #644588: Schedule import not before a fixed date.).
What's needed for PubSub and pausing is merely an override of the scheduling period. That's something that can be accomplished without changing the last scheduled time/next scheduled time base of the scheduler.
I opened a separate issue to focus on overriding the schedule period only #867910: Override schedule period.
Comment #11
AntiNSA CreditAttribution: AntiNSA commentedsubscribe
Comment #12
andrewlevine CreditAttribution: andrewlevine commentedI think this functionality is important and I like your approach. A few things:
Let me know what you think and we can work on a patch.
Comment #13
alex_b CreditAttribution: alex_b commentedThank you for the review. This is really helpful.
1. I think there's a misunderstanding. schedule() is not actually scheduling a job for some future execution, it rather dispatches a job for ASAP execution. schedule() should probably more aptly be called dispatch(), unschedule() should be called resetDispatchFlag() or similar. I've opened a ticket for that #868610: Clean up dispatching in feeds_schedule
2. Same misunderstanding I think, see #868610
3. Agreed. getSchedulePeriod() and getScheduleCallbacks() are awkward. Mostly because this level of inspection of the users of the scheduler shouldn't be required. My best suggestion to do away with them so far is: #867910: Override schedule period
Comment #14
alex_b CreditAttribution: alex_b commentedI am postponing this issue as I don't plan to move on it in the foreseeable future. #867910: Override schedule period will address the issues that led to this thread: pausing single feed nodes and slowing pubsub feeds. Scheduling at certain dates has different requirements altogether. Either way, making the scheduler next scheduled time based doesn't help us as expected (see #10)
Comment #15
alex_b CreditAttribution: alex_b commentedThis is basically implemented now with #908964: Break out job scheduler