Hi guys! Thanks for that great module!
I tried to find info on how to configure a scheduled import with the job_scheduler module, but haven't found anything useful, so I hope someone can help here.

What I have: Drupal 7.15, Feeds 7.x-2.0-alpha5, Job Scheduler 7.x-2.0-alpha3, and a configured Feeds importer (with periodic import set to 'none' and expire nodes set to 'never', though due to a Feeds issue I had to set this manually in the DB table "feeds_importer"). I have also configured the trigger scheduler (I think, but I may be wrong) with the crontab set to "0 23 * * *".

What I need: import the feed every day at 23:00, and have nodes that weren't updated from the CSV expire after 24 hours and 10 minutes.
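To make the requirement concrete, here is a small Python sketch (not Drupal or Feeds code) of the two time calculations involved: when a "0 23 * * *" crontab next fires, and whether a node counts as expired under a 24 h 10 min window. The function names are hypothetical, and the 10 minutes are presumably slack for the 23:00 import itself to finish.

```python
from datetime import datetime, timedelta

# The requested expiry window: 24 hours plus 10 minutes of slack.
EXPIRE_AFTER = timedelta(hours=24, minutes=10)


def next_daily_run(now: datetime, hour: int = 23, minute: int = 0) -> datetime:
    """Next run of a daily crontab like "0 23 * * *" strictly after `now`.

    Only handles the fixed minute/hour case; naive local times, no DST handling.
    """
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def is_expired(last_updated: datetime, now: datetime) -> bool:
    """True if a node last updated at `last_updated` is past the expiry window."""
    return now - last_updated > EXPIRE_AFTER
```

A node refreshed by every nightly run never reaches the window, while a node that drops out of the CSV becomes eligible for expiry shortly after the next nightly import that fails to refresh it.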

I can't figure out how to configure Job Scheduler and cron to do that.

Thanks in advance.


Comments

twistor

Title: Scheduled import » Schedule import to run at specific times

There are two ways you could accomplish this:

  1. Implement some custom cron hooks and schedule Elysia cron to run them at specified intervals.
  2. Implement a feature that allows Feed importers to use a cron configuration rather than an interval. I would gladly accept that patch.
ianthomas_uk

Category: Support request » Feature request

I'm working on a patch for option 2, but I'm a bit confused by the current logic.

At the moment, when you edit the frequency for an importer it updates the importer's internal config, but doesn't actually change the scheduled job (managed by the job_scheduler module). When cron runs it will trigger feeds_cron() and that will then update the scheduled job for it to be picked up in job_scheduler_cron().

Is this a legacy from before Feeds used job_scheduler, or is there a need for this two-step process? Presumably it's just luck that these hooks even run in the correct order (because f for feeds comes earlier in the alphabet than j for job_scheduler).
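As far as I know, Drupal 7 orders hook implementations by module weight and then by module name (both stored in the {system} table), so what matters here is the module names feeds and job_scheduler. The Python sketch below is only a simplified model of that ordering, not Drupal's code; `Module` and `hook_invocation_order` are hypothetical names.

```python
from typing import List, NamedTuple


class Module(NamedTuple):
    name: str
    weight: int = 0


def hook_invocation_order(modules: List[Module]) -> List[str]:
    """Simplified model: implementations run sorted by weight, then by name."""
    return [m.name for m in sorted(modules, key=lambda m: (m.weight, m.name))]
```

With equal weights, feeds_cron() runs before job_scheduler_cron(), which is what the two-step process relies on. Giving job_scheduler a lower weight would flip the order, and the rescheduled job would presumably only be picked up on the following cron run.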

ianthomas_uk

OK, I've answered my own question, and eugh! Feeds is re-scheduling its tasks in FeedsSource::scheduleImport() when a batch isn't complete. There must be a better way.

Also, progressImporting() is terribly named: it just updates the totals and doesn't do any importing of its own. Calling it updateImportingProgress() would work, as that avoids using "progress" as a verb.

ianthomas_uk

Status: Active » Needs review
File size: 3.62 KB

This didn't need as much changing as I feared, because job_scheduler already supports a crontab option, so I just needed to add a new field to the configuration and recognise it in scheduleImport().

I've moved the $job['period'] assignment into the elseif because it is not currently needed by the other two cases, and it avoids a potential future bug where the queue depends on period being set (it won't always be set now, if the importer uses a crontab).
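The patch itself is PHP; the following is only a Python model of the branching described above, not the patch's code. The key names `type`, `id`, `period`, `periodic` and `crontab` mirror the Job Scheduler job array as discussed in this thread, while `build_job`, `importer_id` and `source_id` are hypothetical, and the "nothing to schedule" branch is an assumption standing in for the other cases.

```python
from typing import Optional


def build_job(importer_id: str, source_id: int,
              crontab: Optional[str] = None,
              period: Optional[int] = None) -> Optional[dict]:
    """Hypothetical model of the branching described for scheduleImport()."""
    job = {"type": importer_id, "id": source_id, "periodic": True}
    if crontab:
        # A crontab takes priority; the scheduler works out run times from it,
        # so 'period' is deliberately left unset.
        job["crontab"] = crontab
    elif period is not None:
        # Interval scheduling is the only branch that needs 'period'.
        job["period"] = period
    else:
        # Nothing to schedule (assumed), e.g. periodic import set to 'none'.
        return None
    return job
```

Anything that consumes the job therefore has to cope with `period` being absent, which is the future bug the reordering is meant to avoid.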

pollegie

Works like a charm!! Should be in a release.

Status: Needs review » Needs work

The last submitted patch, 4: feeds-1795262-scheduled-imports-4.patch, failed testing.

joelpittet

Status: Needs work » Needs review
File size: 3.67 KB

Re-rolled.

jedsaet

#8 works well against dev

Ravenight

#8 worked for me as well. Let's get this committed.

ianthomas_uk

If you believe this is ready for commit, please set to Reviewed & tested by the community. See the definition at https://www.drupal.org/node/156119 if you're unsure. I shouldn't, as I wrote the original patch.

stlnyc

Can we get the patch at #8 added to a release? It has worked well for us and I just noticed that it's not yet a part of the latest 7.x-2.0-alpha9 or 7.x-2.x-dev releases.

ianthomas_uk

stlnyc: See my comment above yours. We're a volunteer community, so we need a volunteer to review and test the patch. Is that something you or a developer you work with could do?

joelpittet

Sorry, I'd RTBC it if I had a project that needed it. I just re-rolled it because I saw it needed a re-roll.

@stlnyc or @Ravenight or @jedsaet or @pollegie if any one of you currently use this patch in production or have tested it out, please feel free to set the Status to RTBC as @ianthomas_uk mentioned.

twistor

Issue tags: +Needs tests

Scheduling is a bit of a hairball; this will need tests before we introduce even more scheduling bugs.

pollegie

Status: Needs review » Reviewed & tested by the community
Issue tags: -Needs tests
ianthomas_uk

Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs tests

Thanks for the review, but the project maintainer has said this will need a test first (meaning an automated test as part of the patch).

jackbravo

What already existing feeds test would be a good starting point to create a test for this patch?

tests/feeds_scheduler.test ?

MegaChriz

@jackbravo
The FeedsSchedulerTestCase class could indeed be a good place to add a new test method for this feature. I would not recommend copying a whole test method of that class to use as a starting point, though. When writing a test, first think of the steps needed to accomplish a certain task (before looking at the code), and then search other tests for parts that execute these steps. Otherwise you may end up executing things in your test that are completely irrelevant to what you want to test.

PQ

With the patch in #8 in place, is there any way to make the import not repeat? What I'm trying to achieve is this: when a data file is uploaded, it is processed on cron overnight and then not run again until a new file is uploaded.

I can see that the job gets set with 'periodic' => TRUE in any case, but I'm not sure whether, with the right configuration, Feeds can kill the job once it has run.

Thanks

stlnyc

@PQ
You might want to check out the Feeds Rules integration (https://www.drupal.org/project/feeds_rules) and set up a Rule to check for a new data file "Before importing feed".

manuvelasco

Thanks everyone!

steveoriol

Here is the patch for 7.x-2.0-beta4

MegaChriz

@steveoriol
Do you want to write a test for this feature?

steveoriol

@MegaChriz
I can't; I haven't written a test yet, and I don't have time right now...

sano

Attached is a re-roll of patch #24 for 7.x-2.0-beta4+18-dev.

MegaChriz

Note that there is also a module for running feeds at a specific time: Feeds Ultimate cron (depends on Ultimate cron). Perhaps that module fulfills this feature request too and then no changes in Feeds are needed.

grahamvalue

Just an idea.
Not sure if this helps.

Right now, when a periodic import is configured, the next run is scheduled relative to the time the last import completed. A simple workaround for timing imports would be to schedule the next import relative to the time the last import started.

serenitystocks.com has a weekly import that takes about 5 hours to complete. The problem is that right now, when the import runs for the second time, it starts a week plus 5 hours after the first one started.

Having every periodic import start at the same time would solve the problem, without having to schedule a specific time for the import. We can just time the first import instead.
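A minimal Python sketch of the drift described above, assuming each import starts exactly when it is due (cron latency is ignored) and always takes the same time; `start_times` is a hypothetical helper, not Feeds code.

```python
from datetime import datetime, timedelta
from typing import List


def start_times(first_start: datetime, period: timedelta, duration: timedelta,
                runs: int, base: str = "end") -> List[datetime]:
    """Start times of successive imports, assuming each starts exactly when due.

    base="end":   next start = previous completion + period (current behaviour)
    base="start": next start = previous start + period (proposed)
    """
    starts = [first_start]
    for _ in range(runs - 1):
        previous = starts[-1]
        if base == "end":
            starts.append(previous + duration + period)
        elif base == "start":
            starts.append(previous + period)
        else:
            raise ValueError(f"unknown base: {base}")
    return starts
```

With a weekly period and a 5-hour import, the completion-based schedule slips 5 hours every week, while the start-based one stays fixed.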

Thanks for a great module!

MegaChriz

@serenity1
It sounds like a great idea to base the next import time on the previous import's start time instead of its end time. Or, more specifically, on when the previous import was supposed to start. If you pick the exact import start time, the imports may still run fewer times than expected. Let me illustrate that with an example: say you have configured the importer to run every hour and you run cron every 15 minutes. Let's say the first import is scheduled for 14:00. Cron runs at 14:00, but it takes a few seconds before it is Feeds' turn, so the actual import starts at 14:00:19. As a result, the next import is scheduled for 15:00:19, not 15:00:00. Say that on the next cron run there is less for cron to do, so Feeds' turn starts at 15:00:16. In this case, no import happens because it isn't 15:00:19 yet. So the next import will happen at 15:15:xx instead.
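This example can be reproduced with a small Python simulation under simplifying assumptions: cron fires exactly every 15 minutes, Feeds' turn comes a few seconds later (the delays are made-up values chosen to match the example), and the import itself takes no time. `simulate` is a hypothetical helper, not Feeds code.

```python
from datetime import datetime, timedelta
from typing import List


def simulate(first_due: datetime, period: timedelta, cron_start: datetime,
             cron_interval: timedelta, delays: List[timedelta],
             base: str) -> List[datetime]:
    """Return the actual start times of imports over len(delays) cron runs.

    delays[k] is how long after the k-th cron run Feeds gets its turn.
    base="actual":    next due = actual start + period
    base="scheduled": next due = previous due time + period
    """
    due = first_due
    starts = []
    for k, delay in enumerate(delays):
        turn = cron_start + k * cron_interval + delay
        if turn >= due:
            starts.append(turn)
            if base == "actual":
                due = turn + period
            elif base == "scheduled":
                due = due + period
            else:
                raise ValueError(f"unknown base: {base}")
    return starts
```

With the actual start as the base, the 15:00 slot is missed and the import slips to 15:15; basing the next due time on the scheduled time keeps it at 15:00.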

There is, however, an issue with using the previous import time as the base (whether that is the actual time or the scheduled one). If you, for example, schedule the import to happen every hour, but the import (sometimes) takes longer than an hour to complete, Feeds may never stop importing, because the next calculated import time may always be in the past. Example: the import is configured to run every hour. It starts at 14:00:xx but completes at 15:15:xx. Result: the next import is scheduled for 15:00. At 15:30:xx the import starts again and completes at 16:45, and then the next one is scheduled for 16:00, 16:30 or 16:30:xx depending on how this feature would be implemented. The consequence in this scenario is that it will still be unpredictable when exactly the next import runs.
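A rough Python model of this overrun case, assuming the next due time is the previous due time plus the period, and that a new import can only start on the first cron tick at or after the previous one finished. The exact start times differ slightly from the example, but the pattern is the same; `simulate_overrun` is a hypothetical helper, not Feeds code.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def simulate_overrun(first_due: datetime, period: timedelta,
                     cron_interval: timedelta, duration: timedelta,
                     runs: int) -> Tuple[List[datetime], List[datetime]]:
    """Start and due times when each import takes `duration` to complete.

    Next due = previous due + period; an import can only start on the first
    cron tick at or after the previous import has finished.
    """
    due = first_due
    tick = first_due  # assume a cron tick coincides with the first due time
    starts, dues = [], []
    while len(starts) < runs:
        if tick >= due:
            starts.append(tick)
            dues.append(due)
            due += period
            finished = tick + duration
            # Round up to the next cron tick (ceiling division on timedeltas).
            ticks_needed = -((first_due - finished) // cron_interval)
            tick = first_due + ticks_needed * cron_interval
        else:
            tick += cron_interval
    return starts, dues
```

With 75-minute imports on an hourly schedule, the gap between when an import is due and when it actually starts grows by 15 minutes every run, and the due time is always already in the past when the next cron run checks it.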

I'm not sure if the described scenario is a real problem or just a configuration issue. We could say that if the periodic interval is lower than the import time, you basically messed up the predictability of your periodic imports.

Also note that the predictability of periodic imports can be messed up if you have configured multiple feed importers. If one importer is scheduled to run every hour and takes only a few minutes to complete, the timing of its imports can still be thrown off if an import for another importer suddenly takes longer than an hour to complete. The next import for importer 1 simply has to wait until the import for importer 2 has completed. It might, for example, take place a quarter of an hour later than it was supposed to, shifting the subsequent import times from xx:00 to xx:15.

Well, those are my thoughts on it. What do you think, @serenity1?

grahamvalue

Great points, MegaChriz!

1. Definitely an interesting problem.

The solution I can think of is to save the start time as the time the import form was submitted, not the time it actually runs the first time.

So if the form was submitted at 12:28 and the first cron runs at 12:30 and the import starts at 12:31, the import still gets picked up by the 12:30 cron run the next day / week.

2. Interesting again.

Perhaps at the end of the import, when setting the time for the next cron run, there needs to be a check that the time being set is not in the past.

So the next start time is set as the next multiple of the time period since the previous start.

So if an import is configured to run daily and it runs for 25 hours, the next start is set to 48 hours after the previous start.

3. Definitely a complex scenario.

I'm assuming here that Feeds doesn't support simultaneous imports.
If it did, #2 would also have required a lock on each import, similar to a singleton pattern.

I would leave this to the user to figure out as this is a complex scenario and I don't think it can be solved by code without supporting simultaneous feeds.

But I think #1 and #2 together should solve the problem of predictably timing a regular single import.
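Points #1 and #2 can be combined into one calculation: keep a fixed anchor (the form submission time from #1) and, after each import, pick the first multiple of the period after the anchor that is still in the future. A minimal Python sketch, assuming naive local times; `next_start` is a hypothetical helper rather than anything in Feeds or Job Scheduler.

```python
from datetime import datetime, timedelta


def next_start(anchor: datetime, period: timedelta, now: datetime) -> datetime:
    """Smallest anchor + k * period (k >= 1) that lies strictly after `now`.

    `anchor` is a fixed reference time, e.g. when the import form was submitted.
    """
    elapsed_periods = (now - anchor) // period  # whole periods since the anchor
    k = max(1, elapsed_periods + 1)
    return anchor + k * period
```

With the form submitted at 12:28 and a daily period, an import that starts at 12:31 and finishes shortly after gets its next start at 12:28 the following day, so the 12:30 cron run picks it up again. An import that runs for 25 hours skips one slot and lands 48 hours after the anchor.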

Please let me know what you think.

Thanks a lot!