Hi guys! Thanks for that great module!
I tried to find info on how to configure a scheduled import with the job_scheduler module, but haven't found anything useful, so I hope someone can help here.
What I have: Drupal 7.15, Feeds 7.x-2.0-alpha5, Job Scheduler 7.x-2.0-alpha3, a configured Feeds importer (with periodic import set to 'none' and expire nodes set to 'never', though due to a Feeds issue I had to set it manually in the DB table "feeds_importer"), and a trigger scheduler that I think is configured correctly (though it may not be) with the crontab set to "0 23 * * *".
What I need: import the feed every day at 23:00, and have nodes that weren't updated from the CSV expire after 24 hours and 10 minutes.
I can't understand how to configure job scheduler and cron for that.
Thx in advance.
Comment | File | Size | Author |
---|---|---|---|
#27 | schedule_import_to_run-1795262-27.patch | 3.54 KB | sano |
#24 | schedule_import_to_run-1795262-24.patch | 3.57 KB | steveoriol |
#8 | schedule_import_to_run-1795262-8.patch | 3.67 KB | joelpittet |
Comments
Comment #1
twistor commented:
There are two ways you could accomplish this
Comment #2
ianthomas_uk commented:
I'm working on a patch for option 2, but I'm a bit confused by the current logic.
At the moment, when you edit the frequency for an importer it updates the importer's internal config, but doesn't actually change the scheduled job (managed by the job_scheduler module). When cron runs it will trigger feeds_cron() and that will then update the scheduled job for it to be picked up in job_scheduler_cron().
Is this legacy from before feeds used job_scheduler, or is there a need for this two-step process? Presumably it's just luck that these hooks even run in the correct order (because c for cron is earlier in the alphabet than j for job_scheduler).
Comment #3
ianthomas_uk commented:
OK, I've answered my own question, and eugh! Feeds is re-scheduling its tasks in FeedsSource::scheduleImport() when a batch isn't complete. There must be a better way.
Also, progressImporting() is terribly named: it just updates the totals, it doesn't actually do any importing of its own. Calling it updateImportingProgress() would work, whereas using "progress" as a verb does not.
Comment #4
ianthomas_uk commented:
This didn't need as much changing as I feared, because job_scheduler already supports a crontab option, so I just needed to add a new field to the configuration and recognise it in scheduleImport().
I've moved the $job['period'] assignment into the elseif because it is not currently needed by the other two cases, and it avoids a potential future bug where the queue depends on period being set (it won't always be set now, if the importer uses a crontab).
Comment #5
pollegie commented:
Works like a charm! Should be in a release.
Comment #8
joelpittet commented:
Re-rolled.
Comment #9
jedsaet commented:
#8 works well against dev.
Comment #10
Ravenight commented:
#8 worked for me as well. Let's get this committed.
Comment #11
ianthomas_uk commented:
If you believe this is ready for commit, please set to Reviewed & tested by the community. See the definition at https://www.drupal.org/node/156119 if you're unsure. I shouldn't, as I wrote the original patch.
Comment #12
stlnyc commented:
Can we get the patch at #8 added to a release? It has worked well for us and I just noticed that it's not yet a part of the latest 7.x-2.0-alpha9 or 7.x-2.x-dev releases.
Comment #13
ianthomas_uk commented:
stlnyc: See my comment above yours. We're a volunteer community, so we need a volunteer to review and test the patch. Is that something you or a developer you work with could do?
Comment #14
joelpittet commented:
Sorry, I'd RTBC it if I had a project that needed it. I just re-rolled because I saw it needed it.
@stlnyc or @Ravenight or @jedsaet or @pollegie, if any of you currently use this patch in production or have tested it out, please feel free to set the Status to RTBC as @ianthomas_uk mentioned.
Comment #15
twistor (as a volunteer) commented:
Scheduling is a bit of a hairball; this will need tests before we introduce even more scheduling bugs.
Comment #16
pollegie commented
Comment #17
ianthomas_uk commented:
Thanks for the review, but the project maintainer has said this will need a test first (meaning an automated test as part of the patch).
Comment #19
jackbravo commented:
Which existing Feeds test would be a good starting point for creating a test for this patch?
tests/feeds_scheduler.test?
Comment #20
MegaChriz (as a volunteer) commented:
@jackbravo
The FeedsSchedulerTestCase class could indeed be a good place to add a new test method for this feature. I would not recommend, though, copying a whole test method of that class to use as a starting point. When writing a test, first think of the steps that are needed to accomplish a certain task (thus before looking at the code) and then search for parts in other tests that execute these steps. Otherwise you may end up executing things in your test that are completely irrelevant to what you want to test.
Comment #21
PQ commented:
With the patch in #8 in place, is there any way to make the import not repeat? What I'm trying to achieve is a situation where, when a data file is uploaded, it will be processed on cron overnight and then not run again until a new file is uploaded.
I can see that the job gets set with
'periodic' => TRUE
in any case, but I'm not sure whether, with the right configuration, Feeds can kill the job once it has run.
Thanks
Comment #22
stlnyc commented:
@PQ
You might want to check out the Feeds Rules integration (https://www.drupal.org/project/feeds_rules) and set up a Rule to check for a new data file "Before importing feed".
Comment #23
manuvelasco (as a volunteer) commented:
Thanks everyone!
Comment #24
steveoriol commented:
Here is the patch for 7.x-2.0-beta4.
Comment #25
MegaChriz (as a volunteer) commented:
@steveoriol
Do you want to write a test for this feature?
Comment #26
steveoriol commented:
@MegaChriz
I cannot; I have not written tests before and I do not have time right now ...
Comment #27
sano (as a volunteer) commented:
Attached is a re-roll of the patch in #24 for 7.x-2.0-beta4+18-dev.
Comment #28
MegaChriz (as a volunteer) commented:
Note that there is also a module for running feeds at a specific time: Feeds Ultimate cron (depends on Ultimate cron). Perhaps that module fulfills this feature request too, in which case no changes in Feeds are needed.
Comment #29
grahamvalue commented:
Just an idea.
Not sure if this helps.
Right now, when a periodic import is configured, the next run is scheduled relative to the time the last import completed. A simple workaround for timing imports would be to schedule the next import relative to the time the last import started.
serenitystocks.com has a weekly import that takes about 5 hours to complete. The problem is that right now, when the import runs for the second time, it starts 5 hours later in the day than the first one did.
Having every periodic import start at the same time would solve the problem, without having to schedule a specific time for the import. We can just time the first import instead.
Thanks for a great module!
Comment #30
MegaChriz (as a volunteer) commented:
@serenity1
It sounds like a great idea to base the next import time on the previous import start time instead of the previous import end time. Or, more specifically, on when the previous import was supposed to start. If you pick the exact import start time, the imports may still run fewer times than expected. Let me illustrate that with an example: say you have configured the importer to run every hour and you run cron every 15 minutes. Let's say the first import is scheduled for 14:00. Cron runs at 14:00, but it takes a few seconds before it is Feeds' turn, so the actual import starts at 14:00:19. As a result, the next import would be scheduled for 15:00:19, not 15:00:00. Say that on the next cron run there is less for cron to do, so Feeds' turn starts at 15:00:16. In this case, no import would happen, as it isn't 15:00:19 yet. So the next import will happen at 15:15:xx instead.
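To make the drift concrete, here is a small sketch in Python (not Feeds code; the function names and the calendar date are made up for illustration) that compares the two ways of picking the next import time, using the times from the example above:

```python
from datetime import datetime, timedelta

PERIOD = timedelta(hours=1)  # importer configured to run every hour


def next_from_actual_start(actual_start, period=PERIOD):
    """Next import time based on when the import really started."""
    return actual_start + period


def next_from_scheduled_start(scheduled_start, period=PERIOD):
    """Next import time based on when the import was supposed to start."""
    return scheduled_start + period


def is_due(next_run, feeds_turn):
    """An import only runs if its scheduled time has been reached."""
    return feeds_turn >= next_run


# Arbitrary date; only the clock times come from the example.
scheduled = datetime(2024, 1, 1, 14, 0, 0)    # import scheduled for 14:00
actual = datetime(2024, 1, 1, 14, 0, 19)      # Feeds' turn came at 14:00:19
feeds_turn = datetime(2024, 1, 1, 15, 0, 16)  # next cron run reaches Feeds at 15:00:16

drifting = next_from_actual_start(actual)        # 15:00:19
anchored = next_from_scheduled_start(scheduled)  # 15:00:00

print(is_due(drifting, feeds_turn))  # False: import is skipped until 15:15:xx
print(is_due(anchored, feeds_turn))  # True: import runs on the 15:00 cron run
```

Basing the calculation on the supposed start keeps the schedule at xx:00, at least as long as each import finishes within the hour.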
There is, however, an issue with using the previous import time as the base (whether that is the actual one or the supposed one). If you, for example, schedule the import to happen every hour, but the import (sometimes) takes longer than an hour to complete, Feeds may never catch up, because the next import time that is calculated may always be in the past. Example: the import is configured to run every hour. The import starts at 14:00:xx, but completes at 15:15:xx. Result: the next import is scheduled for 15:00. At 15:30:xx the import starts again and completes at 16:45, and then the next one is scheduled for 16:00, 16:30 or 16:30:xx, depending on how this feature would be implemented. The consequence in this scenario is that it will still be unpredictable when exactly the next import runs.
I'm not sure if the described scenario is a real problem or just a configuration issue. We could say that if the periodic interval is lower than the import time, you basically messed up the predictability of your periodic imports.
Also note that the predictability of periodic imports can be messed up if you have configured multiple feed importers. If the import for importer 1 is scheduled to run every hour and takes only a few minutes to complete, its timing can still be thrown off if an import for importer 2 suddenly takes longer than an hour to complete. The next import for importer 1 simply has to wait until the import for importer 2 has been completed. The next import for importer 1 would, for example, take place a quarter of an hour later than it was supposed to, shifting the next import times from xx:00 to xx:15.
Well, those are my thoughts about it. What do you think, @serenity1?
Comment #31
grahamvalue commented:
Great points, MegaChriz!
1. Definitely an interesting problem.
The solution I can think of is to save the start time as the time the import form was submitted, not the time it actually runs the first time.
So if the form was submitted at 12:28 and the first cron runs at 12:30 and the import starts at 12:31, the import still gets picked up by the 12:30 cron run the next day / week.
2. Interesting again.
Perhaps at the end of the import, when setting the time for the next cron run, there needs to be a check that the time being set is not in the past.
So the next start time is set as the next multiple of the time period since the previous start.
So if an import is configured to run daily and it runs for 25 hours, the next start is set to 48 hours after the previous start.
3. Definitely a complex scenario.
Assuming here that Feeds doesn't support simultaneous imports.
If it did, #2 would also have required a lock on each import; similar to a singleton pattern.
I would leave this to the user to figure out as this is a complex scenario and I don't think it can be solved by code without supporting simultaneous feeds.
But I think #1 and #2 together should solve the problem of predictably timing a regular single import.
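A rough sketch of #1 and #2 combined, in Python for illustration only (this is not Feeds or Job Scheduler code; `next_start` and the dates are hypothetical): the schedule is anchored on a fixed start time, and the next start is the first multiple of the period after that anchor that is still in the future.

```python
from datetime import datetime, timedelta


def next_start(previous_start, period, now):
    """Return the first previous_start + k * period (k >= 1) that is later than now.

    Skipping whole periods keeps the time of day fixed and guarantees that
    the result is never in the past, even if the import overran its period.
    """
    elapsed = now - previous_start
    # timedelta // timedelta gives the number of whole periods that have passed.
    periods = max(1, elapsed // period + 1)
    return previous_start + periods * period


DAY = timedelta(days=1)

# Idea #1: anchor on the form submission time (12:28), not on the actual
# start of the first import (12:31).
anchor = datetime(2024, 1, 1, 12, 28)

# Normal case: the import finishes well within the day (finish time assumed).
finished = datetime(2024, 1, 1, 13, 0)
print(next_start(anchor, DAY, finished))  # 2024-01-02 12:28:00

# Idea #2: the import runs for 25 hours, so one daily slot is skipped and
# the next start lands 48 hours after the previous start.
overrun = anchor + timedelta(hours=25)
print(next_start(anchor, DAY, overrun))   # 2024-01-03 12:28:00
```

With the next start at 12:28 the following day, a cron run at 12:30 would pick the import up again, which matches the example in #1.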
Please let me know what you think.
Thanks a lot!