Closed (fixed)
Project:
Feeds
Version:
6.x-1.0-alpha11
Component:
Miscellaneous
Priority:
Normal
Category:
Support request
Assigned:
Unassigned
Reporter:
Created:
17 Feb 2010 at 20:30 UTC
Updated:
12 May 2010 at 17:10 UTC
Comments
Comment #1
alex_b commented
Use http://drupal.org/project/drupal_queue
Closed?
Comment #2
ivanbueno commented
Yes, thanks!
Comment #3
ivanbueno commented
Do I need to write a new FeedsScheduler that will add the feed to the queue table faster than cron() could?
Correct me if I'm wrong; here's how I understand Feeds + Drupal Queue work together:
* cron() adds feeds_queue to Drupal Queue
* Drupal Queue removes the feeds_queue item once "drush queue cron" is run.
If I want to refresh a particular Feed Importer every 10 seconds, how do I add it to Drupal Queue faster than the cron() hook could? Do I need to, or is there another way?
Comment #4
ivanbueno commented
I created some drush commands for the feeds module. (The file is coded for Drush 3.)
The commands available are:
drush feeds-config
* Displays all active importers or displays the config of a given importer (passed as arg).
drush feeds-refresh
* Refreshes a feed based on its schedule.
drush feeds-queue
* Adds a scheduled feed to the drupal_queue. (Needs to be run in conjunction with "drush queue cron".)
I will work on the shell script that will execute these drush commands.
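That wrapper script could be sketched as a small shell function that queues the importer and then drains the queue on a fixed interval. This is only a sketch: the importer name and the interval/iteration arguments are placeholders, and it assumes the `drush feeds-queue` and `drush queue cron` commands listed above.

```shell
# refresh_loop IMPORTER [INTERVAL] [ITERATIONS]
# Queues the given importer and drains the Drupal queue every INTERVAL
# seconds. ITERATIONS=0 (the default) loops forever.
refresh_loop() {
  importer="$1"
  interval="${2:-10}"
  iterations="${3:-0}"
  i=0
  while [ "$iterations" -eq 0 ] || [ "$i" -lt "$iterations" ]; do
    drush feeds-queue "$importer"   # add the scheduled feed to drupal_queue
    drush queue cron                # process the queued items right away
    sleep "$interval"
    i=$((i + 1))
  done
}
```

It could be run in the background, e.g. `refresh_loop my_importer 10 &` (again, `my_importer` is a placeholder), or wrapped in an init script.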
Comment #5
alex_b commented
Exactly.
10 seconds! Using dedicated drush commands (like you posted in #4) is probably the way to go. Out of curiosity, what are you importing 6 times in a minute?
Could you post #4 on #608408: Drush integration for Feeds?
Comment #6
ivanbueno commented
I'm fetching NewsML (an XML standard for multimedia news) files from a server via SSH. If a NewsML file pops up on the server, the feeds module has to fetch, parse, and process it RIGHT AWAY in the Drupal site (in less than 60 seconds).
For this, I had to create these plugins:
* SSHFetcher
* NewsMLParser
* NewsMLProcessor
As a side note, NewsML has a very different structure from RSS/syndication XML. That's why the FetcherResult object in Alpha9 suited me really well; it has no specified structure. With FeedsImportBatch, the only variable I'm using is $items. Its getRaw() does not quite work for me because I'm fetching it via SSH, not HTTP.
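For context, the raw transfer step that an SSH-based fetcher performs can be as simple as an scp pull of the NewsML files. A minimal sketch (the host, remote spool path, and local directory are placeholders, not taken from the actual SSHFetcher plugin):

```shell
# fetch_newsml HOST REMOTE_DIR LOCAL_DIR
# Pull NewsML files from a remote spool over SSH; the parser and
# processor plugins would take over from the local copies.
fetch_newsml() {
  host="$1"
  remote_dir="$2"
  local_dir="$3"
  scp "$host:$remote_dir"/*.xml "$local_dir"/
}
```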
Comment #7
alex_b commented
"Its getRaw() does not quite work for me because I'm fetching it via SSH not http."
Did you try to override it in your own fetcher then? Compare http://drupalcode.org/viewvc/drupal/contributions/modules/feeds/plugins/... to http://drupalcode.org/viewvc/drupal/contributions/modules/feeds/plugins/... ...
Comment #8
ivanbueno commented
The logic was inside fetch()... creating an SSHBatchImport is much better, and I can still ensure that all SSH connections happen only in the fetch phase.
Thanks for the tip!!
Comment #9
alex_b commented
I just had time to review the drush support in #4. While at first I asked you to post it over on the drush integration issue, I now realize that there may be some fundamental misunderstanding of how Feeds and Drupal Queue work together. I'd like to explore your use case a little better; that's why I am posting back here.
I see a lot of duplicated code and direct calls to feeds_scheduler_work(), which should be a strict callback invoked from drupal_queue_cron_run().
I wonder what made you duplicate specifically this functionality. What problems did you face that made you break out queuing and refreshing in this way?
Comment #10
ivanbueno commented
My use case calls for high availability, and the news feeds need to be parsed instantly. I had to create a feeds-refresh command outside of the drupal_queue to avoid cases where there are a lot of other jobs in the queue, which might slow down the news feed parsing for this particular importer.
If the queue has a mechanism that will push my high-priority job to the top, then I would use that. Right now, it's easier to create a dedicated process to handle the importing.
Additional question: when "drush queue cron" is run, does it fire off the callbacks sequentially or in parallel? How does it determine which to run first?
Comment #11
alex_b commented
Right, but do you actually need that availability for many feeds or just for very few? Because if you don't have many feeds, you don't need the queue.
"drush queue cron" fires off one process that pulls one item after the other from the queue. You have to dispatch multiple "drush queue cron" commands to run multiple processes.
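Dispatching several workers as described above can be sketched as follows. The worker count is arbitrary, and each `drush queue cron` invocation is assumed to be an independent process pulling from the shared queue:

```shell
# run_workers N: start N parallel "drush queue cron" worker processes
# and wait for all of them to finish draining the queue.
run_workers() {
  workers="${1:-3}"
  i=0
  while [ "$i" -lt "$workers" ]; do
    drush queue cron &   # one independent worker per invocation
    i=$((i + 1))
  done
  wait                   # block until every worker exits
}
```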
Comment #12
ivanbueno commented
It's not high volume. On average, about 20 items every 10 seconds. Plus, I'm capping the fetcher to only get 100 items per cycle.
What's the benchmark on how many feed items the Feeds module can fetch, parse, and process without timing out or breaking down? In my testing, the feeds module was able to process 250 feed items (24 KB each) in under 40 seconds. So I'm assuming 100 items per cycle is still safe.
Thanks for the drush queue explanation. So it is possible to run parallel executions.
Comment #13
alex_b commented
Closing this support request in favor of #608408: Drush integration for Feeds.
See #754626: Performance - max number of feeds and documentation for more info on performance.