This is very useful if you host your site on Rackspace Cloud instead of Amazon EC2, because (1) transfers will be faster and (2) you won't be charged for bandwidth between your server and Cloud Files.
Installation
Download the 'backup_migrate_cloudfiles' module and place it in your sites/all/modules directory.
Download the Cloud Files PHP API from Rackspace's account on GitHub:
This is a parser plugin for Feeds that uses the SimpleHTMLDOM library to extract elements from HTML documents. It can be used to build screen-scraping functionality with Feeds, and to automatically import items from websites that do not support RSS.
Installation
Download the module and extract it into your site's modules folder. You will also need the Feeds module and its dependencies. Enable the module.
Create a new Feed importer. You probably want to use the HTTP fetcher to download the web page. Change the feed configuration to use SimpleHTMLDOM as the parser, then configure the extractions you wish to make from the page (see docs). You then probably want to use the Node Mapper to map these items onto nodes/fields.
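To make the extraction step more concrete, here is a minimal conceptual sketch of what "extracting elements from a page" means. It is not the module's code and does not use SimpleHTMLDOM (which is a PHP library); it is a standalone Python illustration using only the standard library. The class name, the extract() helper, and the sample page are all hypothetical, and matching on a tag name plus a class is only a rough stand-in for the selector syntax the parser accepts.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element with a given tag name and class.

    Hypothetical helper for illustration only; not part of Feeds or SimpleHTMLDOM.
    """

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0       # greater than 0 while inside a matching element
        self.items = []      # one string per matched element
        self._buffer = []    # text fragments of the element being read

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Count nested tags of the same name so the right end tag closes the match.
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.items.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self._buffer.append(data)


def extract(html, tag, css_class):
    """Return the text content of each <tag class="css_class"> element."""
    parser = ClassTextExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.items


# Made-up sample page standing in for what the HTTP fetcher would download.
SAMPLE_PAGE = """
<ul>
  <li class="item"><a href="/a">First <b>story</b></a></li>
  <li class="item">Second story</li>
  <li class="other">Not an item</li>
</ul>
"""

titles = extract(SAMPLE_PAGE, "li", "item")
print(titles)
```

Each string returned here corresponds roughly to one extracted item that would then be mapped onto a node or field.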
Documentation
Full documentation doesn't exist yet, but there's a usage example on my website.
You can read more about the syntax used in the configuration on the SimpleHTMLDOM website.
Notes
These notes are compiled from support requests and may be useful when configuring parsers: