Sandbox page: http://drupal.org/sandbox/nkschaefer/1216400
The Advanced Aggregator module is designed to make Drupal's core Aggregator module (in version
7 and beyond) more useful, flexible, and scalable. In short, it makes Aggregator feeds "fieldable," adds additional Views integration for Aggregator feeds and items, checks and logs errors (bad HTTP status codes and parse errors) encountered when importing, adds helpful UI components to promote scalability, and provides an "upgrade path" for users who formerly depended on the Feeds module.
In Drupal 6.x, the core Aggregator module was not very flexible. For more flexibility, many users turned to the Feeds module, which treated both RSS feeds and their imported items as nodes. Feeds introduced the concept of pluggable fetchers, parsers, and processors, and allowed CCK fields to be attached to feed nodes and imported items. The many options provided flexibility, but created confusion for some users.
Changes made to the Aggregator module and Drupal core in version 7.x have made some Feeds module features unnecessary for many users. The concept of "fieldable entities" allows Aggregator feeds themselves to have attached fields, and keeping imported feed items out of the node table yields a performance gain for users with many feeds. Additionally, the simpler UI and smaller set of options make Aggregator easier to configure for many users.
Aggregator still lacks flexibility and scalability in a few key areas, though, so this project extends the Aggregator module to make it suitable for a wider variety of users.
The Advanced Aggregator module does the following:
- Exposes core Aggregator feeds as fieldable entities. Users can now add fields to Aggregator
feeds via a tab at the admin/config/services/aggregator page.
- Adds a Views relationship that allows users to access properties of parent Aggregator feeds
for views built on child Aggregator items. This means that each imported feed item can now
access fields added to its parent feed.
- For users with lots of feeds, makes the Aggregator feed administration page (admin/config/services/
aggregator) scalable by paginating it at 50 feeds per page. Additionally, checkboxes allow
users to delete multiple feeds at a time, and a search box lets users filter feeds by title.
- Provides a table that stores broken and/or redirected feed URLs. Users can choose to run a
mass check of all feed URLs on the site using the Batch API, which pings the URL of each
stored feed and records any problems in the log table. A view provided by this module presents
a report based on this table, and a form is also provided through which users can bulk-update
redirected URLs.
- Provides a utility for users who (like the module's author) formerly relied on the Feeds module
and have lots of feed nodes. The utility converts the feed nodes to core Aggregator feeds and
attempts to preserve any associated Field API data (if you're upgrading from Drupal 6, be sure
to upgrade all CCK data to Field API first). Once the conversion is done, you can safely
uninstall Feeds and drop its tables.
- Provides an alternative fetcher and parser via the Aggregator module's hooks. The parser
is designed to avoid a problem that can arise when importing feeds: title and author fields
can contain too much data and cause database errors when insertion is attempted. The parser
provided by this module (Advanced parser) lets the default parser run first, then truncates the
title and author fields, on word boundaries, to the maximum size the database columns allow.
The fetcher (Advanced fetcher) is designed to log errors to the provided error log. The default
fetcher simply calls drupal_set_message(), which is not useful for sites importing lots of feeds
on cron runs. The Advanced fetcher logs all bad (non-200-level) HTTP status codes to the error
log, along with redirected URLs (if a 300-level status code is encountered).
Additionally, the parser sometimes encounters errors when trying to parse a document as RSS XML.
If the parser encounters an error, it also stores the error in the log, where administrators
can view it later.
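The parent-feed Views relationship described above can be sketched with Drupal 7's Views API. This is an illustration only, not the module's actual code: the handler and labels are assumptions, though aggregator_item.fid and aggregator_feed.fid are core's real columns.

```
/**
 * Implements hook_views_data_alter().
 *
 * Sketch: expose aggregator_item.fid as a relationship to the parent row in
 * aggregator_feed, so views built on feed items can pull in any fields
 * attached to the feed itself.
 */
function advanced_aggregator_views_data_alter(&$data) {
  $data['aggregator_item']['fid']['relationship'] = array(
    'title' => t('Parent feed'),
    'help' => t('The aggregator feed this item was imported from.'),
    'base' => 'aggregator_feed',
    'base field' => 'fid',
    'handler' => 'views_handler_relationship',
    'label' => t('feed'),
  );
}
```

With this relationship added to an item-based view, fields attached to the parent feed become available as regular view fields.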
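The mass URL check could be structured roughly as follows with the Batch API. This is a sketch under assumptions: the function names, the advanced_aggregator_log table, and its columns are illustrative, not the module's actual schema.

```
/**
 * Builds and starts a batch that pings every stored feed URL.
 */
function advanced_aggregator_check_feeds() {
  $batch = array(
    'title' => t('Checking feed URLs'),
    'operations' => array(),
    'finished' => 'advanced_aggregator_check_finished', // assumed callback
  );
  $result = db_query('SELECT fid, url FROM {aggregator_feed}');
  foreach ($result as $feed) {
    $batch['operations'][] = array('advanced_aggregator_check_url', array($feed->fid, $feed->url));
  }
  batch_set($batch);
}

/**
 * Batch operation: ping one feed URL and record anything other than a 200.
 */
function advanced_aggregator_check_url($fid, $url, &$context) {
  $response = drupal_http_request($url);
  if ($response->code != 200) {
    db_insert('advanced_aggregator_log') // assumed log table name
      ->fields(array(
        'fid' => $fid,
        'code' => $response->code,
        'redirect_url' => isset($response->redirect_url) ? $response->redirect_url : '',
        'timestamp' => REQUEST_TIME,
      ))
      ->execute();
  }
}
```

drupal_http_request() follows redirects by default and sets $response->redirect_url when one occurs, which is what lets the bulk-update form offer the new location.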
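The word-safe truncation the Advanced parser performs can be sketched with core's own hooks. This is an illustration, not the module's actual implementation; the 255-character limit matches core's aggregator_item title and author columns.

```
/**
 * Implements hook_aggregator_parse().
 *
 * Sketch: let core's default parser do the XML work, then trim each item's
 * title and author to the column length on a word boundary using core's
 * truncate_utf8().
 */
function advanced_aggregator_aggregator_parse($feed) {
  module_load_include('inc', 'aggregator', 'aggregator.parser');
  if (aggregator_aggregator_parse($feed)) {
    foreach ($feed->items as &$item) {
      $item['title'] = truncate_utf8($item['title'], 255, TRUE);
      $item['author'] = truncate_utf8($item['author'], 255, TRUE);
    }
    return TRUE;
  }
  return FALSE;
}
```

A real implementation would also implement hook_aggregator_parse_info() so the parser appears as a choice on the Aggregator settings page.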
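The Advanced fetcher's logging behavior could look roughly like this. Again a sketch only: advanced_aggregator_log() is a hypothetical helper, and core's handling of 304 (not modified) responses and If-Modified-Since headers is omitted for brevity.

```
/**
 * Implements hook_aggregator_fetch().
 *
 * Sketch: download the feed as core's fetcher does, but route failures to a
 * log table instead of drupal_set_message().
 */
function advanced_aggregator_aggregator_fetch($feed) {
  $response = drupal_http_request($feed->url);
  if (isset($response->redirect_url)) {
    // A 300-level response was followed; record the new location.
    advanced_aggregator_log($feed->fid, $response->redirect_code, $response->redirect_url);
  }
  if ($response->code == 200) {
    $feed->source_string = $response->data;
    return TRUE;
  }
  // Any other status code is an error worth recording.
  advanced_aggregator_log($feed->fid, $response->code);
  return FALSE;
}
```

As with the parser, a real implementation would pair this with hook_aggregator_fetch_info() to register the fetcher in the Aggregator UI.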