The current parser has a lot of trouble with the different feed formats. The parser in the Feeds module was designed a long time ago, back when FeedAPI was created, and it is not very sleek either (I remember :) ). SimplePie (http://simplepie.org/) is a huge piece of code. My idea (at least if we keep aggregator in core :) ) is to rewrite the aggregator parser on top of SimpleXML + XPath.
The basic concept would be the following:

array(
  'title' => array(
    '/xpath/to/title/in/rss',
    '/xpath/to/title/in/atom',
  ),
  'description' => array(
    '/xpath/to/desc/in/rss',
    '/xpath/to/desc/in/atom',
  ),
);

and so on. The heart of the parser would be a nested array full of XPath expressions, so maintaining the parser would mostly mean adjusting the elements of that array. The parser itself then just has to iterate over the array, trying to fetch the items and their properties in order.
There are open problems, of course: namespaces have to be handled, and is SimpleXML's XPath support good enough to do everything?
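
To make the idea concrete, here is a minimal sketch of such a loop. It is not the attached patch; the function name `example_parse_feed()`, the variable names, the concrete XPath expressions and the `atom` prefix are assumptions for illustration only. It also shows one way to deal with namespaces: the Atom namespace is registered under a prefix on every node that gets queried, since SimpleXML does not carry registered prefixes over to the nodes returned by `xpath()`.

```php
<?php
// Hypothetical XPath maps. Expressions are tried in order and the first
// one that matches wins. The "atom" prefix is registered in the function.
$item_paths = array('//item', '//atom:entry');
$field_paths = array(
  'title' => array('title', 'atom:title'),
  'description' => array('description', 'atom:summary', 'atom:content'),
);

/**
 * Illustrative helper: parses a feed string using the XPath maps above.
 */
function example_parse_feed($xml_string, array $item_paths, array $field_paths) {
  $atom_ns = 'http://www.w3.org/2005/Atom';
  $xml = simplexml_load_string($xml_string);
  if ($xml === FALSE) {
    return array();
  }
  $xml->registerXPathNamespace('atom', $atom_ns);

  $items = array();
  foreach ($item_paths as $item_path) {
    $nodes = $xml->xpath($item_path);
    if (empty($nodes)) {
      // This expression matched nothing, try the next candidate.
      continue;
    }
    foreach ($nodes as $node) {
      // Prefixes have to be registered again on each node we query.
      $node->registerXPathNamespace('atom', $atom_ns);
      $item = array();
      foreach ($field_paths as $field => $paths) {
        $item[$field] = NULL;
        foreach ($paths as $path) {
          $match = $node->xpath($path);
          if (!empty($match)) {
            $item[$field] = trim((string) $match[0]);
            break;
          }
        }
      }
      $items[] = $item;
    }
    // The first item expression that matched decides the format.
    break;
  }
  return $items;
}
```

With this shape, supporting another format or another property would mean adding an entry to one of the two arrays, which is the maintenance model described above. Whether plain SimpleXML is enough for everything (images, for example) is still the open question.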

If aggregator is not removed from core, I'm happy to provide patches in the near future.

Comment | File | Size | Author
#2 | 1268232_xpath_parser-2.patch | 10.59 KB | aron novak

Comments

twistor commented:

SimplePie is a huge piece of code for a reason :). Why not determine the feed type beforehand and parse based on that? I definitely support rewriting the parser if we can't get something like SimplePie in core. SimplePie's license seems to rule that out, but there are alternatives.

aron novak commented:

Status: new
Size: 10.59 KB

My main goal is to implement something at least as good as what we currently have, but in a more elegant way.
Parts of the patch are not ironed out yet and can be simplified, and it also loses some functionality (images, for example).

aron novak commented:

If we have something like this, we could drop http://api.drupal.org/api/drupal/includes--unicode.inc/function/drupal_x... as well.

aron novak commented:

Status: Active » Needs work
twistor commented:

Status: Needs work » Closed (won't fix)

Closing this since we now have a bona fide RSS/Atom parser in core.