The current parser has a lot of trouble with the different feed formats. The Feeds module's parser was designed a long time ago, when FeedAPI was created, and it's not that sleek either (I remember :) ). SimplePie (http://simplepie.org/) is a huge piece of code. My idea (at least if we keep aggregator in core :) ) is to rewrite the aggregator parser on top of SimpleXML + XPath.
The basic concept would be the following:
array(
  'title' => array(
    '/xpath/to/title/in/rss',
    '/xpath/to/title/in/atom',
  ),
  'description' => array(
    '/xpath/to/desc/in/rss',
    '/xpath/to/desc/in/atom',
  ),
);
And so on. The heart of the parser would be a nested array full of XPath expressions, so maintaining the parser would mostly mean adjusting the elements of that array. The parser itself then just has to iterate over the array, trying each expression in order to fetch the items and their properties.
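To make the iteration idea concrete, here is a minimal sketch, not the code from any patch. The function name `example_xpath_parse()`, the `$item_xpath` / `$field_xpaths` variables and the sample feed are all made up for illustration. It only covers a namespace-less RSS 2.0 feed and simply takes the first non-empty match for each field.

```php
<?php
// Hypothetical map: field name => candidate XPath expressions, tried in order.
// Paths are relative to each item node.
$item_xpath = '/rss/channel/item';
$field_xpaths = array(
  'title' => array('title'),
  'description' => array('description', 'summary'),
  'link' => array('link', 'guid[@isPermaLink="true"]'),
);

/**
 * Returns one array per item, holding the first non-empty match per field.
 */
function example_xpath_parse($xml_string, $item_xpath, array $field_xpaths) {
  $items = array();
  $xml = simplexml_load_string($xml_string);
  if ($xml === FALSE) {
    // Not well-formed XML: nothing to parse.
    return $items;
  }
  $nodes = $xml->xpath($item_xpath);
  if (empty($nodes)) {
    return $items;
  }
  foreach ($nodes as $node) {
    $item = array();
    foreach ($field_xpaths as $field => $candidates) {
      $item[$field] = NULL;
      foreach ($candidates as $xpath) {
        $matches = $node->xpath($xpath);
        if (!empty($matches) && trim((string) $matches[0]) !== '') {
          $item[$field] = trim((string) $matches[0]);
          // First expression that yields a value wins.
          break;
        }
      }
    }
    $items[] = $item;
  }
  return $items;
}

// Sample feed, invented for this example.
$feed = <<<XML
<rss version="2.0"><channel><title>Example</title>
<item><title>First</title><description>Hello</description><link>http://example.com/1</link></item>
<item><title>Second</title><guid isPermaLink="true">http://example.com/2</guid></item>
</channel></rss>
XML;

$items = example_xpath_parse($feed, $item_xpath, $field_xpaths);
```

In this sketch the second item has no `link` element, so the parser falls through to the `guid` expression; supporting another feed quirk would mean adding one more string to the relevant array.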
There are open problems, of course: namespaces have to be handled, and is SimpleXML's XPath support good enough to do everything?
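The namespace question can be illustrated with a small sketch. The sample Atom document and the `atom` prefix are assumptions chosen for the example; `registerXPathNamespace()` and `xpath()` are the regular SimpleXML methods. As far as I know, the prefix has to be registered again on each element returned by `xpath()` before querying it, which the loop below does defensively.

```php
<?php
// Sample Atom feed, invented for this example.
$atom = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First entry</title><summary>Hello</summary></entry>
  <entry><title>Second entry</title></entry>
</feed>
XML;

$xml = simplexml_load_string($atom);

// An unprefixed path matches nothing, because every element lives in the
// Atom default namespace.
$without_ns = $xml->xpath('/feed/entry/title');

// Bind an arbitrary prefix ("atom" here) to the namespace URI.
$xml->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');

$titles = array();
foreach ($xml->xpath('/atom:feed/atom:entry') as $entry) {
  // Re-register on the child element before running a relative query.
  $entry->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');
  $match = $entry->xpath('atom:title');
  if (!empty($match)) {
    $titles[] = (string) $match[0];
  }
}
```

So the XPath map would probably need to carry prefixed expressions for Atom, plus a list of prefixes to register before each query.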
If aggregator isn't removed from core, I'm happy to provide patches in the near future.
| Comment | File | Size | Author |
|---|---|---|---|
| #2 | 1268232_xpath_parser-2.patch | 10.59 KB | aron novak |
Comments
Comment #1
twistor commented
SimplePie is a huge piece of code for a reason :). Why not determine the feed type beforehand and parse based on that? I definitely support rewriting the parser if we can't get something like SimplePie in core. SimplePie's license seems to rule that out, but there are alternatives.
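A rough sketch of what "determine the feed type beforehand" could look like with SimpleXML, based only on the root element. The function name `example_detect_feed_type()` and the returned labels are hypothetical, and real feeds would likely need more checks than this.

```php
<?php
/**
 * Guesses the feed format from the root element of a parsed document.
 *
 * Returns 'atom', 'rss', 'rdf' or 'unknown' (labels made up for this sketch).
 */
function example_detect_feed_type(SimpleXMLElement $xml) {
  // getName() returns the local name, without any namespace prefix.
  $root = $xml->getName();
  if ($root === 'feed' && in_array('http://www.w3.org/2005/Atom', $xml->getNamespaces(), TRUE)) {
    return 'atom';
  }
  if ($root === 'rss') {
    return 'rss';
  }
  if ($root === 'RDF') {
    // RSS 1.0 documents use an rdf:RDF root element.
    return 'rdf';
  }
  return 'unknown';
}

$type_atom = example_detect_feed_type(simplexml_load_string('<feed xmlns="http://www.w3.org/2005/Atom"><title>t</title></feed>'));
$type_rss = example_detect_feed_type(simplexml_load_string('<rss version="2.0"><channel/></rss>'));
$type_rdf = example_detect_feed_type(simplexml_load_string('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'));
$type_other = example_detect_feed_type(simplexml_load_string('<html/>'));
```

The detected type could then select which set of XPath expressions (or which dedicated parser) to run, instead of trying every candidate on every feed.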
Comment #2
aron novak
My main goal would be to implement something at least as good as what we currently have, but in a more elegant way.
Parts of the attached patch are not ironed out yet and can be simplified, and it also causes some loss of functionality (images, for example).
Comment #3
aron novak
If we have something like this, we could drop http://api.drupal.org/api/drupal/includes--unicode.inc/function/drupal_x... as well.
Comment #4
aron novak
Comment #5
ParisLiakos commented
Related: #1839468: [Followup] Replace aggregator rss parsing with Zend Feed
Comment #6
twistor commented
Closing this since we now have a bona fide RSS/Atom parser in core.