Closed (works as designed)
Project:
Feeds
Version:
6.x-1.0-beta9
Component:
Code
Priority:
Normal
Category:
Support request
Assigned:
Unassigned
Reporter:
Created:
2 Oct 2010 at 16:27 UTC
Updated:
10 May 2013 at 22:27 UTC
I've got an RSS feed from a WordPress blog that only displays a headline and a summary, but the RSS feed contains a section that has the full text of the blog posts.
I want to import the feed and map that section to the body field of my nodes.
How do I do that? Do I need to write a parser for feeds?
Comments
Comment #1
alex_b commented
This *may* be fixed by patching the parser you're using.
- What parser are you using?
- Can you post an example feed? Specifically, I don't know what namespace "content:encoded" is in. What is the namespace of 'content' here?
Comment #2
bflora commented
Hi, Alex!
Here's the feed: http://www.bearsbeat.com/blog/feed/ (check the source to see what I mean).
For parser, I'm using the common syndication parser. Thanks!
Comment #3
stefan81 commented
Hi,
I have the same issue.
I can import this source through the Common syndication parser (Parse XML feeds in RSS 1, RSS 2 and Atom format).
But I have no mapper for the full content, seen below:
<content:encoded>
I would be most grateful if someone could point me in the right direction.
Here's some sample code:
Comment #4
stefan81 commented
I had a look into common_syndication_parser.inc
It seems to be implemented already?
and
On line 293 I changed
to
Now it picks up the content:encoded. So it basically works.
Maybe a candidate for a minor patch to fix the flaw?
Unfortunately, I am not experienced enough for a serious attempt.
Comment #5
dman commented
The fix you applied is mostly in the right place ...
The problem here is that _parser_common_syndication_RDF10_property goes looking for any one of 'rss:description', 'dc:description', 'content:encoded' to place into the body ... because throughout different feeds, these are usually equivalent.
YOUR SAMPLE FEED has TWO of these in it - both rss:description AND content:encoded.
The data extraction function finds the first one and returns it immediately. It can't put both values into one target without overwriting, and you never want to concatenate.
(This behavior is the reverse of what happens in other cases of conflict in feeds: I've also seen cases where the LAST valid match wins with an overwrite.)
So, your state is sort of ambiguous. In this case the description is empty and therefore useless to you, so your expected behavior is to carry on looking for the next candidate.
See #1092652: Possible to allow for blank fields and not overwriting existing data?
But ... there is the generic edge case where sometimes a feed actually does need to update over an existing value with a null value (though damned if I can think of an example where that would actually be the desired result).
OTOH, maybe it makes sense to just change the order of the fields that get scanned to apply some logical weighting.
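To make the options in this comment concrete, here is a minimal Python sketch (not the Feeds PHP code) contrasting "first match wins" with "skip blank values and keep looking". The function names and the dictionary representation of a feed item are hypothetical, chosen only for illustration; the candidate keys are the three named above.

```python
# Candidate keys, in the order named in the comment above.
CANDIDATES = ("rss:description", "dc:description", "content:encoded")


def first_match(item, candidates=CANDIDATES):
    """Return the first candidate present in the item, even if it is blank."""
    for key in candidates:
        if key in item:
            return item[key]
    return None


def first_non_empty(item, candidates=CANDIDATES):
    """Return the first candidate whose value is not blank."""
    for key in candidates:
        value = item.get(key)
        if value is not None and value.strip():
            return value
    return None


# An item shaped like the sample feed: empty description, full content:encoded.
item = {"rss:description": "", "content:encoded": "<p>Full post text</p>"}

print(repr(first_match(item)))      # the blank description wins
print(repr(first_non_empty(item)))  # the blank value is skipped

# Reordering the candidates is the other option: content:encoded is tried first.
reordered = ("content:encoded", "rss:description", "dc:description")
print(repr(first_match(item, reordered)))
```

Skipping blanks never lets a feed overwrite a value with an empty one, which is the edge case raised above; reordering the candidates avoids that trade-off but changes which element wins when both are populated.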
Comment #6
jelo commented
I just tried this out in version 7 and it works fine. Feeds imports from either one. However, the issue I ran into is that the feed does not validate if no description is provided. According to some sources, including the feed validator tool, a feed is expected to carry a plain-text summary in the description field AND a full-text encoded version in the content:encoded field. Apparently, it is not okay to ONLY provide content:encoded. At least this is what I found. Maybe someone with a better understanding of the existing standards could clarify...
If that indeed were true, feeds should maybe not treat description and content:encoded as synonymous, but should indeed have a mapping option for each, e.g. map description to a summary/teaser field and content:encoded to body.
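As a rough illustration of mapping the two elements separately, here is a small Python sketch using the standard library's xml.etree.ElementTree. It is not how Feeds does it; the target names "summary" and "body", the function name map_items, and the sample XML are assumptions made up for this example. The namespace URI is the one defined by the RSS 1.0 content module.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module, which defines content:encoded.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# Made-up sample feed carrying both a description and content:encoded.
SAMPLE = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example</title>
    <item>
      <title>Post</title>
      <description>Short summary</description>
      <content:encoded><![CDATA[<p>Full text</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""


def map_items(xml_text):
    """Map description to a summary target and content:encoded to a body target."""
    root = ET.fromstring(xml_text)
    mapped = []
    for item in root.iter("item"):
        mapped.append({
            "title": item.findtext("title", default=""),
            "summary": item.findtext("description", default=""),
            # Namespaced elements are looked up as {namespace-uri}localname.
            "body": item.findtext("{%s}encoded" % CONTENT_NS, default=""),
        })
    return mapped


for row in map_items(SAMPLE):
    print(row)
```

With separate targets, a feed that supplies only one of the two elements simply leaves the other target empty instead of one value standing in for the other.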
In my case I control both the source feed and the destination, so I have abandoned content:encoded and put everything I need to transfer into the description field.
Given the age of this thread and that it appears to work as intended, I changed the status.