Great module - it helped me parse a feed that has only title and description fields, with additional fields packed into the description field.

One feature I found missing is the ability to restore HTML code in the decoded fields. In that feed all HTML entities were encoded (so each "&" became "&amp;"), and then the "&" was stripped. That produced content like this:

amp;lt;bramp;gt;

for a simple break tag "<br>"

That content gets through the scraper and into the node fields. After that nothing can be done with it: Drupal renders it in that ugly form.

I made a very simple and, most importantly, transparent(*) fix. Please review the attached patch and consider committing it to 1.x-dev. I'm sure others will find it useful.

Note (*): It will not affect any content that does not have "amp;" in it. If there are "amp;" items, we restore them to "&amp;" and decode back into HTML once. The performance penalty is minimal and negligible compared to the database overhead.
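
The patch itself is not reproduced in this thread, so the following is only a minimal Python sketch of the approach described in the note, not the committed PHP code. The function name `restore_entities` and the regular expression are assumptions made for illustration.

```python
import html
import re

# Hypothetical helper: matches "amp;" whose leading "&" was stripped,
# i.e. "amp;" that is not already preceded by "&".
_STRIPPED_AMP = re.compile(r"(?<!&)amp;")


def restore_entities(text):
    """Put back stripped "&" characters and decode entities once."""
    # Content without a stripped "amp;" is returned untouched.
    restored, count = _STRIPPED_AMP.subn("&amp;", text)
    if count == 0:
        return text
    # A single decoding pass, as described in the note.
    return html.unescape(restored)


print(restore_entities("amp;lt;bramp;gt;"))
```

In this sketch the sample "amp;lt;bramp;gt;" becomes "&lt;br&gt;" after the single decode pass. Note that ordinary text ending in "amp;" (for example "lamp;") would also match this pattern, so a real implementation might want a stricter check.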

File: feedapi_scraper_html_fix.2009-06-28.patch
Size: 775 bytes
Author: iva2k

Comments

iva2k:

nag nag

ademarco:

Committed to HEAD. Thanks for patching.

ademarco:

Status: Needs review » Closed (fixed)

iva2k:

Thanks!

sbydrupal:

Hi Iva,

I am thinking of using Feed Scraper to do exactly what you described in the initial post.

The feeds (specifically YouTube feeds at the moment) come with a description field that contains
more than just the description, such as views, ratings, ...

Could you give an example of how to achieve this with Feed Scraper?

Thanks for any feedback,

sbydrupal:

I have uploaded the module and it works perfectly.

For future reference: I just used the XPath query //span and mapped it to the description text field.
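
As a rough illustration of what a //span query selects, here is a minimal Python sketch using the standard library's ElementTree. The sample description markup and the helper name `span_texts` are hypothetical; the real YouTube feed layout is not shown in this thread and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical description markup; the actual feed layout may differ.
SAMPLE_DESCRIPTION = (
    "<div>"
    "<span>A clip description</span>"
    "<span>Views: 1234</span>"
    "<span>Rating: 4.5</span>"
    "</div>"
)


def span_texts(fragment):
    """Return the text of every <span> in a well-formed markup fragment."""
    # Wrap the fragment so it always has a single root element.
    root = ET.fromstring("<root>" + fragment + "</root>")
    # ".//span" is ElementTree's spelling of the XPath query //span.
    return ["".join(span.itertext()).strip() for span in root.iterfind(".//span")]


for value in span_texts(SAMPLE_DESCRIPTION):
    print(value)
```

Each returned string could then be mapped to its own field. ElementTree only accepts well-formed markup, so a real feed description may need a more forgiving HTML parser.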

Thanks for the module.