Feature to restore stripped HTML in scraped fields [#504650]

Great module - it helped me parse a feed with only title & description fields, where additional fields were embedded in the description field.

One feature I found missing is the ability to restore HTML code in the decoded fields. In that feed all HTML entities were encoded (so there would be "&amp;"), and then the "&" was stripped. This produced content like this:

amp;lt;bramp;gt;

for a simple break tag "<br>"

That content passes through the scraper and into the node fields. After that nothing can be done with it, and Drupal renders it in that ugly form.

I made a very simple and, most importantly, transparent(*) fix. Please review the attached patch and consider committing it to 1.x-dev. I'm sure others will find it useful.

Note (*): It will not affect any content that does not have "amp;" in it. Where "amp;" occurs, it is restored to "&" and the result is decoded back into HTML once. The performance penalty is minimal and negligible compared to the database overhead.
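For readers who want to see the idea in the note above spelled out, here is a minimal sketch in Python. It is not the attached patch (which is not reproduced in this thread); the function name `restore_stripped_html` is hypothetical, and the logic only follows the description: skip content without "amp;", otherwise put the "&" back and decode entities once.

```python
import html


def restore_stripped_html(text: str) -> str:
    """Illustrative sketch only; the name and structure are assumptions."""
    # Transparent case: content without the "amp;" marker is returned unchanged.
    if "amp;" not in text:
        return text
    # Put back the stripped ampersand, e.g. "amp;lt;" becomes "&lt;".
    restored = text.replace("amp;", "&")
    # Decode the entities once to recover the markup.
    return html.unescape(restored)


print(restore_stripped_html("amp;lt;bramp;gt;"))
```

With the sample string from the report, the replacement step yields "&lt;br&gt;" and the single decode yields the break tag. One caveat of this sketch: content that legitimately contains "amp;" (for example an intact "&amp;") would also be rewritten.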

Comment	File	Size	Author
	feedapi_scraper_html_fix.2009-06-28.patch	775 bytes	iva2k

Comments

Comment #1

iva2k commented 26 July 2009 at 20:35

nag nag

Comment #2

ademarco commented 28 July 2009 at 07:32

Committed to HEAD. Thanks for patching.

Comment #3

ademarco commented 28 July 2009 at 07:32

Status: Needs review » Closed (fixed)

Comment #4

iva2k commented 30 July 2009 at 05:25

Thanks!

Comment #5

sbydrupal commented 5 August 2009 at 21:18

Hi Iva,

I am thinking of using Feed Scraper to do exactly what you described in the initial posting.

The feeds (YouTube feeds at the moment) come with a description field that contains
other fields besides the description, such as views, ratings, ...

Could you give an example of how to achieve this with Feed Scraper?

Thanks for any feedback,

Comment #6

sbydrupal commented 6 August 2009 at 01:56

I have uploaded the module and it works perfectly.

For future reference for others: I just used the XPath query //span and mapped it to the description text field.

Thanks for the module.
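As a rough illustration of what the //span query in comment #6 selects, here is a small Python sketch using the standard library. It does not show the module's own configuration. The sample markup and the helper name `extract_spans` are made up for the example, and the description is assumed to be well-formed XML. Note that ElementTree needs the relative form `.//span` when searching from an element.

```python
import xml.etree.ElementTree as ET


def extract_spans(description_html: str) -> list:
    """Return the text of every <span> below the root element (sketch only)."""
    # Assumes the description parses as well-formed XML.
    root = ET.fromstring(description_html)
    # ".//span" matches span elements at any depth below the root.
    return [span.text for span in root.findall(".//span")]


# Hypothetical description content, not taken from a real feed.
sample = "<div><span>Views: 10</span><span>Rating: 4</span></div>"
print(extract_spans(sample))
```

Each extracted value could then be mapped to its own field, in the same way the comment maps the query result to the description text field.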
