Aggregator titles display quotes and other characters with HTML entity equivalents badly (write tests) [#61456]

Comment	File	Size	Author
#119	core_aggregator_title_entities_test-61456-screenshot_results-119.png	23.42 KB	mimes
#110	core_aggregator_title_entities_test-61456-d7-110.patch	2.04 KB	albert volkman
#108	core_aggregator_title_entities_test-61456-d7-108.patch	2.01 KB	albert volkman
#106	sigh.jpg	349.11 KB	aascherson
#103	core_aggregator_title_entities_test-61456-d7-103.patch	2.04 KB	albert volkman
#101	core_aggregator_title_entities_test-61456-d7-101.patch	2.05 KB	albert volkman
#99	core_aggregator_title_entities_test-61456-d7-99.patch	2.08 KB	albert volkman
#91	core-aggregator-title_entities-test-61456-91.patch	2.06 KB	pillarsdotnet
#83	61456-83.patch	2.08 KB	xjm
#81	61456-80-core-aggregator-title_entities_test.patch	2.08 KB	David_Rothstein
#80	61456-80-core-aggregator-title_entities_test.patch	2.08 KB	David_Rothstein
#79	61456-79-core-aggregator-title_entities_test.patch	2.1 KB	jeffschuler
#67	61456_html_decode_titles_test.diff	2.19 KB	jeffschuler
#55	61456_html_decode_titles_2.patch	1021 bytes	jeffschuler
#50	61456_html_decode_titles.patch	1.04 KB	alex_b
#46	aggregator_final.patch	1.05 KB	jdefay
#44	aggregator_final.patch	1.05 KB	jdefay
#17	aggregator_checkplain.patch	1.86 KB	csevb10
#10	aggregator_47.patch	1.88 KB	edmund.kwok
#9	aggregator_3_0.patch	1.88 KB	edmund.kwok
	aggregator_2.patch	828 bytes	Steve Dondley

Comment #1

Steve Dondley commented 2 May 2006 at 17:41

Important clarification. This occurs when the feed already has html entities in it. For example, if the feed has a title of:

It&amp;#39;s an Honor

(It's an Honor)

When it gets run through check_plain, the ampersand in the HTML entity gets converted to an HTML entity of its own, and it becomes:

It&amp;#39;s an Honor

So the final output is:

It&#39;s an Honor

in the title.
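
The double escaping described above can be reproduced in a few lines. This is a minimal sketch rather than Drupal's code: `example_check_plain()` is a hypothetical stand-in that assumes check_plain() behaves like htmlspecialchars() with ENT_QUOTES, and the title string is the one from this comment.

```php
<?php
// Hypothetical stand-in for Drupal's check_plain(); assumed to behave like
// htmlspecialchars() with ENT_QUOTES.
function example_check_plain($text) {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Title as it comes out of the XML parser when the feed source contained
// "It&amp;#39;s an Honor": one level of escaping is already gone.
$parsed_title = 'It&#39;s an Honor';

// Escaping for output turns the leading "&" into "&amp;".
$html = example_check_plain($parsed_title);
echo $html, "\n"; // It&amp;#39;s an Honor

// A browser undoes exactly one level, so the reader sees the raw entity.
$displayed = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
echo $displayed, "\n"; // It&#39;s an Honor
```

The escaping step itself is doing its job here; the visible entity comes from the extra level of escaping already present in the feed.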

Comment #3

Steven commented 7 May 2006 at 15:02

Status:	Needs review	» Needs work

I'm pretty sure this is correct behaviour. Most feed fields contain regular, unmarked up text. Double-escaping like that in the source feed is only valid if the field contains escaped, marked up HTML text (such as the body).

In any case, this needs to be solved at the parsing stage, not the output stage. Drupal takes feeds in various formats.

Finally, using strip_tags() on output as a validation measure is disallowed. It should only be used as a conscious filtering op (such as done for taxonomy terms in title attributes).

Comment #4

Steve Dondley commented 8 May 2006 at 16:16

> Most feed fields contain regular, unmarked up text.

Well, Google uses htmlentities. They are a pretty major supplier of feeds.

Comment #5

Roadskater.net commented 27 May 2006 at 03:05

yes google news is pretty important and i like it, and it would be nice to have at least a checkbox available in aggregator setup to specify whether to unescape the title. i know nothing about this, and can't do this myself, i'm sorry to say. i had other google news related problems (duplicate news items) so i'm open to suggestions from those who may have good reasons for thinking google news is not important, or not best. thanks for anything anyone does to help us. please share if there's an adopted fix. thanks.

Comment #6

kastaway commented 14 July 2006 at 17:56

Version:	x.y.z	» 4.7.2

Does anybody have brainstorming ideas of how to work around this? I'd settle for just dropping the entities from the RSS feed, which would be better than the current mash of ascii characters on the front page of the site.....

Comment #7

matt westgate commented 21 July 2006 at 21:47

Rather than strip_tags(), you'll want to use filter_xss($item->title, array()) around the item title. It's been shown on the security mailing list that strip_tags() can be bypassed in terms of XSS exploits.
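
Whichever function removes the tags, the entity problem is a separate one. The sketch below uses only PHP's built-in strip_tags() (filter_xss() is Drupal-specific and is not reproduced here) to show that tag stripping leaves an already-escaped entity untouched; the sample title is made up for illustration.

```php
<?php
// Sample title (made up) containing both markup and an escaped apostrophe.
$title = 'It&#39;s <em>an</em> Honor';

// Tag stripping removes the <em> element but does not decode entities.
$stripped = strip_tags($title);
echo $stripped, "\n"; // It&#39;s an Honor
```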

Comment #8

rosenblum68 commented 18 September 2006 at 17:16

Version:	4.7.2	» 4.7.3

I downloaded the newest version of the 4.7.3 aggregator.module, and it still has the problem. I had to modify the file in multiple locations (using the filter_xss advice) to get it to work both in the block AND on the "more" summary page... just a heads up. http://wellweight.org

Comment #9

edmund.kwok commented 5 October 2006 at 18:14

Version:	4.7.3	» x.y.z
Assigned:	Steve Dondley	» edmund.kwok
Status:	Needs work	» Needs review

Status	File	Size
new	aggregator_3_0.patch	1.88 KB

Changed strip_tags to filter_xss as suggested and in two other places, block_item and summary_item, including the original page_item.

Btw, issue also applies in HEAD, http://drupal.org/node/63459, marking that duplicate because this was reported first. Also, changing version to cvs to get more attention.

Patch is for cvs version.

Comment #10

edmund.kwok commented 5 October 2006 at 18:15

Status	File	Size
new	aggregator_47.patch	1.88 KB

Patch for 4.7 cvs.

Comment #11

ahoeben commented 9 October 2006 at 08:26

In this issue (which got sidetracked), I suggested cleaning up the title before it is inserted in the database. This has the obvious benefit of cleaning it up only once, making my patch a lot smaller than the one proposed here.

The argument against cleaning up the title before insertion in the database is that it is not the Drupal way to do so. For example, when adding a title to a node, the title is inserted into the database as is (after making it database safe); no removal of tags etc. The rationale behind this is that when the user goes to edit the node, he or she would expect the original input and not a parsed/filtered version of it. There is an obvious need to keep the original input, for consistency for the user.

In the case of the aggregator module however, there is no need to store the original input, which is machine generated and which will never be edited. So I'd say: filter before insertion in the database, and be done with it.

Comment #12

jacauc commented 22 November 2006 at 12:35

Version:	x.y.z	» 5.x-dev

Will this patch be part of drupal 5.0 core?
I am running drupal 5 CVS on a test site, and I still see & # 3 9 ; (spaces inserted there to prevent it from being parsed)

Comment #13

drumm

he/him

NY, US

commented 27 November 2006 at 00:47

Status:	Needs review	» Needs work

Tags need to be stripped from titles.

Comment #14

edmund.kwok commented 27 November 2006 at 08:42

Assigned:	edmund.kwok	» Unassigned

Releasing this issue back into the wild. I'm not sure what's the best way to fix this..

Comment #15

msmiffy commented 22 February 2007 at 01:30

I see that this problem still exists in 5.1. Any sign of a permanent, main-stream resolution? I'll just go hack the code for now as this is UGLY.

Comment #16

jacauc commented 22 February 2007 at 05:33

I agree 100%

Comment #17

csevb10 commented 10 April 2007 at 21:14

Version:	5.x-dev	» 6.x-dev
Assigned:	Unassigned	» csevb10

Status	File	Size
new	aggregator_checkplain.patch	1.86 KB

I repatched this for Drupal 6.
There's an aggregator_filter_xss so you can actually filter the feed content independent of the site content which seems like a slightly better solution than invoking the filter_xss directly. I patched the same 3 areas as before.
What does everyone think of this as a viable solution moving forward?

Comment #18

csevb10 commented 11 April 2007 at 16:24

Status:	Needs work	» Needs review

Comment #19

agentrickard

he/him

English

Georgia (US)

commented 11 June 2007 at 02:53

I'm going to make a run at closing the Aggregator queue for D6.

Personally, I like http://drupal.org/files/issues/aggregator-striptags.patch and agree with ahoeben from #11.

This is machine-text, so keeping the "integrity" of the original title string is not relevant. Stripping the entities seems perfectly fine during the parsing stage.

Comment #20

catch

he/him

English

commented 30 October 2007 at 11:56

Status:	Needs review	» Needs work

No longer applies.

Comment #21

ezra-g commented 10 November 2007 at 21:48

Since it was only one line, in D5.3 I just made the quick edit, removed the items from the feed, updated the feed and had the same #39 business.

Comment #22

underpressure commented 7 January 2008 at 21:00

What is the status of this problem? I use Google news and get this ugly output as well.

Comment #23

alpha2zee commented 12 February 2008 at 08:25

Drupal's architecture needs to be changed to allow admins complete control over filtering of both input (what is stored in the database) and output (the final, displayed HTML). Further, this filtering should be customizable per content type (newsfeed, user comment, and so on). E.g., machine-generated newsfeed titles (database stored) can be filtered for XSS, appropriate HTML tags, etc., during the input stage to avoid the overhead of similar filtering every time the title is output as web page content.

Also, appropriate code/principles should be used from better (than Drupal's Filter module) filtering scripts like htmLawed and HTMLPurifier. Though the mentioned scripts can be used in plugged-in modules, they then either cannot have certain functionalities (like, to address the newsfeed-entity issue), or re-do some actions that Drupal's core does anyway (increasing the processing time).

Comment #24

cburschka

they

commented 12 February 2008 at 08:48

What is the valid standard in XML - " or " or & #39;? I don't know how much more attention people pay to feed validity than website validity - would it be enough to require validity and reject quirky feeds, or do we lose a lot of functionality?

Edit: I note that whatever filter is here on d.o. is one whose code I would probably yell upon seeing. You do not unescape & amp; # 39; twice to make '. You just don't.

Comment #25

alpha2zee commented 13 February 2008 at 00:24

Lines 289, 1179, 1267, 1352, 1367 and 1394 of the drupal/modules/aggregator/aggregator.module file (version timed 1-10-08 22:14) have code like:

... check_plain($item->title) ...

Looks like replacing all those six check_plain occurrences with aggregator_filter_xss should take care of the issue.

Comment #26

jdefay commented 10 June 2008 at 22:58

I implemented the changes suggested by ahoeben above in his patch in post #11. Then I tried changing all the check_plain references to aggregator_filter_xss, but still got funky characters in Google News feeds on my site.

ahoeben's patch seems to clean titles that show up in feed blocks, but not in the Drupal aggregator category view which was still giving me funky single quotes in feed titles. It looks like the funky symbols are still being stored in the database, and ahoeben's patch fixes them when they are displayed in blocks.

Since the funky stuff was still getting displayed I wanted to remove them before they go into the database & thus make ahoeben's display patch unnecessary. So I added htmlspecialchars_decode() in the aggregator.module aggregator_parse_feed function.

It works great, whenever a "& #39;" shows up in the title of my news feeds, it's replaced with a single quote and then inserted into the database for later use.

Here's the line before:

$title = $item['TITLE'];

Here it is after:

$title = htmlspecialchars_decode($item['TITLE'], ENT_QUOTES);
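
A rough sketch of how this parse-stage decode fits with escaping at output time. Both helper names are hypothetical and neither is Drupal's API; `example_render_title()` assumes check_plain() behaves like htmlspecialchars() with ENT_QUOTES, and the sample title is made up.

```php
<?php
// Hypothetical helpers sketching the two stages; neither is Drupal's API.

// Parse stage: undo the feed's extra level of escaping once, before the title
// is stored.
function example_prepare_title($raw_title) {
  return htmlspecialchars_decode($raw_title, ENT_QUOTES);
}

// Output stage: escape the stored plain text for HTML, as check_plain() is
// assumed to do.
function example_render_title($stored_title) {
  return htmlspecialchars($stored_title, ENT_QUOTES, 'UTF-8');
}

$raw = 'Tom &amp; Jerry &quot;live&quot;';

// Without the parse-stage decode the entities are escaped a second time.
$without = example_render_title($raw);
echo $without, "\n"; // Tom &amp;amp; Jerry &amp;quot;live&amp;quot;

// With it, the markup carries one level of escaping, which a browser shows
// as: Tom & Jerry "live"
$with = example_render_title(example_prepare_title($raw));
echo $with, "\n"; // Tom &amp; Jerry &quot;live&quot;
```

Decoding once at parse time means the stored title is plain text, so every display path (block, summary page, category view) can keep escaping it the same way.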

Comment #27

TimAlsop commented 9 August 2008 at 22:00

Can somebody let me know what patch/changes are recommended for a Drupal 6.3 environment ? I am happy to modify code, but wanted to know which code changes to make so that these special characters are changed before the google news feed items are written to the database.

Comment #28

dominich commented 11 August 2008 at 19:39

For me on 6.3 it's as the above post. I have a google news feed displayed in a block on a site, and the title links were showing &blah; style text - much in the same way that Firefox RSS feed reader does for sites like Slashdot et al., interestingly.

Anyway - all I had to do was as jdefay says:

add htmlspecialchars_decode() in the aggregator.module aggregator_parse_feed function.

Here's the line before:

$title = $item['TITLE'];

Here it is after:

$title = htmlspecialchars_decode($item['TITLE'], ENT_QUOTES);

jdefay rox.

D

Comment #29

TimAlsop commented 11 August 2008 at 22:16

I tried this, but when I update feed I get:

Fatal error: Call to undefined function: htmlspecialchars_decode() in /data01/mysite/public_html/drupal/modules/aggregator/aggregator.module on line 738

Is there something else I need to do so that this function is available ?

Thanks,
Tim

Comment #30

TimAlsop commented 11 August 2008 at 22:24

I just noticed that the htmlspecialchars_decode() function is only available in PHP >= 5.1.0. I am using PHP 4.4.4 which is supported by Drupal 6.3. Unfortunately I am unable to upgrade PHP because my site is hosted on a server managed by my ISP and uses cpanel. So, the version of PHP is out of my control.

Is there a PHP 4.4.4 function which will do same/similar, or do I need to suffer this until PHP 5.1.0 or later is available ?

Thanks,
Tim

Comment #31

TimAlsop commented 11 August 2008 at 22:34

I found some suggestions at http://uk.php.net/htmlspecialchars_decode

So, I have now changed code as shown below:

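if ( !function_exists('htmlspecialchars_decode') )
{
    // Fallback for PHP < 5.1.0. The quote style is passed through to the
    // translation table so that ENT_QUOTES also decodes &#039;.
    function htmlspecialchars_decode($text, $quote_style = ENT_COMPAT)
    {
        return strtr($text, array_flip(get_html_translation_table(HTML_SPECIALCHARS, $quote_style)));
    }
}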

    // Resolve the item's title. If no title is found, we use up to 40
    // characters of the description ending at a word boundary but not
    // splitting potential entities.
    if (!empty($item['TITLE'])) {
      $title = htmlspecialchars_decode($item['TITLE'], ENT_QUOTES);
    }

I don't get any error now, so I will see if I get any news items with special characters included. I will report my findings in next few days.
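
A standalone sketch of the translation-table approach, useful for seeing which entities it covers. The function is given a hypothetical name (`example_decode_fallback`) so it can run on a PHP version where htmlspecialchars_decode() already exists, and the quote style is passed through to the table so that ENT_QUOTES has an effect.

```php
<?php
// Hypothetical name; same strtr() + flipped-table technique as above.
function example_decode_fallback($text, $quote_style = ENT_COMPAT) {
  // The flipped table maps entities such as "&amp;" back to their characters.
  $table = array_flip(get_html_translation_table(HTML_SPECIALCHARS, $quote_style));
  return strtr($text, $table);
}

$a = example_decode_fallback('Tom &amp; Jerry &quot;live&quot;', ENT_QUOTES);
echo $a, "\n"; // Tom & Jerry "live"

// The table spells the apostrophe entity with a leading zero.
$b = example_decode_fallback('It&#039;s an Honor', ENT_QUOTES);
echo $b, "\n"; // It's an Honor

// The spelling without the zero is not in the table, so it passes through.
$c = example_decode_fallback('It&#39;s an Honor', ENT_QUOTES);
echo $c, "\n"; // It&#39;s an Honor
```

The last case suggests that a table-based fallback may not cover the `&#39;` form discussed in this issue, so it is worth checking against a real feed before relying on it.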

Comment #32

bradnana commented 25 November 2008 at 04:32

This bug still exists as of D6.6 in core aggregator.module. Any plans for a patch to be brought into core?

--Brad

Comment #33

GetActive commented 14 January 2009 at 21:53

Version:	6.x-dev	» 6.8

It also/still happens on the "domain.com/user" page where the title is generated dynamically below the login form.

Comment #34

domesticat commented 22 January 2009 at 14:27

I can confirm that @jdefay's patch (#26) works well for >PHP5; I've been using it for months. I have not tested the other version, nor can I vouch for whether or not doing this is 'the Drupal way.' (I just know that my users need the feeds to work properly!)

Comment #35

problue solutions

Northern Ireland

commented 6 February 2009 at 11:31

thank you jdefay in #26, after much useless information your advice has finally solved my problem :)

Comment #36

cvining commented 26 February 2009 at 15:59

jdefay,

I just looked at your site, which has a Google News Feed here: http://jason.defay.org/aggregator/categories (titled 'UCSD In the News').

How did you get that to work? When I setup something similar, all the ampersands (&) are stripped out of the links, which breaks them. Example:

http://news.google.com/news/url?sa=T&ct=us/0-0&fd=R&url=http://www3.sign...

http://news.google.com/news/url?sa=Tct=us/0-0fd=Rurl=http://www3.signons...

Any suggestions? Thx.

-- Cronin

Comment #37

jdefay commented 26 February 2009 at 17:11

Hi Cronin,

I just looked at the Google News Feed 'UCSD In the News' you mentioned on my site and noticed it does in fact have the ampersands in the links. Are you seeing something different on my site?

I just upgraded to 6.10 and haven't reimplemented any of these patches described in this thread. So I can't say for certain if the patches will actually strip the Google tags or not.

What I can say is that I got frustrated with the way Google feeds were causing frequent problems with the Drupal aggregator. I switched over to Yahoo news feeds and am getting consistently clean results. It's too bad because I like the Google news service better and would prefer to use it over Yahoo. I just didn't want to deal with headaches anymore.

Jason

Comment #38

jdefay commented 26 February 2009 at 19:05

All,

Since Cronin asked about it, I decided to go in and re-implement the simple patch I described in my earlier post to this thread.

I added the htmlspecialchars_decode function back into the aggregator.module, removed all the old items from the news feed, then refreshed it. Voilà, no more funky characters in the news feed titles.

Since several new updates/releases of Drupal have happened since my original patch, and all of them still contain the old problematic PHP code, perhaps someone can volunteer to help me get this fix added to the next version of Drupal so we don't all have to keep re-patching every time a new release comes out.

Please feel free to respond to this thread and/or email me if you can help make this happen.

Cheers,

Jason

Comment #39

jdefay commented 26 February 2009 at 19:45

Status:	Needs work	» Reviewed & tested by the community

Tim,

Have you had any errors on this yet? If not, your aggregator patch seems to be a better, more backward compatible, improvement over mine.

I implemented your version and tested it on my site. It works great so I'd appreciate it if any of you who are reading this & willing to help would do the same and report back here.

Cheers,

Jason

Comment #40

jdefay commented 26 February 2009 at 20:39

Oops, I shouldn't have changed the patch status since it looks like the original patch isn't the one we are trying to implement. I'm setting it back to 'code needs work' until someone can create and submit a patch that meets the Drupal patch standards. Here is what I'm proposing someone help me do:

Write a valid patch that replaces line 737 through 741 with the following:

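if ( !function_exists('htmlspecialchars_decode') )
{
    // Fallback for PHP < 5.1.0. The quote style is passed through to the
    // translation table so that ENT_QUOTES also decodes &#039;.
    function htmlspecialchars_decode($text, $quote_style = ENT_COMPAT)
    {
        return strtr($text, array_flip(get_html_translation_table(HTML_SPECIALCHARS, $quote_style)));
    }
}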

    // Resolve the item's title. If no title is found, we use up to 40
    // characters of the description ending at a word boundary but not
    // splitting potential entities.
    if (!empty($item['TITLE'])) {
      $title = htmlspecialchars_decode($item['TITLE'], ENT_QUOTES);
    }

As soon as that patch is up and tested I'll work on getting it into the queue for the next Drupal release.

Cheers,

Jason

Comment #41

domesticat commented 26 February 2009 at 19:48

Status:	Reviewed & tested by the community	» Needs work

Been using this patch for a while with no issues; would be glad to see it get in the next version.

Comment #42

cvining commented 27 February 2009 at 01:41

Jason,

"I just looked at the Google News Feed 'UCSD In the News' you mentioned on my site and noticed it does in fact have the ampersands in the links. Are you seeing something different on my site?"

No, your site looks fine. But my site does not (http://www.zts.com, see the section "Google Web Search Filtered for Recent Thermoelectric Content"). Check the links: the "&"s are stripped out, and the links are broken. I checked and they are stored in the DB that way too.

I just upgraded to drupal 6.9 (a day before 6.10 came out!).

Any thoughts on what's going on? Thanks for the replies.

-- Cronin

Comment #43

jdefay commented 27 February 2009 at 20:10

Version:	6.10	» 6.8
Assigned:	jdefay	» csevb10

Sounds like the patch isn't working for some reason. Have you done all of the following?

1. Checked to make sure you have PHP version 5.1.0 or greater installed on your server
2. Changed line 741 in the aggregator.module from $title = $item['TITLE']; to $title = htmlspecialchars_decode($item['TITLE'], ENT_QUOTES);
3. Removed all previous posts from your DB and re-imported them.

You might also try out the patch I just created below. Please give me feedback on the patch, it's in need of QA testing!

Thanks,

Jason

Comment #44

jdefay commented 27 February 2009 at 20:09

Version:	6.8	» 6.10
Assigned:	csevb10	» jdefay

Status	File	Size
new	aggregator_final.patch	1.05 KB

Hi everyone,

I've created and tested the attached patch on my Drupal 6.10 site and am looking for volunteers to do additional patch testing.

The patch changes the aggregator.module as generally described in my posts above. It first checks to see if the server has the htmlspecialchars_decode() function available (PHP 5.1.0 and greater). If so it uses the function to clean up HTML in the article titles and saves the results to the database. If the function isn't there for some reason, it defaults to the prior method for saving article titles.
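
The guard described above might look roughly like this. It is a sketch of the described behaviour, not the contents of the attached patch; `example_title_from_item()` is a hypothetical helper, and `$item` mirrors the parsed feed item array quoted earlier in the thread, with the title under the 'TITLE' key.

```php
<?php
// Hypothetical helper illustrating the function_exists() guard.
function example_title_from_item(array $item) {
  if (function_exists('htmlspecialchars_decode')) {
    // PHP >= 5.1.0: decode the title before it is stored.
    return htmlspecialchars_decode($item['TITLE'], ENT_QUOTES);
  }
  // Older PHP: fall back to storing the title unchanged.
  return $item['TITLE'];
}

$title = example_title_from_item(array('TITLE' => 'Tom &amp; Jerry'));
echo $title, "\n"; // Tom & Jerry
```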

I'll give it a few weeks for testing, then if I don't hear anything I'll assume it's good to go and try to get it into the next patch release.

Thanks,

Jason

P.S. Thanks to domesticat for turning me on to WinMerge!

Comment #45

cvining commented 28 February 2009 at 20:06

Version:	6.8	» 6.10
Assigned:	csevb10	» jdefay

Jason,

Thanks much for the help, but after further testing & searching I'm pretty sure my problem is distinct from this thread.

A drupal thread (Aggregator Module Shows HTML Gibberish) which described my problem quite closely (dropping of key symbols like "&") is at this link: http://drupal.org/node/346990

It seems there is a known bug in one of the php libraries, libxml2 2.7.1, which strips out key html symbols (like <, >, & etc.). The bug is described here: http://bugs.php.net/bug.php?id=45996

My host's php installation has libxml2 2.7.2, which (if I'm reading the libxml release notes right) doesn't have the fix in it. But the latest release, libxml2 2.7.3, should. So only a few systems will be affected by this particular bug, and it will go away as the php libraries get updated.

In the meantime, I guess I'll use some workaround.

Thanks again.
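
For anyone wanting to check their own host, a small sketch of a version test. The affected range (2.7.1 up to, but not including, 2.7.3) is taken from the reading of the release notes in this comment and is an assumption, not something confirmed against libxml2 itself; the function name is hypothetical. LIBXML_DOTTED_VERSION is the constant PHP defines when the libxml extension is loaded.

```php
<?php
// Hypothetical check; the version range is an assumption based on this comment.
function example_libxml_possibly_affected($version) {
  return version_compare($version, '2.7.1', '>=')
    && version_compare($version, '2.7.3', '<');
}

// Report on the libxml2 version PHP was built against, if available.
if (defined('LIBXML_DOTTED_VERSION')) {
  $flag = example_libxml_possibly_affected(LIBXML_DOTTED_VERSION) ? 'possibly affected' : 'outside the assumed range';
  echo 'libxml2 ', LIBXML_DOTTED_VERSION, ': ', $flag, "\n";
}
```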

Comment #46

jdefay commented 18 March 2009 at 04:42

Status:	Needs work	» Reviewed & tested by the community

Status	File	Size
new	aggregator_final.patch	1.05 KB

I fear there may not be sufficient traffic on this thread to get additional patch testing done. After waiting a few weeks for testers to evaluate the final version of the patch, and not getting any responses, I've gone ahead and updated the status to "reviewed & tested by the community".

While nobody seems to have tested the final version, it was tested and improved by a number of users since it was first created in May of 2006 by Steve Dondley. Since those earlier iterations led us to this final version of the patch, I think it has been sufficiently tested to be committed to the next version of Drupal.

I apologize in advance for not being able to get other developers to provide a supporting opinion on the patch. However, I'm confident this patch will work just fine, particularly since the final patch checks to make sure the htmlspecialchars_decode() function is available on the server before running it.

Cheers,

Jason

Comment #47

jdefay commented 18 March 2009 at 16:54

Version:	6.10	» 6.x-dev

After re-reading the Drupal PATCH instructions, it appears that I should be upgrading this bug to Version "6.x-dev", so that's what I just did :-)

Comment #48

packdragon commented 25 March 2009 at 00:52

I've been having problems with my aggregator in that my feeds are displaying HTML code. I've got Drupal 6.10 installed. This patch sounds like exactly what I need. I clicked on the attachment, but it appears to display some text in my browser. How do I download and install this patch?

Comment #49

catch

he/him

English

commented 25 March 2009 at 01:19

Version:	6.x-dev	» 7.x-dev
Status:	Reviewed & tested by the community	» Needs work

The patch seems to be reversed - htmlspecialchars_decode() is removed, not added by #46.

Additionally the current development version of Drupal is 7.x rather than 6.x - where possible we fix issues in 7 first, then backport. http://api.drupal.org/api/function/aggregator_parse_feed hasn't changed that much (although it's in a different file now), so it'd be roughly the same change to make, although Drupal 7 requires PHP 5.2 or greater, so no need for the function_exists() there. Only did a very cursory review of the patch, but if you could re-roll against 7 that'd help to get some more reviewers.

Comment #50

alex_b commented 26 March 2009 at 03:13

Status:	Needs work	» Needs review

Status	File	Size
new	61456_html_decode_titles.patch	1.04 KB

This might work.

Comment #51

cburschka

they

commented 17 May 2009 at 09:24

Status:	Needs review	» Needs work

The patch looks good, and this is a great idea, but we definitely need a test case for it.

Comment #52

jdefay commented 11 July 2009 at 05:27

Status:	Needs work	» Needs review

Thanks for re-rolling this Alex_b, sounds like we're ready to test & patch

Comment #53

11 July 2009 at 05:50

Status:	Needs review	» Needs work

The last submitted patch failed testing.

Comment #54

jeffschuler

Boulder, Colorado

commented 17 August 2009 at 20:14

Assigned:	jdefay	» Unassigned
Status:	Needs work	» Needs review

Re-rolled for current D7 HEAD.

Comment #55

jeffschuler

Boulder, Colorado

commented 17 August 2009 at 20:15

Status	File	Size
new	61456_html_decode_titles_2.patch	1021 bytes

Re-rolled for current D7 HEAD.

Comment #56

26 August 2009 at 11:49

Status:	Needs review	» Needs work

The last submitted patch failed testing.

Comment #57

jeffschuler

Boulder, Colorado

commented 31 August 2009 at 20:34

Status:	Needs work	» Needs review

Testbot says that the Error handlers test failed. It passes on my machine.
Patch still applies to HEAD.
Re-setting to Needs Review for another shot.

Comment #58

31 December 2009 at 15:59

Re-test of 61456_html_decode_titles_2.patch from comment #55 was requested by Arancaytar.

Comment #59

MichaelCole commented 1 May 2010 at 21:53

#55: 61456_html_decode_titles_2.patch queued for re-testing.

Comment #60

s4j4n commented 28 June 2010 at 21:53

subscribing

Comment #61

gausarts commented 29 November 2010 at 11:14

I have my aggregator feed titles ended with  .
Subscribing. Thanks

Comment #62

jody lynn

she/her

English

commented 5 January 2011 at 21:34

Version:	7.x-dev	» 8.x-dev

Comment #63

agentrickard

he/him

English

Georgia (US)

commented 5 January 2011 at 21:43

So after 5 years, we should just declare that we don't care and close this.

Moving to D8 is pointless.

Comment #64

jody lynn

she/her

English

commented 5 January 2011 at 21:50

LOL, I don't think this one is really that bad, and it's finally in 'needs review'. (I was trying to update the versions on the more promising issues).

Comment #65

agentrickard

he/him

English

Georgia (US)

commented 6 January 2011 at 14:41

Version:	8.x-dev	» 7.0
Status:	Needs review	» Reviewed & tested by the community

Well, the patch is green, and should be RTBC, so let's give it a try for 7.1.0.

Comment #66

jeffschuler

Boulder, Colorado

commented 6 January 2011 at 22:25

Status:	Reviewed & tested by the community	» Active

I started writing a test for this for verification/assurance. The included patch adds this test but does not include the patch in #55, (nor any other proposed changes in this issue,) in order to allow testing with-and-without such changes.

The test in its current form adds a feed with an item whose title includes a quote (") and an ampersand (&).

However... these look to be displaying properly -- using SimpleTest and adding the feed manually -- without patching.
Am I testing this properly?
Is anyone actually seeing this error in D7?

I had some issues with testing the apostrophe ('). I see the correct output (manually and in SimpleTest's verbose results) when I use it in the test feed item's title, but assertText() doesn't seem to acknowledge it.

Changing this issue back from RTBC seems appropriate.

Comment #67

jeffschuler

Boulder, Colorado

commented 6 January 2011 at 22:27

Status	File	Size
new	61456_html_decode_titles_test.diff	2.19 KB

sorry, here's the test...

Comment #68

agentrickard

he/him

English

Georgia (US)

commented 6 January 2011 at 22:44

@jeffschuler

Now if you can just combine the patch and the test into one, set the issue to Needs Review and let TestBot try it out.

Comment #69

jeffschuler

Boulder, Colorado

commented 6 January 2011 at 22:54

@agentrickard:

Sorry if I wasn't clear: the test passes without applying the patch.

We should demonstrate that something is broken, [change the test to do so,] before fixing it.

Comment #70

agentrickard

he/him

English

Georgia (US)

commented 7 January 2011 at 14:28

So you are suggesting that this actually got fixed somewhere else?

Comment #71

agentrickard

he/him

English

Georgia (US)

commented 7 January 2011 at 14:29

Status:	Active	» Needs review

Marking 'needs review' to test the test patch.

Comment #72

jeffschuler

Boulder, Colorado

commented 7 January 2011 at 16:57

Either it's been fixed elsewhere or I'm not testing it properly.

I just tried the same test feed (generated in #67) in Drupal 6 and the entities displayed properly. So, perhaps I'm not testing the correct conditions?

The test uses a manually-generated RSS 0.91 feed (copied from the existing RSS091 test file) whose only item's title includes &amp; and &quot; entities.

I've added items to this feed using the title phrase "It&#39;s an Honor" (from comment #1), as well as a few other examples folks cited as problems in this issue, and they all displayed properly in D6 and D7.

Maybe someone experiencing this issue could point us to a feed they're having a problem with?

Comment #73

agentrickard

he/him

English

Georgia (US)

commented 7 January 2011 at 17:09

The test passes and accounts for the original condition, so I suppose this was fixed elsewhere and we can close this.

If it recurs, we can re-open?

Comment #74

jeffschuler

Boulder, Colorado

commented 8 January 2011 at 03:52

Status:

Needs review

» Closed (fixed)

Sounds reasonable to me.

Comment #75

smscotten commented 9 August 2011 at 15:43

Version:	7.0	» 7.7
Status:	Closed (fixed)	» Active

Patch in #55 worked for me, mostly. This suggests to me that the result of #66 is a faulty test condition, not that the problem magically fixed itself. Testing against an external site known to cause the problem may be a better metric than creating a local feed to test against. (Of course, maybe something else has changed between 7.0 and 7.7)

I said the patch "mostly" worked for me. ’ still appears in my feed titles, but that's a htmlspecialchars_decode() issue and may be outside the scope of this issue.

Comment #76

jeffschuler

Boulder, Colorado

commented 9 August 2011 at 16:09

smscotten: can you point us to the feed you're having issues with?

Comment #77

smscotten commented 9 August 2011 at 20:28

Sorry, that probably should have been one of the first things I put in the post.

Feed: http://www.politifact.com/feeds/statements/truth-o-meter/

And here on my site: http://splicer.com/aggregator/sources/19

Ugh. Looked at the source. Looks like they have double-converted their characters to character entities, resulting in &quote; all through, as described in reply #1.

So the patch in #55 is a workaround, not a fix, because the behavior in the aggregator isn't wrong.

Um... right?

EDIT TO ADD: I sent mail to politifact.com and got a response back saying they were looking into it so it may not even show the issue much longer.

Comment #78

David_Rothstein commented 23 December 2011 at 05:40

Title:	Aggregator titles display quotes and other characters with HTML entity equivalents badly	» Aggregator titles display quotes and other characters with HTML entity equivalents badly (write tests)
Version:	7.7	» 8.x-dev
Category:	bug	» task
Status:	Active	» Needs work

I saw this issue recently also, but same thing as above; it turned out that the feed itself (on the source site) was incorrectly escaping its titles. So there's no bug here for Drupal to fix, at least not in D7/D8.

If an actual bug is still reproducible in Drupal 6 (i.e. with a feed that is formatted correctly on the source site but broken when imported to Drupal), someone could feel free to set this issue back to 6.x.

In the meantime, the tests in #67 look pretty solid so it's worth getting the patch rerolled and committed; tests are good regardless of whether the bug itself exists...
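
To make the distinction above concrete (a feed that is escaped correctly versus one the source site escaped twice), here is a minimal standalone PHP sketch. It does not use Drupal: it assumes check_plain() behaves like htmlspecialchars() with ENT_QUOTES and UTF-8, and the sample feeds and the helper name example_parsed_title() are hypothetical.

```php
<?php

/**
 * Hypothetical helper: returns the item title as an XML parser sees it.
 */
function example_parsed_title($xml) {
  $doc = simplexml_load_string($xml);
  return (string) $doc->channel->item->title;
}

// Title escaped once for XML, as a well-formed feed would do.
$good = '<rss version="0.91"><channel><item><title>Quote&quot; Amp&amp;</title></item></channel></rss>';
// Title escaped twice by the publisher (the broken case).
$bad = '<rss version="0.91"><channel><item><title>Quote&amp;quot; Amp&amp;amp;</title></item></channel></rss>';

// The XML parser removes exactly one level of escaping.
$good_title = example_parsed_title($good);
$bad_title = example_parsed_title($bad);

// Escape for HTML output, standing in for check_plain() (assumption).
$good_html = htmlspecialchars($good_title, ENT_QUOTES, 'UTF-8');
$bad_html = htmlspecialchars($bad_title, ENT_QUOTES, 'UTF-8');

print $good_html . "\n";
print $bad_html . "\n";
```

A browser shows the first line as the intended quote and ampersand, and the second as literal entity text, even though the escaping step treated both titles identically.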

Comment #79

jeffschuler

Boulder, Colorado

commented 24 December 2011 at 00:10

Status:

Needs work

» Needs review

Status	File	Size
new	61456-79-core-aggregator-title_entities_test.patch	2.1 KB

Cool. Rerolled #67 for 8.x.

Comment #80

David_Rothstein commented 14 January 2012 at 23:26

Status:

Needs review

» Reviewed & tested by the community

Status	File	Size
new	61456-80-core-aggregator-title_entities_test.patch	2.08 KB

Patch looks good to me. The tests pass, and they're completely consistent with the way the other tests in the same area of the code are written.

I did a quick reroll to remove the "No newline at end of file" issue from the newly-added file. This should be good to go.

Comment #81

David_Rothstein commented 14 January 2012 at 23:29

Status	File	Size
new	61456-80-core-aggregator-title_entities_test.patch	2.08 KB

Ah, but someone is going to come point out that Drupal documentation standards now require the PHPDoc to say "Tests a feed.." rather than "Test a feed.." :)

Fixing that in the attached.

Comment #82

xjm

she/her

English

commented 15 January 2012 at 07:41

Haha, you heard me coming! But also....

+++ b/core/modules/aggregator/aggregator.test
@@ -914,4 +918,15 @@ class FeedParserTestCase extends AggregatorTestCase {
+  /**
+  * Tests a feed that uses HTML entities in item titles.
+  */

Indentation is goofed on the second and third lines. Needs one more space before the asterisks.

Comment #83

xjm

she/her

English

commented 15 January 2012 at 07:49

Status	File	Size
new	61456-83.patch	2.08 KB

Fixing that. No commit credit please. :P

Comment #84

xjm

she/her

English

commented 15 January 2012 at 07:52

Issue tags:

+Needs backport to D7

Oh, and yeah.

Comment #85

dries commented 25 January 2012 at 02:28

Version:

8.x-dev

» 7.x-dev

Committed the tests to 8.x. Moving to 7.x.

Comment #86

dave reid

he/him

English

Nebraska USA

commented 25 January 2012 at 15:13

Title:	Aggregator titles display quotes and other characters with HTML entity equivalents badly (write tests)	» [ROLLBACK] Aggregator titles display quotes and other characters with HTML entity equivalents badly (write tests)
Version:	7.x-dev	» 8.x-dev
Priority:	Normal	» Major

This appears to have broken tests. Failure on "Quote" Amp&" found http://qa.drupal.org/8.x-status

Comment #87

andypost

he/him

Russian

commented 25 January 2012 at 15:18

Title:

[ROLLBACK] Aggregator titles display quotes and other characters with HTML entity equivalents badly (write tests)

» Aggregator titles display quotes and other characters with HTML entity equivalents badly (write tests)

--- /dev/null
+++ b/core/modules/aggregator/tests/aggregator_test_title_entities.xml

This file was not committed.

Comment #88

dave reid

he/him

English

Nebraska USA

commented 25 January 2012 at 16:07

Title:	Aggregator titles display quotes and other characters with HTML entity equivalents badly (write tests)	» [BROKEN HEAD] Aggregator titles display quotes and other characters with HTML entity equivalents badly (write tests)
Priority:	Major	» Critical

Comment #89

sun

German

Karlsruhe

commented 25 January 2012 at 19:05

+++ b/core/modules/aggregator/aggregator.test
@@ -914,4 +918,15 @@ class FeedParserTestCase extends AggregatorTestCase {
+    $this->assertText("Quote&quot; Amp&amp;");

Why are you testing the page output that has been converted to plain-text?

When testing HTML escaping/sanitization/entities, use assertRaw() to check the actual, unprocessed content in the page output.

assertText() runs the entire page output through filter_xss(), which performs a plethora of HTML escaping, entity conversions, and validations.
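
The raw check recommended above amounts to an exact substring match against the unprocessed HTML. The following standalone PHP sketch shows that idea only; it is not SimpleTest's implementation, and the page fragments and the helper name example_contains_raw() are hypothetical.

```php
<?php

/**
 * Hypothetical helper: TRUE if $needle occurs verbatim in the raw page.
 */
function example_contains_raw($page, $needle) {
  return strpos($page, $needle) !== FALSE;
}

// Hypothetical raw output for a title that was escaped exactly once.
$raw = '<h3><a href="http://example.com/1">Quote&quot; Amp&amp;</a></h3>';
// Hypothetical raw output if the same title had been escaped a second time.
$raw_double = '<h3><a href="http://example.com/1">Quote&amp;quot; Amp&amp;amp;</a></h3>';

// The markup we expect for the plain-text title: Quote" Amp&
$expected = htmlspecialchars('Quote" Amp&', ENT_QUOTES, 'UTF-8');

$found = example_contains_raw($raw, $expected);
$found_double = example_contains_raw($raw_double, $expected);

var_dump($found, $found_double);
```

Because nothing is decoded or filtered before the comparison, the check passes only when the markup has exactly one level of escaping.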

Comment #90

pillarsdotnet commented 25 January 2012 at 19:31

Status:

Reviewed & tested by the community

» Needs review

Patch suggested by #87 and #89.

Comment #91

pillarsdotnet commented 25 January 2012 at 19:33

Status	File	Size
new	core-aggregator-title_entities-test-61456-91.patch	2.06 KB

Patch for real this time.

Comment #92

pillarsdotnet commented 25 January 2012 at 19:39

Category:	task	» bug
Status:	Needs review	» Reviewed & tested by the community

Note that since HEAD is broken, this is a critical bug.

Comment #93

andypost

he/him

Russian

commented 25 January 2012 at 20:00

Category:	bug	» task
Status:	Reviewed & tested by the community	» Needs review

+1 to commit asap

# drush test-run FeedParserTestCase
Feed parser functionality 62 passes, 0 fails, 0 exceptions, and 15 debug messages                              [ok]
No leftover tables to remove.                                                                                  [status]
No temporary directories to remove.                                                                            [status]
Removed 1 test result.                                                                                         [status]

Comment #94

andypost

he/him

Russian

commented 25 January 2012 at 20:01

Status:

Needs review

» Reviewed & tested by the community

Comment #95

pillarsdotnet commented 25 January 2012 at 20:16

Category:

task

» bug

Comment #96

dries commented 25 January 2012 at 20:59

Sorry about that. I just committed the patch in #91. I think that should fix HEAD. Let's see ...

Comment #97

dries commented 25 January 2012 at 20:59

Priority:	Critical	» Normal
Status:	Reviewed & tested by the community	» Fixed

Comment #98

David_Rothstein commented 25 January 2012 at 21:15

Title:	[BROKEN HEAD] Aggregator titles display quotes and other characters with HTML entity equivalents badly (write tests)	» Aggregator titles display quotes and other characters with HTML entity equivalents badly (write tests)
Version:	8.x-dev	» 7.x-dev
Category:	bug	» task
Status:	Fixed	» Patch (to be ported)

Looks good - seems like we can still use a patch for D7.

Regarding assertText() vs assertRaw()... either should be fine here since it's just a text string. Leaving it as assertText() actually has the nice property that it verifies that filter_xss() doesn't do anything funny to &quot; or &amp; (which it shouldn't), but I suppose that's an extremely backwards way to test that :) Either one is OK.

Comment #99

albert volkman commented 26 January 2012 at 17:08

Status:

Patch (to be ported)

» Needs review

Status	File	Size
new	core_aggregator_title_entities_test-61456-d7-99.patch	2.08 KB

D7 backport. I *think* I got it right.

Comment #100

26 January 2012 at 18:33

Status:

Needs review

» Needs work

The last submitted patch, core_aggregator_title_entities_test-61456-d7-99.patch, failed testing.

Comment #101

albert volkman commented 26 January 2012 at 19:04

Status:

Needs work

» Needs review

Status	File	Size
new	core_aggregator_title_entities_test-61456-d7-101.patch	2.05 KB

Trying again.

Comment #102

David_Rothstein commented 26 January 2012 at 19:24

Should probably use assertRaw() rather than assertText() to be consistent with what went into Drupal 8.

Comment #103

albert volkman commented 26 January 2012 at 19:28

Status	File	Size
new	core_aggregator_title_entities_test-61456-d7-103.patch	2.04 KB

Good call.

Comment #104

patcher commented 15 February 2012 at 20:11

Does anyone know whether there is a similar patch for the Feeds module? I am importing an XML file with Feeds and the HTML entities are not shown correctly in the title.

Comment #105

xjm

she/her

English

commented 15 February 2012 at 22:22

@patcher: This is a core issue. Try looking in the feeds module queue.

Comment #106

aascherson commented 18 August 2012 at 13:30

Status	File	Size
new	sigh.jpg	349.11 KB

Hey guys,

Long time Drupal user but not a pro, so maybe I am doing something dumb here. I am getting the tags in my titles from a Google Alert feed. Been looking everywhere for a solution to this and trying to apply this patch to Drupal 7.15 core.
Using git, I get a fail on:

checking patch modules/aggregator /aggregator.test ...

I get a file not found error. So it finds the patch, but not the test file that I can see there in the directory. I am new to git; am I doing something stupid? Any guidance would be really appreciated as this is driving me nuts. I have tried to see if I can apply the patch manually but can't seem to work out what needs to go where on this one. Apologies if this is a newb moment.

For reference I am running this from the modules/aggregator directory, which is also where the patch is. Tried running from /test as well but same problem.

Comment #107

lundj commented 16 November 2012 at 13:47

Is there any solution in one of the last versions of the aggregator.module?

In version 7.17 I have the same problem with Facebook-RSS-Feeds: http://cl.ly/image/220U1V26180j on http://junge-akademie-wittenberg.de/projekt/sprache-und-politik (right sidebar at the end of the page)
The related Feed for this one: https://www.facebook.com/feeds/page.php?id=301588419892418&format=rss20

Is there any solution for this?

Comment #108

albert volkman commented 16 November 2012 at 14:46

Status	File	Size
new	core_aggregator_title_entities_test-61456-d7-108.patch	2.01 KB

Re-roll.

Comment #109

16 November 2012 at 15:06

Status:

Needs review

» Needs work

The last submitted patch, core_aggregator_title_entities_test-61456-d7-108.patch, failed testing.

Comment #110

albert volkman commented 16 November 2012 at 19:44

Status:

Needs work

» Needs review

Status	File	Size
new	core_aggregator_title_entities_test-61456-d7-110.patch	2.04 KB

Oops.

Comment #111

fbreckx commented 21 November 2012 at 09:57

Issue tags:

-Needs backport to D7

#110: core_aggregator_title_entities_test-61456-d7-110.patch queued for re-testing.

Comment #112

fbreckx commented 21 November 2012 at 09:57

Issue tags:

+Needs backport to D7

#110: core_aggregator_title_entities_test-61456-d7-110.patch queued for re-testing.

Comment #113

rtrubshaw commented 13 December 2012 at 10:27

Not proud of this, but since the views that referenced the aggregator feeds already had template files I simply ran the entire output through strtr().

E.g. in "views-view--twitter-feed.tpl.php":

<ul><?php print strtr( $rows, array( '&amp;amp;' => '&amp;', '&amp;#' => '&#', ) ); ?></ul>

(and similar code for other full feed template files)

I found that Twitter, Facebook and Flickr have all - at times - stuck HTML entities in the title. YMMV
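
As a rough illustration of the template workaround above, here is a standalone PHP sketch that collapses one level of escaping with strtr(). It assumes the intent of the replacement map was to turn double-escaped entities back into single-escaped ones; the sample $rows markup is hypothetical.

```php
<?php

// Hypothetical rendered rows containing double-escaped entities.
$rows = '<li>Fish &amp;amp; Chips</li><li>It&amp;#39;s an Honor</li>';

// Assumed replacement map: collapse one level of escaping.
$map = array(
  // A double-escaped ampersand becomes a single-escaped one.
  '&amp;amp;' => '&amp;',
  // An escaped numeric entity becomes a real numeric entity.
  '&amp;#' => '&#',
);

// strtr() tries the longest key first and never rescans replaced text.
$fixed = strtr($rows, $map);

print '<ul>' . $fixed . '</ul>' . "\n";
```

Note that this rewrites every match in the output, so a title that legitimately wants to display the literal characters "&#" would be altered as well.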

Comment #114

chirikamo17640 commented 23 April 2013 at 02:37

#55: 61456_html_decode_titles_2.patch queued for re-testing.

Comment #115

liam morland

English

Ontario, CA 🇨🇦

commented 7 June 2013 at 19:26

Comment #116

ggevalt commented 18 June 2013 at 23:53

Wow.
This has to be one of the most incredible threads on Drupal forums.
The problem began in 2006 on Drupal 4 and folks seem to be still focusing on it for Drupal 8.
I'm in Drupal 6, I have the issue, and I see no resolution.
If I have missed something, please advise or provide a link as to the solution for Drupal 6 folks which still represent a substantial number of people.
Thanks so much for all you do.
cheers,
g

Comment #117

webengr commented 17 December 2013 at 16:22

Any fix for this? I agree, it's hard to believe Drupal 7 is doing this with a standard install.

Comment #118

albert volkman commented 20 December 2013 at 03:28

@ggevalt Issues are fixed upstream first so as to avert regressions.

@webengr Have you tested my latest patch? When a patch is in "Needs Review", it's helpful to test out the patch, report back your findings, and possibly mark the issue as RTBC.

Comment #119

mimes commented 30 December 2013 at 05:40

Issue summary:

View changes

Status	File	Size
new	core_aggregator_title_entities_test-61456-screenshot_results-119.png	23.42 KB

Test patch in #110 applied.
Results:

Comment #120

albert volkman commented 30 December 2013 at 15:59

110: core_aggregator_title_entities_test-61456-d7-110.patch queued for re-testing.

Comment #121

albert volkman commented 2 January 2014 at 17:25

@mimes That's odd, it's still passing on the bot. Do you have a vanilla setup?

Comment #122

guillaumeduveau

Toulouse

commented 2 February 2014 at 15:39

Mmmm... spent a while on this. I could certainly help with testing the test patch, but what I believe is that it's a real-world situation we have with feeds that already have HTML entities in them, like the original author of the issue said in comment #1 7.5 years ago. I can't really see how the test helps with that. Having an option to html_entity_decode() per feed item was proposed and forgotten. I'd happily submit a patch to include that option if any core maintainer gives me the green light.

Otherwise here is a solution to html_entity_decode the titles without hacking Core. We define and use a new parser thanks to hook_aggregator_parse, in a custom module "mymodule". Basically it's exactly the same parser except that we replace in the copy of aggregator_parse_feed() :

      $item['title'] = $item['title'];

with :

      $item['title'] = html_entity_decode($item['title'], ENT_QUOTES, 'UTF-8');
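
The effect of that one-line change can be seen in isolation with the standalone PHP sketch below. It is not part of the parser: the sample title is hypothetical, and htmlspecialchars() with ENT_QUOTES and UTF-8 stands in for the check_plain() escaping applied on output (an assumption about where the title is later escaped).

```php
<?php

// Title as the XML parser delivers it for a double-escaped feed
// (hypothetical sample: the feed source contained "It&amp;#39;s an Honor").
$title = 'It&#39;s an Honor';

// Without decoding, output escaping turns the leftover entity into text.
$escaped_as_is = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

// With decoding first, the entity becomes a real apostrophe, which is
// then escaped once on output.
$decoded = html_entity_decode($title, ENT_QUOTES, 'UTF-8');
$escaped_after_decode = htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8');

print $escaped_as_is . "\n";
print $decoded . "\n";
print $escaped_after_decode . "\n";
```

Decoding also turns entities such as &lt; back into real markup characters, so this approach presumably stays safe only as long as the title is still escaped when it is printed.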

You have to choose the new parser on admin/config/services/aggregator/settings. Here is the code of mymodule.module:

<?php

/**
 * @file
 * MyModule.
 */

/**
 * Implements hook_init().
 */
function mymodule_init() {
  // Load the core parser so its element handlers are available.
  // Note: I tried to move this into mymodule_aggregator_parse(), but then
  // admin/config/services/aggregator/settings doesn't show the parser
  // select options.
  module_load_include('inc', 'aggregator', 'aggregator.parser');
}

/**
 * Implements hook_aggregator_parse_info().
 */
function mymodule_aggregator_parse_info() {
  return array(
    'title' => t('MyModule parser'),
    'description' => t('Parses RSS, Atom and RDF feeds, decodes html entities.'),
  );
}

/**
 * Implements hook_aggregator_parse().
 */
function mymodule_aggregator_parse($feed) {
  global $channel, $image;

  // Filter the input data.
  if (mymodule_parse_feed($feed->source_string, $feed)) {
    $modified = empty($feed->http_headers['last-modified']) ? 0 : strtotime($feed->http_headers['last-modified']);

    // Prepare the channel data.
    foreach ($channel as $key => $value) {
      $channel[$key] = trim($value);
    }

    // Prepare the image data (if any).
    foreach ($image as $key => $value) {
      $image[$key] = trim($value);
    }

    $etag = empty($feed->http_headers['etag']) ? '' : $feed->http_headers['etag'];

    // Add parsed data to the feed object.
    $feed->link = !empty($channel['link']) ? $channel['link'] : '';
    $feed->description = !empty($channel['description']) ? $channel['description'] : '';
    $feed->image = !empty($image['url']) ? $image['url'] : '';
    $feed->etag = $etag;
    $feed->modified = $modified;

    // Clear the cache.
    cache_clear_all();

    return TRUE;
  }

  return FALSE;
}

/**
 * Parses a feed and stores its items.
 *
 * @param $data
 *   The feed data.
 * @param $feed
 *   An object describing the feed to be parsed.
 *
 * @return
 *   FALSE on error, TRUE otherwise.
 */
function mymodule_parse_feed(&$data, $feed) {
  global $items, $image, $channel;

  // Unset the global variables before we use them.
  unset($GLOBALS['element'], $GLOBALS['item'], $GLOBALS['tag']);
  $items = array();
  $image = array();
  $channel = array();

  // Parse the data.
  $xml_parser = drupal_xml_parser_create($data);
  xml_set_element_handler($xml_parser, 'aggregator_element_start', 'aggregator_element_end');
  xml_set_character_data_handler($xml_parser, 'aggregator_element_data');

  if (!xml_parse($xml_parser, $data, 1)) {
    watchdog('mymodule', 'The feed from %site seems to be broken, due to an error "%error" on line %line.', array('%site' => $feed->title, '%error' => xml_error_string(xml_get_error_code($xml_parser)), '%line' => xml_get_current_line_number($xml_parser)), WATCHDOG_WARNING);
    drupal_set_message(t('The feed from %site seems to be broken, because of error "%error" on line %line.', array('%site' => $feed->title, '%error' => xml_error_string(xml_get_error_code($xml_parser)), '%line' => xml_get_current_line_number($xml_parser))), 'error');
    return FALSE;
  }
  xml_parser_free($xml_parser);

  // We reverse the array such that we store the first item last, and the last
  // item first. In the database, the newest item should be at the top.
  $items = array_reverse($items);

  // Initialize items array.
  $feed->items = array();
  foreach ($items as $item) {

    // Prepare the item:
    foreach ($item as $key => $value) {
      $item[$key] = trim($value);
    }

    // Resolve the item's title. If no title is found, we use up to 40
    // characters of the description ending at a word boundary, but not
    // splitting potential entities.
    if (!empty($item['title'])) {
      $item['title'] = html_entity_decode($item['title'], ENT_QUOTES, 'UTF-8');
    }
    elseif (!empty($item['description'])) {
      $item['title'] = preg_replace('/^(.*)[^\w;&].*?$/', "\\1", truncate_utf8($item['description'], 40));
    }
    else {
      $item['title'] = '';
    }

    // Resolve the items link.
    if (!empty($item['link'])) {
      $item['link'] = $item['link'];
    }
    else {
      $item['link'] = $feed->link;
    }

    // Atom feeds have an ID tag instead of a GUID tag.
    if (!isset($item['guid'])) {
      $item['guid'] = isset($item['id']) ? $item['id'] : '';
    }

    // Atom feeds have a content and/or summary tag instead of a description tag.
    if (!empty($item['content:encoded'])) {
      $item['description'] = $item['content:encoded'];
    }
    elseif (!empty($item['summary'])) {
      $item['description'] = $item['summary'];
    }
    elseif (!empty($item['content'])) {
      $item['description'] = $item['content'];
    }

    // Try to resolve and parse the item's publication date.
    $date = '';
    foreach (array('pubdate', 'dc:date', 'dcterms:issued', 'dcterms:created', 'dcterms:modified', 'issued', 'created', 'modified', 'published', 'updated') as $key) {
      if (!empty($item[$key])) {
        $date = $item[$key];
        break;
      }
    }

    $item['timestamp'] = strtotime($date);

    if ($item['timestamp'] === FALSE) {
      $item['timestamp'] = aggregator_parse_w3cdtf($date); // Aggregator_parse_w3cdtf() returns FALSE on failure.
    }

    // Resolve dc:creator tag as the item author if author tag is not set.
    if (empty($item['author']) && !empty($item['dc:creator'])) {
      $item['author'] = $item['dc:creator'];
    }

    $item += array('author' => '', 'description' => '');

    // Store on $feed object. This is where processors will look for parsed items.
    $feed->items[] = $item;
  }

  return TRUE;
}

Comment #123

Irene Meisel commented 2 February 2014 at 21:08

Status:

Needs review

» Reviewed & tested by the community

Tested patch #110; it worked. No fatal error on Ubuntu. New test ran and passed.

Comment #124

Irene Meisel commented 2 February 2014 at 21:14

Issue summary:

View changes

Comment #125

5 May 2014 at 16:00

Commit 478d1a0 on 7.x by David_Rothstein:

Issue #61456 by Albert Volkman, jeffschuler, David_Rothstein, jdefay,...

Comment #126

David_Rothstein commented 5 May 2014 at 16:01

Status:

Reviewed & tested by the community

» Fixed

Committed #110 to 7.x - thanks!

(And I switched t() to format_string() in the test assertion message on commit, since that was changed everywhere in core a long time ago.)

I'm really unclear whether there are any functional bugs left to fix here (especially in Drupal 6) but someone can reopen/retitle or create a new issue if there are.

Comment #127

19 May 2014 at 16:00

Status:

Fixed

» Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Comment #128

gojensen commented 8 September 2014 at 20:01

Maybe I missed something, this still didn't work for me using a facebook page feed... Had to replace the check_plain calls with aggregator_filter_xss as mentioned in post #25. This was running on v7.31. (I could see everything from patch #110 was added, but didn't seem to resolve this specific issue...)

Another problem I have is that when I import the feeds "something" parses the content/description and changes all URLs from absolute URLs (i.e. http://facebook.com/ ...) to relative paths (i.e. /pagename/photos/image.jpg)... so when a user reads the post and wants to click a link or tiny image in the post they get a Drupal error about the page not being found. Does anyone know WHERE the script strips the URL?

Comment #129

gojensen commented 8 September 2014 at 20:28

Status:

Closed (fixed)

» Needs review

Comment #130

liam morland

English

Ontario, CA 🇨🇦

commented 8 September 2014 at 21:08

Status:

Needs review

» Active

The "needs review" status is for issues that have a patch which might fix the issue.

Comment #131

mediaformat commented 6 October 2014 at 18:08

Seems fixed for the <item><title>, but not for the <channel><title>

Comment #132

3 August 2016 at 00:04

Dries committed a084058 on 8.3.x

- Patch #61456 by jeffschuler, edmund.kwok, jdefay, David_Rothstein,...

Dries committed a775f05 on 8.3.x

- Patch #61456 by jeffschuler, edmund.kwok, jdefay, David_Rothstein,...

Dries committed d78f6a6 on 8.3.x

- Patch #61456 by jeffschuler, edmund.kwok, jdefay, David_Rothstein,...

Comment #133

3 August 2016 at 00:05

Dries committed a084058 on 8.3.x

- Patch #61456 by jeffschuler, edmund.kwok, jdefay, David_Rothstein,...

Dries committed a775f05 on 8.3.x

- Patch #61456 by jeffschuler, edmund.kwok, jdefay, David_Rothstein,...

Dries committed d78f6a6 on 8.3.x

- Patch #61456 by jeffschuler, edmund.kwok, jdefay, David_Rothstein,...

Comment #134

27 January 2017 at 17:53

Dries committed a084058 on 8.4.x

- Patch #61456 by jeffschuler, edmund.kwok, jdefay, David_Rothstein,...

Dries committed a775f05 on 8.4.x

- Patch #61456 by jeffschuler, edmund.kwok, jdefay, David_Rothstein,...

Dries committed d78f6a6 on 8.4.x

- Patch #61456 by jeffschuler, edmund.kwok, jdefay, David_Rothstein,...

Comment #135

27 January 2017 at 17:52

Dries committed a084058 on 8.4.x

- Patch #61456 by jeffschuler, edmund.kwok, jdefay, David_Rothstein,...

Dries committed a775f05 on 8.4.x

- Patch #61456 by jeffschuler, edmund.kwok, jdefay, David_Rothstein,...

Dries committed d78f6a6 on 8.4.x

- Patch #61456 by jeffschuler, edmund.kwok, jdefay, David_Rothstein,...

Comment #136

20 March 2020 at 16:52

Dries committed a084058 on 9.1.x

- Patch #61456 by jeffschuler, edmund.kwok, jdefay, David_Rothstein,...

Dries committed a775f05 on 9.1.x

- Patch #61456 by jeffschuler, edmund.kwok, jdefay, David_Rothstein,...

Dries committed d78f6a6 on 9.1.x

- Patch #61456 by jeffschuler, edmund.kwok, jdefay, David_Rothstein,...

Comment #137

poker10 commented 5 December 2023 at 18:46

Status:	Active	» Fixed
Issue tags:	-Needs backport to D7

Moving back to Fixed. If there are any problems left, please create a new issue (feel free to link it here). Thanks.

Comment #138

19 December 2023 at 18:49

Status:

Fixed

» Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.
