Problem/Motivation

This is a followup from #3409587: [10.2 regression] RSS feeds invalid due to &nbsp;.

RSS feeds are now valid but have a warning on the W3C feed validator:

This feed is valid, but interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

line 6, column 42: description should not contain HTML: &amp;

<description>Training &amp;amp; Events</description>

Steps to reproduce

Create a feed in Views with an RSS channel description that contains an ampersand, e.g. "Training & Events". The channel description is a field in the Feed: Style options settings section in Views.

Checking the feed output against https://validator.w3.org produces a warning for the channel description line: "description should not contain HTML: &amp;". The RSS feed literally contains &amp;amp;, which is parsed into the human-readable text &amp;. It should contain &amp;, which is parsed as the human-readable text &.
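To make the difference concrete, here is a minimal sketch in Python (not Drupal code) of what a feed reader sees after XML parsing. The helper name `channel_description` and the stripped-down feed strings are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def channel_description(feed_xml: str) -> str:
    """Return the channel description text as seen after XML parsing."""
    root = ET.fromstring(feed_xml)
    return root.findtext("channel/description")


# Hypothetical minimal feeds: one double-escaped (the bug), one escaped once.
double_escaped = (
    "<rss><channel><description>Training &amp;amp; Events</description>"
    "</channel></rss>"
)
single_escaped = (
    "<rss><channel><description>Training &amp; Events</description>"
    "</channel></rss>"
)

print(channel_description(double_escaped))  # Training &amp; Events
print(channel_description(single_escaped))  # Training & Events
```

The double-escaped feed leaves a literal entity in the parsed text, which is what the validator flags as HTML in the description.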

Proposed resolution

The \Drupal\Core\EventSubscriber\RssResponseRelativeUrlFilter::transformRootRelativeUrlsToAbsolute() method processes all RSS feed description elements as markup. However, RSS has two different kinds of description elements: item description elements, which according to the RSS specs are interpreted as markup, and channel description elements, which are interpreted as human-readable. So that method should skip channel description elements.
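The idea of skipping channel descriptions can be sketched as follows. This is a Python illustration under stated assumptions, not the actual Drupal PHP implementation: the function name `absolutize_item_descriptions`, the regex-based URL rewrite, and the sample feed are all hypothetical. It only shows the selection being restricted to item descriptions.

```python
import re
import xml.etree.ElementTree as ET


def absolutize_item_descriptions(feed_xml: str, base_url: str) -> str:
    """Rewrite root-relative URLs in item descriptions only."""
    root = ET.fromstring(feed_xml)
    # Select only <item><description>; <channel><description> is plain
    # text and is deliberately left untouched.
    for desc in root.findall("./channel/item/description"):
        if desc.text:
            # Assumption: a simple regex stands in for real markup parsing.
            desc.text = re.sub(
                r'(href|src)="/(?!/)',
                lambda m: f'{m.group(1)}="{base_url}/',
                desc.text,
            )
    return ET.tostring(root, encoding="unicode")


feed = (
    "<rss><channel>"
    "<description>Training &amp; Events</description>"
    '<item><description>&lt;a href="/node/1"&gt;More&lt;/a&gt;</description></item>'
    "</channel></rss>"
)
result = absolutize_item_descriptions(feed, "https://example.com")
print(result)
```

In this sketch the channel description round-trips unchanged, while the link inside the item description becomes absolute.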

Remaining tasks

User interface changes

API changes

Data model changes

Issue fork drupal-3424768

Comments

OMD created an issue. See original summary.

cilefen’s picture

Title: RSS Feed interoperablity issue » Problematic XML characters are not escaped in views RSS feeds
Component: base system » views.module
Issue tags: -rss feed

So it should be &amp;, correct?

See https://www.w3.org/TR/REC-xml/#dt-chardata

mfb’s picture

What I found when trying to reproduce this issue is that Views outputs <channel><description>Training &amp;amp; Events</description></channel> as the feed description.

However, it is recommended to output <channel><description>Training &amp; Events</description></channel>

My interpretation of what the feed validator is saying is that it's recommended that a feed description be semantically considered to be plain text, and thus a user-entered feed description should be encoded once to be rendered in the feed description XML element: <channel><description>Training &amp; Events</description></channel>

An item description, on the other hand, could be semantically considered to be markup, thus a user-entered item description would be encoded once to render valid markup from the entered text, and a second time to be rendered in the item description XML element: <item><description>Training &amp;amp; Events</description></item>
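The two encodings described above can be sketched in Python (again, not Drupal code). The helper names `channel_description_element` and `item_description_element` are assumptions for illustration.

```python
from html import escape


def channel_description_element(text: str) -> str:
    """Plain text: escaped once, for the XML element."""
    return f"<description>{escape(text, quote=False)}</description>"


def item_description_element(text: str) -> str:
    """Markup: escaped once to form markup, then again for the XML element."""
    markup = escape(text, quote=False)
    return f"<description>{escape(markup, quote=False)}</description>"


user_input = "Training & Events"
print(channel_description_element(user_input))
# <description>Training &amp; Events</description>
print(item_description_element(user_input))
# <description>Training &amp;amp; Events</description>
```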

mfb’s picture

This does seem to be pretty heavily related to #3409587: [10.2 regression] RSS feeds invalid due to &nbsp; after all, although it's a separate bug in the same code.

It appears that \Drupal\Core\EventSubscriber\RssResponseRelativeUrlFilter::transformRootRelativeUrlsToAbsolute() is operating on channel description elements, but it should not, as these are considered to be human-readable plain text. It should only be operating on item description elements, which are considered to be markup.

mfb’s picture

Status: Active » Needs review
Issue tags: +Needs tests

@OMD can you test my attempted fix in MR 6842? If it resolves the warning, then we can update the issue summary, add a test, and reroll as a merge request on the 11.x branch.

cilefen’s picture

The & is being escaped twice?

mfb’s picture

@cilefen Yes, that's what I found when trying to reproduce the issue. If we confirm that the issue is basically the opposite of the title then I will update it :)

mfb’s picture

Issue tags: -Needs tests

Added unit test coverage for &amp; in channel description element.

smustgrave’s picture

Version: 10.2.x-dev » 11.x-dev
Status: Needs review » Needs work

From reading the issue summary provided I believe @mfb you were correct in your assumption.

So can the issue summary be updated to match? Also, the MR should probably be pointed to 11.x.

Thanks.

mfb’s picture

Title: Problematic XML characters are not escaped in views RSS feeds » Channel description of RSS feeds is double-escaped
Component: views.module » base system
Issue summary: View changes
Status: Needs work » Needs review
Issue tags: -Needs issue summary update

smustgrave’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: +Needs Review Queue Initiative

Thanks @mfb!

Ran the test-only feature here https://git.drupalcode.org/issue/drupal-3424768/-/jobs/983276 which showed the test failure.

The change to the loop makes sense and fixes the issue following the scenario described.

longwave’s picture

Version: 11.x-dev » 10.2.x-dev
Status: Reviewed & tested by the community » Fixed

The findings here match with the comment in template_preprocess_views_view_rss():

  // The RSS 2.0 "spec" doesn't indicate HTML can be used in the description.
  // We strip all HTML tags, but need to prevent double encoding from properly
  // escaped source data (such as &amp becoming &amp;amp;).
  $variables['description'] = Html::decodeEntities(strip_tags($style->getDescription()));

Committed and pushed e8db570e86 to 11.x and 3ff7664833 to 10.3.x and e851b33905 to 10.2.x. Thanks!

  • longwave committed e851b339 on 10.2.x
    Issue #3424768 by mfb, OMD, cilefen, smustgrave: Channel description of...

  • longwave committed 3ff76648 on 10.3.x
    Issue #3424768 by mfb, OMD, cilefen, smustgrave: Channel description of...

  • longwave committed e8db570e on 11.x
    Issue #3424768 by mfb, OMD, cilefen, smustgrave: Channel description of...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.