Beta phase evaluation

Reference: https://www.drupal.org/core/beta-changes
Issue category Bug because valid HTML in RSS feeds is rendered unescaped.
Issue priority Major because this breaks RSS feeds that involve a field that allows HTML, including the default site RSS feed (/rss.xml).
Prioritized changes Yes: bug fix.
Disruption None.

Problem/Motivation

HTML in views feeds isn't escaped anymore at all which completely breaks the feed.

It looks like this:

<description><span data-quickedit-field-id="node/1/title/en/rss" class="field field-node--title field-name-title field-type-string field-label-hidden">RSS Feed test</span>
<span data-quickedit-field-id="node/1/uid/en/rss" class="field field-node--uid field-name-uid field-type-entity-reference field-label-hidden"><a title="View user profile." href="/user/1" lang="" about="/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">admin</a></span>
<span data-quickedit-field-id="node/1/created/en/rss" class="field field-node--created field-name-created field-type-created field-label-hidden">Fri, 06/05/2015 - 08:44</span>
</description>

Additional information

The W3C's RSS 2.0 spec implies that the description field is the only field that may have "entity-encoded HTML."

Proposed resolution

The RSS description element should be converted to a string and automatically escaped by Twig when sent to the template so it appears as escaped HTML and not part of the DOM. With the move towards as-late-as-possible rendering, this will be done in the template_preprocess_views_view_row_rss().

Remaining tasks

  1. Moot: Need feedback on #8, specifically the last blockquote and the paragraph following it. The one that starts with "I don't understand this question..." :)
  2. Add tests to prevent regression. Done: #2.
  3. Determine if additional tests are needed for the Comment and Aggregator module changes (there are no existing tests for RSS feeds for the Aggregator module and very limited for the Comment module). See #8. Moved to a followup: #2600922: Followup: Add more tests for RSS feeds to the Comment and Aggregator modules

User interface changes

None.

API changes

None.

Why this should be an RC target

Without this fix, the default site-wide RSS feed (/rss.xml) is broken and will not validate using the W3C's feed validation tool. A partial workaround is to rebuild the default RSS feed using fields and strip HTML from the field assigned to the description element. I say "partial" because there are many feed readers that will render properly escaped HTML and this workaround leaves us with plain text.

Disruptions for this change are minimal. Contrib modules that provide RSS display or row plugins will need to ensure they are passing a render array for the description element. Core modules that provide those plugins (Comment and Aggregator) are updated with this patch.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

mikeker’s picture

Assigned: Unassigned » mikeker
mikeker’s picture

Assigned: mikeker » Unassigned
Status: Active » Needs review
FileSize
6.61 KB
8.22 KB

I've added some very basic tests to ensure this doesn't regress again.

For now, I'm checkPlain'ing the <description> element. As far as I can tell, that's the only element that's supposed to have encoded HTML in it.

The last submitted patch, 2: 2500931-2-unescaped-HTML-in-feeds-tests-only.patch, failed testing.

Berdir’s picture

Status: Needs review » Reviewed & tested by the community

Nice work!

Tested this on my site, works well. Has tests, can't see anything to complain about. Fix makes sense.

Ashutosh.tripathi’s picture

I have also tested this patch #2. its working fine.

Berdir’s picture

Status: Reviewed & tested by the community » Needs work

It seems that I was a bit too eager to see this committed.

* We should add a very explicit comment why we have to do this. Because it must be encoded as it is being inlined into an XML document and shouldn't be treated as part of the structure.
* We also need to update the comment and aggregator (I think?) rss plugins in the same way.

dawehner’s picture

  • We should document, why we need it
  • We need to change also the comment subclass of \Drupal\views\Plugin\views\row\RssPluginBase
  • Given that the views render caching changes these lines and moves the drupal_render_call indirectly to the rss template, does that mean we need to manually escape it there?
mikeker’s picture

Addressing issues from both #6 and #7:

We should add a very explicit comment why we have to do this. Because it must be encoded as it is being inlined into an XML document and shouldn't be treated as part of the structure.

Good call -- I wouldn't know why I made that change if I came across this two months from now! :)

Done.

We need to change also the comment subclass of \Drupal\views\Plugin\views\row\RssPluginBase

We also need to update the comment and aggregator (I think?) rss plugins in the same way

According to this, there are three classes that subclass RssPluginBase:

  • core/modules/aggregator/src/Plugin/views/row/Rss.php
  • core/modules/node/src/Plugin/views/row/Rss.php
  • core/modules/comment/src/Plugin/views/row/Rss.php

The second was covered by the patch in #2. The attached patch takes care of the first and third.

Given that the views render caching changes these lines and moves the drupal_render_call indirectly to the rss template, does that mean we need to manually escape it there?

I don't understand this question... If you're asking "could we do this is the Twig template instead of in preprocess or in the plugin->render() function," then the answer is yes. But it felt cleaner to do it earlier in the render pipeline to provide fewer places for someone to accidentally undo the escaping. Also keeps the Twig template cleaner as we don't need to escape there (see core/modules/views/templates/views-view-row-rss.html.twig)

If not, sorry, I don't understand the question. :/

Finally, I don't have any tests in place for the newly added Comment and Aggregator module changes. The only thing in the current test suite for the Aggregator module that hits this code is in the pager test (testFeedPage). Comment module has RowRssTest, which at least is testing content, but only the publication date.

Is content verification something that needs to be added in the scope of this issue? Or can we push that into a followup?

mikeker’s picture

FileSize
2.87 KB

And the interdiff.... Ooops.

mikeker’s picture

Status: Needs work » Needs review

And setting status.

It's Monday...

mikeker’s picture

Issue summary: View changes

Updated issue summary.

mikeker’s picture

The patch in #8 was broken by the changes introduced in #2381277: Make Views use render caching and remove Views' own "output caching". So now I'm taking a different approach since the description field is now a render array -- using a #post_render to call SafeMarkup::checkPlain.

This will not affect the newly added caching from the above issue. From the API docs, "if this element has #cache defined, the rendered output of this element is saved to Renderer::render()'s internal cache. This includes the changes made by #post_render." Woo-hoo! :)

Note: The interdiff is after rerolling #8 against the latest HEAD.

mikeker’s picture

Issue summary: View changes

Added Beta eval.

mikeker’s picture

Issue summary: View changes
dawehner’s picture

+++ b/core/modules/aggregator/src/Plugin/views/row/Rss.php
@@ -50,8 +51,17 @@ public function render($row) {
+        // The description is the only place where we should find HTML.
+        // Escape it so that it appears as text rather than part of the DOM.
+        // @see https://validator.w3.org/feed/docs/rss2.html#hrelementsOfLtitemgt
+        $item->description = SafeMarkup::checkPlain($field->value);

Maybe a totally stupid question ...
What is the reason that it is not escaped as part of the template?

mikeker’s picture

What is the reason that it is not escaped as part of the template?

No reason other than I figured it was better to do it earlier in the render pipeline. It does make for fewer code changes to do it in the template (two vs. four).

Thoughts?

Berdir’s picture

This is killing me :p

Looks like something changed again and this is being escaped again correctly in HEAD and this patch results in this being double-escaped.

No idea why the tests here are still passing?

mikeker’s picture

@Berdir: I'm still seeing the original bug in the latest HEAD (ba5a3d2e178f1, which seems to be the same as beta13) and the test from #2 still fails unless the patch is applied. Post-patch, I get this:

Though this is killing me as well -- I just spent 10 minutes looking for escaped HTML instead of UN-escaped HTML while doing git bisect! :)

mikeker’s picture

FileSize
45.43 KB

Goofed the image upload...

banviktor’s picture

The contents of description don't always get sanitized.
For example with this setting:

Format: RSS Feed | Settings
Show: Content | Full content

Wouldn't it be better to sanitize the contents of description in the template preprocessor functions? That's how it's done for the feed's description field, I think it should be followed for feed rows too.

banviktor’s picture

^ This is when applied on beta12.
Haven't tried beta13 yet.

mod: The bug still exists in 8.0.0-beta13

Berdir’s picture

Oh. Yes, I think this problem still exists. *But* I think it is hidden for me due to #2542764: Views rewritten row output: html is escaped, as I am using rewrite to combine multiple fields into one.

othermachines’s picture

Since #12 patch relies heavily on SafeMarkup::checkPlain() I thought this might be relevant?

Issue #: Deprecate SafeMarkup::checkPlain() for Drupal 8.0.x

In #2545972: Remove all usages SafeMarkup::checkPlain() and rely more on Twig autoescaping we remove all usages in core. We should rely on Twig auto escape and completely remove SafeMarkup::checkPlain(). SafeMarkup::checkPlain() is dangerous because if the resulting string is modified in any way, for example nl2br(), the result will no longer be marked safe. In places where explicit escaping is needed, Html::escape() should be used.

cilefen’s picture

Status: Needs review » Needs work

This needs work based on #25.

mikeker’s picture

Assigned: Unassigned » mikeker

@othermachines: Thanks for pointing that out. This issue had fallen off my radar...

Rerolling.

mikeker’s picture

Assigned: mikeker » Unassigned
Status: Needs work » Needs review
FileSize
11.59 KB
3.07 KB

For now, pretty much swap of checkPlain() for Html::escape().

jhedstrom’s picture

Issue tags: +Needs reroll

Is this still an issue now that #2545972: Remove all code usages SafeMarkup::checkPlain() and rely more on Twig autoescaping is in? If so, patch needs rerolling.

jhedstrom’s picture

Assigned: Unassigned » jhedstrom

Verified this is still an issue, rerolling now.

jhedstrom’s picture

jhedstrom’s picture

Assigned: jhedstrom » Unassigned
dawehner’s picture

+++ b/core/modules/aggregator/src/Plugin/views/row/Rss.php
@@ -50,8 +51,17 @@ public function render($row) {
+        // Otherwise Twig handles it via auto-escape.
+        // @see core/modules/views/templates/views-view-row-rss.html.twig
+        $item->{$name} = $field->value;
+      }

Its a little bit odd to rely on autoescaping of HTML entities inside a xml document. Are we sure this is the right strategy?

Berdir’s picture

Its a little bit odd to rely on autoescaping of HTML entities inside a xml document. Are we sure this is the right strategy?

No, it's definitely weird. But we're using twig to build XML, so what else can we do? :)

mgifford queued 31: 2500931-31.patch for re-testing.

mikeker’s picture

I reread this issue and talked to @joelpittet at the PNWDS this weekend and we both agree that the current approach in this issue is wrong. Instead, we should be building render arrays in the various RSS plugins and rendering those as strings during preprocess so that Twig will autoescape them when passing them to the template.

I think the reason this issue appeared and disappeared over the months is that we're not clear when we are doing the escaping. Doing it as late as possible -- in the preprocess -- will hopefully clear things up.

joelpittet’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: +rc target triage

This seems to be the correct approach as it ensures twig escaping will encode the the HTML every time and deals with the render arrays being passed along.

xjm’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs issue summary update

Thanks for tagging this for rc target triage! To have committers consider it for inclusion in RC, we should add a statement to the issue summary of why we need to make this change during RC, including what happens if we do not make the change and what any disruptions from it are. We can add a section <h3>Why this should be an RC target</h3> to the summary.

mikeker’s picture

Issue summary: View changes

Updated issue summary. Also added RC target section as requested in #38.

Berdir’s picture

Issue summary: View changes
Status: Needs work » Reviewed & tested by the community
Issue tags: -Needs issue summary update

Looks good to me, just removed a single word. Back to RTBC so that this can be evaluated.

As the issue summary says, it is currently impossible to have formatted RSS feeds, which prevents very common standard use cases (like an image + description feed.

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: -rc target triage +rc target, +Needs change record

Discussed with the committers and we agreed this was an rc target. The approach looks correct reading the link in the code and things like http://bit.ly/1GC66PE.

Given

Disruptions for this change are minimal. Contrib modules that provide RSS display or row plugins will need to ensure they are passing a render array for the description element. Core modules that provide those plugins (Comment and Aggregator) are updated with this patch.

I think we need a change record.

mikeker’s picture

Assigned: Unassigned » mikeker

Working on the CR.

mikeker’s picture

Assigned: mikeker » Unassigned
Status: Needs work » Reviewed & tested by the community
Issue tags: -Needs change record
Wim Leers’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/modules/views/tests/modules/views_test_config/test_views/views.view.test_display_feed.yml
@@ -90,10 +164,17 @@ display:
+      cacheable: false

@@ -115,14 +196,29 @@ display:
+      cacheable: false
...
+      cacheable: false

These are deprecated, we should not add them.

mikeker’s picture

Status: Needs work » Needs review
FileSize
11.64 KB
1.97 KB

@Wim Leers, Thanks for the review. I've removed the cacheable entries and round-tripped the view through the latest D8 head to get the latest bits and pieces in the export.

Berdir’s picture

Status: Needs review » Reviewed & tested by the community

Looks good to me again.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed 28f5b70 and pushed to 8.0.x. Thanks!

  • alexpott committed 30c6960 on 8.0.x
    Issue #2500931 by mikeker, jhedstrom, Berdir: Views feed doesn't encode...
dcam’s picture

I was testing #2552037-16: Aggregator RSS feed outputting empty feed items last night and noticed that the description elements weren't appearing in Aggregator's RSS feed. I backtracked the issue to here. This issue did not really update Aggregator's plugin, contrary to what's mentioned in the issue summary. Its plugin is not yet providing descriptions as render arrays. This makes me wonder whether Comment's RSS feed is having the same problem, but I haven't checked.

I'm not sure it's worth re-opening this issue to fix Aggregator. You can't even see the problem until #2552037: Aggregator RSS feed outputting empty feed items is fixed because the whole RSS feed is currently broken. A proper test can't be written for the other issue until the description is fixed, nor could a test be written for the description until the whole feed is fixed. Due to the catch-22, I'm going to propose that the description render array be added as part of the other issue, even though they really ought to be separate issues. Otherwise, there's just no way to test either of them.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Jabastin Arul’s picture

Team,

This issue again happen latest drupal 8 versin. So I will reopen this issue.

Thanks,
Jabastin

Jabastin Arul’s picture

Priority: Major » Critical
Jabastin Arul’s picture

cilefen’s picture

Priority: Critical » Major

Please open a new issue.