Beta phase evaluation
Issue category | Bug because valid HTML in RSS feeds is rendered unescaped. |
---|---|
Issue priority | Major because this breaks RSS feeds that involve a field that allows HTML, including the default site RSS feed (/rss.xml ). |
Prioritized changes | Yes: bug fix. |
Disruption | None. |
Problem/Motivation
HTML in views feeds isn't escaped anymore at all which completely breaks the feed.
It looks like this:
<description><span data-quickedit-field-id="node/1/title/en/rss" class="field field-node--title field-name-title field-type-string field-label-hidden">RSS Feed test</span>
<span data-quickedit-field-id="node/1/uid/en/rss" class="field field-node--uid field-name-uid field-type-entity-reference field-label-hidden"><a title="View user profile." href="/user/1" lang="" about="/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">admin</a></span>
<span data-quickedit-field-id="node/1/created/en/rss" class="field field-node--created field-name-created field-type-created field-label-hidden">Fri, 06/05/2015 - 08:44</span>
</description>
Additional information
The W3C's RSS 2.0 spec implies that the description field is the only field that may have "entity-encoded HTML."
Proposed resolution
The RSS description element should be converted to a string and automatically escaped by Twig when sent to the template so it appears as escaped HTML and not part of the DOM. With the move towards as-late-as-possible rendering, this will be done in the template_preprocess_views_view_row_rss()
.
Remaining tasks
- Moot:
Need feedback on #8, specifically the last blockquote and the paragraph following it. The one that starts with "I don't understand this question..." :) Add tests to prevent regression.Done: #2.Determine if additional tests are needed for the Comment and Aggregator module changes (there are no existing tests for RSS feeds for the Aggregator module and very limited for the Comment module). See #8. Moved to a followup: #2600922: Followup: Add more tests for RSS feeds to the Comment and Aggregator modules
User interface changes
None.
API changes
None.
Why this should be an RC target
Without this fix, the default site-wide RSS feed (/rss.xml) is broken and will not validate using the W3C's feed validation tool. A partial workaround is to rebuild the default RSS feed using fields and strip HTML from the field assigned to the description
element. I say "partial" because there are many feed readers that will render properly escaped HTML and this workaround leaves us with plain text.
Disruptions for this change are minimal. Contrib modules that provide RSS display or row plugins will need to ensure they are passing a render array for the description
element. Core modules that provide those plugins (Comment and Aggregator) are updated with this patch.
Comment | File | Size | Author |
---|---|---|---|
#45 | 2500931-36-45.interdiff.txt | 1.97 KB | mikeker |
#45 | 2500931-45-unescaped-html-in-rss.patch | 11.64 KB | mikeker |
#36 | 2500931-31-36.interdiff.txt | 7.16 KB | mikeker |
#36 | 2500931-36-unescaped-html-in-rss.patch | 11.32 KB | mikeker |
#31 | 2500931-31.patch | 11.54 KB | jhedstrom |
Comments
Comment #1
mikeker CreditAttribution: mikeker as a volunteer commentedComment #2
mikeker CreditAttribution: mikeker as a volunteer commentedI've added some very basic tests to ensure this doesn't regress again.
For now, I'm
checkPlain
'ing the<description>
element. As far as I can tell, that's the only element that's supposed to have encoded HTML in it.Comment #4
BerdirNice work!
Tested this on my site, works well. Has tests, can't see anything to complain about. Fix makes sense.
Comment #5
Ashutosh.tripathi CreditAttribution: Ashutosh.tripathi as a volunteer commentedI have also tested this patch #2. its working fine.
Comment #6
BerdirIt seems that I was a bit too eager to see this committed.
* We should add a very explicit comment why we have to do this. Because it must be encoded as it is being inlined into an XML document and shouldn't be treated as part of the structure.
* We also need to update the comment and aggregator (I think?) rss plugins in the same way.
Comment #7
dawehner\Drupal\views\Plugin\views\row\RssPluginBase
Comment #8
mikeker CreditAttribution: mikeker as a volunteer commentedAddressing issues from both #6 and #7:
Good call -- I wouldn't know why I made that change if I came across this two months from now! :)
Done.
According to this, there are three classes that subclass
RssPluginBase
:The second was covered by the patch in #2. The attached patch takes care of the first and third.
I don't understand this question... If you're asking "could we do this is the Twig template instead of in preprocess or in the plugin->render() function," then the answer is yes. But it felt cleaner to do it earlier in the render pipeline to provide fewer places for someone to accidentally undo the escaping. Also keeps the Twig template cleaner as we don't need to escape there (see core/modules/views/templates/views-view-row-rss.html.twig)
If not, sorry, I don't understand the question. :/
Finally, I don't have any tests in place for the newly added Comment and Aggregator module changes. The only thing in the current test suite for the Aggregator module that hits this code is in the pager test (
testFeedPage
). Comment module hasRowRssTest
, which at least is testing content, but only the publication date.Is content verification something that needs to be added in the scope of this issue? Or can we push that into a followup?
Comment #9
mikeker CreditAttribution: mikeker as a volunteer commentedAnd the interdiff.... Ooops.
Comment #10
mikeker CreditAttribution: mikeker as a volunteer commentedAnd setting status.
It's Monday...
Comment #11
mikeker CreditAttribution: mikeker as a volunteer commentedUpdated issue summary.
Comment #12
mikeker CreditAttribution: mikeker as a volunteer commentedThe patch in #8 was broken by the changes introduced in #2381277: Make Views use render caching and remove Views' own "output caching". So now I'm taking a different approach since the description field is now a render array -- using a
#post_render
to callSafeMarkup::checkPlain
.This will not affect the newly added caching from the above issue. From the API docs, "if this element has #cache defined, the rendered output of this element is saved to Renderer::render()'s internal cache. This includes the changes made by #post_render." Woo-hoo! :)
Note: The interdiff is after rerolling #8 against the latest HEAD.
Comment #13
mikeker CreditAttribution: mikeker as a volunteer commentedAdded Beta eval.
Comment #14
mikeker CreditAttribution: mikeker as a volunteer commentedComment #15
dawehnerMaybe a totally stupid question ...
What is the reason that it is not escaped as part of the template?
Comment #16
mikeker CreditAttribution: mikeker as a volunteer commentedNo reason other than I figured it was better to do it earlier in the render pipeline. It does make for fewer code changes to do it in the template (two vs. four).
Thoughts?
Comment #18
BerdirThis is killing me :p
Looks like something changed again and this is being escaped again correctly in HEAD and this patch results in this being double-escaped.
No idea why the tests here are still passing?
Comment #19
mikeker CreditAttribution: mikeker as a volunteer commented@Berdir: I'm still seeing the original bug in the latest HEAD (ba5a3d2e178f1, which seems to be the same as beta13) and the test from #2 still fails unless the patch is applied. Post-patch, I get this:
Though this is killing me as well -- I just spent 10 minutes looking for escaped HTML instead of UN-escaped HTML while doing git bisect! :)
Comment #20
mikeker CreditAttribution: mikeker as a volunteer commentedGoofed the image upload...
Comment #21
banviktor CreditAttribution: banviktor commentedThe contents of description don't always get sanitized.
For example with this setting:
Wouldn't it be better to sanitize the contents of description in the template preprocessor functions? That's how it's done for the feed's description field, I think it should be followed for feed rows too.
Comment #22
banviktor CreditAttribution: banviktor commented^ This is when applied on beta12.
Haven't tried beta13 yet.
mod: The bug still exists in 8.0.0-beta13
Comment #23
BerdirOh. Yes, I think this problem still exists. *But* I think it is hidden for me due to #2542764: Views rewritten row output: html is escaped, as I am using rewrite to combine multiple fields into one.
Comment #25
othermachines CreditAttribution: othermachines commentedSince #12 patch relies heavily on
SafeMarkup::checkPlain()
I thought this might be relevant?Issue #: Deprecate SafeMarkup::checkPlain() for Drupal 8.0.x
Comment #26
cilefen CreditAttribution: cilefen as a volunteer commentedThis needs work based on #25.
Comment #27
mikeker CreditAttribution: mikeker as a volunteer commented@othermachines: Thanks for pointing that out. This issue had fallen off my radar...
Rerolling.
Comment #28
mikeker CreditAttribution: mikeker as a volunteer commentedFor now, pretty much swap of checkPlain() for Html::escape().
Comment #29
jhedstromIs this still an issue now that #2545972: Remove all code usages SafeMarkup::checkPlain() and rely more on Twig autoescaping is in? If so, patch needs rerolling.
Comment #30
jhedstromVerified this is still an issue, rerolling now.
Comment #31
jhedstromReroll of #28.
Comment #32
jhedstromComment #33
dawehnerIts a little bit odd to rely on autoescaping of HTML entities inside a xml document. Are we sure this is the right strategy?
Comment #34
BerdirNo, it's definitely weird. But we're using twig to build XML, so what else can we do? :)
Comment #36
mikeker CreditAttribution: mikeker as a volunteer commentedI reread this issue and talked to @joelpittet at the PNWDS this weekend and we both agree that the current approach in this issue is wrong. Instead, we should be building render arrays in the various RSS plugins and rendering those as strings during preprocess so that Twig will autoescape them when passing them to the template.
I think the reason this issue appeared and disappeared over the months is that we're not clear when we are doing the escaping. Doing it as late as possible -- in the preprocess -- will hopefully clear things up.
Comment #37
joelpittetThis seems to be the correct approach as it ensures twig escaping will encode the the HTML every time and deals with the render arrays being passed along.
Comment #38
xjmThanks for tagging this for rc target triage! To have committers consider it for inclusion in RC, we should add a statement to the issue summary of why we need to make this change during RC, including what happens if we do not make the change and what any disruptions from it are. We can add a section
<h3>Why this should be an RC target</h3>
to the summary.Comment #39
mikeker CreditAttribution: mikeker as a volunteer commentedUpdated issue summary. Also added RC target section as requested in #38.
Comment #40
BerdirLooks good to me, just removed a single word. Back to RTBC so that this can be evaluated.
As the issue summary says, it is currently impossible to have formatted RSS feeds, which prevents very common standard use cases (like an image + description feed.
Comment #41
alexpottDiscussed with the committers and we agreed this was an rc target. The approach looks correct reading the link in the code and things like http://bit.ly/1GC66PE.
Given
I think we need a change record.
Comment #42
mikeker CreditAttribution: mikeker as a volunteer commentedWorking on the CR.
Comment #43
mikeker CreditAttribution: mikeker as a volunteer commentedDraft CR: https://www.drupal.org/node/2601028
Comment #44
Wim LeersThese are deprecated, we should not add them.
Comment #45
mikeker CreditAttribution: mikeker as a volunteer commented@Wim Leers, Thanks for the review. I've removed the
cacheable
entries and round-tripped the view through the latest D8 head to get the latest bits and pieces in the export.Comment #46
BerdirLooks good to me again.
Comment #47
alexpottCommitted 28f5b70 and pushed to 8.0.x. Thanks!
Comment #49
dcam CreditAttribution: dcam as a volunteer commentedI was testing #2552037-16: Aggregator RSS feed outputting empty feed items last night and noticed that the description elements weren't appearing in Aggregator's RSS feed. I backtracked the issue to here. This issue did not really update Aggregator's plugin, contrary to what's mentioned in the issue summary. Its plugin is not yet providing descriptions as render arrays. This makes me wonder whether Comment's RSS feed is having the same problem, but I haven't checked.
I'm not sure it's worth re-opening this issue to fix Aggregator. You can't even see the problem until #2552037: Aggregator RSS feed outputting empty feed items is fixed because the whole RSS feed is currently broken. A proper test can't be written for the other issue until the description is fixed, nor could a test be written for the description until the whole feed is fixed. Due to the catch-22, I'm going to propose that the description render array be added as part of the other issue, even though they really ought to be separate issues. Otherwise, there's just no way to test either of them.
Comment #51
Jabastin Arul CreditAttribution: Jabastin Arul commentedTeam,
This issue again happen latest drupal 8 versin. So I will reopen this issue.
Thanks,
Jabastin
Comment #52
Jabastin Arul CreditAttribution: Jabastin Arul commentedComment #53
Jabastin Arul CreditAttribution: Jabastin Arul commentedhttps://www.drupal.org/project/drupal/issues/3001874#comment-13595732
Comment #54
cilefen CreditAttribution: cilefen as a volunteer commentedPlease open a new issue.