Problem/Motivation

Drupal RSS feeds have html elements in the description field for each node. This is not allowed, all blocks of html codes should be contained within a CDATA tag.

You can read more at this site, http://webservices.xml.com/pub/a/ws/2002/11/19/rssfeedquality.html.

RSS Standard 2.0
Rss Best Practices for element description

Steps to reproduce

Install drupal
Visit example.com/rss.xml
See double encoding of description content.

Proposed resolution

Wrap rss in CDATA
Introduce event subscriber RssResponseCdata to wrap description in CDATA

Remaining tasks

N/A

User interface changes

N/A

API changes

The RssResponseCdata filter for rss now places html in CDATA properly

Data model changes

N/A

Release notes snippet

Issue fork drupal-3433

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

Anonymous’s picture

In Drupal 4.1.0 the workaround for this issue is editing /usr/share/drupal/include/common.inc and remove all html_entities() in format_rss_item() and format_rss_channel. You can't remove algo strip_tags() because the XML generated won't be valid. I don't know if this issue has been already fixed in Drupal 4.3.0 as I'm waiting for the Debian maintainer to upload the new package. I'll try then.

Steven’s picture

The problem is not malformed XML as far as I can see: all entitities relevant to XML are escaped (as < > ...). HTML entities are doubley escaped (< into <). Drupal outputs correct XML.

It's a question of RSS quality and the general recommendation for using CDATA though: it offers advantages in terms of filesize.

killes@www.drop.org’s picture

Title: Malformed XML Feeds » Use CDATA in XML Feeds (was: Malformed XML Feeds)
Category: bug » feature

Steven says this isn't a bug. Changing title.

drumm’s picture

If this problem is what I think it is then you might want to look at the fixentities filter module I made. It replaces un-entity-coded less than signs and ampersands with the proper codes if they are not part of tags or entity codes. This fixes some validity issues with invalid user input in both the feed and xhtml view.

el777’s picture

CDATA helps us to save a space and get document more readable. But I see here one pitfall. What if our feed contains substring ]]> in it? So feeb become broken?

LAsan’s picture

Version: » 7.x-dev

Is this a feature request?

Still applies to current version?

mdupont’s picture

Version: 7.x-dev » 8.x-dev

Bumping to 8.x-dev

EvanDonovan’s picture

Title: Use CDATA in XML Feeds (was: Malformed XML Feeds) » Use CDATA in XML RSS Feeds (was: Malformed XML Feeds)
Issue tags: +Needs markup review

Does anyone know what the current best practice is for RSS feeds serialized in XML? CDATA seems to make sense to me, as per the OP from 2003, but double entity-encoding appears to be more common, and is used in Views as well.

It would be good to have someone assess this issue who is still active with Drupal (i.e., not Steven). Tagging with "Needs markup review" accordingly.

jhedstrom’s picture

Version: 8.0.x-dev » 8.1.x-dev
Issue summary: View changes
Status: Active » Postponed (maintainer needs more info)
Issue tags: +Needs issue summary update

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.0-beta1 was released on March 2, 2016, which means new developments and disruptive changes should now be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

james.williams’s picture

Component: base system » views.module
Category: Feature request » Bug report
Status: Postponed (maintainer needs more info) » Active

Stumbled across this today, on investigating code blocks within RSS feeds. If a node has this content in a field, included in an item of an RSS feed:

< pre>< code>&lt;a href=&quot;/contact&quot;&gt;Enquire&lt;/a&gt;< /code>< /pre>

(Spaces deliberately added in the code & pre tags here, just to demonstrate the markup, as it doesn't seem possible to nest them in a comment on drupal.org! The actual content would not have these.)

(This example comes from an article I wrote - the first codeblock here: https://www.computerminds.co.uk/articles/open-links-popups-foundation?so... - I'm trying to show the actual HTML that should be used for a link.)

The version in the RSS feed source decodes the &quot; entities to be actual double-quotes ("). It leaves the &gt; and &lt; angle brackets within the code tag as they were, even though the ones on the wrapping pre and code elements are encoded. So the interpreted result for RSS readers is:

< pre>< code><a href="/contact">Enquire</a>< /code>< /pre>

So readers actually render this as a link, within a styled pre/code block, rather than the plain HTML code, which is definitely not wanted.

I suspect wrapping the whole content in CDATA tags would be the best way forward, otherwise Drupal is always going to be trying to fudge the desired result, when CDATA is designed exactly for this.

Otherwise, how else can code blocks within content be included correctly in an RSS feed?

-----

Phew, getting the right encoding written into this comment is hard work!

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

larowlan’s picture

Issue tags: +Bug Smash Initiative
larowlan’s picture

Issue tags: +Novice

I think that we can do this in template_preprocess_views_view_row_rss by wrapping the description - as that is the only field that should contain HTML. I think this is possibly now a novice issue, tagging as such.

lendude’s picture

Yeah that already has the following comment:
// The description is the only place where we should find HTML.
// @see https://validator.w3.org/feed/docs/rss2.html#hrelementsOfLtitemgt
// If we have a render array, render it here and pass the result to the
// template, letting Twig autoescape it.

Which was done in #2500931: Views feed doesn't encode embedded HTML anymore, there in no talk of cdata there, but some test coverage was added for it in \Drupal\views\Tests\Plugin\DisplayFeedTest which we would need to update

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

larowlan’s picture

Issue tags: +DrupalGov 2020
oceanic’s picture

Assigned: Unassigned » oceanic
oceanic’s picture

Assigned: oceanic » Unassigned
lendude’s picture

I tried to reproduce the issues with double escaping and other malformed things, but as far as I can tell, that doesn't happen anymore.

So supporting CDATA vs escaping HTML feels like a feature and not a bug.

lendude’s picture

Category: Bug report » Feature request
Status: Active » Postponed (maintainer needs more info)

I've tested this further and the google RSS reader has no trouble converting the escaped HTML back to readable HTML at least, so to me, there is no bug here.

So for now I'm moving this to a feature request to use CDATA over escaped HTML, but also postponing on getting some more information on why using CDATA over escaped HTML would be preferable.

quietone’s picture

Issue summary: View changes
mradcliffe’s picture

I am removing the Novice tag from this issue because it looks like @quietone updated the issue summary to reflect why it's been postponed. I don't think there's anything further here to do for a novice without further details.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

smustgrave’s picture

Status: Postponed (maintainer needs more info) » Closed (outdated)

If still a valid feature request please reopen updating issue summary per #28.

Also could be unrelated but #3259255: Html::escapeCdataElement() not adding CDATA correctly was committed so wonder if that solved anything.

Thanks!

fgm’s picture

Status: Closed (outdated) » Active

Technically, as I understand the RSS 2.0 standard, HTML in the items should (must ?) be escaped, and should still be in a CDATA section: I had to look this up when upgrading the G2 module for Drupal 10/9.

This being said, comment #28 mentions Google Reader, which is no longer a thing, but all current RSS readers I've used (only a few, admittedly) handle this "incorrect" escaped HTML correctly these days.

I've reset the issue to active because it still feels like we should have a clearly defined, standards-backed answer, instead of just some "it works" anecdotal info.

luke.stewart’s picture

Assuming this validator is valid...

https://validator.w3.org/feed/check.cgi?url=drupal.org%2Fnode%2Ffeed

As of now reports that this is a valid feed.
It recommends the following:
description should not contain iframe tag
description should not contain relative URL references:
Missing atom:link with rel="self"

Does this mean that this issue can be closed? (Again?)

james.williams’s picture

@luke.stewart If you’re seeing “description should not contain iframe tag”, then no, this is still a problem to fix. Drupal is outputting HTML in the description elements which needs escaping via CDATA tags.

nicxvan made their first commit to this issue’s fork.

nicxvan’s picture

I've been following this for a while. I found this: https://www.drupal.org/project/views_rss/issues/3409451 and the related with more discussion: https://www.drupal.org/project/views_rss/issues/3079683

I am going to create an issue fork and just wrap the description in CDATA and see if any tests complain to see if we can move this forward.

nicxvan’s picture

Aside from being curious if this will have any failures I do have two implementation questions.

1. Do we need to detect if there is html in the description before adding the CDATA wrapper, my understanding is it is not harmful to add the CDATA wrapper everywhere.
2. Do we need to wrap the main description in views-view-rss.html.twig? It was done in the issue I linked above, but I don't think that is actually necessary.

nicxvan’s picture

Looks like there was at least one failure in:

Testing Drupal\Tests\node\Functional\NodeRSSContentTest
    .E                                                                  2 / 2
    (100%)
    
    Time: 00:14.744, Memory: 4.00 MB
    
    There was 1 error:
    
    1) Drupal\Tests\node\Functional\NodeRSSContentTest::testUrlHandling
    Behat\Mink\Exception\ExpectationException: The string
    "http://localhost/subdirectory/sites/simpletest/14042105/files/root-relative"
    was not found anywhere in the HTML response of the current page.
    
    /builds/issue/drupal-3433/vendor/behat/mink/src/WebAssert.php:888
    /builds/issue/drupal-3433/vendor/behat/mink/src/WebAssert.php:363
    /builds/issue/drupal-3433/core/tests/Drupal/Tests/WebAssert.php:547
    /builds/issue/drupal-3433/core/modules/node/tests/src/Functional/NodeRSSContentTest.php:120
    /builds/issue/drupal-3433/vendor/phpunit/phpunit/src/Framework/TestResult.php:728

I can look at this later if someone else doesn't get to it before me.
There was another failure in AJAX but I think it was unrelated because rerunning it succeeded.

duaelfr’s picture

I did the exact same change in my own views-view-row-rss.html.twig for both the title and the description.
While there is no issue with the title, the description gets filtered somehow.

I tried to duplicate the description field with another name:

<item>
  <title><![CDATA[{{ title }}]]></title>
  <link>{{ link }}</link>
  <description><![CDATA[{{ description }}]]></description>
  <anything_else_but_description><![CDATA[{{ description }}]]></anything_else_but_description>
  {% for item in item_elements -%}
    <{{ item.key }}{{ item.attributes -}}
    {% if item.value -%}
      >{{ item.value }}</{{ item.key }}>
    {% else -%}
      {{ ' />' }}
    {% endif %}
  {%- endfor %}
</item>

The "anything_else_but_description" field does not get filtered out.
Does anyone know where it could happen?

nicxvan’s picture

I'm not sure why that is happening, but you shouldn't need to add cdata to the title as I don't think that can be html.

nicxvan’s picture

I did some investigation, there is some process that is converting root relative and protocol relative urls to absolute in the twig file when the cdata is not wrapping it.

Here are the assertions with comments alluding to this:
// Verify that root-relative URL is transformed to absolute.
$this->assertSession()->responseContains($file_url_generator->generateAbsoluteString('public://root-relative'));
// Verify that protocol-relative URL is left untouched.
$this->assertSession()->responseContains($protocol_relative_url);
// Verify that absolute URL is left untouched.
$this->assertSession()->responseContains($absolute_url);

I downloaded the artifacts and here are the urls after decoding:
Protocol-relative URL
Root-relative URL

I compared with a passing test to confirm those do get http://localhost. Does anyone know what mechanism is converting those to absolute urls?

I pushed up a description filtered with RAW to test how that gets output. For some reason these tests are not working locally. I'm confident this will still fail, but wanted to check.

nicxvan’s picture

That actually worked!

I'm not sure it actually solves the issue, I don't see the CDATA section properly.

Other modules that use xml with CDATA add it like this:
public function addCdata($cdata_text): void {
$node = dom_import_simplexml($this);
$no = $node->ownerDocument;
$node->appendChild($no->createCDATASection($cdata_text));
}

And builds it from a \SimpleXMLElement object first.

I'm not sure we need to do all of that. It's possible I'm looking at the wrong thing.

nicxvan’s picture

There is something weird going on.
With this in the template:

<description><![CDATA[{{ description|raw }}]]></description>
The output looks like this and is encoded, there is no cdata array in the output.

<description>
&lt;span&gt;test&lt;/span&gt;
&lt;span&gt;&lt;span&gt;admin&lt;/span&gt;&lt;/span&gt;
&lt;span&gt;&lt;time datetime="2023-11-19T02:52:51+00:00" title="Sunday, November 19, 2023 - 02:52"&gt;Sun, 11/19/2023 - 02:52&lt;/time&gt;
&lt;/span&gt;
</description>

I then changed the twig template to this to confirm that I was editing the right template:
<description>Before CDATA<![CDATA[{{ description|raw }}]]>After CDATA</description>

Now the output looks like this
The description content shows up twice for some reason, the before callout shows up once and the after callout shows up twice. The cdata array is there now too.

<description>Before CDATA
&lt;span&gt;adsf&lt;/span&gt;
&lt;span&gt;&lt;span&gt;admin&lt;/span&gt;&lt;/span&gt;
&lt;span&gt;&lt;time datetime="2023-11-19T02:52:58+00:00" title="Sunday, November 19, 2023 - 02:52"&gt;Sun, 11/19/2023 - 02:52&lt;/time&gt;
&lt;/span&gt;
After CDATA<![CDATA[
<span>adsf</span>
<span><span>admin</span></span>
<span><time datetime="2023-11-19T02:52:58+00:00" title="Sunday, November 19, 2023 - 02:52">Sun, 11/19/2023 - 02:52</time>
</span>
]]>After CDATA</description>
fgm’s picture

FWIW I temporarily had this problem of escaped content after the D9->D10 migration on https://osinet.fr/rss.xml, and is disappeared a few days ago when I upgraded to 10.2.4. Not sure what changed in core that could cause these differences.

duaelfr’s picture

Status: Active » Needs review

I traced down the CDATA removal to \Drupal\Core\EventSubscriber\RssResponseRelativeUrlFilter::transformRootRelativeUrlsToAbsolute()

I updated MR !7201 with a fix.

nicxvan’s picture

Great find!
Rerunning tests since they are failing, but I can't tell if they are germane.

nicxvan’s picture

Status: Needs review » Needs work

Functional test is failing, I can't take a look at the moment.

nicxvan’s picture

Ah it's the new test: Drupal\Tests\Core\EventSubscriber\RssResponseRelativeUrlFilter

nicxvan’s picture

I'm rerunning again, I'm slightly suspicious since the failing test is the one that was just updated, but running that test locally passes, trying to run all tests locally and see if I get failures.

nicxvan’s picture

Assigned: Unassigned » nicxvan

Ok I can replicate this locally now, I think I can fix this.

nicxvan’s picture

Assigned: nicxvan » Unassigned
Issue summary: View changes

Ok this test should run, it seems to put the closing CDATA tag on a new line.

I patched a site and ran the rss validator and it looks valid as far as the CDATA is concerned.
It shows this error: Missing atom:link with rel="self" which is unrelated to the issue being solved here.

I'll move to needs review once I confirm tests are passing.

nicxvan’s picture

Status: Needs work » Needs review
nicxvan’s picture

Issue summary: View changes
nicxvan’s picture

I am using this site: https://validator.w3.org/feed/#validate_by_input
Here is the test rss:

<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0" xml:base="https://mercury.ddev.site/">
  <channel>
    <title>Drush Site-Install</title>
    <link>https://mercury.ddev.site/</link>
    <description/>
    <language>en</language>
    
    <item>
  <title>test</title>
  <link>https://mercury.ddev.site/node/3</link>
  <description><![CDATA[
<span data-entity-title-editable-id="node/3">test</span>

<span><a title="View user profile." href="https://mercury.ddev.site/user/1">admin</a></span>

<span><time datetime="2024-04-10T19:11:26+00:00" title="Wednesday, April 10, 2024 - 19:11">Wed, 04/10/2024 - 19:11</time>
</span>

            <div class="text-content clearfix field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>asdf</p></div>
      ]]></description>
  <pubDate>Wed, 10 Apr 2024 19:11:26 +0000</pubDate>
    <dc:creator>admin</dc:creator>
    <guid isPermaLink="false">3 at https://mercury.ddev.site</guid>
    </item>

  </channel>
</rss>

If I remove the cdata tag and use this instead

<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0" xml:base="https://mercury.ddev.site/">
  <channel>
    <title>Drush Site-Install</title>
    <link>https://mercury.ddev.site/</link>
    <description/>
    <language>en</language>
    
    <item>
  <title>test</title>
  <link>https://mercury.ddev.site/node/3</link>
  <description>
<span data-entity-title-editable-id="node/3">test</span>

<span><a title="View user profile." href="https://mercury.ddev.site/user/1">admin</a></span>

<span><time datetime="2024-04-10T19:11:26+00:00" title="Wednesday, April 10, 2024 - 19:11">Wed, 04/10/2024 - 19:11</time>
</span>

            <div class="text-content clearfix field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>asdf</p></div>
      </description>
  <pubDate>Wed, 10 Apr 2024 19:11:26 +0000</pubDate>
    <dc:creator>admin</dc:creator>
    <guid isPermaLink="false">3 at https://mercury.ddev.site</guid>
    </item>

  </channel>
</rss>

The validator complains about the div and html as expected.

smustgrave’s picture

Status: Needs review » Needs work
Issue tags: +Needs change record

So as a template change will need to update the other views-view-row-rss.html.twig in core.

But also do a CR to announce to other themes to update their templates.

nicxvan’s picture

Got some new failures, I'm taking a look at them.

nicxvan’s picture

Here is the failing test:

[31mDrupal\Tests\views\Functional\Plugin\DisplayFeedTest           0 passes             1 exceptions             
[0mFATAL Drupal\Tests\views\Functional\Plugin\DisplayFeedTest: test runner returned a non-zero error code (2).
[31mDrupal\Tests\views\Functional\Plugin\DisplayFeedTest           0 passes   1 fails    

I am unable to replicate this failure locally at the moment.

nicxvan’s picture

I was able to replicate I am pushing up a fix:

I'm not sure if the updated test needs more work.

We also updated stable, let me know if that needs a follow up issue instead.

nicxvan’s picture

Status: Needs work » Needs review
nicxvan’s picture

Issue tags: -Needs change record
smustgrave’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: +Needs Review Queue Initiative

I reviewed this the other day and forgot to hit save :(

But tweaked the CR slightly to include an examples.

https://git.drupalcode.org/issue/drupal-3433/-/jobs/1313487 shows the test coverage

All core templates appear to be updated.

Believe this is good.

nicxvan’s picture

Status: Reviewed & tested by the community » Needs work

Temporarily moving this back to needs work while I double check something, I think I can simplify this a bit.

nicxvan’s picture

Status: Needs work » Needs review

Moving back to needs review since I changed the approach.

I think the raw filter was not appropriate and I realized we could always wrap the description in CDATA so I moved it to the url transform.

I then used XSS::filterAdmin to make sure no dangerous scripts get through.

I'll bump it back to needs work if the tests don't pass.

nicxvan’s picture

Status: Needs review » Needs work

Tests faililng

nicxvan’s picture

Status: Needs work » Needs review

I cleaned up the tests further since we are wrapping the description in CDATA we don't need an extra test case, we just need to update the feed test to account for CDATA and ensure the feed is outputting CDATA (which we already put in place).

nicxvan’s picture

I also simplified the Change Record since we are no longer updating the templates.

smustgrave’s picture

Status: Needs review » Reviewed & tested by the community

Even better! Whenever we don't have to update all the twig templates its definitely the route. Helps contrib themes.

Believe the CR is still needed and what you have is good.

nicxvan’s picture

Title: Use CDATA in XML RSS Feeds (was: Malformed XML Feeds) » Use CDATA in XML RSS Feeds
alexpott’s picture

Status: Reviewed & tested by the community » Needs work

Is the HTML admin created or user created. If it is user created we should be using Xss::filter() because the admin filter is still very liberal.

nicxvan’s picture

I used filterAdmin because that is what views usually uses as a final pass for rewrites and I wanted to preserve the most HTML that might have been configured as allowed on filter formats and added to the view.

You're probably right though, cause if someone entered HTML in a plain text field then it passed through this filter it would convert it to HTML.

I think converting this to filter could strip out some HTML that was enabled by an admin on a text format once it passes through the CDATA conversion, but that is probably better than outputting more HTML than expected.

I'll make the change.

nicxvan’s picture

Status: Needs work » Needs review

Ok I used the example in locale for which html tags to allow since the default blank filter even removes paragraph tags:

['a', 'abbr', 'acronym', 'address', 'b', 'bdo', 'big', 'blockquote', 'br', 'caption', 'cite', 'code', 'col', 'colgroup', 'dd', 'del', 'dfn', 'dl', 'dt', 'em', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'i', 'ins', 'kbd', 'li', 'ol', 'p', 'pre', 'q', 'samp', 'small', 'span', 'strong', 'sub', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'tr', 'tt', 'ul', 'var']

smustgrave’s picture

Status: Needs review » Reviewed & tested by the community

Coverage still appears to be there https://git.drupalcode.org/issue/drupal-3433/-/jobs/1385050

Believe the feedback/change request has been addressed.

CR probably is still useful, 50/50 since it doesn't require a twig change now.

paulsilva made their first commit to this issue’s fork.

larowlan’s picture

Category: Feature request » Bug report

I think this is a bug, not a feature request. Which means its still eligible for 11.x/10.3.x

larowlan’s picture

Status: Reviewed & tested by the community » Needs work

Left some minor nits on the MR, feel free to self RTBC after addressing.

nicxvan’s picture

Status: Needs work » Reviewed & tested by the community

Thanks, I updated the comments for both instances!

nicxvan’s picture

Status: Reviewed & tested by the community » Needs work

Ah my local is out of date, one moment while I rebase.

nicxvan’s picture

Status: Needs work » Reviewed & tested by the community

Ok, that is resolved now, I'll keep an eye on tests, but changing comments shouldn't affect them.

nicxvan’s picture

Status: Reviewed & tested by the community » Needs work

I'm doing a full rebase, there are a bunch of unrelated failures.

nicxvan’s picture

Status: Needs work » Reviewed & tested by the community

Ok, I rebased on 11.x I'll keep an eye on tests.

nicxvan’s picture

Tests are happy on 11.x.

10.2 branch is failing for an unrelated test.

quietone’s picture

Issue summary: View changes

Added useful links to the issue summary and updated credit.

longwave’s picture

Status: Reviewed & tested by the community » Needs review

Is piggybacking on RssResponseRelativeUrlFilter really the right thing to do? That event subscriber is only supposed to be fixing up relative URIs, and now we're extending it to convert the description to CDATA - this feels like the wrong place to be doing this. Should we be doing this when the RSS feed is generated? Or maybe we need a separate event subscriber?

nicxvan’s picture

Good point, I'll take a look later this week unless someone else gets to it first.

nicxvan’s picture

Assigned: Unassigned » nicxvan
nicxvan’s picture

Assigned: nicxvan » Unassigned

Got some failures, my local tests are not working at the moment so I'm working on figuring that out now.

nicxvan’s picture

Got my local working, this should fix it.

nicxvan’s picture

I did not update the 10.2 branch this time since I didn't create it and I'm not sure this is eligible for backport anyway.

nicxvan’s picture

I ran test only and it's failing as expected.

nicxvan’s picture

Issue summary: View changes
smustgrave’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: -Needs markup review

Removing Needs markup review as this isn't changing the twig template (assuming that's what the tag is for).

Per the slack thread, issue summary and change record have both been updated with new approach.

Small feedback to the MR has also been addressed.

Test-only feature did fail before last push so coverage is there.

LGTM

quietone’s picture

I made changes to comments, mostly about capitalization. And small changes to the change record to put the change first. I did not do a code review.

After that I then saw that there are two new functions that do not have return types. I need a break so I am setting to NW for that. See https://www.drupal.org/docs/develop/standards/php/php-coding-standards#s....

nicxvan’s picture

Status: Reviewed & tested by the community » Needs work

I'll take a look

nicxvan’s picture

Status: Needs work » Needs review

I set string|false since https://www.php.net/manual/en/domdocument.savexml.php shows that, let me know if that should be handled differently.

smustgrave’s picture

Status: Needs review » Needs work

2 small comments.

nicxvan’s picture

Status: Needs work » Needs review

Committed the suggestion and answered your question.

nicxvan’s picture

I addressed your second comment.

smustgrave’s picture

Status: Needs review » Reviewed & tested by the community

Thanks LGTM!

longwave’s picture

Status: Reviewed & tested by the community » Needs work

I think this solution is really clean, just one nit about an unused argument.

longwave’s picture

Status: Needs work » Reviewed & tested by the community

In the interests of not holding up a 20+ year old issue any longer, I fixed the nit and set it back to RTBC and will commit later if the tests pass.

  • longwave committed ebc8e063 on 10.4.x
    Issue #3433 by nicxvan, quietone, longwave, DuaelFr, smustgrave,...

  • longwave committed eba56b0f on 11.x
    Issue #3433 by nicxvan, quietone, longwave, DuaelFr, smustgrave,...
longwave’s picture

Version: 11.x-dev » 10.4.x-dev
Status: Reviewed & tested by the community » Fixed

Decided to commit this to 11.1.x and 10.4.x only as there is a chance of it being vaguely disruptive to existing sites that are using RSS and not necessarily expecting CDATA in their output here, given we wrote a change record it makes sense to put it in a minor release only.

Still, great to see a 20+ year old bug finally fixed, and in such a clean way given we can just add the event subscriber to modify RSS responses.

Committed and pushed eba56b0fd7 to 11.x and ebc8e0639e to 10.4.x. Thanks!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

mmillford’s picture

Is there a way to avoid this xss filter? If we're constructing an rss feed whose content we trust, then there should be a way for that content to survive this event subscriber. I would've assumed that wrapping the trusted content in its own CDATA during rendering would've worked, but this new event subscriber doesn't check for existing cdata, and just filters everything without prejudice.

opsdemon’s picture

Definitely agree with this idea. Since 10.4.0 all div, style and img tags are getting stripped from our newsletter RSS feed and we ended up downgrading to 10.3 to get it working again.

We use mailchimp and there is no option to add html structure or styling on the other end. If we could wrap our trusted, styled output from the view using a CDATA section and bypass the new filter then that would clear our upgrade path to 10.4 (which is currently blocked by this issue).

nicxvan’s picture

They are event subscribers so you should be able to override them.

opsdemon’s picture

How and where would I do that? Any pointers or examples would be much appreciated.

mmillford’s picture

I put together this module to override those services (both the xss filter and the relative url filter, as both strip an existing cdata tag) and add in a check for cdata sections: https://www.drupal.org/project/rss_trust_cdata

You can use that module or check the source to see how it was implemented in order to write your own.

I still think this core service should include some ability to trust content or just opt out of rewriting, but as nicxvan said, overriding the core services will work for now.

opsdemon’s picture

Thank you for providing a module to override this behaviour.

In this case, the view output comes from a twig template and it turns out that wrapping the output in a CDATA section doesn't work there either (as it also gets stripped out).

Eventually I found a fix to that issue here: views_rss/issues/3079683#comment-15110037

Applying the fix to the .theme file fixed the issue and the newsletter formatting looks ok now.

dries’s picture

This commit appears to have broken my RSS feed at https://dri.es/rss.xml: all 'img' tags are now being stripped from the feed.

I suspect the issue is related to this change, as it seems the 'img' tag isn't included in the allow-list.

It is also be worth considering whether other elements like 'figcaption', 'figure', and 'video' should be preserved to maintain important content in RSS feeds.

longwave’s picture

Thanks for reporting, the regression with img and certain other tags is being tracked at #3497758: Regression: RssResponseCdata filtering out common HTML tags from RSS feeds

alexpott’s picture

heikkiy’s picture

EDIT: This comment went to wrong issue.

richarddavies’s picture

An issue was just reported with a view RSS feed on our website that I think may be caused/related to this change. We have HTML in the RSS description element and it's being double encoded now. First, tags like <p> are being encoded as &lt;p&gt; and then everything is wrapped in a CDATA tag. The correct approach is to use one or the other, but not both (https://cyber.harvard.edu/rss/encodingDescriptions.html).

I can't seem to find a way to configure the feed to turn off either the CDATA or entity encoding, so how do I fix this?

nicxvan’s picture

This change was reverted so it must be something else double encoding.