Problem/Motivation

After upgrading to Drupal 10.4.0, I found that our RSS feed doesn't render div, img, and style tags. Doing so breaks our RSS email structurally (div) and visually (img and style).

Steps to reproduce

Proposed resolution

Remove filtering
Add XSS test

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

CommentFileSizeAuthor
#3 3497758.patch11.55 KBcilefen

Issue fork drupal-3497758

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

wlofgren created an issue. See original summary.

opsdemon’s picture

I'm having the same issue. Since 10.4.0 all div, style and img tags are getting stripped from our newsletter RSS feed and we ended up downgrading to 10.3 to get it working again.

Is there any workaround for this?

cilefen’s picture

StatusFileSize
new11.55 KB

It's obviously a temporary measure, but you could apply the reverse of the feature patch.

git diff ebc8e0639e0e12391087d9abe0ad8eca0034d1ea ebc8e0639e0e12391087d9abe0ad8eca0034d1ea~1 >patches/3497758.patch

I've attached that patch.

opsdemon’s picture

After some more investigation I found a workaround for the issue here: views_rss/issues/3079683#comment-15110037

Applying the fix to the .theme file fixed the issue and the newsletter formatting looks ok now with 10.4.1.

wlofgren’s picture

I ended up extending the RssResponseCdata class and register a service that decorates it in my module.

module_name.services.yml

services:
  module_name.response_filter.rss.cdata:
    class: Drupal\module_name\EventSubscriber\RssResponseCdata
    decorates: response_filter.rss.cdata
    decoration_priority: 10

src/EventSubscriber/RssResponseCdata.php

<?php

namespace Drupal\module_name\EventSubscriber;

use Drupal\Component\Utility\Xss;
use Drupal\Core\EventSubscriber\RssResponseCdata as PrimaryRssResponseCdata;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;
use Symfony\Component\HttpKernel\Event\ResponseEvent;
use Symfony\Component\HttpKernel\KernelEvents;

/**
 * Subscribes to wrap RSS descriptions in CDATA.
 */
class RssResponseCdata extends PrimaryRssResponseCdata implements EventSubscriberInterface {

  /**
   * Converts description node to CDATA RSS markup.
   *
   * @param string $rss_markup
   *   The RSS markup to update.
   *
   * @return string|false
   *   The updated RSS XML or FALSE if there is an error saving the xml.
   */
  protected function wrapDescriptionCdata(string $rss_markup): string|false {
    $rss_dom = new \DOMDocument();

    // Load the RSS, if there are parsing errors, abort and return the unchanged
    // markup.
    $previous_value = libxml_use_internal_errors(TRUE);
    $rss_dom->loadXML($rss_markup);
    $errors = libxml_get_errors();
    libxml_use_internal_errors($previous_value);
    if ($errors) {
      return $rss_markup;
    }

    foreach ($rss_dom->getElementsByTagName('item') as $item) {
      foreach ($item->getElementsByTagName('description') as $node) {
        $html_markup = $node->nodeValue;
        if (!empty($html_markup)) {
          $tag_list = [
            'a', 'abbr', 'acronym', 'address',
            'b', 'bdo', 'big', 'blockquote', 'br',
            'caption', 'cite', 'code', 'col', 'colgroup',
            'dd', 'del', 'dfn', 'div', 'dl', 'dt', 'em',
            'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr',
            'i', 'img', 'ins', 'kbd', 'li', 'ol', 'p', 'pre', 'q',
            'samp', 'small', 'span', 'strong', 'sub', 'sup',
            'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'tr', 'tt', 'ul', 'var'
          ];
          $html_markup = Xss::filter($html_markup, $tag_list);
          $new_node = $rss_dom->createCDATASection($html_markup);
          $node->replaceChild($new_node, $node->firstChild);
        }
      }
    }

    return $rss_dom->saveXML();
  }

}

nicxvan’s picture

You can also use the module posted at the end of the parent issue.

It's easy to override, but as you can imagine we couldn't allow all html by default for security.

At least we get valid rss feeds now, and if you need more you can decorate or override.

Do these issues normally get closed works as designed or left open for a couple of weeks for discussion?

wim leers’s picture

Title: RssResponseCdata filtering out common html tags » Regression: RssResponseCdata filtering out common HTML tags from RSS feeds
Priority: Normal » Critical
Issue tags: +Regression

I too am getting hit by this on my blog 😅 This explains why mysteriously I couldn't get feeds to work correctly after migrating from Drupal 7 to 10!

The change record states:

[…] add the CDATA directive and provide XSS Filtering. Since this can be user entered data we use strict filtering.

… but in my case, this is not data entered by an untrusted user. And isn't that quite often the case? It's content filtered through a text format, so any images that appear are explicitly allowed by the text format.

AFAICT this was not considered in #3433: Use CDATA in XML RSS Feeds 🫣

How often are RSS feeds created that expose data from untrusted user input (e.g. a feed of comments) vs that what is the very purpose of the site (blog, photos, products …)?

nicxvan’s picture

It was considered, initially I wanted to use the admin filter and @alexpott recommended a more strict filter since we don't know if we can trust the content and site owners can override it since it's an event subscriber.

I'd be happy changing the core if it passes security muster or we can release a module that overrides it like referenced so it's more secure in core and more eyes open contrib if you want more permissive.

longwave’s picture

@Dries also reported this over in #3433-118: Use CDATA in XML RSS Feeds.

If we add <img> and <div> to the list of allowed tags, is that good enough as a stopgap? We can consider removing the filtering later but it feels like that needs more investigation.

nicxvan’s picture

Version: 10.4.x-dev » 11.x-dev
Status: Active » Needs work

That seems more reasonable to add, are there other tags we should add?

nicxvan changed the visibility of the branch 3497758-regression-rssresponsecdata-filtering to hidden.

nicxvan’s picture

Status: Needs work » Needs review
opsdemon’s picture

In my case, the style tag was also needed as there is no way to style our newsletter items at the other end in mailchimp.

dries’s picture

I believe allowing the img and div tags would resolve many of the issues, but not all of them.

For example, on my site, videos are also being incorrectly filtered out. My implementation for embedding videos uses an iframe, which is a common approach recommended by providers like YouTube (see Google's YouTube documentation). I believe that Drupal's oEmbed feature also relies on iframes, as this is the standard method for embedding external content. iframes are interesting because they can be used for embedding malicious content or phishing. But of course, so can the a or img tags. In this case, the iframe is the result of Drupal replacing a token (e.g. [video ID]), hence the generated HTML should be safe and trustworthy.

I tend to agree with Wim's comment in #7: why do we need additional filtering beyond what already passes Drupal's text formatters and output filters? Don't they do the job of balancing flexibility and security?

It seems unnecessary to apply additional filtering to RSS feeds beyond what is used for generating regular HTML pages (to anonymous users). Instead, the solution could be to rely on the fields' existing text formatters and output filters.

If there are specific security concerns beyond that, it would be helpful to have those clarified or documented. Until then, I don't understand why RSS feeds need stricter filtering. Maybe we believe that RSS readers don't sanitize or inspect HTML the same way browsers do?

nicxvan’s picture

I think there are some alternatives:

  1. We can use XSS:filterAdmin
  2. We can create a contrib module that overrides this and gives more flexibility
  3. We can add a form to configure this in Drupal directly

The first option is the direction I took in the original issue, that was considered too risky, do we want to add a form / permission and configuration to allow overriding this? Or does that belong in contrib?

dries’s picture

Why not the following option: 'Use Drupal's text formatters and output filters already configured for the fields'?

nicxvan’s picture

When that event fires we don't necessarily know the source, I bet we could let you choose a text formatter to use then you could configure it however you wanted. There are still concerns there if you have untrusted content in the RSS feed and you use a text format that is more trusted.

That might work though.

nicxvan’s picture

Status: Needs review » Needs work

I'm trying something.

nicxvan’s picture

Issue summary: View changes

Ok so this probably needs tests.

Also a review of the following specifics.
Confirmation that the text format loading is correct.
Confirmation that renderInIsolation is correct.

@alexpott had security concerns on the original issue so we may want him to take a look here too.

nicxvan’s picture

Status: Needs work » Needs review
nicxvan’s picture

Issue tags: +Needs tests

Tests are failing, but a review of the process would be helpful before taking a look at those.

oily made their first commit to this issue’s fork.

oily’s picture

Edited MR, PHPSTAN errors x2

nicxvan’s picture

@oily would you mind reverting your changes?

We likely need that additional mock, phpstan was failing because I forgot the use statement and it needs some more work, but it's needed so we can provide the filter format which this is adding.

It's also probably in the wrong place, but as I mentioned I'm hoping for a review of the approach.

longwave’s picture

We are trying to remove the global RSS settings form over in #2601030: Views RSS view mode settings are completely broken, I don't think we should be adding to it here.

After thinking about it a bit more this event subscriber should really only be responsible for adding CDATA - any sanitization needs to be done further up the chain. In core is there anything other than Views that outputs RSS? Can we add a test that ensures XSS attempts via e.g. node body are correctly escaped by Twig/Views and that we don't need to filter here at all?

nicxvan’s picture

That might work, I don't know how we'd write that test though, but that would be a simpler change here.

I think the concern in the original issue was changing contexts, if only partially trusted users add the content then we could promote the content here.

But if we only wrap it I think that might work.

catch’s picture

I don't think we should be using text formats here, it makes the entire thing dependent on filter module. The whole event subscriber seems a bit wrong, which seems to be what @longwave is pointing to in #26 - e.g. remove the sanitization and make the it the responsibility of the code that's producing the RSS to sanitize it.

nicxvan changed the visibility of the branch 11.x to hidden.

nicxvan changed the visibility of the branch 3497758-missingTags to hidden.

nicxvan’s picture

Issue tags: -Needs tests +Security

I think this is ready for review. I added a test for the title, full html and plain text.

It's escaped for plain text and title and stripped for the full that I created.

I think this is ready for review, I tagged for security review.

smustgrave’s picture

Status: Needs review » Needs work

Left a comment on the MR.

Can issue summary be updated too as proposed solution doesn't seem to line up.

Question has it ever been discussed allowing a settings variable to be set for what's allowed? Not in scope of this but just curious

nicxvan’s picture

Issue summary: View changes

I responded to your comment with a question, I updated the proposed resolution.

I think the consensus is to not have filtering at this stage at all so a variable doesn't make sense here.

cmlara’s picture

Adjusting tag assigned in #32.

Security is for disclosing vulnerabilities that need fixing.

nicxvan’s picture

Thanks @cmlara!

nicxvan’s picture

Status: Needs work » Needs review
wim leers’s picture

Status: Needs review » Needs work

any sanitization needs to be done further up the chain

+1

In core is there anything other than Views that outputs RSS?

AFAICT any/all other RSS feed generation logic was removed in the Drupal 7-to-8 transition, where we embraced Views' ability to generate it in favor of custom logic.

Can we add a test that ensures XSS attempts via e.g. node body are correctly escaped by Twig/Views and that we don't need to filter here at all?

Such escaping is not the responsibility of Twig/Views, but of the text format author, unless I'm missing something? 🤔

The whole event subscriber seems a bit wrong, which seems to be what @longwave is pointing to in #26 - e.g. remove the sanitization and make the it the responsibility of the code that's producing the RSS to sanitize it.

This! The text format/filtering discussion is a red herring. I only raised it in #7 because that is the only way Drupal displays untrusted user input, unless the site is horrendously broken already.


The MR looks like it's on the right track, but I think the test coverage could be a lot clearer. Left suggestions to clarify it.

nicxvan’s picture

Thank you for your review!

I applied most suggestions. I had to revert most of them for the following reasons:

  • We reuse the field storage array multiple times
  • I think your computer made all of the arrows pretty font which broke the arrays
  • I had to add the save for the format so that the permission is created.

I added a div to the plaintext one to confirm it's being escaped and added the img to the trusted format.

I think that addresses all of your comments.

I'll go through and review after tests run to double check.

I did handle the substantive comments on the testing, so I think that's addressed, I updated the comments too.

nicxvan’s picture

Status: Needs work » Needs review

Ok to clarify too.

The body field tests that formats that support html tags get rendered in rss.
Plain text confirms that html they are escaped.

longwave’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: -Needs security review

The fix is trivial and the new test looks good to me and should ensure this feature won't regress in the future. Removing the security review tag. Thanks!

alexpott’s picture

I wonder if we should be concerned if there is another way of meeting the condition stripos($event->getResponse()->headers->get('Content-Type', ''), 'application/rss+xml') === TRUE and not having already filtered HTML here. Probably not from views… but what about contrib?

I think adding the CDATA sections in \Drupal\Core\EventSubscriber\RssResponseCdata means we become responsible for all ways that Drupal generates rss not just from Views and that makes this very tricky as we have to consider the security of custom and contrib code.

longwave’s picture

Wondering if we should drop the event subscriber entirely and try to do this in Views. https://www.rssboard.org/rss-encoding-examples documents the encoding.

In views-view-row-rss.html.twig can we just force Twig to double encode

  <description>{{ description|escape }}</description>

or just add CDATA there

  <description><![CDATA[{{ description }} ]]></description>

?

alexpott’s picture

Status: Reviewed & tested by the community » Needs work

I really like the idea of making this the responsibility of the thing constructing the output and not a blanket response subscriber where the security concerns are hard to impossible to consider all possibilities. Setting back to needs work to see if we can implement #43.

I think we'll also need to adjust views-view-rss.html.twig. Oh we don't need to do that because of

  // The RSS 2.0 "spec" doesn't indicate HTML can be used in the description.
  // We strip all HTML tags, but need to prevent double encoding from properly
  // escaped source data (such as &amp becoming &amp;amp;).
  $variables['description'] = Html::decodeEntities(strip_tags($style->getDescription()));

in \template_preprocess_views_view_rss()

Hmmm.... actually reading template_preprocess_views_view_rss()

  // The description is the only place where we should find HTML.
  // @see https://validator.w3.org/feed/docs/rss2.html#hrelementsOfLtitemgt
  // If we have a render array, render it here and pass the result to the
  // template, letting Twig autoescape it.
  if (isset($item->description) && is_array($item->description)) {
    $variables['description'] = (string) \Drupal::service('renderer')->render($item->description);
  }

I think the whole cdata approach we did can be removed. It's just not necessary for views in core. Anything in contrib that needed that should implement what views has.

nicxvan’s picture

That was attempted here: #3433-49: Use CDATA in XML RSS Feeds

Cdata gets stripped by Drupal\Core\EventSubscriber\RssResponseRelativeUrlFilter::transformRootRelativeUrlsToAbsolute()

catch’s picture

At this point should we revert #3433: Use CDATA in XML RSS Feeds, add the XSS test coverage in this issue, and then start again on #3433: Use CDATA in XML RSS Feeds?

alexpott’s picture

I agree with #46 I think we should revert the cdata change. I think we already were complying to the spec as per the escaping being done by

  // The description is the only place where we should find HTML.
  // @see https://validator.w3.org/feed/docs/rss2.html#hrelementsOfLtitemgt
  // If we have a render array, render it here and pass the result to the
  // template, letting Twig autoescape it.
  if (isset($item->description) && is_array($item->description)) {
    $variables['description'] = (string) \Drupal::service('renderer')->render($item->description);
  }

The comment in #3433-60: Use CDATA in XML RSS Feeds is not a justification for the change because we escape html in the description field. We need to test that XSS injections are filtered or double escaped.

alexpott’s picture

Status: Needs work » Needs review

I've added two MRs to this issue to do the revert - for 11.x and 10.x - we need a post update function because the container needs rebuilding due to the event registration.

alexpott changed the visibility of the branch 3497758-SubscriberFiltering to hidden.

catch’s picture

Should we also still add the additional test coverage from https://git.drupalcode.org/project/drupal/-/merge_requests/11023 here?

alexpott’s picture

@catch I think we should do that in a follow-up.

catch’s picture

Issue tags: +Needs followup

OK makes sense. Let's make sure we open the follow-up and transfer the test coverage over, moving to RTBC.

alexpott’s picture

Issue tags: -Needs followup
catch’s picture

Status: Needs review » Reviewed & tested by the community
catch’s picture

Status: Reviewed & tested by the community » Needs work

We can skip the post update because the container is cached by both Drupal version and deployment identifier so will automatically be invalidated by a core update.

Also makes this easier to backport to a patch release.

alexpott’s picture

Status: Needs work » Reviewed & tested by the community

Removed the update function. Back to RTBC as this is quite a simple change.

heikkiy’s picture

I would like to understand this change record a bit better. I was testing this with the core rss.xml.

We basically have previously implemented a preprocess where we add CDATA to certain RSS views and also alter the headers to use text/xml to pass RSS validation with CDATA.

After upgrading to 10.3 we started to see double encoding in the feed. I was testing this with the core rss.xml and I was seeing also the double encoding there. After investigation I noticed that core is now checking for the header to be application/rss+xml before it adds the CDATA wrapper to the description. So our problem was setting a wrong header type in custom functionality. Upgrading to 10.4 and removing our header alter seemed to fix the problem and it started to render HTML and CDATA correctly. But after I apply the patch from here the CDATA gets removed and it gives double encoding again.

So my question basically is that is it still possible to get CDATA in RSS views from #3433: Use CDATA in XML RSS Feeds or is this functionality going to be completely reverted from core? To me this issue title seems to say that certain HTML tags are being stripped away but when looking at the MR, it seems that the CDATA functionality is going to be removed and the recommended alternative is to use for example https://www.drupal.org/project/rss_trust_cdata?

alexpott’s picture

@heikkiy we're going to remove the cdata tags because they are not necessary. We're already escaping the html in RSS description fields which as per #43 is a supported way of embedding HTML in there. The whole cdata approach appears to not have been necessary. Do you have needs or information that says it is necessary?

catch’s picture

  • catch committed 77e42bc1 on 11.1.x
    Issue #3497758 by nicxvan, oily, alexpott, cilefen, catch, longwave,...

  • catch committed 5e9220f2 on 10.4.x
    Issue #3497758 by nicxvan, oily, alexpott, cilefen, catch, longwave,...

  • catch committed f637dda0 on 10.5.x
    Issue #3497758 by nicxvan, oily, alexpott, cilefen, catch, longwave,...
catch’s picture

Version: 11.x-dev » 10.4.x-dev
Status: Reviewed & tested by the community » Fixed

Committed/pushed to 11.x and 10.5.x and cherry-picked back to 11.1.x and 10.4.x, thanks!

@heikkiy this commit will restore the situation to how it was in 10.3.x, as if #3433: Use CDATA in XML RSS Feeds had never been committed.

  • catch committed e753ab59 on 11.x
    Issue #3497758 by nicxvan, oily, alexpott, cilefen, catch, longwave,...
heikkiy’s picture

@alexpott Not necessarily. I would like to perhaps just understand why the CDATA was deemed as a good solution in #3433 and then reverted back to the original version? I can understand the regression problem but was just wondering when a very old issue gets resolved and then it gets reverted. My original approach has been also that if you want to put HTML in your RSS feed that it should be inside CDATA but seems like encoding is fine.

I guess the change record still needs updating https://www.drupal.org/node/3504403 because it says:

The CDATA event now only wraps the HTML output in CDATA tags.

One thing I noticed is that when I was testing our current feed in 10.3 that when validating the feed I got a warning about drupal-media not being a valid tag. This might be because of our current custom preprocess and definitely out of scope for this issue but just something that caught my eye when testing.

alexpott’s picture

@heikkiy I think the premise of #3433: Use CDATA in XML RSS Feeds was probably valid when it was filed - back in 2003!!! - but the code in views and use of views for core's RSS feed came later and had an alternate fix - escaping the html. Unfortunately 3433 remained open and people continued to work on it without validating that the fix was still necessary although there definitely is discussion of this on the issue. Also it appears that what RSS validors say and RSS readers do is different. Note that https://www.rssboard.org/rss-encoding-examples is quite clear that escaped HTML tags in the description field are supposed to work and be respected and this is was working well prior to 10.4.0.

heikkiy’s picture

Thank you. That clears it a lot.

alexpott’s picture

I've rewritten the CR and published it and I've added a note to the original CR https://www.drupal.org/node/3440505 that it is no longer valid.

longwave’s picture

https://stackoverflow.com/questions/7220670/difference-between-descripti... has a summary of the problem, but there is no clear answer, because it depends on the reader. Some interpret <description> as plaintext, others as encoded HTML. Some readers also support <content:encoded> for HTML but this is not backward compatible with older readers that only support <description>. What a mess, and I don't think we can have a one-size-fits-all solution here.

longwave’s picture

FWIW Wordpress.com's RSS feed does use <content:encoded> for HTML and <description> for a plaintext summary *and* wraps everything in CDATA: https://wordpress.com/blog/feed/

Neither d.o or Wordpress.com's RSS validates 100%:

https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fwww.drupal.org...
https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fwordpress.com%...

heikkiy’s picture

Would it make sense to create a follow up ticket where this would be possible to be configured? Or perhaps a contrib module. I guess it's quite regular that people are using RSS feed to share content and now that Drupal CMS has already launched I think the need might be growing to be able to build RSS feeds easily for different use cases. Would be nice if you could also configure the <content:encoded> as an optional value if you know you are exposing the RSS feed to a known reader that can support it.

My original interpretation also was that CDATA is more supported and it was news to me that encoded is fine also. Live and learn.

And regarding @longwave WP example, it's interesting that iframe for example is not passing validation.

heikkiy’s picture

Answering myself but seems like https://www.drupal.org/project/views_rss does have the support for both description and content:encoded.

I quickly tested it with the core rss.xml view and it seems like it wraps both the description and content:encoded inside CDATA.

I think that might be a good option if you need more robust RSS support.

nicxvan’s picture

aaronrus’s picture

What is the solution? I updated from 10.3.x to 10.4.3 and now my xml feed used for a mail program no longer shows images. The img tags are being stripped out and it shows ![cdata[[

catch’s picture

@aaronrus this will get released in the next patch release of Drupal 11/10, which is likely to be first week of March (e.g. next week).

aaronrus’s picture

Ive been waiting for the next update with the fix but Ive not seen anything past 10.4.3 ready for production. My newsletter is down on my production site an I need a little help. What is the status of the fix being rolled out into the next production update? or what is the process of patching the existing 10.4.3 as Ive been unable to successfully apply the patch.

nicxvan’s picture

@aaronrus catch already mentioned the planned release is this week, I'm not sure if that is still the case, but you can patch your site using the instructions here: https://www.drupal.org/docs/develop/git/using-git-to-contribute-to-drupa...

longwave’s picture

https://www.drupal.org/project/drupal/releases/10.4.4 has been tagged and will be available in the next few minutes, including this fix. Patch releases are usually released on the first Wednesday of each month.

opsdemon’s picture

Upgraded today to drupal 10.4.4 and removed the workaround we've been using since January to fix the issues with our newsletters.

I tested the RSS feed with the new release and it seems to be working normally again.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.