Problem/Motivation
After upgrading to Drupal 10.4.0, I found that our RSS feed doesn't render div, img, and style tags. Doing so breaks our RSS email structurally (div) and visually (img and style).
Steps to reproduce
Proposed resolution
Remove filtering
Add XSS test
Remaining tasks
User interface changes
Introduced terminology
API changes
Data model changes
Release notes snippet
| Comment | File | Size | Author |
|---|
Issue fork drupal-3497758
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
opsdemon commentedI'm having the same issue. Since 10.4.0 all div, style and img tags are getting stripped from our newsletter RSS feed and we ended up downgrading to 10.3 to get it working again.
Is there any workaround for this?
Comment #3
cilefen commentedIt's obviously a temporary measure, but you could apply the reverse of the feature patch.
git diff ebc8e0639e0e12391087d9abe0ad8eca0034d1ea ebc8e0639e0e12391087d9abe0ad8eca0034d1ea~1 >patches/3497758.patchI've attached that patch.
Comment #4
opsdemon commentedAfter some more investigation I found a workaround for the issue here: views_rss/issues/3079683#comment-15110037
Applying the fix to the .theme file fixed the issue and the newsletter formatting looks ok now with 10.4.1.
Comment #5
wlofgren commentedI ended up extending the RssResponseCdata class and register a service that decorates it in my module.
module_name.services.yml
src/EventSubscriber/RssResponseCdata.php
Comment #6
nicxvan commentedYou can also use the module posted at the end of the parent issue.
It's easy to override, but as you can imagine we couldn't allow all html by default for security.
At least we get valid rss feeds now, and if you need more you can decorate or override.
Do these issues normally get closed works as designed or left open for a couple of weeks for discussion?
Comment #7
wim leersI too am getting hit by this on my blog 😅 This explains why mysteriously I couldn't get feeds to work correctly after migrating from Drupal 7 to 10!
The change record states:
… but in my case, this is not data entered by an untrusted user. And isn't that quite often the case? It's content filtered through a text format, so any images that appear are explicitly allowed by the text format.
AFAICT this was not considered in #3433: Use CDATA in XML RSS Feeds 🫣
How often are RSS feeds created that expose data from untrusted user input (e.g. a feed of comments) vs that what is the very purpose of the site (blog, photos, products …)?
Comment #8
nicxvan commentedIt was considered, initially I wanted to use the admin filter and @alexpott recommended a more strict filter since we don't know if we can trust the content and site owners can override it since it's an event subscriber.
I'd be happy changing the core if it passes security muster or we can release a module that overrides it like referenced so it's more secure in core and more eyes open contrib if you want more permissive.
Comment #9
longwave@Dries also reported this over in #3433-118: Use CDATA in XML RSS Feeds.
If we add
<img>and<div>to the list of allowed tags, is that good enough as a stopgap? We can consider removing the filtering later but it feels like that needs more investigation.Comment #10
nicxvan commentedThat seems more reasonable to add, are there other tags we should add?
Comment #13
nicxvan commentedComment #14
opsdemon commentedIn my case, the style tag was also needed as there is no way to style our newsletter items at the other end in mailchimp.
Comment #15
dries commentedI believe allowing the
imganddivtags would resolve many of the issues, but not all of them.For example, on my site, videos are also being incorrectly filtered out. My implementation for embedding videos uses an
iframe, which is a common approach recommended by providers like YouTube (see Google's YouTube documentation). I believe that Drupal's oEmbed feature also relies oniframes, as this is the standard method for embedding external content.iframesare interesting because they can be used for embedding malicious content or phishing. But of course, so can theaorimgtags. In this case, theiframeis the result of Drupal replacing a token (e.g.[video ID]), hence the generated HTML should be safe and trustworthy.I tend to agree with Wim's comment in #7: why do we need additional filtering beyond what already passes Drupal's text formatters and output filters? Don't they do the job of balancing flexibility and security?
It seems unnecessary to apply additional filtering to RSS feeds beyond what is used for generating regular HTML pages (to anonymous users). Instead, the solution could be to rely on the fields' existing text formatters and output filters.
If there are specific security concerns beyond that, it would be helpful to have those clarified or documented. Until then, I don't understand why RSS feeds need stricter filtering. Maybe we believe that RSS readers don't sanitize or inspect HTML the same way browsers do?
Comment #16
nicxvan commentedI think there are some alternatives:
The first option is the direction I took in the original issue, that was considered too risky, do we want to add a form / permission and configuration to allow overriding this? Or does that belong in contrib?
Comment #17
dries commentedWhy not the following option: 'Use Drupal's text formatters and output filters already configured for the fields'?
Comment #18
nicxvan commentedWhen that event fires we don't necessarily know the source, I bet we could let you choose a text formatter to use then you could configure it however you wanted. There are still concerns there if you have untrusted content in the RSS feed and you use a text format that is more trusted.
That might work though.
Comment #19
nicxvan commentedI'm trying something.
Comment #20
nicxvan commentedOk so this probably needs tests.
Also a review of the following specifics.
Confirmation that the text format loading is correct.
Confirmation that renderInIsolation is correct.
@alexpott had security concerns on the original issue so we may want him to take a look here too.
Comment #21
nicxvan commentedComment #22
nicxvan commentedTests are failing, but a review of the process would be helpful before taking a look at those.
Comment #24
oily commentedEdited MR, PHPSTAN errors x2
Comment #25
nicxvan commented@oily would you mind reverting your changes?
We likely need that additional mock, phpstan was failing because I forgot the use statement and it needs some more work, but it's needed so we can provide the filter format which this is adding.
It's also probably in the wrong place, but as I mentioned I'm hoping for a review of the approach.
Comment #26
longwaveWe are trying to remove the global RSS settings form over in #2601030: Views RSS view mode settings are completely broken, I don't think we should be adding to it here.
After thinking about it a bit more this event subscriber should really only be responsible for adding CDATA - any sanitization needs to be done further up the chain. In core is there anything other than Views that outputs RSS? Can we add a test that ensures XSS attempts via e.g. node body are correctly escaped by Twig/Views and that we don't need to filter here at all?
Comment #27
nicxvan commentedThat might work, I don't know how we'd write that test though, but that would be a simpler change here.
I think the concern in the original issue was changing contexts, if only partially trusted users add the content then we could promote the content here.
But if we only wrap it I think that might work.
Comment #28
catchI don't think we should be using text formats here, it makes the entire thing dependent on filter module. The whole event subscriber seems a bit wrong, which seems to be what @longwave is pointing to in #26 - e.g. remove the sanitization and make the it the responsibility of the code that's producing the RSS to sanitize it.
Comment #32
nicxvan commentedI think this is ready for review. I added a test for the title, full html and plain text.
It's escaped for plain text and title and stripped for the full that I created.
I think this is ready for review, I tagged for security review.
Comment #33
smustgrave commentedLeft a comment on the MR.
Can issue summary be updated too as proposed solution doesn't seem to line up.
Question has it ever been discussed allowing a settings variable to be set for what's allowed? Not in scope of this but just curious
Comment #34
nicxvan commentedI responded to your comment with a question, I updated the proposed resolution.
I think the consensus is to not have filtering at this stage at all so a variable doesn't make sense here.
Comment #35
cmlaraAdjusting tag assigned in #32.
Security is for disclosing vulnerabilities that need fixing.
Comment #36
nicxvan commentedThanks @cmlara!
Comment #37
nicxvan commentedComment #38
wim leers+1
AFAICT any/all other RSS feed generation logic was removed in the Drupal 7-to-8 transition, where we embraced Views' ability to generate it in favor of custom logic.
Such escaping is not the responsibility of Twig/Views, but of the text format author, unless I'm missing something? 🤔
This! The text format/filtering discussion is a red herring. I only raised it in #7 because that is the only way Drupal displays untrusted user input, unless the site is horrendously broken already.
The MR looks like it's on the right track, but I think the test coverage could be a lot clearer. Left suggestions to clarify it.
Comment #39
nicxvan commentedThank you for your review!
I applied most suggestions. I had to revert most of them for the following reasons:
I added a div to the plaintext one to confirm it's being escaped and added the img to the trusted format.
I think that addresses all of your comments.
I'll go through and review after tests run to double check.
I did handle the substantive comments on the testing, so I think that's addressed, I updated the comments too.
Comment #40
nicxvan commentedOk to clarify too.
The body field tests that formats that support html tags get rendered in rss.
Plain text confirms that html they are escaped.
Comment #41
longwaveThe fix is trivial and the new test looks good to me and should ensure this feature won't regress in the future. Removing the security review tag. Thanks!
Comment #42
alexpottI wonder if we should be concerned if there is another way of meeting the condition
stripos($event->getResponse()->headers->get('Content-Type', ''), 'application/rss+xml') === TRUEand not having already filtered HTML here. Probably not from views… but what about contrib?I think adding the CDATA sections in \Drupal\Core\EventSubscriber\RssResponseCdata means we become responsible for all ways that Drupal generates rss not just from Views and that makes this very tricky as we have to consider the security of custom and contrib code.
Comment #43
longwaveWondering if we should drop the event subscriber entirely and try to do this in Views. https://www.rssboard.org/rss-encoding-examples documents the encoding.
In views-view-row-rss.html.twig can we just force Twig to double encode
or just add CDATA there
?
Comment #44
alexpottI really like the idea of making this the responsibility of the thing constructing the output and not a blanket response subscriber where the security concerns are hard to impossible to consider all possibilities. Setting back to needs work to see if we can implement #43.
I think we'll also need to adjust views-view-rss.html.twig. Oh we don't need to do that because of
in \template_preprocess_views_view_rss()
Hmmm.... actually reading template_preprocess_views_view_rss()
I think the whole cdata approach we did can be removed. It's just not necessary for views in core. Anything in contrib that needed that should implement what views has.
Comment #45
nicxvan commentedThat was attempted here: #3433-49: Use CDATA in XML RSS Feeds
Cdata gets stripped by Drupal\Core\EventSubscriber\RssResponseRelativeUrlFilter::transformRootRelativeUrlsToAbsolute()
Comment #46
catchAt this point should we revert #3433: Use CDATA in XML RSS Feeds, add the XSS test coverage in this issue, and then start again on #3433: Use CDATA in XML RSS Feeds?
Comment #47
alexpottI agree with #46 I think we should revert the cdata change. I think we already were complying to the spec as per the escaping being done by
The comment in #3433-60: Use CDATA in XML RSS Feeds is not a justification for the change because we escape html in the description field. We need to test that XSS injections are filtered or double escaped.
Comment #50
alexpottI've added two MRs to this issue to do the revert - for 11.x and 10.x - we need a post update function because the container needs rebuilding due to the event registration.
Comment #52
catchShould we also still add the additional test coverage from https://git.drupalcode.org/project/drupal/-/merge_requests/11023 here?
Comment #53
alexpott@catch I think we should do that in a follow-up.
Comment #54
catchOK makes sense. Let's make sure we open the follow-up and transfer the test coverage over, moving to RTBC.
Comment #55
alexpottCreated the follow-up - #3504803: Add XSS testing for RSS descriptions
Comment #56
catchComment #57
catchWe can skip the post update because the container is cached by both Drupal version and deployment identifier so will automatically be invalidated by a core update.
Also makes this easier to backport to a patch release.
Comment #58
alexpottRemoved the update function. Back to RTBC as this is quite a simple change.
Comment #59
heikkiy commentedI would like to understand this change record a bit better. I was testing this with the core rss.xml.
We basically have previously implemented a preprocess where we add CDATA to certain RSS views and also alter the headers to use text/xml to pass RSS validation with CDATA.
After upgrading to 10.3 we started to see double encoding in the feed. I was testing this with the core rss.xml and I was seeing also the double encoding there. After investigation I noticed that core is now checking for the header to be application/rss+xml before it adds the CDATA wrapper to the description. So our problem was setting a wrong header type in custom functionality. Upgrading to 10.4 and removing our header alter seemed to fix the problem and it started to render HTML and CDATA correctly. But after I apply the patch from here the CDATA gets removed and it gives double encoding again.
So my question basically is that is it still possible to get CDATA in RSS views from #3433: Use CDATA in XML RSS Feeds or is this functionality going to be completely reverted from core? To me this issue title seems to say that certain HTML tags are being stripped away but when looking at the MR, it seems that the CDATA functionality is going to be removed and the recommended alternative is to use for example https://www.drupal.org/project/rss_trust_cdata?
Comment #60
alexpott@heikkiy we're going to remove the cdata tags because they are not necessary. We're already escaping the html in RSS description fields which as per #43 is a supported way of embedding HTML in there. The whole cdata approach appears to not have been necessary. Do you have needs or information that says it is necessary?
Comment #61
catchComment #65
catchCommitted/pushed to 11.x and 10.5.x and cherry-picked back to 11.1.x and 10.4.x, thanks!
@heikkiy this commit will restore the situation to how it was in 10.3.x, as if #3433: Use CDATA in XML RSS Feeds had never been committed.
Comment #67
heikkiy commented@alexpott Not necessarily. I would like to perhaps just understand why the CDATA was deemed as a good solution in #3433 and then reverted back to the original version? I can understand the regression problem but was just wondering when a very old issue gets resolved and then it gets reverted. My original approach has been also that if you want to put HTML in your RSS feed that it should be inside CDATA but seems like encoding is fine.
I guess the change record still needs updating https://www.drupal.org/node/3504403 because it says:
One thing I noticed is that when I was testing our current feed in 10.3 that when validating the feed I got a warning about drupal-media not being a valid tag. This might be because of our current custom preprocess and definitely out of scope for this issue but just something that caught my eye when testing.
Comment #68
alexpott@heikkiy I think the premise of #3433: Use CDATA in XML RSS Feeds was probably valid when it was filed - back in 2003!!! - but the code in views and use of views for core's RSS feed came later and had an alternate fix - escaping the html. Unfortunately 3433 remained open and people continued to work on it without validating that the fix was still necessary although there definitely is discussion of this on the issue. Also it appears that what RSS validors say and RSS readers do is different. Note that https://www.rssboard.org/rss-encoding-examples is quite clear that escaped HTML tags in the description field are supposed to work and be respected and this is was working well prior to 10.4.0.
Comment #69
heikkiy commentedThank you. That clears it a lot.
Comment #70
alexpottI've rewritten the CR and published it and I've added a note to the original CR https://www.drupal.org/node/3440505 that it is no longer valid.
Comment #71
longwavehttps://stackoverflow.com/questions/7220670/difference-between-descripti... has a summary of the problem, but there is no clear answer, because it depends on the reader. Some interpret
<description>as plaintext, others as encoded HTML. Some readers also support<content:encoded>for HTML but this is not backward compatible with older readers that only support<description>. What a mess, and I don't think we can have a one-size-fits-all solution here.Comment #72
longwaveFWIW Wordpress.com's RSS feed does use
<content:encoded>for HTML and<description>for a plaintext summary *and* wraps everything in CDATA: https://wordpress.com/blog/feed/Neither d.o or Wordpress.com's RSS validates 100%:
https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fwww.drupal.org...
https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fwordpress.com%...
Comment #73
heikkiy commentedWould it make sense to create a follow up ticket where this would be possible to be configured? Or perhaps a contrib module. I guess it's quite regular that people are using RSS feed to share content and now that Drupal CMS has already launched I think the need might be growing to be able to build RSS feeds easily for different use cases. Would be nice if you could also configure the
<content:encoded>as an optional value if you know you are exposing the RSS feed to a known reader that can support it.My original interpretation also was that CDATA is more supported and it was news to me that encoded is fine also. Live and learn.
And regarding @longwave WP example, it's interesting that iframe for example is not passing validation.
Comment #74
heikkiy commentedAnswering myself but seems like https://www.drupal.org/project/views_rss does have the support for both
descriptionandcontent:encoded.I quickly tested it with the core rss.xml view and it seems like it wraps both the
descriptionandcontent:encodedinside CDATA.I think that might be a good option if you need more robust RSS support.
Comment #75
nicxvan commentedComment #76
aaronrus commentedWhat is the solution? I updated from 10.3.x to 10.4.3 and now my xml feed used for a mail program no longer shows images. The img tags are being stripped out and it shows ![cdata[[
Comment #77
catch@aaronrus this will get released in the next patch release of Drupal 11/10, which is likely to be first week of March (e.g. next week).
Comment #78
aaronrus commentedIve been waiting for the next update with the fix but Ive not seen anything past 10.4.3 ready for production. My newsletter is down on my production site an I need a little help. What is the status of the fix being rolled out into the next production update? or what is the process of patching the existing 10.4.3 as Ive been unable to successfully apply the patch.
Comment #79
nicxvan commented@aaronrus catch already mentioned the planned release is this week, I'm not sure if that is still the case, but you can patch your site using the instructions here: https://www.drupal.org/docs/develop/git/using-git-to-contribute-to-drupa...
Comment #80
longwavehttps://www.drupal.org/project/drupal/releases/10.4.4 has been tagged and will be available in the next few minutes, including this fix. Patch releases are usually released on the first Wednesday of each month.
Comment #81
opsdemon commentedUpgraded today to drupal 10.4.4 and removed the workaround we've been using since January to fix the issues with our newsletters.
I tested the RSS feed with the new release and it seems to be working normally again.