Problem/Motivation
Inline images with data uri scheme becomes malformed after Xss filtering.
Steps to reproduce
You do not need to configure a filter format with filter_html filter in use, just pass the following string to Xss:filter() directly.
<img src="data:image/gif;base64,R0lGODlhEAAOALMAAOazToeHh0tLS/7LZv/0jvb29t/f3//Ub/
/ge8WSLf/rhf/3kdbW1mxsbP//mf///yH5BAAAAAAALAAAAAAQAA4AAARe8L1Ekyky67QZ1hLnjM5UUde0ECwLJoExKcpp
V0aCcGCmTIHEIUEqjgaORCMxIC6e0CcguWw6aFjsVMkkIr7g77ZKPJjPZqIyd7sJAgVGoEGv2xsBxqNgYPj/gAwXEQA7"
width="16" height="14" alt="embedded folder icon">
(Source: https://www.websiteoptimization.com/speed/tweak/inline-images/)
The output is:
<img src="image/gif;base64,R0lGODlhEAAOALMAAOazToeHh0tLS/7LZv/0jvb29t/f3//Ub/
/ge8WSLf/rhf/3kdbW1mxsbP//mf///yH5BAAAAAAALAAAAAAQAA4AAARe8L1Ekyky67QZ1hLnjM5UUde0ECwLJoExKcpp
V0aCcGCmTIHEIUEqjgaORCMxIC6e0CcguWw6aFjsVMkkIr7g77ZKPJjPZqIyd7sJAgVGoEGv2xsBxqNgYPj/gAwXEQA7"
width="16" height="14" alt="embedded folder icon">
Proposed resolution
Follow Mozilla's guidelines and support inline images with “data:image/*” unless it’s “data:image/svg+xml”.
https://blog.mozilla.org/security/2017/11/27/blocking-top-level-navigati...
Related RFC: https://www.rfc-editor.org/rfc/rfc2397
Remaining tasks
User interface changes
API changes
Data model changes
Release notes snippet
Comments
Comment #3
mxr576Comment #4
longwaveAlways wary of changing this code but this looks like a sensible and self-contained change, however I added a question to the MR about applying the new condition.
Comment #5
longwaveOut of scope for this, but I wonder if there is a way of reusing the Masterminds HTML5 parser in the XSS filter instead of having separate code that attempts to parse HTML attributes.
Comment #6
longwaveAdded lots more comments that I thought of, mostly around the regex, as these can be brittle and there might be some sneaky ways around it.
Comment #7
ghost of drupal past@longwave Replacing this code with something more modern would surely be welcome. When I ported KSES to become filter_xss for Drupal 4.5.6/4.6.4 WhatWG was but a year old and HTML5 didn't exist for another two years yet. And as far as I can see, this code didn't change at all in the near two decades since.
Comment #8
dalinI recommend we go the other way and use a whitelist of known safe mimetypes. Who knows what the future brings. Or if I intentionally mangle the format to be something like "data:image/_svg" can I get the web browser to still render this as an svg?
Comment #9
longwave#8 seems like a good idea over having a regex that contains exceptions.
Comment #10
mxr576Thanks for all the valuable feedback, everyone!
Well, should not this be the concern of browsers rather than us? I used the Mozilla blog post as a source because I found no other, but if anybody knows what the current logic is that Firefox/Chrome/Safari or others use to decide safe data URI-s for images, please share.
Yeah.. as part of my investigation, I also concluded that this code is so old... and it made me wonder if and when we should adopt symfony/html-sanitizer, was that already considered? (Again, Drupal core issue search yielded no result.)
Does anybody can come up with a sane default list? I generally like and prefer the whitelisting-based solution, but I wonder (again) if browsers are doing the same nowadays; based on this older Mozilla article, they are not.
Comment #11
mxr576Comment #12
mxr576Maybe this MDN doc could provide us with a sane default list?
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img#supported_...
If this is the direction we should take, should we allow overriding this list with a new parameter in
\Drupal\Component\Utility\Xss::filter()?Comment #13
mxr576Opened #3413706: Adopt Symfony's HTML Sanitizer as a potential follow up.
Comment #14
joseph.olstadWe're still using D10.2.x in a project and need to roll the MR for 10.2.x, I'll give it a go!
Nevermind, it rolls cleanly on 10.2.x, trying it now!
Comment #15
joseph.olstadBefore patching:
Broken image in rich_text filtered content when all attributes are allowed for the img element.

broken image in formatted text field with filtered mode enabled:
After patching and rebuilding caches:
Image is displayed as expected.

Thank you for this fix, it is an important one!
Comment #16
alexpottOne problem with the current MR is that it only accounts for
src="stuff"and notsrc='stuff'andsrc=stuffnospaceswhich are what the two following blocks of code account for.Also the discussion seems to be to change the list to an allow list rather than and exception list but it got rtbc'd in the current state.
Comment #17
alexpottInterestingly Symfony's sanitizer allows data on all media urls see https://github.com/symfony/html-sanitizer/blob/a25620fc6407e14331f3c0c56... and doesn't do anything special for SVGs. I tried inlining an SVG with JS using a data url and the JS is not executed.
Comment #18
tr commentedI assume you are referring to #8 by @dalin:
So let me summarize:
Right now, there are NO inline image formats supported. That is not acceptable. For things like email we NEED this. One of my modules, https://www.drupal.org/project/barcodes, also needs this.
Whitelists don't scale well. There are dozens of image formats, and more are being added all the time. If we use a whitelist we will always be out of date and there will constantly be feature requests to add new formats.
BUT, as a way forward, how about we add some common, known formats right now? To cover the most common use cases. Trying to come up with a comprehensive list is a futile exercise which will delay this capability and won't prevent us from getting all those feature requests for additional formats.
I suggest:
And if we're going to do a whitelist, perhaps it should be done in
UrlHelper::filterBadProtocol()?Comment #19
tr commented@alexpott: Please clarify the direction you want to take on this one.
I asked "as a way forward, how about we add some common, known formats right now?"
Is this acceptable as an important fix/improvement? Or do we have to continue to suffer with this bug pending a complete examination and re-write of how we do protocol filtering throughout core?
Right now this is going nowhere, because it's not clear where you want to go with this.
I'm happy to re-do the MR along the lines I proposed in #18 if there is some guidance here as to what stands a chance of getting accepted.
Comment #20
smustgrave commentedLeft small comments on the MR.
If you are another contributor eager to jump in, please allow the previous poster(s) at least 48 hours to respond to feedback first, so they have the opportunity to finish what they started!
Comment #21
ptaff commented@smustgrave comment #20 proposed modifications make sense.
I'm unfamiliar with the Drupal development workflow and I just cannot figure out how to commit to the issue branch; all the docs I can find create new issue branches.
Hence, here is a patch implementing comment #20 changes.
Please consider porting this fix to Drupal 10 too as it has the same issue and the same modifications do work.
Comment #23
prudloff commentedThis looks dangerous to me because:
<img src="data:image/svg+ xml;works in most browser.So I agree with #8, an allow list would be more secure.
I also think we should use a properly tested URI parser instead of a regex.