Problem/Motivation

Inline images with data uri scheme becomes malformed after Xss filtering.

Steps to reproduce

You do not need to configure a filter format with filter_html filter in use, just pass the following string to Xss:filter() directly.

<img src="data:image/gif;base64,R0lGODlhEAAOALMAAOazToeHh0tLS/7LZv/0jvb29t/f3//Ub/
/ge8WSLf/rhf/3kdbW1mxsbP//mf///yH5BAAAAAAALAAAAAAQAA4AAARe8L1Ekyky67QZ1hLnjM5UUde0ECwLJoExKcpp
V0aCcGCmTIHEIUEqjgaORCMxIC6e0CcguWw6aFjsVMkkIr7g77ZKPJjPZqIyd7sJAgVGoEGv2xsBxqNgYPj/gAwXEQA7"
width="16" height="14" alt="embedded folder icon">

(Source: https://www.websiteoptimization.com/speed/tweak/inline-images/)

The output is:

<img src="image/gif;base64,R0lGODlhEAAOALMAAOazToeHh0tLS/7LZv/0jvb29t/f3//Ub/
/ge8WSLf/rhf/3kdbW1mxsbP//mf///yH5BAAAAAAALAAAAAAQAA4AAARe8L1Ekyky67QZ1hLnjM5UUde0ECwLJoExKcpp
V0aCcGCmTIHEIUEqjgaORCMxIC6e0CcguWw6aFjsVMkkIr7g77ZKPJjPZqIyd7sJAgVGoEGv2xsBxqNgYPj/gAwXEQA7"
width="16" height="14" alt="embedded folder icon">

Proposed resolution

Follow Mozilla's guidelines and support inline images with “data:image/*” unless it’s “data:image/svg+xml”.

https://blog.mozilla.org/security/2017/11/27/blocking-top-level-navigati...

Related RFC: https://www.rfc-editor.org/rfc/rfc2397

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

CommentFileSizeAuthor
#21 xss-make-attributes-element-optional.patch853 bytesptaff

Issue fork drupal-3413396

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

mxr576 created an issue. See original summary.

mxr576’s picture

Status: Active » Needs review
longwave’s picture

Always wary of changing this code but this looks like a sensible and self-contained change, however I added a question to the MR about applying the new condition.

longwave’s picture

Out of scope for this, but I wonder if there is a way of reusing the Masterminds HTML5 parser in the XSS filter instead of having separate code that attempts to parse HTML attributes.

longwave’s picture

Status: Needs review » Needs work

Added lots more comments that I thought of, mostly around the regex, as these can be brittle and there might be some sneaky ways around it.

ghost of drupal past’s picture

@longwave Replacing this code with something more modern would surely be welcome. When I ported KSES to become filter_xss for Drupal 4.5.6/4.6.4 WhatWG was but a year old and HTML5 didn't exist for another two years yet. And as far as I can see, this code didn't change at all in the near two decades since.

dalin’s picture

Follow Mozilla's guidelines and support inline images with “data:image/*” unless it’s “data:image/svg+xml”.

I recommend we go the other way and use a whitelist of known safe mimetypes. Who knows what the future brings. Or if I intentionally mangle the format to be something like "data:image/_svg" can I get the web browser to still render this as an svg?

longwave’s picture

#8 seems like a good idea over having a regex that contains exceptions.

mxr576’s picture

Thanks for all the valuable feedback, everyone!

Or if I intentionally mangle the format to be something like "data:image/_svg" can I get the web browser to still render this as an svg?

Well, should not this be the concern of browsers rather than us? I used the Mozilla blog post as a source because I found no other, but if anybody knows what the current logic is that Firefox/Chrome/Safari or others use to decide safe data URI-s for images, please share.

@longwave Replacing this code with something more modern would surely be welcome. When I ported KSES to become filter_xss for Drupal 4.5.6/4.6.4 WhatWG was but a year old and HTML5 didn't exist for another two years yet. And as far as I can see, this code didn't change at all in the near two decades since.

Yeah.. as part of my investigation, I also concluded that this code is so old... and it made me wonder if and when we should adopt symfony/html-sanitizer, was that already considered? (Again, Drupal core issue search yielded no result.)

I recommend we go the other way and use a whitelist of known safe mimetypes.

Does anybody can come up with a sane default list? I generally like and prefer the whitelisting-based solution, but I wonder (again) if browsers are doing the same nowadays; based on this older Mozilla article, they are not.

mxr576’s picture

Issue summary: View changes
mxr576’s picture

I recommend we go the other way and use a whitelist of known safe mimetypes.

Does anybody can come up with a sane default list? I generally like and prefer the whitelisting-based solution, but I wonder (again) if browsers are doing the same nowadays; based on this older Mozilla article, they are not.

Maybe this MDN doc could provide us with a sane default list?
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img#supported_...

If this is the direction we should take, should we allow overriding this list with a new parameter in \Drupal\Component\Utility\Xss::filter()?

mxr576’s picture

Opened #3413706: Adopt Symfony's HTML Sanitizer as a potential follow up.

joseph.olstad’s picture

We're still using D10.2.x in a project and need to roll the MR for 10.2.x, I'll give it a go!
Nevermind, it rolls cleanly on 10.2.x, trying it now!

joseph.olstad’s picture

Status: Needs work » Reviewed & tested by the community


Before patching:

Broken image in rich_text filtered content when all attributes are allowed for the img element.
broken image in formatted text field with filtered mode enabled:
Only local images are allowed.

After patching and rebuilding caches:

Image is displayed as expected.
Example of an image loading.

Thank you for this fix, it is an important one!

alexpott’s picture

Status: Reviewed & tested by the community » Needs work

One problem with the current MR is that it only accounts for src="stuff" and not src='stuff' and src=stuffnospaces which are what the two following blocks of code account for.

Also the discussion seems to be to change the list to an allow list rather than and exception list but it got rtbc'd in the current state.

alexpott’s picture

Interestingly Symfony's sanitizer allows data on all media urls see https://github.com/symfony/html-sanitizer/blob/a25620fc6407e14331f3c0c56... and doesn't do anything special for SVGs. I tried inlining an SVG with JS using a data url and the JS is not executed.

tr’s picture

Also the discussion seems to be to change the list to an allow list rather than and exception list but it got rtbc'd in the current state.

I assume you are referring to #8 by @dalin:

Follow Mozilla's guidelines and support inline images with “data:image/*” unless it’s “data:image/svg+xml”.

I recommend we go the other way and use a whitelist of known safe mimetypes. Who knows what the future brings. Or if I intentionally mangle the format to be something like "data:image/_svg" can I get the web browser to still render this as an svg?

So let me summarize:

Right now, there are NO inline image formats supported. That is not acceptable. For things like email we NEED this. One of my modules, https://www.drupal.org/project/barcodes, also needs this.

Whitelists don't scale well. There are dozens of image formats, and more are being added all the time. If we use a whitelist we will always be out of date and there will constantly be feature requests to add new formats.

BUT, as a way forward, how about we add some common, known formats right now? To cover the most common use cases. Trying to come up with a comprehensive list is a futile exercise which will delay this capability and won't prevent us from getting all those feature requests for additional formats.

I suggest:

image/gif
image/jpeg
image/png
image/webp

And if we're going to do a whitelist, perhaps it should be done in UrlHelper::filterBadProtocol()?

tr’s picture

Status: Needs work » Needs review

@alexpott: Please clarify the direction you want to take on this one.

I asked "as a way forward, how about we add some common, known formats right now?"
Is this acceptable as an important fix/improvement? Or do we have to continue to suffer with this bug pending a complete examination and re-write of how we do protocol filtering throughout core?

Right now this is going nowhere, because it's not clear where you want to go with this.

I'm happy to re-do the MR along the lines I proposed in #18 if there is some guidance here as to what stands a chance of getting accepted.

smustgrave’s picture

Status: Needs review » Needs work

Left small comments on the MR.

If you are another contributor eager to jump in, please allow the previous poster(s) at least 48 hours to respond to feedback first, so they have the opportunity to finish what they started!

ptaff’s picture

StatusFileSize
new853 bytes

@smustgrave comment #20 proposed modifications make sense.

I'm unfamiliar with the Drupal development workflow and I just cannot figure out how to commit to the issue branch; all the docs I can find create new issue branches.

Hence, here is a patch implementing comment #20 changes.

Please consider porting this fix to Drupal 10 too as it has the same issue and the same modifications do work.

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.

prudloff’s picture

if ($element === 'img' && preg_match('/^data:image\/(?!svg\+xml;base64,)[^;]+;base64,/i', $match[1]) === 1) {

This looks dangerous to me because:

  • Browsers are very forgiving with syntax so it is hard to cover every syntax variant with a regex. For example <img src="data:image/svg+ xml; works in most browser.
  • Other dangerous image formats could emerge in the future.

So I agree with #8, an allow list would be more secure.
I also think we should use a properly tested URI parser instead of a regex.