Problem/Motivation

The current filtering in \Drupal\Component\Utility\Xss::filter() and \Drupal\Component\Utility\Xss::attributes() has 2 issues related to how we handle data-* attributes:

XSS attack vector (confirmed on 8.0.0-beta9).

Image captions and other data-* attributes are stored HTML-encoded. This causes them to bypass our current XSS filtering logic. When viewing content in the editor, the data-attribute's content is decoded and could be evaluated, leading to an XSS vulnerability when editing content.

To reproduce, create a node as a regular user, add an image with caption "<script>alert('xss');</script>" and save the node. Edit the node as editor, and the script will be executed when the CKEditor-interface is loaded.

Upgrading to critical because of this vulnerability.

Captions or data-attribute values are mangled.

When adding captions to images through the editor, content is stripped incorrectly. Some examples:

Actual caption in editor Rendered caption
(http://drupal.org) //drupal.org)
Source:reuters reuters
Example use of the Scope Resolution Operator (::) :)

(more examples and screenshots in the first 20 comments of this issue).

Proposed resolution

The initial approach of altering stripDangerousProtocols in #10 was rejected because it proved too difficult and error-prone.
In #22, chx suggested to introduce the concept of whitelisted "harmless"-attributes (such as data-caption), for which no bad protocol filtering is executed. This approach was signed off on by Damien Tournoud in #32
Additionally, encoded data-attributes should be filtered for XSS to prevent attacks through the editor.

Remaining tasks

  • Fix remaining test failure

User interface changes

None. (expect removing the XSS vulnerability on the editor interface).

API changes

None.

Original issue

tested with Last packaged version: 8.0-alpha3+423-dev
if you put a http:// on the caption this strip something else ..i mean this is not wanted..so bug report

caption bug if you write a domain

CommentFileSizeAuthor
#111 drupal-no_protocol_filter-2105841-111-D7.patch6.33 KBLiam Morland
#97 interdiff-2105841-90-97.txt1.74 KBDevin Carlson
#97 do-2105841_no_protocol_filter-97.patch6.33 KBDevin Carlson
#97 do-2105841_no_protocol_filter-97-tests-only.patch2.5 KBDevin Carlson
#94 filter_php_5.2.17.png163.65 KBDevin Carlson
#90 do-2105841_no_protocol_filter-90.patch6.08 KBDevin Carlson
#90 do-2105841_no_protocol_filter-90-tests-only.patch2.5 KBDevin Carlson
#85 do-2105841_no_protocol_filter-84.patch3.78 KBHeine
#84 do-2105841_no_scheme_filter_on_allowed_attributes.patch3.79 KBHeine
#84 do-2105841_no_scheme_filter_on_allowed_attributes.patch3.79 KBHeine
#83 do-2105841_no_scheme_filter_on_allowed_attributes.patch3.82 KBHeine
#75 filter_html-2105841-75.patch12.79 KBmr.baileys
#75 interdiff-71-75.txt3.21 KBmr.baileys
#71 filter_html-2105841-71.patch12.08 KBmr.baileys
#71 interdiff-67-71.txt1.63 KBmr.baileys
#68 interdiff-61-67.txt7.85 KBmr.baileys
#68 filter_html-2105841-67.patch12.03 KBmr.baileys
#61 interdiff-56-61.txt2.22 KBmr.baileys
#61 filter_html-2105841-61.patch11.72 KBmr.baileys
#57 inderdiff-52-56.txt1.21 KBmr.baileys
#56 filter_html-2105841-56.patch11.65 KBmr.baileys
#56 filter_html-2105841-56.patch11.65 KBmr.baileys
#52 filter_html-2105841-52.patch10.45 KBsanduhrs
#51 filter_html-2105841-51.patch10.45 KBsanduhrs
#50 filter_html-2105841-50.patch10.42 KBmr.baileys
#50 interdiff-47-50.txt2.03 KBmr.baileys
#47 filter_html-2105841-47.patch10.47 KBmr.baileys
#47 interdiff-45-47.txt1.83 KBmr.baileys
#45 interdiff-42-45.txt5.18 KBmr.baileys
#45 filter_html-2105841-45.patch10.34 KBmr.baileys
#42 filter_html-2105841-42.patch9.89 KBmr.baileys
#42 2105841-40-42-interdiff.txt4.47 KBmr.baileys
#40 interdiff-38-40.txt861 bytescs_shadow
#40 filter_html-2105841-40.patch8.96 KBcs_shadow
#38 interdiff-35-38.txt6.79 KBcs_shadow
#38 filter_html-2105841-38.patch8.98 KBcs_shadow
#35 2105841_35.patch3.89 KBwebflo
#22 2105841_22.patch2.27 KBchx
#10 interdiff.txt21.02 KBWim Leers
#10 url_in_attribute_broken-2105841-10.patch20.45 KBWim Leers
#9 kitten3.jpg30.39 KBmr.baileys
#9 kitten2.jpg50.16 KBmr.baileys
#9 kitten1.jpg33.58 KBmr.baileys
#8 url_in_attribute_broken-2105841-8.patch6.13 KBWim Leers
#2 url_in_attribute_broken-2105841-2.patch1.13 KBWim Leers
#2 url-in-data-attribute-BEFORE.png20.26 KBWim Leers
#2 url-in-data-attribute-AFTER.png20.25 KBWim Leers
caption-bug-drupal8.jpg25.88 KBeule
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

eule’s picture

Title: small bug in capation » small bug in caption
Wim Leers’s picture

Title: small bug in caption » filter_html (Xss::filter()) breaks URLs in image captions
Component: editor.module » filter.module
Assigned: Unassigned » Wim Leers
Status: Active » Needs review
Issue tags: +sprint, +Spark
FileSize
20.25 KB
20.26 KB
1.13 KB

Excellent find, and awesome bug report: all the info is here, explained very clearly. Thank you! :)

Whew, I fear this is going to be mighty difficult to fix. The cause is deep in the filter system: the filter_html uses Xss::filter(), which is responsible for stripping out evil protocols. If you type something (anything) before "http:", such as "You should see http://drupal.org", then ""You should see http:" is considered the protocol! Patch attached to fix this.


Tested with this content, and with the "restrict images to this site" filter off:

<p>Simple</p>
<img data-align="center" data-caption="http://drupal.org" src="https://drupal.org/sites/all/modules/drupalorg/drupalorg/images/d8.svg" />
<p>Anything preceding the URL is interpreted as the protocol</p>
<img data-align="center" data-caption="See http://drupal.org" src="https://drupal.org/sites/all/modules/drupalorg/drupalorg/images/d8.svg" />
<p>And when HTML is embedded in a data- attribute, that too is interpreted as the protocol</p>
<img data-align="center" data-caption="This is a &lt;a href=&quot;http://drupal.org&quot;&gt;quick&lt;/a&gt; test.." src="https://drupal.org/sites/all/modules/drupalorg/drupalorg/images/d8.svg" />

(Note that you have to copy/paste the above in CKEditor's "Source" mode, without exiting source mode; the current CKEditor build will mangle HTML stored in data- attributes. That's fixed in newer versions of CKEditor.)

Before:
url-in-data-attribute-BEFORE.png

After:
url-in-data-attribute-AFTER.png

Wim Leers’s picture

Issue tags: +Needs tests

.

eule’s picture

Issue tags: -Needs tests

works for me..but need´s deeper tests

Wim Leers’s picture

Of course. But before I do that, I want sign-off on the approach.

mr.baileys’s picture

Issue summary: View changes
Status: Needs review » Needs work

Some test cases with the patch applied and using Basic HTML:

Actual caption in editor Rendered caption
(http://drupal.org) //drupal.org)
Source:reuters reuters
Example use of the Scope Resolution Operator (::) :)

The most recent patch only works when the scheme is preceded by a space. There are other valid captions where the scheme is not preceded by a space (see example 1). Rather than checking everything between space and ':', the logic should follow RFC 3986's definition of a scheme as "Scheme names consist of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus ("+"), period ("."), or hyphen ("-")"

However, I'm not sure how we can fix examples 2 and 3 while still using Url::stripDangerousProtocols.

Needs work for example 1, although we might have to rethink the approach to accommodate examples 2 and 3 too.

Wim Leers’s picture

Thanks for the review — much appreciated!

And you're right, I didn't consider the (http://example.com) scenario, which of course must also work :/ Thanks for the pointers!

Wim Leers’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests, -Needs security review
FileSize
6.13 KB

So I'm now testing with this HTML:

<p>Simple</p>
<img data-align="center" data-caption="http://drupal.org" src="/sites/default/files/inline-images/dark.png" />
<p>Anything preceding the URL is interpreted as the protocol</p>
<img data-align="center" data-caption="See http://drupal.org" src="/sites/default/files/inline-images/dark.png" />
<p>And when HTML is embedded in a data- attribute, that too is interpreted as the protocol</p>
<img data-align="center" data-caption="This is a &lt;a href=&quot;http://drupal.org&quot;&gt;quick&lt;/a&gt; test.." src="/sites/default/files/inline-images/dark.png" />
<p>#6.1:</p>
<img data-align="center" data-caption="(http://drupal.org)" src="/sites/default/files/inline-images/dark.png" />
<p>#6.2a:</p>
<img data-align="center" data-caption="Source:reuters" src="/sites/default/files/inline-images/dark.png" />
<p>#6.2b:</p>
<img data-align="center" data-caption="Source: reuters" src="/sites/default/files/inline-images/dark.png" />

For a moment, I thought it would've been possible to first run the filter_caption filter and then run the filter_html filter, but that won't work, because it'd require the <figure> and <figcaption> tags to be whitelisted. We don't want that.

The solution that I found is super simple and elegant: we simply leverage the FilterInterface::prepare() phase that filters may implement! By URL-encoding the value of the data-caption attribute, no protocol stripping happens anymore. The data-caption attribute is intended to contain (HTML-encoded) HTML, therefore filtering should be applied. (This already is the case.) And it is then that filtering that is responsible for stripping dangerous protocols.

This patch no longer touches the Xss or UrlHelper classes at all anymore, so the "needs security review" tag is not even necessary anymore. However, thanks to the test coverage added by #2099741: Protect WYSIWYG Editors from XSS Without Destroying User Data (Drupal\editor\Tests\EditorXssFilter\StandardTest), we can also be certain that we did not enable any new security holes! Hence I'm definitely able to remove the "needs security review" part.

mr.baileys’s picture

Status: Needs review » Needs work
FileSize
33.58 KB
50.16 KB
30.39 KB

I took the patch in #8 for quick test-drive, and there seems to still be an issue. To reproduce:

Creating/editing a node and adding a caption:

Viewing the result on node view:

Editing the node again:

Wim Leers’s picture

Priority: Normal » Major
Issue tags: +Security, +Needs security review
FileSize
20.45 KB
21.02 KB

How do you keep coming up with these edge cases? :D Amazing!

Now testing with:

<p>Simple</p>
<img data-align="center" data-caption="http://drupal.org" src="/sites/default/files/inline-images/dark.png" />
<p>Anything preceding the URL is interpreted as the protocol</p>
<img data-align="center" data-caption="See http://drupal.org" src="/sites/default/files/inline-images/dark.png" />
<p>And when HTML is embedded in a data- attribute, that too is interpreted as the protocol</p>
<img data-align="center" data-caption="This is a &lt;a href=&quot;http://drupal.org&quot;&gt;quick&lt;/a&gt; test.." src="/sites/default/files/inline-images/dark.png" />
<p>#6.1:</p>
<img data-align="center" data-caption="(http://drupal.org)" src="/sites/default/files/inline-images/dark.png" />
<p>#6.2a:</p>
<img data-align="center" data-caption="Source:reuters" src="/sites/default/files/inline-images/dark.png" />
<p>#6.2b:</p>
<img data-align="center" data-caption="Source: reuters" src="/sites/default/files/inline-images/dark.png" />
<p>#7:</p>
<img data-align="center" data-caption="Interesting (Scope resolution operator ::)" src="/sites/default/files/inline-images/dark.png" />

Upon further (and deep — spent my entire Saturday on this) investigation, it turns out that the solution in #8 indeed solves it for filtering (i.e. output for the end user), but it's broken because of #2099741: Protect WYSIWYG Editors from XSS Without Destroying User Data, which introduced EditorXssFilterInterface to ensure safety of authors when using WYSIWYG editors. That also uses
Xss::filter()… and therefore we end up having exactly the same problems as before #8, but only for the author, not for the end user!

If you'd paste the sample data, save it, edit it again, you'd notice that exactly the same things are broken as before #8, but only for authors!

In other words: the solution in #8 is not a complete solution. And since we have to fix it in Xss::filter() anyway, we might as well get rid of #8 entirely — but keep the added test coverage, of course.


By very carefully adapting UrlHelper::stripDangerousProtocols(), I've been able to find a solution, that:

  1. still has all of the existing security test coverage still passing tests: \Drupal\editor\Tests\EditorXssFilter\StandardTest, \Drupal\Tests\Component\Utility\XssTest, \Drupal\filter\Tests\FilterUnitTest and \Drupal\Tests\Component\Utility\UrlHelperTest.
  2. also fixes the known security flaws we had since PHP 5.4 due to an API change in PHP — see #1210798: In PHP 5.4+, html_entity_decode() doesn't decode invalid numeric entities (but that issue might still need to be resolved, it could have more symptoms than only the two test cases that were disabled because of it)
  3. also ensures that if there are *multiple* protocols in a single attribute value (e.g. in the case of the encoded HTML for the data-caption attribute), they are all stripped. Note that we in fact committed \Drupal\editor\Tests\EditorXssFilter\StandardTest with a test case that shows that this is broken in HEAD — the mitigating factor is that you'd need the <META> tag to be allowed to use it.
  4. does not support the Source:Wikipedia use case, because it's impossible for Drupal to know whether Source: is an actual protocol, now or in the future. That will be a minor WTF, but we must always err on the side of safety.
Wim Leers’s picture

Status: Needs work » Needs review
jcisio’s picture

Sorry for stupid question, but I don't understand a few XSS protection filter. E.g. this test:

$data[] = array('<IMG SRC="jav&#x09;ascript:alert(\'XSS\');">', '<IMG src="jav' . "\t" . 'alert(&#039;XSS&#039;);">');

Why that string needs to be "protected"? I test (only with Chrome) and it does not do anything harmful.

Wim Leers’s picture

There's a link just above that line of code that explains it. Please remember that XSSes don't need to work in every browser to be dangerous, if they only work in some browsers, that's already enough.

jcisio’s picture

IMO that link say the contrary: it is used to protect against XSS, not to do an XSS attack. Because I don't trust 100% on any type of wiki, I tried to reproduce it in any browser, but no popup was trigger, so I raised the question.

Wim Leers’s picture

Well, in any case: what you're talking about is completely off-topic for this issue. Please open a new issue if you want to remove that.

jcisio’s picture

Damien Tournoud’s picture

Status: Needs review » Needs work
+        // If the colon is followed by whitespace, then this cannot be a URI,
+        // per RFC3986, section 3 (Syntax Components).
+        // Example: the alt attribute of an image might contain the value
+        // "Source: Wikipedia".
+        if (preg_match('/\s/', $uri[$colonpos + 1])) {
+          break;
+        }

^ javascript: alert(123) works pretty well. Virtually every browser will automatically url-encode spaces in embedded URLs.

Formal validation is pretty useless in an XSS filter, precisely because browsers have very lenient parsers.

+        // First: ignore (but retain) leading whitespace. (There could be a
+        // leading space that would cause the URI scheme to be considered
+        // disallowed, which does not make sense.)
+        $parts = preg_split('/\s+/', $protocol);
+        $actual_protocol = array_pop($parts);

^ \s is possibly local specific, I'm not sure we want to go there.

+        $whitespace_prefix = substr($protocol, 0, strrpos($protocol, $actual_protocol));

^ This looks like a really slow version of substr($protocol, 0, -strlen($actual_protocol))

+        // If $protocol is empty after stripping leading whitespace, we cannot
+        // be dealing with an URI, and therefore also not an URI scheme.
+        // Example: the alt attribute of an image might contain the value
+        // "Scope resolution operator ::".
+        if (empty($protocol)) {
+          break;
+        }

^ "javascript\t:alert(123)" works pretty well as an XSS (at least in Chrome), so this assumption is just wrong with the split on \s.

         // If a colon is preceded by a slash, question mark or hash, it cannot
-        // possibly be part of the URL scheme. This must be a relative URL, which
+        // possibly be part of the URI scheme. This must be a relative URL, which
         // inherits the (safe) protocol of the base document.
-        if (preg_match('![/?#]!', $protocol)) {
+        if (in_array($protocol[0], array('/', '?', '#'))) {
           break;
         }

^ The new version is missing a lot of relative URLs, that do not need to start with either /, ? or #.

(For example: about?test:toto is a valid URL.)

+          $offset = $colonpos + 1 - strlen($protocol);

^ From the look of it, there is a off-by-one error here. You are stripping the protocol *and* the colon.

chx’s picture

stripDangerousProtocols comes from the original KSES port ( https://api.drupal.org/api/drupal/modules%21filter.module/function/filte... ) and as such it should not be changed at all unless we have very very good reasons and a lot of expertise. I do not think it possible to enhance the KSES code but feel free to prove me wrong.

chx’s picture

Can we instead think on which attributes are JS capable? I know it's dangerous too but perhaps it's less dangerous than trying this.

neclimdul’s picture

That could be a solution. Its clear that our attribute needs are a bit more complex then they where almost 9 years ago when the port of KSES was added. It is still doing its job exceedingly well, its just the requirements of complex attributes and more advanced browsers seems to have pushed us to require something a little more surgical.

chx’s picture

Status: Needs work » Needs review
FileSize
2.27 KB

So what we can do here; we can write a harmless attribute list based on https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes ; data-* surely belongs there. And be quite careful with it.

chx’s picture

Assigned: Wim Leers » Damien Tournoud

Damien, please review this -- and please noone else set to RTBC even if you like it.

Dave Reid’s picture

@chx: See also #952964: String stripped from title and alt attribute if contains a colon which is pretty much doing the same thing, and was the solution I was heading towards as well.

This implies FilterHtml::getHTMLRestrictions() must also be updated.

Maybe we should continue this "new" direction either on #952964: String stripped from title and alt attribute if contains a colon or #2327341: Xss::attributes() corrupts data attributes that contain colons or JSON-encoded data?

chx’s picture

Just mark everything as a duplicate and keep this one going?

Wim Leers’s picture

I'm fine with this approach. Only one question: do we want this list of harmless attributes to be configurable or not? #952964: String stripped from title and alt attribute if contains a colon made it configurable.

(Especially considering Web Components are upcoming, I think we must allow for additional attributes to be considered harmless. And that list could vary per site and potentially even per text format.)

chx’s picture

You put the wrong attribute there and you are hosed. Do we dare? I am already very very anxious to touch this code at all but to hand it over to users? Eek :)

Yes, we should make it configurable.

Wim Leers’s picture

#27: I'm fine with deferring configurability to a follow-up if you want :) data-* attributes are the immediate need/problem/use case, everything else could be dismissed as speculation :)

Dave Reid’s picture

I'm ok to making it configurable without an exposed UI in a follow-up.

Wim Leers’s picture

Dave Reid’s picture

Issue tags: +Media Initiative
Damien Tournoud’s picture

Assigned: Damien Tournoud » Unassigned

(About one month later...)

I reviewed the approach from #23 and it sounds very reasonable to me.

The whitelist approach is recommended by several sources (for example the nice write-up by Jakob Kallin and Irene Lobo Valbuena at excess-xss.com). A few years back, a "common" list of sanitization rules was posted on the whatwg wiki. It has a list of "acceptable attributes" and a list of "attributes whose value is a URI", we probably want to consider in our whitelist only the difference between those two (+ everything that starts with data-), and run anything else through the filter.

That would be a first start. The other approach would be to just leverage an externally maintained filter? HTML purifier looks extremely relevant.

jhedstrom queued 22: 2105841_22.patch for re-testing.

Wim Leers’s picture

Another duplicate was opened: #2383205: Xss::filter removes valid HTML attributes . But one that has nice tests :)


Looks like we want to go with #23. What's missing in that case, are tests.

We want these tests that I wrote (in #10) for the caption filter in particular:

--- a/core/modules/filter/lib/Drupal/filter/Tests/FilterUnitTest.php
+++ b/core/modules/filter/lib/Drupal/filter/Tests/FilterUnitTest.php

But also tests for the XSS filtering changes themselves in #23. webflo's patch in #2383205 does just that.

webflo’s picture

FileSize
3.89 KB
Wim Leers’s picture

Issue summary: View changes
Status: Needs review » Needs work
  1. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
    @@ -486,6 +486,41 @@ public function testQuestionSign() {
    +   * Check that strings in html attributes are are correctly processed.
    

    s/html/HTML/

    Missing @covers.

    And finally, missing the test coverage I wrote for the caption filter, see my previous comment.

  2. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
    @@ -486,6 +486,41 @@ public function testQuestionSign() {
    +    if ($allowed_tags === NULL) {
    +      $value = Xss::filter($value);
    +    }
    

    This is never used. Let's delete it?

  3. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
    @@ -486,6 +486,41 @@ public function testQuestionSign() {
    +        'Image tag with alt and title attribute',
    

    s/alt and title/data-/

The last submitted patch, 35: 2105841_35.patch, failed testing.

cs_shadow’s picture

Status: Needs work » Needs review
FileSize
8.98 KB
6.79 KB

Addressed Wim's comments in #36. I've added tests from #10. Although, I've tried to fix the errors but there are still some errors in the filterCaption tests. Maybe someone can help me on how to fix it.

Status: Needs review » Needs work

The last submitted patch, 38: filter_html-2105841-38.patch, failed testing.

cs_shadow’s picture

Status: Needs work » Needs review
FileSize
8.96 KB
861 bytes

Fixed a few test failures.

Status: Needs review » Needs work

The last submitted patch, 40: filter_html-2105841-40.patch, failed testing.

mr.baileys’s picture

Status: Needs work » Needs review
FileSize
4.47 KB
9.89 KB
  • FilterCaption does not have a prepare() step, just process(), so removed calls to the FilterBase::prepare() since we are not calling those for FilterHtml either.
  • Swapped String::decodeEntities() and String::check_plain for their non-deprecated counterparts.
  • Fixed incorrect capitalisation of Xss in a use-statement.
  • The tests were originally written in #10, before the special treatment of "data-"-attributes introduced in #22. I have updated the expected values in the three remaining failing tests accordingly.
Wim Leers’s picture

Thank you very much for making this patch green again, @mr.baileys!

  1. +++ b/core/modules/filter/src/Tests/FilterUnitTest.php
    @@ -241,7 +238,7 @@ function testCaptionFilter() {
         $input = '<img data-caption="This is an &lt;a href=&quot;javascript:alert(&quot;foo&quot;)&gt;evil&lt;/a&gt; test…" src="llama.jpg" />';
         $expected = '<figure><img src="llama.jpg" /><figcaption>This is an <a>evil</a> test…</figcaption></figure>';
         $this->assertIdentical($expected, $test_with_html_filter($input));
    

    This unchanged test code still ensures that no XSS can make it to the site's output.

  2. +++ b/core/modules/filter/src/Tests/FilterUnitTest.php
    @@ -241,7 +238,7 @@ function testCaptionFilter() {
    -    $expected_xss_filtered = '<img data-caption="This is an &lt;a href=&quot;alert(&quot;foo&quot;)&gt;evil&lt;/a&gt; test…" src="llama.jpg" />';
    +    $expected_xss_filtered = '<img data-caption="This is an &lt;a href=&quot;javascript:alert(&quot;foo&quot;)&gt;evil&lt;/a&gt; test…" src="llama.jpg" />';
         $this->assertIdentical($expected_xss_filtered, $test_editor_xss_filter($input));
    

    This on the other hand… it now actually verifies that an XSS does make it all the way to the WYSIWYG editor. Which means that an XSS is possible in theory. CKEditor protects against this automatically, but another editor may not.

    A mitigating factor is that the editing user would actually have to click this link within the editor (which CKEditor also doesn't let you do) — he couldn't be automatically attacked. And note that this never makes it to the end user, see point 1.

    So that makes this less than ideal, but still better than what I proposed in #10 — see the comments by chx and Damien Tournoud.

    AFAICT it is literally impossible to not have this problem when going with the approach recommended by chx & Damien, since it uses a simple whitelist of tags. My code in #10 used to do what you could call "speculative URL parsing".

    The only way I see to fix this problem is to add a editorXssFilter() method to FilterInterface, to allow filters to perform additional XSS filtering for editors. That way, the FilterCaption filter could apply XSS filtering to the data-caption attribute's value.

  3. +++ b/core/modules/filter/src/Plugin/Filter/FilterCaption.php
    @@ -9,7 +9,6 @@
    -use Drupal\Component\Utility\String;
    
    @@ -40,12 +39,12 @@ public function process($text, $langcode) {
    -        $caption = String::checkPlain($node->getAttribute('data-caption'));
    +        $caption = SafeMarkup::checkPlain($node->getAttribute('data-caption'));
    ...
    -        $caption = String::decodeEntities($caption);
    +        $caption = Html::decodeEntities($caption);
    

    These are deprecated, but changing them here is actually not necessary. It actually is distracting, because it suggests it's part of the solution for this issue.

mr.baileys’s picture

Assigned: Unassigned » mr.baileys

Talked to @Wim Leers in irc, assigning to myself and will try to upload a new patch later today.

mr.baileys’s picture

After talking to Wim Leers on irc, we decided it might be better to perform XSS filtering on all data-*-attributes globally rather than relying on filters implementing editorXssFilter() individually. Attached patch implements this global data-* attribute filtering.

These are deprecated, but changing them here is actually not necessary. It actually is distracting, because it suggests it's part of the solution for this issue.

You're right, I've changed them back.

Status: Needs review » Needs work

The last submitted patch, 45: filter_html-2105841-45.patch, failed testing.

mr.baileys’s picture

mr.baileys’s picture

Status: Needs work » Needs review
Wim Leers’s picture

Status: Needs review » Needs work
  1. +++ b/core/modules/editor/src/EditorXssFilter/Standard.php
    @@ -85,7 +89,30 @@ public static function filterXss($html, FilterFormatInterface $format, FilterFor
    +    // Since data-*-attributes can contain encoded markup that could be decoded
    

    s/data-*-/data-/, like everywhere else.

    s/markup/HTML markup/, to clarify, because "markup" in general does not need to be filtered, only HTML markup.

    Maybe also add an @see \Drupal\filter\Plugin\Filter\FilterCaption?

  2. +++ b/core/modules/editor/src/EditorXssFilter/Standard.php
    @@ -89,7 +89,7 @@ public static function filterXss($html, FilterFormatInterface $format, FilterFor
    -    $output = Xss::filter($html, $blacklisted_tags);
    +    $output = parent::filter($html, $blacklisted_tags);
    

    Hah :)

  3. +++ b/core/modules/editor/src/EditorXssFilter/Standard.php
    @@ -85,7 +89,30 @@ public static function filterXss($html, FilterFormatInterface $format, FilterFor
    +        $value = String::decodeEntities($node->value);
    +        $value = Xss::filterAdmin($value);
    +        $node->value = SafeMarkup::checkPlain($value);
    

    Add a comment for these 3 lines:

    // Decode, filter, encode.
    
  4. +++ b/core/modules/filter/src/Plugin/Filter/FilterCaption.php
    @@ -39,12 +40,12 @@ public function process($text, $langcode) {
    -        $caption = SafeMarkup::checkPlain($node->getAttribute('data-caption'));
    +        $caption = String::checkPlain($node->getAttribute('data-caption'));
    ...
    -        $caption = Html::decodeEntities($caption);
    +        $caption = String::decodeEntities($caption);
    

    Let's omit these distracting, unnecessary changes.

  5. +++ b/core/modules/filter/src/Tests/FilterUnitTest.php
    @@ -199,8 +199,11 @@ function testCaptionFilter() {
    +      $filtered_html_format = FilterFormat::create(array(
    

    s/array(…)/[…]/

  6. +++ b/core/modules/filter/src/Tests/FilterUnitTest.php
    @@ -199,8 +199,11 @@ function testCaptionFilter() {
    +        'format' => 'filtered_html',
    +        'name' => 'Filtered HTML',
    

    Please use random names instead; this creates unnecessary assumptions.

    In fact, why not just omit this completely?

mr.baileys’s picture

Status: Needs work » Needs review
FileSize
2.03 KB
10.42 KB

Thanks for the review, new patch attached.

Maybe also add an @see \Drupal\filter\Plugin\Filter\FilterCaption?

Not sure about this, since FilterCaption is just one of the potentially many filters for which we globally sanitize the data-attributes?

sanduhrs’s picture

FileSize
10.45 KB

Straight reroll.

sanduhrs’s picture

FileSize
10.45 KB

Small typo, trying again. Sorry.

The last submitted patch, 51: filter_html-2105841-51.patch, failed testing.

Wim Leers’s picture

Status: Needs review » Needs work
+++ b/core/modules/editor/src/EditorXssFilter/Standard.php
@@ -85,7 +89,31 @@ public static function filterXss($html, FilterFormatInterface $format, FilterFor
+    // Since data-attributes can contain encoded HTML markup that could be
+    // decoded and interpreted by editors, we need to apply XSS filtering to
+    // their contents.
+    return Standard::filterXssDataAttributes($output);

This means we need to expand the test coverage in \Drupal\Tests\editor\Unit\EditorXssFilter\StandardTest.

After that, this is RTBC.

mr.baileys’s picture

Issue tags: +drupaldevdays
mr.baileys’s picture

Status: Needs work » Needs review
FileSize
11.65 KB
11.65 KB

I started adding tests to \Drupal\Tests\editor\Unit\EditorXssFilter\StandardTest specifically for the attributing filtering.

There is one failing test because DOMAttr->value automatically decodes attribute values, while we want the original unescaped value.

mr.baileys’s picture

FileSize
1.21 KB

Forgot to attach the interdiff to previous patch.

Status: Needs review » Needs work

The last submitted patch, 56: filter_html-2105841-56.patch, failed testing.

The last submitted patch, 56: filter_html-2105841-56.patch, failed testing.

mr.baileys’s picture

Title: filter_html (Xss::filter()) breaks URLs in image captions » Xss::filter() ignores malicous content in data-attributes and mangles image captions
Issue summary: View changes
mr.baileys’s picture

Status: Needs work » Needs review
FileSize
11.72 KB
2.22 KB
pfrenssen’s picture

Assigned: mr.baileys » pfrenssen

Going to review this.

xjm’s picture

Priority: Major » Critical
Issue tags: +D8 upgrade path

Discussed with @pfrenssen; this is a critical (and upgrade path blocking because of the disclosed vulnerability).

pfrenssen’s picture

As discussed with @xjm this is critical because it is an actual XSS issue which is easily exploitable.

This issue can be quite confusing to figure out. Here's what I've learned from experimenting with the patch, and talking to @mr.baileys and @Wim Leers:

  • The image captions are saved in the data-caption attribute. It is possible to use HTML in captions, for example if you want to output a link. If you create a link using the button in the CKEditor interface this will be single escaped. If you want to show any HTML entities then these will need to be double escaped. Double escaped captions are inherently safe and should never be filtered.
  • If the user manually enters any HTML code in the caption this will be interpreted by CKEditor. For example, if I enter <strong>my text</strong> and submit the form this will become bold text. If I enter <script>alert('xss');</script> then this will be executed as well, causing the XSS vulnerability. I suspect that this is a bug in CKEditor. Even if it would be intentional to allow editors to embed scripts in their captions it doesn't make much sense to execute the scripts when they are being edited since any executed scripts no longer have their code present in the editor and will be lost on a subsequent form submit.
  • The only way we can prevent the XSS issue on our side is by stripping the dangerous code. Technically this is data loss (the <script> tags will be gone), but this is of course preferable over having an actual XSS issue.
  • If you use the Full HTML text format then the XSS will still be present, even if the patch is applied. This is as designed, since the Full HTML format does allow to embed scripts, for example to display external assets.
  • The patch disables filtering of bad protocols (eg. javascript:alert('xss')) in selected "harmless" attributes (data attributes + title and alt attributes). These attributes are fully text based and will never have their contents executed as javascript code. This made me wonder about the data-caption attribute. It turns out fine, since its contents are fully XSS-filtered by FilterCaption::process(). A potential attack with a bad protocol could be data-caption="&lt;a href=&quot;javascript:alert(1);&quot;&gt;test&lt;/a&gt;" but if this is decoded and XSS-filtered the javascript: part is filtered correctly since it now is part of an href which is not on the whitelist of "harmless" attributes.
  1. +++ b/core/modules/editor/src/EditorXssFilter/Standard.php
    @@ -7,7 +7,11 @@
    +use Drupal\Component\Utility\SafeMarkup;
     use Drupal\Component\Utility\Xss;
    +use Drupal\Component\Utility\Html;
    

    Order use statements alphabetically.

  2. +++ b/core/modules/editor/src/EditorXssFilter/Standard.php
    @@ -7,7 +7,11 @@
    +use Drupal\Component\Utility\String;
    +use Drupal\Component\Utility\Unicode;
    

    These two namespaces are unused.

  3. +++ b/core/modules/editor/src/EditorXssFilter/Standard.php
    @@ -85,7 +89,31 @@ public static function filterXss($html, FilterFormatInterface $format, FilterFor
    -    return static::filter($html, $blacklisted_tags);
    +    $output = parent::filter($html, $blacklisted_tags);
    

    I don't understand why it was changed from static:: to parent::. This has a different meaning when this class is extended. It would mean that if the filter() method is overridden in the called class then it would not be called in this case, but instead its parent would be called. I think this should be put back to static::.

  4. +++ b/core/modules/editor/src/EditorXssFilter/Standard.php
    @@ -85,7 +89,31 @@ public static function filterXss($html, FilterFormatInterface $format, FilterFor
    +    // Since data-attributes can contain encoded HTML markup that could be
    +    // decoded and interpreted by editors, we need to apply XSS filtering to
    +    // their contents.
    +    return Standard::filterXssDataAttributes($output);
    

    Same here, it should be static::filterXssDataAttributes(), so that in case this class is overridden the method of the called class is used, and not specifically always the one from the Standard class.

  5. +++ b/core/modules/editor/src/EditorXssFilter/Standard.php
    @@ -85,7 +89,31 @@ public static function filterXss($html, FilterFormatInterface $format, FilterFor
    +        // Decode, filter, encode. There is no need to explicitly decode
    +        // $node->value, since DOMAttr holds the decoded value.
    

    We are not decoding in this part, only filtering and encoding. As is already said $node->value is already decoded.

  6. +++ b/core/modules/editor/tests/src/Unit/EditorXssFilter/StandardTest.php
    @@ -512,6 +512,12 @@ public function providerTestFilterXss() {
    +    // Test XSS filtering on data-attributes.
    +    // @see \Drupal\editor\EditorXssFilter::filterXssDataAttributes()
    +    $data[] = array('<img src="butterfly.jpg" data-caption="&lt;script&gt;alert();&lt;/script&gt;" />', '<img src="butterfly.jpg" data-caption="alert();" />');
    +    $data[] = array('<img src="butterfly.jpg" data-caption="&amp;lt;script&amp;gt;alert();&amp;lt;/script&amp;gt;" />', '<img src="butterfly.jpg" data-caption="&amp;lt;script&amp;gt;alert();&amp;lt;/script&amp;gt;" />');
    +    $data[] = array('<img src="butterfly.jpg" data-caption="&lt;EMBED SRC=&quot;http://ha.ckers.org/xss.swf&quot; AllowScriptAccess=&quot;always&quot;&gt;&lt;/EMBED&gt;" />', '<img src="butterfly.jpg" data-caption="" />');
    

    Maybe explain why it is needed to filter single escaped values but not double escaped values.

  7. +++ b/core/modules/filter/src/Tests/FilterUnitTest.php
    @@ -8,6 +8,8 @@
     use Drupal\Component\Utility\Html;
    +use Drupal\filter\Entity\FilterFormat;
    +use Drupal\editor\EditorXssFilter\Standard;
     use Drupal\Component\Utility\SafeMarkup;
    

    Alphabetical ordering of use statements, my favourite pet coding standard :)

  8. +++ b/core/modules/filter/src/Tests/FilterUnitTest.php
    @@ -176,6 +178,69 @@ function testCaptionFilter() {
    +    // So far we've tested that the caption filter works correctly. But we also
    +    // want to make sure that it works well in tandem with the "Limit allowed
    +    // HTML tags" filter, which is typically used with.
    

    "which is typically used with" is not a complete sentence. There should be an "it" in there somewhere.

  9. +++ b/core/modules/filter/src/Tests/FilterUnitTest.php
    @@ -176,6 +178,69 @@ function testCaptionFilter() {
    +    $test_with_html_filter = function($input) use ($filter, $html_filter) {
    

    Anonymous functions should have a space between "function" and the opening parenthesis.

  10. +++ b/core/modules/filter/src/Tests/FilterUnitTest.php
    @@ -176,6 +178,69 @@ function testCaptionFilter() {
    +    // All the tricky cases encountered at https://drupal.org/node/2105841.
    +    // A plain URL preceded by text.
    +    $input = '<img data-caption="See http://drupal.org" src="llama.jpg" />';
    +    $expected = '<figure><img src="llama.jpg" /><figcaption>See http://drupal.org</figcaption></figure>';
    +    $this->assertIdentical($expected, $test_with_html_filter($input));
    +    $this->assertIdentical($input, $test_editor_xss_filter($input));
    +    // An anchor.
    +    $input = '<img data-caption="This is a &lt;a href=&quot;http://drupal.org&quot;&gt;quick&lt;/a&gt; test…" src="llama.jpg" />';
    +    $expected = '<figure><img src="llama.jpg" /><figcaption>This is a <a href="http://drupal.org">quick</a> test…</figcaption></figure>';
    +    $this->assertIdentical($expected, $test_with_html_filter($input));
    +    $this->assertIdentical($input, $test_editor_xss_filter($input));
    +    // A plain URL surrounded by parentheses.
    +    $input = '<img data-caption="(http://drupal.org)" src="llama.jpg" />';
    +    $expected = '<figure><img src="llama.jpg" /><figcaption>(http://drupal.org)</figcaption></figure>';
    +    $this->assertIdentical($expected, $test_with_html_filter($input));
    +    $this->assertIdentical($input, $test_editor_xss_filter($input));
    

    ...

    This is very dense and hard to read, can we separate each case with an empty line?

  11. +++ b/core/modules/filter/src/Tests/FilterUnitTest.php
    @@ -176,6 +178,69 @@ function testCaptionFilter() {
    +    // A source being credited, with a typo.
    +    $input = '<img data-caption="Source:Wikipedia" src="llama.jpg" />';
    +    $expected = '<figure><img src="llama.jpg" /><figcaption>Source:Wikipedia</figcaption></figure>';
    +    $this->assertIdentical($expected, $test_with_html_filter($input));
    +    $expected_xss_filtered = '<img data-caption="Source:Wikipedia" src="llama.jpg" />';
    +    $this->assertIdentical($expected_xss_filtered, $test_editor_xss_filter($input));
    

    What's the typo? The missing space after the double colon?

    Also it seems that $input and $expected_xss_filtered are identical, so $input can be used in the final assert.

pfrenssen’s picture

Assigned: pfrenssen » Unassigned
Status: Needs review » Needs work
mr.baileys’s picture

Assigned: Unassigned » mr.baileys

Thanks for the review @pfrenssen, will address your feedback asap.

Wim Leers’s picture

For the strangeness on the CKEditor side WRT encoding/decoding, we already have #2460145: Image captions incorrectly deal with plain-text HTML tags for that. Specifically see #2460145-3: Image captions incorrectly deal with plain-text HTML tags, which shows that it's a bug in CKEditor itself.

mr.baileys’s picture

Status: Needs work » Needs review
FileSize
12.03 KB
7.85 KB
  • 1, 2: Fixed
  • 3, 4: Changed back to static. These were changed since tests were failing with static earlier, but I can no longer reproduce those test failures to figure out what exactly was happening (I remember it had something to do with the switch between tag white-listing and black-listing, and the incorrect value being used).
  • 5: Re-worded the comment.
  • 6: Added some additional comments to explain the test cases.
  • 7: Fixed.
  • 8: Fixed.
  • 9: Fixed.
  • 10: Newlines added between individual test-cases.
  • 11: Reworded the comment, but the "typo" is indeed the missing space, since protocol-filtering used to consider "Source:" as a protocol, invoking the bad protocol filtering and mangling the caption.
pfrenssen’s picture

Status: Needs review » Needs work

Thanks a lot Ivo! Just some minor cleanup left and this is good to go.

  1. +++ b/core/modules/editor/src/EditorXssFilter/Standard.php
    @@ -85,7 +87,33 @@ public static function filterXss($html, FilterFormatInterface $format, FilterFor
    +  /**
    +   * Applies a very permissive XSS/HTML filter to data-attributes.
    +   */
    +  protected static function filterXssDataAttributes($html) {
    

    Document the parameter and the return value.

  2. +++ b/core/modules/filter/src/Tests/FilterUnitTest.php
    @@ -176,6 +178,75 @@ function testCaptionFilter() {
    +    $input = '<img data-caption="Source:Wikipedia" src="llama.jpg" />';
    +    $expected = '<figure><img src="llama.jpg" /><figcaption>Source:Wikipedia</figcaption></figure>';
    +    $this->assertIdentical($expected, $test_with_html_filter($input));
    +    $expected_xss_filtered = '<img data-caption="Source:Wikipedia" src="llama.jpg" />';
    +    $this->assertIdentical($expected_xss_filtered, $test_editor_xss_filter($input));
    

    $expected_xss_filtered is identical to $input, so we can get rid of $expected_xss_filtered and use $input in the final assert.

xjm’s picture

Issue tags: +D8 Accelerate Dev Days
mr.baileys’s picture

Status: Needs work » Needs review
FileSize
1.63 KB
12.08 KB

Addressed the comments from #69.

pfrenssen’s picture

Assigned: mr.baileys » Unassigned
Status: Needs review » Reviewed & tested by the community

Looking good now, thanks a lot!

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/lib/Drupal/Component/Utility/Xss.php
@@ -203,6 +204,10 @@ protected static function attributes($attributes) {
+            $harmless = substr($attribute_name, 0, 5) === 'data-' || in_array($attribute_name, array(
+              'title',
+              'alt',
+            ));

Let's add a comment as to why this is harmless.

pfrenssen’s picture

Title: Xss::filter() ignores malicous content in data-attributes and mangles image captions » Xss::filter() ignores malicious content in data-attributes and mangles image captions
mr.baileys’s picture

Status: Needs work » Needs review
FileSize
3.21 KB
12.79 KB

Added a comment. I've also renamed the $harmless-variable to $skip_protocol_filtering, since that better describes what it does.

Dom.’s picture

I have read though the interdiff which looks fine. Since we claim Drupal 8 to be HTML5 ready I would expect the @see link to point out at html5 doc not html4, but I could not find a proper one.
RTBC+1

pfrenssen’s picture

Status: Needs review » Reviewed & tested by the community

Very good, I like the variable rename too. Thanks!

  • alexpott committed 2febbff on 8.0.x
    Issue #2105841 by mr.baileys, Wim Leers, cs_shadow, sanduhrs, chx,...
alexpott’s picture

Status: Reviewed & tested by the community » Fixed
Issue tags: -Needs security review

This issue addresses a critical bug and is allowed per https://www.drupal.org/core/beta-changes. Afaics @Damien Tournoud has done a security review and agreed that the current whitelist approach is the correct one to take. Committed 2febbff and pushed to 8.0.x. Thanks!

diff --git a/core/lib/Drupal/Component/Utility/Xss.php b/core/lib/Drupal/Component/Utility/Xss.php
index 5f8e981..9081fae 100644
--- a/core/lib/Drupal/Component/Utility/Xss.php
+++ b/core/lib/Drupal/Component/Utility/Xss.php
@@ -207,14 +207,12 @@ protected static function attributes($attributes) {
 
             // Values for attributes of type URI should be filtered for
             // potentially malicious protocols (for example, an href-attribute
-            // starting with "javascript:").
-            // However, for some non-URI attributes, performing this filtering
-            // actually causes valid and safe data to be mangled. To prevent
-            // this, we whitelist a number of non-URI attributes for which
-            // protocol filtering can be safely by-passed.
-            //
-            // @See \Drupal\Component\Utility\UrlHelper::filterBadProtocol()
-            // @See http://www.w3.org/TR/html4/index/attributes.html
+            // starting with "javascript:"). However, for some non-URI
+            // attributes performing this filtering causes valid and safe data
+            // to be mangled. We prevent this by skipping protocol filtering on
+            // such attributes.
+            // @see \Drupal\Component\Utility\UrlHelper::filterBadProtocol()
+            // @see http://www.w3.org/TR/html4/index/attributes.html
             $skip_protocol_filtering = substr($attribute_name, 0, 5) === 'data-' || in_array($attribute_name, array(
               'title',
               'alt',

I reworked the comment to be shorter and less repetitive.

Liam Morland’s picture

Will this be backported to D7?

Wim Leers’s picture

Issue tags: -sprint

Amazing work on this very, very tricky issue, @mr.baileys! Thank you so much for sticking with it for one and a half years! (Your first comment on this issue was on November 6, 2013…)

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Heine’s picture

Title: Xss::filter() ignores malicious content in data-attributes and mangles image captions » Xss filter() ignores malicious content in data-attributes and mangles image captions
Version: 8.0.x-dev » 7.x-dev
Status: Closed (fixed) » Needs review
FileSize
3.82 KB

Attached patch is a partial backport of the changes to filter_xss code in the D8 patch. The part filtering and encoding data attributes values has not been backported as this may break existing sites.

Heine’s picture

Heine’s picture

Fix codestyle issue

xjm’s picture

Issue tags: +Needs backport to D7

Tagging to indicate this is a backport.

Les Lim’s picture

Up in #26 through #29, there was talk about having a follow-up to make the attribute whitelist configurable. Adding that here: #2544110: XSS attribute filtering is inconsistent and strips valid attributes

This is newly relevant based on #2501697: Remove SafeMarkup::set in rdf_preprocess_comment(), where we discovered that the XSS filter mangles valid RDF attribute name/value combinations, like rel="schema:author" and typeof="foaf:image".

webchick’s picture

Issue tags: -D8 upgrade path

Removing the upgrade path tag.

Devin Carlson’s picture

Expanded #85 to backport filterXssDataAttributes() as well so it could be used by D7 contrib projects, such as CKEditor. If this isn't needed then #85 should be good to go.

I found #85 to be a straight backport of #75 and didn't find any issues after a review or manual testing. This issue affects the D7 version of the Entity Embed module. #2550669: Backport field formatter display capabilities

The last submitted patch, 90: do-2105841_no_protocol_filter-90-tests-only.patch, failed testing.

samuel.mortenson’s picture

Status: Needs review » Reviewed & tested by the community

Marking (#90) as RTBC, looks like a pretty straight-forward backport of the current D8 filtering.

David_Rothstein’s picture

Title: Xss filter() ignores malicious content in data-attributes and mangles image captions » Xss filter() mangles image captions and title/alt/data attributes (Drupal 7 and 8) and ignores malicious content in data-attributes (Drupal 8 only)

#90 looks like it won't work on PHP 5.2?

Seems like #85 is the most important part to backport, but the extra function in #90 could be useful also (although I haven't read up on the details).

In the meantime, just changing the issue title, since it's pretty scary-looking and confusing otherwise :)

Devin Carlson’s picture

FileSize
163.65 KB

To address #93, I set up an environment with PHP 5.2.17 and ran the filter tests, with #90 applied, which all passed without any warnings.

David_Rothstein’s picture

Status: Reviewed & tested by the community » Needs work

I don't see how since namespaces weren't introduced until PHP 5.3. And if I try this at http://sandbox.onlinephpfunctions.com using PHP 5.2 I get a PHP warning due to the slash in "\DOMXPath" (Unexpected character in input: '\').

Also if we're going to introduce this new function then I think we need more documentation on how and when to use it. It is not used at all in Drupal 7 core, and it's not really clear from the current documentation what you would use it for.

+          // @see filter_xss_bad_protocol()
+          // @see http://www.w3.org/TR/html4/index/attributes.html

(minor) These should be part of a sentence since they're inline code comments rather than PHPDoc, e.g. See filter_xss_bad_protocol() and http://www.w3.org/TR/html4/index/attributes.html.

Otherwise it looked good to me.

David_Rothstein’s picture

Priority: Critical » Normal

Also demoting the priority here, since the part of this issue that affects Drupal 7 is not a critical bug. Would still be very nice to fix it though.

Devin Carlson’s picture

An updated version of #90 to address #93 and #95.

The last submitted patch, 97: do-2105841_no_protocol_filter-97-tests-only.patch, failed testing.

  • alexpott committed 2febbff on 8.1.x
    Issue #2105841 by mr.baileys, Wim Leers, cs_shadow, sanduhrs, chx,...
aDarkling’s picture

Tried it out & it worked great for me in D-7.41.

Ran the testing suite. 1 test failed -- "HTML scheme clearing evasion -- spaces and metacharacters before scheme." I don't think that's related to this fix, though.

I'm not sure how filter_xss_data_attributes() is really needed, though since it is currently only used for the tests.
As I understand it we're supposed to run data through the filters that are set up (Full, Filtered, Plain, etc.), so there'd be no need to call filter_xss_data_attributes() separately.

Otherwise, I'd say that its good to go.

dbazuin’s picture

aDarkling witch patch did you use?
And if it is #90 dit you try it with PHP 5.2?

aDarkling’s picture

Sorry, no.
I used #97 on PHP Version 5.4.38.
Just (manually) re-applied the patch to D-7.42. Still working.

aDarkling’s picture

Status: Needs review » Reviewed & tested by the community

This patch is already being used by (at least) the 2,424 sites that are using the Entity Embed module.

stefan.r’s picture

stefan.r’s picture

Title: Xss filter() mangles image captions and title/alt/data attributes (Drupal 7 and 8) and ignores malicious content in data-attributes (Drupal 8 only) » Xss filter() mangles image captions and title/alt/data attributes
Fabianx’s picture

Issue tags: +Pending Drupal 7 commit

RTBC + 1, it looks good to me.

Marking for commit.

David_Rothstein’s picture

Status: Reviewed & tested by the community » Needs review
Issue tags: -Pending Drupal 7 commit

I thought about this issue a little more and I'm not sure the data- part of it is safe in terms of backwards compatibility.

The problem is that data- attributes are by definition arbitrary, so it's certainly possible for a URL to be stored in them intentionally (which is later moved into an actual href attribute by JavaScript). And I've definitely seen modules store URLs in data- attributes before.

For a simple example, if your JavaScript code does $('.some-link').attr('href', $('.some-source').attr('data-url')) that's safe against XSS currently as long as the source has been run through filter_xss(), but not safe with this patch. So this patch would potentially open up a security hole in existing code in that case.

Seems like we might need some kind of whitelist here, where a module can declare that a particular data- attribute shouldn't undergo this filtering? (This could be as simple as a single binary flag that is applied sitewide, e.g. a list of whitelisted data- attributes which modules could set via variable_set() and which filter_xss() then checks... as long as we're willing to assume that modules only use this on attributes that they can reasonably believe they themselves "control" and that other JavaScript code out there won't be using also.)

Some other minor issues I would have fixed on commit if not for the above:

+    // Check that strings in HTML attributes are are correctly processed.

"are are"

+ * Contrib modules which allow rich text fields to be edited using client-side
+ * WYSIWYG editors must apply XSS filtering to the contents of data-attributes
+ * since they can contain encoded HTML markup that could be decoded and
+ * interpreted by editors.

Should be "Contributed modules". And maybe this could use a specific explanation that contrib modules are expected to call this function themselves in those cases, and how to call it? (e.g., filter_xss_data_attributes(filter_xss($html))... specifically that this function by itself is not a substitute for regular XSS-filtering)

  • alexpott committed 2febbff on 8.3.x
    Issue #2105841 by mr.baileys, Wim Leers, cs_shadow, sanduhrs, chx,...

  • alexpott committed 2febbff on 8.3.x
    Issue #2105841 by mr.baileys, Wim Leers, cs_shadow, sanduhrs, chx,...
Gábor Hojtsy’s picture

Issue tags: -Media Initiative +D8Media

Swap media tags to the right one.

Liam Morland’s picture

Status: Needs review » Needs work

The last submitted patch, 111: drupal-no_protocol_filter-2105841-111-D7.patch, failed testing.

cilefen’s picture

Liam Morland’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 111: drupal-no_protocol_filter-2105841-111-D7.patch, failed testing.

Liam Morland’s picture

Status: Needs work » Needs review
Wim Leers’s picture

Please create a Drupal 7 issue, so we can finally close this against Drupal 8. This was committed in April 2015, almost two years ago!

  • alexpott committed 2febbff on 8.4.x
    Issue #2105841 by mr.baileys, Wim Leers, cs_shadow, sanduhrs, chx,...

  • alexpott committed 2febbff on 8.4.x
    Issue #2105841 by mr.baileys, Wim Leers, cs_shadow, sanduhrs, chx,...
Jaypan’s picture

In 8.2.x, I'm still seeing this issue. I'm posting here in case someone can point me at where I'm going wrong.

I've got a field on my account, named bio. In this field, I'm manually adding the following:

<a href="https://www.jaypan.com/" typeof="schema:Organization>Jaypan</a>

However, the resulting HTML is as follows:

<a href="https://www.jaypan.com/" typeof="Organization>Jaypan</a>

So schema: is being stripped from the 'typeof' property.

Eric_A’s picture

Jaypan’s picture

Done. Thanks.

Charles Belov’s picture

@Wim Leers I've created an issue #2859006 which I just closed as a duplicate of this issue. If that would be the appropriate 7.x issue, then it's available for that.

jhodgdon’s picture

I think that this issue is probably a duplicate of #2544110: XSS attribute filtering is inconsistent and strips valid attributes (or vice versa)? I just commented there as well and added this one as related there.

jhodgdon’s picture

See also #2635632-20: Drupal XSS filtering cuts valid attributes, make it configurable / use a blacklist, where I have tried to summarize all the mangled attribute issues and ask whether we should solve them together or apart.

jhodgdon’s picture

In case we are going to mark this issue as a duplicate of #2635632: Drupal XSS filtering cuts valid attributes, make it configurable / use a blacklist, I just made a test-only patch on comment #22 there that incorporates (I think) the test cases here.

Status: Needs review » Needs work

The last submitted patch, 111: drupal-no_protocol_filter-2105841-111-D7.patch, failed testing. View results

jhodgdon’s picture

Version: 7.x-dev » 8.0.x-dev
Status: Needs work » Fixed

Oh, I see. This was an 8.x issue and was apparently fixed. Someone finally created the 7.x version of it in #123. This one should be marked Fixed / 8.0.x. It was closed several years ago and reopened around comment #83 for 7.x, but now our procedure is to have separate 8.x and 7.x issues. Hence... fixed.

generalredneck’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.