Follow-up to #2105841: Xss filter() mangles image captions and title/alt/data attributes

Problem/Motivation

\Drupal\Component\Utility\Xss::filter() cleans potentially dangerous protocols like "javascript:" from element attributes. It does this by stripping any set of characters that ends with a colon, unless the string is "http:" or "https:".

The filter strips out valid attribute name/value combinations that provide RDF metadata, such as rel="schema:author" or property="foaf:name".

Some attributes are exempt from this treatment, including alt, title, and any data-* attribute. In #2105841: Xss filter() mangles image captions and title/alt/data attributes, the decision was made to hard-code the exempt attributes list, and possibly make the list configurable in a follow-up issue.

Proposed resolution

Create two lists: one for for safe attributes (for which we can skip protocol check) and one for unsafe attributes (for which we enforce a protocol check).

User interface changes

None.

API changes

None.

Beta phase evaluation

Reference: https://www.drupal.org/core/beta-changes
Issue category Bug because RDF attributes are being stripped
Issue priority Major because ... Critical/Not critical because ...
Unfrozen changes Unfrozen because it is a bug fix
Prioritized changes The main goal of this issue is bug fix and security
Disruption Disruptive for core/contributed and custom modules/themes because it will require a BC break/deprecation/data model changes/an upgrade path/internal refactoring/widespread changes... (Which? Specify.)

-->

CommentFileSizeAuthor
#178 2544110-10.4.x-xss-attribute-filtering-is-inconsistent.diff10.67 KBonnia
#162 2544110-162.patch10.76 KBheddn
#159 2544110-159.patch10.72 KBsmustgrave
#159 interdiff-156-159.txt1.67 KBsmustgrave
#156 2544110-156.patch9.59 KBsmustgrave
#156 interdiff-154-156.txt2.06 KBsmustgrave
#155 interdiff-151-154.txt6.44 KBsmustgrave
#154 2544110-154.patch11.56 KBsmustgrave
#154 interdiff-151-154.txt0 bytessmustgrave
#151 2544110-151.patch12.62 KBsmustgrave
#151 interdiff-146-151.txt10.09 KBsmustgrave
#149 2544110-149.patch12.88 KBlazysoundsystem
#146 2544110-146.patch12.88 KBsmustgrave
#146 interdiff-136-146.txt4.96 KBsmustgrave
#136 2544110-136.patch11.26 KBsmustgrave
#136 interdiff-129-136.txt10.92 KBsmustgrave
#129 2544110-129-no-logger.patch10.65 KBsmustgrave
#129 interdiff-128-129.txt1.36 KBsmustgrave
#128 2544110-128-no-logger.patch10.83 KBsmustgrave
#125 2544110-125.patch12.61 KBsmustgrave
#125 interdiff-118-125.txt8.11 KBsmustgrave
#125 2544110-125-tests-only.patch5.66 KBsmustgrave
#118 2544110-118-WIP.patch14.52 KBsmustgrave
#118 2544110-118-WIP-tests-only.patch7.02 KBsmustgrave
#100 2544110-100.patch5.72 KBgaards
#94 2544110-94-allow-class-attribute-colon-in-styles-combo.patch1.11 KBtedfordgif
#90 2544110-90-allow-class-attribute.patch452 bytestedfordgif
#76 2544110-76.patch5.79 KBjofitz
#69 2544110-68.patch5.82 KBpcambra
#61 interdiff.txt799 bytessubhojit777
#61 xss_attributes-2544110-61.patch5.97 KBsubhojit777
#56 interdiff.txt1.45 KBsubhojit777
#56 xss_attributes-2544110-56.patch5.97 KBsubhojit777
#50 xss_attributes-2544110-50.patch5.96 KBmpdonadio
#42 interdiff-23-42.txt794 bytesmpdonadio
#42 xss_attributes-2544110-42.patch5.94 KBmpdonadio
#7 xss_attributes-2544110-7.patch1.51 KBmpdonadio
#23 interdiff-20-23.txt3.4 KBmpdonadio
#23 xss_attributes-2544110-23.patch5.93 KBmpdonadio
#7 xss_attributes-2544110-test-only.patch1003 bytesmpdonadio
#20 interdiff-07-20.txt6.63 KBmpdonadio
#6 interdiff-2544110-4-6.txt965 bytessubhojit777
#20 xss_attributes-2544110-20.patch5.89 KBmpdonadio
#6 xss_attributes-2544110-6.patch1.55 KBsubhojit777
#20 xss_attributes-2544110-test-only.patch1.76 KBmpdonadio
#4 xss_attributes-2544110-4.patch1.56 KBsubhojit777

Issue fork drupal-2544110

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

  • 2544110-xss-attribute-filtering Comparechanges, plain diff MR !5957
  • 11.3.x Comparecompare
  • 1 hidden branch
  • 11.x Comparecompare

Comments

wim leers’s picture

Issue tags: +Needs tests

First step: exempt rel and property too?

Also: how come test coverage is not detecting this problem!? :O

les lim’s picture

RDF module mostly uses the Attribute class to prepare its attribute name/value strings, which sanitizes using htmlspecialchars() but does not run through Xss::filter().

I've opened up a sister issue to explore whether Attribute needs additional filtering: #2545056: Attribute class does not sanitize for "javascript:alert('XSS')"

subhojit777’s picture

I found this bug and created an issue for this #2556051: Xss::filterAdmin() strips namespaced attributes from elements. I think we should close the other one and work on this one.

mpdonadio has done some work there. I am going to replicate it here and submit a patch.

subhojit777’s picture

Status: Active » Needs review
StatusFileSize
new1.56 KB

Status: Needs review » Needs work

The last submitted patch, 4: xss_attributes-2544110-4.patch, failed testing.

subhojit777’s picture

Status: Needs work » Needs review
StatusFileSize
new1.55 KB
new965 bytes
mpdonadio’s picture

Not sure why #4 failed to run. Here is a reroll from the other issue w/ the test-only patch.

I think coming up with the list of attributes may be difficult. Maybe we want to whilelist prefixes instead? We have to deal with RDF, Facebook / OpenGraph, Dublin Core, and probably others.

Short term we should hardcode the list, but long term we should consider a way for modules to declare their namespaces, which would probably mean moving this out of Component.

?

subhojit777’s picture

@mpdonadio See the interdiff in #6, may be that is the reason.

The last submitted patch, 7: xss_attributes-2544110-test-only.patch, failed testing.

mpdonadio’s picture

Assigned: Unassigned » mpdonadio

I have an idea for this that may make it a little less fragile and more maintainable in the long run, but also keep it secure. I am going to pick away at it during the day, and will post my progress tonight.

scor’s picture

It's awesome to see work on this bug! For RDFa, we should allow the following attributes in the white list: property, typeof, rel, rev and datatype. Those can include tokens of the form "prefix:term" and they would be stripped if not in the whitelist, which is a problem.

Not sure if we would need to go as far as establishing a whitelist of prefixes as suggested in #7, I think filtering on a whitespace separated list of tokens "[alphanum]:[alphanum]" would be sufficient if we want to inspect the content of those attributes.

mpdonadio’s picture

I like #11, and can move my idea in that direction.

subhojit777’s picture

For the long run I think #11 will be good. As @mpdonadio has suggested in #7 there can be many like Facebook, OpenGraph, etc.

mpdonadio’s picture

@scor, there are more attributes listed in http://www.w3.org/TR/xhtml-rdfa/ This list you provided are the only ones we need to worry about for the semicolons?

joelpittet’s picture

Issue summary: View changes
Issue tags: -Needs beta evaluation, -Needs tests +RDF

Added beta evaluation and there are tests on this so removing the tag. Only left out if this is major or not in the B.E, I'll let @scor decide that bit I think or someone who'd like to write something for that feel free.

joelpittet’s picture

Issue summary: View changes
joelpittet’s picture

Issue summary: View changes
joelpittet’s picture

Issue summary: View changes
mpdonadio’s picture

We have a proof of concept test that demonstrates this is a real problem. I wrote a patch at lunch that has way better testing. I want to do my best, though, to make sure I catch all edge cases from the get go, so I am going to wait until tonight to post it.

mpdonadio’s picture

Assigned: mpdonadio » Unassigned
Issue tags: +SafeMarkup
StatusFileSize
new1.76 KB
new5.89 KB
new6.63 KB

Think this is good for review. The interdiff is kinda useless, but I included it anyway.

The only open question is whether we want methods to update and/or set Xss::$safeAttributes and Xss::$rdfaAttributes. Easy to add, but didn't include in this patch.

joelpittet’s picture

This is looking pretty good IMO.

+++ b/core/lib/Drupal/Component/Utility/Xss.php
@@ -261,7 +276,12 @@ protected static function attributes($attributes) {
+            if (in_array($attribute_name, static::$rdfaAttributes)) {
+              $thisval = ($skip_protocol_filtering || preg_match('/^[[:alnum:]]+\:[[:alnum:]]+$/', $match[1])) ? $match[1] : UrlHelper::filterBadProtocol($match[1]);
+            }
+            else {
+              $thisval = $skip_protocol_filtering ? $match[1] : UrlHelper::filterBadProtocol($match[1]);
+            }

Maybe worth putting the regex pattern as a object property? or better yet factor out this value retrieval as it's own function because it's duplicated 3 times.

If someone needs that we can let them ask for the setters/getters.

The last submitted patch, 20: xss_attributes-2544110-test-only.patch, failed testing.

mpdonadio’s picture

StatusFileSize
new5.93 KB
new3.4 KB

Not in love with the docblock, method name, or the comment above the code, but here is that logic pulled into its own function. Suggestions appreciated.

droplet’s picture

+++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
@@ -516,6 +516,42 @@ public function providerTestAttributes() {
+        '<img src="http://example.com/foo.jpg" foo="bar:baz">',
...
+        '<h2 property="javascript:alert(0);">The Title</h2>',
+        '<h2 property="alert(0);">The Title</h2>',

Why Why ? I don't understand. Danger strings in safe area, this is safe.

mpdonadio’s picture

The new assertions on this issue are just to demonstrate that the patch doesn't change the behavior for the non RDFa attributes. The test only patch shows this. The single failure is from what the issue addresses. The other new assertions just double check that the patch doesn't introduce additional problems.

subhojit777’s picture

Status: Needs review » Reviewed & tested by the community

Looks good +1 RTBC

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 23: xss_attributes-2544110-23.patch, failed testing.

droplet’s picture

+++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
@@ -516,6 +516,42 @@ public function providerTestAttributes() {
+      array(
+        '<img src="http://example.com/foo.jpg" typeof="foaf:Image">',
+        '<img src="http://example.com/foo.jpg" typeof="foaf:Image">',
+        'Image tag with RDFa with namespaced attribute',
+        array('img')
+      ),
...
+      array(
+        '<img src="http://example.com/foo.jpg" foo="bar:baz">',
+        '<img src="http://example.com/foo.jpg" foo="baz">',
+        'Image tag with non-RDFa attribute',
+        array('img')
+      ),

I still confused with them.

Can you tell the difference between these 2 test cases and show an attack way (proof-of-concept). From my own thoughts, if you assumed latter one is unsafe, I can't see how the first example is safe in the case.

subhojit777’s picture

Status: Needs work » Reviewed & tested by the community

RE #30:
According to this issue Xss::attributes() should not strip valid RDF metadata atttribute. The first test case will allow foaf:Image attribute, while in the second case bar:baz should be restricted.

subhojit777’s picture

Status: Reviewed & tested by the community » Needs review

oops..

mpdonadio’s picture

#30, just to clarify.

In the code snippet you posed the first dataset tests the functionality in this patch, which is to not strip out RDFa prefixes from element attributes.

The second snippet I added is to ensure that this patch doesn't introduce a regression. In other words, that is how HEAD currently works.

The scope of this issue is to essentially just extend the attribute whitelist to allow RDFa attributes through.

Changing the behavior would be best discussed and done in a different issue, I think.

joelpittet’s picture

Status: Needs review » Needs work
+++ b/core/lib/Drupal/Component/Utility/Xss.php
@@ -319,6 +335,28 @@ protected static function attributes($attributes) {
+      return preg_match('/^[[:alnum:]]+\:[[:alnum:]]+$/', $value) ? $value : UrlHelper::filterBadProtocol($value);

Do you need to escape the result as well? May result in a double escape?

Should be using UrlHelper::stripDangerousProtocols() directly no?

mpdonadio’s picture

Status: Needs work » Needs review

Not sure about #34. That is really just shuffling some existing code around. Read in a bigger context:

This was the change to the attributes method. Since we needed a weird regex to match the RDFa attributes, we introduced a helper function per #21 in #23.

@@ -260,7 +276,7 @@ protected static function attributes($attributes) {
         case 2:
           // Attribute value, a URL after href= for instance.
           if (preg_match('/^"([^"]*)"(\s+|$)/', $attributes, $match)) {
-            $thisval = $skip_protocol_filtering ? $match[1] : UrlHelper::filterBadProtocol($match[1]);
+            $thisval = $skip_protocol_filtering ? $match[1] : static::filterProtocol($attribute_name, $match[1]);
 
             if (!$skip) {
               $attributes_array[] = "$attribute_name=\"$thisval\"";

and then the helper is

+  protected static function filterProtocol($name, $value) {
+    // If the value matches the typical namespace:value pattern used in RDFa,
+    // return it directly. Otherwise, filter it.
+    if (in_array($name, static::$rdfaAttributes)) {
+      return preg_match('/^[[:alnum:]]+\:[[:alnum:]]+$/', $value) ? $value : UrlHelper::filterBadProtocol($value);
+    }
+    else {
+      return UrlHelper::filterBadProtocol($value);
+    }
+  }

So UrlHelper::filterBadProtocol() is what is currently used in HEAD and the patch retains this functionality. Essentially, it whitelists [alnum]:[alnum] attributes and lets them through unscathed (the request in #11). Or are you saying that we should have

return preg_match('/^[[:alnum:]]+\:[[:alnum:]]+$/', $value) ? Html::escape($value) : UrlHelper::filterBadProtocol($value);

I think this will essentially be a no-op since nothing in [alnum] will be escapable.

?

joelpittet’s picture

@mpdonadio I'm pretty sure about #34 but we could add a test to prove it get's double escaped if you want.

Adding some tangentially related issues around attribute values. To consolidate efforts a bit.
#2567743: Add protocol filtering to the core Attribute component
#2531824: Attribute class to check safe strings before escaping (has tests)

catch’s picture

Priority: Normal » Major
mpdonadio’s picture

Assigned: Unassigned » mpdonadio
Status: Needs review » Needs work

Discussed on IRC w/ @joelpittet. Will replace UrlHelper::filterBadProtocol() with UrlHelper::stripDangerousProtocols and add some additional assertions.

catch’s picture

#2567743: Add protocol filtering to the core Attribute component has an ever-increasing whitelist.

We may want to consolidate that somewhere where both classes can use it, and make it extensible at least via Settings.

mpdonadio’s picture

Xss is in \Component. If we consolidate, won't we have to move to \Core and therefore have a BC break?

catch’s picture

We could put all the centralization in Component, although I don't think that can depend on Settings...

Either way if it's tricky we can fix both separately and have a follow-up to consolidate.

mpdonadio’s picture

Assigned: mpdonadio » Unassigned
Status: Needs work » Needs review
StatusFileSize
new5.94 KB
new794 bytes

Here's just the function swap. XssTest still passes locally. Doing a full run to see if there are any side effects in other tests.

Wasn't sure if we need new assertions in XssTest, since we are unit testing here and Twig autoescape would need an integration test. So, do we need to add an integration test for this, or is there enough coverage elsewhere for it?

Re #41, and idea I had earlier on was to expose public static setter/getters for $safeAttributes and $rdfaAttributes. Would that suffice for sharing the attribute list with Attribute, and we could eventually expose that to Settings in a followup?

dawehner’s picture

wim leers’s picture

How does this relate to #2567743: Add protocol filtering to the core Attribute component? It seems like they should be one issue?

dawehner’s picture

How does this relate to #2567743: Add protocol filtering to Attribute? It seems like they should be one issue?

Well, its in different code paths, but they should at least share some common code around which attributes are allowed.

mpdonadio’s picture

dawehner’s picture

https://github.com/yesodweb/haskell-xss-sanitize/blob/master/Text/HTML/S... seems to provide quite a lit of attributes as whitelist.

mpdonadio’s picture

#47, is Yesod using this for sanitization? It amost looks like the list of all valid HTML attributes per the HTML4 Strict DTD. My Haskell is really rusty, but I will try to trace this out later. I can't quite tell is this is just removing attributes not in that list, or whether they get sanitized.

Status: Needs review » Needs work

The last submitted patch, 42: xss_attributes-2544110-42.patch, failed testing.

mpdonadio’s picture

Status: Needs work » Needs review
StatusFileSize
new5.96 KB
gnuget’s picture

+++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
@@ -522,6 +522,42 @@ public function providerTestAttributes() {
+      array(

A few lines above in this patch the short syntax is used but here it's using the old syntax.

subhojit777’s picture

The function providerTestAttributes() is written in the old syntax itself.

subhojit777’s picture

Status: Needs review » Reviewed & tested by the community

- Coding standards checked. The only concern is the patch does not uses short array syntax, however the rest of the file core/tests/Drupal/Tests/Component/Utility/XssTest.php uses long array syntax.
- Tests passing.

Moving this to RTBC.

chx’s picture

Status: Reviewed & tested by the community » Needs work

I think the properties should use long array syntax.

But a bigger problem I have is with preg_match('/^[[:alnum:]]+\:[[:alnum:]]+$/', $value). Can you tell me what :alnum: is in the first place? Is it locale aware? Unicode aware? Unicode aware if the tables are compiled? http://www.pcre.org/original/doc/html/pcresyntax.html says "alphanumeric" which tells me absolutely nada. Let's not use it. Let's be unambiguous and say what we have. Note how such is not used anywhere. Literally, you can search the entire codebase, Drupal, vendors, all, it's not there.

subhojit777’s picture

Assigned: Unassigned » subhojit777
Issue tags: +drupalconasia2016
subhojit777’s picture

Status: Needs work » Needs review
StatusFileSize
new5.97 KB
new1.45 KB

I was unable to find [:alnum:] inside core, but I found references and usage of [:alnum:] character class here https://www.hscripts.com/tutorials/regular-expression/character-classes/.... I am not sure whether it's unicode aware. I too was unable to find proper documentation for [:alnum:], in that case I agree we shouldn't use this.

chx’s picture

Status: Needs review » Needs work

Careful there, you left out the initial ^ anchor in the new patch. Also, if you are rerolling anyways, the : doesn't need an escape, it's not a preg special char. Otherwise, if that's what the RDFa standards prescribe, we are probably good to go.

Could you please file a followup to sprinkle this class with final modifiers? Thanks.

chx’s picture

Issue summary: View changes
mpdonadio’s picture

Usage in core aside, [:alnum:] is documented at http://php.net/manual/en/regexp.reference.character-classes.php as a valid class: "letters and digits". I'll let others decide what Unicode / locale implications there are as that is not in my expertise and this behavior is rather poorly documented anyway in the PHP docs (eg, I see the same problem with using \w for "word characters").

chx’s picture

Yes, I know that's what it is but my question was what "letters" are and this is not documented anywhere as far as I can see. And there are only ten \w in core/ and you are more than welcome to nuke them all.

subhojit777’s picture

Status: Needs work » Needs review
StatusFileSize
new5.97 KB
new799 bytes
subhojit777’s picture

@chx If I am not wrong the final modifiers are for $safeAttributes and $rdfaAttributes.

chx’s picture

Status: Needs review » Reviewed & tested by the community

I think this is good to go; let's discuss final in a followup.

catch’s picture

This looks right to me. Leaving RTBC for additional review time.

catch’s picture

Status: Reviewed & tested by the community » Needs review

Reading back, there's still a mis-match here between the allowed attributes here and #2567743: Add protocol filtering to the core Attribute component. I think we should either adopt the longer list from here, or at least document why we haven't.

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

chx’s picture

Issue summary: View changes
pcambra’s picture

Version: 8.1.x-dev » 8.2.x-dev
Assigned: subhojit777 » Unassigned

Here's a plain reroll of the patch in #61 so it applies to the latest 8.2.x.

There's another potential issue with the Xss filter and tokens that I've described in this message module issue https://github.com/Gizra/message/issues/85#issuecomment-242653200. When a token pattern such as [node:token] is present in a textarea/textfield, the "[node:'" is removed because I think the Xss filter considers it a protocol. Currently trying to add a test to demonstrate this behavior.

pcambra’s picture

StatusFileSize
new5.82 KB

Status: Needs review » Needs work

The last submitted patch, 69: 2544110-68.patch, failed testing.

pcambra’s picture

Status: Needs work » Needs review

Setting the right status as test pass.

catch’s picture

Issue tags: +Triaged core major

Discussed on a triage call with the other committers and we agreed this should stay major. I really think we need to discuss the allowed list/moving to a blacklist etc. in its own issue somewhere though then apply it everywhere.

catch’s picture

Title: Xss::attributes() strips valid attribute values such as RDF metadata » XSS attribute filtering is inconsistent and strips valid attributes
Parent issue: #2105841: Xss filter() mangles image captions and title/alt/data attributes »

Slightly re-titling, I think we should centralize the whitelist in this issue. Also removing the parent issue since that moved to 7.x long ago.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

ao2’s picture

Hi,

can the class attribute also be added to the list of safe attributes? In particular having colons in the class attribute value should be allowed (my use case is basically the same as #372454: Filtered HTML filter modifies class attribute).

Thanks,
Antonio

jofitz’s picture

StatusFileSize
new5.79 KB

Patch no longer applies. Re-rolled.

jaypan’s picture

Hunk #2 of #69 failed for me, though the patch solved the problem I was having. So I've rescheduled a new test.

ao2’s picture

Can someone comment about adding also the class attribute to the list of safe attributes?
I was asking about that in #75.

Thanks,
Antonio

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

jhodgdon’s picture

This is possibly related to #2944173: CKEditor mangles tokens in URLs, due to bug in Xss::attributes(), and/or the same bug? That has to do with using tokens in attributes.
I wrote a quick test-only patch to XssTest on that issue, which adds the following to the "Normalized" section of the test:

+      // Token-like attributes.
+      [
+        '<img src="[something:like:a:token]">',
+        '<img src="[something:like:a:token]">',
+        'Token-like attributes, no change',
+        ['img'],
+      ],

Hm.... I just tested, and the above patch in #76 does not fix the token bug. So, I guess they are separate bugs in the the same function.

jhodgdon’s picture

Hm. So, rereading this issue and @Les Lim's comment on my related issue, it does seem like they are due to the same problem, that things before : are stripped out. However, the patch here does not fix the token mangling problem.

The problem is that tokens have a format like [piece1:piece2:piece3], and they could appear in any attribute. I don't know if things that look like that are a problem or not for XSS, but since they're a big and important part of Drupal, and Drupal's Editor module passes the HTML through XSS filtering before putting it into an editor (such as CKEditor), it's not OK to mangle things that look like tokens, just because they happen to be in something like an img src attribute, or an a href attribute.

jhodgdon’s picture

I think that this issue and #2105841: Xss filter() mangles image captions and title/alt/data attributes are quite possibly duplicates of each other?

I'll add a comment there as well...

jhodgdon’s picture

See also #2635632-20: Drupal XSS filtering cuts valid attributes, make it configurable / use a blacklist, where I have tried to summarize all the mangled attribute issues and ask whether we should solve them together or apart.

jhodgdon’s picture

In case we are going to mark this issue as a duplicate of #2635632: Drupal XSS filtering cuts valid attributes, make it configurable / use a blacklist, I just made a test-only patch on comment #22 there that incorporates (I think) the test cases here.

jhodgdon’s picture

Disregard #84. That issue was fixed for 8.x back in 8.0.x and is only open for 7.x at this point.

However, there's still a question about #86 -- we might combine all the open "XSS filtering is mangling attributes" issues into one.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.8.x-dev

Drupal 8.6.x will not receive any further development aside from security fixes. Bug reports should be targeted against the 8.8.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.9.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

tedfordgif’s picture

StatusFileSize
new452 bytes

For those looking to mark the class attribute safe, here's a patch.

tedfordgif’s picture

tedfordgif’s picture

tedfordgif’s picture

If you're whitelisting the class attribute, it probably makes sense to allow an expanded set of characters in the styles combo for the editor config. The attached patch works for my use case, but is clearly not the right approach for everyone.

markhalliwell’s picture

The real latest patch is #76, recent patches have ignored previous work.

This is an issue we've run into over at #3131224-8: CommonMark 3rd-party Extension: Footnotes.

HTML5 allows pretty much any character for classes and IDs, including colons:
https://mathiasbynens.be/notes/html5-id-class

This isn't just about "RDF" values anymore, but being more HTML5 compliant.

I'm all for stripping bad javascript when it's warranted, but blanket filtering anything that remotely resembles a "protocol" seems like overkill now.

markhalliwell’s picture

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.7 was released on June 3, 2020 and is the final full bugfix release for the Drupal 8.8.x series. Drupal 8.8.x will not receive any further development aside from security fixes. Sites should prepare to update to Drupal 8.9.0 or Drupal 9.0.0 for ongoing support.

Bug reports should be targeted against the 8.9.x-dev branch from now on, and new development or disruptive changes should be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

fabianx’s picture

For those searching for workarounds:

https://www.drupal.org/project/drupal/issues/2944173#comment-13786061

TL;DR:

- Create a filter before Filtered HTML that converts e.g. ":" in "class" to "!"
- Run Filtered HTML
- Convert back

Need to ensure to implement that securely, but that is the first approach that could be made into a contrib module.

fabianx’s picture

Version: 8.9.x-dev » 9.2.x-dev
Issue tags: +Needs reroll
gaards’s picture

StatusFileSize
new5.72 KB

Re-rolled patch from #76

sadikyalcin’s picture

Having an option to whitelist protocols would be great. We use deeplinks all over the net nowadays but Drupal doesn't let us use them!

What is insecure about the following?

<a href="myapp://home">click to launch app</a>

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

marc.groth’s picture

FWIW it is possible to whitelist any anchor link protocol using 'filter_protocols' in services.yml

We were having a similar issue to https://drupal.stackexchange.com/questions/281894/problem-with-text-form... and adding the 'data' protocol fixed it (i.e. it was no longer stripped). YMMV

lolcode’s picture

Thank you #103! I was trying to implement Tailwind CSS which uses colons in variant names and they were getting stripped despite being whitelisted in my CKEditor config. The Stackexchange solution worked.

yogeshmpawar’s picture

Issue tags: -Needs reroll

Removing Needs reroll tag as it is no longer needed.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

chalk’s picture

I've described my case with Tailwind CSS here: #3252587: Extend the $skip_protocol_filtering list of attributes to use Tailwind CSS classes with prefix ":". Seems it's a bit related.

les lim’s picture

Issue summary: View changes
amber himes matz’s picture

I dug into whether #2944173: CKEditor mangles tokens in URLs, due to bug in Xss::attributes() could be closed as a duplicate of this one. (See comment #2944173-74: CKEditor mangles tokens in URLs, due to bug in Xss::attributes().) What I found was that they are both intent on modifying Drupal\Component\Utility\XSS's filtering in order to prevent XSS class from filtering out/mangling data from 2 different kinds of contexts:

  1. #2944173: CKEditor mangles tokens in URLs, due to bug in Xss::attributes() - Tokens being used inside href or src attributes, e.g. <a href="[node:url:absolute]">link</a>
  2. in a body field that uses CKEditor and enables "Limit allowed HTML tags and correct faulty HTML" in the text format configuration.

  3. This issue - Valid attribute name/value combinations that provide RDF metadata, e.g. rel="schema:author" or property="foaf:name"

Right now, as per the scope described in each issue summary, they are not duplicates, but if the scope was widened to consider both use cases/contexts, they could be combined into 1 issue, ensuring a compatible solution. Considering that they're both dealing with XSS class, that might be desirable.

As it stands, the patches in each issue do not solve the other's problem:

  1. The patch in #2944173-69: CKEditor mangles tokens in URLs, due to bug in Xss::attributes() is saying, “hey XSS::attributes, allow tokens in this hardcoded safelist of attributes (src, href)“.
  2. The patch in #2544110-100: XSS attribute filtering is inconsistent and strips valid attributes is saying, “hey everyone in the XSS class, we’re gonna allow this safelist of attributes (alt, title) to bypass XSS::filter. And btw, protect these rdfaAttributes too.”

We do NOT want to allow href or src to bypass filtering altogether I would think! Which is what would happen if we used the attribute safelist approach here and just added in href and src. In this issue, we just want it to allow tokens and not mangle them up. Assuming that tokens aren’t an XSS vector (which is an open question for discussion in #2944173-65: CKEditor mangles tokens in URLs, due to bug in Xss::attributes().

Could we combine these issues and their solutions? Maybe it would make sense to do so, to ensure a compatible solution that solves both problems (href, src token mangling, and RDFa Attribute handling in XSS class). We would need to increase the scope in the one that stays open. Is that worth doing? What do you think?

I'll be keeping tabs on discussions in both issues with the goal of deciding whether they should be combined into 1. Thank you!

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

andrea.cividini’s picture

Tried applying the latest patch in 9.4.1 PHP 8.1 and it applies and works correctly, but I am still struggling with `class` attribute.

Using Tailwind CSS many classes have `:` char in them and even with the current patch, values are still getting stripped incorrectly.

This :
class="button flex items-center justify-center font-bold inline-flex cursor-pointer focus:outline-none transition duration-200 rounded-full group disabled:opacity-50 disabled:pointer-events-none text-s button--tertiary flex-row

becomes:

class="pointer-events-none text-s button--tertiary flex-row

but I'm not sure if I would add the `class` attribute to the $safeAttributes list for this purpose, what do you think?

andrea.cividini’s picture

@cilefen Thanks for pointing this out! I wasn't able to find this issue while searching in the issue list, works flawlessly on my instance. Thanks again!

smustgrave’s picture

With the addition of https://www.drupal.org/project/drupal/issues/3252587 I'm no longer finding this to be an issue. Tested with rel="schema:author" and confirmed nothing was stripped out.

+1 for closing out

cilefen’s picture

Status: Needs review » Closed (duplicate)
catch’s picture

Status: Closed (duplicate) » Needs review

That's not an exact duplicate, rel was already in the allow list before that issue landed, and that issue added class. But the more general problem that other valid attributes get stripped and there's no way to control this remains.

catch’s picture

This came up in the bug smash channel, and discussion in there gave me a bit of an idea.

Currently we have two conflicting approaches:

With the core status quo, we maintain an allowlist of attributes that don't get protocol filtering, and add to that list when it turns out someone needs to add a protocol-like attribute value in there as we recently did with class. The problem with this is that for each new use case, it's broken until you get a core patch committed - and I found a reference to adding 'class' at least five years ago...

At various times there's been a suggestion to switch this to an exclude list, for example see discussion between @alexpott and @chx seven years ago #2635632-3: Drupal XSS filtering cuts valid attributes, make it configurable / use a blacklist. The problem with an exclude list only, is that if something is missing from the list, and HTML could add a new URL-taking attribute at any time, then you've got a core security issue.

What if we maintain both lists?

If something explicitly doesn't get protocol filtering, we don't filter protocols - same as now.

If something explicitly does get protocol filtering, we silently filter protocols - assuming they're from bad user data and should be stripped.

If an attribute isn't specified in either list, we noisily filter protocols - i.e. we check the value of the attribute before and after filtering, and if something protocol-like has been stripped, trigger a PHP notice with a link to documentation - this could suggest opening a core issue, adding to a configurable list if we make it configurable etc.

smustgrave’s picture

StatusFileSize
new7.02 KB
new14.52 KB

This is a WIP patch just trying to get some feedback if I'm going in the right direction. I'm pulling from here, https://www.drupal.org/project/drupal/issues/2635632 and https://www.drupal.org/project/drupal/issues/2944173

Trying to address the comment #117

Still need to think about how to address

f something protocol-like has been stripped, trigger a PHP notice with a link to documentation - this could suggest opening a core issue, adding to a configurable list if we make it configurable etc

Pulled tests from all the tickets so tokens, rdf attributes, media, should be working.

The last submitted patch, 118: 2544110-118-WIP-tests-only.patch, failed testing. View results

Status: Needs review » Needs work

The last submitted patch, 118: 2544110-118-WIP.patch, failed testing. View results

catch’s picture

  1. +++ b/core/lib/Drupal/Component/Utility/Xss.php
    @@ -29,6 +29,48 @@ class Xss {
    +   * @see http://www.w3.org/TR/xhtml-rdfa/
    +   */
    +  protected static $rdfaAttributes = [
    +    'property',
    

    I think this could just be merged into safeAttributes()

  2. +++ b/core/lib/Drupal/Component/Utility/Xss.php
    @@ -267,7 +310,16 @@ protected static function attributes($attributes) {
               if (preg_match('/^"([^"]*)"(\s+|$)/', $attributes, $match)) {
    -            $value = $skip_protocol_filtering ? $match[1] : UrlHelper::filterBadProtocol($match[1]);
    +            if ($enforce_protocol_filtering) {
    +              $value = static::filterProtocol($attribute_name, $match[1]);
    +            }
    +            elseif ($skip_protocol_filtering) {
    +              $value = htmlentities(strip_tags(html_entity_decode($match[1])), ENT_NOQUOTES);
    +            }
    +            else {
    +              // todo logger
    +              $value = static::filterProtocol($attribute_name, $match[1]);
    +            }
    

    Why not using UrlHelper::filterBadProtocol() if there's protocol filtering, and nothing if there's not? I think it would at least be good to start with the safe/unsafe lists + logging for changed values in neither and then evaluate whether we need more changes after that.

smustgrave’s picture

Was trying to think how to cover tokens. [node:id] is getting sent through the protocol breaking the token.

catch’s picture

#2944173: CKEditor mangles tokens in URLs, due to bug in Xss::attributes() is already open looking at tokens, and I don't think we should try to fix that here. It might need something like supporting an alternative token format since the colon is meaningful in too many places.

smustgrave’s picture

I'll revert those changes. And push a new patch in the morning.

smustgrave’s picture

Status: Needs work » Needs review
StatusFileSize
new5.66 KB
new8.11 KB
new12.61 KB

Removed the token code.

@catch following the original patch of this ticket I believe rdf tokens need to go through a preg_match call first. Could be wrong about that.

We are still using UrlHelper::filterBadProtocol() just it goes through an internal function first to determine if its an rdf attribute for like above.

Uploaded a new tests-only patch with several examples. Removed some of the examples from the original patch as those issues are being tracked in separate tickets.

The last submitted patch, 125: 2544110-125-tests-only.patch, failed testing. View results

Status: Needs review » Needs work

The last submitted patch, 125: 2544110-125.patch, failed testing. View results

smustgrave’s picture

StatusFileSize
new10.83 KB

This patch contains no longer. Should clean up some of those errors.

smustgrave’s picture

StatusFileSize
new1.36 KB
new10.65 KB

This one may be a little better

smustgrave’s picture

Status: Needs work » Needs review
smustgrave’s picture

Trying to keep to the front of the issue queue. Don’t want to get lost in the shuffle

quietone’s picture

Status: Needs review » Needs work
Issue tags: +Needs issue summary update

Always good to see an issue tagged Security making progress. Thanks!

The success patch in #125 has more failures that than fail patch, some appear to be random failures. However, the test being change here, Drupal\Tests\Component\Utility\XssTest, fails in both patches. I think we should have a new fail/success pair of patches.

+++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
@@ -6,6 +6,7 @@
+use Prophecy\PhpUnit\ProphecyTrait;

@@ -26,6 +27,8 @@
+  use ProphecyTrait;
+

I had to remove these lines in order to run the test locally. Running the test locally with PHPUnit 8.5.26 fails with

phpunit -c core --debug -v --colors=always --filter testAttribute core/tests/Drupal/Tests/Component/Utility/XssTest.php
PHP Fatal error:  Trait 'Prophecy\PhpUnit\ProphecyTrait' not found in /var/www/html/core/tests/Drupal/Tests/Component/Utility/XssTest.php on line 28

The issue summary has no proposed resolution or remaining tasks, tagging for that.

I have not reviewed the patch.

smustgrave’s picture

Issue summary: View changes
Status: Needs work » Needs review
Issue tags: -Needs issue summary update

@quietone updated IS.

#129 is the patch I got working trying to go off of #117

Looking at the referenced tickets.

Can we consolidate work from https://www.drupal.org/project/drupal/issues/2635632 into here?

Can we close this as a duplicate https://www.drupal.org/project/drupal/issues/2567743 ?

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

catch’s picture

Status: Needs review » Needs work
Issue tags: +Bug Smash Initiative
  1. +++ b/core/lib/Drupal/Component/Utility/Xss.php
    @@ -29,6 +29,48 @@ class Xss {
    +   * @see \Drupal\Component\Utility\Xss::filter()
    +   */
    +  protected static $safeAttributes = ['alt', 'title', 'class', 'media'];
    +
    +  /**
    

    #2544110-65: XSS attribute filtering is inconsistent and strips valid attributes (six years ago!) asked to adopt the longer list of safe attributes from the patch in #2567743: Add protocol filtering to the core Attribute component. I still think we should do that. Also to allow that list to be used in Attribute, can we make this a constant instead?

  2. +++ b/core/lib/Drupal/Component/Utility/Xss.php
    @@ -29,6 +29,48 @@ class Xss {
    +
    +  /**
    +   * The default list of RDFa attributes untouched by filter().
    +   *
    +   * @var array
    +   *
    +   * @see \Drupal\Component\Utility\Xss::filter()
    +   * @see http://www.w3.org/TR/xhtml-rdfa/
    +   */
    +  protected static $rdfaAttributes = [
    +    'property',
    +    'typeof',
    +    'rel',
    +    'rev',
    +    'datatype',
    +  ];
    +
    

    I still wonder if we really need the separate list of RDFa attributes on top of safe and unsafe, but I think we should try that in a follow-up, removing it is going to result in subtle behaviour changes which might or might not be OK, so should be handled separately.

In #2544110-117: XSS attribute filtering is inconsistent and strips valid attributes I suggested triggering an error if an attribute doesn't appear in any lists, I still think we should do that but also in its own follow-up issue to try to avoid scope creep here.

Needs work - just for the longer list of allowed attributes.

smustgrave’s picture

Status: Needs work » Needs review
StatusFileSize
new10.92 KB
new11.26 KB

Pulled the attribute list from patch https://www.drupal.org/files/issues/2567743-58.patch
Can we close #2567743: Add protocol filtering to the core Attribute component for this one?

Also added class to the safeAttribute (thought it was already).

And update the skip_protocol check.

Are you asking it to be made a constant in the Attributes.php file?

If we removed rdfAttributes should they be added to the safe or unsafe list? Will most like require updating the tests.

Opened #3324862: Follow up on 14614509: Trigger error when attribute is not in list for followup to trigger warning

Since there are so many filter tickets open hoping when 1 lands it may loosen or cover others. but could be wishful thinking.

smustgrave’s picture

Closed #2635632: Drupal XSS filtering cuts valid attributes, make it configurable / use a blacklist as a duplicate as I believe the work here will cover that ticket too.

catch’s picture

Can we close #2567743: Add protocol filtering to Attribute for this one?

That's specifically adding filtering support to the Attribute class, so not a duplicate, but it should hopefully be simplified by this issue.

Are you asking it to be made a constant in the Attributes.php file?

No in the Filter class, like Filter::XSS_SAFE_ATTRIBUTES, just instead of protected static. But.. we could also do that in #2567743: Add protocol filtering to the core Attribute component if and when we need it to be accessible, it doesn't need to be here, so can skip adding the API surface.

If we removed rdfAttributes should they be added to the safe or unsafe list? Will most like require updating the tests.

The safe list.. but I don't know if that's actually 100% safe or not, so should be a follow-up too.

smustgrave’s picture

If we moved them to safe then things like

typeof="foaf:bad////value

property="javascript:alert(0);"

Would be allowed.

Status: Needs review » Needs work

The last submitted patch, 136: 2544110-136.patch, failed testing. View results

smustgrave’s picture

I do like the idea of consolidating but not sure how to handle scenario like those above.

catch’s picture

Apart from the test failure, #136 is looking good to me.

smustgrave’s picture

So those failures in #136 along the same lines

Content is a safe variable

The value is 0;url=javascript:alert(\'XSS\');

So since it's safe is that fine to leave as is?

catch’s picture

Yes I think we can change the test expectation for that one, content can contain any arbitrary text and won't get parsed as a URL or anything.

smustgrave’s picture

Status: Needs work » Needs review
StatusFileSize
new4.96 KB
new12.88 KB

In that case was able to combine rdfattributes.

smustgrave’s picture

lazysoundsystem’s picture

StatusFileSize
new12.88 KB

The last patch is now a few lines off

Checking patch core/lib/Drupal/Component/Utility/Xss.php...
Checking patch core/modules/editor/tests/src/Unit/EditorXssFilter/StandardTest.php...
Hunk #1 succeeded at 403 (offset 8 lines).
Checking patch core/tests/Drupal/Tests/Component/Utility/XssTest.php...
Hunk #3 succeeded at 538 (offset 10 lines).
Applied patch core/lib/Drupal/Component/Utility/Xss.php cleanly.
Applied patch core/modules/editor/tests/src/Unit/EditorXssFilter/StandardTest.php cleanly.
Applied patch core/tests/Drupal/Tests/Component/Utility/XssTest.php cleanly.

Updated patch changes the line numbers and nothing else.

smustgrave’s picture

Just checked #146 and it applied fine to 10.1

smustgrave’s picture

StatusFileSize
new10.09 KB
new12.62 KB

Rerolled

error: patch failed: core/lib/Drupal/Component/Utility/Xss.php:267
error: core/lib/Drupal/Component/Utility/Xss.php: patch does not apply
error: patch failed: core/tests/Drupal/Tests/Component/Utility/XssTest.php:525
error: core/tests/Drupal/Tests/Component/Utility/XssTest.php: patch does not apply

larowlan’s picture

Status: Needs review » Needs work
  1. +++ b/core/lib/Drupal/Component/Utility/Xss.php
    @@ -29,6 +29,50 @@ class Xss {
    +   * The default list of safe attributes untouched by filter().
    

    nit - untouched => not sanitized ?

  2. +++ b/core/lib/Drupal/Component/Utility/Xss.php
    @@ -29,6 +29,50 @@ class Xss {
    +   * The default list of Unsafe attributes that must be touched by filter().
    

    ubernit - touched => sanitized ?

  3. +++ b/core/lib/Drupal/Component/Utility/Xss.php
    @@ -232,13 +277,13 @@ protected static function attributes($attributes) {
    +              substr($attribute_name, 0, 3) === 'ng-' ||
    

    Can we get a comment for this, I assume its something angular related

  4. +++ b/core/lib/Drupal/Component/Utility/Xss.php
    @@ -271,7 +316,15 @@ protected static function attributes($attributes) {
    +            if ($enforce_protocol_filtering) {
    +              $value = static::filterProtocol($attribute_name, $match[1]);
    +            }
    +            elseif ($skip_protocol_filtering) {
    +              $value = $match[1];
    +            }
    +            else {
    +              $value = static::filterProtocol($attribute_name, $match[1]);
    +            }
    

    I think we can simplify this to

    if ($enforce_protocol_filtering || !$skip_protocol_filtering) {
      // filter here
    }
    else {
      // skip here
    }
    
    

    applies in all three instances of this new code

  5. +++ b/core/lib/Drupal/Component/Utility/Xss.php
    @@ -340,6 +409,26 @@ protected static function needsRemoval($html_tags, $elem) {
    +  protected static function filterProtocol($name, $value) {
    

    We can typehint here, both for arguments and return type

  6. +++ b/core/lib/Drupal/Component/Utility/Xss.php
    @@ -340,6 +409,26 @@ protected static function needsRemoval($html_tags, $elem) {
    +    if (in_array($name, static::$safeAttributes)) {
    +      return preg_match('/^[a-zA-Z0-9]+\:[a-zA-Z0-9]+$/', $value) ? $value : UrlHelper::stripDangerousProtocols($value);
    

    can we get a comment here - its a safe attribute, but we're stripping something if a regex matches?

    could we document the regex a bit like we do in \Drupal\Component\Utility\Xss::filter where we have this

    // Strip any tags that are not in the list of allowed html tags.
        return preg_replace_callback('%
          (
          <(?=[^a-zA-Z!/])  # a lone <
          |                 # or
          <!--.*?-->        # a comment
          |                 # or
          <[^>]*(>|$)       # a string that starts with a <, up until the > or the end of the string
          |                 # or
          >                 # just a >
          )%x', $splitter, $string);
    

    Just as a courtesy to our future selves

  7. +++ b/core/modules/editor/tests/src/Unit/EditorXssFilter/StandardTest.php
    @@ -395,15 +395,15 @@ public function providerTestFilterXss() {
    +    $data[] = ['<META HTTP-EQUIV="refresh" CONTENT="0;url=javascript:alert(\'XSS\');">', '<META http-equiv="refresh" content="0;url=javascript:alert(\'XSS\');">'];
    

    Whilst content is not executed, for a http-equiv="refresh" it is

    I tested this in Chrome and it was blocked by the browser, but it was executed - so I think we may need to handle this differently

  8. +++ b/core/modules/editor/tests/src/Unit/EditorXssFilter/StandardTest.php
    @@ -395,15 +395,15 @@ public function providerTestFilterXss() {
    +    $data[] = ['<META HTTP-EQUIV="refresh" CONTENT="0;url=data:text/html base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4K">', '<META http-equiv="refresh" content="0;url=data:text/html base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4K">'];
    

    Same here, the browser evaluated it but blocked it in Firefox and Chrome

  9. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
    @@ -543,6 +546,104 @@ public function providerTestAttributes() {
    +        '<h2 property="javascript:alert(0);">The Title</h2>',
    

    In my testing this was safe, browsers ignored it

  10. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
    @@ -543,6 +546,104 @@ public function providerTestAttributes() {
    +        '<section class="actions-menu-inner" ng-style="{ \'max-height\': maxPanelHeight }" ed-pretty-scrollbar ed-scroll-axis="y" ed-scroll-theme="light"></section>',
    

    shouldn't the ed- prefixed attributes be removed?

  11. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
    @@ -543,6 +546,104 @@ public function providerTestAttributes() {
    +        '<img src="http://example.com/foo.jpg" typeof="foaf:Image">',
    +        '<img src="http://example.com/foo.jpg" typeof="foaf:Image">',
    +        'Image tag with RDFa with namespaced attribute',
    +        ['img'],
    +      ],
    +      [
    +        '<img src="http://example.com/foo.jpg" typeof="foaf:bad////value">',
    +        '<img src="http://example.com/foo.jpg" typeof="foaf:bad////value">',
    +        'Image tag with RDFa with bad with namespaced attribute',
    +        ['img'],
    +      ],
    +      [
    +        '<img src="http://example.com/foo.jpg" foo="bar:baz">',
    +        '<img src="http://example.com/foo.jpg" foo="baz">',
    +        'Image tag with non-RDFa attribute',
    +        ['img'],
    +      ],
    +      [
    +        '<h2 property="title">The Title</h2>',
    +        '<h2 property="title">The Title</h2>',
    +        'H2 tag with RDFa attribute without namespace',
    +        ['h2'],
    +      ],
    +      [
    +        '<h2 property="http://purl.org/dc/terms/title">The Title</h2>',
    +        '<h2 property="http://purl.org/dc/terms/title">The Title</h2>',
    +        'H2 tag with RDFa attribute with URL',
    +        ['h2'],
    +      ],
    +      [
    +        '<h2 property="javascript:alert(0);">The Title</h2>',
    +        '<h2 property="javascript:alert(0);">The Title</h2>',
    +        'H2 tag with RDFa attribute with XSS',
    +        ['h2'],
    

    these look to be duplicated in the test cases?

smustgrave’s picture

Assigned: Unassigned » smustgrave

Will work on this tomorrow. If I don’t post anything in 3 days feel free to remove my name and work on.

smustgrave’s picture

Status: Needs work » Needs review
StatusFileSize
new0 bytes
new11.56 KB

#152

1 = Fixed
2 = Fixed
3 = Added comment
4 = Updated
5 = Fixed
6 = tried my best
7 = This was an existing test, only change I made was for javascript:alert(\'XSS\'); as content was marked a safe attribute
8 = This was an existing test, only change I made was for javascript:alert(\'XSS\'); as content was marked a safe attribute
9 = should it be removed?
10 = removed
11 = removed

smustgrave’s picture

StatusFileSize
new6.44 KB
smustgrave’s picture

StatusFileSize
new2.06 KB
new9.59 KB

Per a discussion with @larowlan removing "content" as one of the safe attributes and opened #3352540: Investigate allowing "content" as allowed tag

Updated patch

Status: Needs review » Needs work

The last submitted patch, 156: 2544110-156.patch, failed testing. View results

larowlan’s picture

Assigned: smustgrave » Unassigned
  1. +++ b/core/lib/Drupal/Component/Utility/Xss.php
    @@ -29,6 +29,49 @@ class Xss {
    +    'rel',
    +    'rev',
    +    'about',
    +    'datatype',
    +    'datatype_callback',
    +    'datetime',
    +    'mailto',
    +    'media',
    +    'sizes',
    

    We don't have test coverage for all of these attributes - should we add it here? I realise there's some overlap with #2692451: Xss::filterAdmin() incorrectly filters datetime attribute

  2. +++ b/core/lib/Drupal/Component/Utility/Xss.php
    @@ -229,16 +273,16 @@ protected static function attributes($attributes) {
    +              substr($attribute_name, 0, 3) === 'ng-' ||
    

    I think this was missed from my earlier review, can we get a comment regarding the significance of `ng-` - e.g. relates to Angular (I think?)

smustgrave’s picture

Status: Needs work » Needs review
StatusFileSize
new1.67 KB
new10.72 KB

rel = is already included
rev = tests added
datetime = test added
mailto = test added
media = test added
sizes = test added

Removed some as I couldn't find examples
about = Removed
datatype = Removed
datatype_callback = Removed

#158.2 added to comment block in earlier patch.

catch’s picture

Issue summary: View changes

Opened #3353173: Considering making the XSS safe/unsafe attribute lists extendable as an additional follow-up, took the RDF list out of the issue summary since that's no long in the patch.

heddn’s picture

Status: Needs review » Needs work
heddn’s picture

Status: Needs work » Needs review
StatusFileSize
new10.76 KB
larowlan’s picture

Status: Needs review » Needs work

Getting close, couple of questions and looks like at least one test case has the wrong label

  1. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
    @@ -507,6 +510,36 @@ public function providerTestAttributes() {
    +        'Link tag with rev attribute',
    

    I don't see a rev attribute here

  2. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
    @@ -507,6 +510,36 @@ public function providerTestAttributes() {
    +        'Link tag with mailto href',
    

    ah here is is, I think these titles are out of order

  3. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
    @@ -507,6 +510,36 @@ public function providerTestAttributes() {
    +        '<a media="print and (resolution:300dpi)">Drupal</a>',
    +        '<a media="print and (resolution:300dpi)">Drupal</a>',
    +        'Link tag with media attribute',
    +        ['a'],
    +      ],
    +      [
    +        '<a sizes="16x16">Drupal</a>',
    +        '<a sizes="16x16">Drupal</a>',
    

    are sizes and media valid attributes for an a tag? should we use img here or picture or something that reflects supported attributes?

  4. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
    @@ -507,6 +510,36 @@ public function providerTestAttributes() {
    +        '<time datetime="2017-02-14">',
    +        '<time datetime="2017-02-14">',
    +        'Time with datetime attribute',
    

    should we add a test-case with a : or did the related issue already resolve that? if so do we need this test case at all anymore?

  5. +++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
    @@ -561,6 +594,68 @@ public function providerTestAttributes() {
    +        '<img src="http://example.com/foo.jpg" typeof="foaf:bad////value">',
    +        '<img src="http://example.com/foo.jpg" typeof="foaf:bad////value">',
    +        'Image tag with RDFa with bad with namespaced attribute',
    +        ['img'],
    +      ],
    +      [
    +        '<img src="http://example.com/foo.jpg" foo="bar:baz">',
    +        '<img src="http://example.com/foo.jpg" foo="baz">',
    +        'Image tag with non-RDFa attribute',
    

    I don't understand why one of these is stripped but the other isn't?

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

donquixote’s picture

Not only attribute values are damaged, also attribute names.
Attribute aaa:bbb="X" becomes bbb="X".
Attribute aaa_bbb="X" becomes bbb="X".

E.g. try Xss::filter('<div aaa:bbb="AAA-BBB" bbb="BBB">&nbsp;</div>', Xss::getAdminTagList()).
This leads to a duplicate attribute `bbb=`.

You can argue that these are silly or invalid attribute names.
If this is the concern, the correct behavior would be to remove the attribute, _not_ to change the attribute name.

Interesting discussion: "What characters are allowed in an HTML attribute name?", https://stackoverflow.com/a/53563849/246724

Some things to consider.

  • Custom html tags can have any custom attributes.
  • Browsers typically accept custom attribute names even on basic html tags.
  • Some legacy or poorly designed 3rd party systems might rely on poorly named attributes.
    In fact this is the case for me in a current project.

The culprit is definitely the attribute splitter function.
But to fix it, we need to define the desired behavior.

donquixote’s picture

Review:

            if ($enforce_protocol_filtering || !$skip_protocol_filtering) {

These two variables always occur in this combination.
They can be combined into a single variable.

            $filter_protocols = in_array($attribute_name, static::$unSafeAttributes)
              || !(str_starts_with($attribute_name, 'data-')
                || str_starts_with($attribute_name, 'ng-')
                || in_array($attribute_name, static::$safeAttributes)
              );
[...]
            if ($filter_protocols) {

We also see that, with their default values, static::$unSafeAttributes has no effect whatsoever.
It would only have an effect if somebody adds a "data-" or "ng-" attribute to the unsafe list.

I wonder, do we support people modifying these class properties? Or could we use constants instead?

donquixote’s picture

donquixote’s picture

I wonder, do we support people modifying these class properties? Or could we use constants instead?

If we support them being modified, I would rather turn all the static methods to object methods, so that you can have different instances with different configuration.
For static methods, I would expect them to behave the same at all times.

donquixote’s picture

Also,

  protected static function filterProtocol(string $name, string $value): string {
    // If the attribute is a safe attribute check that it doesn't contain a
    // protocol. If it does call UrlHelper::stripDangerousProtocols().
    if (in_array($name, static::$safeAttributes)) {

We now have two places where we compare the attribute name to a list:
One time in ::attributes() with case 0, another time in ::filterProtocol().
This can be collapsed to just one place.
Instead of passing the name to ::filterProtocol(), we can pass a filter mode.

But maybe the name parameter in ::filterProtocol() is meant for subclasses that want custom logic to determine the filter mode?
Do we care about inheritance?

sudishth made their first commit to this issue’s fork.

s_leu made their first commit to this issue’s fork.

neclimdul’s picture

despite several requests for comment, its still not clear why we need a special case for "ng-"

If we do need to special case it, hard coding it like this in its current location has a smell and suggests to me we haven't succeeded in making this "consistent" if such a special case for a framework is required.

avpaderno’s picture

Issue summary: View changes
tommyk’s picture

While I haven't read every reply, I didn't see any results when I searched for ARIA in this issue, so adding my use case.

I got here because I'm trying to manipulate Views output to create accessible Read More links like described at https://www.w3.org/WAI/WCAG21/Techniques/aria/ARIA8 wherein I am adding an aria-label attribute with a token like <a href="/node/{{ nid }}" aria-label="Read more about {{ title }}">Read More</a> where the title in this case contains a colon. Everything in the ARIA Label up to and including the colon in the title is stripped.

quietone’s picture

Hide files

joao.ramos.costa’s picture

HI @ tommyk,

indeed. Got here by the same reason. The aria-* attributes may be 'whitelisted' as for the same reason 'data-' attributes are and perhaps a little more than 'ng-'.

onnia’s picture

Hi,
I could not find 10.4.x compatible patch for commit 3cf003db
so I made my own, I hope someone else finds this useful.

prudloff made their first commit to this issue’s fork.

prudloff changed the visibility of the branch 11.x to hidden.

prudloff’s picture

I don't think we should hardcode a special condition for Angular attributes (or other use cases), a better solution would be to provide a way to extend the list of skipped attributes.

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.

ehlovader made their first commit to this issue’s fork.

graber made their first commit to this issue’s fork.