Problem/Motivation
The current filtering in \Drupal\Component\Utility\Xss::filter()
and \Drupal\Component\Utility\Xss::attributes()
has 2 issues related to how we handle data-* attributes:
XSS attack vector (confirmed on 8.0.0-beta9).
Image captions and other data-* attributes are stored HTML-encoded. This causes them to bypass our current XSS filtering logic. When viewing content in the editor, the data-attribute's content is decoded and could be evaluated, leading to an XSS vulnerability when editing content.
To reproduce, create a node as a regular user, add an image with caption "<script>alert('xss');</script>
" and save the node. Edit the node as editor, and the script will be executed when the CKEditor-interface is loaded.
Upgrading to critical because of this vulnerability.
Captions or data-attribute values are mangled.
When adding captions to images through the editor, content is stripped incorrectly. Some examples:
Actual caption in editor | Rendered caption | |
---|---|---|
(http://drupal.org) | //drupal.org) | |
Source:reuters | reuters | |
Example use of the Scope Resolution Operator (::) | :) |
(more examples and screenshots in the first 20 comments of this issue).
Proposed resolution
The initial approach of altering stripDangerousProtocols in #10 was rejected because it proved too difficult and error-prone.
In #22, chx suggested to introduce the concept of whitelisted "harmless"-attributes (such as data-caption), for which no bad protocol filtering is executed. This approach was signed off on by Damien Tournoud in #32
Additionally, encoded data-attributes should be filtered for XSS to prevent attacks through the editor.
Remaining tasks
- Fix remaining test failure
User interface changes
None. (expect removing the XSS vulnerability on the editor interface).
API changes
None.
Original issue
tested with Last packaged version: 8.0-alpha3+423-dev
if you put a http:// on the caption this strip something else ..i mean this is not wanted..so bug report
Comment | File | Size | Author |
---|---|---|---|
#111 | drupal-no_protocol_filter-2105841-111-D7.patch | 6.33 KB | Liam Morland |
#97 | interdiff-2105841-90-97.txt | 1.74 KB | Devin Carlson |
#97 | do-2105841_no_protocol_filter-97.patch | 6.33 KB | Devin Carlson |
#97 | do-2105841_no_protocol_filter-97-tests-only.patch | 2.5 KB | Devin Carlson |
#94 | filter_php_5.2.17.png | 163.65 KB | Devin Carlson |
Comments
Comment #1
eule CreditAttribution: eule commentedComment #2
Wim LeersExcellent find, and awesome bug report: all the info is here, explained very clearly. Thank you! :)
Whew, I fear this is going to be mighty difficult to fix. The cause is deep in the filter system: the
filter_html
usesXss::filter()
, which is responsible for stripping out evil protocols. If you type something (anything) before "http:", such as "You should see http://drupal.org", then ""You should see http:" is considered the protocol! Patch attached to fix this.Tested with this content, and with the "restrict images to this site" filter off:
(Note that you have to copy/paste the above in CKEditor's "Source" mode, without exiting source mode; the current CKEditor build will mangle HTML stored in data- attributes. That's fixed in newer versions of CKEditor.)
Before:
After:
Comment #3
Wim Leers.
Comment #4
eule CreditAttribution: eule commentedworks for me..but need´s deeper tests
Comment #5
Wim LeersOf course. But before I do that, I want sign-off on the approach.
Comment #6
mr.baileysSome test cases with the patch applied and using Basic HTML:
The most recent patch only works when the scheme is preceded by a space. There are other valid captions where the scheme is not preceded by a space (see example 1). Rather than checking everything between space and ':', the logic should follow RFC 3986's definition of a scheme as "Scheme names consist of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus ("+"), period ("."), or hyphen ("-")"
However, I'm not sure how we can fix examples 2 and 3 while still using Url::stripDangerousProtocols.
Needs work for example 1, although we might have to rethink the approach to accommodate examples 2 and 3 too.
Comment #7
Wim LeersThanks for the review — much appreciated!
And you're right, I didn't consider the
(http://example.com)
scenario, which of course must also work :/ Thanks for the pointers!Comment #8
Wim LeersSo I'm now testing with this HTML:
For a moment, I thought it would've been possible to first run the
filter_caption
filter and then run thefilter_html
filter, but that won't work, because it'd require the<figure>
and<figcaption>
tags to be whitelisted. We don't want that.The solution that I found is super simple and elegant: we simply leverage the
FilterInterface::prepare()
phase that filters may implement! By URL-encoding the value of thedata-caption
attribute, no protocol stripping happens anymore. Thedata-caption
attribute is intended to contain (HTML-encoded) HTML, therefore filtering should be applied. (This already is the case.) And it is then that filtering that is responsible for stripping dangerous protocols.This patch no longer touches the
Xss
orUrlHelper
classes at all anymore, so the "needs security review" tag is not even necessary anymore. However, thanks to the test coverage added by #2099741: Protect WYSIWYG Editors from XSS Without Destroying User Data (Drupal\editor\Tests\EditorXssFilter\StandardTest
), we can also be certain that we did not enable any new security holes! Hence I'm definitely able to remove the "needs security review" part.Comment #9
mr.baileysI took the patch in #8 for quick test-drive, and there seems to still be an issue. To reproduce:
Creating/editing a node and adding a caption:
Viewing the result on node view:
Editing the node again:
Comment #10
Wim LeersHow do you keep coming up with these edge cases? :D Amazing!
Now testing with:
Upon further (and deep — spent my entire Saturday on this) investigation, it turns out that the solution in #8 indeed solves it for filtering (i.e. output for the end user), but it's broken because of #2099741: Protect WYSIWYG Editors from XSS Without Destroying User Data, which introduced
EditorXssFilterInterface
to ensure safety of authors when using WYSIWYG editors. That also usesXss::filter()
… and therefore we end up having exactly the same problems as before #8, but only for the author, not for the end user!If you'd paste the sample data, save it, edit it again, you'd notice that exactly the same things are broken as before #8, but only for authors!
In other words: the solution in #8 is not a complete solution. And since we have to fix it in
Xss::filter()
anyway, we might as well get rid of #8 entirely — but keep the added test coverage, of course.By very carefully adapting
UrlHelper::stripDangerousProtocols()
, I've been able to find a solution, that:\Drupal\editor\Tests\EditorXssFilter\StandardTest
,\Drupal\Tests\Component\Utility\XssTest
,\Drupal\filter\Tests\FilterUnitTest
and\Drupal\Tests\Component\Utility\UrlHelperTest
.data-caption
attribute), they are all stripped. Note that we in fact committed\Drupal\editor\Tests\EditorXssFilter\StandardTest
with a test case that shows that this is broken in HEAD — the mitigating factor is that you'd need the<META>
tag to be allowed to use it.Source:Wikipedia
use case, because it's impossible for Drupal to know whetherSource:
is an actual protocol, now or in the future. That will be a minor WTF, but we must always err on the side of safety.Comment #11
Wim LeersComment #12
jcisio CreditAttribution: jcisio commentedSorry for stupid question, but I don't understand a few XSS protection filter. E.g. this test:
Why that string needs to be "protected"? I test (only with Chrome) and it does not do anything harmful.
Comment #13
Wim LeersThere's a link just above that line of code that explains it. Please remember that XSSes don't need to work in every browser to be dangerous, if they only work in some browsers, that's already enough.
Comment #14
jcisio CreditAttribution: jcisio commentedIMO that link say the contrary: it is used to protect against XSS, not to do an XSS attack. Because I don't trust 100% on any type of wiki, I tried to reproduce it in any browser, but no popup was trigger, so I raised the question.
Comment #15
Wim LeersWell, in any case: what you're talking about is completely off-topic for this issue. Please open a new issue if you want to remove that.
Comment #16
jcisio CreditAttribution: jcisio commentedHere you are #2240365: Remove misunderstanding XSS tests.
Comment #17
Damien Tournoud CreditAttribution: Damien Tournoud commented^
javascript: alert(123)
works pretty well. Virtually every browser will automatically url-encode spaces in embedded URLs.Formal validation is pretty useless in an XSS filter, precisely because browsers have very lenient parsers.
^
\s
is possibly local specific, I'm not sure we want to go there.^ This looks like a really slow version of
substr($protocol, 0, -strlen($actual_protocol))
^
"javascript\t:alert(123)"
works pretty well as an XSS (at least in Chrome), so this assumption is just wrong with the split on\s
.^ The new version is missing a lot of relative URLs, that do not need to start with either
/
,?
or#
.(For example:
about?test:toto
is a valid URL.)^ From the look of it, there is a off-by-one error here. You are stripping the protocol *and* the colon.
Comment #18
chx CreditAttribution: chx commentedstripDangerousProtocols comes from the original KSES port ( https://api.drupal.org/api/drupal/modules%21filter.module/function/filte... ) and as such it should not be changed at all unless we have very very good reasons and a lot of expertise. I do not think it possible to enhance the KSES code but feel free to prove me wrong.
Comment #19
Wim LeersClosed #2327341: Xss::attributes() corrupts data attributes that contain colons or JSON-encoded data as a duplicate.
Also related: #2231709: Filter system clean-up: stop applying UrlHelper::stripDangerousProtocols()'s logic to *every* attribute value.
Comment #20
chx CreditAttribution: chx commentedCan we instead think on which attributes are JS capable? I know it's dangerous too but perhaps it's less dangerous than trying this.
Comment #21
neclimdulThat could be a solution. Its clear that our attribute needs are a bit more complex then they where almost 9 years ago when the port of KSES was added. It is still doing its job exceedingly well, its just the requirements of complex attributes and more advanced browsers seems to have pushed us to require something a little more surgical.
Comment #22
chx CreditAttribution: chx commentedSo what we can do here; we can write a harmless attribute list based on https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes ; data-* surely belongs there. And be quite careful with it.
Comment #23
chx CreditAttribution: chx commentedDamien, please review this -- and please noone else set to RTBC even if you like it.
Comment #24
Dave Reid@chx: See also #952964: String stripped from title and alt attribute if contains a colon which is pretty much doing the same thing, and was the solution I was heading towards as well.
Maybe we should continue this "new" direction either on #952964: String stripped from title and alt attribute if contains a colon or #2327341: Xss::attributes() corrupts data attributes that contain colons or JSON-encoded data?
Comment #25
chx CreditAttribution: chx commentedJust mark everything as a duplicate and keep this one going?
Comment #26
Wim LeersI'm fine with this approach. Only one question: do we want this list of harmless attributes to be configurable or not? #952964: String stripped from title and alt attribute if contains a colon made it configurable.
(Especially considering Web Components are upcoming, I think we must allow for additional attributes to be considered harmless. And that list could vary per site and potentially even per text format.)
Comment #27
chx CreditAttribution: chx commentedYou put the wrong attribute there and you are hosed. Do we dare? I am already very very anxious to touch this code at all but to hand it over to users? Eek :)
Yes, we should make it configurable.
Comment #28
Wim Leers#27: I'm fine with deferring configurability to a follow-up if you want :)
data-*
attributes are the immediate need/problem/use case, everything else could be dismissed as speculation :)Comment #29
Dave ReidI'm ok to making it configurable without an exposed UI in a follow-up.
Comment #30
Wim LeersI closed #2346989: [PP-1] 'Restrict images to this site' leads to incorrect HTML with language attribute as another duplicate of this.
Comment #31
Dave ReidComment #32
Damien Tournoud CreditAttribution: Damien Tournoud commented(About one month later...)
I reviewed the approach from #23 and it sounds very reasonable to me.
The whitelist approach is recommended by several sources (for example the nice write-up by Jakob Kallin and Irene Lobo Valbuena at excess-xss.com). A few years back, a "common" list of sanitization rules was posted on the whatwg wiki. It has a list of "acceptable attributes" and a list of "attributes whose value is a URI", we probably want to consider in our whitelist only the difference between those two (+ everything that starts with
data-
), and run anything else through the filter.That would be a first start. The other approach would be to just leverage an externally maintained filter? HTML purifier looks extremely relevant.
Comment #34
Wim LeersAnother duplicate was opened: #2383205: Xss::filter removes valid HTML attributes . But one that has nice tests :)
Looks like we want to go with #23. What's missing in that case, are tests.
We want these tests that I wrote (in #10) for the caption filter in particular:
But also tests for the XSS filtering changes themselves in #23. webflo's patch in #2383205 does just that.
Comment #35
webflo CreditAttribution: webflo commentedComment #36
Wim Leerss/html/HTML/
Missing
@covers
.And finally, missing the test coverage I wrote for the caption filter, see my previous comment.
This is never used. Let's delete it?
s/alt and title/data-/
Comment #38
cs_shadow CreditAttribution: cs_shadow commentedAddressed Wim's comments in #36. I've added tests from #10. Although, I've tried to fix the errors but there are still some errors in the filterCaption tests. Maybe someone can help me on how to fix it.
Comment #40
cs_shadow CreditAttribution: cs_shadow commentedFixed a few test failures.
Comment #42
mr.baileysComment #43
Wim LeersThank you very much for making this patch green again, @mr.baileys!
This unchanged test code still ensures that no XSS can make it to the site's output.
This on the other hand… it now actually verifies that an XSS does make it all the way to the WYSIWYG editor. Which means that an XSS is possible in theory. CKEditor protects against this automatically, but another editor may not.
A mitigating factor is that the editing user would actually have to click this link within the editor (which CKEditor also doesn't let you do) — he couldn't be automatically attacked. And note that this never makes it to the end user, see point 1.
So that makes this less than ideal, but still better than what I proposed in #10 — see the comments by chx and Damien Tournoud.
AFAICT it is literally impossible to not have this problem when going with the approach recommended by chx & Damien, since it uses a simple whitelist of tags. My code in #10 used to do what you could call "speculative URL parsing".
The only way I see to fix this problem is to add a
editorXssFilter()
method toFilterInterface
, to allow filters to perform additional XSS filtering for editors. That way, theFilterCaption
filter could apply XSS filtering to thedata-caption
attribute's value.These are deprecated, but changing them here is actually not necessary. It actually is distracting, because it suggests it's part of the solution for this issue.
Comment #44
mr.baileysTalked to @Wim Leers in irc, assigning to myself and will try to upload a new patch later today.
Comment #45
mr.baileysAfter talking to Wim Leers on irc, we decided it might be better to perform XSS filtering on all data-*-attributes globally rather than relying on filters implementing editorXssFilter() individually. Attached patch implements this global data-* attribute filtering.
You're right, I've changed them back.
Comment #47
mr.baileysComment #48
mr.baileysComment #49
Wim Leerss/data-*-/data-/, like everywhere else.
s/markup/HTML markup/, to clarify, because "markup" in general does not need to be filtered, only HTML markup.
Maybe also add an
@see \Drupal\filter\Plugin\Filter\FilterCaption
?Hah :)
Add a comment for these 3 lines:
Let's omit these distracting, unnecessary changes.
s/array(…)/[…]/
Please use random names instead; this creates unnecessary assumptions.
In fact, why not just omit this completely?
Comment #50
mr.baileysThanks for the review, new patch attached.
Not sure about this, since FilterCaption is just one of the potentially many filters for which we globally sanitize the data-attributes?
Comment #51
sanduhrsStraight reroll.
Comment #52
sanduhrsSmall typo, trying again. Sorry.
Comment #54
Wim LeersThis means we need to expand the test coverage in
\Drupal\Tests\editor\Unit\EditorXssFilter\StandardTest
.After that, this is RTBC.
Comment #55
mr.baileysComment #56
mr.baileysI started adding tests to
\Drupal\Tests\editor\Unit\EditorXssFilter\StandardTest
specifically for the attributing filtering.There is one failing test because
DOMAttr->value
automatically decodes attribute values, while we want the original unescaped value.Comment #57
mr.baileysForgot to attach the interdiff to previous patch.
Comment #60
mr.baileysComment #61
mr.baileysComment #62
pfrenssenGoing to review this.
Comment #63
xjmDiscussed with @pfrenssen; this is a critical (and upgrade path blocking because of the disclosed vulnerability).
Comment #64
pfrenssenAs discussed with @xjm this is critical because it is an actual XSS issue which is easily exploitable.
This issue can be quite confusing to figure out. Here's what I've learned from experimenting with the patch, and talking to @mr.baileys and @Wim Leers:
data-caption
attribute. It is possible to use HTML in captions, for example if you want to output a link. If you create a link using the button in the CKEditor interface this will be single escaped. If you want to show any HTML entities then these will need to be double escaped. Double escaped captions are inherently safe and should never be filtered.<strong>my text</strong>
and submit the form this will become bold text. If I enter<script>alert('xss');</script>
then this will be executed as well, causing the XSS vulnerability. I suspect that this is a bug in CKEditor. Even if it would be intentional to allow editors to embed scripts in their captions it doesn't make much sense to execute the scripts when they are being edited since any executed scripts no longer have their code present in the editor and will be lost on a subsequent form submit.<script>
tags will be gone), but this is of course preferable over having an actual XSS issue.javascript:alert('xss')
) in selected "harmless" attributes (data attributes + title and alt attributes). These attributes are fully text based and will never have their contents executed as javascript code. This made me wonder about the data-caption attribute. It turns out fine, since its contents are fully XSS-filtered byFilterCaption::process()
. A potential attack with a bad protocol could bedata-caption="<a href="javascript:alert(1);">test</a>"
but if this is decoded and XSS-filtered thejavascript:
part is filtered correctly since it now is part of anhref
which is not on the whitelist of "harmless" attributes.Order use statements alphabetically.
These two namespaces are unused.
I don't understand why it was changed from static:: to parent::. This has a different meaning when this class is extended. It would mean that if the filter() method is overridden in the called class then it would not be called in this case, but instead its parent would be called. I think this should be put back to static::.
Same here, it should be
static::filterXssDataAttributes()
, so that in case this class is overridden the method of the called class is used, and not specifically always the one from theStandard
class.We are not decoding in this part, only filtering and encoding. As is already said $node->value is already decoded.
Maybe explain why it is needed to filter single escaped values but not double escaped values.
Alphabetical ordering of use statements, my favourite pet coding standard :)
"which is typically used with" is not a complete sentence. There should be an "it" in there somewhere.
Anonymous functions should have a space between "function" and the opening parenthesis.
...
This is very dense and hard to read, can we separate each case with an empty line?
What's the typo? The missing space after the double colon?
Also it seems that $input and $expected_xss_filtered are identical, so $input can be used in the final assert.
Comment #65
pfrenssenComment #66
mr.baileysThanks for the review @pfrenssen, will address your feedback asap.
Comment #67
Wim LeersFor the strangeness on the CKEditor side WRT encoding/decoding, we already have #2460145: Image captions incorrectly deal with plain-text HTML tags for that. Specifically see #2460145-3: Image captions incorrectly deal with plain-text HTML tags, which shows that it's a bug in CKEditor itself.
Comment #68
mr.baileysComment #69
pfrenssenThanks a lot Ivo! Just some minor cleanup left and this is good to go.
Document the parameter and the return value.
$expected_xss_filtered is identical to $input, so we can get rid of $expected_xss_filtered and use $input in the final assert.
Comment #70
xjmComment #71
mr.baileysAddressed the comments from #69.
Comment #72
pfrenssenLooking good now, thanks a lot!
Comment #73
alexpottLet's add a comment as to why this is harmless.
Comment #74
pfrenssenComment #75
mr.baileysAdded a comment. I've also renamed the
$harmless
-variable to$skip_protocol_filtering
, since that better describes what it does.Comment #76
Dom. CreditAttribution: Dom. commentedI have read though the interdiff which looks fine. Since we claim Drupal 8 to be HTML5 ready I would expect the @see link to point out at html5 doc not html4, but I could not find a proper one.
RTBC+1
Comment #77
pfrenssenVery good, I like the variable rename too. Thanks!
Comment #79
alexpottThis issue addresses a critical bug and is allowed per https://www.drupal.org/core/beta-changes. Afaics @Damien Tournoud has done a security review and agreed that the current whitelist approach is the correct one to take. Committed 2febbff and pushed to 8.0.x. Thanks!
I reworked the comment to be shorter and less repetitive.
Comment #80
Liam MorlandWill this be backported to D7?
Comment #81
Wim LeersAmazing work on this very, very tricky issue, @mr.baileys! Thank you so much for sticking with it for one and a half years! (Your first comment on this issue was on November 6, 2013…)
Comment #83
Heine CreditAttribution: Heine at LimoenGroen for Historisch Centrum Overijssel commentedAttached patch is a partial backport of the changes to filter_xss code in the D8 patch. The part filtering and encoding data attributes values has not been backported as this may break existing sites.
Comment #84
Heine CreditAttribution: Heine at LimoenGroen for Historisch Centrum Overijssel commentedUpdated patch to fix the comment.
Comment #85
Heine CreditAttribution: Heine at LimoenGroen for Historisch Centrum Overijssel commentedFix codestyle issue
Comment #86
xjmTagging to indicate this is a backport.
Comment #88
Les LimUp in #26 through #29, there was talk about having a follow-up to make the attribute whitelist configurable. Adding that here: #2544110: XSS attribute filtering is inconsistent and strips valid attributes
This is newly relevant based on #2501697: Remove SafeMarkup::set in rdf_preprocess_comment(), where we discovered that the XSS filter mangles valid RDF attribute name/value combinations, like
rel="schema:author"
andtypeof="foaf:image"
.Comment #89
webchickRemoving the upgrade path tag.
Comment #90
Devin Carlson CreditAttribution: Devin Carlson commentedExpanded #85 to backport
filterXssDataAttributes()
as well so it could be used by D7 contrib projects, such as CKEditor. If this isn't needed then #85 should be good to go.I found #85 to be a straight backport of #75 and didn't find any issues after a review or manual testing. This issue affects the D7 version of the Entity Embed module. #2550669: Backport field formatter display capabilities
Comment #92
samuel.mortensonMarking (#90) as RTBC, looks like a pretty straight-forward backport of the current D8 filtering.
Comment #93
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commented#90 looks like it won't work on PHP 5.2?
Seems like #85 is the most important part to backport, but the extra function in #90 could be useful also (although I haven't read up on the details).
In the meantime, just changing the issue title, since it's pretty scary-looking and confusing otherwise :)
Comment #94
Devin Carlson CreditAttribution: Devin Carlson commentedTo address #93, I set up an environment with PHP 5.2.17 and ran the filter tests, with #90 applied, which all passed without any warnings.
Comment #95
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedI don't see how since namespaces weren't introduced until PHP 5.3. And if I try this at http://sandbox.onlinephpfunctions.com using PHP 5.2 I get a PHP warning due to the slash in "\DOMXPath" (
Unexpected character in input: '\'
).Also if we're going to introduce this new function then I think we need more documentation on how and when to use it. It is not used at all in Drupal 7 core, and it's not really clear from the current documentation what you would use it for.
(minor) These should be part of a sentence since they're inline code comments rather than PHPDoc, e.g.
See filter_xss_bad_protocol() and http://www.w3.org/TR/html4/index/attributes.html.
Otherwise it looked good to me.
Comment #96
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedAlso demoting the priority here, since the part of this issue that affects Drupal 7 is not a critical bug. Would still be very nice to fix it though.
Comment #97
Devin Carlson CreditAttribution: Devin Carlson commentedAn updated version of #90 to address #93 and #95.
Comment #100
aDarkling CreditAttribution: aDarkling as a volunteer commentedTried it out & it worked great for me in D-7.41.
Ran the testing suite. 1 test failed -- "HTML scheme clearing evasion -- spaces and metacharacters before scheme." I don't think that's related to this fix, though.
I'm not sure how filter_xss_data_attributes() is really needed, though since it is currently only used for the tests.
As I understand it we're supposed to run data through the filters that are set up (Full, Filtered, Plain, etc.), so there'd be no need to call filter_xss_data_attributes() separately.
Otherwise, I'd say that its good to go.
Comment #101
dbazuin CreditAttribution: dbazuin at LimoenGroen commentedaDarkling witch patch did you use?
And if it is #90 dit you try it with PHP 5.2?
Comment #102
aDarkling CreditAttribution: aDarkling as a volunteer commentedSorry, no.
I used #97 on PHP Version 5.4.38.
Just (manually) re-applied the patch to D-7.42. Still working.
Comment #103
aDarkling CreditAttribution: aDarkling as a volunteer commentedThis patch is already being used by (at least) the 2,424 sites that are using the Entity Embed module.
Comment #104
stefan.r CreditAttribution: stefan.r commentedComment #105
stefan.r CreditAttribution: stefan.r commentedComment #106
Fabianx CreditAttribution: Fabianx as a volunteer commentedRTBC + 1, it looks good to me.
Marking for commit.
Comment #107
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedI thought about this issue a little more and I'm not sure the
data-
part of it is safe in terms of backwards compatibility.The problem is that
data-
attributes are by definition arbitrary, so it's certainly possible for a URL to be stored in them intentionally (which is later moved into an actual href attribute by JavaScript). And I've definitely seen modules store URLs indata-
attributes before.For a simple example, if your JavaScript code does
$('.some-link').attr('href', $('.some-source').attr('data-url'))
that's safe against XSS currently as long as the source has been run through filter_xss(), but not safe with this patch. So this patch would potentially open up a security hole in existing code in that case.Seems like we might need some kind of whitelist here, where a module can declare that a particular
data-
attribute shouldn't undergo this filtering? (This could be as simple as a single binary flag that is applied sitewide, e.g. a list of whitelisteddata-
attributes which modules could set via variable_set() and which filter_xss() then checks... as long as we're willing to assume that modules only use this on attributes that they can reasonably believe they themselves "control" and that other JavaScript code out there won't be using also.)Some other minor issues I would have fixed on commit if not for the above:
"are are"
Should be "Contributed modules". And maybe this could use a specific explanation that contrib modules are expected to call this function themselves in those cases, and how to call it? (e.g.,
filter_xss_data_attributes(filter_xss($html))
... specifically that this function by itself is not a substitute for regular XSS-filtering)Comment #110
Gábor HojtsySwap media tags to the right one.
Comment #111
Liam MorlandRe-roll of the patch from #97 which is the latest D7 patch in this issue.
Comment #113
cilefen CreditAttribution: cilefen commentedComment #114
Liam MorlandComment #116
Liam MorlandComment #117
Wim LeersPlease create a Drupal 7 issue, so we can finally close this against Drupal 8. This was committed in April 2015, almost two years ago!
Comment #120
Jaypan CreditAttribution: Jaypan at Jaypan commentedIn 8.2.x, I'm still seeing this issue. I'm posting here in case someone can point me at where I'm going wrong.
I've got a field on my account, named bio. In this field, I'm manually adding the following:
However, the resulting HTML is as follows:
So
schema:
is being stripped from the 'typeof' property.Comment #121
Eric_A CreditAttribution: Eric_A commented@Jaypan,
Could you repost your
typeof
issue in #2635632: Drupal XSS filtering cuts valid attributes, make it configurable / use a blacklist?Comment #122
Jaypan CreditAttribution: Jaypan at Jaypan commentedDone. Thanks.
Comment #123
Charles Belov@Wim Leers I've created an issue #2859006 which I just closed as a duplicate of this issue. If that would be the appropriate 7.x issue, then it's available for that.
Comment #124
jhodgdonI think that this issue is probably a duplicate of #2544110: XSS attribute filtering is inconsistent and strips valid attributes (or vice versa)? I just commented there as well and added this one as related there.
Comment #125
jhodgdonSee also #2635632-20: Drupal XSS filtering cuts valid attributes, make it configurable / use a blacklist, where I have tried to summarize all the mangled attribute issues and ask whether we should solve them together or apart.
Comment #126
jhodgdonIn case we are going to mark this issue as a duplicate of #2635632: Drupal XSS filtering cuts valid attributes, make it configurable / use a blacklist, I just made a test-only patch on comment #22 there that incorporates (I think) the test cases here.
Comment #128
jhodgdonOh, I see. This was an 8.x issue and was apparently fixed. Someone finally created the 7.x version of it in #123. This one should be marked Fixed / 8.0.x. It was closed several years ago and reopened around comment #83 for 7.x, but now our procedure is to have separate 8.x and 7.x issues. Hence... fixed.
Comment #129
generalredneckAdding a link to #2859006: Image alternative text loses text preceding colon upon leaving plain-text editor or upon saving node as mentioned in #123... also made it "related"