Follow-up to #2105841: Xss filter() mangles image captions and title/alt/data attributes
Problem/Motivation
\Drupal\Component\Utility\Xss::filter() cleans potentially dangerous protocols like "javascript:" from element attributes. It does this by stripping any set of characters that ends with a colon, unless the string is "http:" or "https:".
The filter strips out valid attribute name/value combinations that provide RDF metadata, such as rel="schema:author" or property="foaf:name".
Some attributes are exempt from this treatment, including alt, title, and any data-* attribute. In #2105841: Xss filter() mangles image captions and title/alt/data attributes, the decision was made to hard-code the exempt attributes list, and possibly make the list configurable in a follow-up issue.
Proposed resolution
Create two lists: one for for safe attributes (for which we can skip protocol check) and one for unsafe attributes (for which we enforce a protocol check).
User interface changes
None.
API changes
None.
Beta phase evaluation
| Issue category | Bug because RDF attributes are being stripped |
|---|---|
| Issue priority | Major because ... Critical/Not critical because ... |
| Unfrozen changes | Unfrozen because it is a bug fix |
| Prioritized changes | The main goal of this issue is bug fix and security |
-->
| Comment | File | Size | Author |
|---|---|---|---|
| #178 | 2544110-10.4.x-xss-attribute-filtering-is-inconsistent.diff | 10.67 KB | onnia |
Issue fork drupal-2544110
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #1
wim leersFirst step: exempt
relandpropertytoo?Also: how come test coverage is not detecting this problem!? :O
Comment #2
les limRDF module mostly uses the
Attributeclass to prepare its attribute name/value strings, which sanitizes usinghtmlspecialchars()but does not run through Xss::filter().I've opened up a sister issue to explore whether
Attributeneeds additional filtering: #2545056: Attribute class does not sanitize for "javascript:alert('XSS')"Comment #3
subhojit777I found this bug and created an issue for this #2556051: Xss::filterAdmin() strips namespaced attributes from elements. I think we should close the other one and work on this one.
mpdonadio has done some work there. I am going to replicate it here and submit a patch.
Comment #4
subhojit777Comment #6
subhojit777Comment #7
mpdonadioNot sure why #4 failed to run. Here is a reroll from the other issue w/ the test-only patch.
I think coming up with the list of attributes may be difficult. Maybe we want to whilelist prefixes instead? We have to deal with RDF, Facebook / OpenGraph, Dublin Core, and probably others.
Short term we should hardcode the list, but long term we should consider a way for modules to declare their namespaces, which would probably mean moving this out of Component.
?
Comment #8
subhojit777@mpdonadio See the interdiff in #6, may be that is the reason.
Comment #10
mpdonadioI have an idea for this that may make it a little less fragile and more maintainable in the long run, but also keep it secure. I am going to pick away at it during the day, and will post my progress tonight.
Comment #11
scor commentedIt's awesome to see work on this bug! For RDFa, we should allow the following attributes in the white list:
property,typeof,rel,revanddatatype. Those can include tokens of the form "prefix:term" and they would be stripped if not in the whitelist, which is a problem.Not sure if we would need to go as far as establishing a whitelist of prefixes as suggested in #7, I think filtering on a whitespace separated list of tokens "[alphanum]:[alphanum]" would be sufficient if we want to inspect the content of those attributes.
Comment #12
mpdonadioI like #11, and can move my idea in that direction.
Comment #13
subhojit777For the long run I think #11 will be good. As @mpdonadio has suggested in #7 there can be many like Facebook, OpenGraph, etc.
Comment #14
mpdonadio@scor, there are more attributes listed in http://www.w3.org/TR/xhtml-rdfa/ This list you provided are the only ones we need to worry about for the semicolons?
Comment #15
joelpittetAdded beta evaluation and there are tests on this so removing the tag. Only left out if this is major or not in the B.E, I'll let @scor decide that bit I think or someone who'd like to write something for that feel free.
Comment #16
joelpittetComment #17
joelpittetComment #18
joelpittetComment #19
mpdonadioWe have a proof of concept test that demonstrates this is a real problem. I wrote a patch at lunch that has way better testing. I want to do my best, though, to make sure I catch all edge cases from the get go, so I am going to wait until tonight to post it.
Comment #20
mpdonadioThink this is good for review. The interdiff is kinda useless, but I included it anyway.
The only open question is whether we want methods to update and/or set Xss::$safeAttributes and Xss::$rdfaAttributes. Easy to add, but didn't include in this patch.
Comment #21
joelpittetThis is looking pretty good IMO.
Maybe worth putting the regex pattern as a object property? or better yet factor out this value retrieval as it's own function because it's duplicated 3 times.
If someone needs that we can let them ask for the setters/getters.
Comment #23
mpdonadioNot in love with the docblock, method name, or the comment above the code, but here is that logic pulled into its own function. Suggestions appreciated.
Comment #24
droplet commentedWhy Why ? I don't understand. Danger strings in safe area, this is safe.
Comment #25
mpdonadioThe new assertions on this issue are just to demonstrate that the patch doesn't change the behavior for the non RDFa attributes. The test only patch shows this. The single failure is from what the issue addresses. The other new assertions just double check that the patch doesn't introduce additional problems.
Comment #27
subhojit777Looks good +1 RTBC
Comment #30
droplet commentedI still confused with them.
Can you tell the difference between these 2 test cases and show an attack way (proof-of-concept). From my own thoughts, if you assumed latter one is unsafe, I can't see how the first example is safe in the case.
Comment #31
subhojit777RE #30:
According to this issue Xss::attributes() should not strip valid RDF metadata atttribute. The first test case will allow
foaf:Imageattribute, while in the second casebar:bazshould be restricted.Comment #32
subhojit777oops..
Comment #33
mpdonadio#30, just to clarify.
In the code snippet you posed the first dataset tests the functionality in this patch, which is to not strip out RDFa prefixes from element attributes.
The second snippet I added is to ensure that this patch doesn't introduce a regression. In other words, that is how HEAD currently works.
The scope of this issue is to essentially just extend the attribute whitelist to allow RDFa attributes through.
Changing the behavior would be best discussed and done in a different issue, I think.
Comment #34
joelpittetDo you need to escape the result as well? May result in a double escape?
Should be using
UrlHelper::stripDangerousProtocols()directly no?Comment #35
mpdonadioNot sure about #34. That is really just shuffling some existing code around. Read in a bigger context:
This was the change to the attributes method. Since we needed a weird regex to match the RDFa attributes, we introduced a helper function per #21 in #23.
and then the helper is
So UrlHelper::filterBadProtocol() is what is currently used in HEAD and the patch retains this functionality. Essentially, it whitelists
[alnum]:[alnum]attributes and lets them through unscathed (the request in #11). Or are you saying that we should haveI think this will essentially be a no-op since nothing in [alnum] will be escapable.
?
Comment #36
joelpittet@mpdonadio I'm pretty sure about #34 but we could add a test to prove it get's double escaped if you want.
Adding some tangentially related issues around attribute values. To consolidate efforts a bit.
#2567743: Add protocol filtering to the core Attribute component
#2531824: Attribute class to check safe strings before escaping (has tests)
Comment #37
catchComment #38
mpdonadioDiscussed on IRC w/ @joelpittet. Will replace UrlHelper::filterBadProtocol() with UrlHelper::stripDangerousProtocols and add some additional assertions.
Comment #39
catch#2567743: Add protocol filtering to the core Attribute component has an ever-increasing whitelist.
We may want to consolidate that somewhere where both classes can use it, and make it extensible at least via Settings.
Comment #40
mpdonadioXss is in \Component. If we consolidate, won't we have to move to \Core and therefore have a BC break?
Comment #41
catchWe could put all the centralization in Component, although I don't think that can depend on Settings...
Either way if it's tricky we can fix both separately and have a follow-up to consolidate.
Comment #42
mpdonadioHere's just the function swap. XssTest still passes locally. Doing a full run to see if there are any side effects in other tests.
Wasn't sure if we need new assertions in XssTest, since we are unit testing here and Twig autoescape would need an integration test. So, do we need to add an integration test for this, or is there enough coverage elsewhere for it?
Re #41, and idea I had earlier on was to expose public static setter/getters for $safeAttributes and $rdfaAttributes. Would that suffice for sharing the attribute list with Attribute, and we could eventually expose that to Settings in a followup?
Comment #43
dawehner#2567743: Add protocol filtering to the core Attribute component seems to add even more ...
Comment #44
wim leersHow does this relate to #2567743: Add protocol filtering to the core Attribute component? It seems like they should be one issue?
Comment #45
dawehnerWell, its in different code paths, but they should at least share some common code around which attributes are allowed.
Comment #46
mpdonadioComment #47
dawehnerhttps://github.com/yesodweb/haskell-xss-sanitize/blob/master/Text/HTML/S... seems to provide quite a lit of attributes as whitelist.
Comment #48
mpdonadio#47, is Yesod using this for sanitization? It amost looks like the list of all valid HTML attributes per the HTML4 Strict DTD. My Haskell is really rusty, but I will try to trace this out later. I can't quite tell is this is just removing attributes not in that list, or whether they get sanitized.
Comment #50
mpdonadioReroll b/c #2609928: Xss::attributes() mangles valid attribute names containing numbers. Easy conflict.
Comment #51
gnugetA few lines above in this patch the short syntax is used but here it's using the old syntax.
Comment #52
subhojit777The function
providerTestAttributes()is written in the old syntax itself.Comment #53
subhojit777- Coding standards checked. The only concern is the patch does not uses short array syntax, however the rest of the file
core/tests/Drupal/Tests/Component/Utility/XssTest.phpuses long array syntax.- Tests passing.
Moving this to RTBC.
Comment #54
chx commentedI think the properties should use long array syntax.
But a bigger problem I have is with
preg_match('/^[[:alnum:]]+\:[[:alnum:]]+$/', $value). Can you tell me what :alnum: is in the first place? Is it locale aware? Unicode aware? Unicode aware if the tables are compiled? http://www.pcre.org/original/doc/html/pcresyntax.html says "alphanumeric" which tells me absolutely nada. Let's not use it. Let's be unambiguous and say what we have. Note how such is not used anywhere. Literally, you can search the entire codebase, Drupal, vendors, all, it's not there.Comment #55
subhojit777Comment #56
subhojit777I was unable to find
[:alnum:]inside core, but I found references and usage of[:alnum:]character class here https://www.hscripts.com/tutorials/regular-expression/character-classes/.... I am not sure whether it's unicode aware. I too was unable to find proper documentation for[:alnum:], in that case I agree we shouldn't use this.Comment #57
chx commentedCareful there, you left out the initial ^ anchor in the new patch. Also, if you are rerolling anyways, the : doesn't need an escape, it's not a preg special char. Otherwise, if that's what the RDFa standards prescribe, we are probably good to go.
Could you please file a followup to sprinkle this class with
finalmodifiers? Thanks.Comment #58
chx commentedComment #59
mpdonadioUsage in core aside,
[:alnum:]is documented at http://php.net/manual/en/regexp.reference.character-classes.php as a valid class: "letters and digits". I'll let others decide what Unicode / locale implications there are as that is not in my expertise and this behavior is rather poorly documented anyway in the PHP docs (eg, I see the same problem with using\wfor "word characters").Comment #60
chx commentedYes, I know that's what it is but my question was what "letters" are and this is not documented anywhere as far as I can see. And there are only ten \w in core/ and you are more than welcome to nuke them all.
Comment #61
subhojit777Comment #62
subhojit777@chx If I am not wrong the final modifiers are for
$safeAttributesand$rdfaAttributes.Comment #63
chx commentedI think this is good to go; let's discuss
finalin a followup.Comment #64
catchThis looks right to me. Leaving RTBC for additional review time.
Comment #65
catchReading back, there's still a mis-match here between the allowed attributes here and #2567743: Add protocol filtering to the core Attribute component. I think we should either adopt the longer list from here, or at least document why we haven't.
Comment #67
chx commentedComment #68
pcambraHere's a plain reroll of the patch in #61 so it applies to the latest 8.2.x.
There's another potential issue with the Xss filter and tokens that I've described in this message module issue https://github.com/Gizra/message/issues/85#issuecomment-242653200. When a token pattern such as [node:token] is present in a textarea/textfield, the "[node:'" is removed because I think the Xss filter considers it a protocol. Currently trying to add a test to demonstrate this behavior.
Comment #69
pcambraComment #71
pcambraSetting the right status as test pass.
Comment #72
catchDiscussed on a triage call with the other committers and we agreed this should stay major. I really think we need to discuss the allowed list/moving to a blacklist etc. in its own issue somewhere though then apply it everywhere.
Comment #73
catchSlightly re-titling, I think we should centralize the whitelist in this issue. Also removing the parent issue since that moved to 7.x long ago.
Comment #75
ao2 commentedHi,
can the
classattribute also be added to the list of safe attributes? In particular having colons in theclassattribute value should be allowed (my use case is basically the same as #372454: Filtered HTML filter modifies class attribute).Thanks,
Antonio
Comment #76
jofitzPatch no longer applies. Re-rolled.
Comment #77
jaypanHunk #2 of #69 failed for me, though the patch solved the problem I was having. So I've rescheduled a new test.
Comment #78
ao2 commentedCan someone comment about adding also the
classattribute to the list of safe attributes?I was asking about that in #75.
Thanks,
Antonio
Comment #80
eric_a commentedComment #82
jhodgdonThis is possibly related to #2944173: CKEditor mangles tokens in URLs, due to bug in Xss::attributes(), and/or the same bug? That has to do with using tokens in attributes.
I wrote a quick test-only patch to XssTest on that issue, which adds the following to the "Normalized" section of the test:
Hm.... I just tested, and the above patch in #76 does not fix the token bug. So, I guess they are separate bugs in the the same function.
Comment #83
jhodgdonHm. So, rereading this issue and @Les Lim's comment on my related issue, it does seem like they are due to the same problem, that things before : are stripped out. However, the patch here does not fix the token mangling problem.
The problem is that tokens have a format like
[piece1:piece2:piece3], and they could appear in any attribute. I don't know if things that look like that are a problem or not for XSS, but since they're a big and important part of Drupal, and Drupal's Editor module passes the HTML through XSS filtering before putting it into an editor (such as CKEditor), it's not OK to mangle things that look like tokens, just because they happen to be in something like an img src attribute, or an a href attribute.Comment #84
jhodgdonI think that this issue and #2105841: Xss filter() mangles image captions and title/alt/data attributes are quite possibly duplicates of each other?
I'll add a comment there as well...
Comment #85
jhodgdonSee also #2635632-20: Drupal XSS filtering cuts valid attributes, make it configurable / use a blacklist, where I have tried to summarize all the mangled attribute issues and ask whether we should solve them together or apart.
Comment #86
jhodgdonIn case we are going to mark this issue as a duplicate of #2635632: Drupal XSS filtering cuts valid attributes, make it configurable / use a blacklist, I just made a test-only patch on comment #22 there that incorporates (I think) the test cases here.
Comment #87
jhodgdonDisregard #84. That issue was fixed for 8.x back in 8.0.x and is only open for 7.x at this point.
However, there's still a question about #86 -- we might combine all the open "XSS filtering is mangling attributes" issues into one.
Comment #90
tedfordgif commentedFor those looking to mark the class attribute safe, here's a patch.
Comment #92
tedfordgif commentedComment #93
tedfordgif commentedComment #94
tedfordgif commentedIf you're whitelisting the class attribute, it probably makes sense to allow an expanded set of characters in the styles combo for the editor config. The attached patch works for my use case, but is clearly not the right approach for everyone.
Comment #95
markhalliwellThe real latest patch is #76, recent patches have ignored previous work.
This is an issue we've run into over at #3131224-8: CommonMark 3rd-party Extension: Footnotes.
HTML5 allows pretty much any character for classes and IDs, including colons:
https://mathiasbynens.be/notes/html5-id-class
This isn't just about "RDF" values anymore, but being more HTML5 compliant.
I'm all for stripping bad javascript when it's warranted, but blanket filtering anything that remotely resembles a "protocol" seems like overkill now.
Comment #96
markhalliwellComment #98
fabianx commentedFor those searching for workarounds:
https://www.drupal.org/project/drupal/issues/2944173#comment-13786061
TL;DR:
- Create a filter before Filtered HTML that converts e.g. ":" in "class" to "!"
- Run Filtered HTML
- Convert back
Need to ensure to implement that securely, but that is the first approach that could be made into a contrib module.
Comment #99
fabianx commentedComment #100
gaards commentedRe-rolled patch from #76
Comment #101
sadikyalcin commentedHaving an option to whitelist protocols would be great. We use deeplinks all over the net nowadays but Drupal doesn't let us use them!
What is insecure about the following?
<a href="myapp://home">click to launch app</a>Comment #103
marc.groth commentedFWIW it is possible to whitelist any anchor link protocol using 'filter_protocols' in services.yml
We were having a similar issue to https://drupal.stackexchange.com/questions/281894/problem-with-text-form... and adding the 'data' protocol fixed it (i.e. it was no longer stripped). YMMV
Comment #104
lolcode commentedThank you #103! I was trying to implement Tailwind CSS which uses colons in variant names and they were getting stripped despite being whitelisted in my CKEditor config. The Stackexchange solution worked.
Comment #105
yogeshmpawarRemoving Needs reroll tag as it is no longer needed.
Comment #107
chalk commentedI've described my case with Tailwind CSS here: #3252587: Extend the $skip_protocol_filtering list of attributes to use Tailwind CSS classes with prefix ":". Seems it's a bit related.
Comment #108
les limComment #109
amber himes matzI dug into whether #2944173: CKEditor mangles tokens in URLs, due to bug in Xss::attributes() could be closed as a duplicate of this one. (See comment #2944173-74: CKEditor mangles tokens in URLs, due to bug in Xss::attributes().) What I found was that they are both intent on modifying
Drupal\Component\Utility\XSS's filtering in order to prevent XSS class from filtering out/mangling data from 2 different kinds of contexts:hreforsrcattributes, e.g.<a href="[node:url:absolute]">link</a>in a body field that uses CKEditor and enables "Limit allowed HTML tags and correct faulty HTML" in the text format configuration.
rel="schema:author"orproperty="foaf:name"Right now, as per the scope described in each issue summary, they are not duplicates, but if the scope was widened to consider both use cases/contexts, they could be combined into 1 issue, ensuring a compatible solution. Considering that they're both dealing with XSS class, that might be desirable.
As it stands, the patches in each issue do not solve the other's problem:
We do NOT want to allow href or src to bypass filtering altogether I would think! Which is what would happen if we used the attribute safelist approach here and just added in href and src. In this issue, we just want it to allow tokens and not mangle them up. Assuming that tokens aren’t an XSS vector (which is an open question for discussion in #2944173-65: CKEditor mangles tokens in URLs, due to bug in Xss::attributes().
Could we combine these issues and their solutions? Maybe it would make sense to do so, to ensure a compatible solution that solves both problems (href, src token mangling, and RDFa Attribute handling in XSS class). We would need to increase the scope in the one that stays open. Is that worth doing? What do you think?
I'll be keeping tabs on discussions in both issues with the goal of deciding whether they should be combined into 1. Thank you!
Comment #111
andrea.cividini commentedTried applying the latest patch in 9.4.1 PHP 8.1 and it applies and works correctly, but I am still struggling with `class` attribute.
Using Tailwind CSS many classes have `:` char in them and even with the current patch, values are still getting stripped incorrectly.
This :
class="button flex items-center justify-center font-bold inline-flex cursor-pointer focus:outline-none transition duration-200 rounded-full group disabled:opacity-50 disabled:pointer-events-none text-s button--tertiary flex-rowbecomes:
class="pointer-events-none text-s button--tertiary flex-rowbut I'm not sure if I would add the `class` attribute to the
$safeAttributeslist for this purpose, what do you think?Comment #112
cilefen commented#3252587: Extend the $skip_protocol_filtering list of attributes to use Tailwind CSS classes with prefix ":" has been committed to 9.5.x.
Comment #113
andrea.cividini commented@cilefen Thanks for pointing this out! I wasn't able to find this issue while searching in the issue list, works flawlessly on my instance. Thanks again!
Comment #114
smustgrave commentedWith the addition of https://www.drupal.org/project/drupal/issues/3252587 I'm no longer finding this to be an issue. Tested with rel="schema:author" and confirmed nothing was stripped out.
+1 for closing out
Comment #115
cilefen commentedComment #116
catchThat's not an exact duplicate, rel was already in the allow list before that issue landed, and that issue added class. But the more general problem that other valid attributes get stripped and there's no way to control this remains.
Comment #117
catchThis came up in the bug smash channel, and discussion in there gave me a bit of an idea.
Currently we have two conflicting approaches:
With the core status quo, we maintain an allowlist of attributes that don't get protocol filtering, and add to that list when it turns out someone needs to add a protocol-like attribute value in there as we recently did with
class. The problem with this is that for each new use case, it's broken until you get a core patch committed - and I found a reference to adding 'class' at least five years ago...At various times there's been a suggestion to switch this to an exclude list, for example see discussion between @alexpott and @chx seven years ago #2635632-3: Drupal XSS filtering cuts valid attributes, make it configurable / use a blacklist. The problem with an exclude list only, is that if something is missing from the list, and HTML could add a new URL-taking attribute at any time, then you've got a core security issue.
What if we maintain both lists?
If something explicitly doesn't get protocol filtering, we don't filter protocols - same as now.
If something explicitly does get protocol filtering, we silently filter protocols - assuming they're from bad user data and should be stripped.
If an attribute isn't specified in either list, we noisily filter protocols - i.e. we check the value of the attribute before and after filtering, and if something protocol-like has been stripped, trigger a PHP notice with a link to documentation - this could suggest opening a core issue, adding to a configurable list if we make it configurable etc.
Comment #118
smustgrave commentedThis is a WIP patch just trying to get some feedback if I'm going in the right direction. I'm pulling from here, https://www.drupal.org/project/drupal/issues/2635632 and https://www.drupal.org/project/drupal/issues/2944173
Trying to address the comment #117
Still need to think about how to address
Pulled tests from all the tickets so tokens, rdf attributes, media, should be working.
Comment #121
catchI think this could just be merged into safeAttributes()
Why not using UrlHelper::filterBadProtocol() if there's protocol filtering, and nothing if there's not? I think it would at least be good to start with the safe/unsafe lists + logging for changed values in neither and then evaluate whether we need more changes after that.
Comment #122
smustgrave commentedWas trying to think how to cover tokens. [node:id] is getting sent through the protocol breaking the token.
Comment #123
catch#2944173: CKEditor mangles tokens in URLs, due to bug in Xss::attributes() is already open looking at tokens, and I don't think we should try to fix that here. It might need something like supporting an alternative token format since the colon is meaningful in too many places.
Comment #124
smustgrave commentedI'll revert those changes. And push a new patch in the morning.
Comment #125
smustgrave commentedRemoved the token code.
@catch following the original patch of this ticket I believe rdf tokens need to go through a preg_match call first. Could be wrong about that.
We are still using UrlHelper::filterBadProtocol() just it goes through an internal function first to determine if its an rdf attribute for like above.
Uploaded a new tests-only patch with several examples. Removed some of the examples from the original patch as those issues are being tracked in separate tickets.
Comment #128
smustgrave commentedThis patch contains no longer. Should clean up some of those errors.
Comment #129
smustgrave commentedThis one may be a little better
Comment #130
smustgrave commentedComment #131
smustgrave commentedTrying to keep to the front of the issue queue. Don’t want to get lost in the shuffle
Comment #132
quietone commentedAlways good to see an issue tagged Security making progress. Thanks!
The success patch in #125 has more failures that than fail patch, some appear to be random failures. However, the test being change here, Drupal\Tests\Component\Utility\XssTest, fails in both patches. I think we should have a new fail/success pair of patches.
I had to remove these lines in order to run the test locally. Running the test locally with PHPUnit 8.5.26 fails with
The issue summary has no proposed resolution or remaining tasks, tagging for that.
I have not reviewed the patch.
Comment #133
smustgrave commented@quietone updated IS.
#129 is the patch I got working trying to go off of #117
Looking at the referenced tickets.
Can we consolidate work from https://www.drupal.org/project/drupal/issues/2635632 into here?
Can we close this as a duplicate https://www.drupal.org/project/drupal/issues/2567743 ?
Comment #135
catch#2544110-65: XSS attribute filtering is inconsistent and strips valid attributes (six years ago!) asked to adopt the longer list of safe attributes from the patch in #2567743: Add protocol filtering to the core Attribute component. I still think we should do that. Also to allow that list to be used in Attribute, can we make this a constant instead?
I still wonder if we really need the separate list of RDFa attributes on top of safe and unsafe, but I think we should try that in a follow-up, removing it is going to result in subtle behaviour changes which might or might not be OK, so should be handled separately.
In #2544110-117: XSS attribute filtering is inconsistent and strips valid attributes I suggested triggering an error if an attribute doesn't appear in any lists, I still think we should do that but also in its own follow-up issue to try to avoid scope creep here.
Needs work - just for the longer list of allowed attributes.
Comment #136
smustgrave commentedPulled the attribute list from patch https://www.drupal.org/files/issues/2567743-58.patch
Can we close #2567743: Add protocol filtering to the core Attribute component for this one?
Also added class to the safeAttribute (thought it was already).
And update the skip_protocol check.
Are you asking it to be made a constant in the Attributes.php file?
If we removed rdfAttributes should they be added to the safe or unsafe list? Will most like require updating the tests.
Opened #3324862: Follow up on 14614509: Trigger error when attribute is not in list for followup to trigger warning
Since there are so many filter tickets open hoping when 1 lands it may loosen or cover others. but could be wishful thinking.
Comment #138
smustgrave commentedClosed #2635632: Drupal XSS filtering cuts valid attributes, make it configurable / use a blacklist as a duplicate as I believe the work here will cover that ticket too.
Comment #139
catchThat's specifically adding filtering support to the Attribute class, so not a duplicate, but it should hopefully be simplified by this issue.
No in the Filter class, like Filter::XSS_SAFE_ATTRIBUTES, just instead of protected static. But.. we could also do that in #2567743: Add protocol filtering to the core Attribute component if and when we need it to be accessible, it doesn't need to be here, so can skip adding the API surface.
The safe list.. but I don't know if that's actually 100% safe or not, so should be a follow-up too.
Comment #140
smustgrave commentedIf we moved them to safe then things like
typeof="foaf:bad////valueproperty="javascript:alert(0);"Would be allowed.
Comment #142
smustgrave commentedI do like the idea of consolidating but not sure how to handle scenario like those above.
Comment #143
catchApart from the test failure, #136 is looking good to me.
Comment #144
smustgrave commentedSo those failures in #136 along the same lines
Content is a safe variable
The value is
0;url=javascript:alert(\'XSS\');So since it's safe is that fine to leave as is?
Comment #145
catchYes I think we can change the test expectation for that one, content can contain any arbitrary text and won't get parsed as a URL or anything.
Comment #146
smustgrave commentedIn that case was able to combine rdfattributes.
Comment #148
smustgrave commentedClosed #3227831: Xss::filter() does not handle HTML tags inside attribute values as a duplicate.
Comment #149
lazysoundsystem commentedThe last patch is now a few lines off
Updated patch changes the line numbers and nothing else.
Comment #150
smustgrave commentedJust checked #146 and it applied fine to 10.1
Comment #151
smustgrave commentedRerolled
error: patch failed: core/lib/Drupal/Component/Utility/Xss.php:267
error: core/lib/Drupal/Component/Utility/Xss.php: patch does not apply
error: patch failed: core/tests/Drupal/Tests/Component/Utility/XssTest.php:525
error: core/tests/Drupal/Tests/Component/Utility/XssTest.php: patch does not apply
Comment #152
larowlannit - untouched => not sanitized ?
ubernit - touched => sanitized ?
Can we get a comment for this, I assume its something angular related
I think we can simplify this to
applies in all three instances of this new code
We can typehint here, both for arguments and return type
can we get a comment here - its a safe attribute, but we're stripping something if a regex matches?
could we document the regex a bit like we do in \Drupal\Component\Utility\Xss::filter where we have this
Just as a courtesy to our future selves
Whilst content is not executed, for a http-equiv="refresh" it is
I tested this in Chrome and it was blocked by the browser, but it was executed - so I think we may need to handle this differently
Same here, the browser evaluated it but blocked it in Firefox and Chrome
In my testing this was safe, browsers ignored it
shouldn't the ed- prefixed attributes be removed?
these look to be duplicated in the test cases?
Comment #153
smustgrave commentedWill work on this tomorrow. If I don’t post anything in 3 days feel free to remove my name and work on.
Comment #154
smustgrave commented#152
1 = Fixed
2 = Fixed
3 = Added comment
4 = Updated
5 = Fixed
6 = tried my best
7 = This was an existing test, only change I made was for javascript:alert(\'XSS\'); as content was marked a safe attribute
8 = This was an existing test, only change I made was for javascript:alert(\'XSS\'); as content was marked a safe attribute
9 = should it be removed?
10 = removed
11 = removed
Comment #155
smustgrave commentedComment #156
smustgrave commentedPer a discussion with @larowlan removing "content" as one of the safe attributes and opened #3352540: Investigate allowing "content" as allowed tag
Updated patch
Comment #158
larowlanWe don't have test coverage for all of these attributes - should we add it here? I realise there's some overlap with #2692451: Xss::filterAdmin() incorrectly filters datetime attribute
I think this was missed from my earlier review, can we get a comment regarding the significance of `ng-` - e.g. relates to Angular (I think?)
Comment #159
smustgrave commentedrel = is already included
rev = tests added
datetime = test added
mailto = test added
media = test added
sizes = test added
Removed some as I couldn't find examples
about = Removed
datatype = Removed
datatype_callback = Removed
#158.2 added to comment block in earlier patch.
Comment #160
catchOpened #3353173: Considering making the XSS safe/unsafe attribute lists extendable as an additional follow-up, took the RDF list out of the issue summary since that's no long in the patch.
Comment #161
heddnNeeds re-roll for #2692451: Xss::filterAdmin() incorrectly filters datetime attribute.
Comment #162
heddnComment #163
larowlanGetting close, couple of questions and looks like at least one test case has the wrong label
I don't see a rev attribute here
ah here is is, I think these titles are out of order
are sizes and media valid attributes for an a tag? should we use img here or picture or something that reflects supported attributes?
should we add a test-case with a : or did the related issue already resolve that? if so do we need this test case at all anymore?
I don't understand why one of these is stripped but the other isn't?
Comment #165
donquixote commentedNot only attribute values are damaged, also attribute names.
Attribute
aaa:bbb="X"becomesbbb="X".Attribute
aaa_bbb="X"becomesbbb="X".E.g. try
Xss::filter('<div aaa:bbb="AAA-BBB" bbb="BBB"> </div>', Xss::getAdminTagList()).This leads to a duplicate attribute `bbb=`.
You can argue that these are silly or invalid attribute names.
If this is the concern, the correct behavior would be to remove the attribute, _not_ to change the attribute name.
Interesting discussion: "What characters are allowed in an HTML attribute name?", https://stackoverflow.com/a/53563849/246724
Some things to consider.
In fact this is the case for me in a current project.
The culprit is definitely the attribute splitter function.
But to fix it, we need to define the desired behavior.
Comment #166
donquixote commentedReview:
These two variables always occur in this combination.
They can be combined into a single variable.
We also see that, with their default values,
static::$unSafeAttributeshas no effect whatsoever.It would only have an effect if somebody adds a "data-" or "ng-" attribute to the unsafe list.
I wonder, do we support people modifying these class properties? Or could we use constants instead?
Comment #167
donquixote commentedComment #168
donquixote commentedIf we support them being modified, I would rather turn all the static methods to object methods, so that you can have different instances with different configuration.
For static methods, I would expect them to behave the same at all times.
Comment #169
donquixote commentedAlso,
We now have two places where we compare the attribute name to a list:
One time in ::attributes() with case 0, another time in ::filterProtocol().
This can be collapsed to just one place.
Instead of passing the name to ::filterProtocol(), we can pass a filter mode.
But maybe the name parameter in ::filterProtocol() is meant for subclasses that want custom logic to determine the filter mode?
Do we care about inheritance?
Comment #173
neclimduldespite several requests for comment, its still not clear why we need a special case for "ng-"
If we do need to special case it, hard coding it like this in its current location has a smell and suggests to me we haven't succeeded in making this "consistent" if such a special case for a framework is required.
Comment #174
avpadernoComment #175
tommyk commentedWhile I haven't read every reply, I didn't see any results when I searched for ARIA in this issue, so adding my use case.
I got here because I'm trying to manipulate Views output to create accessible Read More links like described at https://www.w3.org/WAI/WCAG21/Techniques/aria/ARIA8 wherein I am adding an aria-label attribute with a token like
<a href="/node/{{ nid }}" aria-label="Read more about {{ title }}">Read More</a>where the title in this case contains a colon. Everything in the ARIA Label up to and including the colon in the title is stripped.Comment #176
quietone commentedHide files
Comment #177
joao.ramos.costa commentedHI @ tommyk,
indeed. Got here by the same reason. The aria-* attributes may be 'whitelisted' as for the same reason 'data-' attributes are and perhaps a little more than 'ng-'.
Comment #178
onnia commentedHi,
I could not find 10.4.x compatible patch for commit 3cf003db
so I made my own, I hope someone else finds this useful.
Comment #181
prudloff commentedI don't think we should hardcode a special condition for Angular attributes (or other use cases), a better solution would be to provide a way to extend the list of skipped attributes.