Problem/Motivation
It is pretty common to want to use a token for a URL when editing text. For instance, you might want something like:
<a href="[my:token:here]">link text here</a>
However, if you try to do this in a body field that is using CKEditor, with a text format that uses the "Limit allowed HTML tags and correct faulty HTML" filter, if you enter a token like that in the field, and then later edit the content that contains that field and either look at the HTML source or save the content, the token is corrupted. This seems to be independent of which CKEditor buttons you have configured, so it isn't a problem (apparently) with a particular button plugin.
Additionally, some (but not all) other attributes in HTML get corrupted in the same way, if your text format allows these attributes. Some examples that were tested:
| Tag | Attribute | Corrupted? |
|---|---|---|
| p | class | Corrupted |
| img | src | Corrupted |
| img | alt | Not corrupted |
| a | href | Corrupted |
Tokens in the text that are outside of HTML attributes do not get corrupted.
Steps to reproduce:
a) In a content item with a field that has a text format that is configured to use CKEditor for editing, and which contains the "Limit allowed HTML tags and correct faulty HTML" filter, and allows the A tag, type some link text in the editor.
b) Highlight to select the link text.
c) Click the Link button (chains) in the editor toolbar, and enter [my:token:here] as the URL in the popup.
d) Click Save in the popup. Verify that the HTML source looks like
<a href="[my:token:here]">link text here</a>
e) Save the content item you are editing.
f) Test -- the link works fine (assuming you are running a token replace so it gets replaced by the right URL).
g) Edit the content item again.
h) When you get back to the editor, look at the HTML source. Instead of seeing what was there before, you will see something like this:
<a href="en:here]">link text here</a>
So that's the bug: if you re-edit some HTML text using CKEditor and the "Limit allowed HTML tags and correct faulty HTML" filter, and there is an A tag with a token in the URL (or tokens in various other attributes, but not all attributes), CKEditor truncates and mangles the token, leading to data loss. According to Priority Levels of Issues, this means it is a Critical bug (or at least Major?) because it leads to data loss.
Note: We are specifically seeing this in the proposed Help Topics module (see related issue #2943974: Work-around for route tokens in Help text format getting truncated after editing a help topic).
Proposed resolution
Fix Drupal so that CKEditor doesn't mangle tokens in URLs in A tags.
The problem was traced down to a bug in \Drupal\Component\Utility\Xss::attributes(). If you pass in an attribute string like href="[something:something:config_basic]" to this function, you get out something that looks like href="config_basic]".
Remaining tasks
Fix the bug in Xss::attributes().
User interface changes
CKEditor will not mangle HTML containing tokens for URLs in A tags, or other HTML tag attributes.
API changes
None.
Data model changes
None.
| Comment | File | Size | Author |
|---|---|---|---|
| #69 | 2944173-typofix-69.patch | 3.35 KB | amber himes matz |
| #63 | 2944173-63.patch | 3.35 KB | danflanagan8 |
| #63 | 2944173-63-FAIL.patch | 2.29 KB | danflanagan8 |
| #60 | xss_attrs-2944173-60.patch | 984 bytes | liquidcms |
| #55 | 2944173-xss-attributes-55.patch | 967 bytes | kerasai |
Issue fork drupal-2944173
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
jhodgdonOh, you can see our CKEditor configuration here
https://cgit.drupalcode.org/sandbox-jhodgdon-2369943/tree/config/optiona...
It seems we are using the DrupalLink and DrupalUnLink plugins, so most likely the problem is either there or in the core CKEditor itself?
Comment #3
jhodgdonActually, this bug causes data loss/corruption with no warning to the user and no work-around. Merely having the CKEditor editor enabled on a field that contains a URL token, and updating a content item that contains that field, will cause that token to be truncated.
That means this is a critical bug, according to https://www.drupal.org/node/45111
So... updating priority and updating issue summary to add template.
Comment #4
andypostLooks that's exactly ckeditor bug, probably this regexp in action https://github.com/ckeditor/ckeditor-dev/blob/major/core/htmldataprocess...
Comment #5
webchickIs there an upstream bug report?
Comment #6
jhodgdonNot yet. Have we definitely confirmed that it is a bug in ckeditor code, independent of Drupal?
Another way to look at this: Is using strings similar to Drupal tokens in URLs something that ckeditor should support and/or not mangle, or should Drupal's link plugin possibly override the mangling that ckeditor is doing? I mean, ckeditor mangles (or, to use more neutral language: edits, fixes, and otherwise modifies) various things in HTML code that it receives. Maybe it has a reason for not leaving href attributes of A links alone if they look like Drupal tokens?
Comment #7
jhodgdonI'll just also note that as I mentioned in comment #2, Drupal's CKEditor module provides Link and Unlink plugins, so we are not using the raw ckeditor project code in this case. And Drupal has various other modifications and additions to the base ckeditor project code... So without debugging, it would not necessarily be clear that this bug comes from the ckeditor project and not the Drupal CKEditor module.
That's in addition to the question of whether, even if it does come from something in the ckeditor project code, whether it is a bug for purposes of the ckeditor project, or whether it's just a bug in the context of Drupal, which might want to put strings that look like tokens into URLs.
Comment #8
webchickOh, I might've misread #4 as attributing it to a bug in the upstream regex.
Comment #9
jhodgdonHe said "probably" it was this regexp in action in #4.
I'm not sure I understand how that regexp would mangle token-like strings, anyway? Or again, whether that's a bug or a feature from core ckeditor perspective?
Comment #10
jhodgdonI tested this some more today, and have some further information.
First off, the text format / editor config I am using has the Drupal "Limit allowed HTML tags and correct faulty HTML" filter on it. So, in the settings for that filter, I enabled more attributes on various allowed tags, used the Source button to put attributes on those tags in formats that looked like tokens, and found that CKEditor (and/or whatever Drupal is doing) is mangling some attributes but not all. Here is a list of ones I tested:
- p with "class" attribute ==> mangled
- img src ==> mangled
- img alt ==> not mangled
- a href ===> mangled
So, I turned off the "Limit allowed and correct faulty HTML" filter in my text format config, so there are no filters but I'm using CKEditor. Now, my attributes are not mangled. So... probably this is not a core ckeditor project bug.
So ... Let's see, what happens if I leave that filter in but turn off CKEditor? Answer: no mangling of attributes in the textarea.
I also tried removing various buttons from my CKEditor config. Actually, I removed all of them except the Source button. My attributes were still mangled.
So. It looks like the problem is limited to happening when I am using a text format and editor config that:
- Includes the Drupal "Limit allowed HTML tags and correct faulty HTML" filter
- Uses CKEditor
- It doesn't seem to matter which buttons I have enabled.
Updating the summary... Anyway, I think the problem is in what Drupa's CKEditor module does with that "Limit allowed HTML tags and correct faulty HTML" text filter, not with CKEditor core code itself...
Comment #11
jhodgdonOK, I have narrowed down the problem: The mangling is happening in function editor_filter_xss() in the editor.module file.
Right at the end, where it does:
Just before that call, the $html value is fine, but the return value is mangled.
To make a long debugging story shorter... The problem is actually in the function
Drupal\Component\Utility\Xss::attributes()
If you pass in something like
href="[something:something:config_basic]"to this function, you get out something that looks likehref="config_basic]"So, I am moving this to component base system, because the problem is in the XSS utility class. Ooops.
Comment #12
jhodgdonComment #13
les limI think this is the same bug as #2544110: XSS attribute filtering is inconsistent and strips valid attributes - the attributes method assumes ":" characters are an attempt to use a dangerous schema such as "javascript:", and sanitizes the string by stripping everything before the colon.
Comment #14
jhodgdonHere is a patch to the XssTest that illustrates the problem. This is just adding a data set to be tested, with elements: input, expected output, message for the test assertion, list of tags that are allowed.
If I change the test to look like this:
then it passes -- in other words, the
src="[something:like:a:token]"attribute is mangled to look like src="token]". Here, I'll attach that test too so you can see that it passes. :)Now, we only need to fix the actual bug, and make the (valid) test pass.
Comment #15
jhodgdonThis is possibly the same bug as this related issue (thanks @les_lim for pointing out)... I will add a comment there.
Comment #16
jhodgdonI just tested the patch from the other issue and it doesn't fix the token problem from this issue, so I think they are related but different issues.
Comment #17
jhodgdonAnd by the way, the test should probably be expanded. That attributes() function on the Xss class has all kinds of special cases for various attributes, so we should probably test more than just the one that I put in the test.
Comment #19
les limIs it possible to expand tokens *before* the attribute sanitization takes place, I wonder?
Comment #20
jhodgdonNot in the case described in this issue. The token mangling is happening in the HTML that is in a textarea field on a content editing screen, which is being prepared for editing with CKEditor. You need to put the token into the HTML you are editing, and you don't want the token replace to happen until the content is being rendered.
Comment #21
jhodgdonThe problem is that the Xss class thinks : is a dangerous character, and the token system thinks : is a good separator.
Comment #22
amber himes matzMight this API page for function locale_string_is_safe provide any insight or help for this issue? It sounds very similar to the problem described here.
Note the comment header for the function locale_string_is_safe:
Comment #23
jhodgdonYes, that's very interesting! At the end of that comment, it references this issue:
(e) #2372127: Prevent Xss::filter from stripping a token at the beginning of an html attribute
That was closed by @catch as a duplicate of
(a) #2635632: Drupal XSS filtering cuts valid attributes, make it configurable / use a blacklist
And that issue also has several open related issues on it:
(b) #2105841: Xss filter() mangles image captions and title/alt/data attributes
(c) #2552837: XSS::filter and filter_xss can create malformed attributes when you would expect them to be stripped
(d) #2544110: XSS attribute filtering is inconsistent and strips valid attributes
[letters added to each issue for reference purposes, sorry about (e) being at the top, I didn't think I needed it until later in my comment]
So let's see, maybe our issue is a duplicate of one or more of those?
OK, (d) is the one we've been discussing that was brought up by @Les Lim. It's definitely related, but not an exact match, and its patch doesn't fix this issue. It looks to me as though (d) and (b) are quite probably duplicates of each other.
(c) looks like a completely different issue, but related because it's also about attributes.
(a) doesn't look like a duplicate of this issue either, except that (e) definitely was, and @catch marked (e) as a duplicate of (a).
So... I'll add a comment to (b) and (d) about them probably being duplicates of each other, and a comment to (a) about whether it's going to fix this issue or not. And add some more related issues here.
Comment #24
jhodgdonSee also #2635632-20: Drupal XSS filtering cuts valid attributes, make it configurable / use a blacklist, where I have tried to summarize all the mangled attribute issues and ask whether we should solve them together or apart.
Comment #25
catchBased on the last time I looked at this, my preference would be to try to come up with an extensible (or at least configurable) solution in #2635632: Drupal XSS filtering cuts valid attributes, make it configurable / use a blacklist and mark all the other issues as duplicate - with maybe a follow-up to improve the default list that core uses.
Comment #26
jhodgdonIf that issue summary gets updated to say it's also about preserving tokens in any attribute, and that issue's priority is appropriate, that is fine with me.
Currently, that issue has no summary, so it is difficult to tell what the approach is or what "configurable/blacklist" would mean. Also, it is only Major, and I think this issue is Critical (see issue summary -- leads to data loss if you are using tokens in many HTML tag attributes, and CKEditor).
But sure, if that other issue will resolve this problem (we could take the test patch here and add it there, probably with some more examples in other attributes?), then I'm all for combining.
Comment #27
jhodgdonI took the one test here, expanded it a bit, and made a test-only patch on #2635632-21: Drupal XSS filtering cuts valid attributes, make it configurable / use a blacklist (which I then expanded further in the next comment to incorporate tests from two other related issues).
If that test-only patch is adopted, and the issue summary there is updated, I'm definitely comfortable closing this as a duplicate... I think the approach of an attribute blacklist/whitelist, even if configurable, is not going to be the right approach though. Tokens can be anywhere, and hopefully they aren't an XSS vector. If they are, we should rethink how they are formatted rather than trying to enumerate all the places where it is safe to use them and where it isn't.
Comment #28
catchIf we're going to look at changing token formatting, that should stay its own issue I think. I'd like to have one issue for the Xss::attributes() bug itself though.
Comment #29
jhodgdonWell, the question is whether the current token formatting using : is an XSS attack vector. If we're somehow going to fix Xss::attributes() so that it will let tokens through, no matter where they are, and it will still protect from XSS attacks, then we don't need to add an alternate way to format tokens. Right?
Comment #30
jhodgdonComment #32
ayalon commentedIs there a solution for this problem? There is so much text an links but I have exactly this bug and I'm looking for a patch.
Comment #33
jhodgdonThere is not a viable solution yet.
Comment #35
bramdriesenSo any movement on those sub issues? =) bumped into the same issue today, possibly related to this one: #2903336: Allow tokens for url of the Link field
Comment #36
jhodgdonNothing has changed since the last update, as far as I know.
Comment #38
tom.moffett commentedI ran into this same problem last night, and I did some debugging to come across the same problem you did. I guess my Google skills aren't as strong as I thought. I tried to see if there was an existing issue before going into the weeds of Drupal core debugging. What brought me here is that I thought I had a workaround, but it isn't satisfactory.
Same thing is happening to me when trying to use a token for the
classattribute value on anatag when I use a text editor type that has "Limit allowed HTML tags and correct faulty HTML" filter enabled (for example,<a class="[somegroup:sometoken]" ...></a>). I am using the https://www.drupal.org/project/token_filter module. I was able to get it to at least render in the site by putting the token replacement first in the text editor's "Enabled filters", but when I load a page for editing, the token gets mangled when the content is loaded in CKEditor. Doing a view-source in the browser on the edit page shows that thedata-editor-value-originalattribute on the editortextareais correct with the proper token, but the actual content of thetextareaisn't. So, it does, indeed, look like this is getting mangled before getting into CKEditor on the client-side JS. I debugged the rendering on the site part, but I haven't debugged this part. My educated guess is that the XSS utility is still mangling the content that ends up getting rendered in the CKEditor when trying to edit a page (and after re-reading the complete issue here, that definitely was mentioned and is part of this issue).Comment #39
tom.moffett commentedFor my specific case, I was able to come up with a complete workaround now. It involved creating a custom filter for the text editor, using the solution to this issue, https://www.drupal.org/node/2943974, as my inspiration. Basically, use
|to separate the token instead of:since that doesn't get stripped by the XSS utility. The custom filter comes in so that you can transform the token to the correct format it needs to be in to be recognized for replacement.I have my custom filter and then the token_filter module's filter as the first two items in the "Filter processing order". I haven't tested it, but this might still work if your processing order is different, but obviously, the custom filter would at least need to come some time before the token filter. However, if you don't want to process tokens until after the "Limit allowed HTML..." filter, then both the custom filter and the token filter should be after that one.

Sorry, but I don't have time right now to share a clean version of the full feature, but hoping that the idea could point people in the right direction. To help you along, here is the
processfunction from my custom filter:I also used the filter class from token_filter module as a guide to set up my custom filter since I was new to writing custom filters prior to this, but it was pretty easy. For reference, https://git.drupalcode.org/project/token_filter/blob/8.x-1.x/src/Plugin/.... You only really need the "process" function for your simple custom filter.
It is kind of annoying to have to do this and have this structure for tokens all over your content wherever you use them (could make maintenance annoying later on), but this issue hasn't been fixed in 2 years and I really wanted to be able to use tokens right now.
Using this approach allows the content to render properly on the site, but it also keeps the CKEditor content correct when you go to edit a page. In my previous comment, I explained how I was able to get the first part to work, but this custom filter completes the workaround and allows happy editing and usage of tokens!
Comment #40
firewaller commented+1
Comment #41
dmitry.korhovIn addition to #39 a little more completed workaround example:
Place FilterTokenAttributes.php into my_module/src/Plugin/Filter folder with content:
Comment #43
fabianx commentedOhhh - that is a very creative solution! :)
You can even use the same token format and just add two filters: One before and one after the filtered HTML filter:
- Convert tokens to use ! instead of :
- Filter HTML (like normal)
- Convert tokens to use : instead of !
That is an effective core work-around and as long as the token regexp is safe, also a safe approach.
This can then be put into a contrib module: token_xss
This can _also_ help with the colons in class problem.
Comment #44
tidoWe managed to correct this problem by testing directly the presence of brackets at the beginning and at the end of the chain to be tested in the Xss::attributes() function in order to bypass the call to UrlHelper::filterBadProtocol().
Tokens are now correctly rendered and are no longer mangled when editing the node.
Comment #45
tidoHere is the fixed patch (tested with Drupal 8.9.x).
Comment #46
rolfmeijer commented@tido The patch applies on 8.9.7 and basically it works. But there are issues when a token in a href is extended. For instance if you have a custom user orders-page at /user/[uid]/orders than href="[current-user:url:path]/orders" won’t work. It will strip of anything until the last colon, i.e. href="path]/orders". I’m not sure if this is a proper use case and has to be considered.
Comment #48
aerzas commentedRerolling patch against Drupal 9.2.x.
Comment #49
aerzas commentedAvoid notice on empty string.
Comment #50
markusd1984 commented@tom.moffett #39 and @dmitry.korhov #41, would that be possible via the custom filter module?
It has a field for the Pattern & Replacement text https://www.drupal.org/node/210553
Couldn't figure out if i just need to put into Pattern
and inside Replacement text "|" or would I need to utilise PHP Code perhaps to adopt/paste in yours?
Thanks for any help you can give, great solution I'd like to try in drupal 7.
Comment #51
scottcollierPatch against Drupal 8.9.x.
I think we only care that it starts with a token since we want to allow it to be a valid protocol.
So, all of the following should work:
Comment #53
jhodgdonThis should also be considered:
<a href="/foo/bar/[my:token:here]">link text here</a>Comment #54
firewaller commentedRe-rolled #51 against Drupal 9.2.x
Comment #55
kerasai commented#51 requires the token to be the only thing present in the first "part" (split by
/) of the URL. This doesn't work for my case, a URL such as<a href="[site:url]user/[commerce_subscription:uid:entity:uid]/orders">billing</a>.Here is an adjustment to allow any URL that begins with a token to validate, patched against
8.9.xand I expect it to apply to9.xbranches.Note, looking at the regex and the logic, this should allow the case noted in #53.
Comment #56
daggerhart commented#55 works for me. The token is maintained through multiple edits and loading of the content in the editor.
Regarding #53, My url has multiple tokens in it, and they all work as expected.
According to the issue description, fixing this replacement is the only task. I believe this is RTBC, but since I haven't seen anyone change the status of this ticket yet I'm only moving to Needs Review.
Comment #58
danflanagan8Definitely can't RTBC this one without adding test coverage. Looks like #14 has a good start to the tests. There are some other cases that have been brought up in the comments since then though.
Comment #59
liquidcms commentedNone of these patches apply to 9.1.11.
my url part of a link in editor is simply a single url token.. and on save it is replaced with "url]".
i was sure on an older project which was using 8.8.8, this worked correctly.
Comment #60
liquidcms commentedhmm, i think #55 would work but i think the patch has a typo, the last line is this:
$thisval = $skip_protocol_filtering ? $match[1] : UrlHelper::filterBadProtocol($match[1]);but pretty sure $thisval is wrong, and should just be $value. Once i change it to that, the token is not stripped and things work as expected.
corrected patch attached.
Comment #61
kerasai commentedLooks like #60 is the difference between 8.9.x and 9.x branches.
https://git.drupalcode.org/project/drupal/-/blob/8.9.x/core/lib/Drupal/C...
https://git.drupalcode.org/project/drupal/-/blob/9.3.x/core/lib/Drupal/C...
Comment #62
liquidcms commentedAh, I see, they changed the variable name.. which is good, because why would anyone ever name a variable $thisval? lol.
Comment #63
danflanagan8Here's a fail test in the spirit of #14, but with many more cases gathered from other comments in the issue.
The (partial) fix is identical to the fix from #55 except that it can handle tokens in
srcin addition tohref. I call it a partial fix because one of the test cases still fails.NOTE: The case that is still failing is
<img src="foo[my:token:here]/[another:token]">, which is getting changed to<img src="here]/[another:token]">. The output of the test is stripped of various elements. If you run the test locally you can see the true output.Comment #65
danflanagan8I'm starting to wonder if there's any way to allow tokens in src or href securely.
Before that though, note there's a typo in my patch in #63.
This should be href, not html.
Back to security. Certainly the current approach (#55 or #63 with typo fixed) is concerning. If I enable the contrib
token_filtermodule and add it to the standard Basic HTML format with the optionReplace empty valuesselected, I can very easily do some XSS.I just enter this into the CKEditor:
<a href="[fake:token]javascript:alert('gotcha!');">XSS Link</a>That comes out the other side as
<a href="javascript:alert('gotcha!');">XSS Link</a>And it could never be as simple as not allowing the string
javascript:because I could do this:<a href="[fake:token]java[fake:token]script:alert('gotcha!');">XSS Link</a>Any thoughts anyone?
Comment #67
stephencamilo commentedComment #68
danflanagan8Restoring previous status
Comment #69
amber himes matzHere's a new patch that just fixes the typo in the patch from #63.
We still need discussion on the questions raised in #65.
Comment #72
larowlanMarked #2851825: UrlHelper::filterBadProtocol() destructively truncates tokens in ckeditor link href as a duplicate
Comment #73
catchThis in turn is probably a duplicate of #2544110: XSS attribute filtering is inconsistent and strips valid attributes but I think we need to do a proper consolidation if so because there are competing approaches between the two issues.
Comment #74
amber himes matzI dug into this issue and #2544110: XSS attribute filtering is inconsistent and strips valid attributes with the intention of verifying that they were duplicates. What I found was that they are both intent on modifying
Drupal\Component\Utility\XSS's filtering in order to prevent XSS class from filtering out/mangling data from 2 different kinds of contexts:hreforsrcattributes, e.g.<a href="[node:url:absolute]">link</a>in a body field that uses CKEditor and enables "Limit allowed HTML tags and correct faulty HTML" in the text format configuration.
rel="schema:author"orproperty="foaf:name"Right now, as per the scope described in each issue summary, they are not duplicates, but if the scope was widened to consider both use cases/contexts, they could be combined into 1 issue, ensuring a compatible solution. Considering that they're both dealing with XSS class, that might be desirable.
As it stands, the patches in each issue do not solve the other's problem:
We do NOT want to allow href or src to bypass filtering altogether I would think! Which is what would happen if we used the attribute safelist approach in #2544110: XSS attribute filtering is inconsistent and strips valid attributes and just added in href and src. In this issue, we just want it to allow tokens and not mangle them up. Assuming that tokens aren’t an XSS vector (which is an open question for discussion in #65.
Could we combine these issues and their solutions? Maybe it would make sense to do so, to ensure a compatible solution that solves both problems (href, src token mangling, and RDFa Attribute handling in XSS class). We would need to increase the scope in the one that stays open. Is that worth doing? What do you think?
Final note, if this issue remained open, it would need an issue summary update, because the problem is still there (token mangling) but different in Drupal 9.4.x (the token is mangled on save, not after a re-edit only). This was mentioned in #59. But I am waiting to see what others think about combining the issues before doing so.
Comment #76
amber himes matzLooks like #2544110: XSS attribute filtering is inconsistent and strips valid attributes has been closed as a duplicate of a fixed issue. I've tested this issue again in 9.5.x and it's still a problem, although there are some minor differences in the results described in the issue summary.
I'm doing a bit more testing and plan to update the issue summary with some updated details and also some steps to test using Token Filter so that you can actually test with working tokens.
Comment #79
jibranRE #65: It is a very well-made point so thank you for that.
I feel like it shouldn't be
\Drupal\Component\Utility\Xss::attributes()responsibility. If it can happen after token replacement usingtoken_filterthen I think it should be dealt with intoken_filter. Once the token is replaced usingtoken_filterthensrcandhrefshould be passed to\Drupal\Component\Utility\UrlHelper::filterBadProtocol().Comment #80
mandclu commentedI ran into this issue today, and had to implement a messy workaround. It would great if this could get more attention.
Comment #81
richardcapricorn commentedHi all,
I used the 2 patch files provided by @danflanagan8. and turned it into a merge request.
I added the href property alongside the src and html property.
Any feedback is welcome
Best regards,
Richard Hoogstad
Comment #84
drewcking commentedThe patch of MR10819 works for me, thank you @richard_hoogstad.
Comment #86
acbramley commentedTriaged as part of BSI.
Rebased the MR which was green but well behind HEAD.
Still NW based on an IS update requested in #76 as well as questioning the direction of the solution in #65