8.x Issue: #2105841: Xss filter() mangles image captions and title/alt/data attributes
If alternative text I add to an image has a colon, then the colon and any text preceding it are lost on exiting source code mode or upon saving.
This occurs whether Advanced content filter is set to Enabled or Disabled.
This occurs whether or not a content type uses Workbench Moderation.
Steps:
- Create or edit a node.
- If the page has an image, click on it and click the Image icon in the CKEditor UI. If the page doesn't have an image, click the Image icon in the CKEditor UI and insert one.
- In the Alternative Text field, type
beginning text : colon , comma . period " quote ; semicolon ' single quote
- Click OK
- If editor has the power to Switch to plain text editor
- Click Switch to plain text editor
- Locate the image code
- Actual and expected results: image tag alt attribute is as follows
alt="beginning text : colon , comma . period " quote ; semicolon ' single quote"
- Optional additional steps:
- Switch back to WYSIWYG
- Switch back to plain-text editor
- Find the img element
- Actual result: image tag alt attribute is as follows
alt=" colon , comma . period " quote ; semicolon ' single quote"
Expected result: image tag alt attribute is as follows
alt="beginning text : colon , comma . period " quote ; semicolon ' single quote"
- Save
- View page source
- Actual result: image tag alt attribute is as follows
alt=" colon , comma . period " quote ; semicolon ' single quote"
Expected result: image tag alt attribute is as follows
alt="beginning text : colon , comma . period " quote ; semicolon ' single quote"
I also confirmed that this issue happens in a simplytest.me sandbox with Drupal core 7.54 and CKEditor - WYSIWYG HTML editor 7.x-1.17 installed.
I attempted to test this on the CKEditor nightly build demo page to see whether I needed to report this issue to CKEditor's issue queue or to the Drupal CKEditor module's issue queue, but the CKEditor demo page does not allow saving. If I insert the image, add the alt text, click the Source button to see the source, then click Source again to return to the WYSIWYG, then again to return to source, I still see the expected
alt="beginning text : colon , comma . period " quote ; semicolon ' single quote"
CKEditor issue queue search on "colon" revealed no related issues.
Comment | File | Size | Author |
---|---|---|---|
#28 | filter_removes_attribute_values-205986-28.patch | 3.1 KB | gbirch |
#26 | 2859006-26.patch | 6.97 KB | joegraduate |
#25 | 2859006-25.patch | 7.02 KB | joegraduate |
#25 | interdiff_2859006_19-25.txt | 1.25 KB | joegraduate |
#24 | core-attribute_filtering-2859006-24.patch | 2.6 KB | jenlampton |
Issue fork drupal-2859006
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
Charles BelovComment #3
Charles BelovExpanded possible causes and removed current build of pure CKEditor as a possible culprit.
Comment #4
Charles BelovAdded results of simplytest.me test.
Comment #5
Charles BelovMinor text rearrangement/edit to avoid ambiguity
Comment #6
Charles BelovMore text adjustements
Comment #7
Charles BelovSubsequent colons may lose text or not.
Block 1: "1/3 of SF residents enjoy biking. Just 15% do not.” Block 2: “51% residents can bike, but won’t in San Francisco.” Block 3: “7 in 10 people cite safety concerns as a major impact on their decision to bike."
came out as
"1/3 of SF residents enjoy biking. Just 15% do not.” Block 2: “51% residents can bike, but won’t in San Francisco.” Block 3: “7 in 10 people cite safety concerns as a major impact on their decision to bike."
(first colon and everything before it is lost)
while:
before colon: "A Glance at the SFMTA’s Bike Program for 2017 - 2021." :after colon
gave
after colon
(last colon and everything before it is lost)
Comment #8
Charles BelovComment #9
szt CreditAttribution: szt commentedSame problem.
I'm trying to put a token to the alt field: "[node:title]"
After saving the node remains just the end of it: "title]"
Comment #10
gifad CreditAttribution: gifad commentedI don't think that this is CKEditor related; see the related core issue.
Patch at #97 solved the problem for me…
Edit: Note that this patch is an explicit requirement for use of the Enitity Embed module !
Comment #11
Charles BelovClosing as duplicate of #2105841.
Comment #12
Charles BelovChanging project to Drupal Core
Comment #13
Charles BelovCorrecting version to 7.x
Comment #14
geek-merlinSee patch in #2105841-111: Xss filter() mangles image captions and title/alt/data attributes
Comment #15
joegraduateThe attached patch is a re-roll of the last D7 patch from #2105841: Xss filter() mangles image captions and title/alt/data attributes that applies cleanly against the latest 7.x branch.
Comment #16
ron_s CreditAttribution: ron_s commentedHad been using the old patch in #2105841 and this new one in #15 to meet the requirement for the Entity Embed module. Works as expected.
Comment #17
mcdruidThe D8 parent is rather long.
For the purposes of reviewing this backport, here are a couple of highlights from it:
* #64 has a summary of the problem and the approach to fixing it by @pfrenssen.
* #20 to #32 suggest, discuss and review the idea of whitelisting certain attributes as safe / not requiring filtering.
* #107 @David_Rothstein reconsiders whether the D7 backport patch is safe to commit as is to D7 due to concerns about arbitrary whitelisting of
data-
attributes.Comment #18
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedThis NEEDS WORK as the patch is incomplete:
The new function is never called outside of tests.
Also the tests do test the wrong thing - filter_xss SHOULD be identical to the with attribute filtering.
As applying e.g. #15 to a site is a potential serious security issue, I am marking this issue as "DO NOT USE".
We can re-visit once we have a good patch.
(I was searching for something that won't filter out `:` from `class` to use Tailwind and landed here).
D8 patch is here for reference:
https://git.drupalcode.org/project/drupal/commit/2febbff
Thanks,
Fabian
Comment #19
ron_s CreditAttribution: ron_s commented@Fabianx, thank you for highlighting this. It's clear the patch in #15 is incomplete.
I have reviewed the D8 patch you referenced and modified the code to include
filter_xss_data_attributes
as part offilter_xss
.I also modified the tests to simply call
filter_xss
, since the data attributes function is now part of whatfilter_xss
does.Please review and provide feedback, and I'll update as necessary. Thanks.
Comment #20
joegraduateThanks @Fabianx and @ron_s! Attached is an interdiff for patch #15 & #19.
Comment #22
jenlamptonWe ran into a similar issue for Backdrop: https://github.com/backdrop/backdrop-issues/issues/1201.
The problem was that
filter_xss()
has a portion for stripping unknown protocols (e.g. http, ftp, news, mailto, etc). What was happening was that a caption likedata-caption="Image: The Caption"
was being parsed to strip any unknown protocols. So "Image:" was considered a protocol, which wasn't in the white list, and so was stripped out.To fix this issue, we skipped the checking of protocols on all data-attributes (as well as alt and title, added later).
Since we don't know what type information could be stored in data attributes, we decided we shouldn't assume it was safe to strip protocols, or anything else for that matter. We decided that it should be left up to whatever code is using the attribute to decide if and how the attribute value is sanitized.
In our case, the
data-caption
attribute is something that is supported in core. We runfilter_xss()
on the attribute value when the image tag with the data-caption attribute is replaced with figure and figcaption tags (and when the data-caption attribute is removed).We chose not to run
filter_xss()
on any unknown data attribute value, since we were unsure of what might happen if the data attribute contained only JSON (for example).Comment #23
jenlamptonI'm attaching a patch here that's a D7 equivalent of our fix, for comparison. (We added the same treatment for alt and title later on)
Comment #24
jenlamptonHere's a patch that includes the changes for alt, title, and data-attributes, as well as tests for those changes. This patch should be safe to use.
Changing issue status to NR, but I want to highlight that this patch is quite different from the patch in #19 and is not an iteration on that approach.
The approach in #19 more closely matches the filtering in Drupal 8, but I believe running all data attributes through
filter_xss_admin()
could have serious consequences. Since data attributes often contain JSON (for example:data-x1="{ foo: 'something', bar: 'something else' }"
) I think this approach will be less likely to cause problems for existing sites.Comment #25
joegraduateThe attached patch is an iteration of the patch in #19 that should fix the failing test.
I'm not necessarily trying to make a statement about whether the approach used in patch #19 should be used instead of the approach used in #24 but noticed #19's tests were failing for a relatively trivial reason (due to self-closing img tag being generated by the filter_dom_* functions used in the new
filter_xss_data_attributes()
function).Comment #26
joegraduate#25 no longer applies cleanly to 7.x-dev after the changes from SA-CORE-2021-002. The attached patch is a re-roll of #25 that should apply cleanly to the latest 7.x-dev
Comment #27
ron_s CreditAttribution: ron_s commented@joegraduate, thank you for the re-roll.
Comment #28
gbirch CreditAttribution: gbirch at Tag1 Consulting commentedThis is a more limited version of the agreed approach. It does not address data- attributes, but does fix the class attribute, which is particularly useful for tailwind.
Comment #29
drummReported at #3396480: A colon in alt text breaks the image description for Drupal.org
Comment #32
apadernoI created a MR from the last patch because it failed to apply. I also fixed a comment, which was repeating are twice, and used
array()
for an array literal instead of the short syntax.The patch from that MR applied to the 7.x branch, even though I applied patches from other issues.