8.x Issue: #2105841: Xss filter() mangles image captions and title/alt/data attributes

If alternative text I add to an image has a colon, then the colon and any text preceding it are lost on exiting source code mode or upon saving.

This occurs whether Advanced content filter is set to Enabled or Disabled.
This occurs whether or not a content type uses Workbench Moderation.

Steps:

  1. Create or edit a node.
  2. If the page has an image, click on it and click the Image icon in the CKEditor UI. If the page doesn't have an image, click the Image icon in the CKEditor UI and insert one.
  3. In the Alternative Text field, type
    beginning text : colon , comma . period " quote ; semicolon ' single quote
  4. Click OK
  5. If editor has the power to Switch to plain text editor
    1. Click Switch to plain text editor
    2. Locate the image code
    3. Actual and expected results: image tag alt attribute is as follows
      alt="beginning text : colon , comma . period " quote ; semicolon ' single quote"
    4. Optional additional steps:
      1. Switch back to WYSIWYG
      2. Switch back to plain-text editor
      3. Find the img element
      4. Actual result: image tag alt attribute is as follows
        alt=" colon , comma . period " quote ; semicolon ' single quote"
        Expected result: image tag alt attribute is as follows
        alt="beginning text : colon , comma . period " quote ; semicolon ' single quote"
  6. Save
  7. View page source
  8. Actual result: image tag alt attribute is as follows
    alt=" colon , comma . period " quote ; semicolon ' single quote"
    Expected result: image tag alt attribute is as follows
    alt="beginning text : colon , comma . period " quote ; semicolon ' single quote"

I also confirmed that this issue happens in a simplytest.me sandbox with Drupal core 7.54 and CKEditor - WYSIWYG HTML editor 7.x-1.17 installed.

I attempted to test this on the CKEditor nightly build demo page to see whether I needed to report this issue to CKEditor's issue queue or to the Drupal CKEditor module's issue queue, but the CKEditor demo page does not allow saving. If I insert the image, add the alt text, click the Source button to see the source, then click Source again to return to the WYSIWYG, then again to return to source, I still see the expected
alt="beginning text : colon , comma . period " quote ; semicolon ' single quote"

CKEditor issue queue search on "colon" revealed no related issues.

Issue fork drupal-2859006

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Charles Belov created an issue. See original summary.

Charles Belov’s picture

Title: Image alternative text loses text preceding colon » Image alternative text loses text preceding colon upon saving node
Charles Belov’s picture

Title: Image alternative text loses text preceding colon upon saving node » Image alternative text loses text preceding colon upon leaving plain-text editor or upon saving node
Issue summary: View changes

Expanded possible causes and removed current build of pure CKEditor as a possible culprit.

Charles Belov’s picture

Issue summary: View changes

Added results of simplytest.me test.

Charles Belov’s picture

Issue summary: View changes

Minor text rearrangement/edit to avoid ambiguity

Charles Belov’s picture

Issue summary: View changes

More text adjustements

Charles Belov’s picture

Subsequent colons may lose text or not.

Block 1: "1/3 of SF residents enjoy biking. Just 15% do not.” Block 2: “51% residents can bike, but won’t in San Francisco.” Block 3: “7 in 10 people cite safety concerns as a major impact on their decision to bike."

came out as

"1/3 of SF residents enjoy biking. Just 15% do not.” Block 2: “51% residents can bike, but won’t in San Francisco.” Block 3: “7 in 10 people cite safety concerns as a major impact on their decision to bike."

(first colon and everything before it is lost)

while:

before colon: "A Glance at the SFMTA’s Bike Program for 2017 - 2021." :after colon

gave

after colon

(last colon and everything before it is lost)

Charles Belov’s picture

szt’s picture

Version: 7.x-1.17 » 7.x-1.x-dev

Same problem.
I'm trying to put a token to the alt field: "[node:title]"
After saving the node remains just the end of it: "title]"

gifad’s picture

I don't think that this is CKEditor related; see the related core issue.

Patch at #97 solved the problem for me…

Edit: Note that this patch is an explicit requirement for use of the Enitity Embed module !

Charles Belov’s picture

Status: Active » Closed (duplicate)

Closing as duplicate of #2105841.

Charles Belov’s picture

Project: CKEditor 4 - WYSIWYG HTML editor » Drupal core
Version: 7.x-1.x-dev » 9.x-dev
Component: Code » ajax system

Changing project to Drupal Core

Charles Belov’s picture

Version: 9.x-dev » 7.x-dev

Correcting version to 7.x

geek-merlin’s picture

Issue summary: View changes
Status: Closed (duplicate) » Needs review
joegraduate’s picture

The attached patch is a re-roll of the last D7 patch from #2105841: Xss filter() mangles image captions and title/alt/data attributes that applies cleanly against the latest 7.x branch.

ron_s’s picture

Status: Needs review » Reviewed & tested by the community

Had been using the old patch in #2105841 and this new one in #15 to meet the requirement for the Entity Embed module. Works as expected.

Entity Embed Requirements

7.x-3.x

Drupal 7.37 or later patched with #2105841-97: Xss filter() mangles image captions and title/alt/data attributes.

  • Editor CKEditor - A submodule of the Editor module.
  • Entity Reference - 7.x-1.2 or later.
mcdruid’s picture

The D8 parent is rather long.

For the purposes of reviewing this backport, here are a couple of highlights from it:

* #64 has a summary of the problem and the approach to fixing it by @pfrenssen.
* #20 to #32 suggest, discuss and review the idea of whitelisting certain attributes as safe / not requiring filtering.
* #107 @David_Rothstein reconsiders whether the D7 backport patch is safe to commit as is to D7 due to concerns about arbitrary whitelisting of data- attributes.

Fabianx’s picture

Title: Image alternative text loses text preceding colon upon leaving plain-text editor or upon saving node » DO NOT USE - Image alternative text loses text preceding colon upon leaving plain-text editor or upon saving node
Status: Reviewed & tested by the community » Needs work

This NEEDS WORK as the patch is incomplete:

The new function is never called outside of tests.

Also the tests do test the wrong thing - filter_xss SHOULD be identical to the with attribute filtering.

As applying e.g. #15 to a site is a potential serious security issue, I am marking this issue as "DO NOT USE".

We can re-visit once we have a good patch.

(I was searching for something that won't filter out `:` from `class` to use Tailwind and landed here).

D8 patch is here for reference:

https://git.drupalcode.org/project/drupal/commit/2febbff

Thanks,

Fabian

ron_s’s picture

Status: Needs work » Needs review
FileSize
7.42 KB

@Fabianx, thank you for highlighting this. It's clear the patch in #15 is incomplete.

I have reviewed the D8 patch you referenced and modified the code to include filter_xss_data_attributes as part of filter_xss.

I also modified the tests to simply call filter_xss, since the data attributes function is now part of what filter_xss does.

Please review and provide feedback, and I'll update as necessary. Thanks.

joegraduate’s picture

FileSize
3.05 KB

Thanks @Fabianx and @ron_s! Attached is an interdiff for patch #15 & #19.

Status: Needs review » Needs work

The last submitted patch, 19: drupal-image_alternative_text-2859006-19.patch, failed testing. View results

jenlampton’s picture

We ran into a similar issue for Backdrop: https://github.com/backdrop/backdrop-issues/issues/1201.

The problem was that filter_xss() has a portion for stripping unknown protocols (e.g. http, ftp, news, mailto, etc). What was happening was that a caption like data-caption="Image: The Caption" was being parsed to strip any unknown protocols. So "Image:" was considered a protocol, which wasn't in the white list, and so was stripped out.

To fix this issue, we skipped the checking of protocols on all data-attributes (as well as alt and title, added later).

Since we don't know what type information could be stored in data attributes, we decided we shouldn't assume it was safe to strip protocols, or anything else for that matter. We decided that it should be left up to whatever code is using the attribute to decide if and how the attribute value is sanitized.

In our case, the data-caption attribute is something that is supported in core. We run filter_xss() on the attribute value when the image tag with the data-caption attribute is replaced with figure and figcaption tags (and when the data-caption attribute is removed).

We chose not to run filter_xss() on any unknown data attribute value, since we were unsure of what might happen if the data attribute contained only JSON (for example).

jenlampton’s picture

I'm attaching a patch here that's a D7 equivalent of our fix, for comparison. (We added the same treatment for alt and title later on)

jenlampton’s picture

Title: DO NOT USE - Image alternative text loses text preceding colon upon leaving plain-text editor or upon saving node » Image alternative text loses text preceding colon upon leaving plain-text editor or upon saving node
Component: ajax system » filter.module
Status: Needs work » Needs review
FileSize
2.6 KB

Here's a patch that includes the changes for alt, title, and data-attributes, as well as tests for those changes. This patch should be safe to use.

Changing issue status to NR, but I want to highlight that this patch is quite different from the patch in #19 and is not an iteration on that approach.

The approach in #19 more closely matches the filtering in Drupal 8, but I believe running all data attributes through filter_xss_admin() could have serious consequences. Since data attributes often contain JSON (for example: data-x1="{ foo: 'something', bar: 'something else' }") I think this approach will be less likely to cause problems for existing sites.

joegraduate’s picture

The attached patch is an iteration of the patch in #19 that should fix the failing test.

I'm not necessarily trying to make a statement about whether the approach used in patch #19 should be used instead of the approach used in #24 but noticed #19's tests were failing for a relatively trivial reason (due to self-closing img tag being generated by the filter_dom_* functions used in the new filter_xss_data_attributes() function).

joegraduate’s picture

#25 no longer applies cleanly to 7.x-dev after the changes from SA-CORE-2021-002. The attached patch is a re-roll of #25 that should apply cleanly to the latest 7.x-dev

ron_s’s picture

@joegraduate, thank you for the re-roll.

gbirch’s picture

This is a more limited version of the agreed approach. It does not address data- attributes, but does fix the class attribute, which is particularly useful for tailwind.

drumm’s picture

apaderno made their first commit to this issue’s fork.

apaderno’s picture

I created a MR from the last patch because it failed to apply. I also fixed a comment, which was repeating are twice, and used array() for an array literal instead of the short syntax.

The patch from that MR applied to the 7.x branch, even though I applied patches from other issues.