Follow-up to #2371861: Strings including tokens in href or src attributes cannot be translated due to safeness check incompatibilities

Disclaimer: I'm not sure in which component I have to put this issue. Feel free to move it.

Problem/Motivation

The locale_string_is_safe function is used by the Locale module to ensure that translations are safe before saving them in the database. It does so by comparing an Xss::filter-ed version to the original version so nothing serious can happen. If that function fails, the string become totally untranslatable unless you want to manually put your translation into the database.

In some cases (see the parent issue), when a href attribute (or an attribute supposed to include an URL) starts with a token, this one is wrongly and partially stripped out so the comparison fails and so the string becomes untranslatable.

There is a simple example :

\Drupal\Component\Utility\Xss::filter('Return to the <a href="[site:url]">frontpage</a>');
// 'Return to the <a href="url]">frontpage</a>'

This issue affects two strings in core (Locale module's tour configuration) but it could affect a lot of contrib modules (and it's probably already the case). The parent issue is about to bypass this particular problem in the translation context but it would be safer and stronger to directly fix Xss::filter.

Proposed resolution

Fix Xss::filter to avoid it stripping these tokens.

Remaining tasks

Contributor tasks needed
Task Novice task? Contributor instructions Complete?
Create a patch Instructions
Add automated tests Instructions

User interface changes

None

API changes

None

Comments

ericmulder1980’s picture

Component: markup » base system
Assigned: Unassigned » ericmulder1980
Status: Postponed » Active

I will work on this issue today during the Amsterdam Core sprint.

ericmulder1980’s picture

Small update on how to reproduce this issue.

1. Install Drupal core
2. Enable Interface Translation, Tour and Language modules
3. Add an extra language to your site (admin/config/regional/language)
4. Go to Interface translation interface (admin/config/regional/translate)
5. Search for the sring "This page allows you to".
6. Copy the matched source string into the translation text field without altering it.
7. Hit the "save translation" button

You will now receive an error message stating "The submitted string contains disallowed HTML".

ericmulder1980’s picture

Issue seems to be in method Xss::split(). String is valid when entering this method and messed up when returning the value.

I will try to remain active in solving this issue but feel free to "hijack" this issue.

ericmulder1980’s picture

Assigned: ericmulder1980 » Unassigned
chx’s picture

I would subclass Xss and change the case 2 in attributes to accept a well formatted token. Changing Xss worries me a bit but we need more security team on this issue.

penyaskito’s picture

Issue tags: +Needs security review

I think this is the right tag to get more exposure to the secteam.

dawehner’s picture

One thing i'm wondering is: Why would you ever use Xss::filter on something which is not meant as HTML yet?

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.0-beta1 was released on March 2, 2016, which means new developments and disruptive changes should now be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

stefan.r’s picture

I just ran into this issue while translating a string with a token in it in the translation interface.

Regarding #7, just an example... locale_string_is_safe() is run during validation when submitting a translated string, which does decode_entities($string) == decode_entities(filter_xss($string)), which seems correct to me.

So the issue is that a string like <a href="[site:url]">Home</a> erroneously gets filtered into <a href="url]">Home</a>

stefan.r’s picture

Issue tags: +Needs backport to D7

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Eric_A’s picture

Priority: Normal » Major
Issue summary: View changes

I changed the PHP code markup in the issue summary so it now actually shows the issue. Also, AFAIK there's no workaround for this, major bug to me.
Question: is this a regression introduced in D7 by #2371861: Strings including tokens in href or src attributes cannot be translated due to safeness check incompatibilities? Guess not... Then how long does this terrible bug exists?

Eric_A’s picture

I needed a fix now for _wysiwyg_filter_xss_attributes() for the same bug so I posted a patch in wysiwyg_filter.

The code should be the same the same though, apart from it being a D7 patch. Not sure if and when I'm going to find the time to port to core D8.

Here is the wysiwyg_filter patch:

diff --git a/wysiwyg_filter.pages.inc b/wysiwyg_filter.pages.inc
index ce2a02e..5c11beb 100644
--- a/wysiwyg_filter.pages.inc
+++ b/wysiwyg_filter.pages.inc
@@ -435,7 +435,24 @@ function _wysiwyg_filter_xss_attributes($attr, $element = '') {
       else {
         // All attribute values are checked for bad protocols. This is the same
         // exact method used by Drupal's filter_xss().
-        $attrinfo['value'] = filter_xss_bad_protocol($attrinfo['value']);
+        // Don't check for bad protocols if the attribute value starts with a well formatted token.
+        // @see https://www.drupal.org/node/2372127.
+        $starts_with_token = FALSE;
+        $attr_tokens = token_scan($attrinfo['value']);
+        foreach ($attr_tokens as $type) {
+          foreach ($type as $part => $token) {
+            if (strpos($attrinfo['value'], $token) === 0) {
+              $starts_with_token = TRUE;
+              break;
+            }
+          }
+          if ($starts_with_token) {
+            break;
+          }
+        }
+        if (!$starts_with_token) {
+          $attrinfo['value'] = filter_xss_bad_protocol($attrinfo['value']);
+        }
 
         // If this is <a href> element, then check domain name for rel="nofollow" policies in effect.
         if ($element == 'a' && $attrname == 'href' && $nofollow_policy != 'disabled' && !$add_nofollow) {
darrenwh’s picture

Hi,
Looking at the function filter in Xss.php I've isolated the line that does this change to line 94:

      <[^>]*(>|$)       # a string that starts with a <, up until the > or the end of the string

Adding a colon to the expression like so :

      <[^:>]*(>|$)       # a string that starts with a <, up until the > or the end of the string

returns a string like <a href="[site:url]"&gt;frontpage</a>

So I think that inside the first part of the regex we need to add more a specific pattern for a tag to allow it to be excluded from filtering?

cilefen’s picture

Title: Prevent Xss::filter to strip a token at the beginning of an html attribute » Prevent Xss::filter from stripping a token at the beginning of an html attribute

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

catch’s picture

cilefen’s picture

catch’s picture

markusd1984’s picture

Big thanks @Eric_A for #13, saved my day! Surprised there aren't any modules yet to implement a filter to work around this.

Only found this useful posts Allow tokens for url of the Link field and custom filter CKEditor mangles tokens in URLs, due to bug in Xss::attributes() that I couldn't test yet and thinks if not for drupal 7?!?