Problem/Motivation

The patch in #2273923: Remove html => TRUE option from l() and link generator. adds the following code:
$log_text = SafeMarkup::set(Unicode::truncate(Xss::filter($message, array()), 56, TRUE, TRUE));

As pointed out in comment #2, the assumption that this is safe because truncating a safe string results in a safe string is not true as truncating a string may lead to a malformed HTML entity.

Proposed resolution

Several approaches for a resolution have been proposed. To summarize these:

  • In #3 and #5, a parameter was added to Xss::filter() to indicate whether SafeMarkup should be called before returning the truncated string.

    We're not taking this approach because, as stated in #6, adding a parameter to Xss::filter() is out of scope of this issue.

  • The next approach (#9, #14, #20, #23, #24, #32) was to add a method to the SafeMarkup class instead of adding a parameter to Xss::filter().

    We're not taking this approach because, as stated in #33, it mixes sanitization with SafeMarkup.

  • Then a completely new approach (#35, #36, #38, #42, #43, #51, #54, #56, #61) was proposed which would add a method to the Xss class.

    We're not taking this approach because, from #62, truncation has nothing to do with Xss filtering and we shouldn't go over the strings twice.

  • A minimal patch was provided in #69 which removes does not preserve any HTML tags in the truncated output.

    Comments #71 and #72 indicate that this approach is appropriate for this issue as it simply removes SafeMarkup::set().

Files: 
CommentFileSizeAuthor
#88 2399261-88.patch4.91 KBstefan.r
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 101,345 pass(es). View
#88 interdiff-85-88.txt1.16 KBstefan.r
#85 2399261-85.patch4.82 KBstefan.r
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,859 pass(es). View
#85 interdiff-83-85.txt735 bytesstefan.r
#83 2399261-83.patch4.8 KBstefan.r
FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,849 pass(es), 1 fail(s), and 0 exception(s). View
#83 interdiff-69-83.txt2.76 KBstefan.r
#69 2399261-69.patch1.94 KBpwolanin
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,689 pass(es). View
#61 2399261_61.patch22.16 KBchx
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 99,020 pass(es). View
#61 interdiff.txt629 byteschx
#32 interdiff.txt3 KBeffulgentsia
#32 2399261.32.patch4.49 KBeffulgentsia
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 96,900 pass(es). View
#24 2399261.24.patch3.82 KBeffulgentsia
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 96,988 pass(es). View
#23 2399261-23.patch7.22 KBpwolanin
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 96,935 pass(es). View
#20 2399261.20.patch3.75 KBeffulgentsia
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 96,918 pass(es). View
#14 2399261.13.patch965 byteseffulgentsia
FAILED: [[SimpleTest]]: [PHP 5.4 MySQL] 96,900 pass(es), 2 fail(s), and 0 exception(s). View
#9 2399261.9.patch4.43 KBalexpott
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 96,927 pass(es). View
#9 5-9-interdiff.txt4.69 KBalexpott
#5 interdiff-2399261-5.txt628 bytescrowdcg
#5 2399261-5.patch3.02 KBcrowdcg
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 96,868 pass(es). View
#3 2399261-3.patch2.99 KBcrowdcg
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 96,883 pass(es). View

Comments

Cottser’s picture

Priority: Normal » Major
Issue tags: +Twig, +D8 Accelerate
Parent issue: » #2280965: [meta] Remove every SafeMarkup::set() call
pwolanin’s picture

It is certainly NOT safe to assume a truncated safe string is safe:

[2] self> \Drupal\Component\Utility\Unicode::truncate('<em>lalalaa</em>', 12);
"<em>lalalaa<"
crowdcg’s picture

Status: Active » Needs review
FileSize
2.99 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 96,883 pass(es). View

Truncating a string may lead to a malformed HTML entity, so we then need to filter it again to make sure all HTML entities are well formed. This removes SafeMarkup::set for the original string, which may be arbitrarily long and only sets as safe the truncated string.

@xjm and @pwolanin discussed this API change in IRC earlier today, which adds an optional, backwards compatible final param $set_safe_markup to Xss::filter().

mpdonadio’s picture

+++ b/core/lib/Drupal/Component/Utility/Xss.php
@@ -43,6 +43,8 @@ class Xss {
+   * @param bool $set_safe_markup
+   *   If TRUE call SafeMarkup::set() on result before returning it.
    *

Should this mention that it is optional and defaults to TRUE?, Maybe

+   * @param bool $set_safe_markup
+   *   (optional) If TRUE, call SafeMarkup::set() on result before returning it.  Defaults to TRUE.

Not sure where 80 cols is there.

crowdcg’s picture

Title: Add SafeMarkup::set() to Unicode::truncate() if the original string is safe? » Remove SafeMarkup::set and Recheck and Mark Safe the Output of Unicode::truncate() in DbLog
FileSize
3.02 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 96,868 pass(es). View
628 bytes

Thanks @mpdonadio, I've changed the text as follows.

+   * @param bool $set_safe_markup
+   *   (optional) Defaults to TRUE and calls SafeMarkup::set() on result before
+   *   returning it.

Edit: Also updated the issue title.

effulgentsia’s picture

+++ b/core/modules/dblog/src/Controller/DbLogController.php
@@ -185,9 +185,7 @@ public function overview() {
-        $log_text = SafeMarkup::set(Unicode::truncate(Xss::filter($message, array()), 56, TRUE, TRUE));
+        $log_text = Xss::filter(Unicode::truncate(Xss::filter($message, array(), FALSE), 56, TRUE, TRUE), array());

+1 to changing the SafeMarkup::set() to Xss:filter(), but the rest of this patch seems out of scope to me. If we want to add a new parameter to Xss::filter(), can we do that in a separate issue to discuss the motivation for that? Or is there a compelling need to pass in FALSE in the above line?

effulgentsia’s picture

Also, why do we need to Xss::filter() the original message to begin with? Why not truncate the unfiltered message, then filter just the once? That would also keep the long original out of the SafeMarkup registry, but without needing a new parameter in Xss::filter().

alexpott’s picture

Flags are a bit of an anti-pattern http://martinfowler.com/bliki/FlagArgument.html. Why can't we just add a new method for this?

alexpott’s picture

FileSize
4.69 KB
4.43 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 96,927 pass(es). View

Here's what I'm suggesting. It has the advantage of one less conditional and being extremely explicit when you are and when you are not making something as safe when using the method.

pwolanin’s picture

That's fine for me, as long as we have some way to filter without adding to the safe strings list.

pwolanin’s picture

Status: Needs review » Reviewed & tested by the community

@effulgentsia the inner call serves to strip all the HTML tags. we want to truncate after stripping the tags so we actually have some meaningful text.

As above, the outer call makes sure we have well-formed html entities after truncating and marks the string safe.

effulgentsia’s picture

the inner call serves to strip all the HTML tags

Oh, I missed that we were passing an empty tag array in. In that case, why not use strip_tags() instead?

effulgentsia’s picture

Status: Reviewed & tested by the community » Needs review

And, if we're stripping all tags anyway, can't we then just pass the truncated (not marked safe) message to l() and let it autoescape it? See patch.

effulgentsia’s picture

FileSize
965 bytes
FAILED: [[SimpleTest]]: [PHP 5.4 MySQL] 96,900 pass(es), 2 fail(s), and 0 exception(s). View

See patch.

Helps to upload one, I guess :)

pwolanin’s picture

@effulgentsia no, that's not right, since it would double escape HTML entities.

pwolanin’s picture

Also, strip_tags() should never be used. it's a broken, unsafe function.

Please commit #9.

pwolanin’s picture

Status: Needs review » Needs work

From chat w/ effulgentsia , if the tests don't fail on his, that means we are not testing for the presence of a valid HTML entity in the message, so we should add that also.

The last submitted patch, 14: 2399261.13.patch, failed testing.

pwolanin’s picture

Test fails do look html entity escaping-related:

	user	06/14/2015 - 17:38	Deleted user: vTedtpRq &lt;vTedtpRq@example.com&gt;.	xGSKIajE
effulgentsia’s picture

Status: Needs work » Needs review
FileSize
3.75 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 96,918 pass(es). View

Yay for test coverage!

How about this then?

effulgentsia’s picture

+++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
@@ -329,4 +329,43 @@ public static function replace($search, $replace, $subject) {
+    // Strip tags prior to truncation, so that the output has useful text.
+    $string = strip_tags($string);

If at some point we want to preserve HTML tags in the output, see also #221257: text_summary() should output valid HTML and Unicode text, and not count markup characters as part of the text length.

pwolanin’s picture

Status: Needs review » Needs work

I still think we do not want to use strip_tags(). We also need a method (or flag) to be able to Xss filter without setting the string into the safe list.

pwolanin’s picture

Issue summary: View changes
Status: Needs work » Needs review
FileSize
7.22 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 96,935 pass(es). View

Here's the combo patch

effulgentsia’s picture

FileSize
3.82 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 96,988 pass(es). View

I still think we do not want to use strip_tags().

Why not? What's the difference between Xss::filter() with an empty tag list and strip_tags()?

We also need a method (or flag) to be able to Xss filter without setting the string into the safe list.

Why add that to this issue unless there's a use case in this issue for it? That goes back then to the prior question.

Furthermore, can we remove the final Xss:filter() by decoding entities before truncation? In fact, wouldn't that improve the truncation, since it would see the entity as a single character? See patch.

pwolanin’s picture

Status: Needs review » Needs work

strip_tags() is known to be broken in a variety of ways and should not be used: e.g. see http://stackoverflow.com/questions/1972442/when-strip-tags-burns-a-haystack

the Xss::filter() code does work to validate the HTML and entities.

effulgentsia’s picture

We use strip_tags() in >300 places throughout core. That stack overflow page only mentions that it strips non-tag text (which is maybe undesirable, but not a security problem) if you give it malformed HTML as input. But in this patch's usage, we're giving it input we made when logging the message, so we have control over it being well formed.

While Xss::filter() works, it creates confusion for why it's being called, because we're not trying to filter for XSS here, we're trying to truncate a string that contains HTML tags and entities in a way that strips the tags and preserves the entities. Using the Xss class for that is overkill and relies on its side effects rather than its purpose.

pwolanin’s picture

Status: Needs work » Reviewed & tested by the community

Ok, given that we make it plain at the end, maybe the downsides of using strip_tags() are outweighed by the simplicity.

pwolanin’s picture

Status: Reviewed & tested by the community » Needs review

Actually - think this will still be a bit odd - we will be truncating to the requested number of *visible* characters, but the string could be longer than the requested length because we escape some HTML entities again.

pwolanin’s picture

In brief - assume the string to be filtered has &amp; in it.

We would decode it to & and truncate the string to the desired length, but the final checkPlain() would encode it as &amp; again so now the string is 4 characters longer than requested (though will display in the browser as the requested length).

Maybe this is ok if we care about display vs. actual character count?

Cottser’s picture

I think the behaviour is fine, the docs could perhaps be clearer, the interpretation of "text characters" is a bit up in the air to me:

+++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
@@ -329,4 +329,45 @@ public static function replace($search, $replace, $subject) {
+   * Truncates an HTML string safely to a number of text characters.
...
+   *   The string to truncate. HTML tags are stripped prior to truncation, but
+   *   HTML entities that fit in the output length are preserved.
mpdonadio’s picture

Since we sanitize for output, and shouldn't be using these eg before shoving strings into the database, perhaps "Truncates an HTML string safely to a number of rendered text characters." ?

effulgentsia’s picture

Issue tags: +Needs tests
FileSize
4.49 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 96,900 pass(es). View
3 KB

Clarified docs. Anyone willing to write some unit tests for SafeMarkup::truncate()?

xjm’s picture

Status: Needs review » Needs work

So TBH I don't think we should be adding more methods with sanitization and HTML manipulation logic to SafeMarkup; its primary purpose should just be as a value object of safe strings. I think this is perhaps more the domain of the Xss object or maybe the Html object. I checked with @alexpott and he felt similarly.

#2506195: Remove SafeMarkup::set() from Xss::filter() already decouples Xss::filter() along those lines, and also provides a way to avoid the double string list entry we have in HEAD.

pwolanin’s picture

@xjm - so should we just inline the various calls here instead of creating a helper method?

It feels like in general people won't necessarily get a safe truncation?

chx’s picture

Status: Needs work » Needs review
FileSize
6.45 KB
FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 98,277 pass(es), 1 fail(s), and 0 exception(s). View

> It feels like in general people won't necessarily get a safe truncation?

Sure they will if we use the patch attached :) There's no interdiff as this is a brand new approach.

chx’s picture

FileSize
6.13 KB
FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 98,262 pass(es), 15 fail(s), and 0 exception(s). View

Simplified a bit.

The last submitted patch, 35: 2399261_35.patch, failed testing.

chx’s picture

FileSize
6.11 KB
FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 98,249 pass(es), 14 fail(s), and 0 exception(s). View

Oh doh, that's just a typo in comments.

The last submitted patch, 36: 2399261_36.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 38: 2399261_38.patch, failed testing.

joelpittet’s picture

I really like the direction and idea behind #38 +1 to that! Just need to figure out what's up with the tests.

chx’s picture

Status: Needs work » Needs review
FileSize
6.11 KB
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 98,225 pass(es). View
766 bytes

In order to move this along I am stripping tags from the link as HEAD does. It's not that this code doesn't work -- it's assertLink that throws in the towel when the link text contains HTML. I am working on that http://stackoverflow.com/questions/31419933/xpath-find-link-containing-h... here.

chx’s picture

FileSize
5.22 KB
11.57 KB
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 98,227 pass(es). View

Disregard #42, interdiff against #38 this hopefully passes. The only changes are in tests because we now allow <em> in our dblog links.

chx’s picture

Issue summary: View changes

Status: Needs review » Needs work

The last submitted patch, 43: 2399261_43.patch, failed testing.

Status: Needs work » Needs review

chx queued 43: 2399261_43.patch for re-testing.

joelpittet’s picture

Status: Needs review » Reviewed & tested by the community

That makes those tests better! It leaves truncate in Xss domain and wraps it in SafeMarkup and less capture meaningless groups captured.

Big +1 thanks @chx

joelpittet’s picture

Oh and of course the main reason: it makes the combo Unicode::truncate(Xss::filter less prone to error and potential security hole vector.

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
  1. I'd expected some test coverage in Drupal\Tests\Component\Utility\XssTest.
  2. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -217,6 +217,36 @@ public static function xssFilter($string, $html_tags = NULL) {
    +   * Safely truncates an HTML string.
    +   *
    +   * The resulting string rendered in a browser will be free of XSS and shorter
    +   * than the specified length.
    +   *
    +   * @param $string
    +   * @param $length
    +   * @param bool|FALSE $wordsafe
    +   * @param bool|FALSE $add_ellipsis
    +   * @param int $min_wordsafe_length
    +   * @param null $html_tags
    +   * @return string
    +   *
    +   * @see \Drupal\Component\Utility\Xss::truncate()
    +   * @see \Drupal\Component\Utility\Xss::filter()
    +   * @see \Drupal\Component\Utility\Xss::getAdminTagList()
    +   * @see \Drupal\Component\Utility\Unicode::truncate()
    +   * @see \Drupal\Component\Utility\SafeMarkup::isSafe()
    

    Does not conform to our docs standards.

  3. +++ b/core/lib/Drupal/Component/Utility/Xss.php
    @@ -135,6 +137,55 @@ public static function filterAdmin($string) {
    +   *   See \Drupal\Component\Unicode::truncate() for this parameter.
    ...
    +   *   See \Drupal\Component\Unicode::truncate() for this parameter.
    ...
    +   *   See \Drupal\Component\Unicode::truncate() for this parameter.
    ...
    +   *   See \Drupal\Component\Xss::filter() for this parameter.
    

    I think it is more useful to duplicate the documentation here.

  4. +++ b/core/lib/Drupal/Component/Utility/Xss.php
    @@ -135,6 +137,55 @@ public static function filterAdmin($string) {
    +    // Close the tags.
    +    return Html::normalize($final);
    

    Might this not result in a string that exceeds the length?

joelpittet’s picture

@alexpott re #49.4. Yes if you count tags in the string length, which I would expect we wouldn't be, but I guess either way we need #49.1 to show that our expectations.

Edit: disregard this, @chx went through it with me on IRC.

chx’s picture

FileSize
17.42 KB
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 98,266 pass(es). View
10.87 KB
chx’s picture

Status: Needs work » Needs review
chx’s picture

Hrm my comment got lost: so this still needs tests, the above version is mostly doxygen fix and a few more minor things, like making the db log test assertion even better.

Re .4 the new function does not count the length of HTML tags (open or close) when calculating the length of string. Indeed that's why it's in Xss -- it reuses much of filter but where filter replaces the tags with a safe version and keeps the text intact, this one adds text until it's the desired length then stops.

chx’s picture

Issue tags: -Needs tests
FileSize
23.13 KB
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 98,484 pass(es). View
8.92 KB

Even more doxygen and a lot of tests. The change the existing filter tests is absolutely minimal and doesn't affect the thing tested (attributes) at all.

alexpott’s picture

+++ b/core/tests/Drupal/Tests/Component/Utility/XssTest.php
@@ -505,14 +506,14 @@ public function testAttribute($value, $expected, $message, $allowed_tags = NULL)
-        '<img src="http://example.com/foo.jpg" title="Example: title" alt="Example: alt">',
-        '<img src="http://example.com/foo.jpg" title="Example: title" alt="Example: alt">',
+        '<img src="http://example.com/foo.jpg" title="Example: title" alt="Example: alt" />',
+        '<img src="http://example.com/foo.jpg" title="Example: title" alt="Example: alt" />',
...
-        '<img src="http://example.com/foo.jpg" data-caption="Drupal 8: The best release ever.">',
-        '<img src="http://example.com/foo.jpg" data-caption="Drupal 8: The best release ever.">',
+        '<img src="http://example.com/foo.jpg" data-caption="Drupal 8: The best release ever." />',
+        '<img src="http://example.com/foo.jpg" data-caption="Drupal 8: The best release ever." />',

@@ -606,4 +607,66 @@ protected function assertNotNormalized($haystack, $needle, $message = '', $group
+  /**
+   * @covers ::truncate
+   *
+   * @dataProvider providerTestAttributes
+   */
+  public function testTruncateAttribute($value, $expected, $message, $allowed_tags = NULL) {
+    $value = $this->convertFilterToTruncate($value, $expected, $allowed_tags);
+    $this->assertEquals($expected, $value, $message);
+  }

I'm not entirely comfortable with making this change as the logic for determining when an attribute and tag is ending is being tested here.

chx’s picture

FileSize
1.65 KB
22.16 KB
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 98,506 pass(es). View

Sure, I changed the test itself to expect a slash.

alexpott’s picture

Works for me.

Status: Needs review » Needs work

The last submitted patch, 56: 2399261_56.patch, failed testing.

Status: Needs work » Needs review

chx queued 56: 2399261_56.patch for re-testing.

joelpittet’s picture

Status: Needs review » Reviewed & tested by the community

Nice! There is a little indent that could be fixed on commit but I'm for RTBC here.

+++ b/core/lib/Drupal/Component/Utility/Xss.php
@@ -14,6 +14,17 @@
+      |                 # or
+      <[^>]*(?:>|$)       # a string that starts with a <, up until the > or the end of the string
+      |                 # or

Bring back 2 spaces on commit please:)

chx’s picture

FileSize
629 bytes
22.16 KB
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 99,020 pass(es). View

We do not want the committers to overwork themselves so here's two spaces removed.

alexpott’s picture

Status: Reviewed & tested by the community » Needs work

I been sleeping on this patch for a while and I think we're entangling two things here - truncating html and filtering. Truncation really has nothing to do with Xss filtering. But in the usage converted in the patch doing truncation and filtering would result in having to loop twice over the html. Anyhow Xss::truncate is mis-named it should be Xss::filterTruncate and similarly for SafeMarkup::xssTruncate - it should be SafeMarkup::xssFilterTruncate... these method names hint at the problem we have. I'm really unsure of what to do about this. Going to ping @xjm and @effulgentsia to see if they have any ideas.

+++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
@@ -207,13 +207,65 @@ public static function checkAdminXss($string) {
+   * @param bool $wordsafe
...
+   * @param bool $add_ellipsis
...
+   * @param bool $min_wordsafe_length

We're missing test coverage of these arguments.

xjm’s picture

I don't think we should be adding this API at all, especially not now after the work we've done to minimize the API surface.

Is there a reason not to just truncate first and filter after? This patch has gotten a lot more extensive than simply fixing one line in one unimportant controller.

pwolanin’s picture

@xjm - I agree. This code looks nice, but it's really not essential to solve the problem in a reasonable way.

Can we go back to the patch I marked RTBC at #27?

xjm’s picture

mdrummond’s picture

YesCT asked me to take a look at this to see if I could help by bringing the patch from #24 based on pwolanin's comment in #64. I reviewed this issue and #2506195: Remove SafeMarkup::set() from Xss::filter(). As I understand things, we can no longer rely upon XSS:filter() as that no longer mark their results as safe.

The patch in #24 adds SafeMarkup::truncate(), but in #33, xjm suggests that SafeMarkup is not the right place to take care of that, so I'm not sure about bringing back #24 because of those concerns.

Any suggestions, xjm and/or alexpott, on how to proceed next?

chx’s picture

If you want to truncate HTML then you need to split between tags and text and if you do that then you already did one half of the xss filter and since you want to display then you need to filter anyways so why not continue when you already half way there? That's the reasoning behind this patch. It works well and it's tested, I think we should use it.

pwolanin’s picture

Looking for similar uses - I only find one, which could be dangerous in that is decodes entities (though possibly the comment subject is escaped further)

\Drupal\comment\CommentForm

        $comment->setSubject(Unicode::truncate(trim(Html::decodeEntities(strip_tags($comment_text))), 29, TRUE, TRUE));

It's hard to say whether other uses of Unicode::truncate() should be removing HTML, but if we don't find any, perhaps we should just fix this case with an ugly one-liner, as opposed to adding an API method.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
1.94 KB
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,689 pass(es). View

This is a new, minimal patch.

joelpittet’s picture

Assigned: Unassigned » effulgentsia

@pwolanin the approach @chx took is along the lines of your point: in #25

strip_tags() is known to be broken in a variety of ways and should not be used: e.g. see http://stackoverflow.com/questions/1972442/when-strip-tags-burns-a-haystack

the Xss::filter() code does work to validate the HTML and entities.

And also I'd like to get @effulgentsia's position on #61 as I think it resolves all worries brought up earlier.

pwolanin’s picture

@joelpittet - effulgentsia was the one who originally suggested using strip_tags(). I think if we want to consider an API addition we should make a follow-up rather than dragging this issue out, since we didn't consider the breadth of use cases.

This issue is supposed to be about just removing SafeMarkup::set() and it's gotten way beyond that scope.

effulgentsia’s picture

Assigned: effulgentsia » Unassigned

I think #69 makes sense for this issue, because even in HEAD, we're not preserving any HTML tags in the truncated output. If at some point we have a need for a truncation function that preserves some HTML tags, then I think #61 can be posted to that issue to solve that use case.

genjohnson’s picture

Issue summary: View changes

I've read through the issue and summarized the proposed approaches and why they haven't been accepted.

genjohnson’s picture

I am reviewing and manually testing the patch from #69.

mdrummond’s picture

genjohnson and I tried logging some messages in html_preprocess_html in Bartik to double-check the output.

We tried:

  $message = "&lt;script&gt;alert('oops');&lt;/script&gt; Let us <strong>see</strong> what happens when we have a long log message with potentially problematic content.";
  \Drupal::logger('bartik')->error($message);

The output in the source on the log page was:

                      <td><a href="/core2015/admin/reports/dblog/event/33" title="&amp;lt;script&amp;gt;alert(&#039;oops&#039;);&amp;lt;/script&amp;gt; Let us see what happens when we have a long log message with potentially problematic content.">&lt;script&gt;alert(&#039;oops&#039;);&lt;/script&gt; Let us see what happens…</a></td>

So it looks like this is doing what it should be doing. Actual HTML tags are being stripped out, and even if HTML entities when decoded could create a potentially problematic HTML tag, the l() function will prevent that HTML tag from showing up in source.

This looks like it works.

genjohnson’s picture

Status: Needs review » Reviewed & tested by the community

I read through the patch. The code makes sense, is in scope of the issue and seems to follow coding standards.

joelpittet’s picture

@pwolanin @effulgentsia thanks for your replies, I'll open up a feature request for 8.1.x with @chx's patch.

The use-case would be search results body truncating with keeping the highlighting (strong or span) tags to highlight results. Maybe we are doing this another way, I'll try to confirm that still works for myself.

joelpittet’s picture

Yeah, FYI not problem on the search results, probably using a different approach.

YesCT’s picture

@joelpittet thanks for opening that separate issue.
It was #2544890: ::truncate() to allow certain tags to remain.

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs tests

Can we add a test to prove that all markup is stripped.

xjm’s picture

+1 for the current solution, and also for tests. ;)

stefan.r’s picture

I'll see about adding a test

stefan.r’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests
FileSize
2.76 KB
4.8 KB
FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,849 pass(es), 1 fail(s), and 0 exception(s). View

Status: Needs review » Needs work

The last submitted patch, 83: 2399261-83.patch, failed testing.

stefan.r’s picture

Status: Needs work » Needs review
FileSize
735 bytes
4.82 KB
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,859 pass(es). View
mdrummond’s picture

Status: Needs review » Needs work

The test message should probably be longer to ensure that it is a message that would be truncated.

stefan.r’s picture

Well it's currently supposed to be safe regardless of the length, are we testing anything specific with the truncation? According to #75 it would be OK to truncate anywhere for purposes of the test, so I'll just rewrite the test to do a lorem ipsum sentence instead of hello world then.

Just so everyone is clear, the current string:

&lt;script&gt;alert('foo');&lt;/script&gt;<strong>hello</strong> world"

...is being converted into:

<script>alert('foo');</script>hello world

...before being truncated (although the truncation does not apply to this string as it is too short), and is sanitized on output to:

&lt;script&gt;alert(&#039;foo&#039;);&lt;/script&gt;hello world

stefan.r’s picture

Status: Needs work » Needs review
FileSize
1.16 KB
4.91 KB
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 101,345 pass(es). View
mdrummond’s picture

Status: Needs review » Reviewed & tested by the community

Revised test looks good!

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed 6cd78a1 and pushed to 8.0.x. Thanks!

alexpott’s picture

  • alexpott committed 6cd78a1 on 8.0.x
    Issue #2399261 by chx, effulgentsia, stefan.r, crowdcg, pwolanin,...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.