Remove SafeMarkup::set in NodeSearch::prepareResults()

commented 6 June 2015 at 20:32

Parent issue:	» #2280965: [meta] Remove every SafeMarkup::set() call

Comment #2

commented 6 June 2015 at 20:34

Status:	Active	» Needs review
Issue tags:		+Twig, +D8 Accelerate, +SafeMarkup

String concatenation pattern: split out into two variables and use SafeMarkup::format to 'concatenate' them.

Comment #3

commented 6 June 2015 at 20:34

Status	File	Size
new	remove_safemarkup_set-2501757-1.patch	1.04 KB

Always something...

Comment #4

commented 6 June 2015 at 20:45

Issue summary:

Comment #5

commented 7 June 2015 at 00:28

Issue summary:

Comment #6

commented 7 June 2015 at 01:02

Status:	Needs review	» Reviewed & tested by the community

Status	File	Size
new	Search_for_ipsum_-_before.png	170.92 KB
new	Search_for_ipsum_-_after.png	181.08 KB

Very straightforward. Thank you @cwells for the steps to reproduce.

Before: (screenshot, Search_for_ipsum_-_before.png)

After: (screenshot, Search_for_ipsum_-_after.png)

Comment #7

xjm commented 7 June 2015 at 15:15

Status:	Reviewed & tested by the community	» Needs review

Thanks for documenting the before-and-after testing!

+++ b/core/modules/node/src/Plugin/Search/NodeSearch.php
@@ -328,10 +328,9 @@ protected function prepareResults(StatementInterface $found) {
+      $built = $this->renderer->render($build);
+      $comment_count = $this->moduleHandler->invoke('comment', 'node_update_index', array($node, $item->langcode));
+      $rendered = SafeMarkup::format('@built @comment_count', ['@built' => $built, '@comment_count' => $comment_count]);

I'm slightly hesitant about this pattern. Both comment_node_update_index() and the $this->renderer->render() here are doing early renders. Is it possible to resolve this by removing those early renders instead?

I realize the scope of that would be a lot bigger than the scope of this issue, and at least by using the @ placeholders we are ensuring it's already either in the SafeMarkup list or sanitized. Setting to NR for more feedback; I'll also ping @effulgentsia.

Another issue where this pattern was proposed: #2296929: Remove system_requirements() SafeMarkup::set() use. That one was slightly different because the strings are being assembled right there in the same code path, but I still have the concern. In both cases we're using SafeMarkup::format() for something that's kinda leaking beyond just putting variables in a text string safely.
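For readers unfamiliar with the pattern under discussion, the following is a minimal standalone sketch of joining two values through "@" placeholders. It is an illustration only: format_escaped() is a hypothetical helper, not Drupal's SafeMarkup::format(), and it escapes every value unconditionally, whereas (per the comment above) the real method leaves strings that are already in the SafeMarkup list untouched.

```php
<?php

/**
 * Hypothetical stand-in for the "@placeholder" concatenation pattern.
 *
 * Assumption: every argument is treated as unsafe and HTML-escaped. This is
 * NOT how Drupal's SafeMarkup::format() treats strings already marked safe.
 */
function format_escaped(string $pattern, array $args): string {
  $replacements = [];
  foreach ($args as $placeholder => $value) {
    // Escape &, <, >, and both quote styles.
    $replacements[$placeholder] = htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
  }
  // strtr() substitutes each placeholder once and never re-scans its output.
  return strtr($pattern, $replacements);
}

// Two separately produced strings joined with a space, as in the patch.
$joined = format_escaped('@built @comments', [
  '@built' => 'Node body',
  '@comments' => '<script>alert("x")</script>',
]);
```

In this sketch the only job of the pattern string is to supply the space between the two values, which is the "leaking beyond formatting" concern raised above.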

Comment #8

star-szr commented 7 June 2015 at 15:20

In this case, lower down we have this, so not sure the early render can be avoided:

'snippet' => search_excerpt($keys, $rendered, $item->langcode),

Comment #9

commented 7 June 2015 at 15:22

Assigned:	» effulgentsia

Comment #10

effulgentsia commented 7 June 2015 at 15:54

Assigned:	effulgentsia	» Unassigned

we're using SafeMarkup::format() for something that's kinda leaking beyond just putting variables in a text string safely

The fact that we have a space in there makes me think it's a legit use of SafeMarkup::format(). In fact, should we have a followup for whether to use t(), since could an alternate language want something different than a space or want to reverse the order?

Note that HEAD's tablesort_header() function has SafeMarkup::format('@cell_content@image', ..., which is a straight concatenation with no space, which we might want to change to some other concatenation pattern if we feel it's an abuse of that method. I didn't find any other similar examples in HEAD.

Comment #11

commented 7 June 2015 at 16:07

@effulgentsia maybe this proposal has some promise #2501975: Determine how to update code that currently joins strings in SafeMarkup::set()?

Comment #12

commented 7 June 2015 at 21:52

The fact that we have a space in there makes me think it's a legit use of SafeMarkup::format().

So back to RTBC;) ?

Comment #13

effulgentsia commented 7 June 2015 at 23:37

Status:	Needs review	» Reviewed & tested by the community

Yep.

Comment #14

8 June 2015 at 00:14

Status:	Reviewed & tested by the community	» Needs work

The last submitted patch, 3: remove_safemarkup_set-2501757-1.patch, failed testing.

Comment #15

star-szr commented 8 June 2015 at 00:36

Status:	Needs work	» Reviewed & tested by the community

Wow that's a slow test.

Drupal\locale\Tests\LocaleUpdateTest had one fail:

Updates for Contrib module one

LocaleUpdateTest.php 144

Drupal\locale\Tests\LocaleUpdateTest->testUpdateImportSourceRemote()

Comment #16

commented 8 June 2015 at 01:47

I was wondering if it was just a slow test or if it was this https://qa.drupal.org/node/228 - pifr auto-retesting anything that's RTBC. With that said, I re-ran this test locally and it passed with this same patch.

Going to kick the testbot anyway, for assurance.

Comment #17

8 June 2015 at 01:47

cwells queued 3: remove_safemarkup_set-2501757-1.patch for re-testing.

Comment #18

xjm commented 12 June 2015 at 03:20

Assigned:	Unassigned	» xjm

Thanks @effulgentsia and @joelpittet for looking into that more.

I'm not sure I agree with this statement:

The fact that we have a space in there makes me think it's a legit use of SafeMarkup::format().

In general, my thought is that if we need to do SafeMarkup::format(@thing_1 @thing_2), something is being done wrong somewhere else before that. I spoke to @alexpott about it briefly this morning and he had a similar thought.

Then I thought about this point from @effulgentsia:

In fact, should we have a followup for whether to use t(), since could an alternate language want something different than a space or want to reverse the order?

On first read, that raises a red flag that there should absolutely be a format_plural() call if it's making a comment count (which the screenshots seem to support). So I tried to dig into it more to see if @comment_count was properly formatting plurals further down. And so I read comment_node_update_index(). At which point my jaw kind of dropped because it appeared that we're rendering the entire comment thread apparently to just get a count of how many comments there are? Granted, this is a very rare operation and part of something that's already batched or on cron and expensive (updating search indexes). But it still surprised me. I also didn't understand how this ended up being a comment "count" in the first place.

@Cottser pointed out this:

In this case, lower down we have this, so not sure the early render can be avoided:
'snippet' => search_excerpt($keys, $rendered, $item->langcode),

That's a good point. (See NodeSearch::prepareResults() for the whole deal.) But if the comment count really is just a count, why would we need to run search_excerpt() on it? Why couldn't the comment count just be appended to it later?

And also, if it is just the "2 comments" bit in the screenshots as a unit, then the formatting wrapping it is not part of a translation -- it's part of the search snippet output. Which, ideally, should be themeable output. And therefore in an actual Twig template.

This seems like a huge stack of stuff, and it's possible we might decide cleaning up part or all of it is too disruptive for, or otherwise not appropriate in, beta. If that turns out to be the case and if I do understand what's going on correctly, a postponed followup issue to put it in a template, with a @todo in the code, might be an option. However, I'd also prefer not to simply move technical debt around, so I'd like to investigate and understand this whole code flow better before I make a recommendation.

Leaving at RTBC (since my review isn't actionable at this point and I don't want to push back a second time if I'm on the wrong track here), but assigning to myself to make it clear to other committers that I'd like to take a look at this more. I might also end up reaching out to the comment and search maintainers about the issue once I understand better what's going on here.

Thanks for your patience. :)

Comment #19

xjm commented 12 June 2015 at 03:25

Forgot to mention, @Cottser, thanks for documenting the test failure on the issue prior to the retest.

Comment #20

17 June 2015 at 08:29

Status:	Reviewed & tested by the community	» Needs work

The last submitted patch, 3: remove_safemarkup_set-2501757-1.patch, failed testing.

Comment #21

star-szr commented 17 June 2015 at 08:48

Status:	Needs work	» Reviewed & tested by the community

Testbot terminated. Back to RTBC to try and get this back in the queue.

Comment #22

20 June 2015 at 22:26

alexpott queued 3: remove_safemarkup_set-2501757-1.patch for re-testing.

Comment #23

25 June 2015 at 03:38

Status:	Reviewed & tested by the community	» Needs work

The last submitted patch, 3: remove_safemarkup_set-2501757-1.patch, failed testing.

Comment #24

adamwhite commented 28 June 2015 at 15:02

Working on this at the Drupal North sprint

Comment #25

adamwhite commented 28 June 2015 at 16:09

Status:	Needs work	» Needs review

Status	File	Size
new	remove_safemarkup_set-2501757-25.patch	1.37 KB

Rerolled the patch from #3.

The conflict was that $this->renderer->render($build) had been changed to $this->renderer->renderPlain($build)

I worked on this with bohemier; please make sure he's attributed.

Comment #26

star-szr commented 28 June 2015 at 16:21

Issue summary:

Adding a note at the top of the issue summary to credit @bohemier.

Comment #27

Anonymous (not verified) commented 29 June 2015 at 14:55

Issue tags:	+DrupalNorth2015

Updating the issue tag to include a hashless DrupalNorth2015 on behalf of the Drupal North sprinting group.

Comment #28

lauriii commented 4 July 2015 at 19:08

Issue tags:	+Needs tests

There are no existing tests for this.

Comment #29

cilefen commented 23 July 2015 at 17:14

Status	File	Size
new	interdiff-2501757-25-29.txt	1.09 KB
new	remove_safemarkup_set-2501757-29.patch	1.23 KB

It seems like the comment count is figured out elsewhere. It works this way.

Comment #30

cmanalansan commented 23 July 2015 at 18:02

Status:	Needs review	» Needs work

Working with cilefen on this at DrupalGovCon.

The patch in comment #25 is actually probably good.

I think the confusion is over the naming of $comment_count.

Patch coming.

Comment #31

cilefen commented 23 July 2015 at 18:07

Re #18 - it is not the comment count, it is the rendered comments to be snippet-ed in the search output. The count is evidently found some other way.

Comment #32

cmanalansan commented 23 July 2015 at 18:15

Status:	Needs work	» Needs review

Status	File	Size
new	remove_safemarkup_set-2501757-32.patch	1.04 KB

$comment_count renamed to $comments

Comment #33

cilefen commented 23 July 2015 at 19:02

Status	File	Size
new	interdiff-2501757-32-33.txt	1.44 KB
new	remove_safemarkup_set-2501757-33.patch	2.6 KB

Added an XSS test with a script in the comment subject.

Comment #34

commented 23 July 2015 at 19:36

Status:	Needs review	» Reviewed & tested by the community
Issue tags:	-Needs tests

I'll send this back to RTBC; thanks for the fixes. Let's see where @xjm is at with the latest iteration and use of SafeMarkup::format().

Comment #35

cilefen commented 23 July 2015 at 19:45

Isn't there an assertEscaped() we could use instead of assertRaw?

Comment #36

cilefen commented 23 July 2015 at 19:54

No, we can't because it gets wrapped in a <strong> in the search results.

Comment #37

alexpott commented 29 July 2015 at 09:11

Status:	Reviewed & tested by the community	» Needs review

Status	File	Size
new	2501757.37.patch	2.5 KB
new	33-37-interdiff.txt	1014 bytes

Reading search_excerpt(), I'm not sure that we need to worry about whether the $text input is safe. It removes all HTML. I think its docs need to be improved, probably by this patch. Also, I'm not sure why it's doing any SafeMarkup::checkPlain() inside it, since it has stripped all the tags, but that is a separate issue.

  // Prepare text by stripping HTML tags and decoding HTML entities.
  $text = strip_tags(str_replace(array('<', '>'), array(' <', '> '), $text));
  $text = Html::decodeEntities($text);
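The two lines quoted above can be tried in isolation with a rough plain-PHP sketch like the one below. Two things here are assumptions for illustration: strip_for_excerpt() is a hypothetical wrapper name, and PHP's html_entity_decode() stands in for Drupal's Html::decodeEntities(), which may not behave identically in every case.

```php
<?php

/**
 * Hypothetical wrapper around the preparation step quoted from search_excerpt().
 *
 * Assumption: html_entity_decode() approximates Html::decodeEntities().
 */
function strip_for_excerpt(string $text): string {
  // Pad angle brackets with spaces so words on either side of a tag do not
  // run together once the tag is removed, then drop the tags themselves.
  $text = strip_tags(str_replace(['<', '>'], [' <', '> '], $text));
  // Turn entities such as &amp; or &lt; back into literal characters.
  return html_entity_decode($text, ENT_QUOTES, 'UTF-8');
}

// Literal tags are removed; their text content is kept.
$plain = strip_for_excerpt('<p>Hello</p><script>alert(1)</script>');

// Entity-encoded markup is not a tag, so it survives stripping and is then
// decoded back into literal angle brackets.
$decoded = strip_for_excerpt('&lt;b&gt;');
```

One thing the sketch makes visible: because decoding happens after stripping, entity-encoded markup in the input comes back as literal angle brackets in the returned text. That may be relevant to the question of why any escaping happens later in the function, though the sketch itself says nothing about the real implementation beyond these two lines.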

Comment #38

yesct commented 31 July 2015 at 18:59

Assigned:	xjm	» Unassigned
Status:	Needs review	» Needs work

+++ b/core/modules/search/src/Tests/SearchCommentTest.php
@@ -152,6 +158,14 @@ function testSearchResultsComment() {
+    // Verify the evil comment subject is escaped in search results.
+    $this->drupalPostForm('search/node', $edit, t('Search'));
+    $this->assertRaw('alert(&#039;<strong>hello</strong>&#039;);');

This does not actually assert that the (script) HTML tag was stripped out.
(I did confirm that the escaped quotes make an alert not work even if it is in a script tag.)

I also looked at SearchExcerptTest and did not see any tests which verify that HTML tags are stripped out.

"Needs work" to:
i) improve the docs of search_excerpt() as @alexpott suggests in #37;
ii) improve the test, maybe by adding a test to SearchExcerptTest asserting HTML tags are stripped and/or to SearchCommentTest asserting there is no script tag; and
iii) update the comment-count code comment, maybe from "// Fetch comment count for snippet." to "// Fetch comments for snippet."

The rest of the patch looks ok.

Do we still want the issue @xjm asks for in #18?
a) To have the search snippet (or the "X comments") output be themeable output. And in an actual Twig template.

Unassigning from xjm: it was set that way back in June, when this was also remaining at RTBC but xjm had an action item for it. I think we are not waiting on that anymore.

Comment #39

jvandyk commented 2 August 2015 at 16:55

Examining this at DrupalCorn sprint.

Comment #40

jvandyk commented 2 August 2015 at 18:49

Status:	Needs work	» Needs review

Status	File	Size
new	2501757.40.patch	3.13 KB
new	interdiff.2501757.37.40.txt	1.25 KB

Setting to needs review for testbot.

Patch to address i) and iii) in #38. Will attempt ii) after caffeine.

Comment #41

jvandyk commented 2 August 2015 at 19:28

Status	File	Size
new	interdiff.2501757.40.41.txt	1.02 KB
new	2501757.41.patch	4.16 KB

Regarding ii) in #38, there is actually a test in testSearchExcerpt() that implicitly tests for stripping of HTML tags.

I updated the assertion text to reflect this. I don't think we need another test since HTML tag stripping is already being tested.

    $text = 'The <strong>quick</strong> <a href="#">brown</a> fox &amp; jumps <h2>over</h2> the lazy dog';
    // Note: The search_excerpt() function adds some extra spaces -- not
    // important for HTML formatting. Remove these for comparison.
    $expected = 'The quick brown fox &amp; jumps over the lazy dog';
    $result = preg_replace('| +|', ' ', search_excerpt('nothing', $text));
    $this->assertEqual(preg_replace('| +|', ' ', $result), $expected, 'Entire string, stripped of HTML tags, is returned when keyword is not found in short string');

Comment #42

AnnGreazel commented 2 August 2015 at 21:20

Issue summary:

Updating steps to test.

Comment #43

jvandyk commented 2 August 2015 at 21:22

Going through manual testing steps in summary, both HEAD and the patch in #41 result in the following markup from the search:

<div class="search-result__snippet-info">
      <p class="search-result__snippet">…               Textbefore &lt;script&gt;alert(&#039;XSS Body&#039;);&lt;/script&gt; <strong>textafter</strong>    
        
  
     
  
  
        

    
      
 …            Textbefore &lt;script&gt;alert(&#039;XSS Comment&#039;);&lt;/script&gt; <strong>textafter</strong>    
        
  

                  Delete        …</p>

Comment #44

yesct commented 3 August 2015 at 00:01

Issue summary:	View changes
Status:	Needs review	» Reviewed & tested by the community

Thanks for those changes, and for updating the issue summary and the manual testing.

All remaining tasks from the issue summary are done.

I looked through the patch again, and the thing I mentioned before (asserting that the script tag is not there) does not seem important enough to be stubborn about.

Comment #45

xjm commented 4 August 2015 at 17:06

Status:	Reviewed & tested by the community	» Needs work

Nice; this patch is looking much better than before. Phew! Thanks for the added test coverage and the manual testing.

+++ b/core/modules/search/src/Tests/SearchCommentTest.php
@@ -152,6 +158,14 @@ function testSearchResultsComment() {
+    $this->assertRaw('alert(&#039;<strong>hello</strong>&#039;);');

We should also probably pair this with asserting that <script> isn't there. Edit: So @YesCT, yes, you are right about that. :)

+++ b/core/modules/search/src/Tests/SearchExcerptTest.php
@@ -39,7 +39,7 @@ function testSearchExcerpt() {
-    $this->assertEqual(preg_replace('| +|', ' ', $result), $expected, 'Entire string is returned when keyword is not found in short string');
+    $this->assertEqual(preg_replace('| +|', ' ', $result), $expected, 'Entire string, stripped of HTML tags, is returned when keyword is not found in short string');

~~This changed assertion text doesn't seem to actually describe what the assertion is asserting. So I'm concerned about that.~~ Never mind. I understand now how this is what the assertion is asserting -- it's just a couple lines above. I'll look again and try to suggest a clarification.

Comment #46

commented 4 August 2015 at 17:28

Status:	Needs work	» Needs review

Status	File	Size
new	interdiff.txt	771 bytes
new	remove_safemarkup_set-2501757-46.patch	4.19 KB

This should cover the script tag check: in case there was whitespace or something, it ensures the tag doesn't exist at all in the raw output.
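The assertion added by the patch is not quoted in this thread, so the following is only a plain-PHP sketch of the idea, outside the test framework. The $raw string is a made-up sample modeled on the markup reported in #43, and the regular expression is one possible way to tolerate whitespace or attributes after the tag name; neither is taken from the patch.

```php
<?php

// Hypothetical raw page output, modeled on the snippet markup from #43.
$raw = '<p class="search-result__snippet">Textbefore '
  . '&lt;script&gt;alert(&#039;XSS&#039;);&lt;/script&gt; '
  . '<strong>textafter</strong></p>';

// Case-insensitive check for a literal opening script tag. The \b word
// boundary matches "<script>", "<script >" and "<script src=...>" alike.
$has_script_tag = preg_match('/<script\b/i', $raw) === 1;

// The escaped form should still be present, showing the subject was output
// as text rather than dropped entirely.
$has_escaped_form = strpos($raw, '&lt;script&gt;') !== false;
```

Checking both conditions together distinguishes "the tag was escaped" from "the comment never appeared in the results at all".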

Comment #47

kgoel commented 12 August 2015 at 17:12

Assigned:	Unassigned	» kgoel

I am going to review this.

Comment #48

kgoel commented 12 August 2015 at 19:35

Issue summary:

Comment #49

kgoel commented 13 August 2015 at 18:58

Status	File	Size
new	2501757-49.patch	4.81 KB
new	interdiff.txt	2.26 KB

Comment #50

kgoel commented 13 August 2015 at 19:08

Issue summary:

Comment #51

kgoel commented 13 August 2015 at 19:23

Issue summary: