Problem/Motivation

In the BigPipe D8.1 core issue, @effulgentsia said this at #2469431-258: BigPipe for auth users: first send+render the cheap parts of the page, then the expensive parts.6:

+++ b/core/modules/big_pipe/src/Render/BigPipe.php
@@ -0,0 +1,431 @@
+    list($pre_body, $post_body) = explode('</body>', $content, 2);

This can catch an earlier occurrence of </body> than the one we want. For example, such a string within a CDATA section or a JS string value within inline JS. I didn't check if other places where we do similar string parsing is similarly brittle.

The complete set of places with such brittle parsing:

  1. In \Drupal\big_pipe\Render\BigPipe::sendContent():
    list($pre_body, $post_body) = explode('</body>', $content, 2);
    

  2. In \Drupal\big_pipe\Render\BigPipe::getPlaceholderOrder():
    $fragments = explode('<div data-big-pipe-placeholder-id="', $html);
    …
    $t = explode('"></div>', $fragment, 2);
    
  3. In \Drupal\big_pipe\Render\BigPipe::sendPreBody:
    list($pre_scripts_bottom, $scripts_bottom, $post_scripts_bottom) = explode('<drupal-big-pipe-scripts-bottom-marker>', $pre_body, 3);
    
  4. in \Drupal\big_pipe\Render\BigPipe::sendNoJsPlaceholders():
        // Split the HTML on every no-JS placeholder string.
        $prepare_for_preg_split = function ($placeholder_string) {
          return '(' . preg_quote($placeholder_string, '/') . ')';
        };
        $preg_placeholder_strings = array_map($prepare_for_preg_split, array_keys($no_js_placeholders));
        $fragments = preg_split('/' . implode('|', $preg_placeholder_strings) . '/', $html, NULL, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    

Proposed resolution

Per #16, only fix the most probable (yet still quite improbable) edge case: a second closing </body> tag. The others would have to be intentionally constructed HTML, and if somebody can insert that, they can just as well insert malicious HTML.

Remaining tasks

None.

User interface changes

None.

API changes

None.

Data model changes

None.

Comments

Wim Leers created an issue. See original summary.

Wim Leers’s picture

Project: BigPipe » Drupal core
Version: 8.x-1.x-dev » 8.1.x-dev
Component: Code » big_pipe.module
Wim Leers’s picture

Issue summary: View changes
Wim Leers’s picture

Issue summary: View changes

Other occurrences of brittle parsing:

  1. In \Drupal\big_pipe\Render\BigPipe::getPlaceholderOrder():
    $fragments = explode('<div data-big-pipe-placeholder-id="', $html);
    …
    $t = explode('"></div>', $fragment, 2);
    
  2. In \Drupal\big_pipe\Render\BigPipe::sendPreBody:
    list($pre_scripts_bottom, $scripts_bottom, $post_scripts_bottom) = explode('<drupal-big-pipe-scripts-bottom-marker>', $pre_body, 3);
    

IS updated.

Wim Leers’s picture

Issue summary: View changes

Oh and one more, in \Drupal\big_pipe\Render\BigPipe::sendNoJsPlaceholders():

    // Split the HTML on every no-JS placeholder string.
    $prepare_for_preg_split = function ($placeholder_string) {
      return '(' . preg_quote($placeholder_string, '/') . ')';
    };
    $preg_placeholder_strings = array_map($prepare_for_preg_split, array_keys($no_js_placeholders));
    $fragments = preg_split('/' . implode('|', $preg_placeholder_strings) . '/', $html, NULL, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
Wim Leers’s picture

So, working towards a proposed resolution…

  • We could theoretically use \DOMDocument-based parsing. I.e. DOM parsing, not string parsing. That'd work for cases 1–3, but not case 4, because no-JS placeholders may be embedded inside attribute values or even text nodes. That could theoretically be done using DOM parsing, but would be extremely slow.
  • A much bigger problem is that \DOMDocument doesn't support HTML5. It only supports XHTML. #2429363: Add HTML5-lib to Drupal 8 core for the filter system and for the testing system added a library to support HTML5 parsing. But that means the parsing would be done in PHP, not in C, and hence be even slower.

Given all that, I'm not sure how we can solve this.

Wim Leers’s picture

Priority: Normal » Minor
Issue summary: View changes

Since November 19, when BigPipe 8.x-1.0-beta1 for Drupal 8.0.x was tagged (https://www.drupal.org/node/2619070), not a single person has run into this problem. That's 3.5 months.

Therefore I think it's fair to say this is minor.

Even more so because CDATA is really XML, not HTML5, even though HTML5 still allows it.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.0-beta1 was released on March 2, 2016, which means new developments and disruptive changes should now be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Wim Leers’s picture

Version: 8.2.x-dev » 8.1.x-dev
mondrake’s picture

Issue summary: View changes
FileSize
21.66 KB
40.23 KB

I'm not sure this is the case, but using BigPipe with the Devel module and Kint I get some JS rendered as text when inspecting an entity through the Devel tab, and cannot expand the Kint debug

without BigPipe it all works fine

Tim Banks’s picture

I am getting the same behaviour as mondrake. A fix is pretty important for install profiles that already use bigpipe as you cant just turn it off... Any chance a fix might get looked at again?

Wim Leers’s picture

#11: also using Kint?

Wim Leers’s picture

Reproduced the problems with Kint.

Wow, Kint seems very poorly integrated:

  1. it dumps its output immediately, even if there is no page output yet. This seems to somehow break the chunked encoding that BigPipe uses. So if you do kint($page) in quickedit_page_attachments() for example, it will completely break Drupal while BigPipe is enabled. You literally won't get any Drupal HTML, just Kint's <script> tag plus The website encountered an unexpected error. Please try again later.<br />.
  2. it uses document.write() (gasp!) to load

This makes sense from the POV that Kint wants to be self-contained and robust, regardless of where it is used. But I think we need to more carefully integrate the module into Drupal to prevent these kinds of problems.

podarok’s picture

#10 confirmed on my end with kint + devel

Wim Leers’s picture

Triaged by @Fabianx and I at DrupalCon New Orleans.

We agree this needs to be fixed, but we both think it is in fact minor: it can be fixed at any time, will not require BC breaks. The Kint solution is tangentially related, but not exactly, so we moved that to a separate issue: #2723709: Investigate Kint + BigPipe compatibility.

Wim Leers’s picture

Assigned: Unassigned » Wim Leers

Discussed with @Fabianx.

14:27:11 <Fabianx-screen> For https://www.drupal.org/node/2678662 I think we need to see how robust text parsing would work.
14:27:19 <Fabianx-screen> Obviously explode is very uhm rudimentary.
14:27:34 <Fabianx-screen> I just lack the necessary reg ex skills to implemented that.
14:27:54 <Fabianx-screen> Some of the sec folks might help as they do these kinda complicated matches and replacements all the time.
14:28:09 <Fabianx-screen> e.g.
14:28:35 <WimLeers> regex would also be brittle
14:28:35 <Fabianx-screen> for the placeholder maybe we should add <!-- --> sections behind and before the placeholder div.
14:28:37 <WimLeers> and slow
14:28:55 <Fabianx-screen> That would solve most problems as it being in accidental things is quite unlikely.
14:29:07 <WimLeers> right, but </body> is the most likely thing anyway
14:29:11 <Fabianx-screen> yeah
14:29:25 <WimLeers> so that's the one we *really* need to solve
14:29:29 <WimLeers> although I'm not convinced we do
14:29:29 <Fabianx-screen> And that we could still solve with strrev(), explode, strrev().
14:29:32 <WimLeers> CDATA is XML
14:29:34 <WimLeers> not HTML5
14:29:35 <Fabianx-screen> quite easily.
14:29:41 <WimLeers> which Drupal 8 site is going to use CDATA?
14:30:49 <WimLeers> and yeah, +1 for strrev, explode, strrev
14:30:59 <WimLeers> I think that's a "good enough" solution
14:31:04 <WimLeers> let's just do that and call it a day?
14:31:23 <WimLeers> it's all theoretical, and then we've addressed the most probable theoretical proble
14:31:27 <WimLeers> that's good enough, I say
14:31:37 <WimLeers> not our fault that PHP's DOM parser is absolute shit
14:55:02 <Fabianx-screen> WimLeers agree 100$
14:55:04 <Fabianx-screen> 100%
14:55:06 <Fabianx-screen> :)
14:55:09 <WimLeers> 100$ :D

So, this issue is going to solve the most-probable (yet still quite improbable) edge case in a simple, sane way. If we encounter these other edge cases in the real world, we'll deal with them then. Hopefully PHP will by then have a working HTML5 parser.

Assigning to me because I'll be working on this in the next few days.

Wim Leers’s picture

Issue summary: View changes
Status: Active » Needs review
FileSize
4.74 KB

To start, here's regression test coverage showing that a second </body> tag is a problem. This test should fail.

Wim Leers’s picture

FileSize
5.86 KB
1.15 KB

And here's the fix.

The last submitted patch, 17: 2678662-17.patch, failed testing.

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

RTBC - Looks great to me!

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 18: 2678662-18.patch, failed testing.

Fabianx’s picture

Status: Needs work » Reviewed & tested by the community
Issue tags: +Random test failure

Back to RTBC, random test failure

alexpott’s picture

+++ b/core/modules/big_pipe/src/Render/BigPipe.php
@@ -126,7 +126,11 @@ public function sendContent($content, array $attachments) {
+    list($pre_body, $post_body) = array_reverse(array_map('strrev', explode(strrev('</body>'), strrev($content), 2)));

I wonder if this could be better written as:

$parts = explode('</body>', $content);
$post_body = array_pop($parts);
$pre_body = implode('', $parts);
Wim Leers’s picture

That'd work too. It's probably easier to understand.

Fabian, do you agree?

alexpott’s picture

Status: Reviewed & tested by the community » Needs work

The performance looks quite similar... https://3v4l.org/4Wnhp/perf#tabs vs https://3v4l.org/CfDGB/perf#tabs

I think #23 is much simpler to understand than all the reversing and more maintainable - so setting to needs work for that.

Wim Leers’s picture

+1

Wim Leers’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
1.06 KB
5.86 KB

Done.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 27: 2678662-27.patch, failed testing.

Fabianx’s picture

Status: Needs work » Reviewed & tested by the community

Back to RTBC, still looks like a random test fail to me.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed and pushed f95fc93c84b93e1cafe9892b8942d1846b773e0f to 8.2.x and 518e32b to 8.1.x. Thanks!

  • alexpott committed f95fc93 on 8.2.x
    Issue #2678662 by Wim Leers, mondrake, Fabianx, alexpott: Ensure BigPipe...

  • alexpott committed 518e32b on 8.1.x
    Issue #2678662 by Wim Leers, mondrake, Fabianx, alexpott: Ensure BigPipe...
mondrake’s picture

Checked post commit, and no longer see #10 happening. Thanks!

Wim Leers’s picture

Great!

  • alexpott committed f95fc93 on 8.3.x
    Issue #2678662 by Wim Leers, mondrake, Fabianx, alexpott: Ensure BigPipe...

  • alexpott committed f95fc93 on 8.3.x
    Issue #2678662 by Wim Leers, mondrake, Fabianx, alexpott: Ensure BigPipe...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.