Add SafeMarkup::remove() to free memory from marked strings when they're printed [#2488538]

Problem/Motivation

Steps to reproduce:

1. Install standard profile
2. Visit admin/user/permissions (profiling with xhprof or your profiler of choice)
3. Note that memory usage is similar to the following:

Total Incl. MemUse (bytes):	51,542,168 bytes
Total Incl. PeakMemUse (bytes):	55,488,944 bytes

4. Add a role and profile the permissions page again

memory usage goes up as follows:

Total Incl. MemUse (bytes):	52,504,248 bytes
Total Incl. PeakMemUse (bytes):	57,392,560 bytes

At three custom roles, memory usage had gone up as follows:

Total Incl. MemUse (bytes):	54,607,112 bytes
Total Incl. PeakMemUse (bytes):	61,356,968 bytes

Possible culprits:
SafeMarkup::set() - with a lot of checkboxes the array grows to mb.
#2488610: Use ModuleHander::getName() instead of rebuilding module data on permissions page
Html::cleanCssIdentifier() / Html::getId() / HtmlgetUniqueId() - combination of the strtr() and the static cache in HtmlgetUniqueId()

I think the culprit is SafeMarkup::set() in twig_render_template() - every checkbox is a template render. See attached xhprof screenshots.

Marking this critical for now because #2289201: [Meta] Make drupal install and run within reasonable php memory limits so we can reset the memory requirements to lower levels is setting the standard profile memory limit to 64mb, and it looks like five custom roles and/or additional permissions could easily send this over the top.

Proposed resolution

Evaluate SafeMarkup::set() and possible other causes of high memory usage, try to optimize.

Remaining tasks

User interface changes

None

API changes

TBD, probably additions.

Comment	File	Size	Author
#34	strtr.patch	2.18 KB	catch
#34
#32	Screen Shot 2015-05-18 at 5.30.51 PM.png	131.22 KB	catch
#30	2488538-30.patch	15.84 KB	catch
#30
#30	test_only.patch	1.35 KB	catch
#30
#22	interdiff.txt	2.28 KB	catch
#22	2488538-21.patch	15.3 KB	catch
#22
#18	2488538-17.patch	14.56 KB	catch
#18
#18	interdiff.txt	1.5 KB	catch
#16	interdiff.txt	8.6 KB	catch
#16	2488538-16.patch	14.11 KB	catch
#16
#11	2488538_safe_markup.patch	3.25 KB	catch
#11
#8	2488538_safe_markup.patch	2.94 KB	catch
#8
#7	2488538_safe_markup.patch	3.03 KB	catch
#7
#4	Screen Shot 2015-05-14 at 12.04.27 PM.png	121.07 KB	catch
#1	2488538_safe_markup.patch	1.5 KB	catch
#1
#1	Screen Shot 2015-05-14 at 11.41.32 AM.png	47.72 KB	catch
	Screen Shot 2015-05-14 at 11.08.04 AM.png	116.22 KB	catch
	Screen Shot 2015-05-14 at 11.07.41 AM.png	89.05 KB	catch

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Comment #1

catch

he/him

English

CreditAttribution: catch commented 14 May 2015 at 10:42

Status:

Active

» Needs review

File	Size
Screen Shot 2015-05-14 at 11.41.32 AM.png	47.72 KB
2488538_safe_markup.patch	1.5 KB

Here's a patch. Saves 500kb for me and couldn't spot any double-escaping issues. See what the bot says.

Comment #2

catch

he/him

English

CreditAttribution: catch commented 14 May 2015 at 10:55

Comment #3

Berdir

German

Switzerland

CreditAttribution: Berdir at MD Systems GmbH commented 14 May 2015 at 10:59

The 2mb are I guess with the default set of permissions, it will get bigger when you have more permissions.

Comment #4

catch

he/him

English

CreditAttribution: catch commented 14 May 2015 at 11:06

Issue summary:

View changes

File	Size
Screen Shot 2015-05-14 at 12.04.27 PM.png	121.07 KB

Yep.

Also looks like strtr() isn't helping, especially in Html::cleanCssIdentifier() and Html::getId().

Comment #5

14 May 2015 at 11:12

Status:

Needs review

» Needs work

The last submitted patch, 1: 2488538_safe_markup.patch, failed testing.

Comment #6

catch

he/him

English

CreditAttribution: catch commented 14 May 2015 at 11:20

Other culprits:

Html::getUniqueId() - although the static there looks like 125kb or so, most of it is from getId() and strtr().

system_rebuild_module_data() (between 4-5M) - although apart from additional modules that's a fixed cost, not from the checkboxes.

Comment #7

catch

he/him

English

CreditAttribution: catch commented 14 May 2015 at 12:08

Status:	Needs work	» Needs review
Issue tags:		+D8 Accelerate

File	Size
2488538_safe_markup.patch	3.03 KB

Adding a counter so that duplicate strings don't get removed the first time they're printed.

Also handle setMultiple(), placeholder() etc. that were directly manipulating the variable.

Comment #8

catch

he/him

English

CreditAttribution: catch commented 14 May 2015 at 12:15

File	Size
2488538_safe_markup.patch	2.94 KB

And without debug...

Comment #9

14 May 2015 at 12:22

The last submitted patch, 7: 2488538_safe_markup.patch, failed testing.

Comment #10

14 May 2015 at 12:43

Status:

Needs review

» Needs work

The last submitted patch, 8: 2488538_safe_markup.patch, failed testing.

Comment #11

catch

he/him

English

CreditAttribution: catch commented 14 May 2015 at 13:08

Status:

Needs work

» Needs review

File	Size
2488538_safe_markup.patch	3.25 KB

Broke batch due to strictness in setMultiple only accepting boolean true - this is no longer possible with the counter. Changed to is_numeric() for now.

Comment #12

14 May 2015 at 13:38

Status:

Needs review

» Needs work

The last submitted patch, 11: 2488538_safe_markup.patch, failed testing.

Comment #13

dawehner

German

CreditAttribution: dawehner as a volunteer commented 14 May 2015 at 18:01

+++ b/core/lib/Drupal/Core/Template/TwigExtension.php
@@ -354,6 +354,7 @@ public function escapeFilter(\Twig_Environment $env, $arg, $strategy = 'html', $
       if ($autoescape && SafeMarkup::isSafe($return, $strategy)) {
+        SafeMarkup::remove($return, $strategy);
         return $return;

How do we deal with having the same string rendered twice of the page, should we take that into account?

Comment #14

tim.plunkett

he/him

English

Philadelphia

CreditAttribution: tim.plunkett at Acquia commented 14 May 2015 at 18:29

+++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
@@ -75,7 +75,28 @@ class SafeMarkup {
+      if (static::$safeStrings[$string][$strategy] == 1) {
+        unset(static::$safeStrings[$string][$strategy]);
+      }
+      else {
+        static::$safeStrings[$string][$strategy]--;
+      }

This is intended to address that, right?

Comment #15

catch

he/him

English

CreditAttribution: catch commented 15 May 2015 at 11:19

Yep, patch takes that into account.

Remaining test failures are where that's falling down a bit, looking at those now.

Comment #16

catch

he/him

English

CreditAttribution: catch commented 15 May 2015 at 15:31

Status:

Needs work

» Needs review

File	Size
2488538-16.patch	14.11 KB

interdiff.txt	8.6 KB

New patch.

So this whole approach creates a restriction where if you generate a string and assign it to a variable, then re-use that variable multiple times (either concatenated or in a render array), then it can end up being double escaped. This happens in very few places in core though and most of those are within tests, the only one I'm concerned about here is the page title when we re-use $page['#title'] - will see what the test coverage says.

FilterAdminTest failures look like a real bug in core: We're supposed check that things aren't double-escaped, but in the page output and the assertion, they actually aren't escaped at all - see screenshot - and the test is testing for the wrong markup (but passes due to this). I wasn't able to reproduce that non-escaping with the standard profile though - need to try again with minimal.

Unless these changes have caused regressions in other tests, this should be green.

Comment #17

catch

he/him

English

CreditAttribution: catch commented 15 May 2015 at 16:01

Confirmed the filter/tips bug in HEAD.

To reproduce:

Configure Basic HTML to show the basic HTML formatting tips.
Visit filter/tips
Note that the first time the tips are displayed, the tips are escaped correctly.
Then note that the second time they're displayed, the tips aren't escaped properly - you see literal link and ampsersand in the examples.

Patch fixes that as a side-effect and the test update just makes the assertion correct.

Comment #18

catch

he/him

English

CreditAttribution: catch commented 15 May 2015 at 16:26

File	Size
interdiff.txt	1.5 KB
2488538-17.patch	14.56 KB

With docs for the two new methods.

Comment #19

dawehner

German

CreditAttribution: dawehner as a volunteer commented 15 May 2015 at 16:59

Just some review

+++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
@@ -75,7 +75,39 @@ class SafeMarkup {
+    if (!isset(static::$safeStrings[$string][$strategy])) {
+      static::$safeStrings[$string][$strategy] = 1;
+    }
+    else {
+      static::$safeStrings[$string][$strategy]++;
+    }
+    return $string;
+  }
+

It would be cool to have a quick comment about the strategy we apply here in order to save memory.

+++ b/core/lib/Drupal/Core/Render/MainContent/HtmlRenderer.php
@@ -263,7 +264,7 @@ protected function prepare(array $main_content, Request $request, RouteMatchInte
-    $title = isset($main_content['#title']) ? $main_content['#title'] : $this->titleResolver->getTitle($request, $route_match->getRouteObject());
+    $title = isset($main_content['#title']) ? SafeMarkup::set($main_content['#title']) : $this->titleResolver->getTitle($request, $route_match->getRouteObject());

We should document why its safe to set the #title as safe.

Comment #20

catch

he/him

English

CreditAttribution: catch commented 15 May 2015 at 16:59

Issue summary:

View changes

This patch won't be enough by itself, added the other problems found profiling to the issue summary.

Comment #21

Wim Leers

Ghent 🇧🇪🇪🇺

CreditAttribution: Wim Leers at Acquia commented 15 May 2015 at 17:26

+++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
@@ -75,7 +75,39 @@ class SafeMarkup {
   public static function set($string, $strategy = 'html') {
     $string = (string) $string;
-    static::$safeStrings[$string][$strategy] = TRUE;
+    if (!isset(static::$safeStrings[$string][$strategy])) {
+      static::$safeStrings[$string][$strategy] = 1;
+    }
+    else {
+      static::$safeStrings[$string][$strategy]++;
+    }
+    return $string;
+  }

Like @dawehner said: this really could use a comment.

Comment #22

catch

he/him

English

CreditAttribution: catch commented 15 May 2015 at 17:27

File	Size
2488538-21.patch	15.3 KB

interdiff.txt	2.28 KB

Added explanations per #19.

Comment #23

chx CreditAttribution: chx commented 15 May 2015 at 17:32

- $link_as_code = '<code>' . $link . ''<;/code>';
- $ampersand_as_code = ''<code>' . $ampersand . ''</code>';
+ $link_as_code = ''<code>' . SafeMarkup::checkPlain($link) . ''</code>';

*HUH* was it unsecure before? Why do we checkplain now if we didn't before?

Comment #24

Wim Leers

Ghent 🇧🇪🇪🇺

CreditAttribution: Wim Leers at Acquia commented 15 May 2015 at 17:32

Title:

Each role adds 2mb memory use to the permissions page with standard profile

» Each role adds 2 MB memory use to the permissions page with standard profile

Looks great!

+++ b/core/lib/Drupal/Core/Template/TwigExtension.php
@@ -354,6 +354,7 @@ public function escapeFilter(\Twig_Environment $env, $arg, $strategy = 'html', $
+        SafeMarkup::remove($return, $strategy);

Is this truly the only place that should (and could) call ::remove()?

Comment #25

catch

he/him

English

CreditAttribution: catch commented 15 May 2015 at 17:56

*HUH* was it unsecure before? Why do we checkplain now if we didn't before?

@chx: see #17 to reproduce the core bug.

This change is in a test, if you look at the semantics of the test, it's obvious that it's testing the wrong thing in HEAD. We look for unencoded HTML in-between code tags. If I was to write <code><foo> then as a raw HTML string it needs to be <code>&ltfoo&gt. (edit - and trying to nest code tags is even better...)

In HEAD when you show the HTML tag hints once, it's fine, but the second time (i.e. when two formats are configured to show them) it doesn't get escaped.

#2346209: /filter/tips improperly escaped added this - that was fixing an earlier regression where it was double escaped.

Comment #26

catch

he/him

English

CreditAttribution: catch commented 15 May 2015 at 17:57

@Wim there may be other places we could call remove() - didn't want to look for any more until resolving the test failures though.

Comment #27

catch

he/him

English

CreditAttribution: catch commented 15 May 2015 at 20:00

OK so more precise explanation of where the HEAD bug is:

 \Drupal\filter\Plugin\Filter\FilterHtml

// Paraphrased.
$tips[$tag][1] = '<a href="' . $base_url . '">' . SafeMarkup::checkPlain(\Drupal::config('system.site')->get('name')) . '</a>';

     array('data' => SafeMarkup::format('@var', array('@var' => $tips[$tag][1])), 'class' => array('type')),
          array('data' => SafeMarkup::format($tips[$tag][1]), 'class' => array('get'))

When you have two different formats configured to show the HTML formatting tips, the SafeMarkup calls run twice.

1, When $tips[$tag][1] is passed as @var, it's escaped and marked as safe.

2. When its passed to SafeMarkup::format() as the first argument, it's also marked as safe (unescaped).

3. When once again it's passed as @var, both the escaped and unescaped versions have both been marked as safe, so SafeMarkup doesn't bother to escape something it can see has already been escaped.

The problem is in this case that we actually want the 'double-escaping' here, because we're literally escaping the same string twice.

The patch here fixes that, because after the strings are rendered, they're taken out of the safeStrings array, so they can then be escaped again if needed.

This is a proper edge case where identical escaped and unescaped HTML is rendered more than once during the same request, but IMO it also balances out the other edge cases like the theme_links() test changes).

Comment #28

Fabianx CreditAttribution: Fabianx as a volunteer commented 16 May 2015 at 06:51

The idea is great, but I have huge concerns about the implementation of using a counter.

The problem appears when the string is copied to somewhere else and then checked. That usually does never happen and is an edge-case, but it can happen and then code uses SafeMarkup::check() and the string is marked safe just once.

However we came up with a better approach that directly ties in with this:

#2450993-24: Rendered Cache Metadata created during the main controller request gets lost

The state of the SafeMarkup is directly targeted at what we are rendering, that means if we instead of a global, use our 'rendering context', we have that informations available in a stack like state.

which means that we can remove all strings / reduce the counter after we have rendered the corresponding thing.

While it might look similar it is not, because if we e.g. render a block and it creates some 'safe strings', then we can discard all of that once we are finished with rendering that block (except for the block markup itself obviously).

However when we use a stack instead of a global static, we actually know which safe strings appeared where and we also know exactly when to remove them again.

I might be okay to just let this patch in for removing the criticalness and document clearly that copying a string you need to check if its safe and if it is, increment a usage counter, but it still feels a little brittle to me.

Comment #29

catch

he/him

English

CreditAttribution: catch commented 16 May 2015 at 09:22

Status:

Needs review

» Needs work

The state of the SafeMarkup is directly targeted at what we are rendering, that means if we instead of a global, use our 'rendering context', we have that informations available in a stack like state.

which means that we can remove all strings / reduce the counter after we have rendered the corresponding thing.

So there are three cases here where the behaviour changed in a noticeable way:

- the theme_links() test where the string is prepared then separately rendered multiple times (see test changes where I moved this to a method). I don't think the stack would help there - those would all be fresh rendering stacks no?

- the simpletest results page where images are put in the $image_map variable then re-used - here the stack might well help.

- the filter tips bug that this inadvertently found and fixed - we'd want to make sure there's not a regression.

Also if we completely empty the static when leaving a render scope, that might let us empty the static more completely than the manual call to SafeMarkup::remove() added here, which can't count all cases. So it's worth looking at definitely, but I'm not completely convinced it'll fix all the fragility issues.

I need to do another round of profiling here - as mentioned above this patch isn't in itself nearly enough to fix the overall memory issue on user/permissions it was just the most obvious one to tackle. Also CNW-ing myself since we can improve the filter tips test coverage - pretty sure the modified test would actually pass on HEAD by itself, but we need one that fails with the current bug.

Comment #30

catch

he/him

English

CreditAttribution: catch commented 18 May 2015 at 09:53

Status:

Needs work

» Needs review

File	Size
test_only.patch	1.35 KB

2488538-30.patch	15.84 KB

Here's that test coverage - uploading test-only.patch to show that HEAD fails with it, that's also the interdiff for the latest patch.

Profiling next. I'm not expecting to be this anywhere near sufficient saving to make admin/people/permissions scale, so we may have to open a new meta and make this a child issue.

Comment #31

18 May 2015 at 10:22

The last submitted patch, 30: test_only.patch, failed testing.

Comment #32

catch

he/him

English

CreditAttribution: catch commented 18 May 2015 at 16:43

File	Size
Screen Shot 2015-05-18 at 5.30.51 PM.png	131.22 KB

Uploading an xhprof screenshot.

This shows a diff between admin/people/permissions with just the standard profile, vs. after adding a role.

Can see from this that around 500kb is added by SafeMarkup::set().

1.5mb is added by strtr() - nearly all of this from Html::cleanCssIdentifier() and Html::getId()

Unfortunately the patch only reduces the SafeMarkup::set() overhead by about 250kb at the moment. About 200kb is reduction of the baseline memory usage of the overall request, with about 50kb reduction per-role. So still more to do here.

Comment #33

Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commented 18 May 2015 at 17:39

Might seem totally counter-productive (and might not be possible due to Form API), but we should try if early rendering brings any benefit or where those 2 MB per role come from really.

Comment #34

catch

he/him

English

CreditAttribution: catch commented 18 May 2015 at 20:59

Issue tags:

+Performance

File	Size
strtr.patch	2.18 KB

Factored out strtr() from cleanCssIdentifier() and getId() since this was the single highest contributor to memory usage.

Replaced with preg_replace() - which appears to have lower memory usage.

However, at least measuring with xhprof, I end up with the same overall memory usage for the request. Less from strtr(), preg_replace() doesn't add much back, xhprof measures extra bits for things like element_children() which doesn't make sense.

This brings up the question of whether strtr() is actually using that much memory, vs. whether preg_replace() really isn't using more memory. This can probably only be answered by not using xhprof and doing a raw comparison of memory_get_peak_usage() or similar.

Patch passes unit tests at least so posting, but not sure whether this is really doing anything or not.

Comment #35

catch

he/him

English

CreditAttribution: catch commented 19 May 2015 at 12:36

I lost a comment here.

Did some testing with no xhprof - just memory_get_peak_usage() in index.php. I can reproduce the ~250kb improvement with #30 consistently. However I can also reproduce no improvement with the strtr()/preg_replace() change - which is good there's no memory leak in PHP, but no saving to be had there.

Also xhprof was causing about 10mb of overhead in total - getting numbers closer to 50mb peak memory usage - but still a 2mb increase in memory usage for each additional role. So a few content types and/or more roles and/or more modules installed and this is going to get over quickly.

Comment #36

catch

he/him

English

CreditAttribution: catch commented 21 May 2015 at 13:28

#2488610: Use ModuleHander::getName() instead of rebuilding module data on permissions page was committed. This doesn't help the increase with additional roles, but it gives us an (at least 3mb) lower baseline to start from. That makes me feel like most of what' remaining here is major - in that it'll take more than a couple of roles or several modules before the page goes over 64mb. And the SafeMarkup change can go into a minor version IMO.

However before downgrading, I want to profile the form submission - we disable certain caches on POST and that might end up closer to the 64mb limit.

Comment #37

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Chapter Three commented 21 May 2015 at 18:44

If your site has a 64Mb limit - which is our current minimum - the permissions page will break on PHP5.6 with 15 roles.

Comment #38

aspilicious CreditAttribution: aspilicious commented 21 May 2015 at 18:50

If you have 15 roles you shouldn't be on shared hosting ;)

Comment #39

larowlan

🇦🇺🏝.au GMT+10

CreditAttribution: larowlan as a volunteer commented 24 May 2015 at 00:26

+++ b/core/lib/Drupal/Component/Utility/Html.php
@@ -215,17 +216,18 @@ public static function getUniqueId($id) {
+    // Use three separate preg_replace() calls since PHP has a memory leak when
+    // passing array arguments to preg_replace().

Comment #35 indicates this is no longer the case? Does that mean we need a re-think of this change? At very least need to remove that comment?

Comment #40

catch

he/him

English

CreditAttribution: catch commented 26 May 2015 at 10:15

Title:	Each role adds 2 MB memory use to the permissions page with standard profile	» Add SafeMarkup::remove() to free memory from marked strings when they're printed
Priority:	Critical	» Major

@larowlan as far as I can tell the patch from #35 only affects xhprof's measurement of memory usage, not actual memory usage (when measured with memory_get_peak_usage() in index.php), so should probably be abandoned.

#34 definitely is saving some memory in real terms, just not as much as I'd like.

Now that #2488610: Use ModuleHander::getName() instead of rebuilding module data on permissions page is in, there is more 'head room' on user permissions - the memory usage still goes up pretty quick when adding roles/permissions, but it's starting around 44M for me. So I'm going to re-title this to be about SafeMarkup::remove() and downgrade to major.

Comment #41

Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commented 26 May 2015 at 10:21

Status:

Needs review

» Needs work

And I am gonna mark it needs work for now, because the ::remove still makes me very uncomfortable without a clear defined 'scope' or 'canvas' or 'stack' where those things are used on. Simple reference counting is dangerous as we cannot really keep track of duplicated strings.

Comment #42

xjm

she/her

English

CreditAttribution: xjm at Acquia commented 7 June 2015 at 16:09

Issue tags:

+SafeMarkup

I'm also concerned about the new scope of the issue, because with the current way strings are stored, I believe this would result in the string being double-escaped unintentionally if it was used a second time on the page. An easy case of this would be an HTML filtered field that's printed both in the main content area and the sidebar of the page.

Edit: I totally missed that catch already addressed that in the recent patch. ^ Edit: But also @Fabianx indicates that it may still be an issue...

Comment #43

xjm

she/her

English

CreditAttribution: xjm at Acquia commented 7 June 2015 at 16:14

Issue tags:

+Needs tests

Could we look into that concern by adding some test cases?

Comment #44

catch

he/him

English

CreditAttribution: catch commented 7 June 2015 at 16:31

#30 doesn't address Fabianx's comment. That patch uses a global counter, and it causes double escaping in exactly the case Fabianx talked about, which is the simpletest UI where it creates a safe string then uses it in multiple table rows. For that use-case I had to explicitly mark the string as safe each time it was re-used.

However, I'm not convinced that the render scope change would help with the filter tips UI, where we want to show some HTML both escaped and unescaped twice - in that case HEAD is broken and the patch fixes it. If that case happened within the same overall render scope, then HEAD would still be broken.

So we have two cases in core (outside of actual tests) where the patch changes behaviour noticeably, one of them causes a bug, one of them fixes one.

The big difference with the render scope is we could completely empty the static cache when we leave scope, which would likely save a lot more memory - but I don't think it will help with the filter tips issue, since afaik that happens within the same overall scope.

Vancouver

CreditAttribution: joelpittet as a volunteer commented 10 June 2015 at 00:18

@effulgentsia mentioned on a call that we could possibly use Twig's \Twig_Markup class to help with the memory usage as the return'd object from Renderer::doRender(). Going to open up a new issue to give this a stab. #2503447: Use Twig_Markup to help with the SafeMarkup::$safeStrings memory load from Renderer::doRender()

FYI I was looking for a place to use it to help out with this memory issue.