Problem/Motivation

Steps to reproduce:

1. Install standard profile
2. Visit admin/user/permissions (profiling with xhprof or your profiler of choice)
3. Note that memory usage is similar to the following:

Total Incl. MemUse (bytes):	51,542,168 bytes
Total Incl. PeakMemUse (bytes):	55,488,944 bytes

4. Add a role and profile the permissions page again

memory usage goes up as follows:

Total Incl. MemUse (bytes):	52,504,248 bytes
Total Incl. PeakMemUse (bytes):	57,392,560 bytes

At three custom roles, memory usage had gone up as follows:

Total Incl. MemUse (bytes):	54,607,112 bytes
Total Incl. PeakMemUse (bytes):	61,356,968 bytes

Possible culprits:
SafeMarkup::set() - with a lot of checkboxes the array grows to mb.
#2488610: Use ModuleHander::getName() instead of rebuilding module data on permissions page
Html::cleanCssIdentifier() / Html::getId() / HtmlgetUniqueId() - combination of the strtr() and the static cache in HtmlgetUniqueId()

I think the culprit is SafeMarkup::set() in twig_render_template() - every checkbox is a template render. See attached xhprof screenshots.

Marking this critical for now because #2289201: [Meta] Make drupal install and run within reasonable php memory limits so we can reset the memory requirements to lower levels is setting the standard profile memory limit to 64mb, and it looks like five custom roles and/or additional permissions could easily send this over the top.

Proposed resolution

Evaluate SafeMarkup::set() and possible other causes of high memory usage, try to optimize.

Remaining tasks

User interface changes

None

API changes

TBD, probably additions.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

catch’s picture

Status: Active » Needs review
FileSize
47.72 KB
1.5 KB

Here's a patch. Saves 500kb for me and couldn't spot any double-escaping issues. See what the bot says.

catch’s picture

Berdir’s picture

The 2mb are I guess with the default set of permissions, it will get bigger when you have more permissions.

catch’s picture

Issue summary: View changes
FileSize
121.07 KB

Yep.

Also looks like strtr() isn't helping, especially in Html::cleanCssIdentifier() and Html::getId().

Status: Needs review » Needs work

The last submitted patch, 1: 2488538_safe_markup.patch, failed testing.

catch’s picture

Other culprits:

Html::getUniqueId() - although the static there looks like 125kb or so, most of it is from getId() and strtr().

system_rebuild_module_data() (between 4-5M) - although apart from additional modules that's a fixed cost, not from the checkboxes.

catch’s picture

Status: Needs work » Needs review
Issue tags: +D8 Accelerate
FileSize
3.03 KB

Adding a counter so that duplicate strings don't get removed the first time they're printed.

Also handle setMultiple(), placeholder() etc. that were directly manipulating the variable.

catch’s picture

FileSize
2.94 KB

And without debug...

The last submitted patch, 7: 2488538_safe_markup.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 8: 2488538_safe_markup.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
3.25 KB

Broke batch due to strictness in setMultiple only accepting boolean true - this is no longer possible with the counter. Changed to is_numeric() for now.

Status: Needs review » Needs work

The last submitted patch, 11: 2488538_safe_markup.patch, failed testing.

dawehner’s picture

+++ b/core/lib/Drupal/Core/Template/TwigExtension.php
@@ -354,6 +354,7 @@ public function escapeFilter(\Twig_Environment $env, $arg, $strategy = 'html', $
       if ($autoescape && SafeMarkup::isSafe($return, $strategy)) {
+        SafeMarkup::remove($return, $strategy);
         return $return;

How do we deal with having the same string rendered twice of the page, should we take that into account?

tim.plunkett’s picture

+++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
@@ -75,7 +75,28 @@ class SafeMarkup {
+      if (static::$safeStrings[$string][$strategy] == 1) {
+        unset(static::$safeStrings[$string][$strategy]);
+      }
+      else {
+        static::$safeStrings[$string][$strategy]--;
+      }

This is intended to address that, right?

catch’s picture

Yep, patch takes that into account.

Remaining test failures are where that's falling down a bit, looking at those now.

catch’s picture

Status: Needs work » Needs review
FileSize
14.11 KB
8.6 KB

New patch.

So this whole approach creates a restriction where if you generate a string and assign it to a variable, then re-use that variable multiple times (either concatenated or in a render array), then it can end up being double escaped. This happens in very few places in core though and most of those are within tests, the only one I'm concerned about here is the page title when we re-use $page['#title'] - will see what the test coverage says.

FilterAdminTest failures look like a real bug in core: We're supposed check that things aren't double-escaped, but in the page output and the assertion, they actually aren't escaped at all - see screenshot - and the test is testing for the wrong markup (but passes due to this). I wasn't able to reproduce that non-escaping with the standard profile though - need to try again with minimal.

Unless these changes have caused regressions in other tests, this should be green.

catch’s picture

Confirmed the filter/tips bug in HEAD.

To reproduce:

Configure Basic HTML to show the basic HTML formatting tips.
Visit filter/tips
Note that the first time the tips are displayed, the tips are escaped correctly.
Then note that the second time they're displayed, the tips aren't escaped properly - you see literal link and ampsersand in the examples.

Patch fixes that as a side-effect and the test update just makes the assertion correct.

catch’s picture

FileSize
1.5 KB
14.56 KB

With docs for the two new methods.

dawehner’s picture

Just some review

  1. +++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
    @@ -75,7 +75,39 @@ class SafeMarkup {
    +    if (!isset(static::$safeStrings[$string][$strategy])) {
    +      static::$safeStrings[$string][$strategy] = 1;
    +    }
    +    else {
    +      static::$safeStrings[$string][$strategy]++;
    +    }
    +    return $string;
    +  }
    +
    

    It would be cool to have a quick comment about the strategy we apply here in order to save memory.

  2. +++ b/core/lib/Drupal/Core/Render/MainContent/HtmlRenderer.php
    @@ -263,7 +264,7 @@ protected function prepare(array $main_content, Request $request, RouteMatchInte
    -    $title = isset($main_content['#title']) ? $main_content['#title'] : $this->titleResolver->getTitle($request, $route_match->getRouteObject());
    +    $title = isset($main_content['#title']) ? SafeMarkup::set($main_content['#title']) : $this->titleResolver->getTitle($request, $route_match->getRouteObject());
    

    We should document why its safe to set the #title as safe.

catch’s picture

Issue summary: View changes

This patch won't be enough by itself, added the other problems found profiling to the issue summary.

Wim Leers’s picture

+++ b/core/lib/Drupal/Component/Utility/SafeMarkup.php
@@ -75,7 +75,39 @@ class SafeMarkup {
   public static function set($string, $strategy = 'html') {
     $string = (string) $string;
-    static::$safeStrings[$string][$strategy] = TRUE;
+    if (!isset(static::$safeStrings[$string][$strategy])) {
+      static::$safeStrings[$string][$strategy] = 1;
+    }
+    else {
+      static::$safeStrings[$string][$strategy]++;
+    }
+    return $string;
+  }

Like @dawehner said: this really could use a comment.

catch’s picture

FileSize
15.3 KB
2.28 KB

Added explanations per #19.

chx’s picture

- $link_as_code = '<code>' . $link . ''<;/code>';
- $ampersand_as_code = ''<code>' . $ampersand . ''</code>';
+ $link_as_code = ''<code>' . SafeMarkup::checkPlain($link) . ''</code>';

*HUH* was it unsecure before? Why do we checkplain now if we didn't before?

Wim Leers’s picture

Title: Each role adds 2mb memory use to the permissions page with standard profile » Each role adds 2 MB memory use to the permissions page with standard profile

Looks great!

+++ b/core/lib/Drupal/Core/Template/TwigExtension.php
@@ -354,6 +354,7 @@ public function escapeFilter(\Twig_Environment $env, $arg, $strategy = 'html', $
+        SafeMarkup::remove($return, $strategy);

Is this truly the only place that should (and could) call ::remove()?

catch’s picture

*HUH* was it unsecure before? Why do we checkplain now if we didn't before?

@chx: see #17 to reproduce the core bug.

This change is in a test, if you look at the semantics of the test, it's obvious that it's testing the wrong thing in HEAD. We look for unencoded HTML in-between code tags. If I was to write <code><foo> then as a raw HTML string it needs to be <code>&ltfoo&gt. (edit - and trying to nest code tags is even better...)

In HEAD when you show the HTML tag hints once, it's fine, but the second time (i.e. when two formats are configured to show them) it doesn't get escaped.

#2346209: /filter/tips improperly escaped added this - that was fixing an earlier regression where it was double escaped.

catch’s picture

@Wim there may be other places we could call remove() - didn't want to look for any more until resolving the test failures though.

catch’s picture

OK so more precise explanation of where the HEAD bug is:

 \Drupal\filter\Plugin\Filter\FilterHtml
// Paraphrased.
$tips[$tag][1] = '<a href="' . $base_url . '">' . SafeMarkup::checkPlain(\Drupal::config('system.site')->get('name')) . '</a>';

     array('data' => SafeMarkup::format('@var', array('@var' => $tips[$tag][1])), 'class' => array('type')),
          array('data' => SafeMarkup::format($tips[$tag][1]), 'class' => array('get'))

When you have two different formats configured to show the HTML formatting tips, the SafeMarkup calls run twice.

1, When $tips[$tag][1] is passed as @var, it's escaped and marked as safe.

2. When its passed to SafeMarkup::format() as the first argument, it's also marked as safe (unescaped).

3. When once again it's passed as @var, both the escaped and unescaped versions have both been marked as safe, so SafeMarkup doesn't bother to escape something it can see has already been escaped.

The problem is in this case that we actually want the 'double-escaping' here, because we're literally escaping the same string twice.

The patch here fixes that, because after the strings are rendered, they're taken out of the safeStrings array, so they can then be escaped again if needed.

This is a proper edge case where identical escaped and unescaped HTML is rendered more than once during the same request, but IMO it also balances out the other edge cases like the theme_links() test changes).

Fabianx’s picture

The idea is great, but I have huge concerns about the implementation of using a counter.

The problem appears when the string is copied to somewhere else and then checked. That usually does never happen and is an edge-case, but it can happen and then code uses SafeMarkup::check() and the string is marked safe just once.

However we came up with a better approach that directly ties in with this:

#2450993-24: Rendered Cache Metadata created during the main controller request gets lost

The state of the SafeMarkup is directly targeted at what we are rendering, that means if we instead of a global, use our 'rendering context', we have that informations available in a stack like state.

which means that we can remove all strings / reduce the counter after we have rendered the corresponding thing.

While it might look similar it is not, because if we e.g. render a block and it creates some 'safe strings', then we can discard all of that once we are finished with rendering that block (except for the block markup itself obviously).

However when we use a stack instead of a global static, we actually know which safe strings appeared where and we also know exactly when to remove them again.

--

I might be okay to just let this patch in for removing the criticalness and document clearly that copying a string you need to check if its safe and if it is, increment a usage counter, but it still feels a little brittle to me.

catch’s picture

Status: Needs review » Needs work

The state of the SafeMarkup is directly targeted at what we are rendering, that means if we instead of a global, use our 'rendering context', we have that informations available in a stack like state.

which means that we can remove all strings / reduce the counter after we have rendered the corresponding thing.

So there are three cases here where the behaviour changed in a noticeable way:

- the theme_links() test where the string is prepared then separately rendered multiple times (see test changes where I moved this to a method). I don't think the stack would help there - those would all be fresh rendering stacks no?

- the simpletest results page where images are put in the $image_map variable then re-used - here the stack might well help.

- the filter tips bug that this inadvertently found and fixed - we'd want to make sure there's not a regression.

Also if we completely empty the static when leaving a render scope, that might let us empty the static more completely than the manual call to SafeMarkup::remove() added here, which can't count all cases. So it's worth looking at definitely, but I'm not completely convinced it'll fix all the fragility issues.

I need to do another round of profiling here - as mentioned above this patch isn't in itself nearly enough to fix the overall memory issue on user/permissions it was just the most obvious one to tackle. Also CNW-ing myself since we can improve the filter tips test coverage - pretty sure the modified test would actually pass on HEAD by itself, but we need one that fails with the current bug.

catch’s picture

Status: Needs work » Needs review
FileSize
1.35 KB
15.84 KB

Here's that test coverage - uploading test-only.patch to show that HEAD fails with it, that's also the interdiff for the latest patch.

Profiling next. I'm not expecting to be this anywhere near sufficient saving to make admin/people/permissions scale, so we may have to open a new meta and make this a child issue.

The last submitted patch, 30: test_only.patch, failed testing.

catch’s picture

Uploading an xhprof screenshot.

This shows a diff between admin/people/permissions with just the standard profile, vs. after adding a role.

Can see from this that around 500kb is added by SafeMarkup::set().

1.5mb is added by strtr() - nearly all of this from Html::cleanCssIdentifier() and Html::getId()

Unfortunately the patch only reduces the SafeMarkup::set() overhead by about 250kb at the moment. About 200kb is reduction of the baseline memory usage of the overall request, with about 50kb reduction per-role. So still more to do here.

Fabianx’s picture

Might seem totally counter-productive (and might not be possible due to Form API), but we should try if early rendering brings any benefit or where those 2 MB per role come from really.

catch’s picture

Issue tags: +Performance
FileSize
2.18 KB

Factored out strtr() from cleanCssIdentifier() and getId() since this was the single highest contributor to memory usage.

Replaced with preg_replace() - which appears to have lower memory usage.

However, at least measuring with xhprof, I end up with the same overall memory usage for the request. Less from strtr(), preg_replace() doesn't add much back, xhprof measures extra bits for things like element_children() which doesn't make sense.

This brings up the question of whether strtr() is actually using that much memory, vs. whether preg_replace() really isn't using more memory. This can probably only be answered by not using xhprof and doing a raw comparison of memory_get_peak_usage() or similar.

Patch passes unit tests at least so posting, but not sure whether this is really doing anything or not.

catch’s picture

I lost a comment here.

Did some testing with no xhprof - just memory_get_peak_usage() in index.php. I can reproduce the ~250kb improvement with #30 consistently. However I can also reproduce no improvement with the strtr()/preg_replace() change - which is good there's no memory leak in PHP, but no saving to be had there.

Also xhprof was causing about 10mb of overhead in total - getting numbers closer to 50mb peak memory usage - but still a 2mb increase in memory usage for each additional role. So a few content types and/or more roles and/or more modules installed and this is going to get over quickly.

catch’s picture

#2488610: Use ModuleHander::getName() instead of rebuilding module data on permissions page was committed. This doesn't help the increase with additional roles, but it gives us an (at least 3mb) lower baseline to start from. That makes me feel like most of what' remaining here is major - in that it'll take more than a couple of roles or several modules before the page goes over 64mb. And the SafeMarkup change can go into a minor version IMO.

However before downgrading, I want to profile the form submission - we disable certain caches on POST and that might end up closer to the 64mb limit.

alexpott’s picture

If your site has a 64Mb limit - which is our current minimum - the permissions page will break on PHP5.6 with 15 roles.

aspilicious’s picture

If you have 15 roles you shouldn't be on shared hosting ;)

larowlan’s picture

+++ b/core/lib/Drupal/Component/Utility/Html.php
@@ -215,17 +216,18 @@ public static function getUniqueId($id) {
+    // Use three separate preg_replace() calls since PHP has a memory leak when
+    // passing array arguments to preg_replace().

Comment #35 indicates this is no longer the case? Does that mean we need a re-think of this change? At very least need to remove that comment?

catch’s picture

Title: Each role adds 2 MB memory use to the permissions page with standard profile » Add SafeMarkup::remove() to free memory from marked strings when they're printed
Priority: Critical » Major

@larowlan as far as I can tell the patch from #35 only affects xhprof's measurement of memory usage, not actual memory usage (when measured with memory_get_peak_usage() in index.php), so should probably be abandoned.

#34 definitely is saving some memory in real terms, just not as much as I'd like.

Now that #2488610: Use ModuleHander::getName() instead of rebuilding module data on permissions page is in, there is more 'head room' on user permissions - the memory usage still goes up pretty quick when adding roles/permissions, but it's starting around 44M for me. So I'm going to re-title this to be about SafeMarkup::remove() and downgrade to major.

Fabianx’s picture

Status: Needs review » Needs work

And I am gonna mark it needs work for now, because the ::remove still makes me very uncomfortable without a clear defined 'scope' or 'canvas' or 'stack' where those things are used on. Simple reference counting is dangerous as we cannot really keep track of duplicated strings.

xjm’s picture

Issue tags: +SafeMarkup

I'm also concerned about the new scope of the issue, because with the current way strings are stored, I believe this would result in the string being double-escaped unintentionally if it was used a second time on the page. An easy case of this would be an HTML filtered field that's printed both in the main content area and the sidebar of the page.

Edit: I totally missed that catch already addressed that in the recent patch. ^ Edit: But also @Fabianx indicates that it may still be an issue...

xjm’s picture

Issue tags: +Needs tests

Could we look into that concern by adding some test cases?

catch’s picture

#30 doesn't address Fabianx's comment. That patch uses a global counter, and it causes double escaping in exactly the case Fabianx talked about, which is the simpletest UI where it creates a safe string then uses it in multiple table rows. For that use-case I had to explicitly mark the string as safe each time it was re-used.

However, I'm not convinced that the render scope change would help with the filter tips UI, where we want to show some HTML both escaped and unescaped twice - in that case HEAD is broken and the patch fixes it. If that case happened within the same overall render scope, then HEAD would still be broken.

So we have two cases in core (outside of actual tests) where the patch changes behaviour noticeably, one of them causes a bug, one of them fixes one.

The big difference with the render scope is we could completely empty the static cache when we leave scope, which would likely save a lot more memory - but I don't think it will help with the filter tips issue, since afaik that happens within the same overall scope.

pwolanin’s picture

This sounds like a problematic approach. Why not sha256 or sha512 any string longer than the output of the respective function when storing it in the safe markup array? That would bound the memory consumed for a fixed number of safe strings.

pwolanin’s picture

Here's an issue to try the hashing - needs someone to profile: #2503445: Modify SafeMarkup to use hashed strings from marked strings to minimize memory usage

joelpittet’s picture

@effulgentsia mentioned on a call that we could possibly use Twig's \Twig_Markup class to help with the memory usage as the return'd object from Renderer::doRender(). Going to open up a new issue to give this a stab. #2503447: Use Twig_Markup to help with the SafeMarkup::$safeStrings memory load from Renderer::doRender()

FYI I was looking for a place to use it to help out with this memory issue.

xjm’s picture

xjm’s picture

alexpott’s picture

Status: Needs work » Closed (won't fix)

We're now mostly creating SafeStrings instead of SafeMarkup::set() on high volume stuff like the render system and low level filtering. See #2506581: Remove SafeMarkup::set() from Renderer::doRender and #2506195: Remove SafeMarkup::set() from Xss::filter() amongst others.

catch’s picture

Yes it's great that we can won't fix this!