Profiled a cached page running memcache and noticed that wildcards were being fetched even though only variable_init() and page cache were calling cache_get(). Attached patch avoids fetching wildcards that are longer than the cid passed to cache_get().

Comments

catch’s picture

While thinking about this I thought there was the possibility that later cache_get() to the same table would skip fetching wildcards that were needed, but reviewing the code again we're actually fine with this patch since it runs the same logic for each $cid and always fetches new wildcards if they're not already cached.

I'd considered comparing the length of the cid to the max() of the wildcard lengths array before going into the foreach(), but that array shouldn't get too big so doubt this would be measurable, and having the multiget fetch the smaller array should be worthwhile in itself, so leaving this as is for now.

catch’s picture

StatusFileSize
new2.13 KB

This caused two test failures when cache lifetime wasn't in use due to > instead of <.

Fixing it took some time - the problem is that global wildcard flushes using '*' use the same wildcard system as everything else - so there will always be a wildcard shorter than the cid being checked if there's ever been a drush cc all.

I added some special casing for this, which allows the tests to pass, and keeps the vast majority of the performance improvement, but it's not pretty and could likely be optimized a bit - i.e. it feels like it should be possible to use the variable_set()/variable_get() logic only for '*' wildcards (i.e. the same logic as is used when cache_lifetime is on) - however trying to do this caused test failures too.

catch’s picture

StatusFileSize
new8.43 KB

Alright here's a cleaner version, skips the wildcard logic altogether for empty cids, had to make several changes to the test to get them to pass since we use REQUEST_TIME so much.

Note that existing installs will have an entry for cid - '' in memcache_wildcard_flushes - we may want to consider an update to clean that out but it'll expire after 28 days.

catch’s picture

Marked #1091672: How does the getMulti() work? as duplicate since this should improve the stats.

catch’s picture

Assigned: Unassigned » catch
basic’s picture

StatusFileSize
new6.4 KB

fix some cruft with latest patch

catch’s picture

Another idea here, should probably go in after this patch and with separate testing.

A wildcard clear has to have at least one character in it. So instead of a single wildcard_lengths array, we could instead key that by the first letter of the wildcard clear (since we can easily get that from the $cid too). Then a clear for 'node:' and a clear for 'variable' would have different set of wildcard counts - they'd be keyed by 'n' and 'v' respectively. That could further reduce the number of wildcards we need to fetch, without bloating the size of the variable much.

catch’s picture

StatusFileSize
new10.34 KB

Patch didn't apply. The test changes that basic removed look like they're needed (although I'm not happy that we need all that crap in the tests to make them pass), so I added that back, but kept the removal of the redundant $cache->created < $cache_flush.

catch’s picture

StatusFileSize
new9.65 KB

Should fix re-roll errors now.

jeremy’s picture

Status: Needs review » Needs work
StatusFileSize
new752 bytes
new763 bytes
new752 bytes
new751 bytes

I ran the patch through our JMeter test suite, results attached.

  • 6.x-dev-baseline is w/o a minimum cache lifetime
  • 6.x-dev_1103478 is w/o a minimum cache lifetime
  • 6.x-dev-baseline-min is w/ a 10 minute minimum cache lifetime
  • 6.x-dev_1103478-min is w/ a 10 minute minimum cache lifetime

6.x-dev-baseline is the latest 6.x-1.x-dev development version. 6.x-dev_1103478 adds in your patch.

My biggest concern is the drop in performance with the minimum cache lifetime enabled. We go from serving 20.7 pages per second pre-patch, to 17.65 pages per second w/ the patch. I ran the tests three times on each, and then repeated all the tests again three times apiece to rule out this regression being fluctuations in the Rackspace cloud.

catch’s picture

Status: Needs work » Needs review
StatusFileSize
new9.74 KB

Reproduced those results with the same test suite.

Took a while to track down, but I had completely broken minimum cache lifetime with this patch. Attached patch fixes that and produces the following result:

STAT total_connections 11861
STAT cmd_get 120072
STAT cmd_set 56825
STAT get_hits 62401
STAT get_misses 57671
STAT incr_hits 0
STAT bytes_read 66913520
STAT bytes_written 166436824
STAT evictions 2815
STAT total_items 56825

Hit rate: 51.9600%
Miss rate: 48.0300%

HTTP return codes:  20x(11848) 30x(1313) 40x(2)  50x(0)
 200: 11848
 302: 1313

Pages per second: 20.79

This is still no faster in the load tests. I'm going to do a simple read-only load test locally now to see how that looks.

Also we could definitely use tests for minimum cache lifetime, will open an issue for those.

catch’s picture

So with this version of the patch, read only, there is no performance change either way if there is a cache lifetime - this is expected since very little changes (when it's not broken).

When there's no cache lifetime, there is an improvement on cached page requests from 167.96 [#/sec] to 175.31 [#/sec] - this is because we reduce memcache get calls from 3 to 2 in that situation (only get variable + page cache, instead of variables + page cache + wildcards).

With the jmeter tests, there is no performance improvement, this is very likely because we're replacing a memcache set with a variable_set(), and that could cancel out any gain we make on reads, but it's not entirely possible to tell this from the memcache stats.

What you can see in the memcache stats with no cache_lifetime is that incr hits drop from 584 to 0. This is the effect of the new logic - however it means that all requests (including cached pages) are going to have at least 584 misses on the variable cache, possibly more than this since 6.x has no locking. So... these misses may cancel out any gains made elsewhere.

Here's local ab results for just anonymous cached pages, you can see the improvement there. Profiling shows the wildcard logic doesn't kick in with the patch applied:

No patch:


Server Software:        Apache/2.2.14
Server Hostname:        d6.6
Server Port:            80

Document Path:          /
Document Length:        5457 bytes

Concurrency Level:      1
Time taken for tests:   5.954 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      5941000 bytes
HTML transferred:       5457000 bytes
Requests per second:    167.96 [#/sec] (mean)
Time per request:       5.954 [ms] (mean)
Time per request:       5.954 [ms] (mean, across all concurrent requests)
Transfer rate:          974.45 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     4    6   1.1      6      23
Waiting:        4    6   1.1      6      23
Total:          4    6   1.1      6      23

Patch:


Server Software:        Apache/2.2.14
Server Hostname:        d6.6
Server Port:            80

Document Path:          /
Document Length:        5457 bytes

Concurrency Level:      1
Time taken for tests:   5.666 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      5941059 bytes
HTML transferred:       5457000 bytes
Requests per second:    176.48 [#/sec] (mean)
Time per request:       5.666 [ms] (mean)
Time per request:       5.666 [ms] (mean, across all concurrent requests)
Transfer rate:          1023.92 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     4    6   2.0      6      62
Waiting:        4    5   2.0      5      62
Total:          4    6   2.0      6      62

And here's the jmeter results, with minimum_cache_lifetime and the latest patch - which show more or less no change at all (which is what we want).

No patch:

STAT total_connections 11900
STAT cmd_get 119918
STAT cmd_set 57119
STAT get_hits 61954
STAT get_misses 57964
STAT incr_hits 0
STAT bytes_read 67107870
STAT bytes_written 166243376
STAT evictions 6431
STAT total_items 57119

Hit rate: 51.6600%
Miss rate: 48.3300%

HTTP return codes:  20x(11887) 30x(1312) 40x(2)  50x(0)
 200: 11887
 302: 1312

Pages per second: 20.75

Patch:

STAT total_connections 11930
STAT cmd_get 120452
STAT cmd_set 57128
STAT get_hits 62476
STAT get_misses 57976
STAT incr_hits 0
STAT bytes_read 67003559
STAT bytes_written 166544455
STAT evictions 5916
STAT total_items 57128

Hit rate: 51.8600%
Miss rate: 48.1300%

HTTP return codes:  20x(11917) 30x(1316) 40x(2)  50x(0)
 200: 11917
 302: 1316

Pages per second: 21.80

There are open issues to address variable_set() performance, like #987768: [PP-1] Optimize variable caching and linked issues. So this isn't necessarily bad, but I want to get just a bit more data here before we commit this. I'm also wondering if we should consider 'fixing' variable set in memcache.inc (like a memcache_variable_set() that uses more or less the same logic as the 7.x/8.x patch - cache_get() - replace - cache_set() again) until this is resolved in core.

Jeremy, could we look at splitting up the jmeter tests a bit?

It would be good to compare:

1. anonymous users only, no writes.
2. authenticated users only, no writes.
3. authenticated users, mainly writes.
4. combination same as now.

Then it'd be easier to confirm where the trade-offs are.

catch’s picture

Another feature request for the load tests:

For this particular case, there's only a performance improvement on each bin if there's previously been a wildcard clear on the bin. I think that may not be the case with the sites being installed here (although I'm not sure either way).

If we did a drush cc all just before running the actual jmeter test, that'd ensure at least one wildcard clear against each bin is registered, and it should be easier to see the improvement here.

I tried to think of a place with wildcard prefix clears in core that would be good to add for the authenticated threads, but nothing is coming right now. But again if there's no such clear on a bin, we won't see any improvement here since we already avoid fetching wildcards if there's been no wildcard clears at all.

catch’s picture

StatusFileSize
new11.28 KB

Actually we don't necessarily need to alter the load test plan, just need to confirm that a faster version of variable_set() fixes the issue.

Added memcache_variable_set() - this grabs a fresh copy of the cache, replaces the variable, then sets it again, instead of clearing the variables cache - means that only the request setting the variable knows anything about it (just about). Patch passes tests locally.

I'll do a new jmeter run against this version of the patch. We might not want to commit like this, but it should show what's going on better.

catch’s picture

Before and after with minimum cache lifetime, still no measurable improvement from jmeter:

Before:

STAT total_connections 11897
STAT cmd_get 119649
STAT cmd_set 57018
STAT get_hits 61788
STAT get_misses 57861
STAT incr_hits 0
STAT bytes_read 66937469
STAT bytes_written 165947435
STAT evictions 6464
STAT total_items 57018

Hit rate: 51.6400%
Miss rate: 48.3500%

HTTP return codes:  20x(11884) 30x(1309) 40x(2)  50x(0)
 200: 11884
 302: 1309

Pages per second: 20.84

With patch:

STAT total_connections 11958
STAT cmd_get 120723
STAT cmd_set 57606
STAT get_hits 63009
STAT get_misses 57714
STAT incr_hits 0
STAT bytes_read 67531484
STAT bytes_written 167806897
STAT evictions 6716
STAT total_items 57606

Hit rate: 52.1900%
Miss rate: 47.8000%

HTTP return codes:  20x(11945) 30x(1314) 40x(2)  50x(0)
 200: 11945
 302: 1314

Pages per second: 20.98
catch’s picture

StatusFileSize
new12.65 KB

Added tests for wildcard flushes, those tests actually fail with current -dev (if you revert memcache.inc but leave the test changes in, then the last assertion fails). So I'd like to get this in sooner rather than later.

This patch now incorporates #1128114: Expand on cache_lifetime tests and #971814: Undefined index cache flush (the latter is necessary to avoid an exception when running the tests).

catch’s picture

Version: 6.x-1.x-dev » 7.x-1.x-dev
Status: Needs review » Patch (to be ported)

I went ahead and committed this with http://drupalcode.org/project/memcache.git/commit/2603dd7

We need to port this to 7.x.

berdir’s picture

Assigned: catch » berdir

Subscribe, will try to work on this one

catch’s picture

Priority: Normal » Critical

Bumping, I'd forgotten this actually fixed a bug alongside the optimization.

das-peter’s picture

sub

berdir’s picture

Status: Patch (to be ported) » Needs review
StatusFileSize
new13.2 KB

I proudly present the first working version, all tests (except IMHO unrelated exception in the session tests) are passing again!

I'm pretty sure this isn't all correct yet, I just tried until the tests passed.

Notes:

- It took me a while to figure out that a large part of #1134242: Review cache_lifetime behaviour. changes code added by this patch. Since I started with that other issue, it confused the hell out of me :) I have now merged both patches together so that I was actually able to verify that the tests work.

- I added a debug_backtrace() check to detect cache_clear_all() calls.

- One problem I had with the tests, and a difference to D6 is that the variables are only loaded when the cache object is initalized. Which meant that most of the variable_set() calls in the tests had no effect at all. I am not sure what is the best way to solve this. For now, I moved the variable_get() calls in memcache.inc in a separate, public function, that I can call from the tests to update the values. Other possible solutions, that I haven't tested:
1. Reload all variables that *might* have changed on every set/get/.. call.
2. Always use a custom cache bin for the tests to avoid pre-initialized cache objects. That should save the call I've added to setUp(), but others are still required. Which is probably ok.

I actually prefer the second solution, but I wanted to get some feedback on the patch as a whole and this issue specifically first.

- I *think* I ported all hunks from the previous patch, but it is possible that I missed something, I had to re-apply pretty much everything manually :)

catch’s picture

I'm happy to commit this with the debug_backtrace() hack, thanks for adding the @todo.

On the variables the custom memcache bin sounds like a good plan but not sure that has to happen here.

The logic looks right to me but I should probably run this locally (and through a profiler) to make the wildcard skipping is working as expected.

Thanks for tackling this!

catch’s picture

I'm getting this when running tests:

An AJAX HTTP error occurred. HTTP Result Code: 200 Debugging information follows. Path: /batch?id=6&op=do StatusText: OK ResponseText: Fatal error: Call to undefined method DrupalDatabaseCache::reloadVariables() in memcache/tests/memcache.test on line 99

Not sure why I'm getting this yet, need to debug where the default is getting set to that in the first place.

berdir’s picture

This means you are running the tests against the DrupalDatabaseCache.

I had the same issue, and the only way to get it working was to manually override the default value in _cache_get_object().

I could add a check there, but it's kinda pointless to run the tests against the database backend anyway.

catch’s picture

StatusFileSize
new15.63 KB

This is due to DrupalWebTestCase resetting $conf.

I added some extra workarounds to the test - we always use a cache_memcache bin, and I added a check for the MEMCACHE_CONTENT_CLEAR constant in memcache.inc even though currently only the test uses this (this means the actual cache_clear_all()/backtrace logic isn't tested, but hopefully that is temporary until core's fixed anyway).

Tests now pass for me without needing to hack core to get them to run (except the session test).

I also profiled and the wildcard logic is doing exactly what it's supposed to (only looked quickly but seems right to me).
Uploading updated patch. It's getting late here but I'll likely commit this tomorrow to get things moving. We could do with MemcacheWebTestCase extends DrupalWebTestCase and some other stuff to tidy the tests up but don't want to hold this up on any of that.

berdir’s picture

+++ b/tests/memcache.testundefined

@@ -234,10 +235,11 @@ class MemCacheClearCase extends MemcacheTestCase {
   }
 
   function setUp() {
-    $this->default_bin = 'cache_page';
-    $this->default_value = $this->randomName(10);
 
     parent::setUp();
+    $this->default_bin = 'cache_memcache';
+    variable_set('cache_class_cache_memcache', 'MemcacheDrupal');
+    $this->default_value = $this->randomName(10);
   }

Hm, it shouldn't be necessary to set the cache class in every setUp(), they all extend MemcacheTestCase and call parent::setUp(), where this is set up. The overriding values need to be removed, though.

Powered by Dreditor.

pillarsdotnet’s picture

Status: Needs review » Needs work

Until memcache can pass its own tests, I don't see the point in trying to port patches from 6.x to 7.x, as there is no (automated) way to tell whether the ported patch breaks anything.

berdir’s picture

That's the point, this issue *fixes* the tests.

They all work except an unrelated notice about an undefined constant in the session tests. And I am not sure if they even have been ported yet anyway.

pillarsdotnet’s picture

Okay; then I was wasting my time. Wish someone would have told me sooner.
Closed (duplicate): #1246796-16: Memcache fails its own tests.

catch’s picture

Status: Needs work » Fixed

@pillarsdotnet, in #1246796: Memcache fails its own tests. I tried to make it clear this couldn't be fixed without the core issue with cache_clear_all() being dealt with, however I had not considered working around that with debug_backtrace(), which is ugly as hell but allows this to proceed without waiting on API changes.

I cleaned up the mess I'd added in setUp(), ran tests again locally to make sure things were still running OK, and went ahead and committed/pushed this to 7.x-1.x.

This will remove the critical bug here, and should unblock getting some other issues moving as well. If there's more cleanup to do we can attack that separately.

Thanks for working on this!

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.