When doing a lot of wildcard cache clears like cache_clear_all("node:render:$nid", 'cache_performance_hacks', TRUE);, a lock is created for the entire cache bin.
Two issues with this:
1. usleep(1000) isn't a long wait, so there could be a lock stampede. The patch converts this to use the locking framework (completely untested).
2. Locking clears on the whole bin, which in our case has entries for 3 million nodes, makes it quite likely that clears will run into a lock.
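The stampede in point 1 happens because every waiter sleeps exactly the same fixed interval and then retries at the same instant. A minimal sketch of the alternative (in Python for brevity, standing in for the PHP locking framework; all names are illustrative, not the module's actual API):

```python
import random
import time

def acquire_with_backoff(try_lock, max_attempts=10, base_delay=0.001):
    """Try to take a lock. On contention, sleep a randomized, growing
    delay instead of a fixed usleep(1000), so waiting processes do not
    all retry at the same instant (a lock stampede)."""
    for attempt in range(max_attempts):
        if try_lock():
            return True
        # Randomized exponential backoff spreads retries apart.
        time.sleep(base_delay * (2 ** attempt) * random.random())
    return False
```

Drupal's lock framework additionally lets a waiter block on lock_wait() rather than polling, which is what the patch moves toward.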
Jeremy is going to look at a hash_wrapper lookup for prefixes so we don't have to lock and search a single array for the entire bin; that would fix this much more solidly.
Comments
Comment #1
jeremy commented
Here's an attempt at cleaning this up. The hashing function needs more love to better spread out unique cids, and overall it should be optimized to use multiget rather than making so many wildcard queries.
Feedback and testing very welcome.
Comment #2
catch commented
Does the $cid = '' hunk now need to happen above empty($cid)?
Will try to review the patch properly tomorrow. The rest are thoughts not directly related to the patch, but I think it's feasible to keep gets down, at least in the performance_hacks case.
I haven't looked at the code for this, but I think we can optimize at least the common case where wildcard clears end with an int.
In cache_performance_hacks() we have cids like 'node:render:1234:foo:bar:baz'
These are always cleared as cache_clear_all('node:render:1234', 'cache_performance_hacks');
On clearing, it would be easy to determine whether the end of the wildcard is an int, then segment the wildcard directory numerically - say in chunks of 1000. These segments will of course be different sizes in reality, depending on how many nodes with nid 1-1000 (and so on) are actually cached.
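The numeric segmentation idea can be sketched like this (Python standing in for the PHP logic; the chunk size, key layout, and function name are assumptions for illustration, not the module's actual code):

```python
CHUNK = 1000  # segment the wildcard directory in chunks of 1000 nids

def wildcard_directory_key(bin_name, cid):
    """Map a wildcard clear like 'node:render:1234' to a segmented
    wildcard-directory key, so a clear only needs to lock one numeric
    segment of the bin rather than the whole bin."""
    parts = cid.split(':')
    if parts[-1].isdigit():
        # nids 0-999 -> segment 0, 1000-1999 -> segment 1, ...
        segment = int(parts[-1]) // CHUNK
        return '%s:wildcards:%s:%d' % (bin_name, ':'.join(parts[:-1]), segment)
    # Fallback: wildcards that do not end in an int share one directory.
    return '%s:wildcards:other' % bin_name
```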
On ->get(), we need to account for the fact that we could be doing cache_get('node:render:1234:foo:bar:bar') or cache_get('node:render:1234:z:x'); when you mentioned this I more or less gave up.
However, from the clears, we could determine that they always match the pattern node:render:$int I think.
So... as well as numeric segmentation, we could also store the wildcard patterns that have been used to clear bins, meaning we only need to confirm the node:render:$int pattern if that's the only way things ever get cleared. I'm not sure how best to figure out that node:render:1234 and node:render:654321 are the same pattern. It might mean requiring this pattern (colon delimiters with an int at the end: split into an array, confirm the text strings are the same, and check that the last array element is an int), with a fallback to more generic hashing otherwise.
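One way to recognise that node:render:1234 and node:render:654321 are the same pattern is to normalize the trailing int away (a Python sketch under the assumptions above; the '$int' placeholder and the None-means-generic-hash fallback are illustrative choices, not the module's code):

```python
def wildcard_pattern(cid):
    """Normalize a wildcard clear like 'node:render:1234' to the
    pattern 'node:render:$int' when the last colon-delimited segment
    is an int; return None to signal a fallback to generic hashing."""
    parts = cid.split(':')
    if len(parts) > 1 and parts[-1].isdigit():
        return ':'.join(parts[:-1] + ['$int'])
    return None
```

Two clears map to the same pattern exactly when everything before the final int matches, which is the "confirm the text strings are the same" check.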
So this would require a wildcard patterns variable somewhere; we could probably store a single array with patterns for all bins - there'll never be many patterns - so it should be small and cheap to fetch. Then we'd be able to fetch one set of actual wildcards per bin on ->get().
Also, I'm starting to think that even if we fix this, it might still have issues with the specific case of performance hacks and a very high number of nodes and comments being posted, so we may still need to refactor that to not use wildcard clears at all.
Comment #3
jeremy commented
The "hash" function we currently use really just grabs a few chunks of characters from the front of the cid to split up the wildcard directories in a given bin. Reading your feedback, I suspect what we really want is to use a few characters from both the front and the end of the cid -- that should better spread out the wildcard directories, minimizing both lock contention and memory usage.
As for optimizing the number of gets, I'm still leaning toward doing a single multiget of all possible wildcards, rather than looping through all possible wildcards as we currently do. That will reduce the necessary roundtrips to the memcache server(s) which is the only performance concern I currently have.
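The multiget idea, as described, is to build every colon-delimited prefix of a cid that could have been wildcard-flushed and fetch them all in one round trip. A Python sketch (get_multi here stands in for the pecl extension's multiget; the key scheme and timestamp values are assumptions):

```python
def wildcard_prefixes(cid):
    """All colon-delimited prefixes of a cid that a wildcard clear
    could have targeted, e.g. 'a:b:c' -> ['a', 'a:b', 'a:b:c']."""
    parts = cid.split(':')
    return [':'.join(parts[:i]) for i in range(1, len(parts) + 1)]

def latest_wildcard_flush(cache, cid):
    """One multiget instead of one get per prefix: return the newest
    flush timestamp recorded against any prefix of this cid, or 0 if
    no prefix has ever been wildcard-flushed."""
    flushes = cache.get_multi(wildcard_prefixes(cid))  # one round trip
    return max(flushes.values(), default=0)
```

An item is then stale if its created timestamp is older than the value this returns, and the number of network round trips no longer grows with cid length.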
Comment #4
jeremy commented
Actually, we have to take the characters from the left. We could skip every other, every third, every fourth, whatever -- so long as it's consistent. But we can't take from both the left and right sides of the cid, as that would make it impossible to find all matching wildcards.
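The reason the sample must be left-anchored: a wildcard prefix of the cid has to map to a prefix of the hash, or lookups can no longer find the matching wildcard entries. Skipping every other character preserves that property; taking characters from both ends would not. An illustrative sketch (not the module's actual hash function):

```python
def cid_hash(cid, length=6):
    """Take every other character from the left of the cid. Because the
    sample is left-anchored and consistent, cid_hash(prefix) is always
    a prefix of cid_hash(full_cid) -- the property wildcard matching
    relies on. Sampling from both ends would break it, since the
    right-hand characters differ between a prefix and the full cid."""
    return cid[::2][:length]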
Comment #5
jeremy commentedFor example, here's a patch to my patch that improves the hashing algorithm to better split up the $cid.
Comment #6
jeremy commentedI fixed another bug affecting long cid names, and made the hash length configurable through a variable if desired. In my testing, however, the default size has worked best.
The downside of this patch is that it requires considerably more memcache gets to validate that we've not had a wildcard flush -- the longer the cid, the more gets we have to make.
I started a patch to add multi-get support to minimize the number of added gets, but that will only help if using the memcached pecl extension. At this point, I'd appreciate general reviews of the patch, and am open to better ideas on how to accomplish this.
Note that I renamed 'prefixes' to 'wildcards', as there was already another type of 'prefix' in the memcache module and this was highly confusing when reading the code.
The attached "test.php" script can be run from drush like "drush scr test" and will do a basic regression test.
Comment #7
jeremy commented
This patch improves over previous versions in several significant ways:
This patch also adds proper multiget support to the standard memcache pecl plugin, in addition to the multiget support that already exists for the memcached pecl plugin.
Comment #8
jeremy commented
Patch committed, slightly modified from above. Also tested with minimum cache lifetime.
Comment #9
jeremy commented
Needs to be ported to the 6.x branch.
Comment #10
jeremy commented
In DRUPAL-7-x-1.x-dev, fixed typo, and made it possible to completely disable cid hashing:
http://drupal.org/cvs?commit=411938
Comment #11
catch commented
Got this running drush after merge:
I think this is just because there wasn't a full cache flush. We can add an isset() there, which will instantly invalidate all items, or maybe always return TRUE if !isset()?
Comment #12
catch commented
Comment #13
jeremy commented
It seems to me that this will force-flush all previously cached content following an upgrade to this version of the memcache module -- is that the goal? If not, perhaps we're looking for something like:
More specifically, $this->wildcard_flushes($cid) will return 0 if there have been no relevant wildcard flushes.
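The choice under discussion reduces to how a missing wildcard_flushes entry (which reads as 0) is interpreted. A Python sketch of the behavior jeremy is proposing, where 0 means "no flush recorded" and previously cached items survive the upgrade (names and the timestamp scheme are illustrative assumptions):

```python
def item_is_valid(created, wildcard_flushes, cid):
    """An item is stale only if a wildcard flush was recorded after it
    was written. A missing entry (0) is treated as "no relevant flush",
    so items cached before this code existed stay valid -- the risk
    catch notes is that a flush from before the map existed is also 0,
    and the two cases cannot be told apart."""
    last_flush = wildcard_flushes.get(cid, 0)  # 0: no flush recorded
    return created > last_flush
```

The alternative (treating a missing entry as "assume flushed") would force-rebuild every cached item once after the upgrade; the committed approach instead leaves it to sites to run a full cache clear if they don't want to take the risk.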
Comment #14
catch commented
I could go either way on this - the variable will be zero both for items which have been wildcard flushed and for those that haven't, so we have no way to know which is which. It might be better to go with your patch, then let sites do a cc all if they don't want to take the risk.
Comment #15
jeremy commented
Applied, thanks for reporting the issue.
Marking as to-be-ported again.
Comment #16
jeremy commented
Patch back-ported and applied to 6.x-1.x-dev:
http://drupal.org/cvs?commit=412680