Note that I don't mark problems "critical" lightly. We found this problem when a deployment caused server load to spike to 1000% capacity. We logged into the db server and observed a "menu_rebuild" locking everything up in the semaphore table. Here's a New Relic visualization of what happened:

The spike in the middle begins when this module was deployed and ends abruptly after we commented out the call to views_invalidate_cache() in this function (which runs on every cache_get/cache_set call):
<?php
/**
 * Retrieve or generate the cache id to schema mapping.
 *
 * We store the schema mapping in the main cache table/bin, this means that it
 * can get invalidated quite quickly, but this will also probably coincide with
 * the views cache being flushed, so we are just wasting a few CPU cycles in
 * reality.
 *
 * @param boolean $reset
 *   Reset the static and DB cache for the schema mapping.
 *
 * @return array
 *   An array where each key is a plugin key ID and each value is the
 *   corresponding database column.
 */
function views_content_cache_id_schema($reset = FALSE) {
  static $map;
  if (!isset($map) || $reset) {
    $cache = cache_get('views_content_cache_id_schema');
    if (!$reset && !empty($cache->data)) {
      $map = $cache->data;
    }
    else {
      $cache_keys = array_keys(views_content_cache_get_plugin());
      $i = 1;
      foreach ($cache_keys as $key_id) {
        // Schema is limited to 8.
        if ($i > 8) {
          break;
        }
        $map[$key_id] = "c{$i}";
        $i++;
      }
      // If the newly generated map and the prior map do not match invalidate
      // all cache update records.
      if (empty($cache->data) || ($map != $cache->data)) {
        db_truncate('views_content_cache')->execute();
        // This is probably too aggressive. @TODO: See if we can surgically
        // invalidate only views that use VCC.
        views_invalidate_cache();
      }
      cache_set('views_content_cache_id_schema', $map, 'cache');
    }
  }
  return $map;
}
?>
Generally, my immediate suggestion is to not call views_invalidate_cache. Here's what that function does:
function views_invalidate_cache() {
  // Clear the views cache.
  cache_clear_all('*', 'cache_views', TRUE);
  // Clear the page and block cache.
  cache_clear_all();
  // Set the menu as needed to be rebuilt.
  variable_set('menu_rebuild_needed', TRUE);
  // Allow modules to respond to the Views cache being cleared.
  module_invoke_all('views_invalidate_cache');
}
Is there a reason we need to rebuild menus, clear variable/block/page caches in order to retrieve a proper $cid? That's not a rhetorical question... I'm an idiot and am afraid I'm missing something.
If we just called the following instead, the website would have survived easily:
cache_clear_all('*', 'cache_views', TRUE);
module_invoke_all('views_invalidate_cache');
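A minimal sketch of that narrower invalidation, wrapped in a helper. The name example_vcc_invalidate_views_only() is hypothetical (it is not part of Views or views_content_cache), and the stand-in functions at the top exist only so the snippet can run outside Drupal. Inside Drupal 7 the real cache_clear_all() and module_invoke_all() would be used.

```php
<?php
// Stand-ins so the sketch runs outside Drupal. They only record calls.
if (!function_exists('cache_clear_all')) {
  $GLOBALS['example_calls'] = array();
  function cache_clear_all($cid = NULL, $bin = NULL, $wildcard = FALSE) {
    $GLOBALS['example_calls'][] = array('cache_clear_all', $cid, $bin, $wildcard);
  }
  function module_invoke_all($hook) {
    $GLOBALS['example_calls'][] = array('module_invoke_all', $hook);
    return array();
  }
}

/**
 * Hypothetical helper: clear only the Views cache bin and notify modules.
 *
 * Unlike views_invalidate_cache(), this does not clear the page/block caches
 * and does not set 'menu_rebuild_needed'.
 */
function example_vcc_invalidate_views_only() {
  // Clear everything in the cache_views bin.
  cache_clear_all('*', 'cache_views', TRUE);
  // Let modules that implement hook_views_invalidate_cache() respond.
  module_invoke_all('views_invalidate_cache');
}

example_vcc_invalidate_views_only();
```

Whether skipping the menu rebuild is safe here is exactly the open question above, so treat this as a starting point for discussion rather than a verified fix.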
That aside, the root cause of the issue was this check (in views_content_cache_id_schema) failing unexpectedly under certain circumstances:
if (empty($cache->data) || ($map != $cache->data)) {
  // ...nuke all the caches!
}
We have three theories for why this check failed so catastrophically. All theories are largely unverified at this point.
1. Storing caches in multiple memcache instances under high concurrency. (this one seems most plausible as we never were able to reproduce the behavior locally, or see it in XhProf).
2. Domain access (this site has hundreds of domains)
3. i18n (this site has hundreds of languages)
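Whatever the trigger turns out to be, it may help to spell out when the check evaluates to TRUE. Reading the function quoted above, a plain cache miss appears to be enough: cache_get() returns FALSE on a miss, so empty($cache->data) is TRUE even when the regenerated map is identical to the old one. The helper name below is hypothetical and only mirrors the condition. It is not code from the module, and it does not confirm any of the three theories.

```php
<?php
/**
 * Hypothetical helper mirroring the condition in views_content_cache_id_schema().
 *
 * @param object|false $cache
 *   What cache_get() returned: an object with ->data, or FALSE on a miss.
 * @param array $map
 *   The freshly generated plugin-key-to-column map.
 */
function example_vcc_check_fires($cache, array $map) {
  return empty($cache->data) || ($map != $cache->data);
}

// Illustrative map only; real keys come from views_content_cache_get_plugin().
$map = array('node' => 'c1', 'comment' => 'c2');

// Cache hit holding the same map: the check does not fire.
$hit = new stdClass();
$hit->data = $map;
var_dump(example_vcc_check_fires($hit, $map)); // bool(false)

// Cache miss: the check fires although the map has not changed.
var_dump(example_vcc_check_fires(FALSE, $map)); // bool(true)

// Cache hit holding a different map: the case the check was written for.
$stale = new stdClass();
$stale->data = array('node' => 'c1');
var_dump(example_vcc_check_fires($stale, $map)); // bool(true)
```

If that reading is right, anything that makes the 'views_content_cache_id_schema' entry go missing from the 'cache' bin (eviction, a flush, or inconsistent backends) would lead to the full invalidation, independent of whether the plugin list changed.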
I was planning to work on putting together a fix on Monday. Any thoughts on not clearing the block/page/variable caches when the "if (empty($cache->data) || ($map != $cache->data)) {" check fails? I think the root cause is much less critical than its devastating effect.
| Comment | File | Size | Author |
|---|---|---|---|
| #8 | 2555925-views_content_cache-schemacache-8.patch | 895 bytes | Jorrit |
| #2 | 2555925-2-cache_get_requests_trigger_menu_rebuild.patch | 2.35 KB | grndlvl |
| | Screen Shot 2015-08-22 at 5.48.36 PM.png | 70.67 KB | Nick Lewis |
Comments
Comment #2
grndlvl commented
So we looked it over a bit and could not figure out why this is being stored in the cache system. Maybe it's there to detect a cache clear? Either way, we feel that the work being done here is small enough to downgrade the caching to a static variable cache. For extra flexibility, we went ahead and used the advanced drupal_static() fast pattern.
Thoughts?
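For readers unfamiliar with it, here is a minimal sketch of the drupal_static() fast pattern applied to building the map. This is not the attached patch; the function name example_vcc_id_schema_static() and the plugin keys are hypothetical, and the stand-ins at the top exist only so the snippet runs outside Drupal.

```php
<?php
// Stand-ins so the sketch runs outside Drupal.
if (!function_exists('drupal_static')) {
  function &drupal_static($name, $default_value = NULL, $reset = FALSE) {
    static $data = array();
    if ($reset || !array_key_exists($name, $data)) {
      $data[$name] = $default_value;
    }
    return $data[$name];
  }
}
if (!function_exists('views_content_cache_get_plugin')) {
  function views_content_cache_get_plugin() {
    // Hypothetical plugin keys, for illustration only.
    return array('node' => array(), 'comment' => array());
  }
}

/**
 * Hypothetical variant that keeps the map in a per-request static only.
 */
function example_vcc_id_schema_static() {
  // Fast pattern: bind the drupal_static() variable once per request, then
  // reuse the reference so later calls skip the drupal_static() call.
  static $drupal_static_fast;
  if (!isset($drupal_static_fast)) {
    $drupal_static_fast['map'] = &drupal_static(__FUNCTION__);
  }
  $map = &$drupal_static_fast['map'];

  if (!isset($map)) {
    $map = array();
    $i = 1;
    foreach (array_keys(views_content_cache_get_plugin()) as $key_id) {
      // Schema is limited to 8 columns.
      if ($i > 8) {
        break;
      }
      $map[$key_id] = "c{$i}";
      $i++;
    }
  }
  return $map;
}

print_r(example_vcc_id_schema_static());
```

Note that a static-only map has no previous value to compare against, so a sketch like this cannot detect that the map changed between requests.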
Comment #3
grndlvl commented
Comment #5
grndlvl commented
Hmm, I ran the tests locally and this one test does indeed fail. However, the changed function returns the same data with the patch applied as it does without it. It would appear that the additional cache clears are desired for this comment check? I am unclear, though, what the check is accomplishing and why it expects such a side effect.
Comment #6
Jorrit commented
I think the test fails because previously the views_content_cache table was cleared by views_content_cache_id_schema(). The test retrieves a row that was added by another test.
What the test verifies is that the rows in views_content_cache are cleared when the map changes. This no longer occurs when the patch is applied.
Comment #7
Jorrit commented
Perhaps the function views_content_cache_id_schema() could use views_cache_get()/views_cache_set(). In this way, the cached map gets cleared less often and views_invalidate_cache() is invoked fewer times.
Comment #8
Jorrit commented
Please see the attached patch.
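A rough sketch of the idea from comment #7, storing the map through views_cache_get()/views_cache_set() so it lives in the Views cache bin. This is not the contents of the attached patch: the function name example_vcc_id_schema_views_bin() and the plugin keys are hypothetical, the stand-ins exist only so the snippet runs outside Drupal, and the truncate/invalidate step is left out on purpose.

```php
<?php
// Stand-ins so the sketch runs outside Drupal (simple in-memory cache).
if (!function_exists('views_cache_get')) {
  $GLOBALS['example_cache_views'] = array();
  function views_cache_get($cid, $use_language = FALSE) {
    return isset($GLOBALS['example_cache_views'][$cid])
      ? $GLOBALS['example_cache_views'][$cid]
      : FALSE;
  }
  function views_cache_set($cid, $data, $use_language = FALSE) {
    $GLOBALS['example_cache_views'][$cid] = (object) array('data' => $data);
  }
}
if (!function_exists('views_content_cache_get_plugin')) {
  function views_content_cache_get_plugin() {
    // Hypothetical plugin keys, for illustration only.
    return array('node' => array(), 'comment' => array());
  }
}

/**
 * Hypothetical variant that caches the map via the Views cache helpers.
 */
function example_vcc_id_schema_views_bin() {
  static $map;
  if (!isset($map)) {
    $cache = views_cache_get('views_content_cache_id_schema');
    if (!empty($cache->data)) {
      $map = $cache->data;
    }
    else {
      $map = array();
      $i = 1;
      foreach (array_keys(views_content_cache_get_plugin()) as $key_id) {
        // Schema is limited to 8 columns.
        if ($i > 8) {
          break;
        }
        $map[$key_id] = "c{$i}";
        $i++;
      }
      // The original function compares against the old map here and
      // truncates/invalidates on mismatch; that step is omitted in this sketch.
      views_cache_set('views_content_cache_id_schema', $map);
    }
  }
  return $map;
}

print_r(example_vcc_id_schema_views_bin());
```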