Problem/Motivation

SchemaCache is prone to persistent cache corruption.

protected function resolveCacheMiss($offset) {
    $complete_schema = drupal_get_complete_schema();
    $value = isset($complete_schema[$offset]) ? $complete_schema[$offset] : NULL;
    $this->storage[$offset] = $value;
    $this->persist($offset);
    return $value;
  }

If drupal_get_complete_schema() returns NULL (which is a valid case thats handled in that function) then the bad values are persisted.

This can be reproduced by calling e.g. entity_get_info('node') before DRUPAL_BOOTSTRAP_FULL and e.g. a corrupted module implements cache can lead to that. (see also #496170: module_implements() cache can be polluted by module_invoke_all() being called (in)directly prior to full bootstrap completion)

Proposed resolution

Lets harden this for the most obvious corruption:

diff --git a/includes/bootstrap.inc b/includes/bootstrap.inc
index 0b81dc0..e843997 100644
--- a/includes/bootstrap.inc
+++ b/includes/bootstrap.inc
@@ -2972,17 +2972,34 @@ class SchemaCache extends DrupalCacheArray {
   }
 
   /**
+   * Whether the cache should be persisted.
+   */
+  protected $persistable;
+
+  /**
    * Overrides DrupalCacheArray::resolveCacheMiss().
    */
   protected function resolveCacheMiss($offset) {
     $complete_schema = drupal_get_complete_schema();
     $value = isset($complete_schema[$offset]) ? $complete_schema[$offset] :  NULL;
-    $this->storage[$offset] = $value;
-    $this->persist($offset);
+    if ($this->persistable() && !empty($complete_schema)) {
+      $this->storage[$offset] = $value;
+      $this->persist($offset);
+    }
     return $value;
   }
 }
 
+  /**
+   * Whether the cache should be persisted.
+   */
+  function persistable() {
+    // Try until we have reached full bootstrap level.
+    if (empty($this->persistable)) {
+      $this->persistable = drupal_get_bootstrap_phase() == DRUPAL_BOOTSTRAP_FULL;
+    }
+    return $this->persistable;
+  }
+
 /**
  * Gets the whole database schema.
  *

Note: Do not apply this patch to a site lower than Drupal 7.33, because there drupal_get_bootstrap_phase() was just fixed.

Remaining tasks

Contributor tasks needed
Task Novice task? Contributor instructions Complete?
Create a patch novice Instructions (initial patch done)
Update the patch to incorporate feedback from reviews (include an interdiff) Instructions
Review patch to ensure that it fixes the issue, stays within scope, is properly documented, and follows coding standards Instructions

User interface changes

API changes

Comments

yesct’s picture

Issue summary: View changes
Issue tags: -needs patch

active means it does not have a patch.

the dreditor insert tasks template is nice for identifying what the remaining tasks are with links to instructions though.

https://drupal.org/core-mentoring/novice-tasks helps people decide what is a novice task and what is not.

I think the novice task is to make a patch that does the changes in the summary. :) Super.

balagan’s picture

Assigned: Unassigned » balagan
balagan’s picture

StatusFileSize
new1.32 KB

Trying to make a patch for this.

balagan’s picture

Assigned: balagan » Unassigned
Status: Active » Needs review

Status: Needs review » Needs work

The last submitted patch, 3: SchemaCache-corruption-2450533.patch, failed testing.

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 3: SchemaCache-corruption-2450533.patch, failed testing.

balagan’s picture

StatusFileSize
new1.27 KB

Ah, function was outside of class.

balagan’s picture

Status: Needs work » Needs review
yesct’s picture

Issue summary: View changes
Issue tags: -Novice
Mixologic’s picture

Issue tags: +affects drupal.org

Your favorite drupal website is suffering from this core bug lately.

Would love to see some action on this to keep pagerduty quiet.

dawehner’s picture

This patch, given its similarity with #496170: module_implements() cache can be polluted by module_invoke_all() being called (in)directly prior to full bootstrap completion certainly solves some of the problems you have. One thing I'm wondering though is the following:
drupal_get_complete_schema() contains a static cache as well, which is never reset. Given what I understand about this problem, the static in there could be a problem which is just inherited to SchemaCache. Given that maybe drupal_get_complete_schema() should be invalided much like the module implements cache.

drumm’s picture

StatusFileSize
new1.36 KB

This version of #8 cleans ups the logic a bit to be more clear. The behavior should not be changed.

drumm’s picture

StatusFileSize
new1.36 KB

And this is a clarification of how I originally read the patch. $persistable is set once, and not modified later.

Both patches seek to replace the empty() with something more explicit, since empty(FALSE) is truthy.

drumm’s picture

I kicked off a 7.x core test with PHP 5.4 to see if the 5.4 errors were regressions: https://www.drupal.org/pift-ci-job/224331

Apparently this fixes 54 tests, so it seems this is an improvement. We'll test out #13 on Drupal.org to see if it helps out.

drumm’s picture

#2681197: Intermittent 500 errors hasn't happened again on Drupal.org since deploying this patch, it seems to be working well.

fabianx’s picture

Title: SchemaCache is prone to persistent cache corruption » SchemaCache is prone to persistent cache corruption / drupal_get_complete_schema() is not robust against recursion

I have much more information on this bug:

- It has nothing to do with a corrupted module_implements cache as it happens with most current drupal core
- It has to do with a recursive calling of entity_get_info()
- Checking DRUPAL_BOOTSTRAP_FULL is just a side effect of hook_init() being before bootstrap completion.

The easiest way to reproduce this:

- Add module, enable module, remove module from filesystem
- Add rules module
- Ensure that entity_get_info() is called during hook_init()

rules_watchdog gets called during the rebuild of entity_get_info() because a module is missing from the filesystem, which calls entity_get_info() again, which then calls drupal_complete_schema() again, which then returns the incomplete schema.

AND THAT EXPLAINS IT:


static $schema;

if (!empty($schema) || $rebuild) {
  foreach (module_implements('schema') as $module) {
    $schema[$key] = module_invoke($module, $schema);
  }
}

return $schema;

drupal_get_complete_schema() is not robust against recursion.

fabianx’s picture

entity_get_info() has the exact same problem, which checks for empty() and populates potentially incomplete or unaltered data ...

oadaeh’s picture

I've been tasked with testing this, and so I'm trying to understand what is going on, but Fabian's last two comments are throwing me off, as drumm seems to have RTBCd the issue (even though he hasn't), but Fabian seems to be saying CNW.

So, Fabian, for my clarification, what are you implying by your comments, as I'm not sure?

Are you simply confirming that the patch addresses the issue with your updated information, or are you saying that the patch needs to be modified in some way, or are you saying something else entirely?

Thanks for the helping me out here.

drumm’s picture

Fabianx’s comment seems reasonable to me. drupal_get_complete_schema() should detect when it is called from within itself, and handle that situation. (Or avoid recursion completely.)

I’m not sure about the implementation. Checking for recursion could be done with a static variable, or better checking of the existing static variables. I don’t know what the best way to handle it is.

fabianx’s picture

#19: drumms patch is better than no patch, but I am saying "It only masks the real issue in most cases and works because hook_init() is called."

I am however (without debugging) not sure in what issue d.org runs - obviously.

That said: However it might be a good idea to add a static $count variable to drupal_get_complete_schema and put a backtrace to some file or error_log on staging / dev like:


function ... () {
  static $count = 0;

  if ($count > 0) {
    $e= new \Exception();
    error_log('Recursion detected:' . $e->getTraceAsString(), 3, '/tmp/mytmp-log.txt');
  }

  $count++;

  // function code

  $count--;

  return $info;

It is _very_ important to not use e.g. watchdog to debug this, but rather 'plain metal' tools, because watchdog in my case was what was causing the issue (via rules_watchdog).

That way you can debug when / if this happens. Obviously you could also add monitoring to the condition drumm's patch solves and report that _somewhere_ (tm) to see if that is caused by recursive calling, too. (I bet it is).

Oh and to increase chances of this happening use for dev / staging debugging the 'devnull' cache backend, so that all caches are empty always. That increased the chances in my case to reproduce (and would fit that some cache expires every 24 hours).

TL;DR:

It can be that d.orgs issue is different from what I experienced both on the original client site I made the patch for and for the last site I investigated the problem on. Debugging can tell and the above explains some ways how to debug.

David_Rothstein’s picture

Hm, so the question is what should these functions (drupal_get_complete_schema()/entity_get_info()) do in the case of recursion?

If they just powered through and tried to do a second rebuild, that would probably work for the particular issue mentioned above (since watchdog() itself has anti-recursion protection of its own) but it's a little scary to do that in case there is some other potential infinite recursion loop out there that doesn't involve watchdog()...

damienmckenna’s picture

FYI I've seen some of the Commerce modules run into infinite recursion problems via entity_get_info(), though I can't find issues for them off-hand.

drumm’s picture

What tipped us off to this on Drupal.org was og_theme, which we use to make https://www.drupal.org/8 use a different theme.

og_theme implements og_theme_custom_theme(), which is called relatively early in bootstrapping. That gets into og_context() which loads nodes and uses entity_metadata_wrapper()

We haven't run into the SchemaCache-related opcache corruption since deploying this patch. We have run into other things that smell like opcache corruption. We could have had two problems colluding.

damienmckenna’s picture

drumm’s picture

As far as I know, og_theme_custom_theme() has been consistently invoked for us.

fabianx’s picture

stephencamilo’s picture

Status: Needs review » Closed (won't fix)
hestenet’s picture

Status: Closed (won't fix) » Needs review

Reset issue status.

avpaderno’s picture

The premises of this issue are that drupal_get_complete_schema() can return NULL, but for what I can see, that never happens. It can eventually return an empty array, since It even casts to an array the value returned from hook_schema().

// Cast the result of hook_schema() to an array, as a NULL return value
// would cause array_merge() to set the $schema variable to NULL as well.
// That would break modules which use $schema further down the line.
$current = (array) module_invoke($module, 'schema');

drupal_get_complete_schema() should not return an empty array, with the System module that is essential for Drupal, and which defines the schema for at least 26 database tables.
The code should not cache a not complete schema, which means waiting to cache the value returned from drupal_get_complete_schema() until the bootstrap phase is DRUPAL_BOOTSTRAP_FULL.

Status: Needs review » Closed (outdated)

Automatically closed because Drupal 7 security and bugfix support has ended as of 5 January 2025. If the issue verifiably applies to later versions, please reopen with details and update the version.