Problem/Motivation
SchemaCache is prone to persistent cache corruption.
protected function resolveCacheMiss($offset) {
$complete_schema = drupal_get_complete_schema();
$value = isset($complete_schema[$offset]) ? $complete_schema[$offset] : NULL;
$this->storage[$offset] = $value;
$this->persist($offset);
return $value;
}
If drupal_get_complete_schema() returns NULL (which is a valid case thats handled in that function) then the bad values are persisted.
This can be reproduced by calling e.g. entity_get_info('node') before DRUPAL_BOOTSTRAP_FULL and e.g. a corrupted module implements cache can lead to that. (see also #496170: module_implements() cache can be polluted by module_invoke_all() being called (in)directly prior to full bootstrap completion)
Proposed resolution
Lets harden this for the most obvious corruption:
diff --git a/includes/bootstrap.inc b/includes/bootstrap.inc
index 0b81dc0..e843997 100644
--- a/includes/bootstrap.inc
+++ b/includes/bootstrap.inc
@@ -2972,17 +2972,34 @@ class SchemaCache extends DrupalCacheArray {
}
/**
+ * Whether the cache should be persisted.
+ */
+ protected $persistable;
+
+ /**
* Overrides DrupalCacheArray::resolveCacheMiss().
*/
protected function resolveCacheMiss($offset) {
$complete_schema = drupal_get_complete_schema();
$value = isset($complete_schema[$offset]) ? $complete_schema[$offset] : NULL;
- $this->storage[$offset] = $value;
- $this->persist($offset);
+ if ($this->persistable() && !empty($complete_schema)) {
+ $this->storage[$offset] = $value;
+ $this->persist($offset);
+ }
return $value;
}
}
+ /**
+ * Whether the cache should be persisted.
+ */
+ function persistable() {
+ // Try until we have reached full bootstrap level.
+ if (empty($this->persistable)) {
+ $this->persistable = drupal_get_bootstrap_phase() == DRUPAL_BOOTSTRAP_FULL;
+ }
+ return $this->persistable;
+ }
+
/**
* Gets the whole database schema.
*
Note: Do not apply this patch to a site lower than Drupal 7.33, because there drupal_get_bootstrap_phase() was just fixed.
Remaining tasks
Contributor tasks needed
| Task |
Novice task? |
Contributor instructions |
Complete? |
| Create a patch |
novice |
Instructions |
(initial patch done) |
| Update the patch to incorporate feedback from reviews (include an interdiff) |
|
Instructions |
|
| Review patch to ensure that it fixes the issue, stays within scope, is properly documented, and follows coding standards |
|
Instructions |
|
User interface changes
API changes
Comments
Comment #1
yesct commentedactive means it does not have a patch.
the dreditor insert tasks template is nice for identifying what the remaining tasks are with links to instructions though.
https://drupal.org/core-mentoring/novice-tasks helps people decide what is a novice task and what is not.
I think the novice task is to make a patch that does the changes in the summary. :) Super.
Comment #2
balagan commentedComment #3
balagan commentedTrying to make a patch for this.
Comment #4
balagan commentedComment #8
balagan commentedAh, function was outside of class.
Comment #9
balagan commentedComment #10
yesct commentedComment #11
MixologicYour favorite drupal website is suffering from this core bug lately.
Would love to see some action on this to keep pagerduty quiet.
Comment #12
dawehnerThis patch, given its similarity with #496170: module_implements() cache can be polluted by module_invoke_all() being called (in)directly prior to full bootstrap completion certainly solves some of the problems you have. One thing I'm wondering though is the following:
drupal_get_complete_schema()contains a static cache as well, which is never reset. Given what I understand about this problem, the static in there could be a problem which is just inherited to SchemaCache. Given that maybedrupal_get_complete_schema()should be invalided much like the module implements cache.Comment #13
drummThis version of #8 cleans ups the logic a bit to be more clear. The behavior should not be changed.
Comment #14
drummAnd this is a clarification of how I originally read the patch.
$persistableis set once, and not modified later.Both patches seek to replace the
empty()with something more explicit, sinceempty(FALSE)is truthy.Comment #15
drummI kicked off a 7.x core test with PHP 5.4 to see if the 5.4 errors were regressions: https://www.drupal.org/pift-ci-job/224331
Apparently this fixes 54 tests, so it seems this is an improvement. We'll test out #13 on Drupal.org to see if it helps out.
Comment #16
drumm#2681197: Intermittent 500 errors hasn't happened again on Drupal.org since deploying this patch, it seems to be working well.
Comment #17
fabianx commentedI have much more information on this bug:
- It has nothing to do with a corrupted module_implements cache as it happens with most current drupal core
- It has to do with a recursive calling of entity_get_info()
- Checking DRUPAL_BOOTSTRAP_FULL is just a side effect of hook_init() being before bootstrap completion.
The easiest way to reproduce this:
- Add module, enable module, remove module from filesystem
- Add rules module
- Ensure that entity_get_info() is called during hook_init()
rules_watchdog gets called during the rebuild of entity_get_info() because a module is missing from the filesystem, which calls entity_get_info() again, which then calls drupal_complete_schema() again, which then returns the incomplete schema.
AND THAT EXPLAINS IT:
drupal_get_complete_schema() is not robust against recursion.
Comment #18
fabianx commentedentity_get_info() has the exact same problem, which checks for empty() and populates potentially incomplete or unaltered data ...
Comment #19
oadaeh commentedI've been tasked with testing this, and so I'm trying to understand what is going on, but Fabian's last two comments are throwing me off, as drumm seems to have RTBCd the issue (even though he hasn't), but Fabian seems to be saying CNW.
So, Fabian, for my clarification, what are you implying by your comments, as I'm not sure?
Are you simply confirming that the patch addresses the issue with your updated information, or are you saying that the patch needs to be modified in some way, or are you saying something else entirely?
Thanks for the helping me out here.
Comment #20
drummFabianx’s comment seems reasonable to me.
drupal_get_complete_schema()should detect when it is called from within itself, and handle that situation. (Or avoid recursion completely.)I’m not sure about the implementation. Checking for recursion could be done with a static variable, or better checking of the existing static variables. I don’t know what the best way to handle it is.
Comment #21
fabianx commented#19: drumms patch is better than no patch, but I am saying "It only masks the real issue in most cases and works because hook_init() is called."
I am however (without debugging) not sure in what issue d.org runs - obviously.
That said: However it might be a good idea to add a static $count variable to drupal_get_complete_schema and put a backtrace to some file or error_log on staging / dev like:
It is _very_ important to not use e.g. watchdog to debug this, but rather 'plain metal' tools, because watchdog in my case was what was causing the issue (via rules_watchdog).
That way you can debug when / if this happens. Obviously you could also add monitoring to the condition drumm's patch solves and report that _somewhere_ (tm) to see if that is caused by recursive calling, too. (I bet it is).
Oh and to increase chances of this happening use for dev / staging debugging the 'devnull' cache backend, so that all caches are empty always. That increased the chances in my case to reproduce (and would fit that some cache expires every 24 hours).
TL;DR:
It can be that d.orgs issue is different from what I experienced both on the original client site I made the patch for and for the last site I investigated the problem on. Debugging can tell and the above explains some ways how to debug.
Comment #22
David_Rothstein commentedHm, so the question is what should these functions (drupal_get_complete_schema()/entity_get_info()) do in the case of recursion?
If they just powered through and tried to do a second rebuild, that would probably work for the particular issue mentioned above (since watchdog() itself has anti-recursion protection of its own) but it's a little scary to do that in case there is some other potential infinite recursion loop out there that doesn't involve watchdog()...
Comment #23
damienmckennaFYI I've seen some of the Commerce modules run into infinite recursion problems via entity_get_info(), though I can't find issues for them off-hand.
Comment #24
drummWhat tipped us off to this on Drupal.org was og_theme, which we use to make https://www.drupal.org/8 use a different theme.
og_theme implements
og_theme_custom_theme(), which is called relatively early in bootstrapping. That gets intoog_context()which loads nodes and usesentity_metadata_wrapper()We haven't run into the SchemaCache-related opcache corruption since deploying this patch. We have run into other things that smell like opcache corruption. We could have had two problems colluding.
Comment #25
damienmckennaSeeing is hook_custom_theme was mentioned, might #2086335: Bootstrap issue with theme initialization causes hook_custom_theme() never to be invoked be related?
Comment #26
drummAs far as I know,
og_theme_custom_theme()has been consistently invoked for us.Comment #27
fabianx commentedComment #28
stephencamilo commentedComment #29
hestenetReset issue status.
Comment #30
avpadernoThe premises of this issue are that
drupal_get_complete_schema()can returnNULL, but for what I can see, that never happens. It can eventually return an empty array, since It even casts to an array the value returned fromhook_schema().drupal_get_complete_schema()should not return an empty array, with the System module that is essential for Drupal, and which defines the schema for at least 26 database tables.The code should not cache a not complete schema, which means waiting to cache the value returned from
drupal_get_complete_schema()until the bootstrap phase isDRUPAL_BOOTSTRAP_FULL.