Problem/Motivation

As pointed out in #2864718: Warn developers about the risks of writing code running in a non-fully bootstrapped environment, using APIs that involve loading elements from cache can be problematic when bootstrap level is lower than full bootstrap. Tracking all dependencies and keep them declared in the callback definition may not always be easy to do or even practical. However raising the bootstrap level to full basically cancels all the performance gains this module brings.

Proposed resolution

Implement a custom cache layer on top of the actually configured cache layer performing a full bootstrap only on cache misses.

Remaining tasks

  • Validate the solution
  • Write a patch
  • Reviews

User interface changes

None

API changes

  • Add a new cache property in callback info that is enabled by default.

Data model changes

None

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

plach created an issue. See original summary.

plach’s picture

Issue summary: View changes
Status: Active » Needs review
FileSize
5.93 KB
markhalliwell’s picture

Status: Needs review » Needs work
+++ b/includes/callback.inc
@@ -181,6 +181,12 @@ function js_callback_bootstrap(array &$info) {
+    // At this point in the execution flow it is safe to perform a full
+    // bootstrap in case of cache misses.
+    if (!empty($GLOBALS['conf']['js_cache'])) {
+      JsCache::setFullBootstrap(TRUE);
+    }

I'd rather this be detected per callback (info definition) rather than globally.

Not all callbacks will need this.

markhalliwell’s picture

  1. +++ b/includes/cache.inc
    @@ -0,0 +1,160 @@
    +  public function get($cid) {
    +    $cache = $this->backend->get($cid);
    +    if (!$cache) {
    +      $this->doFullBootstrap();
    +    }
    +    return $cache;
    +  }
    

    Shouldn't this retry getting the cache after it's fully bootstrapped?

  2. +++ b/includes/cache.inc
    @@ -0,0 +1,160 @@
    +  public function getMultiple(&$cids) {
    +    $cache = $this->backend->getMultiple($cids);
    +    if ($cids) {
    +      $this->doFullBootstrap();
    +    }
    +    return $cache;
    +  }
    

    Shouldn't this only bootstrap if !array_filter($cids)?

    And also retry getting the cached data.

plach’s picture

Status: Needs work » Needs review

I'd rather this be detected per callback (info definition) rather than globally. Not all callbacks will need this.

Yeah, sorry, we talked about this when we discussed the read-only cache approach, forgot about that. One clarification: would you get rid of the global toggle and make this opt-in per callback or would we retain the global opt-in switch and make this opt-out per callback? The former approach implies probably a better DX, but would force JS to always initialize the custom cache layer.

Shouldn't this retry getting the cache after it's fully bootstrapped?

I'm not sure about that: in the case of a node load, for instance, a full bootstrap would not change anything in the entity cache status, another get would just mean another miss. I can imagine scenarios where a full bootstrap could populate the cache entries being fetched, but I don't think that's the most common case.

Shouldn't this only bootstrap if !array_filter($cids)?

According to the PHP doc, DrupalCacheInterface::getMultiple(&$cids) is expected to remove from $cids all ids that were fetched, so if all items were fetched from cache $cids will actually be empty at that point.

plach’s picture

Status: Needs review » Needs work
markhalliwell’s picture

Issue summary: View changes
Issue tags: +Needs change record

One clarification: would you get rid of the global toggle and make this opt-in per callback or would we retain the global opt-in switch and make this opt-out per callback?

Get rid of the global $conf toggle. There's absolutely no need for it either way and just complicates things IMO.

The former approach implies probably a better DX, but would force JS to always initialize the custom cache layer.

Not necessarily.

Originally was imagining that this property could be configurable so a callback could in specify whatever class they wanted.

However the more I look at the code, JsCache is really just a wrapper around whatever backend cache implementation that's currently set in $conf['cache_default_class'].

This means that a developer can override this class at that level.

Thus, it kind of begs the question: does there need to be any opt-in/opt-out/callback info property in the first place?

This is really just a way to intercept missed cache hits and then just do a full bootstrap.

Which means that it's only ever fully bootstrapped if absolutely necessary (based on callback code).

I'm now thinking that this should just be a cache property in the callback that is "enabled" by default.

Then we just update the API docs to mention that if their callback execute code that potentially interacts with the Cache API (and misses), a full bootstrap is necessary and will be done on the callback's behalf to prevent data loss.

The callback definition info is retrieved before the callback is ever executed which is where a callback would modify the DB and run into this problem.

This custom cache layer should then be implemented in js_callback_bootstrap() based on whether cache is set in the callback info.

This will allow callbacks to opt-out if they truly don't need/want this level of "interference". I doubt it really matters either way, but I'm thinking more for existing callback implementations/BC sake than anything.

Thus, this issue will need a change record as well.

plach’s picture

Status: Needs work » Needs review
FileSize
3.28 KB
6.94 KB

Originally was imagining that this property could be configurable so a callback could in specify whatever class they wanted.

I originally tried to implement this, but it's not possible as cache handlers are initialized on demand and cannot be reset, by design, to avoid inconsistencies inside the same request. Unfortunately this may happen way earlier than the point where we retrieve the callback info, so the JS cache needs to be initialized as one of the first steps.

This is really just a way to intercept missed cache hits and then just do a full bootstrap.

Yep, I found that this works better than skipping cache writes, as I originally proposed.

If this is ok I will work on a change record draft.

plach’s picture

FileSize
6.94 KB
448 bytes
+++ b/includes/cache.inc
@@ -0,0 +1,160 @@
+  // Finally ensure our custom wrappers now which actual cache backend they are

typo

markhalliwell’s picture

Status: Needs review » Needs work

Overall this is looking fantastic! Thank you for putting all this work into this issue :D

I think all that's really left is just some clean-up, renames and a small bit of reorganization to help with readability (sorry for the re-review, but seem to be looking at it with some fresh eyes today):

  1. +++ b/includes/cache.inc
    @@ -0,0 +1,160 @@
    +function js_cache_initialize() {
    

    I think this should be renamed to _js_initialize_cache() as it's really just an internal function.

    Also, let's just move this to the main module file. I really don't like mixing procedural code with classes.

    It can then use module_load_include() inside it as well.

  2. +++ b/includes/cache.inc
    @@ -0,0 +1,160 @@
    +class JsCache implements DrupalCacheInterface {
    

    I'm thinking this should probably be named this something a little more explicit so it defines what it actually does. As it stands now, it kind of appears that it's in charge of handling cache requests, not intercepting and then proxying them.

    Something like JsProxyCacheBootstrap maybe?

  3. +++ b/includes/callback.inc
    @@ -181,6 +181,12 @@ function js_callback_bootstrap(array &$info) {
    +    // At this point in the execution flow it is safe to perform a full
    +    // bootstrap in case of cache misses.
    +    if ($info['cache']) {
    +      JsCache::setFullBootstrap(TRUE);
    +    }
    

    This comment is a bit misleading. It's not actually performing a full bootstrap, just flagging that it should if a cache hit misses.

  4. +++ b/js.api.php
    @@ -53,9 +53,12 @@
    + *     not refreshed meanwhile. By default JS will work around the corrupt cache
    + *     issue automatically by performing a full bootstrap whenever a cache miss
    + *     is detected, however the general issue with APIs performing storage
    + *     writes remains. A possible solution is raising the bootstrap level to
    + *     full, although this defeats the purpose of using this module. An
    + *     alternative solution is monitoring the code paths triggered by the
    

    The usage of "corrupt cache" makes it seem like there's something wrong with JS Callback.

    This is also starting to feel like a huge run on sentence. I think this entire block needs some paragraph breaks and a bit of rewording. Thoughts?

    Note: by default JS Callback intercepts requests via core's Cache API. When a cache miss is detected, it will automatically perform a full bootstrap in an attempt to ensure all necessary modules involved with cache are loaded.
    
    This does not, however, solve the issue where a complex callback stores via the Cache API,
     which is commonly done during entity saves.
    
    A temporary solution, in cases like this, is to raise the bootstrap level to full. However, this defeats the entire purpose of using this module.
    
    A more permanent solution is to monitor the execution path of a callback via the "xhprof" integration and ensure all required dependencies are added to the callback info.
    
  5. +++ b/js.api.php
    @@ -111,6 +114,8 @@
    + *   - cache: (optional) Flag indicating whether a full bootstrap should be
    + *     performed when detecting a cache miss. Defaults to TRUE.
    

    Move up in the docblock (alphabetized). Maybe also add a See "bootstrap" property for more info. bit and vice versa.

  6. +++ b/js.module
    @@ -488,8 +488,12 @@ function js_execute_request() {
    +  $memcache = !empty($conf['cache_default_class']) && $conf['cache_default_class'] === 'MemCacheDrupal';
    ...
       // Memcache requires an additional bootstrap phase to access variables.
    -  if (!empty($conf['cache_default_class']) && $conf['cache_default_class'] === 'MemCacheDrupal') {
    +  if ($memcache) {
         js_bootstrap(DRUPAL_BOOTSTRAP_VARIABLES);
       }
    

    Since this is now always intercepting the backend cache, I think it makes sense to move this into _js_initialize_cache().

  7. +++ b/js.module
    @@ -800,6 +804,7 @@ function js_get_callback($module = NULL, $callback = NULL, $reset = FALSE) {
    +            'cache' => TRUE,
    

    Move up in the array (alphabetized).

markhalliwell’s picture

Also:

--- /dev/null
+++ b/includes/cache.inc

In my mind, procedural stuff goes in ./includes/*.inc files.

I know we're not actually using autoloading, but I'd feel a little more sane if this followed PSR-4 standards and was actually named after the class ./src/JsProxyCacheBootstrap.php.

plach’s picture

Status: Needs work » Needs review

@markcarver:

Thanks for the review, all the suggestions make sense to me! Just one clarification:

I know we're not actually using autoloading, but I'd feel a little more sane if this followed PSR-4 standards and was actually named after the class ./src/JsProxyCacheBootstrap.php.

Ok, but we are still explicitly including it, right? Autoloading it may imply a DB bootstrap.

plach’s picture

Status: Needs review » Needs work

aw

markhalliwell’s picture

Ok, but we are still explicitly including it, right? Autoloading it may imply a DB bootstrap.

Correct, it should manually include the class in _js_initialize_cache() using module_load_include().

I would just like to keep classes separate from the rest of the code (e.g. in a separate src folder and named after the class name).

plach’s picture

Status: Needs work » Needs review
FileSize
8.08 KB
9.35 KB

This should implement all the suggestions above. Instead of JsProxyCacheBootstrap I went for the simpler JsProxyCache, because core cache handler class names end with the Cache suffix. I also thought that what we do with bootstrap is an internal implementation detail that should not "leak" into the class name: if in the future we were to add other custom logic in there, the name would suddenly be misleading or at least inconsistent.

I also slightly changed the suggested doc blocks as the issue with storage writes is not only cache-related so I tried to make that as clear as I could.

plach’s picture

FileSize
143 bytes
8.08 KB

Sorry, the previous patch contained a wrongly named file.

plach’s picture

FileSize
9.89 KB
1.9 KB

While reviewing my own code I realized that with the previous patches hook_init() implementations could be invoked twice, if a full bootstrap happened after js_callback_bootstrap() had run. I didn't notice that while testing this locally, because I have init hooks disabled by default. The attached patch seems to address this issue nicely. I performed various tests with cold caches, caches primed by the callback itself and warm caches and they all worked correctly.

markhalliwell’s picture

Status: Needs review » Needs work
  1. +++ b/js.module
    @@ -88,11 +88,28 @@ function js_preprocess_html() {
    +  // During a full bootstrap, hook_custom_theme() is invoked just before
    +  // hook_init(). Here we can make sure that hook_init() implementations
    +  // provided by callback dependencies are not invoked twice, in case a full
    +  // bootstrap is performed after invoking them in js_callback_bootstrap().
    +  if (!empty($_js['module']) && !empty($_js['callback'])) {
    +    $info = js_get_callback($_js['module'], $_js['callback']);
    +    if (!$info['skip init'] && $info['dependencies']) {
    +      $implementations = &drupal_static('module_implements');
    +      // After a global cache clear, hook implementation info is not available
    +      // yet, so we explicitly rebuild it.
    +      if (!isset($implementations['init'])) {
    +        module_implements('init');
    +      }
    +      $implementations['init'] = array_diff_key($implementations['init'], array_flip($info['dependencies']));
    +    }
    +  }
    

    This feels, at best, like a premature optimization and, at worst, a hack. Obviously something isn't right if init hooks are being invoked twice, but I think it could be solved a different way perhaps? In any case, I think this should be a separate issue entirely.

    Also, FYI, JS Callback already does a full bootstrap if there is no callback info cached: http://cgit.drupalcode.org/js/tree/js.module?h=7.x-2.x#n770

  2. +++ b/js.module
    @@ -605,6 +620,41 @@ function js_execute_request() {
    +  // We do not rely on autoloading as it may trigger a DB bootstrap.
    

    JS Callback already has a minimum of a DRUPAL_DATABASE bootstrap level since it's required to utilize all the module_* functions: http://cgit.drupalcode.org/js/tree/js.php?h=7.x-2.x#n24

    This is more of a "we already know exactly where it is and don't want to involve more over head" kind of thing.

    That being said, I suppose module_load_include and autoloading are essentially the same thing in this case. Since that's the case, would it make sense to just register the file[] in the module's .info? My only concern with autoloading is that it may not work with any custom autoloading implementations.

  3. +++ b/js.module
    @@ -636,6 +686,12 @@ function js_deliver($result, array $info = NULL) {
    +  // Since most callbacks act at a bootstrap level lower than full, that is
    +  // without loading all modules, we need to make sure hook implementation cache
    +  // is never written while processing a JS request.
    +  $implementations = &drupal_static('module_implements');
    +  unset($implementations['#write_cache']);
    

    I'm a little unsure why this is even added.

    module_implements_write_cache() is only ever invoked from drupal_page_footer() which is never invoked anywhere in JS Callback.

    Is it a preventative measure? In cases where a callback may manually invoke it?

    Since this was introduced along with the hook_custom_theme code above, I think maybe this too should be tabled to a separate issue. Clearly how cached data is handled in regards to module_implements() needs a little closer look.

  4. +++ b/src/JsProxyCache.php
    @@ -0,0 +1,128 @@
    +      throw new LogicException('The JS cache handler was not properly configured');
    

    I meant to ask this before, but why is this throwing an error here? Or rather, what are the exact steps in which this error would be thrown?

    I'm not a huge fan of throwing errors in callbacks, despite recent error handling improvements that were committed.

    From what I can tell, it would seem that this would only ever happen if a user unset the default bin key from $conf in their settings.php file.

    In which case, that indicates that this is really a developer error rather than a JS Callback issue.

    The message provided doesn't really explain this very well or how to "solve" it.

plach’s picture

Status: Needs work » Needs review
FileSize
1.05 KB
10.04 KB

Discussed this with Mark: we're fine with keeping the hook_init() fix (#18.1 and #18.3) as part of this patch, even if it could be addressed separately, since it is a soft dependency for this issue.

This addresses #18.2 and #18.4 as we discussed: I debugged the autoloading process and it actually initializes caches if the class was not loaded by another handler, so one more reason to avoid it.

plach’s picture

I just created a draft change record at https://www.drupal.org/node/2871046

plach’s picture

Issue tags: -Needs change record

  • markcarver committed 628a1cf on 7.x-2.x authored by plach
    Issue #2866759 by plach, markcarver: Add a custom cache layer to avoid...
markhalliwell’s picture

Assigned: plach » Unassigned
Status: Needs review » Fixed

Awesome! Thanks @plach :D

I went ahead and published the change record. I'll publish a release here shortly.

markhalliwell’s picture

plach’s picture

Awesome, thanks!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.