Problem/Motivation
As pointed out in #2864718: Warn developers about the risks of writing code running in a non-fully bootstrapped environment, using APIs that involve loading elements from cache can be problematic when bootstrap level is lower than full bootstrap. Tracking all dependencies and keep them declared in the callback definition may not always be easy to do or even practical. However raising the bootstrap level to full basically cancels all the performance gains this module brings.
Proposed resolution
Implement a custom cache layer on top of the actually configured cache layer performing a full bootstrap only on cache misses.
Remaining tasks
- Validate the solution
Write a patch- Reviews
User interface changes
None
API changes
- Add a new
cache
property in callback info that is enabled by default.
Data model changes
None
Comment | File | Size | Author |
---|---|---|---|
#19 | js-cache-2866759-19.patch | 10.04 KB | plach |
Comments
Comment #2
plachComment #3
markhalliwellI'd rather this be detected per callback (info definition) rather than globally.
Not all callbacks will need this.
Comment #4
markhalliwellShouldn't this retry getting the cache after it's fully bootstrapped?
Shouldn't this only bootstrap if
!array_filter($cids)
?And also retry getting the cached data.
Comment #5
plachYeah, sorry, we talked about this when we discussed the read-only cache approach, forgot about that. One clarification: would you get rid of the global toggle and make this opt-in per callback or would we retain the global opt-in switch and make this opt-out per callback? The former approach implies probably a better DX, but would force JS to always initialize the custom cache layer.
I'm not sure about that: in the case of a node load, for instance, a full bootstrap would not change anything in the entity cache status, another get would just mean another miss. I can imagine scenarios where a full bootstrap could populate the cache entries being fetched, but I don't think that's the most common case.
According to the PHP doc,
DrupalCacheInterface::getMultiple(&$cids)
is expected to remove from$cids
all ids that were fetched, so if all items were fetched from cache$cids
will actually be empty at that point.Comment #6
plachComment #7
markhalliwellGet rid of the global
$conf
toggle. There's absolutely no need for it either way and just complicates things IMO.Not necessarily.
Originally was imagining that this property could be configurable so a callback could in specify whatever class they wanted.
However the more I look at the code,
JsCache
is really just a wrapper around whatever backend cache implementation that's currently set in$conf['cache_default_class']
.This means that a developer can override this class at that level.
Thus, it kind of begs the question: does there need to be any opt-in/opt-out/callback info property in the first place?
This is really just a way to intercept missed cache hits and then just do a full bootstrap.
Which means that it's only ever fully bootstrapped if absolutely necessary (based on callback code).
I'm now thinking that this should just be a
cache
property in the callback that is "enabled" by default.Then we just update the API docs to mention that if their callback execute code that potentially interacts with the Cache API (and misses), a full bootstrap is necessary and will be done on the callback's behalf to prevent data loss.
The callback definition info is retrieved before the callback is ever executed which is where a callback would modify the DB and run into this problem.
This custom cache layer should then be implemented in js_callback_bootstrap() based on whether
cache
is set in the callback info.This will allow callbacks to opt-out if they truly don't need/want this level of "interference". I doubt it really matters either way, but I'm thinking more for existing callback implementations/BC sake than anything.
Thus, this issue will need a change record as well.
Comment #8
plachI originally tried to implement this, but it's not possible as cache handlers are initialized on demand and cannot be reset, by design, to avoid inconsistencies inside the same request. Unfortunately this may happen way earlier than the point where we retrieve the callback info, so the JS cache needs to be initialized as one of the first steps.
Yep, I found that this works better than skipping cache writes, as I originally proposed.
If this is ok I will work on a change record draft.
Comment #9
plachtypo
Comment #10
markhalliwellOverall this is looking fantastic! Thank you for putting all this work into this issue :D
I think all that's really left is just some clean-up, renames and a small bit of reorganization to help with readability (sorry for the re-review, but seem to be looking at it with some fresh eyes today):
I think this should be renamed to
_js_initialize_cache()
as it's really just an internal function.Also, let's just move this to the main module file. I really don't like mixing procedural code with classes.
It can then use
module_load_include()
inside it as well.I'm thinking this should probably be named this something a little more explicit so it defines what it actually does. As it stands now, it kind of appears that it's in charge of handling cache requests, not intercepting and then proxying them.
Something like
JsProxyCacheBootstrap
maybe?This comment is a bit misleading. It's not actually performing a full bootstrap, just flagging that it should if a cache hit misses.
The usage of "corrupt cache" makes it seem like there's something wrong with JS Callback.
This is also starting to feel like a huge run on sentence. I think this entire block needs some paragraph breaks and a bit of rewording. Thoughts?
Move up in the docblock (alphabetized). Maybe also add a
See "bootstrap" property for more info.
bit and vice versa.Since this is now always intercepting the backend cache, I think it makes sense to move this into
_js_initialize_cache()
.Move up in the array (alphabetized).
Comment #11
markhalliwellAlso:
In my mind, procedural stuff goes in
./includes/*.inc
files.I know we're not actually using autoloading, but I'd feel a little more sane if this followed PSR-4 standards and was actually named after the class
./src/JsProxyCacheBootstrap.php
.Comment #12
plach@markcarver:
Thanks for the review, all the suggestions make sense to me! Just one clarification:
Ok, but we are still explicitly including it, right? Autoloading it may imply a DB bootstrap.
Comment #13
plachaw
Comment #14
markhalliwellCorrect, it should manually include the class in
_js_initialize_cache()
usingmodule_load_include()
.I would just like to keep classes separate from the rest of the code (e.g. in a separate
src
folder and named after the class name).Comment #15
plachThis should implement all the suggestions above. Instead of
JsProxyCacheBootstrap
I went for the simplerJsProxyCache
, because core cache handler class names end with theCache
suffix. I also thought that what we do with bootstrap is an internal implementation detail that should not "leak" into the class name: if in the future we were to add other custom logic in there, the name would suddenly be misleading or at least inconsistent.I also slightly changed the suggested doc blocks as the issue with storage writes is not only cache-related so I tried to make that as clear as I could.
Comment #16
plachSorry, the previous patch contained a wrongly named file.
Comment #17
plachWhile reviewing my own code I realized that with the previous patches
hook_init()
implementations could be invoked twice, if a full bootstrap happened afterjs_callback_bootstrap()
had run. I didn't notice that while testing this locally, because I have init hooks disabled by default. The attached patch seems to address this issue nicely. I performed various tests with cold caches, caches primed by the callback itself and warm caches and they all worked correctly.Comment #18
markhalliwellThis feels, at best, like a premature optimization and, at worst, a hack. Obviously something isn't right if init hooks are being invoked twice, but I think it could be solved a different way perhaps? In any case, I think this should be a separate issue entirely.
Also, FYI, JS Callback already does a full bootstrap if there is no callback info cached: http://cgit.drupalcode.org/js/tree/js.module?h=7.x-2.x#n770
JS Callback already has a minimum of a
DRUPAL_DATABASE
bootstrap level since it's required to utilize all themodule_*
functions: http://cgit.drupalcode.org/js/tree/js.php?h=7.x-2.x#n24This is more of a "we already know exactly where it is and don't want to involve more over head" kind of thing.
That being said, I suppose
module_load_include
and autoloading are essentially the same thing in this case. Since that's the case, would it make sense to just register thefile[]
in the module's.info
? My only concern with autoloading is that it may not work with any custom autoloading implementations.I'm a little unsure why this is even added.
module_implements_write_cache()
is only ever invoked fromdrupal_page_footer()
which is never invoked anywhere in JS Callback.Is it a preventative measure? In cases where a callback may manually invoke it?
Since this was introduced along with the hook_custom_theme code above, I think maybe this too should be tabled to a separate issue. Clearly how cached data is handled in regards to
module_implements()
needs a little closer look.I meant to ask this before, but why is this throwing an error here? Or rather, what are the exact steps in which this error would be thrown?
I'm not a huge fan of throwing errors in callbacks, despite recent error handling improvements that were committed.
From what I can tell, it would seem that this would only ever happen if a user unset the default bin key from
$conf
in their settings.php file.In which case, that indicates that this is really a developer error rather than a JS Callback issue.
The message provided doesn't really explain this very well or how to "solve" it.
Comment #19
plachDiscussed this with Mark: we're fine with keeping the
hook_init()
fix (#18.1 and #18.3) as part of this patch, even if it could be addressed separately, since it is a soft dependency for this issue.This addresses #18.2 and #18.4 as we discussed: I debugged the autoloading process and it actually initializes caches if the class was not loaded by another handler, so one more reason to avoid it.
Comment #20
plachI just created a draft change record at https://www.drupal.org/node/2871046
Comment #21
plachComment #23
markhalliwellAwesome! Thanks @plach :D
I went ahead and published the change record. I'll publish a release here shortly.
Comment #24
markhalliwellhttps://www.drupal.org/project/js/releases/7.x-2.4
Comment #25
plachAwesome, thanks!