Drupal 7 added the new cache_bootstrap bin, and system_list(), which tries to use one cache object to store multiple items that are needed together. However, this isn't used consistently, and there are still a very large number of individual cache_get() calls during D7 page requests (albeit replacing individual database queries in D7), often for things that are always needed together.
#1011614: "Theme registry can grow too large for MySQL max_allowed_packet and memcache default slab size" adds a new CacheArrayObject, which allows a large global cache to lazy load parts of itself when requested. This helps when the potential number of items in the cache is often a lot higher than the number of items that are actually requested during normal operation (schema definitions for tables that never have drupal_get_schema() called on them, for example).
The trade-off with that approach, at least when it's fully applied, is this:
- current D7/8 - rebuild one very large cache item as soon as it's requested. Load that into memory on every/most requests. This has the advantages of one cache write, but disadvantages of high cost to build for the request that hits a cold start and high memory usage on every page.
- CacheArrayObject - build the cache object over time as keys are requested (hopefully without an initial full load although not all current patches can skip that step). This requires additional cache_set() (although a maximum of one extra per page that gets a cache miss, not one for each key), but the memory usage should be dramatically reduced. It should also allow the cost of cache misses to be spread out over multiple requests (i.e. we may be able to get away with not loading the full theme registry in one go ever eventually unless specifically requested, although that's a way off).
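The second option above can be sketched roughly as follows. This is a minimal illustration in Python rather than Drupal PHP, and it is not the API from the CacheArrayObject patch: `CountingBackend`, `LazyCacheArray`, `resolve` and `persist` are all hypothetical names, and the backend is an in-memory stand-in for a cache bin.

```python
import copy


class CountingBackend:
    """Hypothetical in-memory stand-in for a cache bin; counts reads and writes."""

    def __init__(self):
        self.store = {}
        self.gets = 0
        self.sets = 0

    def get(self, cid):
        self.gets += 1
        return copy.deepcopy(self.store.get(cid))

    def set(self, cid, data):
        self.sets += 1
        self.store[cid] = copy.deepcopy(data)


class LazyCacheArray:
    """Builds one cache entry over time, as keys are requested."""

    def __init__(self, backend, cid, resolve):
        self.backend = backend
        self.cid = cid
        self.resolve = resolve  # computes the value for a single key on a miss
        self.data = backend.get(cid) or {}  # only the keys learned so far
        self.dirty = False

    def __getitem__(self, key):
        if key not in self.data:
            # Cache miss for this key only; the full structure is never built.
            self.data[key] = self.resolve(key)
            self.dirty = True
        return self.data[key]

    def persist(self):
        """Call once at the end of the request: at most one extra write."""
        if self.dirty:
            self.backend.set(self.cid, self.data)
            self.dirty = False


# Usage: two misses in one request still cost a single write.
backend = CountingBackend()
schema = LazyCacheArray(backend, "schema", lambda table: {"table": table})
schema["node"]
schema["users"]
schema.persist()
```

However many keys miss during a request, the sketch writes back once, which matches the "maximum of one extra per page" point above.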
So IMO this pattern makes a lot of sense. It is a compromise between loading a single huge object or array into memory, which leads to very high overall memory requirements (and a lot of data being moved around), and always lazy loading individual pieces, which leads to lots of little pieces of data being moved around in separate i/o requests to the database or cache backend.
If that pattern can be applied to individual systems, it still leaves us with the issue of having a lot of different, hopefully relatively decoupled systems that need to load stuff on every request to Drupal - this can be a mixture of configuration and other stuff (i.e. the theme registry is not config, nor is a lot of what gets put in the variables system).
So, if the individual items can be kept as small as possible, I think it is worth looking at introducing a shared cache object for these. This would function something like the CacheArrayObject patch (not necessarily that API although it might work).
Let's call that SharedCache.
__construct() loads the shared cache item from the cache backend.
When a key is requested, if it's cached, it's pulled straight from memory with no separate round trip (what system_list() does now, but not as a hacky one-off).
If the key is missing, it is either pulled from an individual cache item, or triggers a cache miss. A miss will generate the content, and it'll be set to write through back to the shared cache.
This means that the shared cache object never really gets cleared. Each system maintains its own cache as it does now, but the API piggybacks them onto one item internally.
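As a rough illustration of the behaviour described above (nothing is written yet, so this is only a sketch under assumptions): the Python below uses a hypothetical in-memory `DictBackend` in place of a real cache bin, and the `SharedCache` method names `get()` and `flush()` are made up for the example. A stored value of None is treated as a miss.

```python
import copy


class DictBackend:
    """Hypothetical in-memory cache backend that counts read round trips."""

    def __init__(self):
        self.store = {}
        self.gets = 0

    def get(self, cid):
        self.gets += 1
        return copy.deepcopy(self.store.get(cid))

    def set(self, cid, data):
        self.store[cid] = copy.deepcopy(data)


class SharedCache:
    def __init__(self, backend, cid="shared"):
        self.backend = backend
        self.cid = cid
        # One round trip loads everything the shared item has learned so far.
        self.items = backend.get(cid) or {}
        self.dirty = False

    def get(self, key, build):
        if key in self.items:
            return self.items[key]  # straight from memory, no round trip
        value = self.backend.get(key)  # the owning system's individual item
        if value is None:
            value = build()  # genuine cache miss: generate the content
            self.backend.set(key, value)  # the system still keeps its own item
        self.items[key] = value  # queue it for write-through to the shared item
        self.dirty = True
        return value

    def flush(self):
        """Write the shared item back once, e.g. at the end of the request."""
        if self.dirty:
            self.backend.set(self.cid, self.items)
            self.dirty = False


# Usage: the first request learns the key, later requests get it for free.
backend = DictBackend()
shared = SharedCache(backend)
shared.get("theme_registry", lambda: {"node": "node.tpl.php"})
shared.flush()
```

Invalidation is left out of the sketch; since each system keeps its individual item, one option would be to drop the key from the shared item whenever the individual item is cleared.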
Potentially this could automatically key based on request - so GET/POST and html vs. other content types. That'd mean you could use the shared item transparently, but internally it would maintain separate items based on request context, which would avoid it getting polluted with too much irrelevant stuff.
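A minimal sketch of what keying on request context could look like, in Python. The function name `shared_cid()` and the cid layout are assumptions made for illustration only; the two dimensions used (request method, and html vs. other content types) are the ones mentioned above.

```python
def shared_cid(method, content_type, prefix="shared"):
    """Derive the cid of the shared item from the request context."""
    method = method.upper()
    # Ignore parameters such as "; charset=utf-8" when classifying the type.
    mime = content_type.split(";")[0].strip().lower()
    kind = "html" if mime == "text/html" else "other"
    return f"{prefix}:{method}:{kind}"


# Callers always ask for "the shared item"; the context picks the variant.
cid = shared_cid("GET", "text/html; charset=utf-8")
```

With this shape, an HTML page view and a JSON POST each get their own shared item, so neither is polluted with entries only the other needs.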
I don't have anything written yet, and I'd probably want to get some of the memory patches sorted out before even looking at this, but I'm writing it down while I think of it.
Comments
Comment #1
Anonymous (not verified) commented
subscribe.
i'm still digesting the cache object stuff, not 100% sold yet, but not opposed either.
Comment #2
catch commented
#1092192: "Using hook_entity_info_alter to add the bundles prevents some modules from altering entities" is an example of where this might help. Modules are using hook_entity_info_alter() to add things that they want to query about entities, but don't want to pull on every request from a separate cache entry, even if they aren't strictly entity info (i.e. the RDF mappings are used by RDF module internally, and there is a separate API for changing them).
One alternative to a big cache entry would be to do the following:
1. Have the CacheArrayObject maintain a cache entry - but just a list of cids+bins.
2. In __construct(), grab the cids from cache (will be a small enough entry), then cache_get_multiple() those cache entries and populate the CacheArrayObject.
This would mean one cache_get() added to the page (the list of cids), but no single huge cache entry to maintain.
SQL, Memcache and APC all support multiple get. (The Memcache pecl extension internally still gets each item individually, while Memcached does a proper get multi. There are two years left until D8 comes out, so that might improve.)
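The two steps above can be sketched like this. It is a Python illustration under assumptions, not Drupal code: `Backend`, `CidListPreloader` and their methods are hypothetical, entries are keyed by (bin, cid) pairs, and `get_multiple()` stands in for a multiple get across bins (Drupal 7's cache_get_multiple() works on one bin at a time, so a real version would need to group the cids by bin).

```python
class Backend:
    """Hypothetical in-memory backend keyed by (bin, cid); counts read round trips."""

    def __init__(self):
        self.store = {}
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.store.get(key)

    def get_multiple(self, keys):
        self.round_trips += 1
        return {k: self.store[k] for k in keys if k in self.store}

    def set(self, key, value):
        self.store[key] = value


class CidListPreloader:
    def __init__(self, backend, list_key):
        self.backend = backend
        self.list_key = list_key
        # Round trip 1: the small entry holding only the list of (bin, cid).
        self.known = set(map(tuple, backend.get(list_key) or []))
        # Round trip 2: fetch all of the listed entries in one multiple get.
        self.items = backend.get_multiple(sorted(self.known)) if self.known else {}
        self.learned = set()

    def get(self, bin_name, cid):
        key = (bin_name, cid)
        if key in self.items:
            return self.items[key]  # preloaded, no extra round trip
        self.learned.add(key)  # remember it for the next request
        return self.backend.get(key)

    def persist(self):
        """Save the list only if this request learned new cids."""
        if self.learned - self.known:
            self.backend.set(self.list_key, sorted(self.known | self.learned))


# Usage: the first request learns which entries it touched.
backend = Backend()
backend.set(("cache_menu", "links:a"), "links")
preloader = CidListPreloader(backend, ("cache", "preload:node/1"))
preloader.get("cache_menu", "links:a")
preloader.persist()
```

Only the list of cids is stored in the extra entry, so the cached data itself is not duplicated and each entry is still maintained by its own system.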
Comment #3
sun commented
I've read this, but I did not fully understand the suggested logic and intention of the "SharedCache" on a high-level.
Comment #4
catch commented
It's about reducing i/o from having to grab multiple different cache objects that are requested together 99% of the time. More or less applying the logic of the system path cache (for reducing the number of requests for path aliases) in D7 to cache items themselves. Each cached item would maintain itself if we used the cache_get_multiple() method (just as the system path cache is agnostic to the content of the actual path aliases).
The best example of a practical use case would be the menu cache. 14 of the 22 cache_get() calls on a default install of D8 are for cache items related to menus (links and tree params). These are per page, usually generated once all at the same time.
Most of those could probably be consolidated into one cache_get() and one cache_get_multiple(). Trying to implement this as a one off in menu.inc would be very difficult and extremely ugly, having a class that encapsulates the logic (of 'learning' the cache items, static caching and retrieving) will keep the ugly in one place. Having the class means it's available across subsystems too (since the system path cache is per page, that could use the same shared per-path cache).
Another example would be the 'global registries/config', so again looking at a default install, there is system_list(), variable_initialize(), _theme_load_registry(), module_implements(), field info, entity info. This would need to be keyed on whether the page was cacheable and request type, but that is another 5-10 items that could turn into a single cache_get() plus a cache_get_multiple().
If both turned out to be viable candidates for this, it's going to remove maybe 10-15 database queries from a default install, more with contrib - to add to the list of backends that would benefit, MongoDB and Redis both have support for multiple get too. The only one I can think of that definitely wouldn't is files.
I have a reasonable idea of how this would end up looking in the low level implementation, but the module-facing API is still a bit mushy. It'd be nice to be able to expose this as an API to cache backends rather than cache system users, but that'd likely mean centralizing some of the current cid generation logic inside cache_get() and cache_set() (so we know whether caches are per page, per-user etc).
Comment #5
fago commented
> It's about reducing i/o from having to grab multiple different cache objects that are requested together 99% of the time. More or less applying the logic of the system path cache (for reducing the number of requests for path aliases) in D7 to cache items themselves.
But wouldn't that lead to lots of duplicated stored/cached information? Or do you just suggest storing the needed cids per page?
Comment #6
catch commented
Yeah, just the cids per page.
Comment #7
Crell commented
Subscribing. This sounds promising if we can work out the details.
Comment #8
jhedstrom commented
Caching has changed very much since this was last active. While I don't think a shared cache was added as described here, I wonder if this is still needed/relevant?
Comment #9
berdir commented
Yes, I think we can close this as a duplicate of #2473205: "Create a cache backend that pre loads multiple items in one getMultiple() call". Between that, chained fast, and the cache collector, there are different ways of doing something like this.