Updated: Comment #N
Problem/Motivation
CacheArray has been deprecated. The only left-over use is SchemaCache, which was added in 7.x because the list of tables could get very long, especially due to configurable fields.
Proposed resolution
As field storage tables are no longer exposed in hook_schema() and a lot of tables have been dropped due to conversion to config entities, caches tables are created on-demand as well and are no longer in there. The expected amount of tables defined in hook_schema() will be much much smaller in 8.x. Maybe even the content entity storage tables will be removed. So the schema cache is small enough so that we won't have to worry about it anymore.
Therefore, drop SchemaCache without replacement, together with CacheArray.
Remaining tasks
User interface changes
API changes
CacheArray is removed, has been deprecated for a long time already.
Original report by @cosmicdreams
As a follow up to #1858616: Extract generic CacheCollector implementation and interface from CacheArray we should remove the deprecated CacheArray class and convert the final pieces of Drupal to use CacheCollector instead of CacheArray. Only two uses remain and two places within code comments.
Then we can put the parent issue to bed by writing a change notice.
Comment | File | Size | Author |
---|---|---|---|
#24 | remove-cache-array-2185015-24-interdiff.txt | 822 bytes | Berdir |
#24 | remove-cache-array-2185015-24.patch | 12.52 KB | Berdir |
#19 | remove-cache-array-2185015-19-interdiff.txt | 9.6 KB | Berdir |
#19 | remove-cache-array-2185015-19.patch | 11.86 KB | Berdir |
#3 | 2185015-3-schema-cache.patch | 3.75 KB | ianthomas_uk |
Comments
Comment #2
BerdirDuplicate of #1957092: [META] Remove CacheArray, issue to remove ModuleInfo is RTBC, schema is likely going to be tricky.
This is not a drop-in replacement, it requires rewriting the implementation of the client classes to stop treating them as arrays but as an object with methods.
Not sure why I forgot to write the change notice, but it's not blocked on this, adding CacheCollector and removing CacheArray can and probably should be two separat change notices.
Comment #3
ianthomas_ukThe tricky part of the is to convert the SchemaCache class to use CacheCollector, but unless I've missed something really obvious the SchemaCache class is totally unnecessary anyway.
The only usage of SchemaCache is in drupal_get_schema, and it is only used there if there is no schema at all, in which case a new instance is created which retrieves the schema from drupal_get_complete_schema(). But that function's already cached, so we're just copying from one cache to another.
The current code is also a bit weird, because $schema in drupal_get_schema is sometimes an array and sometimes an instance of SchemaCache.
Let's see if the testbot shows something I've not thought of.
Comment #4
ianthomas_ukComment #5
BerdirWe don't want to go back to an array, we want to use a CacheCollection instead.
And there used to be cases where we looped over the full list of schemas, might only exist in contrib and not core, not sure.
Comment #6
ianthomas_ukThis doesn't 'go back' to an array, it's just being consistent about the existing usage of an array.
At the moment a SchemaCache instance can only be returned by calling, in order:
drupal_get_schema('table_name'); // populate $schema with SchemaCache
drupal_get_schema(); // retrieve SchemaCache.
If you set $rebuild = true, $schema will switch back to an array.
If you call drupal_get_schema() first, before asking for a specific table, $schema will initialise to an array.
If you call drupal_get_complete_schema(), you will get an array.
As it happens, a call to drupal_get_schema() without passing a table will usually return an object, but that's only because most calls to drupal_get_schema() pass a table.
Comment #7
BerdirNo, if there is no rebuild enforced and it is called with a table name, it initializes a CacheSchema, which will maintain an internal cache that only consists of the actually used tables.
It only calls the full schema if all tables are requested, which means that the SchemaCache() wouldn't help and would only get filled up for no reason.
Yes, it does some tricky around object/array, but that's because this was introduced in 7.x, to avoid breaking backwards compatibility. I'm sure this is working :)
Comment #8
ianthomas_ukAh, OK. I can see now how that works (apart from where the SchemaCache is cleared when the schema is rebuilt) and how it will use less memory to only load the schema for the tables that are needed.
That code is not at all obvious though, some comments to explain what you've just done would have been a big help. I'll take another look at this tomorrow.
Comment #9
BerdirAbsolutely, this was hacked into 7.x with the main goal to not break existing sites, not have clean and understandable code :)
As mentioned, the idea for the full rebuild was that using SchemaCache would be backwards, as it would lead to the whole schema being stored in there.
An 8.x approach would look like this I think:
interface SchemaStoreInterface (or Registry?) extends CacheCollectorInterface
public function getAll();
}
Then have an implementation of that as a service. Inline drupal_get_schema() and drupal_get_complete_schema() (this might just become getAll() ?). Drop the cache clear/all arguments for drupal_get_schema(), switch those to getAll()/clear(). Maybe drop the function or @deprecate it. Not sure.
Will require an API change approval.
Comment #10
ianthomas_ukJust to be clear, are you proposing using a single class that:
- contains the code from drupal_get_complete_schema()
- by default will load only the requested schemas into memory
- when asked to, will load all schema (including rebuild+cache if required) and keep it in memory for the rest of the page request
Because it would replace drupal_get_complete_schema, we would stop caching the array of all schema and just rely on the per-schema cache in the new class.
Comment #11
BerdirNot sure about dropping the cache for the all schema case, because the way the limited cache is built is up is falling back to the full cache until that's no longer necessary. Other than that, yes, single class.
Comment #12
ianthomas_ukAt the moment a rebuild calls all the module hooks and then stored the resulting big array under a single cache key. The limited cache then retrieves this big array and stores parts of it into it's own little caches (unless I've misunderstood and CacheArray is just a shorter big list).
I'm proposing that a rebuild would call all the module hooks and store the resulting big array under separate cache keys, so we could retrieve just the ones we want. To generate the big array we'd just return the contents of all caches.
Comment #13
BerdirNo, we want a single cache entry for the "commonly used tables", that's what CacheArray/CacheCollector does. Not a single cache per table.
And we also want to keep the full thing cached, but not polluting the smaller list, so getAll() that directly returns everything from the global cache. There might still be use cases to need that. And it's needed while the cache collector is building up the optimized cache.
Comment #14
sunHm. Perhaps it would make sense to wrap this up from the back?
IIRC, the actual root cause for introducing
SchemaCache
(or rather its D7/D6 equivalent) was to "avoid having to load .install files, because they can be big." — But in D8, (full) database schema information should not actually be commonly needed anymore.→ Wouldn't it make more sense if we'd drop that cache altogether and just simply load the .install file of the requested module + invoke its
hook_install()
at all times?The primary offenders are:
drupal_write_record()
(calls)drupal_get_schema($table)
(calls)drupal_schema_fields_sql()
(calls)For (1), I created #1774104: Replace all trivial drupal_write_record() calls with db_merge() a very long time ago already, because
drupal_write_record()
is definitely replaced by all the (base) field work in D8. AFAIK, @tstoeckler and @andypost are currently working to fill in some last remaining gaps in that space, with the goal of completely replacinghook_install()
for all entity types (since the information can be derived from base field definitions).The vast majority of calls to (2) are tests.
The calls to (3) are almost exclusively originating from Entity/Field code, so quite potentially those might go away with aforementioned base field definition-to-schema conversion effort.
In turn, we'd be in a position to drop all the caching from these schema functions and simply turn them into a new Schema class/service.
Comment #15
BerdirThe main reason for SchemaCache (which was introduced in 7.x and the original CacheArray implementation) was that the full schema cache can get huge. Mostly, that's because field.module exposes field tables there, or used to, so every field resulted in two tables. Hundreds of tables in there are very common in 7.x.
That's why the optimized schema cache was introduced in 7.something that only stores what is needed on normal pages and also separates between GET and POST.
Re 1), I can see quite some issues with all the schema generation stuff, we'll see how well that goes :)
But yes, given that field schema is no longer exposed, we might no longer need this, I'm not sure. At least a global cache should be kept IMHO, though. I'm a bit worried that we'll notice that we still need something like it once we removed it, but it's going to be much less of a problem with field_schema() gone in 8.x.
Comment #16
sunComment #17
jibran3: 2185015-3-schema-cache.patch queued for re-testing.
Comment #18
BerdirThey don't. The comments can be removed. And the drupal_static_reset() call probably too.
Let's also remove the CacheArray class then. I'm still not 100% sure that we should do this, but given that field tables are no longer in the schema cache, I guess the problem is much much smaller.
Comment #19
BerdirOk here we go, Removing it completely. Left the drupal_static_reset() as we still need to call it at some point, it just doesn't matter anymore when.
Comment #20
ianthomas_ukComment #21
damiankloip CreditAttribution: damiankloip commentedHa :)
I think this looks good. Let's do it.
Comment #22
BerdirIn regards to change notice, we already have https://drupal.org/node/2186501. I don't think many people outside of core used this, so I'll just update that one to mention that CacheArray has now been removed completely as soon as this is committed...
Comment #23
BerdirOh, sorry, I did not want to set it back to needs review.
Comment #24
BerdirRe-roll, CacheArray.php got changed :(
Also noticed in #2006434: [meta] Speed up web tests that we can remove the schema cache tag, no need for that anymore.
Comment #25
ianthomas_ukIt wasn't immediately obvious to me why we could just remove the schema cache tag. Were we only ever checking for it immediately before replacing the cache anyway, making it redundant?
Are we sure this won't cause a significant regression in speed or memory usage? We should probably at least measure how much memory drupal_get_complete_schema() uses (which we can probably triple for real-world usage).
Comment #26
BerdirThe tag was there to remove all schema caches at once. We only have on left, so we don't need the tag anymore.
We still cache just like before, it's actually faster, because we no longer have to remove the cache tag with separate queries. Cache::PERMANENT could also be removed, as it is the default, forgot that.
Comment #27
sunThanks, even better!
Let's finally get rid of this :-)
No need to nitpick this further — we need to convert this code into services/classes anyway. The functionality is actually somewhat related to the extension system revamp I'm working on, but thus far, I did not include this in my plans. Nevertheless, I'd think that something like
Drupal\Core\Extension\SchemaManager
exposed asschema_manager
service would be appropriate. If someone wants to work on that, I'd be happy to help with architectural planning.Comment #28
alexpottCommitted 7c26feb and pushed to 8.x. Thanks!
Comment #29
BerdirUpdated [#2186501]