Updated: Comment #N

Problem/Motivation

CacheArray has been deprecated. The only left-over use is SchemaCache, which was added in 7.x because the list of tables could get very long, especially due to configurable fields.

Proposed resolution

As field storage tables are no longer exposed in hook_schema() and a lot of tables have been dropped due to conversion to config entities, caches tables are created on-demand as well and are no longer in there. The expected amount of tables defined in hook_schema() will be much much smaller in 8.x. Maybe even the content entity storage tables will be removed. So the schema cache is small enough so that we won't have to worry about it anymore.

Therefore, drop SchemaCache without replacement, together with CacheArray.

Remaining tasks

User interface changes

API changes

CacheArray is removed, has been deprecated for a long time already.

Original report by @cosmicdreams

As a follow up to #1858616: Extract generic CacheCollector implementation and interface from CacheArray we should remove the deprecated CacheArray class and convert the final pieces of Drupal to use CacheCollector instead of CacheArray. Only two uses remain and two places within code comments.

Then we can put the parent issue to bed by writing a change notice.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Status: Needs review » Needs work

The last submitted patch, 1858616_34.patch, failed testing.

Berdir’s picture

Duplicate of #1957092: [META] Remove CacheArray, issue to remove ModuleInfo is RTBC, schema is likely going to be tricky.

This is not a drop-in replacement, it requires rewriting the implementation of the client classes to stop treating them as arrays but as an object with methods.

Not sure why I forgot to write the change notice, but it's not blocked on this, adding CacheCollector and removing CacheArray can and probably should be two separat change notices.

ianthomas_uk’s picture

Status: Needs work » Needs review
FileSize
3.75 KB

The tricky part of the is to convert the SchemaCache class to use CacheCollector, but unless I've missed something really obvious the SchemaCache class is totally unnecessary anyway.

The only usage of SchemaCache is in drupal_get_schema, and it is only used there if there is no schema at all, in which case a new instance is created which retrieves the schema from drupal_get_complete_schema(). But that function's already cached, so we're just copying from one cache to another.

The current code is also a bit weird, because $schema in drupal_get_schema is sometimes an array and sometimes an instance of SchemaCache.

Let's see if the testbot shows something I've not thought of.

ianthomas_uk’s picture

Title: Follow up: Remove deprecated CacheArray class » Remove final uses of deprecated CacheArray class
Berdir’s picture

We don't want to go back to an array, we want to use a CacheCollection instead.

And there used to be cases where we looped over the full list of schemas, might only exist in contrib and not core, not sure.

ianthomas_uk’s picture

This doesn't 'go back' to an array, it's just being consistent about the existing usage of an array.

At the moment a SchemaCache instance can only be returned by calling, in order:
drupal_get_schema('table_name'); // populate $schema with SchemaCache
drupal_get_schema(); // retrieve SchemaCache.

If you set $rebuild = true, $schema will switch back to an array.
If you call drupal_get_schema() first, before asking for a specific table, $schema will initialise to an array.
If you call drupal_get_complete_schema(), you will get an array.

As it happens, a call to drupal_get_schema() without passing a table will usually return an object, but that's only because most calls to drupal_get_schema() pass a table.

Berdir’s picture

No, if there is no rebuild enforced and it is called with a table name, it initializes a CacheSchema, which will maintain an internal cache that only consists of the actually used tables.

It only calls the full schema if all tables are requested, which means that the SchemaCache() wouldn't help and would only get filled up for no reason.

Yes, it does some tricky around object/array, but that's because this was introduced in 7.x, to avoid breaking backwards compatibility. I'm sure this is working :)

ianthomas_uk’s picture

Assigned: cosmicdreams » ianthomas_uk
Status: Needs review » Needs work

Ah, OK. I can see now how that works (apart from where the SchemaCache is cleared when the schema is rebuilt) and how it will use less memory to only load the schema for the tables that are needed.

That code is not at all obvious though, some comments to explain what you've just done would have been a big help. I'll take another look at this tomorrow.

Berdir’s picture

Absolutely, this was hacked into 7.x with the main goal to not break existing sites, not have clean and understandable code :)

As mentioned, the idea for the full rebuild was that using SchemaCache would be backwards, as it would lead to the whole schema being stored in there.

An 8.x approach would look like this I think:

interface SchemaStoreInterface (or Registry?) extends CacheCollectorInterface

public function getAll();

}

Then have an implementation of that as a service. Inline drupal_get_schema() and drupal_get_complete_schema() (this might just become getAll() ?). Drop the cache clear/all arguments for drupal_get_schema(), switch those to getAll()/clear(). Maybe drop the function or @deprecate it. Not sure.

Will require an API change approval.

ianthomas_uk’s picture

Just to be clear, are you proposing using a single class that:
- contains the code from drupal_get_complete_schema()
- by default will load only the requested schemas into memory
- when asked to, will load all schema (including rebuild+cache if required) and keep it in memory for the rest of the page request

Because it would replace drupal_get_complete_schema, we would stop caching the array of all schema and just rely on the per-schema cache in the new class.

Berdir’s picture

Not sure about dropping the cache for the all schema case, because the way the limited cache is built is up is falling back to the full cache until that's no longer necessary. Other than that, yes, single class.

ianthomas_uk’s picture

At the moment a rebuild calls all the module hooks and then stored the resulting big array under a single cache key. The limited cache then retrieves this big array and stores parts of it into it's own little caches (unless I've misunderstood and CacheArray is just a shorter big list).

I'm proposing that a rebuild would call all the module hooks and store the resulting big array under separate cache keys, so we could retrieve just the ones we want. To generate the big array we'd just return the contents of all caches.

Berdir’s picture

No, we want a single cache entry for the "commonly used tables", that's what CacheArray/CacheCollector does. Not a single cache per table.

And we also want to keep the full thing cached, but not polluting the smaller list, so getAll() that directly returns everything from the global cache. There might still be use cases to need that. And it's needed while the cache collector is building up the optimized cache.

sun’s picture

Hm. Perhaps it would make sense to wrap this up from the back?

IIRC, the actual root cause for introducing SchemaCache (or rather its D7/D6 equivalent) was to "avoid having to load .install files, because they can be big." — But in D8, (full) database schema information should not actually be commonly needed anymore.

→ Wouldn't it make more sense if we'd drop that cache altogether and just simply load the .install file of the requested module + invoke its hook_install() at all times?

The primary offenders are:

  1. drupal_write_record() (calls)
  2. drupal_get_schema($table) (calls)
  3. drupal_schema_fields_sql() (calls)

For (1), I created #1774104: Replace all trivial drupal_write_record() calls with db_merge() a very long time ago already, because drupal_write_record() is definitely replaced by all the (base) field work in D8. AFAIK, @tstoeckler and @andypost are currently working to fill in some last remaining gaps in that space, with the goal of completely replacing hook_install() for all entity types (since the information can be derived from base field definitions).

The vast majority of calls to (2) are tests.

The calls to (3) are almost exclusively originating from Entity/Field code, so quite potentially those might go away with aforementioned base field definition-to-schema conversion effort.

In turn, we'd be in a position to drop all the caching from these schema functions and simply turn them into a new Schema class/service.

Berdir’s picture

The main reason for SchemaCache (which was introduced in 7.x and the original CacheArray implementation) was that the full schema cache can get huge. Mostly, that's because field.module exposes field tables there, or used to, so every field resulted in two tables. Hundreds of tables in there are very common in 7.x.

That's why the optimized schema cache was introduced in 7.something that only stores what is needed on normal pages and also separates between GET and POST.

Re 1), I can see quite some issues with all the schema generation stuff, we'll see how well that goes :)

But yes, given that field schema is no longer exposed, we might no longer need this, I'm not sure. At least a global cache should be kept IMHO, though. I'm a bit worried that we'll notice that we still need something like it once we removed it, but it's going to be much less of a problem with field_schema() gone in 8.x.

jibran’s picture

Status: Needs work » Needs review

3: 2185015-3-schema-cache.patch queued for re-testing.

Berdir’s picture

Status: Needs review » Needs work
+++ b/core/modules/simpletest/lib/Drupal/simpletest/TestBase.php
@@ -1025,8 +1025,8 @@ private function prepareEnvironment() {
     // __destruct() method still have access to it.
     // All static variables need to be reset before the database prefix is
-    // changed, since \Drupal\Core\Utility\CacheArray implementations attempt to
-    // write back to persistent caches when they are destructed.
+    // changed, since \Drupal\Core\Utility\CacheCollector implementations
+    // attempt to write back to persistent caches when they are destructed.
     // @todo: Remove once they have been converted to services.
     drupal_static_reset();

@@ -1062,8 +1062,8 @@ private function prepareEnvironment() {
 
     // Change the database prefix.
     // All static variables need to be reset before the database prefix is
-    // changed, since \Drupal\Core\Utility\CacheArray implementations attempt to
-    // write back to persistent caches when they are destructed.
+    // changed, since \Drupal\Core\Utility\CacheCollector implementations
+    // attempt to write back to persistent caches when they are destructed.
     $this->changeDatabasePrefix();

They don't. The comments can be removed. And the drupal_static_reset() call probably too.

Let's also remove the CacheArray class then. I'm still not 100% sure that we should do this, but given that field tables are no longer in the schema cache, I guess the problem is much much smaller.

Berdir’s picture

Title: Remove final uses of deprecated CacheArray class » Remove SchemaCache and CacheArray
Assigned: ianthomas_uk » Unassigned
Issue summary: View changes
Status: Needs work » Needs review
FileSize
11.86 KB
9.6 KB

Ok here we go, Removing it completely. Left the drupal_static_reset() as we still need to call it at some point, it just doesn't matter anymore when.

ianthomas_uk’s picture

Issue tags: +@deprecated
damiankloip’s picture

Status: Needs review » Reviewed & tested by the community
+++ b/core/includes/schema.inc
@@ -33,12 +32,9 @@
+  if ($rebuild || !isset($schema)) {

Ha :)

I think this looks good. Let's do it.

Berdir’s picture

Status: Reviewed & tested by the community » Needs review

In regards to change notice, we already have https://drupal.org/node/2186501. I don't think many people outside of core used this, so I'll just update that one to mention that CacheArray has now been removed completely as soon as this is committed...

Berdir’s picture

Status: Needs review » Reviewed & tested by the community

Oh, sorry, I did not want to set it back to needs review.

Berdir’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
12.52 KB
822 bytes

Re-roll, CacheArray.php got changed :(

Also noticed in #2006434: [meta] Speed up web tests that we can remove the schema cache tag, no need for that anymore.

ianthomas_uk’s picture

It wasn't immediately obvious to me why we could just remove the schema cache tag. Were we only ever checking for it immediately before replacing the cache anyway, making it redundant?

Are we sure this won't cause a significant regression in speed or memory usage? We should probably at least measure how much memory drupal_get_complete_schema() uses (which we can probably triple for real-world usage).

Berdir’s picture

The tag was there to remove all schema caches at once. We only have on left, so we don't need the tag anymore.

We still cache just like before, it's actually faster, because we no longer have to remove the cache tag with separate queries. Cache::PERMANENT could also be removed, as it is the default, forgot that.

sun’s picture

Status: Needs review » Reviewed & tested by the community

Thanks, even better!

Let's finally get rid of this :-)

No need to nitpick this further — we need to convert this code into services/classes anyway. The functionality is actually somewhat related to the extension system revamp I'm working on, but thus far, I did not include this in my plans. Nevertheless, I'd think that something like Drupal\Core\Extension\SchemaManager exposed as schema_manager service would be appropriate. If someone wants to work on that, I'd be happy to help with architectural planning.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed 7c26feb and pushed to 8.x. Thanks!

Berdir’s picture

Updated [#2186501]

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.