Problem

Unclear

  • Are we not caching the config schema information?
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

sun’s picture

Hm...

use Drupal\Core\Config\InstallStorage;

/**
 * Defines the file storage controller for metadata files.
 */
class SchemaStorage extends InstallStorage {

We're using the InstallStorage at regular runtime...? :-/

alexpott’s picture

Status: Active » Needs review
FileSize
977 bytes

Nope there's no caching of config schema - so we can try this. Which appears to help BUT also the sheer number of objects created to parse a views schema seems to be what is really taking the time as this only shaves a couple of seconds off enabling the views module for me.

Let's see what breaks.

alexpott’s picture

The problem is our schema are dynamic - that is to say that the schema of view is made up of lots of other bits of schema and which exact schemas these are depend on values with the view entity. This means that in order to create the schema for the view we have to parse every value within the view and create a schema object for each one. This turns out to not be cheap.

Anonymous’s picture

does this mean we should cache schema bits per config entity type?

can we live with a coupling the cache lifecycle of a collection of schema entity objects to the CRUD lifecycle of config entities?

alexpott’s picture

FileSize
977 bytes

Testbot was borked when I submitted the last patch

alexpott’s picture

Okay found a large part of the slow down is due unnecessary re getting of the same schema due to nested keys and recursion in Config::getSchemaForKey()

HEAD

Pre using schema in save

Patch

I'm not 100% happy with the patch as it stands since we are caching every level of schema and this is unecessary but its a start

EDIT: the xhprof runs are comparisons of config_install_default_config() whilst enabling Views module through drush.

Status: Needs review » Needs work

The last submitted patch, 6: 2162013.6.patch, failed testing.

alexpott’s picture

Status: Needs work » Needs review
FileSize
6.82 KB

Ok I'm getting happier - more code moving from the Config class to the schema system which is a good thing. The most important line in the patch is

diff --git a/core/lib/Drupal/Core/Config/Schema/Sequence.php b/core/lib/Drupal/Core/Config/Schema/Sequence.php
index 44dbe1d..6333349 100644
--- a/core/lib/Drupal/Core/Config/Schema/Sequence.php
+++ b/core/lib/Drupal/Core/Config/Schema/Sequence.php
@@ -60,7 +60,7 @@ public function onChange($delta) {
    *   Typed configuration element.
    */
   public function get($key) {
-    $elements = $this->parse();
+    $elements = $this->getElements();
     return $elements[$key];
   }

Since this is the line that ensures the array schema elements populate $this->elements and therefore don't need to recreate the elements every time.

Profiling enabling of views: the inclusive wall time when enabling views is 10,098,144 microsec (HEAD is 21,285,766)

No interdiff with #6 since the approach is completely different.

YesCT’s picture

I was trying to do profiling, but could not figure out how to get xhprof results for a: drush -y en views. Opened issue: #2163693: xhprof run results not saved and no id given after xhprof enabled for drush in devel

alexpott’s picture

FileSize
382 bytes
6.66 KB

@beejeebus pointed out that I'd left an used property in the patch from the previous solution.

catch’s picture

Do we really need an extra cache bin for this? I'd think the existing config cache bin would be fine as long as no chance of collisions.

Logic in the patch looks good but yeah it'd be good to see profiling.

Berdir’s picture

Issue tags: +Test suite performance
FileSize
7.55 KB
917 bytes

I noticed the same bug in Sequence::get() (the result is now consistent with ArrayElement::offsetGet() when re-rolling #2110467: Add first(), get($index) and possibly other methods to ListInterface. It could also simply call to parent::offsetGet(), that's what my patch used to to when I added the get() method.

Doing some profiling with xhprof and running a web test:

Before/After:
Number of Function Calls 3,809,248 3,493,216 -316,032 -8.3%
Incl. Wall Time (microsec) 19,470,149 16,325,501 -3,144,648 -16.2%
Incl. MemUse (bytes) 17,509,800 16,797,416 -712,384 -4.1%
Incl. PeakMemUse (bytes) 20,086,152 20,028,912 -57,240 -0.3%

While that's pretty nice and all, I did notice that it also results in 600 additional queries, caused by using CachedSchema for the config schema stuff and not actually being much faster in there (in case of a test) as it mostly just passes through to the file storage + cache set calls and if it doesn't, then resulting in tons of separate cache requests. A simple improvement is using readMultiple() there, although it would be nice to have CacheBackendInterface::setMultiple() here and for cached config in general: #1167248: Add CacheBackendInterface::setMultiple(). See interdiff.

While this doesn't result in a large difference, it saves ~270 queries and the setMultiple() would to the same for the insert queries, this should however improve quite a bit when the cache is warm. #2155635: Allow plugin managers to opt in to cache clear during module install might also be related.

What I'm wondering is why we even mess with the config values at all in config_install_default_config(), aren't we just copying stuff from a module folder to the active storage? If there is no reason, could we inject a null implementation there, so that the validation is skipped?

For the typed data/content entity validation API, we explicitly do not validate automatically on save but made it a separate API, #2085071: Auto validate fields on $entity->save(), but allow opt-out is related. But this is also just casting stuff around and not actually validating it :)

Also running some tests to compare the affect on test run time for the bot with php ./core/scripts/run-tests.sh --concurrency 8 --url http://d8/ Node:

Test run duration:
HEAD: 7 min 44 sec
With the patch: 6 min 56 sec

Not that much of a difference. Noticed that NodeTranslationUITest took by far the most time, so not sure how meaningful those numbers are, but to compare that, I did also run that as a standalone test:

HEAD: 4 min 37 sec
With the patch: 4 min 9 sec

So I mostly just compared that test in the run above ;)

Also planning on doing a comparison with before the commit of the issue that caused this.

Berdir’s picture

For comparison, i did a quick check with an empty typed config manager implementation in that function and it's another 1.2s faster. Note that this doesn't affect saved config entities, saving those was now actually 1s (40%) slower (as it were those calls that triggered the parsing), so that's another 1s that we could save if we'd skip that casting stuff there as well.

More numbers, I did run the test with xhprof enabled, running them without gives me the following:

HEAD: 15s
last patch: 13s
empty typed config manager: 12s

andypost’s picture

Great improvement #12! Is it possible to measure the effect of growing cache entries? small chunks means more read, bigger means more deserializaton, is not it?

I see no reason in skiping casting when saving, core implementation of saving should be the same for install and runtime.

pwolanin’s picture

If we are simply copying in the default value, I don't see why it would make sense to validate it?

Berdir’s picture

That is what I understand yes, we're taking the values from file A and write them to file B, we're not changing anything, so we don't need to cast/validate anything? If I'm missing something, tell me :)

@andypost: Yeah not sure, might be better to maintain a single cache directly in TypedConfigManager, that's a single get and write on miss, and we always read everything. The only problem could be to clear the cache at the right time, but I don't see where we do trhat right now anyway. And the mentioned #2155635: Allow plugin managers to opt in to cache clear during module install would help.

alexpott’s picture

#11 I don't think we can reuse the config cache bin as there is no way we can guarantee that there is no chance of collisions.

Berdir’s picture

Here's a version that implements the caching directly in the typed config manager.

Timed drush si -y a bit:

HEAD 46.7s
Previous patch: 32.5s
This patch: 29.802s

Results vary only marginally, did three installs with this patch and the other results were 29.7 and 29.9.

The last submitted patch, 18: 2162013.18.patch, failed testing.

Berdir’s picture

FileSize
9.77 KB
701 bytes

This should fix the installer.

The last submitted patch, 20: 2162013.20.patch, failed testing.

pwolanin’s picture

@Berdir - this looks like you are using a cache rather than just copying them across?

Berdir’s picture

FileSize
11.93 KB
2.16 KB

@pwolanin: I'm not changing anything related to to that, the caching has nothing to do with how that works. We want caching anyway for normal config saves, I'm just wondering why default config installation goes through the same API.

The last submitted patch, 23: 2162013.23.patch, failed testing.

pwolanin’s picture

Looks like a very minor fix needed.

alexpott’s picture

FileSize
1.61 KB
12.95 KB

The approach of having a single cache entry for all definitions looks great. Minor update to use a constant instead of a property for the cache ID.

alexpott’s picture

The only issue I can think about with having a single cache entry is the potential size of the value. Memcache has a hard limit of 1mb and the current schemas in core add up to 116895 utf-8 characters so this could be in the region of 450k already (utf-8 characters use up to 3 bytes)

sun’s picture

  1. Can we use the cache.config bin here? When clearing that cache bin, I expect all cached config related data to be refreshed.

    I do not see a problem with name clashes; but if we want to be 100% safe, we just need to use/append characters that are not allowed for configuration file/object names; e.g., a colon (:).

  2. While the single cache item looks simple, I'm afraid it will turn into a memory problem.

    Considering that the config schema is only needed when saving a particular config object - and - considering that installing/uninstalling another module will have to add or remove definitions, while retaining existing, it would appear more sensible and performant to me if there was a cache item for each config schema; e.g.,

    schema:$provider:$name
    
  3. Cache invalidation is the most mind-boggling part from my perspective. — With this schema cache in place; as a developer, I'll have to manually flush the config (schema) cache in order to work against latest definitions.

    While that may be acceptable and generally more common in D8, there's also the non-developer/end-user case of cache invalidation:

    If an existing/installed module happens to change/fix one of its config schema definitions (in a newer version), how and when exactly will the stale config schema cache be invalidated? → Off-hand, I assume the answer is "You MUST run update.php after touching any code." — whereas the update.php user interface only talks about database updates and does not really tell you that you always need to invoke it (in order to clear all caches).

  4. #0: Usage of Config\InstallStorage at regular runtime is still a major issue.

    The InstallStorage was explicitly designed to operate in the initial installer environment, in which the base system is not operable yet — it therefore performs plenty of filesystem scans (as opposed to relying on readily available application state information) and it also ignores the installation status of discovered extensions.

    The InstallStorage was never intended to be used outside of the installer. In fact, as also clarified in #2155701: Installer starts with fatal error/exception "table 'semaphore' not found" if settings.php contains $databases already, the installer's minimal service container including InstallStorage is only used in the very first 2-3 installer screens until the base system has been set up and verified. As soon as System module and User module are installed, the installer uses a regular DrupalKernel with regular runtime services. Not even the installer uses Config\InstallStorage from that point anymore, because application state and extension information is readily available.

    I'm fairly sure that the (ab)use of InstallStorage by the SchemaStorage is causing a major slowdown in performance; at minimum in the installer (and consequently tests). But I'm not sure whether that should be fixed here or in a separate issue?

Berdir’s picture

2. That's not how the API works. It does a listAll() + loadMultiple() of everything. Creating a single cache entry of that is actually less overhead because we only need a single cache object with a large $data array and a single unseralize() call on that and not 200 cache objects, each of those with a separate small $data array and 200 unseralize() calls. More specific caching would first require an API to load specific definitions.

3. Well, yes. In case of module enable/disable, #2155635: Allow plugin managers to opt in to cache clear during module install would take care of it. As for changes, that's the same as with anything else, if you change your entity type definition or routes or table schema, then you will need to make sure it's updated/the necessary caches are cleared. I don't see how we could avoid this. We need to cache this.

4. Haven't touched that part but after a quick look, this does seem to explain why the complete schema config is parsed, not just enabled modules, which probably also explains why there were never any cache clear problems. So yes, limiting that to actually enabled modules should help a lot with test performance, where usually only a small subset of modules is enabled.

alexpott’s picture

I would prefer to get this in as is and address the fact that all schema are cached regardless of whether its provided by an enabled module in another issue. We also use the InstallStorage in ExtensionInstallStorage which is currently used to config_install_default_config because it offers a handy way to get config from all the enabled modules. ExtensionInstallStorage overrides a function to limit only to enabled extensions.

catch’s picture

I really think we should use the config cache bin, and just add a prefix to the cache keys if that's really necessary to avoid collisions.

The memcache project gzips items that are going to be too big to fit in the 1mb limit, so the practical limit is actually quite a bit more than 1mb. On the other hand it's not good to rely on that - max_allowed_packet also defaults to 1mb.

If the InstallStorage is a different issue, it'd be good to open that as another critical follow-up.

I'm also wondering a bit how much of a performance issue the lack of caching is if using the PECL YAML parser - it's not impossible that it'd be quicker than hitting the db.

Berdir’s picture

https://drupal.org/requirements/database documents that max_allowed_packet should be at least 16MB.

I think the cache is still a lot faster than pecl, we're not talking about a single file but non-cached (due to the InstallStorage usage) directory content lookup for every existing (not just enabled, again, thanks to InstallStorage) module and then parsing all those files (~70 right now) one by one. Compared to a single cache get + unseralize.

I'll try to provide numbers for my thinking tonight :)

sun’s picture

2. … the API … does a listAll() + loadMultiple() of everything. … More specific caching would first require an API to load specific definitions.

I agree, yes, that's what I meant — it looks like there is a design flaw of blatantly scanning, reading, parsing, and processing all available definitions in the new config schema system - whereas the actual schema information is used for saving a single particular config object only; i.e., we're loading hundreds of definitions, just to use one.

Spin-off: #2166699: Use separate config schema cache items

If the InstallStorage is a different issue, it'd be good to open that as another critical follow-up.

Spin-off: #2166703: Config SchemaStorage uses InstallStorage


The only remaining change here from my perspective would be to change the cache bin to cache.config.

sun’s picture

Changed the cache bin to cache.config + minor fixes.

Berdir’s picture

+++ b/core/lib/Drupal/Core/Config/Schema/SchemaIncompleteException.php
@@ -2,7 +2,7 @@
 /**
  * @file
- * Definition of Drupal\Core\Config\Schema\SchemaIncompleteException.
+ * Definition of \Drupal\Core\Config\Schema\SchemaIncompleteException.
  */

Both are wrong, actually :) Should be Contains \Drupal\...

alexpott’s picture

FileSize
534 bytes
13.03 KB

Lol

And using the CID of 'config_typed_definitons' can never clash with a key from the config cache of regular configuration since config names always have to have a period.

sun’s picture

Status: Needs review » Reviewed & tested by the community

Awesome - let's move forward here :)

The two follow-up issues above will most likely result in further major speedups, potentially affecting regular runtime performance (whereas the changes here only appear to have an effect on rare operations; i.e., installer, module installation, and config saving).

Berdir’s picture

Compared/combined this with #1920902: Add a Drupal Yaml wrapper so we can default to PECL Yaml component if it is available but wasn't able to detect a big difference with and without this issue.

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Awesome! Great work, all.

Committed and pushed to 8.x. Let's hope this gets our testbot run time back under 2 hours again. Look forward to further improvements. :)

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.