Problem/Motivation

Blocking #2289201: [Meta] Make drupal install and run within reasonable php memory limits so we can reset the memory requirements to lower levels

YAML parsing is still one of the slowest components of the current cold cache performance.

This especially effects cache clears (e.g. container rebuilds or module installation), the peak memory usage and cache clears.

e.g.

Looking at profile of standard install we call FileStorage::read 1772 times but we only have 1244 yml files in all of core, so this would indicate that some files are being read and parsed twice.

Proposed resolution

- Files can be cached perfectly based on their mtime using a FileCache class
- The best backend to cache files in is APCu, but PHPStorage works too (for e.g. shared hosts not having APCu).

Benefits:

- Installation is around 10%-20% faster with FileCache.
- Testbot consistently goes down from 15.4 minutes to 14.2 seconds, so we shave off a minute for core testing
- After module installation the container rebuild and YAML parsing time for all plugin data is saved (5-7% performance improvement, 38 k less calls)
- Saves 18% performance, 152k less calls for container rebuild
- Allows pre-compiling YAML and other data

Remaining tasks

- Review
- Commit

User interface changes

- None

API changes

- Add FileCache component / class that is a generic low-level class to cache files, has static caching by default
- Add FileCacheBackendInterface based on the APCu interface to allow some pluggability and especially not hard-code us to APCu, which has drawbacks, too.
- Add FileCacheFactory class to add some pluggability similar to PhpStorageFactory

CommentFileSizeAuthor
#210 2395143.210.patch31.23 KBalexpott
#210 a-interdiff-of-sorts.patch576 bytesalexpott
#205 Screen Shot 2015-04-22 at 13.05.03.png336.56 KBalexpott
#199 2395143-198-followup-followup.patch1.89 KBznerol
#190 2395143-190-followup.patch568 bytesamateescu
#185 2395143-185-interdiff.txt15.13 KBberdir
#185 2395143-185.patch31.15 KBberdir
#180 2395143-180-interdiff.txt5.46 KBberdir
#180 2395143-180.patch40.34 KBberdir
#179 2395143-179-interdiff.txt5.26 KBberdir
#179 2395143-179.patch39.68 KBberdir
#177 2395143-175-interdiff.txt7.74 KBberdir
#177 2395143-175.patch37.29 KBberdir
#171 interdiff.txt310 bytesamateescu
#171 2395143-171.patch36.69 KBamateescu
#168 interdiff.txt9.8 KBamateescu
#168 2395143-168.patch39.86 KBamateescu
#165 interdiff.txt1.34 KBamateescu
#165 2395143-164.patch46.78 KBamateescu
#160 interdiff.txt2.74 KBamateescu
#160 2395143-160.patch46.83 KBamateescu
#158 interdiff.txt5.61 KBamateescu
#158 2395143-158.patch46.79 KBamateescu
#153 interdiff.txt3.29 KBamateescu
#153 2395143-153.patch46.64 KBamateescu
#150 interdiff.txt1.42 KBamateescu
#150 2395143-149.patch46.56 KBamateescu
#146 interdiff.txt709 bytesamateescu
#146 2395143-146.patch46.58 KBamateescu
#144 interdiff.txt418 bytesamateescu
#144 2395143-144.patch46.58 KBamateescu
#142 interdiff.txt2.57 KBamateescu
#142 2395143-142.patch46.61 KBamateescu
#139 interdiff.txt1.69 KBamateescu
#139 2395143-139.patch45.96 KBamateescu
#137 extra_changes.interdiff.txt1.69 KBamateescu
#137 interdiff.txt13.65 KBamateescu
#137 2395143-137.patch45.94 KBamateescu
#126 yaml_parsing_is_very-2395143-126--interdiff.txt23.45 KBfabianx
#126 yaml_parsing_is_very-2395143-126.patch45.63 KBfabianx
#116 yaml_parsing_is_very-2395143-116.patch45.02 KBfabianx
#114 yaml_parsing_is_very-2395143-114--interdiff.txt3.21 KBfabianx
#114 yaml_parsing_is_very-2395143-114.patch45.01 KBfabianx
#111 yaml_parsing_is_very-2395143-111--interdiff.txt1.81 KBfabianx
#111 yaml_parsing_is_very-2395143-111.patch45.19 KBfabianx
#108 yaml_parsing_is_very-2395143-107--interdiff.txt20.66 KBfabianx
#108 yaml_parsing_is_very-2395143-107.patch45.17 KBfabianx
#105 yaml_parsing_is_very-2395143-105--interdiff.txt21.26 KBfabianx
#105 yaml_parsing_is_very-2395143-105.patch45.76 KBfabianx
#95 remove-todo-statement-2395143-95.patch27.79 KBdashaforbes
#95 interdiff.txt787 bytesdashaforbes
#93 remove-todo-statement-2395143-93.patch27.77 KBdashaforbes
#93 interdiff.txt998 bytesdashaforbes
#90 yaml_parsing_is_very-2395143-90--interdiff.txt12.61 KBfabianx
#90 yaml_parsing_is_very-2395143-90.patch27.93 KBfabianx
#81 yaml_parsing_is_very-2395143-80.patch27.39 KBfabianx
#78 yaml_parsing_is_very-2395143-78.patch27.39 KBfabianx
#76 yaml_parsing_is_very-2395143-76.patch27.29 KBfabianx
#74 yaml_parsing_is_very-2395143-74.patch27.26 KBfabianx
#69 interdiff.txt19.22 KBfabianx
#69 yaml_parsing_is_very-2395143-69.patch21.65 KBfabianx
#57 2395143-57-interdiff.txt881 bytesberdir
#57 2395143-57.patch16.91 KBberdir
#53 2395143-53-interdiff.txt12.19 KBberdir
#53 2395143-53.patch16.04 KBberdir
#38 2395143-38-interdiff.txt770 bytesberdir
#38 2395143-38.patch12.6 KBberdir
#32 2395143-32-interdiff.txt1.64 KBberdir
#32 2395143-32.patch11.85 KBberdir
#30 2395143-30-interdiff.txt8.64 KBberdir
#30 2395143-30.patch11.42 KBberdir
#28 2395143-28.patch14.54 KBAnonymous (not verified)
#25 2395143-25-interdiff.txt6.23 KBAnonymous (not verified)
#25 2395143-25.patch14.53 KBAnonymous (not verified)
#21 2395143-21.patch12.13 KBAnonymous (not verified)
#14 apcu-file-cache-2395143-14-interdiff.txt8.11 KBberdir
#14 apcu-file-cache-2395143-14.patch12.13 KBberdir
#5 apcu-file-cache-2395143-1.patch5.43 KBberdir
cache-read-files.1.patch1.97 KBlarowlan

Comments

larowlan’s picture

Status: Active » Needs review
dawehner’s picture

I'm not sure whether this is the best idea, given that the continues container rebuild might reset the file storage object.
I remember that @alexpott had some ideas about actual cached YML files, maybe there is an isse out there for it?

berdir’s picture

Not sure how much this will help.. Maybe when installing with drush as it is a single request, but not when installing through the web UI.

What I was thinking about is a cache that uses apcu with a file modification date so that we can keep returning the parsed result.

Also note that we have other .yml files that are loaded differently.

Installing my install profile, I'm seeing 14k calls to Yaml::decode, 7.5k from FileStorage and 6.9k from YamlDiscovery (83 router builds).

Also, 93% of FileStorage parsing is config schema :( For that, I had that idea about instead of rebuilding the config schema every time we install a new module, that we can update it with just the files from one module.

Yaml::decode() took 100s, which is 30% of the whole install.

berdir’s picture

StatusFileSize
new5.43 KB

Quick (and ugly and undocumented) proof of concept, not yet tested in my install profile, but with the standard profile:

HEAD

Congratulations, you installed Drupal! [49.19 sec, 91.4 MB] (with uprofiler enabled)
Congratulations, you installed Drupal! [24.07 sec, 75.91 MB]  (without)

Patch:

Congratulations, you installed Drupal! [31.95 sec, 89.6 MB]  (with uprofiler enabled)
Congratulations, you installed Drupal! [19.33 sec, 74.05 MB] (without)

So a 20% improvement when neither xdebug nor uprofiler are enabled, without using more memory.

(uprofiler documentation page says that speedstep is messing with their timers, need to have a closer look at that, see http://superuser.com/questions/454101/is-there-a-way-to-disable-intel-sp...)

Those numbers again nicely show the overhead (edit: of uprofiler/xhprof) of many function calls. HEAD does 40% more function calls according to uprofiler report.

Not yet tested with my custom install profile, I actually expect the difference there to be bigger, as the number of file reads grow with both more modules (more rebuilds), which also result in more files. Yaml::decode() was about the same percentage, so maybe. There is a lot of other stuff going on there as well (like default content).

Note: because apc cache is per process, every drush install command still has to parse all the files once, but that's just 4% instead of 38% (when uprofiler is enabled).

berdir’s picture

We can probably also use ApcuFileCache for other things, like annotions, AnnotatedClassDiscovery::getDefinitions() currently takes 2.5s/7% (with the patch applied).

chx’s picture

Well, the DependencyInjection YamlFileLoader was already doing this so this makes perfect sense.

anavarre’s picture

Issue tags: +Performance, +D8 cacheability
Anonymous’s picture

nice! love this idea.

catch’s picture

Like this a lot.

The last submitted patch, cache-read-files.1.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 5: apcu-file-cache-2395143-1.patch, failed testing.

berdir’s picture

Uh. That smells like php file stats caching... :(

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new12.13 KB
new8.11 KB

No, it was just me being stupid :)

Here's an improved version.

- Tests should be fixed (I forgot to update the static when a file got written written)
- added function exists checks for set/delete
- I changed it so that the static cache is always used.
- I'm also using this for the annotation parsing, extension discovery and the yaml file loader (replaced the static cache, which allows us to remote the reset()).

Status: Needs review » Needs work

The last submitted patch, 14: apcu-file-cache-2395143-14.patch, failed testing.

moshe weitzman’s picture

     foreach ($this->findFiles() as $provider => $file) {
-      $all[$provider] = Yaml::decode(file_get_contents($file));
+      if ($data = ApcuFileCache::get($file)) {
+        $all[$provider] = $data;
+      }
+      else {
+        $all[$provider] = Yaml::decode(file_get_contents($file));
+        ApcuFileCache::set($file, $all[$provider]);
+      }
     }

In this example, it looks like we are saving a YAML:decode for each subsequent file read which is good but ApcuFileCache::get() has to internally deserialize so I wonder how much we are saving. We have seen apc_fetch() come up on flame graphs. I see benchmarks in #5. Would be good to redo those now that the patch is more mature.

berdir’s picture

I didn't do benchmarks but drush si was another 0.3 seconds faster, but that obviously didn't hit apc, just the static cache.

Yes, apc_fetch has some overhead, but yaml parsing has a lot of that. Fairly convinced that this is faster. We could also try to build getMultiple() and setMultiple() versions, but this could be quite a bit bigger.

berdir’s picture

Priority: Normal » Major

Those fails are related to the caching in the annotation parser, weird stuff is happening I think when we cache objects because they are suddenly by reference and changes survive rebuilds. Tried to serialize them to get rid of the references problem but no luck yet.

@beejeebus also has ideas for a getMultiple() implementation that uses a single apc_fetch(), he said he will post a proof of concept here.

I think the performance gains here make this major...

larowlan’s picture

We'll need a hook requirements here too right?

berdir’s picture

For what? apcu is optional, we're not going to require it. Unless you want to make it a recommendation?

For drush, it won't even make a difference, because it just lives as long as the drush process lives. And the current implementation uses the static cache even if apcu is not available.

Anonymous’s picture

StatusFileSize
new12.13 KB

here's a patch that just fixes the !function_exists('apc_fetch') bit, leaving the rest of the patch as is to see what the bot thinks.

multi-get code is still in process, i fell down a rabbit hole looking ExtensionDiscovery and had to drink my way out.

Anonymous’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 21: 2395143-21.patch, failed testing.

yched’s picture

This looks promising, but naming this "FileCache" is a bit confusing. This is doesn't cache the actual file content, but arbitrary PHP data "related to" (e.g. extracted from) a file.

Some consuming code uses this to cache the structures parsed from annotations in PHP class files,
Some consuming code uses this to cache the structures parsed from YAML files...
But the data being stored, and how it derives from the file, is entirely up to the consuming code. Thus, it seems there should be at least a notion of separate bins or namespaces, since there could be cases where different consumers might want to store different data about the same file ?

Anonymous’s picture

Status: Needs work » Needs review
StatusFileSize
new14.53 KB
new6.23 KB

here's a first go at multiget.

it's ugly, and there are some bits to make tests pass i don't like.

re. #24 - yes, i guess that could be true.

larowlan’s picture

@berdir yeah I didn't have php-apc on my 5.4 VM so it caused a fatal, but I think @beejeebus has fixed (although I have it installed now)

Status: Needs review » Needs work

The last submitted patch, 25: 2395143-25.patch, failed testing.

Anonymous’s picture

Status: Needs work » Needs review
StatusFileSize
new14.54 KB

berdir pointed out a silly typo:

diff --git a/core/lib/Drupal/Component/Cache/ApcuFileCache.php b/core/lib/Drupal/Component/Cache/ApcuFileCache.php
index 9489966..18bd40c 100644
--- a/core/lib/Drupal/Component/Cache/ApcuFileCache.php
+++ b/core/lib/Drupal/Component/Cache/ApcuFileCache.php
@@ -67,7 +67,7 @@ public static function getMultiple(array $filepaths) {
       foreach ($cache_results as $cache) {
         if ($cache['mtime'] == filemtime($cache['filepath'])) {
           $file_data[$cache['filepath']] = $cache['data'];
-          static::$cached[$filepath] = $cache;
+          static::$cached[$cache['filepath']] = $cache;
         }
       }
     }

Status: Needs review » Needs work

The last submitted patch, 28: 2395143-28.patch, failed testing.

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new11.42 KB
new8.64 KB

Interdiff is against my last patch, included the getMultiple() but changed a few things.

I removed the caching from the annotation parser, somehow that doesn't work yet, so trying to get this passing again without that. But fixed the yaml loader parsing, forgot the set() call. Down to 18s now with this, without the annotation parsing.

Apart from that, I also have the idea mentioned in the @todo. that we can cache built definition objects instead of just the parsed yaml in the YamlFileLoader, might be a bit ugly, though ;)

Status: Needs review » Needs work

The last submitted patch, 30: 2395143-30.patch, failed testing.

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new11.85 KB
new1.64 KB

Fixed the test fails.

the PermissionHandlerTest is a bit annoying, that is using the vfs:// stream wrapper, but realpath() doesn't support stream wrappers (don't ask), so the cache id was always just the prefix and we had fun overlaps.

Status: Needs review » Needs work

The last submitted patch, 32: 2395143-32.patch, failed testing.

berdir’s picture

Hm, those tests worked for me, could be that I was running them in the UI.

Test run time was 15m30, HEAD seems to be at 18m right now, so that seems to save 2m30.. not bad.

babruix queued 32: 2395143-32.patch for re-testing.

babruix’s picture

Confirming, those test passed on my local, too.

The last submitted patch, 32: 2395143-32.patch, failed testing.

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new12.6 KB
new770 bytes

So the problem is that filemtime is only in seconds. So if a file is written multiple times *and* read *and* that happens in a different context (cli vs web requests) so that set() doesn't work.

The test works from the UI, because there the test also runs in apache and can affect the same apc cache.

Adding a sleep(1) in there at the right time. Given that this is the only place ( I think there was a second that didn't fail this time. We also need to make sure to run this a few times, so that we don't have random fails), I think that's OK. We're calling sleep() in other tests as well. And it's not something that will happen in non-test scenarios I think.

catch’s picture

If mtime is tricky we could look at a hash of file_get_contents() maybe.

chx’s picture

Crazy people would extend the public stream wrapper so that the url_stat of it would return a counter of stream_write calls and use this file system for the test. But we aren't crazy, are we.

berdir’s picture

Well, realname() doesn't support stream wrappers, so no, I don't think we can do something like that :)

catch’s picture

Title: Consider caching read yaml files in Drupal\Core\Config\FileStorage::read » YAML parsing is very slow, cache it with APC in Drupal\Core\Config\FileStorage::read
Priority: Major » Critical

Bumping this to critical since it blocks any meaningful profiling of cold cache performance and it definitely won't be the last thing to look at.

dawehner’s picture

+++ b/core/lib/Drupal/Component/Cache/ApcuFileCache.php
@@ -0,0 +1,131 @@
+  public static function delete($filepath) {

Do we really have to take care of the delete example? For things like a simpletest process you actually don't want to remove the entries between runs. Should we maybe just reset the static cache?

What about other usecases: library.yml and the container YamlFileLoader.php,

berdir’s picture

Yes, we do need delete(), for config, mostly.

We don't delete entries at the end of test runs, we only do it when the file we are working with is actually deleted.

What about other usecases: library.yml and the container YamlFileLoader.php,

Not sure what you are trying to say here :)

dawehner’s picture

We don't delete entries at the end of test runs, we only do it when the file we are working with is actually deleted.

Ah gotcha.

Not sure what you are trying to say here :)

Ups. I was trying to express/ask whether we should cache the .library.yml and .services.ymlfiles as well.
Maybe especially for the container rebuild we could safe a bit, but it turned out, I was just too stupid to fine it. So what about
the library case, see \Drupal\Core\Asset\LibraryDiscoveryParser

berdir’s picture

Yeah, those two examples are already token care of. Sure, can add it to LibraryDiscoveryParser as well. One place that does not yet use it unfortunately is the annotation parser. I tried to do it in an earlier patch and had *very* nasty issues with object references. There are tests that set flags to dynamically alter entity type objects, and then those alters survived rebuilds.

larowlan’s picture

If #1920902: Add a Drupal Yaml wrapper so we can default to PECL Yaml component if it is available was in would we still need this? Still waiting on a testbot.

berdir’s picture

@larowlan: The yaml extension is a much faster, but there's no comparison with an apc_fetch() :) Also, this is not limited to Yaml, we could also use it for the annotation parsing for example, if we can make it work.

We can cache arbitrary data based on files. It might be possible to cache service definition objects for example, instead of just the parsed yaml structure.

catch’s picture

Yes some sites may have PECL but not APC, but some sites may have APC and not PECL. Also even with both, as Berdir points out APC will be faster for warm caches and PECL will be faster for cold. So both issues seem complementary to me.

yched’s picture

We can cache arbitrary data based on files. It might be possible to cache service definition objects for example, instead of just the parsed yaml structure

Then bumping on my comment in #24 :-)

The data being stored, and how it derives from the file, is entirely up to the consuming code. Thus, it seems there should be at least a notion of separate bins or namespaces, since there could be cases where different consumers might want to store different data about the same file ?

berdir’s picture

Maybe, I'm not sure if it will ever be a thing to parse the same file in two different ways? Do you have any specific use cases in mind?

Anyway, I'm not opposed to that, how do you imagine that to work?

$cache = new ApcuFileCache('routing');
$cache->get($filename)?

?

yched’s picture

Well, it feels weird to have an API that implicitly assumes that for any file there's only one "kind of info" that anyone might ever need to extract and cache.

Yep, I guess just adding a string parameter in the construct would do the job ?
Could be named $bin (as in the "regular" Cache API) - but maybe we precisely want to avoid that since it's an entirely different API ?
or $context (as in t()) ?

berdir’s picture

StatusFileSize
new16.04 KB
new12.19 KB

Ok, converted to that pattern.

Status: Needs review » Needs work

The last submitted patch, 53: 2395143-53.patch, failed testing.

znerol’s picture

+++ b/core/modules/config/src/Tests/ConfigImportUITest.php
@@ -55,6 +55,10 @@ function testImport() {
+    // Accessing the configuration import UI might have cached the files in the
+    // APC user cache. Wait a second so we can be sure that they get a different
+    // file modification time.
+    sleep(1);

Maybe just touch() the file with a modification date in the past instead of sleeping here.

berdir’s picture

I can try that. I suspect the failing test is another one of those that fails sometimes, based on the exact time scenario. We have a lot of bored bots right now,I will post this patch a few times in a test issue and see if I can make it fail in other tests as well. Don't want to introduce a dozen random fails :)

Also, I will open a follow-up to look into caching parsed annotations, as that seems like a tougher problem than it looked like at first. I imagine that could produce another pretty neat speed up.

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new16.91 KB
new881 bytes

Ok, fixed that fail.

I thought about touch(), but that would require me to find out the name of the file.

Also posting 10 patches to #2082049: Ignore me: Berdir vs. Simpletest, to see if there are random fails. Did the same with the previous patch, there it was just the same one as here, plus some random fails in UserAdminTest, but that was a HEAD random fail.

Berdir queued 57: 2395143-57.patch for re-testing.

Anonymous’s picture

please, no commit credit, thanks.

alexpott’s picture

Title: YAML parsing is very slow, cache it with APC in Drupal\Core\Config\FileStorage::read » YAML parsing is very slow, cache it with APCu in Drupal\Core\Config\FileStorage::read
catch’s picture

Issue tags: +Triaged D8 critical

Discussed this on the maintainer call and agreed it should stay critical since it more or less blocks further work on #2289201: [Meta] Make drupal install and run within reasonable php memory limits so we can reset the memory requirements to lower levels.

berdir’s picture

I think the patch is complete, the issue summary needs to be updated.

There are also at least two follow-ups to create: 1. Use this for the annotation parsed, proved to be harder than other cases, due to object references I think. 2: Maybe we can optimize services.yml parsing further by caching built definition objects instead of just the yaml, similar to other cases. Would need some refactoring in that class, though.

I have been using this patch since I posted the first version in my project, and as far as I can see, it resulted in zero problems and makes a *huge* difference for the install time, exponentially increasing with the number of modules.

With my install profile that installs ~110 modules, the install time difference is ~285s vs ~200s, that's a ~30% improvement. That improvement is not free, there is a 6% increase in memory usage (168M -> 179M) and 3.7% for the peak memory usage (195M -> 202M, again no idea what the difference is exactly in this case). IMHO, that is well worth it.

It would also be great if someone else could try to verify my numbers, and also with the web installer, possibly with blackfire.io?

fabianx’s picture

I spoke with catch:

This are my current concerns:

1. It feels wrong to hardcode this to ACPU without any interface, PHPStorage would work as well, I think it is better to abstract the cache storage out itself and have this use a kind of proxy pattern.

+++ b/core/modules/simpletest/src/WebTestBase.php
@@ -1046,8 +1047,9 @@ protected function setContainerParameter($name, $value) {
-    // Clear the YML file cache.
-    YamlFileLoader::reset();
+    // Ensure that the cache is deleted for the yaml file loader.
+    $file_cache = new ApcuFileCache('container_yaml_loader');
+    $file_cache->delete($filename);

Especially this feels problematic, because it hardcodes us to one implementation.

2. I would propose to use something like $settings['yaml_filestorage_cache_class'] that can be used to configure this as settings are available very early before the container is booted / build.

3. yched's point seems valid to me, though I don't know yet how to integrate the prefix properly, will investigate.

I would like to take the remainder of this over - if I may. Berdir, could you please confirm that this would be okay for me to do or if you continue to work on this. Thanks.

berdir’s picture

I'm OK with making this pluggable, although I'm not sure how much a phpstorage backed solution would help, as we still doing a lot of IO, i guess that would only perform in combination with an opcache.

As mentioned before, this is very low-level, as you said, the only way to make it pluggable would be through $settings. Maybe we could just change the hardcoded new $class with a SomeClass::factory(), that would make it easier to add pluggability in a follow-up issue? I'd prefer that this doesn't get delayed with implementing and testing different implementations.

I'm not worried about the WebTestBase change, we have various hardcoded assumptions there.

webchick’s picture

Issue tags: +blocker

Sorry, just doing some house-cleaning. Per #61, this is a blocker to other critical issues getting resolved so adding the tag.

fabianx’s picture

Assigned: Unassigned » fabianx

That is fair.

I will do that:

- Create a SomeClass::factory() then its pseudo-pluggable

The WebTestBase would also use the SomeClass::factory().

I plan on working on this tomorrow and Sunday to get this issue to MVP to get in.
(This is part of my accelerate grant.)

Thanks,

Fabian

fabianx’s picture

Apologize for the delay, I got distracted by the also-very-important cache contexts discussions in the 'warm' cache performance realm.

Here is the new plan that I am in the process of finishing to implement:

- Create FileStorageCacheFactory with ::create()
- FileStorageCacheFactory::create() checks settings (optional), but if not set, auto-detects apc_fetch and uses APCCacheBackend, else falls back to PHP storage CacheBackend
- Have FileStorageCache take a generic 'CacheBackendInterface'
- Move all hardcoded apc_fetch / apc_store to $this->cache->get(), $this->cache->set(), $this->cache->delete()

This solves nicely all concerns and ensures non-APC users are profiting, too. It also makes it more generic.

berdir’s picture

This needs to be in Drupal\Component, the cache is Drupal\Core.

fabianx’s picture

Status: Needs review » Needs work
StatusFileSize
new21.65 KB
new19.22 KB

Uploading the new patch:

- Introduced generic FileCache class doing static caching, added protected cacheFetch, cacheStore, cacheDelete functions
- Introduced FileCacheFactory that is informed by the Kernel, which class to use. That solves the Core vs. Component dilemma nicely.
- Renamed all new ApcuFileCache(..) -> FileCacheFactory::get(...)
- Added PhpStorageFileCache and had ApcuFileCache extend FileCache instead.

Ensured that only if APCu is enabled, the ApcuFileCache is used. For normal mixed APC, user fragmentation is a big problem and will lead to worse performance.

APCu added the apcu_enabled() function for this case and we should be using it throughout core for auto detection instead of checking apc_fetch.

I think whats in DrupalKernel right now belongs in Settings, but besides that I am pretty happy with the implementation as we don't leave users not having PHP 5.5 / APCu / opcache out.

berdir’s picture

Will review, as said above, I'm not convinced that adding the phpstorage backed implementation here is a good idea. It will need profiling. We're talking about *a lot* of files, are you sure that phpstorage is really worth for caching the .info files, and so on? I think cache writes to phpstorage are expensive, likely more expensive than parsing yml files.

And if you don't have apc, then loading from phpstorage is also not that fast.

I'm happy to have an implementation that shows that it works, but I'm still -1 to adding that here and not in a follow-up, where we can profile that it is actually worth doing.

alexpott’s picture

fabianx’s picture

Indeed, I currently ponder on the static cache as it makes memory way worse and that affects the installer (that I am trying to fix) big time.

berdir’s picture

I know, but without that, it won't help much on e.g. drush installation, unless you enable apc on cli (which would then just be a glorified static cache as well, and it would use memory too :))

why not let the testbot run this, btw?

fabianx’s picture

Title: YAML parsing is very slow, cache it with APCu in Drupal\Core\Config\FileStorage::read » YAML parsing is very slow, cache it with APCu / in Drupal\Core\Config\FileStorage::read
Status: Needs work » Needs review
StatusFileSize
new27.26 KB

I played around with all use cases the FileCache will ever have to be used for and used a configuration array similar to PhpStorage now.

I shortly played with multiple cache backends, but now decided for a ChainedFileCacheBackend instead.

This means you can do e.g.:

$configuration = array(
  'class' => 'Drupal\Component\PhpStorage\FileStorage',
  'directory' => DRUPAL_ROOT . '/cache/php',
);

$settings['php_storage']['file_cache_ro'] = $configuration;

$settings['file_cache']['default'] = array(
  'class' => '\Drupal\Component\Cache\FileCache',
  'cache_backend_class' => '\Drupal\Component\Cache\ApcuFileCacheBackend',
);

$settings['file_cache']['extension_discovery'] = array(
  'class' => '\Drupal\Component\Cache\FileCache',
  'cache_backend_class' => array(
    '\Drupal\Component\Cache\ChainedFileCacheBackend' => array(
      '\Drupal\Component\Cache\ApcuFileCacheBackend',
      '\Drupal\Core\Cache\PhpStorageFileCacheBackend' => array(
        'name' => 'file_cache_ro',
      ),
    ),
  ),
);

where the file_cache_ro would in this case be a pre-generated cache item.

I played also around with a FileCacheCollector class, but that is a follow-up as the locking and end-of-request writing is more tricky than one thinks (and probably not even needed).

But I found it important to allow easily in the future to do:

$this->fileCache = FileCacheFactory::get('extension_discovery', [ 'class' => '\Drupal\Component\Cache\FileCacheCollector' ]);

which makes the module info be just one cache fetch.

Neither PhpStorageFileCacheBackend nor ChainedFileCacheBackend need to be discussed here, but they have been included to show that indeed all use-cases are covered.

This can now also nicely be used to generate caches and pre-seed the system cache/ for cold caches.

Pre-seeded cache/ saves 7-16 s (of 75 s) during installation for me.

Status: Needs review » Needs work

The last submitted patch, 74: yaml_parsing_is_very-2395143-74.patch, failed testing.

fabianx’s picture

Status: Needs work » Needs review
StatusFileSize
new27.29 KB

Silly missing $configuration = [];

Status: Needs review » Needs work

The last submitted patch, 76: yaml_parsing_is_very-2395143-76.patch, failed testing.

fabianx’s picture

StatusFileSize
new27.39 KB
fabianx’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 78: yaml_parsing_is_very-2395143-78.patch, failed testing.

fabianx’s picture

Status: Needs work » Needs review
StatusFileSize
new27.39 KB
fabianx’s picture

14.2 minutes for testbot, memory requirements have been good in my tests and with the last patch the InstallerKernel _could_ for non-testbot cases exchange FileCache with NullFilleCache, which could still cache in APC, but not in static memory.

Task list remaining here is:

- Figure out how to detect APC on testbot, but APCu on normal machines (user apc cache leads to very bad fragmentation in APC (not APCu) itself due to the double nature of the cache). For now can leave the general used version of function_exists('apc_fetch'). [Followup: @todo-insert-link]

- Review the patch / general architecture and then strip out all non-APCu for now, create follow-up for PhpStorage, which I think is still worth it from my benchmarks.
[Followup: @todo-insert-link]

- Create follow-up for static cache building [Follow-up: @todo-insert-link]

- Create follow-up for caching annotations in FileCache, too [Follow-up: @todo-insert-link]

- Create follow-up for possibly caching loaded classes in a FileCacheCollector - that gets simple to do now.

This issue now needs reviews and once we are happy with the big picture I will get the other parts stripped and moved to the follow-ups (to be created).

dawehner’s picture

Just some drive by review ...

  1. +++ b/core/lib/Drupal/Component/Cache/ChainedFileCacheBackend.php
    @@ -0,0 +1,83 @@
    +   ¶
    +      // Write back the fresh data to the seen cache backends.
    +      foreach ($seen_backends as $seen_cache) {
    +        foreach ($cached as $cid => $data) {
    +          $seen_cache->store($cid, $data);
    +        }
    +      }
    

    Did we considered to write them later, using terminate for example? This would also make the fetch() code easier to grok

  2. +++ b/core/lib/Drupal/Component/Cache/FileCache.php
    @@ -0,0 +1,178 @@
    +    $cids = [];
    ...
    +        $cids[] = $cid;
    

    Nitpick: What about name them $remaining_cids?

  3. +++ b/core/lib/Drupal/Component/Cache/FileCache.php
    @@ -0,0 +1,178 @@
    +      $realpath = realpath($filepath);
    +      // If the file exists but realpath returns nothing, it is using a stream
    +      // wrapper, those are not supported.
    +      if (empty($realpath)) {
    +        continue;
    +      }
    

    Mh, this will be potential problematic in conjunction with #2304461: KernelTestBaseTNG™. In other words, aren't the filepaths coming in really limited in its possible values? (trying to be the devil advocate here)

  4. +++ b/core/lib/Drupal/Component/Cache/FileCacheBackendInterface.php
    @@ -0,0 +1,43 @@
    +
    +  /**
    +   * Stores data into a cache backend.
    +   *
    +   * @param string $cid
    +   *   The cache ID to store data to.
    +   * @param array $data
    +   *   The data to store.
    +   */
    +  public function store($cid, $data);
    +
    +  /**
    +   * Delete data from a cache backend.
    +   *
    +   * @param string $cid
    +   *   The cache ID to delete.
    ...
    +  public function delete($cid);
    

    Should we support a fluent interface? I can't think of many usecases where you want use multiple in a row.

fabianx’s picture

1. Possible, but ChainedFileCacheBackend is only there as an example for now and writing at end of request has its own problems, but on a cache miss we would have written anyway (without the chained backend would have been a cache miss), so this should be fine for a simple implementation.

2. Will rename, sounds good.

3. Follow-up, I think we can resolve stream-wrappers to files, but currently files coming in are resolved paths.

4. This is the APCu interface, which works well enough for all our use cases - its also kinda what ircmaxwell proposed in his PSR-6 'cache interfaces need to be simple' rant.

larowlan’s picture

Status: Needs review » Needs work
  1. +++ b/core/lib/Drupal/Component/Cache/ApcuFileCacheBackend.php
    @@ -0,0 +1,35 @@
    +   * @{inheritdoc}
    ...
    +   * @{inheritdoc}
    ...
    +   * @{inheritdoc}
    
    +++ b/core/lib/Drupal/Component/Cache/ChainedFileCacheBackend.php
    @@ -0,0 +1,83 @@
    +   * @{inheritdoc}
    ...
    +   * @{inheritdoc}
    ...
    +   * @{inheritdoc}
    

    @ should be inside the {}

  2. +++ b/core/lib/Drupal/Component/Cache/ApcuFileCacheBackend.php
    @@ -0,0 +1,35 @@
    +  }
    

    nitpick: needs blank line before the final }

  3. +++ b/core/lib/Drupal/Component/Cache/ChainedFileCacheBackend.php
    @@ -0,0 +1,83 @@
    + * Allows to cache data based on file modification dates for various cache backends.
    

    >80

  4. +++ b/core/lib/Drupal/Component/Cache/ChainedFileCacheBackend.php
    @@ -0,0 +1,83 @@
    +    * Constructs a FileCache object.
    

    wrong class name

  5. +++ b/core/lib/Drupal/Component/Cache/ChainedFileCacheBackend.php
    @@ -0,0 +1,83 @@
    +    * @param (optional) array $configuration
    
    +++ b/core/lib/Drupal/Component/Cache/FileCache.php
    @@ -0,0 +1,178 @@
    +   * @param (optional) NULL|string|array $cache_backend_class
    

    optional goes on second line
    https://www.drupal.org/node/1354#param

  6. +++ b/core/lib/Drupal/Component/Cache/ChainedFileCacheBackend.php
    @@ -0,0 +1,83 @@
    +    // earlier in the chain is less current than a cache later.
    

    I think 'later cache' is better than 'cache later'

  7. +++ b/core/lib/Drupal/Component/Cache/ChainedFileCacheBackend.php
    @@ -0,0 +1,83 @@
    +   ¶
    

    whitespace

  8. +++ b/core/lib/Drupal/Component/Cache/FileCache.php
    @@ -0,0 +1,178 @@
    +   * @var FileCacheBackendInterface
    

    needs namespace

  9. +++ b/core/lib/Drupal/Component/Cache/FileCache.php
    @@ -0,0 +1,178 @@
    +   *   be a array('class' => $configuration) data structure or a simple string.
    

    an array

  10. +++ b/core/lib/Drupal/Component/Cache/FileCache.php
    @@ -0,0 +1,178 @@
    +        // @todo Could get the realpath out of the cid as well.
    

    Are we doing this here? We could make $cid keyed by $cid with values of $realpath for simplicity

  11. +++ b/core/lib/Drupal/Component/Cache/FileCache.php
    @@ -0,0 +1,178 @@
    +          static::$cached[$cid] = $cached;
    ...
    +    static::$cached[$cid] = $cached;
    ...
    +    unset(static::$cached[$cid]);
    

    any reason why we're mixing statics with an instance? can't we do $this->cached?

  12. +++ b/core/lib/Drupal/Component/Cache/FileCacheBackendInterface.php
    @@ -0,0 +1,43 @@
    +}
    

    needs a blank line before the }

  13. +++ b/core/lib/Drupal/Component/Cache/FileCacheFactory.php
    @@ -0,0 +1,65 @@
    +  static $configuration;
    

    missing public/protected scope modifier, given there is a static setConfiguration, this would be protected yeah?

  14. +++ b/core/lib/Drupal/Core/Cache/PhpStorageFileCacheBackend.php
    @@ -0,0 +1,71 @@
    +   * @param (optional) array $configuration
    

    optional should be on newline here too

  15. +++ b/core/lib/Drupal/Core/Cache/PhpStorageFileCacheBackend.php
    @@ -0,0 +1,71 @@
    +   * @{inheritdoc}
    ...
    +   * @{inheritdoc}
    ...
    +   * @{inheritdoc}
    

    these @ should be inside the {} too

  16. +++ b/core/lib/Drupal/Core/DependencyInjection/YamlFileLoader.php
    @@ -34,30 +35,22 @@ class YamlFileLoader
    +      $this->fileCache = FileCacheFactory::get('container_yaml_loader');
    

    we can't non-statically assign a property in a static method? shouldn't this fail?

  17. +++ b/core/lib/Drupal/Core/DependencyInjection/YamlFileLoader.php
    @@ -67,10 +60,13 @@ public static function reset()
    +        // @todo Refactor this to cache parsed definition objects.
    

    do we have an issue for this?

dawehner’s picture

any reason why we're mixing statics with an instance? can't we do $this->cached?

Well, otherwise we can't cache entries during multiple container rebuilds, like in a CLI installation / test run.

larowlan’s picture

Well, otherwise we can't cache entries during multiple container rebuilds, like in a CLI installation / test run.

thanks, makes sense

larowlan’s picture

Well, otherwise we can't cache entries during multiple container rebuilds, like in a CLI installation / test run.

We should add a comment to that effect

wim leers’s picture

14.2 minutes for testbot […]

WOW!

fabianx’s picture

Status: Needs work » Needs review
StatusFileSize
new27.93 KB
new12.61 KB

#85: Thanks for the review. All fixed.

- Still need to create issues, but d.org was down yesterday when I wanted to do that.

The interdiff should address all points.

larowlan’s picture

+++ b/core/lib/Drupal/Core/DrupalKernel.php
@@ -254,11 +254,16 @@ public static function createFromRequest(Request $request, $class_loader, $envir
+      // @todo Enable for non-test bot.

Is this change unintended?

fabianx’s picture

#91 Yes and no, this is for the follow-ups to be created, but yes that @todo chunk could be ignored for now.

dashaforbes’s picture

StatusFileSize
new998 bytes
new27.77 KB

Removed commented out code and added issue for todo.
#2447753: APCu detection erroneously succeeds if apc.enabled is "0"

larowlan’s picture

+++ b/core/lib/Drupal/Core/DrupalKernel.php
@@ -254,16 +254,10 @@ public static function createFromRequest(Request $request, $class_loader, $envir
+      // @todo Use extension_loaded('apcu') for non-testbot https://www.drupal.org/node/2447753.

This needs to be split over two lines (sorry)

dashaforbes’s picture

StatusFileSize
new787 bytes
new27.79 KB
larowlan’s picture

Assigned: fabianx » catch
Status: Needs review » Reviewed & tested by the community
Anonymous’s picture

+1 to commit.

neclimdul’s picture

Status: Reviewed & tested by the community » Needs work

From the summary, this seems like a very big solution for a very small problem. This may just be a problem with the summary though.

Does this have any impact on normal site operations?

Can we get an update of the summary with the performance impact of the patch? Reading the comments it wasn't clear to me though comments on an earlier patch did mention a memory time tradeoffs. Is that still relevant?

The file cache code is a neat idea though. Seems like introducing a system like this should have its own issue and own tests. Especially in Beta.

+++ b/core/modules/config/src/Tests/ConfigImportUITest.php
@@ -55,6 +55,10 @@ function testImport() {
+    // Accessing the configuration import UI might have cached the files in the
+    // APC user cache. Wait a second so we can be sure that they get a different
+    // file modification time.
+    sleep(1);

+++ b/core/modules/field/src/Tests/FieldImportDeleteUninstallUiTest.php
@@ -96,6 +96,11 @@ public function testImportDeleteUninstall() {
+    // Accessing the configuration import UI might have cached the files in the
+    // file cache. Wait a second so we can be sure that they get a different
+    // file modification time.
+    sleep(2);
+

Sleeping in unit tests? *sigh* Is there a better way like maybe forcing an empty cache backend in the test?

fabianx’s picture

Assigned: catch » fabianx
Issue tags: +Needs tests

#99:

1. Yes, will update the IS.

2. Yes, benchmarks are very favorable for cold cache performance, e.g. a 18% performance improvement for the installation with pre-seeded cache.

- Testbot is consistently near 14 min 20 secs. (from before 15 40 or so).

3. Yes, there is an impact when clearing caches, especially for everything container build related or discovery related.

4. There is a memory trade off for the installer, but there is a follow-up that I need to update (its titled wrong) to use a FileCacheNoStatic for the installer to help with memory requirements.

5. Added 'Needs tests' label, I assume unit tests are enough.

6. This is likely no longer needed, depending on if Unit tests go via the DrupalKernel or not. With the FileCache being static and the conifguration being injected by the Kernel - if its not initialized it will use just a static cache, so should probably no longer be triggered in unit tests.

berdir’s picture

I'm not sure where you see anything about unit test, that test explicitly says that it is a UI test? I don't see another solution there, the test explicitly creates and changes the same file in the same second *and* does so in CLI and then uses it through apache, those two apcu caches are not and can not be in sync. That's a test-only scenario, we have a few similar existing cases elsewhere. There's nothing that can be reset, because the test runner has no way to access apcu in apache.

And yes, this is a big performance improvement for normal sites too, during router rebuilds and cache clears for example. So it will be faster for developers and your live site is faster available again when you do things like installing modules/running updates/clearing caches in general.

yesct’s picture

do we need a change record also?

berdir’s picture

@YesCT: I don't think so, nobody will have to change their code, it's a new feature they could use, but that is relevant to only very few people/modules, those that do their own custom parsing/processing based on data in files.

@Fabianx: Would be great to get that issue summary update to get this in, I can try to help and work on specific parts.

fabianx’s picture

Title: YAML parsing is very slow, cache it with APCu / in Drupal\Core\Config\FileStorage::read » YAML parsing is very slow, cache it with FileCache
Issue summary: View changes

.

fabianx’s picture

Issue summary: View changes
Status: Needs work » Needs review
Issue tags: -Needs issue summary update, -Needs tests
StatusFileSize
new45.76 KB
new21.26 KB

Updated the issue summary and here is the test coverage.

- 100% for anything not ignored.

fabianx’s picture

Status: Needs review » Needs work

The last submitted patch, 105: yaml_parsing_is_very-2395143-105.patch, failed testing.

fabianx’s picture

Status: Needs work » Needs review
StatusFileSize
new45.17 KB
new20.66 KB

Please disregard 105, this is a re-post with fixed phpunit.xml.dist:

Updated the issue summary and here is the test coverage.

- 100% for anything not ignored.

Status: Needs review » Needs work

The last submitted patch, 108: yaml_parsing_is_very-2395143-107.patch, failed testing.

fabianx’s picture

+++ b/core/tests/Drupal/Tests/Component/Cache/FileCacheTest.php
@@ -0,0 +1,152 @@
+    $result = $this->fileCache->get($cid);
+    $this->assertNull($result);

This needs to be $filename.

Will fix ...

fabianx’s picture

StatusFileSize
new45.19 KB
new1.81 KB

- Fixed test failures by renaming from 42.php to 42.php.txt for the PhpStorage Mock Fixture.
- Fixed $filename

fabianx’s picture

Status: Needs work » Needs review
berdir’s picture

Didn't really review yet, need to look at the complete patch again, just noticed this:

+++ b/core/lib/Drupal/Component/Cache/FileCache.php
@@ -110,7 +110,7 @@ public function getMultiple(array $filepaths) {
       // wrapper, those are not supported.
       if (empty($realpath)) {
-        continue;
+        continue; // @codeCoverageIgnore
       }
 

We've never added comments like this so far, and it violates coding standards...

fabianx’s picture

Removed all @codeCoverageIgnore annotations for now. This is not the place to discuss this.

EDIT: Created #2462037: [policy, no patch] Discuss usage of @codeCoverageIgnore to discuss it.

Status: Needs review » Needs work

The last submitted patch, 114: yaml_parsing_is_very-2395143-114.patch, failed testing.

fabianx’s picture

StatusFileSize
new45.02 KB

Re-rolled

fabianx’s picture

Status: Needs work » Needs review
yesct’s picture

Issue summary: View changes

noting what this is blocking in the issue summary.

berdir’s picture

Looked through the patch a first time since you started refactoring it, looks pretty good.

What's confusing a bit is the naming (and I'm fully aware that a lot of that is the same since my patch, but a fresh look at something helps to rethink sometimes).. the namespace is Cache, but everything inside is about FileCache. On the other hand, we have the FileCacheBackendInterface, but the implementations of that really have nothing to do with File at all.

Suggestion: Rename Component\Cache to Component\FileCache. Then we avoid possible confusions by having two Cache components, and FileCache is a coherent thing that other projects might be in interested in as well, if we ever manage to split our components.

I think that also solves the naming problem with FileCacheBackendInterface, it's just the cache backend of the FileCache component then, that's fine.

Thoughts?

Will do a code review ASAP, noticed a few things...

dawehner’s picture

+1 for thew new naming.

fabianx’s picture

+1 to the new naming as well, that also had been my thoughts, but I did not wanted to re-factor the original more than needed.

Thanks!

neclimdul’s picture

Status: Needs review » Needs work

I really appreciate the effort you guys are putting into this. This is a really hard problem, some really smart code and I don't want the rest of this review to sound like I'm discounting any of that effort because I'm not.

The my first concern is my gut is telling me we're really not addressing the root problem here. Or not clearly explaining what the room problem is because I don't understand the specific problem or problematic code we're addressing.

  • What sort of sites are we targeting?
  • What performance problems are we trying to address?

Related to that, there is a thread through this issue that this issue is related to decreasing memory. We've marked it as blocking the decrease memory issue. But matching the benchmarks done this far locally drush si generally increasing the max memory usage several k. I dropped some hacky max_mem calls into the installer and index.php and ran through a couple times and the values where a lot more erratic but show a similar trend. Enough that if that's a metric for this issue we should clear it up.

Additionally, the summary talks about "standard install we call FileStorage::read 1772 times but we only have 1244 yml files in all of core". The DefaultPluginManager already caches all our parsed plugins so either those calls where purposeful cache rebuilds or there is a bug in the plugin manager caching code.

One of my tests I did was lopping over drush si. In that test, the wall time for each install decreased and average of 8 seconds with this patch. That's actually really impressive and was actually decreased by 20%. Poking around though just re-using the symfony parser instead of instantiating a new one every time in Yaml::decode() I shaved 3 seconds off the same loop or 6.5% off install times. Pecl Yaml without this patch decreases the same test 15 seconds or almost 38%.

Aside from that, the architecture makes me uncomfortable. We're adding a lot of complexity with an entirely new caching system.

  • Yaml plugin discovery from here on will first hit the plugin cache and then when that misses the FileCache. This means that, by design, we're bypassing cache clears and that means we are very likely to see emergent behaviors. Infact, I don't see any way that FileCache can even be flushed.
  • We are also opting everyone in without any method for opting out. If you use YAML discovery inside or outside plugins, we're going to cache it in file cache. At the very least we should be addressing this within our code where this is a problem, not in the discovery library.
user  system  elapsed
Patched:
19.16 1.15  00:00:30  67%CPU
19.88 1.17  00:00:32  66%CPU
18.67 1.04  00:00:29  67%CPU
19.66 1.1 00:00:31  66%CPU
Average:
19.3425 1.115 00:00:31

Unpatched
26.77 1.15  00:00:37  74%CPU
27.39 1.25  00:00:39  73%CPU
27.97 1.06  00:00:40  72%CPU
27.31 1.06  00:00:39  73%CPU
Average:
27.36 1.13  00:00:39

Difference:
8.0175  0.015 00:00:08
berdir’s picture

Thanks for writing down your concern, that helps me to write a write a useful issue summary. And no offense taken :)

Here are some notes to start addressing your questions and concerns.

* The main problem that this is addressing is performance on cold caches and cache rebuilds.

* There is nothing wrong with the caching in DefaultPluginManager, but we invalidate those caches quite a lot. But 99% of the files that we are then parsing again haven't actually changed.

* The most obvious place is the installer. After every module, we invalidate all plugin manager caches and rebuild the container, because almost all modules add a few plugins, services, routes, links, .... And don't underestimate the impact of a faster installer. I've been working on a large D8 install profile for a year now with multiple other developers. We usually re-install multiple times a day, sometimes multiple times in a row, to get something working. That's a lot of wasted time, because you really can't do much while an installer is running. With this patch, the installation takes 160s currently. Without it, it's 230s.

* However, it also affects running sites. Quite a few administrative operations require router rebuilds for example, like saving views. It's pretty common on big D7 sites that saving a view results blocks the site for a minute.

* It also helps to mitigate existing performance issues. I've noticed in new relic that system_rebuild_module_data() (reparsing all .info.yml files) runs on user/people, taking 5% of the execution time there. When I did some profiling on a vanilla D8 site, it was actually 50%. That also has other factors, like much slower queries due to a few 10k users, but still.

* There is no way to flush caches because it's not needed. The FileCache looks at the file modification time, when that changes then we automatically invalidate our cache. I've included this patch in our installation profile since december and we're actively developing, changing routes, services and info files and we never had a single problem with it.

As you can see in the created follow-ups, there are multiple ideas for further improving the caching and using it in more places, like:
* Ship with pre-generated caches, so that not even the initial parsing is required for all the yaml files that core provides (that will never change unless you update core)
* Use it for annotation parsing
* Improve YamlFileLoader so that we can cache built service definition objects and not just the raw data.

fabianx’s picture

Thanks berdir, even now static pre-generated caches can already be used to speed up installation even more (provided you have already installed once).

This works like this (could also be stored in /tmp or somewhere) using a persistent backing store, but APCu as the active backend.

$configuration = array(
  'class' => 'Drupal\Component\PhpStorage\FileStorage',
  'directory' => DRUPAL_ROOT . '/cache/php',
);

$settings['php_storage']['file_cache'] = $configuration; 

$settings['file_cache']['default'] = array(
  'class' => '\Drupal\Component\Cache\FileCache',
  'cache_backend_class' => array(
    '\Drupal\Component\Cache\ChainedFileCacheBackend' => array(
      '\Drupal\Component\Cache\ApcuFileCacheBackend',
      '\Drupal\Core\Cache\PhpStorageFileCacheBackend' => array(
        'name' => 'file_cache',
      ),
    ),
  ),
);

Note, I also used this already successfully with read-only storage, e.g.:

$configuration = array(
  'class' => 'Drupal\Component\PhpStorage\ReadOnlyFileStorage',
  'directory' => DRUPAL_ROOT . '/cache/php',
);

$settings['php_storage']['file_cache_ro'] = $configuration; 

If its outdated, it will just not be used anymore - thanks to the mtime :).

--

I also created two missing follow-ups:

neclimdul’s picture

Thanks,

But 99% of the files that we are then parsing again haven't actually changed.

That sums up almost all cache rebuilds. They are expensive by definition.

And don't underestimate the impact of a faster installer.

As a full time developer and someone actively reinstalling d8 multiple times a day I'm definitely not but I have clear priority to site performance. Also, developers can easily use PECL on the other patch if that's our audience and it has a bigger impact.

Quite a few administrative operations require router rebuilds for example, like saving views. It's pretty common on big D7 sites that saving a view results blocks the site for a minute.

Can we quantify the effect on D8? I was under the impression cache_rebuild was designed to limit this.

system_rebuild_module_data() (reparsing all .info.yml files) runs on user/people

That sounds like its own problem. 1) why is it running there? 2) Even with this isn't it going to be overly expensive? If its taking 50% of the time and best case we knock of 20% of that isn't that still unacceptable?

I've included this patch in our installation profile since december and we're actively developing, changing routes, services and info files and we never had a single problem with it.

That's definitely re-assuring!

fabianx’s picture

Status: Needs work » Needs review
StatusFileSize
new45.63 KB
new23.45 KB

Here are the patches with the rename.

FileCache being its own namespace under Component feels way better and actually I found two bugs, where I had made the confusion myself.

klausi’s picture

Just for completeness here:

However, it also affects running sites. Quite a few administrative operations require router rebuilds for example, like saving views. It's pretty common on big D7 sites that saving a view results blocks the site for a minute.

This is not true anymore since we are building new route information in the kernel terminate event, which is AFTER the response was sent to the user. So D8 sites do NOT block on router rebuilds anymore. This was implemented in #356399: Optimize the route rebuilding process to rebuild on write.

klausi’s picture

Berdir pointed out in IRC that the Kernel terminate event does not actually work on Apache mod_php that way. The response is blocked until PHP is done with everything and exits.

However, if you use FastCGI then kernel termination works as advertised after the response has been sent to the user. The Symfony Response class invokes fastcgi_finish_request() for us, which closes the connection. Yay!

Lesson learned: don't use Apache with mod_php.

Sorry for the slightly unrelated distraction, carry on.

neclimdul’s picture

Ok, thats good to know. Looking at RouterRebuildSubscriber and RouteBuilder it looks like that is only true for the modifying request not the entire site though. Is that correct?

dawehner’s picture

Ok, thats good to know. Looking at RouterRebuildSubscriber and RouteBuilder it looks like that is only true for the modifying request not the entire site though. Is that correct?

Exactly, that was the main idea of the optimize route rebuilding issue. At least for the case of views we even plan to rebuild way less often than before, see #941970: Only set router rebuild needed when something related to routing actually changes

berdir’s picture

Status: Needs review » Needs work

Ok, I've been testing this a bit, the current patch is more or less as fast as my older versions.

Based on xhprof, #2447803: Use FileCache for caching discovered annotations should save another 12% (20s) or so.

Also, consistent with earlier numbers, without xhprof/xdebug, the installer is twice as fast, but this is still a big improvement 85s instead of 120).

  1. +++ b/core/lib/Drupal/Component/FileCache/ApcuFileCacheBackend.php
    @@ -0,0 +1,36 @@
    +/**
    + * Allows to cache data based on file modification dates in APCu.
    + */
    +class ApcuFileCacheBackend implements FileCacheBackendInterface {
    

    This docblock is confusing, because there is nothing about files or modification dates in this class.

    Maybe something as sinmple as "APCu backend for the file cache.", similar for the others?

  2. +++ b/core/lib/Drupal/Component/FileCache/FileCache.php
    @@ -0,0 +1,181 @@
    +
    +/**
    + * Allows to cache data based on file modification dates.
    + */
    +class FileCache {
    +
    

    If this is pluggable then we should have an interface for it.

  3. +++ b/core/lib/Drupal/Component/FileCache/FileCache.php
    @@ -0,0 +1,181 @@
    +   * The identifier of this cache.
    +   *
    +   * @var string
    +   */
    +  protected $identifier;
    

    I don't remember if $identifier was my name for it, but it is confusing. collection or prefix might be better?

  4. +++ b/core/lib/Drupal/Component/FileCache/FileCache.php
    @@ -0,0 +1,181 @@
    +   * @param NULL|string|array $cache_backend_class
    +   *   (optional) The class that should be used as cache backend. This can be
    +   *   an array('class' => $configuration) data structure or a simple string.
    ...
    +      // Support providing of configuration as an array.
    +      if (is_array($cache_backend_class)) {
    +        $class = array_keys($cache_backend_class)[0];
    +        $configuration = array_shift($cache_backend_class);
    +      }
    

    Do we really need to allow so many variations? Why not enforce the array structure in the factory? And if you make it a class, why not set 'class'/'configuration' keys? Or actually, just pass in two arguments to the constructor?

  5. +++ b/core/lib/Drupal/Core/DependencyInjection/YamlFileLoader.php
    @@ -67,10 +60,13 @@ public static function reset()
    +        // @todo Refactor this to cache parsed definition objects.
    

    Let's add the issue reference here.

  6. +++ b/core/lib/Drupal/Core/DrupalKernel.php
    @@ -245,6 +246,23 @@ public static function createFromRequest(Request $request, $class_loader, $envir
    +    // Initialize the FileCacheFactory component.
    +    $configuration = Settings::get('file_cache');
    +
    +    // Provide a default configuration, if not set.
    +    if (!isset($configuration['default'])) {
    +      $configuration['default'] = [
    +        'class' => '\Drupal\Component\FileCache\FileCache',
    +        'cache_backend_class' => NULL,
    +      ];
    +      //  @todo Use extension_loaded('apcu') for non-testbot
    +      //    https://www.drupal.org/node/2447753.
    +      if (function_exists('apc_fetch')) {
    +        $configuration['default']['cache_backend_class'] = '\Drupal\Component\FileCache\ApcuFileCacheBackend';
    +      }
    +    }
    +    FileCacheFactory::setConfiguration($configuration);
    

    Why not set this default up in FileCacheFactory when configuration is empty?

  7. +++ b/core/lib/Drupal/Core/DrupalKernel.php
    @@ -396,6 +414,7 @@ public function boot() {
         }
    +
         // Initialize the container.
    

    unrelated change.

  8. +++ b/core/lib/Drupal/Core/Extension/ExtensionDiscovery.php
    index 0000000..5424c61
    --- /dev/null
    
    --- /dev/null
    +++ b/core/lib/Drupal/Core/FileCache/PhpStorageFileCacheBackend.php
    

    Looks like this wasn't moved.

  9. +++ b/core/tests/Drupal/Tests/Component/FileCache/FileCacheTest.php
    index 0000000..4099407
    --- /dev/null
    
    --- /dev/null
    +++ b/core/tests/Drupal/Tests/Component/FileCache/Fixtures/llama-23.txt
    

    Wim is going to be happy ;)

  10. +++ b/core/tests/Drupal/Tests/Component/FileCache/StaticFileCacheBackend.php
    @@ -0,0 +1,75 @@
    +/**
    + * Allows to cache data based on file modification dates in a static cache.
    + */
    +class StaticFileCacheBackend implements FileCacheBackendInterface {
    +
    

    Another comment, I assume this implementation is used for tests? Should we name it so then?

catch’s picture

@klausi #127/#128 while the router rebuild will block the request that initiates it with apache/mod_php it will still not block any other requests to the site - whereas in Drupal 7 it does (end of request vs. flush and rebuild with a lock). So either way we should be better off with router rebuilds in terms of scaling/lock stampedes, but it will be worse for people triggering them with apache/mod_php (although for single user sites, the person triggering also gets the rebuild anyway next request, so that's less of an issue).

@neclimdul yes we're adding a new cache layer so that we can improve cold cache performance. We'd not be doing this if we'd chosen JSON as our file format, but given we're using YAML for all the things (and it'll be more in 9.x), I'm not sure this can be avoided. The PECL extension is good for people who understand the problem, but not for everyone else.

fabianx’s picture

#132: That is an indirect nice idea :). We could add a JSON file storage, too, which might be faster than PHP Storage.

wim leers’s picture

Wim is going to be happy ;)

:D I indeed approve! MOAR LLAMAS!

#132/#133: hah, PHP to read JSON to make it faster for PHP to read YAML, all to read values PHP should use. Wow.

catch’s picture

I'd had the indirect idea in #133 directly, although I worry a bit about JSON vs. PHP vs. YAML serialization. That might be a non-issue in practice.

The advantage of JSON over PHP would be that since it's not a PHP file, we'd not have any of the memory overhead that's inherent in loading PHP - especially since these files will (usually?) be opcode cache misses.

webchick’s picture

Issue tags: +D8 Accelerate Dev Days

Tagging.

amateescu’s picture

Status: Needs work » Needs review
StatusFileSize
new45.94 KB
new13.65 KB
new1.69 KB

Rerolled and fixed everything from @Berdir's review in #131. The only thing to mention is that PhpStorageFileCacheBackend was not moved because it uses PhpStorageFactory which is in Drupal\Core.

I also had the idea of doing a single stat() call instead of one file_exists() and two filemtime(). Even if file_exists() populates the stat cache with everything that stat() would, we still potentially save a couple thousand stack calls this way.

Status: Needs review » Needs work

The last submitted patch, 137: 2395143-137.patch, failed testing.

amateescu’s picture

Status: Needs work » Needs review
StatusFileSize
new45.96 KB
new1.69 KB

Ok, that doesn't work :(

Status: Needs review » Needs work

The last submitted patch, 139: 2395143-139.patch, failed testing.

fgm’s picture

The errors happens in DrupalComponentTest::testNoCoreInComponent() because FileCacheFactory uses Drupal\Core\Site\Settings, but Components may not use Core.

amateescu’s picture

Status: Needs work » Needs review
StatusFileSize
new46.61 KB
new2.57 KB

Right, that's why @Fabianx did the initialization in DrupalKernel in the first place. Just talked to @Berdir about it and we have no other choice than to revert that change.

Status: Needs review » Needs work

The last submitted patch, 142: 2395143-142.patch, failed testing.

amateescu’s picture

Status: Needs work » Needs review
StatusFileSize
new46.58 KB
new418 bytes

Silly me :/

berdir’s picture

+++ b/core/lib/Drupal/Core/DependencyInjection/YamlFileLoader.php
@@ -61,7 +61,8 @@ public function __construct(ContainerBuilder $container)
-        // @todo Refactor this to cache parsed definition objects.
+        // @todo Refactor this to cache parsed definition objects in
+        // https://www.drupal.org/node/2464053

url should be indented by two spaces, to make it clear that it belongs to the @todo.

Everything else in the interdiff's looks fine to me.

amateescu’s picture

StatusFileSize
new46.58 KB
new709 bytes

Ah, didn't know what was up with those extra two spaces.

fgm’s picture

Status: Needs review » Needs work

Just took the patch through xdebug for a few passes, and the logic seems sound. However, I think there might be a way to improve it on three counts:

  • initializing the service at the end of DrupalKernel::createFromRequest() as is currently done means the FileCache is instantiated on all code paths, even cached pages, where it will not be used. Initializing it after the reverse proxy step, if possible, could reduce the work on page cache hits
  • when scanning files, the code loops on realpath(), notably in FileCache::getMultiple(), and FileCache::set() (from the findAll() final loop), and that function can be costly in terms of IO (and waits) if the realpath_cache is not set to a value appropriate for the amount of work D8 requires from it, which is often the case. For instance, during a "drush cr", if I set my realpath cache size to 1MB, I find that at the end of the flush, 290kB are used, while the PHP default is 16k, meaning thrashing in that overused cache
  • when data is already in the cache, the discovery is still prepared (e.g. by the plugin manager for link.task), although it may not be needed, but this is probably more of an issue in the relevant plugin managers anyway
berdir’s picture

Thanks for testing.

(you should use numbered bullet points :))

1. Sounds like a good idea. Having it in the factory would be even nicer because we would only have to initialize it when we actually need it, but we can't..
2. Yep, that's interesting. Is that the only place where we call realpath() or does that already happen somewhere else too for those files? (does the cache size change with and without this issue?) We should probably document this somewhere, maybe even check in the status page, but we could do that in a follow-up?
3. Yep, that's #2422815: Don't initialize the discovery object in plugin managers, unless needed and this issue doesn't really change that.

znerol’s picture

We use a modified version of the DI YamlFileLoader.php of Symfony. It includes a static cache for parsed yml files. I think it is great if we can improve on that - it does not make that file less maintainable (should be updated anyway: #2375339: Update DependencyInjection YamlFileLoader for Symfony 2.6).

  1. +++ b/core/lib/Drupal/Core/DependencyInjection/YamlFileLoader.php
    @@ -34,30 +35,22 @@ class YamlFileLoader
    +     * @param \Drupal\Core\DependencyInjection\ContainerBuilder $container
    ...
    +    protected $container;
    

    Should be @var

  2. +++ b/core/lib/Drupal/Core/DependencyInjection/YamlFileLoader.php
    @@ -34,30 +35,22 @@ class YamlFileLoader
    +      $this->fileCache = FileCacheFactory::get('container_yaml_loader');
    
    @@ -67,10 +60,14 @@ public static function reset()
    +        // Load from the file cache, fall back to loading the file.
    +        // @todo Refactor this to cache parsed definition objects in
    +        //   https://www.drupal.org/node/2464053
    +        $content = $this->fileCache->get($file);
    +        if (!$content) {
    +          $content = $this->loadFile($file);
    +          $this->fileCache->set($file, $content);
    

    We should use 4 space indents here because that file uses Symfony coding standards.

  3. +++ b/core/lib/Drupal/Core/DrupalKernel.php
    @@ -245,6 +246,26 @@ public static function createFromRequest(Request $request, $class_loader, $envir
    +    // Initialize the FileCacheFactory component. We have to do it here instead
    +    // of in \Drupal\Component\FileCache\FileCacheFactory because we can not use
    +    // the Settings object in a component.
    

    I do not think that it is worthwhile to hack this into kernel only because of that. Please add a drupal specific factory instead, the rest can remain a component.

  4. +++ b/core/modules/config/src/Tests/ConfigImportUITest.php
    @@ -55,6 +55,10 @@ function testImport() {
    +    // Accessing the configuration import UI might have cached the files in the
    +    // APC user cache. Wait a second so we can be sure that they get a different
    +    // file modification time.
    +    sleep(1);
    
    +++ b/core/modules/field/src/Tests/FieldImportDeleteUninstallUiTest.php
    @@ -95,6 +95,11 @@ public function testImportDeleteUninstall() {
    +    // Accessing the configuration import UI might have cached the files in the
    +    // file cache. Wait a second so we can be sure that they get a different
    +    // file modification time.
    +    sleep(2);
    

    If the file path is known, use touch() with a date in the past instead of sleep().

amateescu’s picture

Status: Needs work » Needs review
StatusFileSize
new46.56 KB
new1.42 KB

Discussed #147.1 with @dawehner, @Berdir and @fgm and the conclusion is that there is no other better place for that code so we'll have to leave it where it is, in DrupalKernel::createFromRequest().

I also looked into #147.2 and, according to this blog post, every operation that involves a file on the disk will result in a PHP realpath() call (which will be cached in the realpath cache), so this patch is not causing any more trouble in this regard than HEAD.

The only thing we could micro-optimize is to not call filemtime() twice in FileCache::getMultiple().

amateescu’s picture

Oh, and we already have a documentation page about this: https://www.drupal.org/node/2219781 which is referenced by https://www.drupal.org/node/2602

Maybe we should update it to say that Drupal 8 needs a value of at least 512K or 1M for the realpath_cache_size PHP setting.

Status: Needs review » Needs work

The last submitted patch, 150: 2395143-149.patch, failed testing.

amateescu’s picture

Status: Needs work » Needs review
StatusFileSize
new46.64 KB
new3.29 KB

@znerol, thanks for the review :)

Fixed #149.1 and 2. The third point was discussed a lot yesterday evening and the conclusion was that we cannot have a core-specific factory because it's also used by other components, so we can just move the code to DrupalKernel::boot() because it makes more sense to have it there than in DrupalKernel::createFromRequest().

As for #149.4, that was already discussed in #55-#57 above and apparently @Berdir thinks that using sleep() there is ok..

Interdiff is to #146.

berdir’s picture

boot() seems like an OK'ish tradeoff to me. It's still equally early, but at least it's out of createFromRequest() as it has nothing to do with the request, and it's still before the container loading/building where we might need it.

dawehner’s picture

+1 for moving it to boot() away from prepareFromRequest()

znerol’s picture

Ok, I'm less unhappy with the initialization code in boot() instead of createFromRequest() :)

Some nitpicks, mainly looked at the tests for the now:

  1. +++ b/core/lib/Drupal/Component/FileCache/ChainedFileCacheBackend.php
    @@ -0,0 +1,91 @@
    +  }
    +}
    
    +++ b/core/lib/Drupal/Component/FileCache/FileCacheBackendInterface.php
    @@ -0,0 +1,43 @@
    +  public function delete($cid);
    +}
    
    +++ b/core/tests/Drupal/Tests/Component/FileCache/ChainedFileCacheBackendTest.php
    @@ -0,0 +1,162 @@
    +  }
    +}
    
    +++ b/core/tests/Drupal/Tests/Component/FileCache/FileCacheFactoryTest.php
    @@ -0,0 +1,79 @@
    +  }
    +}
    
    +++ b/core/tests/Drupal/Tests/Core/FileCache/PhpStorageFileCacheBackendTest.php
    @@ -0,0 +1,99 @@
    +  }
    +}
    

    Missing newline before closing brace.

  2. +++ b/core/lib/Drupal/Core/DependencyInjection/YamlFileLoader.php
    @@ -34,30 +35,22 @@ class YamlFileLoader
    -    public static function reset()
    -    {
    -        static::$yaml = array();
    +      $this->fileCache = FileCacheFactory::get('container_yaml_loader');
         }
    

    This is still off by two characters.

  3. +++ b/core/tests/Drupal/Tests/Component/FileCache/ChainedFileCacheBackendTest.php
    @@ -0,0 +1,162 @@
    +      '\Drupal\Tests\Component\FileCache\StaticFileCacheBackend',
    +      '\Drupal\Tests\Component\FileCache\StaticFileCacheBackend' => [
    +        'bin' => 'preseeded_cache',
    

    This is extremely confusing. Once the class name is the array key, once the array value. If this is correct, can we at least have a comment here?

  4. +++ b/core/tests/Drupal/Tests/Component/FileCache/ChainedFileCacheBackendTest.php
    @@ -0,0 +1,162 @@
    +    $backend_outer->expects($this->once())
    +      ->method('store')
    +      ->with($this->equalTo('foo'), $this->equalTo(42));
    ...
    +    $backend_inner->expects($this->once())
    +      ->method('fetch')
    +      ->with($this->equalTo(['foo', 'not_found']))
    +      ->willReturn(['foo' => 42]);
    

    Not sure whether this is intended or not, store it into the outer backend and retrieve it from the inner?

  5. +++ b/core/tests/Drupal/Tests/Component/FileCache/FileCacheFactoryTest.php
    @@ -0,0 +1,79 @@
    +    $configuration = FileCacheFactory::getConfiguration();
    +    if (!$configuration) {
    +      $configuration = [];
    +    }
    +    $configuration += [ 'test_foo_settings' => $settings ];
    +    FileCacheFactory::setConfiguration($configuration);
    

    This is strange for a test. Why do we need to update existing configuration - but only sometimes? Why can't we just set it to the configuration we actually want to test?

fgm’s picture

+1 for placement in boot() too instead of prepareFromRequest(), as it does not depend on the request.

amateescu’s picture

StatusFileSize
new46.79 KB
new5.61 KB

#156.1, 2 and 3: FIxed.
4) that test method only tests fetch() so mocking store() was not needed at all.
5) that's because FileCacheFactoryTest::testGetConfiguration() depends on testSetConfiguration() and we don't want to lose the configuration that was set there when we go through setUp() for testGetConfiguration().

fabianx’s picture

Chiming in here:

- boot() is fine with me. Thanks @all.

#156:

3. I did not want to type so much!!! I am perfectly fine with the new pattern in #158 though. Whatever works best for us! :)
4. Yes this was intended in that way and tests the intended chain loading behavior.
5. I was mainly worried about loosing the default kernels setConfiguration() values, as interfering there I think would lead to bad results.

Alternatively we could also say that we want to run this Test class in an isolated process, then we can set all configuration - including the default, but I wanted to avoid that and don't know how reliable it is ...

amateescu’s picture

StatusFileSize
new46.83 KB
new2.74 KB

Small self-review.

berdir’s picture

Status: Needs review » Reviewed & tested by the community

Ok, I think this is finally done.

I started this originally, but this has been reviewed, improved and tested by many different people, and nobody seems to feel bold enough to RTBC, so I'm going to do it. There isn't much of my original code left.

There were some reservations by @neclimdul a while back, but I hope I could address them a bit. As we noticed above, doctrine actually includes a very similar annotation parser cache that is using files and also relies on the filemtime() of the original and the cache files. We do it a bit differently, but the concept is the same. So I'm not the first person to come up with this idea.

fabianx’s picture

I hereby RTBC #126-#160, all changes look great to me!

As soon as someone RTBC's the whole patch, we can all enjoy a 1:30 - 1:45 minute faster testbot (on average) :).

X-POST: Yes, RTBC!

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
  1. +++ b/core/lib/Drupal/Component/FileCache/FileCacheBackendInterface.php
    @@ -0,0 +1,44 @@
    +   * @param array $data
    

    This is mixed - not an array.

  2. +++ b/core/tests/Drupal/Tests/Core/FileCache/PhpStorageFileCacheBackendTest.php
    @@ -0,0 +1,100 @@
    +use Drupal\Core\FileCache\PhpStorageFileCacheBackend;
    

    Not used.

  3. +++ b/core/tests/Drupal/Tests/Core/FileCache/PhpStorageFileCacheBackendTest.php
    @@ -0,0 +1,100 @@
    +   * @var \Drupal\Core\FileCache\FileCacheBackendInterface
    

    Component

  4. How come we're bothering with writing the PHP? How about just implementing the apcu cache?
  5. I'd prefer not in hard code the FileCache in config's FileStorage but to use a wrapper in exactly the way we are doing elsewhere. Otherwise we'll end up sticking files we're importing into APCu which seems pointless.
fabianx’s picture

#163:

1. Good catch, even though FileCache only uses it as an array() with mtime, data properties, that is an implementation detail and not part of the interface.
2. Correct, good catch
3. Correct, good catch
4. There is a follow-up to discuss to make that the default when APCu is not available
(especially important if we distinguish between APC and APCU extensions - fragmentation for APC with using user cache is still a BIG problem on PHP 5.4 with just APC)

In my benchmarks PHP Storage was still way way way faster even without opcache than Yaml parsing (and with opcache about the same except for unserialize overhead, which is implicit in APCu rather than explicit).

There is also a follow-up to discuss to pre-generate things and load them in chain loaded fashion (e.g. read-only archive as 2nd level cache).

5. To be able to support a read-only archive in the future we have that, where reading imported data still makes a lot of sense.

amateescu’s picture

StatusFileSize
new46.78 KB
new1.34 KB

Fixing #163.1-3. 4) and 5) need feedback from @Fabianx and/or @Berdir :/

4) is basically asking why are we introducing the PhpStorage file cache backend in this patch since we're not using it (yet).

5) is saying that we're putting a module's config install yaml files in apcu but we're only using that apcu cache when we install the module, which is a very rare operation in real-world cases, only the testbot would actually benefit from it.

berdir’s picture

Status: Needs work » Needs review

needs review I think for the patch?

5) I think we're also using FileStorage when doing a config sync, where usually 95% of the files didn't change, so I think that's also useful there? We're not explicitly adding wrappers elsewhere, just happen to have them already, I think?

alexpott’s picture

5) yes but if we static cache the reads - as another patch does then this is moot - plus putting files in a modules config/install or /optional folder seems unnecessary on 99.9% of sites this files are read once.

amateescu’s picture

StatusFileSize
new39.86 KB
new9.8 KB

After further discussion in IRC with @Fabianx, @alexpott and @catch we agreed to leave the PhpStorage backend and the config part to followups.

fabianx’s picture

Status: Needs review » Needs work
+++ b/core/tests/Drupal/Tests/Component/FileCache/StaticFileCacheBackend.php
@@ -0,0 +1,76 @@
diff --git a/core/tests/Drupal/Tests/Core/FileCache/Fixtures/42.php.txt b/core/tests/Drupal/Tests/Core/FileCache/Fixtures/42.php.txt

--- /dev/null
+++ b/core/tests/Drupal/Tests/Core/FileCache/Fixtures/42.php.txt

+++ b/core/tests/Drupal/Tests/Core/FileCache/Fixtures/42.php.txt
@@ -0,0 +1 @@
+<?php return unserialize('i:42;');
diff --git a/core/tests/Drupal/Tests/Core/FileCache/PhpStorageFileCacheBackendTest.php b/core/tests/Drupal/Tests/Core/FileCache/PhpStorageFileCacheBackendTest.php

--- /dev/null
+++ b/core/tests/Drupal/Tests/Core/FileCache/PhpStorageFileCacheBackendTest.php

These files should be removed as well.

@amateescu: Can you create the follow-up issues for the removed code?

The last submitted patch, 168: 2395143-168.patch, failed testing.

amateescu’s picture

Status: Needs work » Needs review
StatusFileSize
new36.69 KB
new310 bytes

Ugh, the interdiff was good but I didn't upload the correct patch :/ Also removed 42.php.txt this time.

Re-purposed #2447797: Add a PHPStorage based file caching for non-APCU enabled sites to add back the storage and opened #2473179: Use FileCache for config storage for the config part.

fabianx’s picture

Status: Needs review » Reviewed & tested by the community

Back to RTBC, correct files have been removed now and moved to follow-up.

alexpott’s picture

+++ b/core/lib/Drupal/Component/FileCache/FileCache.php
@@ -0,0 +1,152 @@
+  /**
+   * Prefix that is used for cache entries.
+   *
+   * The prefix is static, as all file paths are stored with the full path, so
+   * a per-site prefix is not needed.
+   */
+  const CACHE_PREFIX = 'drupal_file_cache';

What about chroot? Is there an explicit reason why we're not using the salt?

berdir’s picture

This is in Drupal\Component, Settings and the concept of the salt is in Drupal\Core

alexpott’s picture

Status: Reviewed & tested by the community » Needs work

Actually @pwolanin mentioned that if we don't use a salt and an environment does have a shared APCu then a malicious user could poison a cache based on guessing filepaths.

fabianx’s picture

We can set the prefix in the Factory via setConfiguration(). That should pose no problem.

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new37.29 KB
new7.74 KB

As discussed, added a configurable prefix for FileCache instead of the hardcoded one.

Status: Needs review » Needs work

The last submitted patch, 177: 2395143-175.patch, failed testing.

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new39.68 KB
new5.26 KB

So that broke unit tests :)

Enforcing that they use a null implementation, also had to make a small test to the FileCacheFactory test, because this unsets the value between the test runs, but that is IMHO a very weird way to test this anyway...

berdir’s picture

StatusFileSize
new40.34 KB
new5.46 KB

Switched to a separate method.

fabianx’s picture

Status: Needs review » Reviewed & tested by the community

The separate method feels way better.

Thanks so much!

Back to RTBC!

effulgentsia’s picture

Overall, I like what's being done in this patch a lot. At first I couldn't tell from reading the issue summary why introduce FileCache as a separate thing than our normal cache system, and how adding another cache helps with our "cold cache" performance and memory usage, but I think the key innovation here is that FileCache never gets cleared (unlike normal caches which get cleared on every module enable) so long as the underlying file remains the same. Which is very cool.

I was tempted to commit in order to unblock issues that are blocked on this, but I found too many nits during review. Here they are, but I also have no objections if someone else wants to commit despite them, so leaving at RTBC.

  1. +++ b/core/lib/Drupal/Component/Discovery/YamlDiscovery.php
    @@ -47,10 +48,27 @@ public function __construct($name, array $directories) {
    +    // load and parse them now. Note the reversed order.
    

    What does "note the reversed order" mean?

  2. +++ b/core/lib/Drupal/Component/FileCache/ChainedFileCacheBackend.php
    @@ -0,0 +1,90 @@
    +class ChainedFileCacheBackend implements FileCacheBackendInterface {
    

    This class is not used for anything real in this patch. Should we remove it (and its unit tests) from here and defer adding it until whatever follow-up makes use of it?

  3. +++ b/core/lib/Drupal/Component/FileCache/FileCache.php
    @@ -0,0 +1,159 @@
    +   * This is not an instance variable as all FileCache objects are backed by
    +   * the same cache and is useful for e.g. simpletest.
    

    I don't understand what "backed by the same cache" means. Was this meant to say "same file" instead?

  4. +++ b/core/lib/Drupal/Component/FileCache/FileCacheBackendInterface.php
    @@ -0,0 +1,44 @@
    +   * Fetch data from the cache backend.
    

    s/Fetch/Fetches/

  5. +++ b/core/lib/Drupal/Component/FileCache/FileCacheBackendInterface.php
    @@ -0,0 +1,44 @@
    +   * Delete data from a cache backend.
    

    s/Delete/Deletes/

  6. +++ b/core/lib/Drupal/Component/FileCache/FileCacheFactory.php
    @@ -0,0 +1,107 @@
    +   * The configuration used to create File Cache objects.
    

    s/File Cache/FileCache/

  7. +++ b/core/lib/Drupal/Component/FileCache/FileCacheFactory.php
    @@ -0,0 +1,107 @@
    +   *   can be used to e.g. specify default usage of a FileCacheCollector class.
    

    What's a FileCacheCollector?

  8. +++ b/core/lib/Drupal/Component/FileCache/FileCacheInterface.php
    @@ -0,0 +1,56 @@
    + * Interface for objects that allow caching file data.
    

    Could be a follow-up, but I think this doc could benefit from some expansion explaining what "file data" means: i.e., that it can be the literal file contents, or some subset of that, possibly parsed, possibly processed in some way, but that the processing should not vary by conditions that can change during normal application operation (e.g., the data processing shouldn't include invoking event listeners or module hooks) since the cache doesn't get cleared by anything other than a change to the file itself.

  9. +++ b/core/lib/Drupal/Component/FileCache/FileCacheInterface.php
    @@ -0,0 +1,56 @@
    +   * @param string $filepath
    +   *   Name of the cache that the cached data is based on.
    

    What does "name of the cache" mean? Same question for the other methods in this interface that this was copy/pasted to.

  10. +++ b/core/lib/Drupal/Component/FileCache/FileCacheInterface.php
    @@ -0,0 +1,56 @@
    +   * Store data based on a filename.
    

    s/Store/Stores/

  11. +++ b/core/lib/Drupal/Component/FileCache/FileCacheInterface.php
    @@ -0,0 +1,56 @@
    +   * Delete data from the cache.
    

    s/Delete/Deletes/

  12. +++ b/core/lib/Drupal/Core/DrupalKernel.php
    @@ -7,6 +7,8 @@
    +use Drupal\Component\FileCache\FileCache;
    

    Unused use statement.

  13. +++ b/core/lib/Drupal/Core/DrupalKernel.php
    @@ -410,6 +412,28 @@ public function boot() {
    +    FileCacheFactory::setPrefix(hash('sha256', Settings::get('hash_salt')));
    

    Is it ok to use the hash of the global salt on its own? In other places, we usually hash the salt combined with some other string (i.e., in this case, maybe with 'FileCacheFactory'?), so that if the hash is leaked from one place, it can't be used to compromise something else.

  14. +++ b/core/modules/config/src/Tests/ConfigImportUITest.php
    @@ -55,6 +55,10 @@ function testImport() {
    +    // Accessing the configuration import UI might have cached the files in the
    +    // APC user cache. Wait a second so we can be sure that they get a different
    +    // file modification time.
    +    sleep(1);
    

    Doesn't look to me like this patch uses FileCache for config files. So, which files is this referring to that can change during this test?

  15. +++ b/core/modules/field/src/Tests/FieldImportDeleteUninstallUiTest.php
    @@ -95,6 +95,11 @@ public function testImportDeleteUninstall() {
    +    // Accessing the configuration import UI might have cached the files in the
    +    // file cache. Wait a second so we can be sure that they get a different
    +    // file modification time.
    +    sleep(2);
    

    Same for here, but also, why 2 seconds here?

berdir’s picture

Component: configuration system » cache system
Status: Reviewed & tested by the community » Needs work

Thanks for the review.

1. I think I meant that $file and provider is switched there, there might be a better way to describe it or it might no be worth a comment at all.
9. Should be name of the file I think.
13. We are doing basically the same for the new default apc classloader in DrupalKernel::createFromRequest(), so I think it's ok. I did just notice that we have a Settings::getHashSalt() that checks if it's not empty, we could use that.
14. & 15. Yes, those are about config files, we can remove those hacks from the patch, nice catch.

berdir’s picture

Assigned: fabianx » berdir

Working on this

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new31.15 KB
new15.13 KB

1. Improved comment.
2. Removed.
3. Yeah, I think this comment is more confusing than not having one. It's also wrong, it has nothing to do with simpletest, those almost always run in separate processes anyway. I've just removed it, wasn't able to come up with something that is useful instead. It's static so that multiple calls to the same collection will access the same cache, otherwise it would not help during multiple cache clears.
7. I think "Collector" is a left-over.
8. Added some documentation.
13. As mentioned above, same as the classloader.
14. & 15. Removed.

amateescu’s picture

Status: Needs review » Reviewed & tested by the community

The interdiff looks good to me, great review @effulgentsia :)

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

This issue addresses a critical bug and is allowed per https://www.drupal.org/core/beta-changes. Committed 4192117 and pushed to 8.0.x. Thanks!

  • alexpott committed 4192117 on 8.0.x
    Issue #2395143 by amateescu, Fabianx, Berdir, beejeebus, dashaforbes,...
fabianx’s picture

Thanks for the commit!

Unfortunately effulgentsia is right and we should indeed use:

//- $prefix = hash('sha256', Settings::getHashSalt());
$prefix = hash('sha256', 'FileCache' . Settings::getHashSalt());

instead of just hashing the hash salt. The APC class loader uses the 'drupal.' prefix for that.

There are some other things that I would have changed differently, but I will create follow-ups for those.

amateescu’s picture

Status: Fixed » Needs review
StatusFileSize
new568 bytes

That sounds easy enough to do here with a small followup patch.

znerol’s picture

+++ b/core/lib/Drupal/Core/DrupalKernel.php
@@ -431,7 +431,7 @@ public function boot() {
- FileCacheFactory::setPrefix(hash('sha256', Settings::get('hash_salt')));
+ FileCacheFactory::setPrefix(hash('sha256', 'FileCache' . Settings::get('hash_salt')));

I think @Fabianx meant hash_hmac. Otherwise that change does not make sense.

znerol’s picture

Oh, I did misread the dot for a comma. But still hash_hmac is the way to go when using hash_salt.

alexpott’s picture

@znerol well...

    // If the class loader is still the same, possibly upgrade to the APC class
    // loader.
    if ($class_loader_class == get_class($class_loader)
        && Settings::get('class_loader_auto_detect', TRUE)
        && Settings::get('hash_salt', FALSE)
        && function_exists('apc_fetch')) {
      $prefix = 'drupal.' . hash('sha256', 'drupal.' . Settings::getHashSalt());
      $apc_loader = new \Symfony\Component\ClassLoader\ApcClassLoader($prefix, $class_loader);
      $class_loader->unregister();
      $apc_loader->register();
      $class_loader = $apc_loader;
    }

from core/lib/Drupal/Core/DrupalKernel.php

imo we should proceed here and open a follow to change both of these to use hash_hmac()

znerol’s picture

Status: Needs review » Reviewed & tested by the community

Fair enough.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed bfcb41b and pushed to 8.0.x. Thanks!

  • alexpott committed bfcb41b on 8.0.x
    Issue #2395143 followup by amateescu: YAML parsing is very slow, cache...
znerol’s picture

The test fails over in #2474043-1: hash_salt should be used in combination with hash_hmac() instead of hash() indicate that we currently setup the FileCacheFactory even if the hash_salt setting is empty. That will result in a predictable prefix.

znerol’s picture

Status: Fixed » Needs review
StatusFileSize
new1.89 KB

Use the same condition as the class loader bit in DrupalKernel::createFromRequest().

Status: Needs review » Needs work

The last submitted patch, 199: 2395143-198-followup-followup.patch, failed testing.

alexpott’s picture

Status: Needs work » Fixed

Can we open followups rather than re-open this issue?

alexpott’s picture

Status: Fixed » Needs work

Reverting this until we get testbot stability - see #2474909: Allow Simpletest to use the same APC user cache prefix so that tests can share the classmap and other cache objects

APC stats from a bot are attached to this in an image... we're at 100% fragmentation and inserting into the user cache at the rate of 5512.76 cache requests/second. Potentially we'll have to revert the classloader change too - but we didn't see these types of fails after committing that patch.

Perhaps we need to go forward with #2447753: APCu detection erroneously succeeds if apc.enabled is "0"

I'm sorry @Berdir :(

  • alexpott committed 9125395 on 8.0.x
    Revert "Issue #2395143 followup by amateescu: YAML parsing is very slow...
  • alexpott committed e11f845 on 8.0.x
    Revert "Issue #2395143 by amateescu, Fabianx, Berdir, beejeebus,...
alexpott’s picture

Issue summary: View changes
StatusFileSize
new336.56 KB
mpdonadio’s picture

+++ b/core/lib/Drupal/Component/FileCache/ApcuFileCacheBackend.php
@@ -0,0 +1,36 @@
+  public function store($cid, $data) {
+    apc_store($cid, $data);
+  }
+

Per http://php.net/manual/en/function.apc-store.php, this can return FALSE when the $data can't be stored in the cache. This may be the source of the problem.

berdir’s picture

I don't think it's the set() specifically, nothing should break if that fails, the problem is that due to the very recently introduced hash prefix for both the classloader *and* this, we have a huge amount of different caches, every test environment has it's own caches.

That fills up the cache, things constantly evicted from there and have to be written again, and most importantly, the shared memory becomes completely fragmented and very slow, to the point where the performance with apc enabled is much, much worse than when it's not.

Planned next steps:

* Make the prefix properly configurable again
* Set the prefix in the test environment to a stable value, so it is always the same, for all tests. Possibly both for the classloader and the file cache.
* Consider restarting apache after/before a completed test run, to drop out any cached information between test runs.

wim leers’s picture

* Consider restarting apache after/before a completed test run, to drop out any cached information between test runs.

Almost exactly a year ago, @rfay said this over at #1809248-27: Enable apc.enable_cli php.ini setting on testbots:

I think we've had a restart after every test in there for a long time.

… so … did this change?

alexpott’s picture

Postponing this on #2474909: Allow Simpletest to use the same APC user cache prefix so that tests can share the classmap and other cache objects. That patch will gives up the ability for the entire testsuite to use the same APC Filecache entries. This will massively reduce writes and make cache hit rates fantastic. Also it will reduce the thrashing on the APC cache.

Once that lands we'll need to reroll this patch and include its followups and change the code to use the new method to get the APC prefix. Once we've got the patch we'll then need to check the APC usage after a testbot run to confirm that we're not maxing the 300mb available out. And ensure that we're not too fragmented.

alexpott’s picture

Status: Postponed » Needs review
StatusFileSize
new576 bytes
new31.23 KB

The last submitted patch, 210: a-interdiff-of-sorts.patch, failed testing.

fabianx’s picture

Status: Needs review » Reviewed & tested by the community

Interdiff looks great, RTBC if tests still pass.

alexpott’s picture

So the patch in #210 reached a user cache full 37 times during the test run. Which is way less than was occurring before #2474909: Allow Simpletest to use the same APC user cache prefix so that tests can share the classmap and other cache objects landed.

webchick’s picture

Given that this patch caused testbot issues last time, alexpott is now offline for the day, and Mixologic is about to go home, I propose we wait until tomorrow morning Pacific time to commit this.

fabianx’s picture

Assigned: berdir » alexpott

Agree, lets have alexpott commit this.

berdir’s picture

I would expect that this brings back the performance improvements that we've seen before we introduced the dynamic prefix, because that made it ~1m faster.

Not seeing that right now, testbot runtime for this was only a few seconds faster than it was for HEAD (both around 27min, did we switch to slower servers?)

wim leers’s picture

did we switch to slower servers?

Yes :(

alexpott queued 210: 2395143.210.patch for re-testing.

alexpott queued 210: 2395143.210.patch for re-testing.

fabianx’s picture

The performance improvement already was mostly gone after #2395143-177: YAML parsing is very slow, cache it with FileCache / #177 as testbot hugely did profit from the config storage caching. Maybe we can exchange the service with a caching one only for test runs as part of #2473179: Use FileCache for config storage?

alexpott’s picture

Assigned: alexpott » Unassigned
Status: Reviewed & tested by the community » Fixed

Screenshots of apc.php...

Before

Only local images are allowed.

After

Only local images are allowed.

So APC is not blown by committing this anymore due to #2480541: Bump APC settings and #2474909: Allow Simpletest to use the same APC user cache prefix so that tests can share the classmap and other cache objects - see #2480541-5: Bump APC settings for what this looked liked before upping the APC memory :)

I ponder too about the performance improvement of this patch. Testbot times with the patch are 27 minutes and its 30 minutes without the patch - but can be within normal testbot variation (although I would say the 3 minutes is the extreme end of the variation).

I think I've only worked around the fringes of this patch and therefore I'm in a good place to commit it. Committed e361599 and pushed to 8.0.x. Thanks!

  • alexpott committed e361599 on 8.0.x
    Issue #2395143 by amateescu, Fabianx, Berdir, beejeebus, dashaforbes,...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.