Problem/Motivation

Annotations parsing is quite slow, but can be perfectly cached with FileCache (introduced by #2395143: YAML parsing is very slow, cache it with FileCache).

Proposed resolution

- Use FileCache for caching Annotations.

Remaining tasks

- Do it
- Fix test failures

User interface changes

- None

API changes

- None

Main Goal: Performance

Follow up from #2395143: YAML parsing is very slow, cache it with FileCache

Comments

anavarre’s picture

Issue tags: +Performance
fabianx’s picture

Issue summary: View changes
Priority: Normal » Major
Status: Active » Postponed
Parent issue: » #2395143: YAML parsing is very slow, cache it with FileCache
amateescu’s picture

Title: Use APCu for caching annotations in FileCache » Use FileCache for caching discovered annotations
Status: Postponed » Needs review
Issue tags: +D8 Accelerate Dev Days
StatusFileSize
new49.59 KB

This is the most important followup to #2395143: YAML parsing is very slow, cache it with FileCache so I want to try and see if my idea that we need to store a serialized annotation works better than the initial version of this code from #2395143: YAML parsing is very slow, cache it with FileCache.

Status: Needs review » Needs work

The last submitted patch, 3: 2447803.patch, failed testing.

amateescu’s picture

Status: Needs work » Needs review
StatusFileSize
new60.69 KB
new13.91 KB

Found the problem! Since image toolkit operation plugins were using a sub-namespace of image toolkit plugins, the file cache was returning image toolkit operation plugins next to image toolkit plugins, so ImageToolkitManager was getting plugin definitions that actually belonged to ImageToolkitOperationManager.

wim leers’s picture

@amateescu-debugging-skills++++

neclimdul’s picture

+++ b/core/lib/Drupal/Core/ImageToolkit/ImageToolkitOperationManager.php
@@ -46,7 +46,7 @@ class ImageToolkitOperationManager extends DefaultPluginManager implements Image
-    parent::__construct('Plugin/ImageToolkit/Operation', $namespaces, $module_handler, 'Drupal\Core\ImageToolkit\ImageToolkitOperationInterface', 'Drupal\Core\ImageToolkit\Annotation\ImageToolkitOperation');
+    parent::__construct('Plugin/ImageToolkitOperation', $namespaces, $module_handler, 'Drupal\Core\ImageToolkit\ImageToolkitOperationInterface', 'Drupal\Core\ImageToolkit\Annotation\ImageToolkitOperation');

similarity index 94%
rename from core/modules/system/src/Plugin/ImageToolkit/Operation/gd/Convert.php

rename from core/modules/system/src/Plugin/ImageToolkit/Operation/gd/Convert.php
rename to core/modules/system/src/Plugin/ImageToolkitOperation/gd/Convert.php

is this acceptable? Seems like a regression and moving files seems pretty disruptive.

amateescu’s picture

@neclimdul, investigation from @Berdir in the parent issue seems to suggest that using FileCache for annotations bring a very nice improvement, see #2395143-6: YAML parsing is very slow, cache it with FileCache and #2395143-131: YAML parsing is very slow, cache it with FileCache, so IMO the disruption is acceptable.

neclimdul’s picture

I guess that's an argument for the disruption. is it valid for the regression?

fabianx’s picture

After discussing in IRC, an alternate approach would be to:

- use one prefix / collection per plugin manager asking for annotations in the file cache

That way we should avoid the mix-up in the caches and be more clean.

A cache really should not have side effects when used so that a renaming is needed: That is a bug in the cache or usage of the cache, not of the classes itself.

amateescu’s picture

StatusFileSize
new2.86 KB
new836 bytes

Of course, it's much better to do it like this. At that time I was just too excited that I found the underlying problem so I wrote the first fix that came to mind :)

berdir’s picture

Yeah, doing nested folders like that seems problematic anyway but that's a separate issue, great if we can solve it like this :)

We should probably profile this a bit.. for example a cache clear in the UI or an installation to see how much we save.

Status: Needs review » Needs work

The last submitted patch, 11: 2447803-11.patch, failed testing.

amateescu’s picture

Status: Needs work » Needs review
StatusFileSize
new5.53 KB
new3.54 KB

We should probably profile this a bit.. for example a cache clear in the UI or an installation to see how much we save.

Sadly.. not that much :/

I did quick`n`dirty benchmark with time drush site-install with an apache2 restart before each run so we get cold caches. These are the results:

Without patch:
drush sis-sq 35,09s user 0,41s system 98% cpu 35,897 total

With patch:
drush sis-sq 33,12s user 0,45s system 99% cpu 33,739 total

So we save ~2 seconds in the installer :)

dawehner’s picture

It feels a little bit ugly that we have to setup the prefix manually ... what about doing that in UnitTestCase instead ...

amateescu’s picture

StatusFileSize
new4.12 KB
new3.38 KB

Sounds great :)

Status: Needs review » Needs work

The last submitted patch, 16: 2447803-16.patch, failed testing.

amateescu’s picture

Status: Needs work » Needs review
StatusFileSize
new5.41 KB
new1.3 KB

This should do it.

fabianx’s picture

Patch looks great to me!

+++ b/core/lib/Drupal/Component/Annotation/Plugin/Discovery/AnnotatedClassDiscovery.php
@@ -100,6 +110,11 @@ public function getDefinitions() {
+                $definitions[$cached['id']] = unserialize($cached['content']);

@@ -113,7 +128,11 @@ public function getDefinitions() {
+                $definitions[$id] = $content;
+                $this->fileCache->set($fileinfo->getPathName(), ['id' => $id, 'content' => serialize($content)]);

Just some background info on this:

The reason we need to explicitly serialize / unserialize things here is:

a) classes that are used in here

b) APC(u) uses its own serializer. In APCu later this was (due to some bugs) switched to PHPs internal serializer, but APC is still using its internal one. So I think we need to keep those serialize / unserialize calls.

Unless there is a good way to quickly check if an annotation has a rich object ...

Not sure that is worth it, but unserialize is slow ... And can as its purely internal be a follow-up.

--

drush benchmarks are problematic for FileCache - unless you use a more persistent backend (which we all removed for now).

In the best case the annotations are already present in the APC cache, but command line you always start with a fresh new cache ...

So 2s out of 32s for drush is great!

--

This would be RTBC from my side.

fabianx’s picture

Status: Needs review » Reviewed & tested by the community

Oh, lets just do it ...

catch’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/lib/Drupal/Component/Annotation/Plugin/Discovery/AnnotatedClassDiscovery.php
@@ -100,6 +110,11 @@ public function getDefinitions() {
+                $definitions[$cached['id']] = unserialize($cached['content']);

I think we should leave the serialize()/unserialize() call in, but they need an explanatory comment. Also when trying to write one, I couldn't see why we'd have that comment here:

It makes sense to have this, but why not serialize in the APC backend ourselves? I don't think the annotation discovery should be worried about APC vs. APCu serialize implementations.

If we don't like that overhead for other uses of the backend where it's not strictly necessary, we could provide separate APC vs. APCu backends (not sure that's needed, and definitely not here).

fabianx’s picture

Yes, agree lets add the serialize / unserialize to ApcFileCacheBackend.

We can optimize this later like in the normal CacheBackends with a 'serialized' => 1|0 flag - if we want to - by checking if the serialized array contains O: or C: entries (IIRC). (more work on cacheSet / less work on cacheGet).

berdir’s picture

The seriaize/unserialize has nothing to do with APC.

It is there because sometimes those things are objects, we use serialize/unserialize explicitly to break object references.

See the original test fails #2395143-14: YAML parsing is very slow, cache it with FileCache (comment 14). We have alter hooks for example that based on a setting do something, but then we changed the settings, cleared caches and we go the previous, altered object back.

So yes, we should have a comment to explain this.

fabianx’s picture

#23: Okay, that makes sense.

So #22 is wrong, this is needed also for the FileCache directly, but we don't want to serialize all.

Only the caller has knowledge if what we store contains something that has an object reference, which must be serialized / unserialized to re-fetch it e.g. from the container.

I think for this issue we should just add a comment.

Then think about maybe detecting if the serialized data contains objects and store differently with a flag to not unserialize ... (in a follow-up)

This is slower for cache-sets, but faster for all cache backends to retrieve as for simple data PHP can just return the data directly, JSON can be used and for APC we do not need to serialize twice.

effulgentsia’s picture

We have alter hooks for example that based on a setting do something, but then we changed the settings, cleared caches and we go the previous, altered object back.

What alter hooks are running for this? In #2395143-182: YAML parsing is very slow, cache it with FileCache.8, I raised a concern about making sure no event listeners or hooks are involved in processing the data that is cached in FileCache. (Thanks for the docs addition to FileCacheInterface that made it into the committed patch related to that.)

Berdir queued 18: 2447803-18.patch for re-testing.

berdir’s picture

StatusFileSize
new99.55 KB

Forgot about your question sorry.

The alter hook is the standard entity type alter hook. Because we pass the entity type definitions around as objects, then any change that happens on those object is reflected in the static cache in there.

Here's the impact of this with my install profile. Keep in mind that there's a big xhprof overhead with this kind of stuff, but I still think this is very much worth doing.

Ignore the memory stats, that's completely messed up somehow... the parent call that does little more than just calling this method shows a 300% memory increase, from -36MB to +76MB. Which obviously makes no sense at all ;)

catch’s picture

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

fabianx’s picture

Version: 8.1.x-dev » 8.2.x-dev
Status: Needs work » Needs review
StatusFileSize
new1.57 KB
new5.78 KB

I am revisiting this to make tests faster (#2747075: [meta] Improve WebTestCase / BrowserTestBase performance by 50%) as I found it bottlenecks my local development.

I added the missing comment and also the case when a file does not have annotations.

berdir’s picture

Status: Needs review » Reviewed & tested by the community

Looks good to me. I've been using the patch from #18 for a year now and didn't have any issues.

Install time of my install profile:

HEAD: 5m 17s
Patch from #18 that I used so far: 4m 56s
Latest patch: 4m 52s

Numbers are fairly stable, second run with latest patch was 4m 54s.

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/lib/Drupal/Component/Annotation/Plugin/Discovery/AnnotatedClassDiscovery.php
@@ -64,6 +72,8 @@ function __construct($plugin_namespaces = array(), $plugin_definition_annotation
     $this->annotationNamespaces = $annotation_namespaces;
+
+    $this->fileCache = FileCacheFactory::get('annotation_discovery:' . str_replace('\\', '_', $plugin_definition_annotation_name));

I think the cache suffix should also include $annotation_namespaces since if these are altered the results could change even if the files have not.

alexpott’s picture

fabianx’s picture

Status: Needs work » Reviewed & tested by the community
StatusFileSize
new1.01 KB
new5.91 KB

Yes, that is a good point.

I added a hash of the annotation namespaces.

dawehner’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/lib/Drupal/Component/Annotation/Plugin/Discovery/AnnotatedClassDiscovery.php
@@ -3,6 +3,7 @@
 
 use Drupal\Component\Annotation\AnnotationInterface;
+use Drupal\Component\FileCache\FileCacheFactory;
 use Drupal\Component\Plugin\Discovery\DiscoveryInterface;

We also have to adapt the composer.json file for the plugin component.

fabianx’s picture

Status: Needs work » Reviewed & tested by the community
StatusFileSize
new485 bytes
new6.38 KB

Indeed!

dawehner’s picture

Thank you @Fabianx!

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed 1da4d15 and pushed to 8.2.x. Thanks!

  • alexpott committed 1da4d15 on 8.2.x
    Issue #2447803 by amateescu, Fabianx, Berdir: Use FileCache for caching...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.