Problem/Motivation

According to general profiling done by @moshe weitzman and measurements of @effulgentsia the unserialize call for fetching all library definitions needs around 1ms on every page request.

This seems to be caused by the pure size of it, see https://gist.githubusercontent.com/wimleers/f24a78e518239b927d33/raw/5ae...
as an example.

Proposed resolution

  • Consider to have a "registry" of licenses, so you don't need to define the same one multiple times
  • Consider to only apply default values on runtime (this is a trick we might be able to do in other places, too)

Remaining tasks

User interface changes

API changes

Comments

dawehner’s picture

Issue summary: View changes
dawehner’s picture

Issue summary: View changes
wim leers’s picture

Issue tags: +D8 cacheability

Indeed. 69 KB of library definitions… with only Drupal core, and not even all modules in Drupal core!

fabianx’s picture

Why do we need all the meta data at all?

Can't we split this into the meta data libraries and dependency data chains?

On runtime we should usually only always need the data.

E.g. speaking in abstract terms, have a Library object that has some methods:

- Library::getDependencies()
- Library::getAssets()
- Library::getMetaData()

Dependencies are needed to build the dependency tree, get assets is needed to finally draw the assets to the page, meta data is needed when information about URL, copyright, etc. is needed, which is the 1% case.

I think it is totally reasonable to expect from altering hooks if they need the meta data to do e.g.:

- \Drupal::service('libraryHandler')->getLibrary('module', 'name')->getMetaData();

Meta Data I would cache once per library and not in a cache collector. This is a slow path.

And just have the assets and dependencies in the array structure (and cache / cache collector).

I don't know enough about how it works currently, but this is how I would implement it to split the caches and remove what I guess is the majority of the chunk for libraries.

fabianx’s picture

We discussed in IRC to split off one sub-issue of this:

- Create a major issue to add role and admin flag of route to the cache collector CID (cache id) for libraries.

This means the cache collector is nicely fragmented into admin and non-admin things _and_ per role, which makes end-user performance better. (contextual, etc. never in there, so no need to load them for that user)

We should still look into splitting meta data off, too and tackle this here.

JS/CSS aggregation / minification that needs the actual meta data of the libraries can do its own caching once it has the data, but this is run asynchronously from the main request, so not a concern and either one cache_get_multiple() OR another cache collector.

wim leers’s picture

From a discussion on IRC with Fabianx, dawehner, catch and I, we concluded that a simple way to decrease the size would be to:

  1. add the UserRolesCacheContext to the library cache collector's cache key
  2. add the IsAdminRouteCacheContext to the library cache collector's cache key (and that cache context doesn't exist yet, but will be simple to create)
damiankloip’s picture

Status: Active » Needs review
StatusFileSize
new2.85 KB

So we could do this to the LibraryDiscoveryCollector. CacheCollector is not geared towards an array of keys but we can just not call the parent constructor and easily do what we need. I guess the question is do we want to try and just support this in CacheCollector instead?

I have not added the IsAdminRouteCacheContext yet, will do that shortly.

damiankloip’s picture

StatusFileSize
new4.5 KB

Added the new cache context, no interdiff because ...saturday night :)

I make the following sizes:

Before patch: 84.62kb

After patch:
- admin: 66.2kb
- non admin: 77.1kb

The last submitted patch, 7: 2381473.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 8: 2381473-8.patch, failed testing.

dawehner’s picture

Seems legit.

damiankloip’s picture

Status: Needs work » Needs review
StatusFileSize
new6.09 KB
new1.59 KB

Fixed the test, do we want to bother testing the method params (->with(['library_info', 'cache_context.user.roles', 'cache_context.is_admin_route']))?

dawehner’s picture

  1. +++ b/core/lib/Drupal/Core/Cache/IsAdminRouteCacheContext.php
    @@ -0,0 +1,49 @@
    +class IsAdminRouteCacheContext implements CacheContextInterface {
    ...
    +   * Constructs a new UserRolesCacheContext service.
    +   *
    +   * @param \Drupal\Core\Session\AccountInterface $user
    +   *   The current user.
    +   */
    +  public function __construct(AdminContext $admin_context) {
    +    $this->adminContext = $admin_context;
    +  }
    +
    

    Meh documentation, c&p

  2. +++ b/core/tests/Drupal/Tests/Core/Asset/LibraryDiscoveryCollectorTest.php
    @@ -65,8 +72,15 @@ protected function setUp() {
    +      ->will($this->returnValue(['library_info']));
    

    Use willReturnValue() ... maybe we should also return cache keys for the other ones.

dawehner’s picture

On top of that I think we should still try to reduce the actual php array we retrieve.

damiankloip’s picture

Yes, I agree. We ideally want to extract metadata from this normal runtime info. I did this patch, then properly read the comments above saying let's make antoher issue to do this, sorry! :)

dawehner’s picture

@damiankloip
No problem, well, I guess doing both could be in scope of the issue.

wim leers’s picture

What a wonderfully elegant patch! Thanks, Damian!

But given that the patch so far does not present as big of a change as I think we'd hoped (we were concerned about 69 KB being too large, see #3):

Before patch: 84.62kb

After patch:
- admin: 66.2kb
- non admin: 77.1kb

perhaps Daniel's proposal ( I guess doing both could be in scope of the issue.) is appropriate?

mgifford’s picture

Status: Needs review » Needs work

Up to #13... Also, it seems like we're bigger now than before in totally, smaller on a per-page basis.

fabianx’s picture

dpovshed’s picture

Unfortunately patch from #12 is not applicable anymore

wim leers’s picture

Issue tags: +D8 Accelerate Dev Days

@dpovshed's been looking into the performance of asset handling, based on me noticing in #2429617-36: Make D8 2x as fast: Dynamic Page Cache: context-dependent page caching (for *all* users!) that we spend too much time in there. He confirmed that we spend too much time in the loading of asset library definitions… i.e. this issue. This confirms the "major" status.

andypost’s picture

Issue tags: +Needs reroll
damiankloip’s picture

Status: Needs work » Needs review
Issue tags: -Needs reroll
StatusFileSize
new6.23 KB
new1.12 KB

Agree this is important, Let's start with a reroll and a few changes for things that have recently changed. Looks like the cache contexts service now only expects cache contexts to be passed. Before it just left them. That doesn't make things easier, how come that was changed?

damiankloip’s picture

Doh, the tests need updating too.

Status: Needs review » Needs work

The last submitted patch, 23: 2381473-22.patch, failed testing.

dpovshed’s picture

Assigned: Unassigned » dpovshed
dpovshed’s picture

StatusFileSize
new5 KB

Attached is a proof-of-concept patch which store all the library discovery information as PHP file in sites/default/files/php/library instead of JSON-encoded entry in cache bin. Inspired by last bullet in a list here .

According to my test this gives just ~10% improve for related Asset functions . Attention to: @WimLeers .

wim leers’s picture

I'm not able to reproduce the 10% performance benefit. We'll need to look into this more. We should sit down together tomorrow to figure out why. But in any case, it is very much worthwhile to make a single PHP file with all asset library information, and see what that gains us. We want to try that for this problem, and in other cases as well, so the data we gather here will be very valuable :)

wim leers’s picture

Title: Decrease the size of the library cache entry. » Decrease the size of the asset library cache item
dpovshed’s picture

StatusFileSize
new11.76 KB

@WimLeers,

please evaluate and review the attached patch. We completely get rid of caching library data ib favor of generating single file for all the library stuff.

Things not implemented but which I keeping in mind if we go that direction:
- probably AssetStorage should be a service;
- tests should be updated;
- Drush cr should removed all the cached data as well.

dpovshed’s picture

StatusFileSize
new11.57 KB

All topics from previous message are covered now, except testing.

The class is now is a service, with all cache invalidation and event subscribing features.

Updated patch attached.

wim leers’s picture

StatusFileSize
new375.23 KB

#30:
Patch is looking promising!

Unfortunately, the numbers show that it is somehow still slower. On PHP 5.5.11, with OpCache on, MTimeProtectedFileStorage::load() takes more time than CacheCollector::get() (1.196 vs 0.862 ms). We suspected the mtime checking might be costly, but it seems that's not it: the truly surprising thing is that just including the attached PHP file, which is in OpCache, takes 1.161 ms! We expected that to be much faster.

The only explanation we have is that apc_fetch() (APCu) is much faster, despite the unserialization work it does.

dpovshed’s picture

StatusFileSize
new11.27 KB

The patch from #31 was reworked to contain library data wrapped not in a class but into a function.

This was done after tests with large file with 500-x definitions of data by different ways:

  1. as a class member
    class __library_core { 
      $data = array(...);
    } 
    
  2. as a function returning array
    function __library_core() {
      return array(...)
    }
    
  3. as an array with un-evaluated definitions
    __library_['core'] = <<<EOT
    array(...);
    EOT;
    

Results was, in seconds per same number of definitions:

  1. 11.1
  2. 5.7
  3. 2.6

Current option using approach #2 instead of #1 as in patch #31. We should expect 2x time load improvements, and still have a bit unusual option (3) :) if we need to improve more. Last one requires eval() only for used libraries so most likely will get some time from it as well.

fabianx’s picture

If we write serialized data into a file should we not be using PhpStorageFactory, which just uses:

<?php return unserialize("DATA-STRING");

?

dpovshed’s picture

@Fabianx I am not sure what you referring to. When I see cached files - they are just a plain php files. When I profile loading data - it comes down to a simple include_once call.

So I am var_dump ing data before save and including them later on retrieval.

fabianx’s picture

#35: The fastest method to get data from a file containing PHP code is to use a return statement and instead of creating our own cache layer we can directly use lib/Drupal/Core/Cache/PhpBackend.php, which is already in core.

In case the unserialize takes too long (which I think is part of the issue) and we say that this can be stored as an array directly, then we should subclass PhpBackend.php like PhpRawBackend.php and still use our cache layer, etc.

Then use our own serialization logic, which throws an Exception if there is an object or anything in there.

That kind of serialization should be a component in my opinion as it's useful e.g. also for FileCache.

This is an optimization on the caching layer, the implementation should not be dependent on the caches used.

e.g.

Lets assume I am a big enterprise and have shared files.

In that case I probably want to store library caches in APC + Memcache / Redis and I do not want to use a file stored backend, because my filesystem is rather slow - especially if its shared. But because this data needs to be expired, it can't be on /tmp or so (at the moment at least not) and such it would need to be on slow shared file storage.

So therefore:

- I thank you very much for your work on this.
- I think your research on this is great that classes and functions just load too slow, I still encourage to use "<?php return ['entries'];" as generated code.
- I think it would be better to enhance the PhpStorage backend than to bind the LibraryDiscovery to _one_ specific caching method using a component so that other PhpStorage based cache backends can re-use that.
- I don't think it is a good idea to bind the implementation to a specific cache method as that reduces flexibility and is inconsistent with the rest of core
- I like the idea of using PHP more for definitions

wim leers’s picture

Issue summary: View changes
StatusFileSize
new418.95 KB
new663.25 KB

I profiled #33, and it looks much more promising than the prior patches :)

HEAD
Patch in #3

It's now cheaper to load the initial file, as long as a low number of different extensions' data is used during a request, this is going to be cheaper/faster.

But, it now looks quite clear that this is not going to be buying us the significant difference that we'd hoped for. You could #33.3, that would mean this research would then be comprehensive, and useful for future reference.

However, it means that we should look into caching at a higher level, not here.

wim leers’s picture

However, it means that we should look into caching at a higher level, not here.

This has now been done in #2506369: Cache CSS/JS asset resolving. Which means this issue can be marked fixed AFAICT.

fabianx’s picture

I am not 100% in agreement there. The library cache item is still large and its still helpful as JS / CSS asset cache is still unique per page possibly. (hopefully not)

So the hierarchy makes sense ...

wim leers’s picture

What hierarchy are you referring to?

Yes, the asset library cache items are still useful. That's why I'm not proposing to remove them. But we only load that large cache item on routes where we have a combination of libraries that we haven't seen on other routes. And it's still cheaper than going to the file system and parsing YAML files plus invoking a bunch of hooks.

Therefore I believe this can be closed.

dpovshed’s picture

Assigned: dpovshed » Unassigned

As long as people are happy with solution introduced in #2506369: Cache CSS/JS asset resolving Ihave no objections on closing this issue

catch’s picture

Priority: Major » Normal
Status: Needs work » Closed (works as designed)

I'm going to go ahead and close this ('works as designed' seems best). If we run into an issue with this after all, we can re-open it and the patch is here.