Problem/Motivation

A not neglectable amount of time of a request is spent in the autoloader itself.
Note: We do talk about the running php code and the apc_fetch call, not about the opcode. cache.

Proposed resolution

Preload all classes/interface so we never even hit the autoloader.

Remaining tasks

  • Discuss whether this is a sane idea
  • Profile it.
  • Discuss how we can get rid of the _once check

User interface changes

API changes

Comments

dawehner’s picture

StatusFileSize
new20.96 KB

This is certainly work in progress.

dawehner’s picture

Issue summary: View changes
StatusFileSize
new70.49 KB

Embedded the actual wanted image.

dawehner’s picture

Issue tags: +needs profiling

Adding a tag.

dawehner’s picture

Title: Bypass the autoloader for the most used classes » Bypass the classloader for the most used classes

.

wim leers’s picture

For reference: how long does a request to the /contact page take as the anonymous user, with the page cache off, and with the page cache on? Otherwise, this number is too machine-dependent to have significant meaning IMO.

(Nevertheless: 5 ms… wow.)

dawehner’s picture

(Nevertheless: 5 ms… wow.)

Note: These 5ms are are coming from 30ms see https://blackfire.io/profiles/abdb67e4-b76d-48de-a48c-22ba297b558d/graph

pounard’s picture

Symfony provides the bootstrap class performance helper http://symfony.com/doc/current/book/performance.html#use-bootstrap-files and I think that for Drupal most used files, such feature could be a radically boost. The idea being almost the same as the one proposed here except Symfony dumps the actual classes code into the bootstrap.cache.php and relieves PHP from even requiring them.

wim leers’s picture

Potentially related discussion: https://twitter.com/Crell/status/563740858240212994 — if that speed increase is coming from loading classes, then it might reduce the value of doing this.

dawehner’s picture

Here is a quick hack with some code of the symfony classloader component to produce the following bootstrap file, but it doesn't really work. Just posting my result so far.

fabianx’s picture

dawehner’s picture

Status: Active » Needs review
StatusFileSize
new29.96 KB
new4.41 KB

Just more generic work, but I'm seriously not convinced that it will actually safe us anything.

Feel free to experiment with it at any point!

xjm’s picture

Priority: Normal » Major

Discussed with @Wim Leers and @dawehner. This issue is major if it substantially improves the time spent on class loading. However, per @dawehner, this could be changed in a patch release without breaking anything, so there is no need for it to block release even if the improvement is more than ~1 ms. See: https://www.drupal.org/node/1744302#critical-perf-issues

Status: Needs review » Needs work

The last submitted patch, 12: 2421479-12.patch, failed testing.

fgm’s picture

The current version of this patch builds an empty PHP file. Checking why.

webchick’s picture

Issue tags: +D8 Accelerate Dev Days

Tagging.

fgm’s picture

Status: Needs work » Needs review
StatusFileSize
new2.15 KB

It was missing the bit where it actually builds the map. Let's see how it fares with it.

Status: Needs review » Needs work

The last submitted patch, 17: 2421479-bootstrap-16.patch, failed testing.

fabianx’s picture

For benchmarking this is fine, but a classmap / preload configuration needs to use an approach like:

https://www.drupal.org/node/1818628#comment-9650441 (in that patch I am dumping the classmap also via php storage, which is sane).

I don't think loading all classes of all the modules is a good idea though ...

pounard’s picture

is more than ~1 ms

This is hard to set an arbitrary value like this, and this patch will greatly improve nfs-based or slow disk-based environments meanwhile almost all developers are testing using SSD and very fast boxes. I think this patch is really major for bootstrap performances.

fabianx’s picture

apc_fetch is _as_ fast as the php loading opcode cache, so we don't gain much ... apc_fetch is very fast, too.

This issue is about actually _loading_ all classes though and I toyed with that too for my memory profiling (but did not benchmark the call), but memory wise it is not a good idea ...

fgm’s picture

Status: Needs work » Needs review
StatusFileSize
new29.9 KB

Previous patch was missing most files.

Status: Needs review » Needs work

The last submitted patch, 22: 2421479-21-bootstrap.patch, failed testing.

pounard’s picture

This issue is about actually _loading_ all classes though and I toyed with that too for my memory profiling (but did not benchmark the call), but memory wise it is not a good idea ...

If those classes are the most used, you won't loose much memory since they will be used, keep in mind that loading 500k of pre-compiled code in the OPCode worth probably almost nothing in memory compared to the rest of Drupal. There's no doubt why Symfony actually do such optimisation.

wim leers’s picture

Great work, @fgm, thanks for getting this working again :)

I think it'd be very interesting to 1) profile this, 2) push this to the extreme of loading all classes, and then profile this.

This issue is about actually _loading_ all classes though and I toyed with that too for my memory profiling (but did not benchmark the call), but memory wise it is not a good idea ...

Can you explain why it's not a good idea? Do you have some numbers?

fabianx’s picture

Yes, of course and that are not even all classes:

#2294569-22: [meta] Determine cause of high memory consumption

catch is already worried about loading all class names into memory, which is just 1 MB.

Also please don't disregard #1818628-44: Use Composer's optimized ClassLoader for Core/Component classes which does make the class loading overhead zero. (and it is green already!)

And in my benchmarks pre-loading was not faster than loading via a static cache pre-loader, but we can of course re-do that ...

For a Drupal installation via drush alone it is:

- 40 MB with opcode cache off, 14 MB with opcode cache on.

That is our base line ...

We are aiming to run Drupal 8 with 32 MB ...

IMHO we should be using #1818628-44: Use Composer's optimized ClassLoader for Core/Component classes as base for this work too. As all we need to is to call getClassMap() and load all classes in there, then compare to the class map based approach ...

wim leers’s picture

Quoting @pounard in #7:

Symfony provides the bootstrap class performance helper http://symfony.com/doc/current/book/performance.html#use-bootstrap-files and I think that for Drupal most used files, such feature could be a radically boost. The idea being almost the same as the one proposed here except Symfony dumps the actual classes code into the bootstrap.cache.php and relieves PHP from even requiring them.

I think we should give this serious consideration.

effulgentsia’s picture

StatusFileSize
new18.84 KB

I tried a quick test of #27. I installed Minimal profile, and then output which files are autoloaded prior to controller invocation for the /admin page. Here's the attached file list: 300 files.

I then ran a script that called require $file on each of these.

And another script that called require $file on a single file with the combined contents of all of these.

On my machine, the first one took 6ms and the second took 3ms (APC enabled in both cases). I think my machine is about half the speed of Wim's and Fabian's, so probably on theirs this might only be a 1.5ms savings.

Not sure if this is a level of savings that's worth the bother of a bootstrap aggregator. Also, this is on PHP 5.5, and I don't know if the savings would be larger or smaller on PHP 7.

wim leers’s picture

3 or even 1.5 ms is huge.

fabianx’s picture

Title: Bypass the classloader for the most used classes » Use a bootstrap.php file to load the most used classes in one go

Clarified title, that makes more sense :)

yched’s picture

+1 on the approach but: finding this concatenated file in your callstack is quite painful in debugging sessions. On a symfony project I worked on recently, I manually changed the front controller to not include it if the xdebug cookie is present.

Can we include a similar check if we do this ? Or maybe ensure that the include happens in a place where it's easily modified (index.php rather than some nested class) ?

catch’s picture

Concatenating into a single file risks running foul of opcache.max_file_size. This defaults to 0, unlike the apc_max_file_size with was 1M, so less of an issue with PHP 5.5+, but if a host has that set and the file is bigger then it'd make this a performance regression rather than improvement.

Between that and debugging (and maintainability - what if you apply a core patch from the queue? Are you expected to update the list?) I'd go for the long list of require statements - at least for core. If concat really is 3ms faster that's definitely worth looking at for contrib.

Another advantage of both approaches is we also save the lookup on APCu cache misses, and also make the necessary space in APCu less - since 300 less files to store in there. Not a huge deal but better than nothing.

Where it gets trickier is deciding on the list. I'd thought about an APCu backend with a single cache entry, then have a script that dumps that cache entry out to require statements - would mean the 'bootstrap list' could be built per site. If we want to include a list with core, something like effulgentsia's seems good. We'd only run into trouble with that if we moved/removed a file without updating the list, and that would fail hard.

As Fabianx pointed out, loading all classes like this concerns me because there's potentially a lot of code we don't need to load at all, so memory requirements get higher than they otherwise would be.

I've spent a bit of time trying to get numbers between require and the APCu autoloader, but not there yet.

catch’s picture

StatusFileSize
new22.77 KB

OK not quite the comparison I was after (was looking for a good way to trigger spl_autoload_call() without instantiating the class, instanceof doesn't work apparently), but I applied the attached patch and profiled before/after as an anonymous user, no page cache, standard profile, front page.

HEAD:
387 calls to spl_autoload_call()

Total Incl. Wall Time (microsec):	801,871 microsecs
Total Incl. CPU (microsecs):	573,508 microsecs
Total Incl. MemUse (bytes):	44,868,816 bytes
Total Incl. PeakMemUse (bytes):	44,997,296 bytes
Number of Function Calls:	42,156

Patch:
251 calls to spl_autoload_call()

Overall Summary	
Total Incl. Wall Time (microsec):	751,370 microsecs
Total Incl. CPU (microsecs):	554,634 microsecs
Total Incl. MemUse (bytes):	44,524,112 bytes
Total Incl. PeakMemUse (bytes):	44,644,768 bytes
Number of Function Calls:	41,122

1,000 less function calls is nice regardless of the other numbers. Also looks like we only needed 136 of the classes in files.php, but it was still an improvement of 300kb in memory which is mildly confusing. Diffing the runs showed that autoloader was the only difference between them.

effulgentsia’s picture

Or maybe ensure that the include happens in a place where it's easily modified (index.php rather than some nested class) ?

I think the include of the aggregate file should happen either in a middleware that runs after the page cache or in a high priority request listener. Since on a page cache hit, we don't need all those classes loaded. Not sure if there's any benefit in waiting till the request listener phase, such as having access to a more properly initialized PHPStorage. I think even if the aggregate only contains the files used during the request listener phase, it would still be the vast majority of what's in #28.

This would allow the building of the aggregate file to happen dynamically, perhaps also putting the logic into a module (whether core or contrib) that can be disabled during development. In terms of building dynamically, the module could register the middleware or high priority request listener, along with a low priority request listener, and in between the two, could decorate the autoloader to detect the files that need to be aggregated, and then dump the aggregate during the low priority request listener. Perhaps it can also split the aggregate into chunks smaller than the opcache.max_file_size setting.

catch’s picture

Before going too far down the aggregate file route, I'd like to know where the speed improvement really comes from compared to the require statements.

If we think the cost is in individual file inclusion, then opcache.revalidate_freq (apc.stat = 0) might improve things - strace is better than xhprof (i.e. xhprof is useless for this) for showing the difference there. Should be no stat/lstat at all in the strace.

If we don't have any stat calls and are only looking at open() and opcache internals, I'd be a bit surprised that a single file is really that much faster in opcache than 300 files - if it's a 3ms improvement that's 0.01ms per file.

wim leers’s picture

I can add one bit of relevant info here: while we were doing research for #2381473: Decrease the size of the asset library cache item, we saw that including a PHP file that is in OPCache, the include time scaled linearly with the size of the file.

pounard’s picture

@catch Also don't forget that you don't have any autoloader messing up when the class already exists, which means you don't have each autoloader woke up by the class loading, each autoloader does at least one operation which is splitting the class name, then trying to find if it matches the namespace it's taking care of. It's a huge number of string operations in userland PHP skipped by the fact that classes are already loaded. I don't know how D8 does it, but for example if you have one loader per module, or a a single loader which attempts to derivate the original module or component from the class name, by preloading you skip all this. I don't think the OPcode is the only place to look at, class loaders wake up are visible in the stack traces.

fabianx’s picture

#37: However both the static classmap and the APC cached autoloader are very very very fast.

I think the function call overhead of spl_auto_load is the only thing we look at here ... The apc_fetch call is neglectible.

catch’s picture

Also don't forget that you don't have any autoloader messing up when the class already exists, which means you don't have each autoloader woke up by the class loading, each autoloader does at least one operation which is splitting the class name, then trying to find if it matches the namespace it's taking care of.

I'm not forgetting that at all, I'm comparing the following two cases:

1. A file that contains a long list of require statements

2. A file that inlines the actual class definitions.

catch’s picture

Also apc_fetch() overhead isn't 100% negllible - xhprof says 7ms for 745 calls (yes we do that many calls on a normal page request). Some of that is function call overhead from xhprof itself, but still.

fabianx’s picture

Yes I know, but ClassLoader::merge() if we use the default Composer classloader with a class map also takes 3-7 ms ...

catch’s picture

Discussed in irc with dawehner, msonnabaum and neclimdul, I think we should try the following:

- add a modified APCu classloader that uses a single entry.

- add method to that classloader to dump the contents to a list of requires statements.

- have an empty controller.

- clear the classloader entry, then (with caches otherwise warmed) visit that page as an authenticated user

- check the list of classes into core.

- load them via the middleware as discussed

- contrib or custom that doesn't like the default list, could swap it out.

neclimdul’s picture

We could also play with iterating on the apcu entry and doing a require or some such magic. worst case might get pretty big though.

I also wonder how much the spl_autoloader ends up costing us and how much benefit this has over #1818628: Use Composer's optimized ClassLoader for Core/Component classes

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

andypost’s picture

Version: 8.6.x-dev » 8.9.x-dev

With php 7.4 preload and sf example it makes sense as opt-out
Ref https://symfony.com/blog/new-in-symfony-4-4-preloading-symfony-applicati...

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

andypost’s picture

This issue looks stale but PHP 8.0 JIT compiler working fine but could speed-up core a lot

longwave’s picture

I think this can be closed as outdated, we have autoload.classmap in core/composer.json for this, and IMO any further improvements should be handled upstream in Symfony or the Composer classloader.

berdir’s picture

Status: Needs work » Closed (outdated)

Yeah, I'd say close with reference to #3108687: Compatibility with/take advantage of code preloading in PHP 7.4 (which I just did, so closing :))

andypost’s picture

I think about tp outdate/update for jit related #2704571: Add an APCu classloader with a single entry