Closed (outdated)
Project:
Drupal core
Version:
9.3.x-dev
Component:
base system
Priority:
Major
Category:
Task
Assigned:
Unassigned
Reporter:
Created:
6 Feb 2015 at 14:00 UTC
Updated:
15 Jul 2021 at 14:26 UTC
Jump to comment: Most recent, Most recent file

Comments
Comment #1
dawehnerThis is certainly work in progress.
Comment #2
dawehnerEmbedded the actual wanted image.
Comment #3
dawehnerAdding a tag.
Comment #4
dawehner.
Comment #5
wim leersFor reference: how long does a request to the
/contactpage take as the anonymous user, with the page cache off, and with the page cache on? Otherwise, this number is too machine-dependent to have significant meaning IMO.(Nevertheless: 5 ms… wow.)
Comment #6
dawehnerNote: These 5ms are are coming from 30ms see https://blackfire.io/profiles/abdb67e4-b76d-48de-a48c-22ba297b558d/graph
Comment #7
pounardSymfony provides the bootstrap class performance helper http://symfony.com/doc/current/book/performance.html#use-bootstrap-files and I think that for Drupal most used files, such feature could be a radically boost. The idea being almost the same as the one proposed here except Symfony dumps the actual classes code into the bootstrap.cache.php and relieves PHP from even requiring them.
Comment #8
wim leersPotentially related discussion: https://twitter.com/Crell/status/563740858240212994 — if that speed increase is coming from loading classes, then it might reduce the value of doing this.
Comment #9
dawehnerHere is a quick hack with some code of the symfony classloader component to produce the following bootstrap file, but it doesn't really work. Just posting my result so far.
Comment #10
fabianx commentedComment #11
fabianx commentedComment #12
dawehnerJust more generic work, but I'm seriously not convinced that it will actually safe us anything.
Feel free to experiment with it at any point!
Comment #13
xjmDiscussed with @Wim Leers and @dawehner. This issue is major if it substantially improves the time spent on class loading. However, per @dawehner, this could be changed in a patch release without breaking anything, so there is no need for it to block release even if the improvement is more than ~1 ms. See: https://www.drupal.org/node/1744302#critical-perf-issues
Comment #15
fgmThe current version of this patch builds an empty PHP file. Checking why.
Comment #16
webchickTagging.
Comment #17
fgmIt was missing the bit where it actually builds the map. Let's see how it fares with it.
Comment #19
fabianx commentedFor benchmarking this is fine, but a classmap / preload configuration needs to use an approach like:
https://www.drupal.org/node/1818628#comment-9650441 (in that patch I am dumping the classmap also via php storage, which is sane).
I don't think loading all classes of all the modules is a good idea though ...
Comment #20
pounardThis is hard to set an arbitrary value like this, and this patch will greatly improve nfs-based or slow disk-based environments meanwhile almost all developers are testing using SSD and very fast boxes. I think this patch is really major for bootstrap performances.
Comment #21
fabianx commentedapc_fetch is _as_ fast as the php loading opcode cache, so we don't gain much ... apc_fetch is very fast, too.
This issue is about actually _loading_ all classes though and I toyed with that too for my memory profiling (but did not benchmark the call), but memory wise it is not a good idea ...
Comment #22
fgmPrevious patch was missing most files.
Comment #24
pounardIf those classes are the most used, you won't loose much memory since they will be used, keep in mind that loading 500k of pre-compiled code in the OPCode worth probably almost nothing in memory compared to the rest of Drupal. There's no doubt why Symfony actually do such optimisation.
Comment #25
wim leersGreat work, @fgm, thanks for getting this working again :)
I think it'd be very interesting to 1) profile this, 2) push this to the extreme of loading all classes, and then profile this.
Can you explain why it's not a good idea? Do you have some numbers?
Comment #26
fabianx commentedYes, of course and that are not even all classes:
#2294569-22: [meta] Determine cause of high memory consumption
catch is already worried about loading all class names into memory, which is just 1 MB.
Also please don't disregard #1818628-44: Use Composer's optimized ClassLoader for Core/Component classes which does make the class loading overhead zero. (and it is green already!)
And in my benchmarks pre-loading was not faster than loading via a static cache pre-loader, but we can of course re-do that ...
For a Drupal installation via drush alone it is:
- 40 MB with opcode cache off, 14 MB with opcode cache on.
That is our base line ...
We are aiming to run Drupal 8 with 32 MB ...
IMHO we should be using #1818628-44: Use Composer's optimized ClassLoader for Core/Component classes as base for this work too. As all we need to is to call getClassMap() and load all classes in there, then compare to the class map based approach ...
Comment #27
wim leersQuoting @pounard in #7:
I think we should give this serious consideration.
Comment #28
effulgentsia commentedI tried a quick test of #27. I installed Minimal profile, and then output which files are autoloaded prior to controller invocation for the
/adminpage. Here's the attached file list: 300 files.I then ran a script that called
require $fileon each of these.And another script that called
require $fileon a single file with the combined contents of all of these.On my machine, the first one took 6ms and the second took 3ms (APC enabled in both cases). I think my machine is about half the speed of Wim's and Fabian's, so probably on theirs this might only be a 1.5ms savings.
Not sure if this is a level of savings that's worth the bother of a bootstrap aggregator. Also, this is on PHP 5.5, and I don't know if the savings would be larger or smaller on PHP 7.
Comment #29
wim leers3 or even 1.5 ms is huge.
Comment #30
fabianx commentedClarified title, that makes more sense :)
Comment #31
yched commented+1 on the approach but: finding this concatenated file in your callstack is quite painful in debugging sessions. On a symfony project I worked on recently, I manually changed the front controller to not include it if the xdebug cookie is present.
Can we include a similar check if we do this ? Or maybe ensure that the include happens in a place where it's easily modified (index.php rather than some nested class) ?
Comment #32
catchConcatenating into a single file risks running foul of
opcache.max_file_size. This defaults to 0, unlike the apc_max_file_size with was 1M, so less of an issue with PHP 5.5+, but if a host has that set and the file is bigger then it'd make this a performance regression rather than improvement.Between that and debugging (and maintainability - what if you apply a core patch from the queue? Are you expected to update the list?) I'd go for the long list of require statements - at least for core. If concat really is 3ms faster that's definitely worth looking at for contrib.
Another advantage of both approaches is we also save the lookup on APCu cache misses, and also make the necessary space in APCu less - since 300 less files to store in there. Not a huge deal but better than nothing.
Where it gets trickier is deciding on the list. I'd thought about an APCu backend with a single cache entry, then have a script that dumps that cache entry out to require statements - would mean the 'bootstrap list' could be built per site. If we want to include a list with core, something like effulgentsia's seems good. We'd only run into trouble with that if we moved/removed a file without updating the list, and that would fail hard.
As Fabianx pointed out, loading all classes like this concerns me because there's potentially a lot of code we don't need to load at all, so memory requirements get higher than they otherwise would be.
I've spent a bit of time trying to get numbers between require and the APCu autoloader, but not there yet.
Comment #33
catchOK not quite the comparison I was after (was looking for a good way to trigger spl_autoload_call() without instantiating the class, instanceof doesn't work apparently), but I applied the attached patch and profiled before/after as an anonymous user, no page cache, standard profile, front page.
HEAD:
387 calls to spl_autoload_call()
Patch:
251 calls to spl_autoload_call()
1,000 less function calls is nice regardless of the other numbers. Also looks like we only needed 136 of the classes in files.php, but it was still an improvement of 300kb in memory which is mildly confusing. Diffing the runs showed that autoloader was the only difference between them.
Comment #34
effulgentsia commentedI think the include of the aggregate file should happen either in a middleware that runs after the page cache or in a high priority request listener. Since on a page cache hit, we don't need all those classes loaded. Not sure if there's any benefit in waiting till the request listener phase, such as having access to a more properly initialized PHPStorage. I think even if the aggregate only contains the files used during the request listener phase, it would still be the vast majority of what's in #28.
This would allow the building of the aggregate file to happen dynamically, perhaps also putting the logic into a module (whether core or contrib) that can be disabled during development. In terms of building dynamically, the module could register the middleware or high priority request listener, along with a low priority request listener, and in between the two, could decorate the autoloader to detect the files that need to be aggregated, and then dump the aggregate during the low priority request listener. Perhaps it can also split the aggregate into chunks smaller than the
opcache.max_file_sizesetting.Comment #35
catchBefore going too far down the aggregate file route, I'd like to know where the speed improvement really comes from compared to the require statements.
If we think the cost is in individual file inclusion, then opcache.revalidate_freq (apc.stat = 0) might improve things - strace is better than xhprof (i.e. xhprof is useless for this) for showing the difference there. Should be no stat/lstat at all in the strace.
If we don't have any stat calls and are only looking at open() and opcache internals, I'd be a bit surprised that a single file is really that much faster in opcache than 300 files - if it's a 3ms improvement that's 0.01ms per file.
Comment #36
wim leersI can add one bit of relevant info here: while we were doing research for #2381473: Decrease the size of the asset library cache item, we saw that including a PHP file that is in OPCache, the include time scaled linearly with the size of the file.
Comment #37
pounard@catch Also don't forget that you don't have any autoloader messing up when the class already exists, which means you don't have each autoloader woke up by the class loading, each autoloader does at least one operation which is splitting the class name, then trying to find if it matches the namespace it's taking care of. It's a huge number of string operations in userland PHP skipped by the fact that classes are already loaded. I don't know how D8 does it, but for example if you have one loader per module, or a a single loader which attempts to derivate the original module or component from the class name, by preloading you skip all this. I don't think the OPcode is the only place to look at, class loaders wake up are visible in the stack traces.
Comment #38
fabianx commented#37: However both the static classmap and the APC cached autoloader are very very very fast.
I think the function call overhead of spl_auto_load is the only thing we look at here ... The apc_fetch call is neglectible.
Comment #39
catchI'm not forgetting that at all, I'm comparing the following two cases:
1. A file that contains a long list of require statements
2. A file that inlines the actual class definitions.
Comment #40
catchAlso apc_fetch() overhead isn't 100% negllible - xhprof says 7ms for 745 calls (yes we do that many calls on a normal page request). Some of that is function call overhead from xhprof itself, but still.
Comment #41
fabianx commentedYes I know, but ClassLoader::merge() if we use the default Composer classloader with a class map also takes 3-7 ms ...
Comment #42
catchDiscussed in irc with dawehner, msonnabaum and neclimdul, I think we should try the following:
- add a modified APCu classloader that uses a single entry.
- add method to that classloader to dump the contents to a list of requires statements.
- have an empty controller.
- clear the classloader entry, then (with caches otherwise warmed) visit that page as an authenticated user
- check the list of classes into core.
- load them via the middleware as discussed
- contrib or custom that doesn't like the default list, could swap it out.
Comment #43
neclimdulWe could also play with iterating on the apcu entry and doing a require or some such magic. worst case might get pretty big though.
I also wonder how much the spl_autoloader ends up costing us and how much benefit this has over #1818628: Use Composer's optimized ClassLoader for Core/Component classes
Comment #50
andypostWith php 7.4 preload and sf example it makes sense as opt-out
Ref https://symfony.com/blog/new-in-symfony-4-4-preloading-symfony-applicati...
Comment #54
andypostThis issue looks stale but PHP 8.0 JIT compiler working fine but could speed-up core a lot
Comment #55
longwaveI think this can be closed as outdated, we have autoload.classmap in core/composer.json for this, and IMO any further improvements should be handled upstream in Symfony or the Composer classloader.
Comment #56
berdirYeah, I'd say close with reference to #3108687: Compatibility with/take advantage of code preloading in PHP 7.4 (which I just did, so closing :))
Comment #57
andypostI think about tp outdate/update for jit related #2704571: Add an APCu classloader with a single entry