During the 8.x cycle we've introduced several known performance regressions compared to Drupal 7, which we need to resolve before release so that Drupal 8 isn't slower than Drupal 7.

This doesn't mean every single regression needs to be individually optimized away - in some cases it might not be necessary to do that, or trying to micro-optimize it won't be worth the extra effort.

However there are places where we've introduced something with the intention that it will allow us to make performance or scalability enhancements elsewhere (like blocks as sub-requests for example), and introducing the performance regression without getting the nice performance feature at the end of it puts us in a not very happy place.

Opening this as a meta-issue to try to track those regressions as they get committed, along with the issues that are attempting to resolve them - making this a critical task since I'm not prepared to release Drupal 8 with obvious performance regressions compared to Drupal 7. It was bad enough from 6 to 7 and we shouldn't do that again.

Criteria for critical performance issues

A performance issue is critical by itself if some of the following are true:

  • There is a concrete performance issue identified by profiling (MySQL, PHP or browser equivalent) and a viable plan to resolve it
  • It can't be committed to a patch-level version (8.0.0 => 8.0.1)
  • Over ~100ms or more savings with cold caches and would have to be deferred to a minor version
  • Over ~10ms savings with warm caches and would have to be deferred to 9.x
  • Over ~1ms or more with the internal page cache and would have to be deferred to 9.x
  • Gets measurably worse with lots of contrib modules or large data sets (e.g. non-indexed queries) and would have to be deferred to a minor version
  • Other specific issues at branch maintainer discretion

Working spreadsheet

For up-to-date information on work being done on performance and caching, see the Drupal 8 performance issue spreadsheet.

High priority issues

Several new core APIs have lost optimizations from Drupal 7 and earlier where multiple objects could be loaded with a single request from the database/cache (e.g. compare CMI to variable_init()). The following issues attempt to add some kind of multiple load/pre-loading/CacheCollector support to those systems:

Routes: #2058845: Pre-load non-admin routes (related: menu tree caching #1805054: Cache localized, access filtered, URL resolved, and rendered menu trees)
Configuration objects: #1880766: Preload configuration objects
State lookup: #1786490: Add caching to the state system
Plugin discovery caching: #2114319: Lots of cache requests from plugin discovery
Block context discovery: #2354889: Make block context faster by removing onBlock event and replace it with loading from a ContextManager
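
To make the multiple-load point concrete, here is a minimal Python sketch (not Drupal code). `CountingBackend`, its `get()` and `get_multiple()` methods, and the config names are hypothetical stand-ins for a cache or database backend. It only shows that N single reads cost N round trips while one multiple read costs one.

```python
class CountingBackend:
    """Hypothetical key/value backend that counts round trips."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key):
        # One round trip per key.
        self.round_trips += 1
        return self._data.get(key)

    def get_multiple(self, keys):
        # One round trip for the whole batch of keys.
        self.round_trips += 1
        return {k: self._data[k] for k in keys if k in self._data}


# Illustrative object names only; any set of keys behaves the same way.
names = ["system.site", "system.performance", "user.settings"]
store = {name: {"name": name} for name in names}

one_by_one = CountingBackend(store)
single_results = {name: one_by_one.get(name) for name in names}

batched = CountingBackend(store)
batch_results = batched.get_multiple(names)

print(one_by_one.round_trips, batched.round_trips)
```

Both approaches return the same data; only the number of trips to the backend differs, which is the cost the issues above try to win back.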

As a generic performance improvement, there is also a focus on enabling render caching by default for all entities. This is being tracked in the D8 cacheability tag.

Other general performance issues

Use the Performance tag to find any other performance issues. Filtering by 'critical' or 'major' should find the most serious ones.

Issues where some of the more serious regressions were originally committed

This is an incomplete list, but it would be useful to track specific commits that made performance worse - feel free to add them here. However a lot of performance issues are introduced by new APIs and only become measurable as core is converted to the new API and backwards compatibility layers removed...
#1571632: Convert regional settings to configuration system
#1290694: Provide consistency for attributes and classes arrays provided by template_preprocess()
#1599108: Allow modules to register services and subscriber services (events)
#1535868: Convert all blocks into plugins
#916388: Convert menu links into entities
#636454: Cache tag support (the minimum lifetime removal and also #1848968: Too many checksum tag queries executed by the cache backend)
#1786490: Add caching to the state system
#1272870: No semantics for nested comments / bad for screen-readers
#1696640: Implement API to unify entity properties and fields
#2102777: Allow theme_links to use routes as well as href

Comments

tim.plunkett’s picture

Title: Resolve known performance regressions in Drupal 8 » [meta] Resolve known performance regressions in Drupal 8

Clarifying the title.

webchick’s picture

Just as a thresholds check, the following are preventing features from being committed to D8 atm:

#1187726: Follow-up: Add caching for configuration / rework config object loading (Was: Memory usage and i/o from config objects)
#1578090: Benchmark/profile kernel
#1743590: Isolated Block Rendering

All are related to performance. Is it possible we can reduce one or more to a "major" task (only 83 of those atm) and hinge them off this critical, rather than taking up 4 slots in the critical task queue?

catch’s picture

I bumped #1578090: Benchmark/profile kernel down to major.

The config and wscci/scotch issues deserve to be critical by themselves IMO since they're major new functionality that's seriously slowing things down atm but which both ought to be nicely fixable.

webchick’s picture

Fair enough; thanks for being flexible! :)

catch’s picture

Issue summary: View changes

Added property API.

catch’s picture

Issue summary: View changes

Updated issue summary.

Berdir’s picture

Issue summary: View changes

Updated issue summary.

Berdir’s picture

Added #1786490: Add caching to the state system to the list.

Edit: Closed the previous issue as a duplicate of that one.

Berdir’s picture

Issue summary: View changes

Updated issue summary.

pounard’s picture

Current HEAD does 120 SQL queries on a front page load when logged in as admin:

  1. 45 cache backend CacheBackendInterface::getMultiple() calls including 27 for config CachedStorage::read()
  2. 34 for cache backend CacheBackendInterface::checksumTags()
  3. 15 for the key value store getMultiple() (actually triggered by get()) triggered by menu API functions mostly
  4. The rest is random API pieces triggering queries for building stuff

Just posting that as a brief synthesis of some XDebug traces I just made.

chx’s picture

DatabaseStorageController::getFieldDefinitions surely could use some optimization, likely in the form of some well deserved caching.

chx’s picture

Issue summary: View changes

Added two twig related issues now that twig is in.

Berdir’s picture

Issue summary: View changes

Updated issue summary.

Berdir’s picture

Working a bit on the config and state caches. On a frontpage with 10 nodes from users with profile pictures, I'm seeing 204 queries (102 cache, 48 state).

With the latest patches from #1187726: Follow-up: Add caching for configuration / rework config object loading (Was: Memory usage and i/o from config objects) and #1786490: Add caching to the state system, I'm down to 91 (37 cache, zero state).

Also, the checksumTags() bug was identified and fixed a few days ago. That is down to 10 of the queries.

Berdir’s picture

Issue summary: View changes

Updated issue summary.

beejeebus’s picture

so, #1535868: Convert all blocks into plugins slowed down head by 15%, attempting to get some of that back in #1880766: Preload configuration objects

beejeebus’s picture

Issue summary: View changes

Updated issue summary.

beejeebus’s picture

Issue summary: View changes

add blocks as regression.

Berdir’s picture

Issue summary: View changes

Updated issue summary.

webchick’s picture

According to Alex's findings at #914382-145: Contextual links incompatible with render cache, D8 is now about 500% slower than stock D7. We should start ramping up our efforts in this area.

effulgentsia’s picture

#10 was viewing the front page with 5 node teasers. I just ran some numbers again (ab -c1 -n100) with no front page content at all (just hitting the front page immediately after a Standard profile install).

7.x HEAD: 61ms
8.x HEAD: 222ms (+264%)

If someone gets a substantially different ratio on a different machine, please share.
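
For anyone comparing their own numbers, this is how the +264% figure falls out of the two timings above. A small Python sketch; the function name `percent_slower` is just an illustrative helper, not part of any tool.

```python
def percent_slower(baseline_ms, current_ms):
    """Relative slowdown of current vs. baseline, in percent."""
    return (current_ms - baseline_ms) / baseline_ms * 100.0


d7_ms = 61.0    # 7.x HEAD, from the numbers above
d8_ms = 222.0   # 8.x HEAD, from the numbers above

print(f"+{percent_slower(d7_ms, d8_ms):.0f}%")       # prints "+264%"
print(f"{d8_ms / d7_ms:.2f}x the 7.x response time")
```

Note that "+264%" and "3.64x" describe the same measurement, so make sure which one is meant when quoting ratios from different machines.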

msonnabaum’s picture

I took a look at the front page difference and I also got a huge difference.

The biggest chunk I found was all EntityNG. 50ms alone spent in the magic methods.

fago’s picture

Yeah - right now we have BC mode in use, which means we have an extra mapping layer on *each* entity property read or write. We need to move on with conversions such that we can remove that.

catch’s picture

Category: task » bug

222ms makes this a bug...

webchick’s picture

#1855260: Page caching broken by accept header-based routing is going to bite us if we don't get that figured out.

effulgentsia’s picture

Some more info:

- I tried it again today, and got 230ms. Not sure if the extra 8ms is due to HEAD changes since #11, or random factors on my computer.
- If in _node_add_access(), I hard code a return FALSE at the top, that drops it down to 185ms. That's a way to isolate the effects of #1979094: Separate create access operation entity access controllers to avoid costly EntityNG instantiation and let us look for what other causes of regression there are.
- If in my settings.php, I uncomment the $settings['class_loader'] = 'apc'; line, that drops it down to 166ms. Yay for there being an easy way to remove autoloader inefficiency!
- 382 PHP files are loaded to show an anonymous home page with no content. And that's even with the early return in _node_add_access() mentioned above. Yowza! I thought that simply the require on that many files was a huge factor. But it turns out not to be. Changing my apc.stat configuration to 0 was able to shave off 5ms. And timing a script that just does a require on those files turned out to only be another ~10-20ms. I need to rewrite and rerun the script to get a more precise number, and will post that when I do, but the good news is that simply loading all that extra Symfony code and OOP Drupal code isn't where our biggest problems are.

catch’s picture

Opened #1983114: Make the autoloader swappable, ideally we'll allow for contrib to provide alternative autoloaders.

Also I hope everyone who thinks autoloading is done for performance reasons reads #17 ten times and repents.

larowlan’s picture

Not sure if it is relevant or not, but when we moved composer.json to the top level, we didn't rebuild the composer autoloader.
So all the paths specified in it are wrong.
You can see that in #1959660: Replace xpath() with WebTestBase::cssSelect() by leveraging Symfony CssSelector, which is the first issue since then to add a new entry to composer.json and run composer update.
Could be a factor?

Berdir’s picture

@effulgentsia: I still can't reproduce your numbers, not even remotely. Can you provide some more information about your setup?

With uid 1 and no nodes on the frontpage, I get "Executed 137 queries in 10.14 ms. Queries exceeding 5 ms are highlighted. Page execution time was 134.94 ms. Memory used at: devel_boot()=4.61 MB, devel_shutdown()=13.81 MB, PHP peak=14 MB." That varies a bit, but not a lot. ab on frontpage is 96.918ms, a 404 page is 54ms.

- Is this a laptop, with/without power plugged in? (I have huge differences with and without power, @dawehner for example didn't)
- Is xhprof/xdebug enabled?
- How many queries, how long do they take? I do have a somewhat optimized mysql configuration and my queries are quite fast, given the number of them.
- When you test the front page, that means we still have to load and execute the view, and that's a considerably higher overhead than the old node_default_page() which was just a single query. A lot of that is one time overhead and is less and less relevant as you display more views/content. Might make more sense to compare a page that hasn't changed that much, e.g. 404.
- I'm also not seeing a big difference when I add the return FALSE to node_access(), possibly that's due to the entity field definitions cache that was committed today.
- Can you check how #1786490: Add caching to the state system and #1971158-15: Follow-up: Add loadMultiple() and listAll() caching to (cached) config storage affect those numbers? The second one only gets interesting with a lot of config files and configurations so you will probably not see a big difference with that but it's huge with real, large sites.

msonnabaum’s picture

Here's what I came up with this morning for a default front page with no content:

http://rpubs.com/msonnabaum/d8d7_response_times

That's with both xdebug and xhprof disabled, just throwing the output of microtime into a csv.

So there's clearly some xhprof overhead we're seeing, but it goes both ways. The difference is still rather staggering.

Owen Barton’s picture

We could also compare Drupal 6 + views with an empty node view, to Drupal 7 front page, which would allow us to quantify more of the non views related changes (of course views is changed also in D7, but I think the performance profile is probably still pretty similar).

I have also used the login form as a comparative benchmark in the past - it does a bit more work than a 404.

larowlan’s picture

See https://github.com/symfony/symfony/pull/8081, which adds a 4% performance gain in the class loader.

dcam’s picture

http://drupal.org/node/1427826 contains instructions for updating the issue summary with the summary template.

dcam’s picture

Issue summary: View changes

Added the 'menu links as entities' issue.

geoffreyr’s picture

Issue summary: View changes

Added #2002094 to performance improvements

geoffreyr’s picture

Issue summary: View changes

Added #2002104 to performance improvements

geoffreyr’s picture

Issue summary: View changes

Added #2002108 to performance improvements

geoffreyr’s picture

Issue summary: View changes

Added #2002222 to perf improvements

yannisc’s picture

I did some benchmarking yesterday comparing Drupal 7.22 with D8 dev. You can see more details here: http://www.netstudio.gr/en/blog/early-drupal-7-vs-drupal-8-performance-c....

In fact, I found this issue through a comment on the above blog post.

Pancho’s picture

Added #2029075: Configuration translation step in the installation takes a reeeeaaallly long time when installing in a non-English language to the issue summary. For non-English installs, this might be the most noticeable performance regression at all.

Pancho’s picture

Issue summary: View changes

Updated issue summary.

fgm’s picture

FWIW, on my machine (PHP 5.3, APC 3.1.9), I had

  • 61ms for 7.x right after standard install, no caching
  • 7ms for 7.x after enabling the page cache
  • 132ms for D8 right after standard install, no caching
  • 42ms for D8 with the page cache and the normal classloader
  • 33ms for D8 with the page cache and the APC classloader

Pancho’s picture

fgm's metrics really look devastating.

But if we don't fear the results of a realistic comparison, we really should define a number of representative configurations (e.g., first impression, basic site, typical site, feature-rich site, data-intensive site) plus a few targets plus two or three concurrency levels, and then start profiling all of these both continuously and automatically.

If we don't define fair profiling configurations ourselves, slightly simplistic comparisons that don't take D7 contrib into account, like the ones by Yannis or fgm, or by completely inexperienced people, will make performance parity an impossible goal, and might finally hurt our reputation regarding performance.
Even more now that Symfony2 hit the headlines for being an exceptionally slow framework. Would be nice to demonstrate that we're selectively leveraging "the best" from different PHP frameworks and are not bound to be even slower.

In the end, I'd really like to see a graph that nicely displays how performance improves from week to week, and in a few cases the D8 configuration would outperform the D7 one, in others it would stay behind, but altogether it would remain comparable. That should be our goal.

Eronarn’s picture

I'm currently working on automating Drupal 8 builds via Chef and Vagrant, and I should be done with that today or tomorrow. At that point I'll be building out a basket of representative D7 vs. D8 performance test sites over the coming weeks. My current targets include:

  • Stock install: Out of the box D8 vs. D7, no content on the site.
  • Brochure site: Mostly static D8 vs. D7+now-in-core modules. Basic Views, Panels, and content types.
  • Custom site: Larger, cache-backed site with some roles, configuration, custom blocks, etc.
  • Dynamic site: The above, but with dynamic content/conditions driven by simulated user registration, content creation, batch runs of VBOs, etc. Behind Varnish, and ideally leveraging ESI support (not sure how far along D8 is with this, though).

It'd be cool to get feedback about how to structure the environments, what modules to base them off of, what tests to run, and so on.

Pancho’s picture

@Eronarn:
That really sounds awesome!
Out of the blue, I can't exactly say which configurations would be the most relevant and correct, but we should have at least one multilingual configuration that extensively uses Entity translation, i18n and all the additional stuff we don't need anymore in D8.

Generally, we should leverage some of the more popular contrib modules that have been included into D8 core or which aren't necessary anymore. Instantly, these come into my mind:
WYSIWYG + CKEditor, Date module, Entity API, Entity reference, Entity Translation, Views, Profile2, Context, Administration Menu, Diff, RESTful API... what else?

[edit:] Removed Profile2 from the list - it's yet to be included: #1668292: Move simplified Profile2 module into core

pounard’s picture

#31

what else?

George Clooney.

Eronarn’s picture

#31

Thanks for the reminder about translation. That definitely wasn't on my radar, but is an important consideration. I will include a frontend performance monitoring component of this, so it should also be feasible to monitor node editing performance, including WYSIWYG.

Has anyone heard of Drush Make being ported to D8? Drush itself is fine, but the latter doesn't seem very functional right now. I could just tarball an entire site, but it'd be nice to have something more easily versioned that I can point people to.

Owen Barton’s picture

It would be great to have a benchmark target (or targets) that could be used for different benchmarking and instrumentation activities. I think the "Dynamic site" is probably the biggest win, since it is the kind of site that causes most scaling challenges (in addition to just page load performance). Brochure type sites rarely have scaling challenges in my experience (although .

Given the rate of D8 development, I was assuming a script to configure the content structure and populate dummy content is pretty much a requirement - I doubt a database snapshot will last for long before schema changes break it. Not sure I understand using drush make with D8 yet - are there sufficient stable & API chasing contrib modules to make this worthwhile?

Eronarn’s picture

I'd prefer scripting using Drush Make plus some post-processing setup script leveraging Drush because that means the build is more standardized and easier to contribute to. It's not a requirement by any means, just intended to make it easier for people other than me to contribute to the build (pretty annoying to do if it's a huge git tree with all of core in it). If anyone has other suggestions, I'm totally open.

I don't think there are many ported contrib D8 modules yet (it's a pretty miserable process - I did this for Tracelytics/TraceView for DrupalCon and it already needs extensive rewrites). However, I want this to be something that will be run over the course of several months (probably will start off with the alpha releases but maybe move to nightlies if enough people are interested in setting up nodes), and I'm hopeful that we'll see more D8 contrib alphas and betas by then.

EDIT: Note that drush already works with cron, devel generate, etc. in D8. So that part of the scripting will be trivial.

Owen Barton’s picture

Awesome - totally agree that a scripted setup is what we need - probably devel generate will need some love. I think the make file will pretty much just be 2 lines that point at core (for D8 anyway, at least to start with), but it will do the job just fine :)

I wonder if it would be best to split performance test targets out as a separate issue (if there isn't an existing one), since this is supposed to be meta.

ParisLiakos’s picture

hello! PHP 5.5 / Apache 2.4.4 here

Fresh install d8 and d7: (logged in)

Overall Summary                    D7                  D8
Total Incl. Wall Time (microsec):  132,380 microsecs   256,260 microsecs
Total Incl. CPU (microsecs):       128,000 microsecs   248,000 microsecs
Total Incl. MemUse (bytes):        15,651,360 bytes    11,777,208 bytes
Total Incl. PeakMemUse (bytes):    15,821,408 bytes    11,790,360 bytes
Number of Function Calls:          8,868               56,956

Like the memory usage?:)
(wasnt able to disable xdebug - had some nice segfaults once i did:P buggy 2.4.4 still so this should affect stuff)

I just did this for fun, mostly to check php5.5 and zend opcode cache..i found it interesting, so i posted it. i know that we cant actually compare vanilla d8 and d7 and also tests should be run with some content.
i am only posting it cause i found the memory usage interesting (which means that oop and autoloading stuff seems to work)

quicksketch’s picture

Apache 2.4.4 here

With Apache 2.4.4, I'm guessing that the req/s you're seeing here isn't Drupal delivering the page, it's the built-in cache of Apache 2.4 (similar to nginx or Varnish). I know different servers will get different results, but there's no way Drupal/PHP/MySQL is going to deliver 7800 req/s, even with Drupal's page cache.

ParisLiakos’s picture

edit: i removed ab results, apache obviously lies and it makes no sense anyway. just left the memory usage result, which was the only point i wanted to make:)

catch’s picture

If doing profiling, it's best to enable the APC ClassLoader in settings.php first - that's a known performance regression of dozens if not hundreds of milliseconds with an existing workaround.

fgm’s picture

FWIW, I did various measurements for my presentation at DevDays Dublin, and here are the differences for an anonymous home (10k hits/ concurrency 10):

Default classloader:

Time taken for tests: 46.318 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Requests per second: 215.90 [#/sec] (mean)
Time per request: 46.318 [ms] (mean)
Time per request: 4.632 [ms] (mean, across all concurrent requests)
Transfer rate: 1689.04 [Kbytes/sec] received

APC Classloader:

Time taken for tests: 32.602 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Requests per second: 306.73 [#/sec] (mean)
Time per request: 32.602 [ms] (mean)
Time per request: 3.260 [ms] (mean, across all concurrent requests)
Transfer rate: 2399.61 [Kbytes/sec] received

So this does indeed mean a 14ms (30%) speedup over the default classloader.
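
For reference, the speedup can be recomputed from the two ab runs like this. A Python sketch using only the figures quoted above; the helper names are illustrative, and the formula for the first "Time per request" line (concurrency times total time divided by requests) is how ab derives that number.

```python
# Figures copied from the two ab runs above (10,000 requests, concurrency 10).
REQUESTS = 10_000
CONCURRENCY = 10
DEFAULT_TOTAL_S = 46.318   # default classloader
APC_TOTAL_S = 32.602       # APC classloader


def mean_time_per_request_ms(total_s, requests, concurrency):
    # ab's first "Time per request" line: concurrency * total time / requests.
    return concurrency * total_s * 1000.0 / requests


def requests_per_second(total_s, requests):
    return requests / total_s


default_ms = mean_time_per_request_ms(DEFAULT_TOTAL_S, REQUESTS, CONCURRENCY)
apc_ms = mean_time_per_request_ms(APC_TOTAL_S, REQUESTS, CONCURRENCY)
saved_ms = default_ms - apc_ms
saved_pct = saved_ms / default_ms * 100.0

print(f"default: {default_ms:.3f} ms, APC: {apc_ms:.3f} ms")
print(f"saved: {saved_ms:.1f} ms ({saved_pct:.1f}% of the default time)")
print(f"req/s: {requests_per_second(DEFAULT_TOTAL_S, REQUESTS):.2f} -> "
      f"{requests_per_second(APC_TOTAL_S, REQUESTS):.2f}")
```

The percentage here is time saved relative to the default classloader; expressed as throughput, the same two runs show a larger relative gain, so it helps to say which of the two is being quoted.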

giorgio79’s picture

Instead of, or in addition to APC, perhaps Zend Opcache should be tested, as the php team included opcache instead of APC into php core (meaning no one will use APC after 5.5 :D )
https://blogs.oracle.com/opal/entry/using_php_5_5_s
I already installed php-opcache easily with yum for example for php 5.3.

fgm’s picture

Indeed on PHP 5.5 this is a must. If I understand correctly, there might not even be an official release of APC for 5.5, while OpCache is provided as a default, something APC never was.

catch’s picture

The APC classloader uses the user cache provided by APC. This is completely different to opcode caching and Zend OpCache has no equivalent.

If ZendOpcache is used as an opcode cache, you can't also have APC installed to use the user cache, but there's the new APCu fork which just provides the user cache and is compatible.
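
To illustrate the distinction, here is a rough Python sketch of what a user-cache-backed class loader does. It is not the Symfony or Drupal implementation: `PrefixLoader`, `CachedLoader`, and the plain dict standing in for APC's shared user cache are all hypothetical. The point is that the cache stores the result of the class-name-to-file lookup, which is a separate concern from caching compiled opcodes.

```python
class PrefixLoader:
    """Hypothetical loader: resolves a class name to a file by trying prefixes."""

    def __init__(self, prefixes, existing_files):
        self.prefixes = prefixes
        self.existing_files = set(existing_files)
        self.lookups = 0  # how often the expensive search actually ran

    def find_file(self, class_name):
        self.lookups += 1
        for prefix, base_dir in self.prefixes.items():
            if class_name.startswith(prefix):
                relative = class_name[len(prefix):].replace("\\", "/") + ".php"
                candidate = base_dir + relative
                if candidate in self.existing_files:
                    return candidate
        return None


class CachedLoader:
    """Wraps a loader and remembers lookups in a shared user cache."""

    def __init__(self, inner, user_cache):
        self.inner = inner
        self.user_cache = user_cache  # stand-in for a cache shared across requests

    def find_file(self, class_name):
        if class_name not in self.user_cache:
            self.user_cache[class_name] = self.inner.find_file(class_name)
        return self.user_cache[class_name]


inner = PrefixLoader(
    {"Drupal\\Core\\": "core/lib/Drupal/Core/"},
    ["core/lib/Drupal/Core/Cache/CacheBackendInterface.php"],
)
shared_cache = {}
class_name = "Drupal\\Core\\Cache\\CacheBackendInterface"

# Two "requests" that share the same user cache.
first = CachedLoader(inner, shared_cache).find_file(class_name)
second = CachedLoader(inner, shared_cache).find_file(class_name)

print(first, inner.lookups)
```

The second request never reaches the prefix search, which is the saving the APC classloader is after. An opcode cache would still help separately once the file path is known.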

fgm’s picture

Interesting to have a fork reducing functionality, that's not so common. It is available on PECL:
- http://pecl.php.net/package/APCu

...and the code is on Github:
- https://github.com/krakjoe/apcu

ISTR seeing on php-internals discussion about how the APC allocator was specifically optimized for its opcode caching tasks, and was not a good fit for user caching because of the difference in cache access patterns. The currently existing commits do not appear to have changed the logic, focusing on the removal of opcode-related features and general cleanup.

Damien Tournoud’s picture

Also see #2023325-15: Add interface for classloader so it can be cleanly swapped for my design of a class loader that is not stupidly inefficient and supports namespace introspection.

fgm’s picture

@Damien Tournoud: the stream wrapper idea is nice in theory, but implementing one means lots of low-level methods to implement, although most of them are individually rather simple. But the very fact that they are needed suggests lots of inter-method (hence user-space) calls.

This is a much more involved interface than a class loader, and I do not see what other parts of our code base could make use of this particular name space to justify the extra code involved. Of course, as always, this would need to be benchmarked against the alternatives.

Damien Tournoud’s picture

@fgm: you are talking about the cold path. The hot path has zero userspace intervention, which is the whole point.

msonnabaum’s picture

Issue summary: View changes

Removing 1187726, beejeebus said it's no longer relevant.

msonnabaum’s picture

Issue summary: View changes

Removed 2033501 since that issue isn't directly related to a performance regression and the benefit is questionable.

msonnabaum’s picture

In the interest of keeping this issue focussed so we can work from it, I went ahead and removed some issues from the summary that were either no longer relevant or not directly related to a performance regression in d8.

msonnabaum’s picture

Issue summary: View changes

Removing 1800286 since it's not a D8 regression.

catch’s picture

Issue summary: View changes

Updated issue summary.

catch’s picture

Issue summary: View changes

Updated issue summary

xjm’s picture

Issue tags: +Prague Hard Problems

xjm’s picture

Issue summary: View changes

Updated issue summary.

catch’s picture

Issue summary: View changes

Updated issue summary.

catch’s picture

Issue summary: View changes

Updated issue summary.

catch’s picture

Issue summary: View changes

Adding another issue.

moshe weitzman’s picture

FYI, a small D8 Performance team has started meeting weekly to make progress on these issues. We use this Google Doc to help us prioritize, assign, etc. You can see issues that we recently fixed in that doc. If anyone wants to join the meeting, please contact me.

larowlan’s picture

@moshe, I'd like to be involved, timezones permitting

xjm’s picture

Issue summary: View changes

Alan D.’s picture

A more recent performance comparison using Drupal 8 alpha10. This includes a D7 standard install that is running "breakpoints, ctools, ckeditor, contact, edit, libraries, entity, views, views_ui" modules for a more accurate comparison.

http://www.appnovation.com/blog/drupal-7-vs-drupal-8-performance-comparison

[edit]
I have no affiliation with that blog. Simply running tests like this on a laptop would give inaccurate results! Please feel free to point people to better benchmarks. :)

Berdir’s picture

- Those numbers claim that non-cached pages as anonymous user are slower than they are for an admin, that seems weird? If that is really the case then that sounds like there might be a bug.
- Did you actually replace the frontpage with a view in 7.x? Just enabling views isn't enough, if you want to do a fair comparison, you will also need to have a view on the frontpage instead of the much faster 7.x node listing.
- Did you have any content on that site? I think that with the render cache now enabled for 8.x this might give 8.x a chance to catch up, especially when #2099131: Use #pre_render pattern for entity render caching is done. Will only help when the page cache is disabled of course.
- For the page cache case, I started two issues in Szeged that allow returning from the page cache without having to load configuration: #2228215: Remove module check in DrupalKernel and #1576322: page_cache_without_database doesn't return cached pages without the database.

Berdir’s picture

Also, testing with concurrency is problematic, because it depends heavily on how many CPUs and so on you have and how many requests you can do in parallel; you're testing your system, not Drupal. I suggest you don't do parallel requests, as that will give you a much better idea of how long it takes to return one request.

pounard’s picture

Those numbers claim that non-cached pages as anonymous user are slower than they are for an admin, that seems weird? If that is really the case then that sounds like there might be a bug.

It does not surprise me, it's already what's happening in Drupal 7 in a lot of sites I had to profile, because admin users actually won't trigger a lot of access checks that anonymous or normal users would.

fgm’s picture

Also, testing with concurrency is important because it can make locking/contention problems visible where a single-user test will never exhibit them, and actual sites always run with concurrency.

catch’s picture

You're very unlikely to get a locking/concurrency issue with a read only test though. Needs authenticated users, posting of forms etc. to create the situations that can result in actual scaling issues. Also sometimes larger data sets etc. Can't use ab for that.

catch’s picture

I've started bumping some individual performance issues to critical. I think we need some kind of criteria to define 'critical performance issue'.

Not differentiating between front end and PHP here for now, and as with any issue we need to balance impact vs. disruption.

Unless something is atomic-level in that it prevents actual usage of the site due to slowness or memory requirements, it wouldn't normally be critical in itself, however Drupal 8 performance is currently significantly slower than Drupal 7, and this is due to lots of major issues that combined are covered by this meta.

We know from various sources that 100ms affects user perception and we also know from experience that Drupal core rarely has individual issues that result in 100ms of saving just by themselves. Therefore we need to ensure that enough 'major' issues are fixed prior to 8.0.0 and/or can be fixed soon afterwards, otherwise this meta stays open indefinitely with no definite end point.

I think we should consider promoting a performance issue to critical in its own right only if the following is true:

  • There is a concrete performance issue identified by profiling (MySQL, PHP or browser equivalent) and a viable plan to resolve it
  • It can't be committed to a patch-level version
  • Over ~100ms or more savings with cold caches and would have to be deferred to a minor version
  • Over ~10ms savings with warm caches and would have to be deferred to 9.x
  • Over ~1ms or more with the internal page cache and would have to be deferred to 9.x

This ensures we don't end up with known, unresolvable, high-impact performance issues for the entire 8.x cycle, and focuses attention on those vs. ones that could be fixed easily in an early patch release.

Closing this issue should probably still be based on some kind of comparison against 7.x. Should encompass most or all of rebuilds/module install, cold caches, warm caches, authenticated, anonymous, light and heavy pages and the internal page cache. We don't need to be equivalent everywhere but we should know what the status is otherwise it's impossible to know which critical performance issues might be lurking.

moshe weitzman’s picture

Those sound like pretty strong criteria to me. There is always room for judgement, since some critical problems are only evident with lots of Contrib modules, huge menus, etc.

catch’s picture

Yes that's missing from #65, we should definitely add "gets measurably worse with lots of contrib modules or large data sets and would have to be deferred to a minor version" as an extra bullet point. Any non-indexed query (except under /admin perhaps) would fall into that for example.

moshe weitzman’s picture

I posted a flamegraph of user/password to #2370667: [Meta] user/password flame graph. Hopefully it will shed some light on potential speed-ups.

effulgentsia’s picture

Issue tags: +Performance
peterx’s picture

Re the mentions of contributed modules.

A big performance leap forward over D7 would be examples and tutorials explaining best practices. D5, 6, and 7 had some excellent modules developed with best practices from the start, but the word did not spread and many D7 modules, even recent ones, present real problems.

The documentation would not have to be big, just point to examples in some of the optional core modules. The example code could contain references on the doc system to generate links into the current part of code. Can the comments have a #performance-type tag, e.g. @performance?

I am happy to put the odd day here and there writing docs but the maintenance would be a real pain without something to tie the documentation and example together.

xjm’s picture

Issue summary: View changes

Added @catch's recommended criteria for individually critical issues to the summary (based on #65 - #67).

chx’s picture

It comes up very often so I asked on Stackoverflow: what do we know about opcache vs class size?

You should check opcache.max_file_size option. This option can set a maximum file size to cache. Thus, big files can be skipped by opcode cacher. However, it defaults to 0, meaning all files will be cached.

Next option to check is opcache.max_accelerated_files. For big projects with Twig and annotations the default value of 2000 is not enough. Consider increasing it.

And the last one is opcache.memory_consumption. I noticed that after reaching this limit, opcache won't add new items into the cache. So, increase it to 256M or 512M.
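A minimal sketch of the three checks from that answer, for anyone who wants to sanity-check a server. The function name and arguments are hypothetical, the memory value is assumed to be in megabytes, and the thresholds are just the ones quoted above rather than tuned recommendations.

```python
def opcache_warnings(max_file_size, max_accelerated_files, memory_consumption_mb):
    """Return a list of warnings for the three opcache settings discussed above."""
    warnings = []
    # 0 means "no limit"; any positive value makes opcache skip larger files.
    if max_file_size > 0:
        warnings.append(
            f"opcache.max_file_size={max_file_size}: larger files are not cached"
        )
    # The quoted answer considers 2000 too low for big projects.
    if max_accelerated_files <= 2000:
        warnings.append(
            f"opcache.max_accelerated_files={max_accelerated_files}: consider increasing it"
        )
    # The quoted answer suggests 256M or 512M.
    if memory_consumption_mb < 256:
        warnings.append(
            f"opcache.memory_consumption={memory_consumption_mb}M: consider 256M or 512M"
        )
    return warnings


# Hypothetical example values, not read from a real php.ini.
for line in opcache_warnings(0, 2000, 64):
    print(line)
```

The values would have to be read from the actual PHP configuration; that part is left out here.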

xjm’s picture

Issue tags: +Triaged D8 critical
dawehner’s picture

Issue summary: View changes

.

joelpittet’s picture

This could have some nice improvements in D6, D7 and D8. #1443308: Add static cache to module_load_include() It's not a regression, but this seemed like the best meta to put this in, if another meta is better please point me.

kentr’s picture

Is the Drupal 8 Performance Issues spreadsheet still the reference list for performance issues?

It shows #1964922: When building a route, store the regexp as "Not yet started".

Alan D.’s picture

Starter if someone with edit rights wants to have a look.

From a single pass of all non-fixed lines

Line 14
#2296527: Improve Composer performance is marked as duplicate of #1818628: Use Composer's optimized ClassLoader for Core/Component classes (needs review)
Line 28
#1964922: When building a route, store the regexp (fixed)
Line 42
#2232609: Cacheable breadcrumbs block, and fix breadcrumb builders (needs work)
Line 76
#2429617: Make D8 2x as fast: Dynamic Page Cache: context-dependent page caching (for *all* users!) (needs work)
Line 82
#2429257: Bubble cache contexts (fixed)
Line 83
#2099137: Entity/field access and node grants not taken into account with core cache contexts (needs work)
Line 84
#2396333: BlockContentBlock ignores cache contexts required by the block_content entity (needs work)
Line 88
#2433591: Views using pagers should specify a cache context is duplicate of #2433599: Ensure every (non-views) pager automatically associates a matching cache context
Line 89
#2429617: Make D8 2x as fast: Dynamic Page Cache: context-dependent page caching (for *all* users!) (needs work)
Line 91
#2432837: Make cache contexts hierarchical (e.g. 'user' is more specific than 'user.roles') (needs review)
Line 109
#1867518: Leverage entityDisplay to provide fast rendering for fields (in progress)
Line 111
#941970: Views rebuilds the menu more than it needs to (in progress)
Line 121
#2381217: Views should set cache tags on its render arrays, and bubble the output's cache tags to the cache items written to the Views output cache (in progress)
Line 127
#2381277: Make Views use render caching and remove Views' own "output caching" (flagged postponed but blocker fixed)
Line 128
#2429257: Bubble cache contexts (fixed)
Line 133
#2217985: Replace the custom menu caching strategy in Toolbar with Core's standard caching. (listed as postponed but blocked on #1805054: Cache localized, access filtered, URL resolved, and rendered menu trees)
Line 143
#2248897: Fix AliasManager and AliasManagerCacheDecorator (in progress)
Line 168
#1014086: Race conditions, stampedes and cold cache performance issues with css/js aggregation (in progress)
Line 169
#886488: Add stampede protection for css and js aggregation (postponed by #1014086: Race conditions, stampedes and cold cache performance issues with css/js aggregation)
Line 196
#956186: Allow AJAX to use GET requests (in progress)
Line 199
#1945262: Replace custom weights with dependencies in library declarations; introduce "before" and "after" for conditional ordering (in progress)
Line 210
#1905334: Only load all modules when a hook gets invoked (in progress)
Line 212
#1762204: Introduce Assetic compatibility layer for core's internal handling of assets (postponed due to "Apparently we're marking all 8.1.x issues postponed for now, so doing that here too")
Line 217
#1597696: Consider whether HttpCache offers any significant benefit over the existing page cache (postponed, maintainer needs more info)

moshe weitzman’s picture

Made all those edits. Thanks Alan D.

webchick’s picture

Issue tags: +D8 Accelerate Dev Days

Tentatively tagging for the Performance sprint at Dev Days coming up.

webchick’s picture

Tagging, as there's no clear "next steps" here.

peterx’s picture

One test not mentioned here is of the cache tagging system. What happens when the cache system searches the cache tag strings for a tag? Would it be better to have a separate tag index? A code sprint day could also teach performance measurement.

Measure. Coffee. Code. Measure. Repeat.

webchick’s picture

So of all the criticals, this one seems the most vague and least clear on what to do next and when we are done.

In the issue summary, we have the following that are still outstanding:

* #1805054: Cache localized, access filtered, URL resolved, and rendered menu trees (critical; actively being worked on)
* #1880766: Preload configuration objects (last comment from @alexpott in March asks "Is this still something we want to pursue given #2248767: Use fast, local cache back-end (APCu, if available) for low-write caches (bootstrap, discovery, and config)?")
* #2354889: Make block context faster by removing onBlock event and replace it with loading from a ContextManager (last activity a couple of weeks ago)
* It also mentions issues with the D8 cacheability tag, for which we have the critical meta #2429287: [meta] Finalize the cache contexts API & DX/usage, enable a leap forward in performance.
* It also mentions issues with the Performance tag, which we have been using to tag critical regressions such as #2263569: Bypass form caching by default for forms using #ajax..

So... given we're tracking critical issues related to cacheability, performance regressions, etc. already, I'm a bit confused on whether we still need this issue, and if so, what is the path to get it to fixed?

Wim Leers’s picture

So, just before DrupalCon LA, i.e. last Friday, I talked to @catch about closing #2470679: [meta] Identify necessary performance optimizations for common profiling scenarios, and merging it with this issue. He agreed with doing that. I just haven't gotten to it yet. Once I've moved all those issues over, I think this issue becomes significantly more actionable.

That being said, yes, this issue is definitely *very* meta, because we don't have concrete performance targets. I.e. we don't have a target of e.g. 50 ms for the front page as an authenticated user, 150 ms for all admin pages, etc. If we'd have such concrete targets, this would be less "meta". But we've never set such targets, so I'm not sure if we'd want to start doing that now.

catch’s picture

#88 is pretty accurate on the status here I think.

This issue comes down to what is acceptable performance for 8.0.x to ship with, and that hasn't really been defined.

Obviously anything the same or better than 7.x (or 6.x) we'd ship with.

In practice more or less everything is going to be slower.

Then it comes down to:

1. How much slower?

2. Are there any mitigating factors?

For example if internal page caching was 30% slower, but had a 10000% higher hit rate in a realistic load test scenario due to cache tags and cache_clear_all() nukage then that trade-off is pretty good. Or if 8.0.x is faster on PHP7 than 7.x is on PHP 5.4 on the same hardware, that also helps.
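To make that trade-off concrete, here is a small worked calculation. Only the "30% slower" and "10000% higher hit rate" figures come from the example above; the baseline hit rate, hit cost and miss cost are invented for illustration and are not measurements.

```python
def mean_response_ms(hit_rate, hit_ms, miss_ms):
    """Expected time per request for a page cache with the given hit rate."""
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


# Hypothetical 7.x baseline: none of these numbers are measurements.
D7_HIT_RATE = 0.005   # 0.5% of requests served from the page cache
D7_HIT_MS = 4.0       # cost of a page cache hit
MISS_MS = 250.0       # cost of a miss, assumed equal in both versions

# The trade-off from the comment: hits 30% slower, hit rate 10000% higher.
D8_HIT_RATE = D7_HIT_RATE * (1 + 10000 / 100)   # 101x the baseline hit rate
D8_HIT_MS = D7_HIT_MS * 1.30

d7 = mean_response_ms(D7_HIT_RATE, D7_HIT_MS, MISS_MS)
d8 = mean_response_ms(D8_HIT_RATE, D8_HIT_MS, MISS_MS)
print(f"7.x: {d7:.1f} ms per request, 8.x: {d8:.1f} ms per request")
```

With inputs like these the higher hit rate dominates the slower hit. If the hit rates were equal, the 30% penalty would be a pure loss, which is why the comparison needs a realistic load test rather than a single-page benchmark.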

The outstanding issues in #87 (and X number of majors), once resolved, get us to an 8.x baseline for known improvements relative to current (and past) 8.0.x.

#2470679: [meta] Identify necessary performance optimizations for common profiling scenarios (to be folded into this issue) tells us if there are further feasible optimizations to lower that baseline more (known unknowns :P). My profiling in that issue and more recently in #2488538: Add SafeMarkup::remove() to free memory from marked strings when they're printed suggests there's still some room to bring things down within the constraints of the beta, although not loads sadly.

A useful move towards answering what's acceptable would be comparing similar pages between 7.x and 8.x again. I only know of http://wimleers.com/blog/drupal-8-page-caching-enabled-by-default recently and that wasn't apples to apples.

Suggested pages to look at:

Warm cache:
/node/1
/node (with Views in both versions)
/admin/reports/status/php
/node/1 as anonymous with page caching enabled

Cold cache:
/node/1
/node with views in both versions
/admin/people/permissions

Then depending on the results of that, we might get a nice surprise, we might want to bump #2429617: Make D8 2x as fast: Dynamic Page Cache: context-dependent page caching (for *all* users!) to critical, and/or we might want to persevere with more profiling/optimizations to try to get things closer.

But the honest appraisal of where we stand on that comparison and narrowing any gaps as far as is feasible is what this issue was opened for and IMO it's still valid for that.

geerlingguy’s picture

To get some benchmarks, I fired up two instances of Drupal VM with identical PHP/Apache/MySQL specs (PHP 5.5 with opcache on, standard defaults as defined in Drupal VM's config), and compared Drupal 7 (7.37) vs Drupal 8 (HEAD).

Drupal Site Setup steps

  1. Install Standard install profile, adding drush/drush_generate for D7/D8 and views/ctools for D7.
  2. drush generate-content 100 to generate some content
  3. (D7 only) Enabled default 'Front page' view that comes with Views and set path to /node to override core /node page.
  4. (D8 only) Made sure default 'Front page' view that comes with Standard profile has path set to /node.

Other Notes:

  • drush cc all (D7) or drush cr (D8) was run before each cold cache single-request test.
  • Each test was an average of three test runs, with a clean environment each time, and with one preceding test run discarded.
  • In no case was standard deviation greater than 1-2% (so these numbers are pretty solid).
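A minimal sketch of how a set of runs could be summarised along the lines of the notes above: discard the first run, average the rest, and report the standard deviation relative to the mean. The sample values are made up, and using the sample standard deviation is an assumption, since the notes don't say which variant was used.

```python
from statistics import mean, stdev


def summarize(runs):
    """Drop the first (warm-up) run, then return (mean, relative std dev)."""
    kept = runs[1:]
    avg = mean(kept)
    # Sample standard deviation as a fraction of the mean.
    return avg, stdev(kept) / avg


# Hypothetical req/s results: one discarded run followed by three kept runs.
avg, rel = summarize([300.0, 262.0, 259.0, 265.0])
print(f"{avg:.0f} req/s, relative std dev {rel:.1%}")
```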

Drupal 7

Warmed cache, anonymous user (unless otherwise noted), page caching turned on (using ab -n 500 -c 1 http://drupaltest.dev/xyz):

Page / Scenario | Result
/node/1 | 262 req/s
/node (as a View overriding core's /node path) | 244 req/s
/admin/reports/status/php (authenticated) | 268 req/s
/node/1 (authenticated) | 258 req/s

Cold cache, single request, anonymous user (unless otherwise noted), page caching turned on (using ab -n 1 -c 1 http://drupaltest.dev/node/1):

Page / Scenario | Result
/node/1 | 0.250s (4.00 req/s)
/node (as a View overriding core's /node path) | 0.395s (2.53 req/s)
/admin/people/permissions (authenticated) | 0.204s (4.91 req/s)

Drupal 8

Warmed cache, anonymous user (unless otherwise noted), page caching turned on (using ab -n 500 -c 1 http://drupaltest.dev/xyz):

Page / Scenario | Result
/node/1 | 161 req/s
/node (as a View (D8's built-in front page View)) | 131 req/s
/admin/reports/status/php (authenticated) | 162 req/s
/node/1 (authenticated) | 153 req/s

Cold cache, single request, anonymous user (unless otherwise noted), page caching turned on (using ab -n 1 -c 1 http://drupaltest.dev/node/1):

Page / Scenario | Result
/node/1 | 1.606s (0.62 req/s)
/node (as a View (D8's built-in front page View)) | 1.984s (0.50 req/s)
/admin/people/permissions (authenticated) | 1.128s (0.89 req/s)
Wim Leers’s picture

Thanks, awesome profiling work! Could you redo it with concurrency 1, that avoids testing how the web server is able to deal with concurrency? Thanks :)

geerlingguy’s picture

@Wim Leers - Will do; I'll rerun the tests in a bit, and update the comment above when that's done.

I've re-run all the tests using only concurrency=1, and the spread seems to be about the same as using -c 10. In general, it seems the numbers are fairly consistent (just more req/s) up to -c 40 or so... at least on this particular VM/configuration with Apache.

  1. Warmed cache: Drupal 7 is about 1.7x faster.
  2. Cold cache: Drupal 7 is about 6x faster.
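
For anyone who wants to check those two ratios, this short calculation recomputes them from the benchmark tables above. The numbers are copied from the tables; averaging the per-page ratios is just one reasonable way to summarise them and may not be how the figures were originally derived.

```python
from statistics import mean

# Warm-cache throughput in req/s as (Drupal 7, Drupal 8); higher is better.
warm = {
    "/node/1": (262, 161),
    "/node": (244, 131),
    "/admin/reports/status/php (authenticated)": (268, 162),
    "/node/1 (authenticated)": (258, 153),
}

# Cold-cache single-request time in seconds as (Drupal 7, Drupal 8); lower is better.
cold = {
    "/node/1": (0.250, 1.606),
    "/node": (0.395, 1.984),
    "/admin/people/permissions (authenticated)": (0.204, 1.128),
}

# How many times faster Drupal 7 is, averaged over the pages in each table.
warm_ratio = mean(d7 / d8 for d7, d8 in warm.values())
cold_ratio = mean(d8 / d7 for d7, d8 in cold.values())

print(f"warm cache: Drupal 7 is {warm_ratio:.2f}x faster")
print(f"cold cache: Drupal 7 is {cold_ratio:.2f}x faster")
```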
dawehner’s picture

With #2381277: Make Views use render caching and remove Views' own "output caching" moving forward, /node could actually be much, much faster than before, just as an example.

Fabianx’s picture

I think overall this is one of the last criticals we close before going to RC (and that is exactly how this ticket was planned).

The questions to ask are (where now means the current time point):

- Is performance overall acceptable now?
- Is scalability acceptable now?
- Is performance of the page cache acceptable now?
- Is the number of DB queries done on fully cached pages acceptable now?
- Is the number of DB queries done on cold-cache pages acceptable now?

And newly:

- Is cache invalidation performance acceptable now? (Yes, having cache tags means also we have new problems in overhead ...)

==

The more often we do this before RC, the better, obviously. However, once we don't have any other criticals left, we really need to take another very good look and decide.

We also need to see if there is anything left that is not able to go into 8.0.x or 8.1.x, but which is crucial for performance for years to come.

This is why this ticket is kinda a 'placeholder' of:

- Let's not forget to check performance before it's too late
- Let's not check memory requirements one day-before-release (TM)

---

To the benchmarks:

I would love to see some authenticated warm-cache user benchmarks for /node, too. Especially now that we have views row cache in core.

peterx’s picture

Is anyone looking at storing the cache tags as discrete items to avoid the string scans of the tag columns?

Fabianx’s picture

#95: Not sure what you mean with discrete items?

Cache Tags are calculated as a checksum of tag strings, so in essence a key-value table.
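
A simplified sketch of that idea, to show why no string scan of tag columns is needed. The class and method names are hypothetical and this is not Drupal's actual API: each tag has an invalidation counter in a key-value store, a cache item records the sum of its tags' counters when it is written, and it is treated as stale once that sum no longer matches.

```python
class TagChecksumCache:
    """Toy cache where tag invalidation only touches per-tag counters."""

    def __init__(self):
        self.invalidations = {}  # tag -> invalidation counter (the key-value table)
        self.items = {}          # cache ID -> (data, tags, checksum at write time)

    def checksum(self, tags):
        # Sum of the current invalidation counters for the given tags.
        return sum(self.invalidations.get(tag, 0) for tag in tags)

    def set(self, cid, data, tags):
        tags = tuple(tags)
        self.items[cid] = (data, tags, self.checksum(tags))

    def get(self, cid):
        entry = self.items.get(cid)
        if entry is None:
            return None
        data, tags, stored = entry
        # Valid only if none of the item's tags were invalidated since the write.
        return data if stored == self.checksum(tags) else None

    def invalidate_tags(self, tags):
        # One counter bump per tag; cache items themselves are never scanned.
        for tag in tags:
            self.invalidations[tag] = self.invalidations.get(tag, 0) + 1


cache = TagChecksumCache()
cache.set("page:/node/1", "<html>node 1</html>", ["node:1", "user:1"])
cache.set("page:/node/2", "<html>node 2</html>", ["node:2"])
cache.invalidate_tags(["node:1"])
print(cache.get("page:/node/1"))  # stale after the invalidation
print(cache.get("page:/node/2"))  # unaffected
```

The cost moves to read time: every lookup has to fetch the counters for the item's tags, which is the invalidation overhead mentioned earlier in the thread.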

moshe weitzman’s picture

"So of all the criticals, this one seems the most vague and least clear on what to do next and when we are done."

webchick summed it up pretty well there. I'd like to share my point of view, having led a Performance team for the past 1.5 years (#55).

IMO, we should demote this issue to Major. This issue achieved its main goal, which was to decide on criteria for prioritizing a Perf improvement as Critical. Those criteria are in the IS. Based roughly on those criteria, we currently have 7 critical issues tagged with Performance. Those stand on their own merit as Criticals, and I see little benefit to additionally keeping this Meta as Critical.

There are occasional calls to keep this open as a general check for D7 versus D8 performance regression. IMO that's a misguided ideal:

  1. This issue was never about D7 => D8 comparison. It was about prioritization rules for potential speed improving issues
  2. Any comparison between the two platforms is arbitrary and subject to a tiresome bikeshed.
    1. We have also slowed the bootstrap and there is no low-hanging fruit available for making up that time. That was a significant take-away from Dev Days Montpellier. This is not fixable in D8.
    2. We have vastly improved Drupal's caching (see Wim+fabianx demo and presentation) such that complex pages are significantly faster in D8. I could see promoting the BigPipe issue to Critical. I'm ambivalent about that.
  3. We have worked 4.5 years on Drupal 8 performance and we should just ship it. Perf is a journey that will continue long after 8.0.0.
Berdir’s picture

That works for me. We can also add some fancy tags to revisit before RC but we'll do that anyway I think...

mikeytown2’s picture

If D8 ships without big pipe that would suck. Trading this critical for that one would work for me. Adding some tags for looking at the issue again before an RC sounds good.

catch’s picture

Every time I profile a page, I still find actionable critical performance issues, such as #2494987: [meta-6] Reduce cold cache memory requirements (at least some of the sub-issues should probably be independently critical too). So while this issue in itself is not that useful as a meta, we still have critical-but-undiagnosed performance issues in core.

I could see putting this somehow into the pre-RC checklist - since we won't have a final idea what things look like until we're there. But performance is so bad in many places that we risk getting stuck with unresolvable issues due to API changes necessary to fix them.

#2254865: toolbar_pre_render() runs on every page and is responsible for ~15ms/17000 function calls was bumped to critical today and is an API change - and has been blocked for over a year on other work. In other words, there's always going to be work left to do, but I remain concerned about work left that we'll be prevented from doing unless we get the issues diagnosed sooner rather than later.

webchick’s picture

We talked about this on the core committer call today. I might get some details wrong, but here's what I remember:

1) We still do need a critical task (whether it's this one or #2470679: [meta] Identify necessary performance optimizations for common profiling scenarios or whatever) to do a "deep-dive" profiling and figure out where D8 is slow, especially areas that would necessitate a BC break to fix. We (well, everyone but catch) approved a D8 Accelerate grant for catch to do this work, hopefully next week. This doesn't negate the findings of the DevDays sprint, but we've learned some things since then, and also BigPipe/SmartCache is further along.

2) To ensure we catch (heh) any further regressions that might be introduced between now and whenever D8 ships, catch is also going to update #2485119: [meta] The Drupal 8.0.0-rc1 Release Checklist with a set of profiling scenarios to run through prior to tagging RC1.

3) Once both of those are done, we should be able to file/elevate issues individually as critical where it makes sense, and close this meta out. Goal is to do that on or before June 17 (our next core committer call).

webchick’s picture

Also, since this issue will be marked fixed relatively soon (yay!) I moved the criteria for critical performance issues into the issue priority doc directly: https://www.drupal.org/node/45111/revisions/view/8434349/8511363

Wim Leers’s picture

I said this in #88, but still haven't gotten around to doing it (apologies!):

"So, just before DrupalCon LA, i.e. last Friday, I talked to @catch about closing #2470679: [meta] Identify necessary performance optimizations for common profiling scenarios, and merging it with this issue. He agreed with doing that. I just haven't gotten to it yet. Once I've moved all those issues over, I think this issue becomes significantly more actionable.

That being said, yes, this issue is definitely *very* meta, because we don't have concrete performance targets. I.e. we don't have a target of e.g. 50 ms for the front page as an authenticated user, 150 ms for all admin pages, etc. If we'd have such concrete targets, this would be less "meta". But we've never set such targets, so I'm not sure if we'd want to start doing that now."

Shall I still merge that other issue with this one, and migrate all child issues from there, to make this issue more actionable?

webchick’s picture

Sure, starting from there sounds great.

catch’s picture

Status: Active » Closed (duplicate)

I'm going to retire this issue in favour of #2470679: [meta] Identify necessary performance optimizations for common profiling scenarios which has more up-to-date information at the moment per #101.

Wim Leers’s picture

Good thing that I didn't merge it in the other direction yet :D