XHProf, a go to tool for profiling, actually impacts the profiling due to huge amount of function/method calls in Drupal 8. @msonnabaum suggested sampling as a lighter weight profiling technique. Initial results are promising. See the interactive SVG below, and annotated screenshot below that, where I list and circle potentially slow methods.
- The interactive flame graph
- Docs on how to read flame graphs. The gist is that you look for boxes that are wider than what you expect. Then you click on the box to more clearly see where the time is going, within that method. Hover over a box for sampling detail. Colors are not meaningful.
- Code for sampling and generating a flame graph
- This graph is taken from 1000 requests to the /user/password page of standard profile. I picked this page as it is relatively simple.
- Using PHP 5.4 with apc.stat=0 to reduce filesystem noise. Drupal is configured to use APC classloader.
Indeed a dead end, this takes the hit of opening the DB connection and initializing the module loader service.
service_container_prod::getStreamWrapperManagerService(). Perhaps a dead end as we take the inevitable hit of loading Module and DB classes here. Module loading can be moved to on-demand after .
service_container_prod::getRouterListenerService(). See .
Drupal\Core\Block\BlockBase::access()See . More specifically:
BlockBase::getConditionContexts()is the culprit. It should be statically cached, but isn't. It dispatches the
BlockEvents::CONDITION_CONTEXTevent for every block. Epic fail. But also a super easy fix! :) On the default front page, with XDebug and XHProf disabled, this takes 6 to 8 milliseconds. Statically caching would bring this down to ~4.5. Of course, with more blocks, on actual sites, the difference would be much bigger.
\Drupal\Core\Menu\MenuActiveTrail::getActiveTrailCacheKey(). Seems like an unoptimized query in
- Drupal\Core\Routing\RoutePreloader::loadNonAdminRoutes (necessary probably to avoid other stuff)
menu_local_tasksspends a lot of time on initializing necessary services and then even more time on actually getting the local tasks. Hopefully will bring salvation there.
- Last but far from least:
UserLoginBlock! This is taking up a pretty large amount of the time (>2%). About half of that (1%) is spent on generating a link and finding the alias for a URL. For that, we have . The other half (1%) is just the building of a form being expensive. If we could render cache the login block, this would go away… and thanks to we should be able to do that soonish.