Problem/Motivation
By @pwolanin:
In the block manager and other places in core we have to use truly horrible code like this to get the human-readable name of a module:
function getModuleName($module) {
// Gather module data.
if (!isset($this->moduleData)) {
$this->moduleData = system_get_info('module');
}
// If the module exists, return its human-readable name.
if (isset($this->moduleData[$module])) {
return $this->t($this->moduleData[$module]['name']);
}
// Otherwise, return the machine name.
return $module;
}
Why is that so horrible? Well it requires calling out to a procedural function AND system_get_info() calls system_rebuild_module_data() and none of this is cached even statically.
This was also pretty bad in Drupal 7, but in the worst case you could get the module info out of the database. In 8, none of that seems to be available.
By @Berdir:
system_get_info() calls right into system_rebuild_module_data(), which parses all .info.yml files. And that's slow, even with the apcu file cache that I have applied already. Without it, it's much worse.
We used to have some pretty complicated and arcane caching there, but we removed it, as we no longer needed that function during bootstrap.
However, we still call it on some places, like /admin/people, to group the permission list by module. On a default D8 installation, 50%/200ms is spent in that function on that page.
Proposed resolution
Cache the module data into the cache backend as well as statically in system_get_info() and clean the cache when we do a system_rebuild_module_data(). Stop loading system_rebuild_module_data()/system_get_info() if we can use the ModuleHandler::getName() method instead.
Remaining tasks
N/A
User interface changes
N/A
API changes
Not really an API change but in the current version of the patch ModuleHandler::getName()
now returns the module name as it had been input in case it can't find the actual name, instead of giving an error.
Beta phase evaluation
Issue category | Bug because significant slowdown on certain pages |
---|---|
Issue priority | Major because significant slowdown |
Prioritized changes | The main goal of this issue is performance |
Comment | File | Size | Author |
---|---|---|---|
#63 | 2281989-63.patch | 17.62 KB | stefan.r |
#63 | interdiff-59-63.txt | 877 bytes | stefan.r |
#59 | interdiff-52-59.txt | 1.42 KB | stefan.r |
#59 | 2281989-59.patch | 17.53 KB | stefan.r |
#56 | xhprof1.png | 92.01 KB | stefan.r |
Comments
Comment #1
YesCT CreditAttribution: YesCT commentedlinking to the homestretch as the parent.
Comment #2
dawehnerTo be honest this does not really belongs to the module handler ... the module handler not cares about the meta information
about modules ... on the other hand I don't know a really good service this belongs to
Related issue where similar issues got discussed: #2208429: Extension System, Part III: ExtensionList, ModuleExtensionList and ProfileExtensionList, #2228093: Modernize theme initialization
Comment #3
pwolanin CreditAttribution: pwolanin commentedstil todo I guess
Comment #4
Wim LeersComment #5
stefan.r CreditAttribution: stefan.r commentedI'll try to give this a go over the next days :)
Comment #6
pwolanin CreditAttribution: pwolanin commentedIn the dup we had - the following suggestion, but I'm not sure that's sufficient? Or is the human-readable name (& description?) the only thing we are looking for often (vs. e.g. requirements, dependencies, etc)
Proposed resolution
We already have the module list with filename and pathname in the container. Won't hurt much to put the module name in there as well, and then we can offer a method on ModuleHandler that returns it?
Comment #7
BerdirFine with closing the other as duplicate but then let's use the same category/priority/tag as that one had, a single function taking 50% of the wall time is a major performance bug IMHO
Comment #8
dawehnerI honestly think that we should keep the container as slim as possible and adding the module name for all of them for a really minor usecase is not a good thing IMHO.
Instead we should expose those meta information about all extensions as its own thing, if you ask me, see #2208429: Extension System, Part III: ExtensionList, ModuleExtensionList and ProfileExtensionList
Comment #9
pwolanin CreditAttribution: pwolanin commentedTalking to dawehner, there are 2 places roughly where we cache overlapping info
core/lib/Drupal/Core/DrupalKernel.php line 952 calls:
Where is where the
%container.modules%
data comes form that's set on the module handler. Sadly, in that location it seems not to be parsing the .info file, so we don't have the human-readable name.In addition, in core/modules/system/system.module line 1004:
core/lib/Drupal/Core/Extension/ThemeHandler.php line 422:
The use of state for these seems a little weird. Possibly that should be replaced or wrapped by implementing the service described in #2208429: Extension System, Part III: ExtensionList, ModuleExtensionList and ProfileExtensionList
Comment #10
BerdirHaving the extension handler would be great, but I think I would prefer a smaller fix that is easier to achieve as a first step.
I just noticed that ModuleHandler *already has* the API to get the module name: ModuleHandlerInterface::getName(). And it calls out to system_rebuild_module_data().
Just finding a better way to implement this and then switch to use that wherever we just need the name would already improve a lot of cases where system_get_info() is used.
Comment #11
BerdirDiscussed this with @dawehner, and managed to convince him that this is a useful first step. This takes 1.8s on my production site every time it is called, IMHO, that more than qualifies some sub-optimal small and local performance improvements instead of waiting for a full refactoring of the whole extension discovery/info system.
So..
- Find a way to make getName() faster. Whether we call out to state(), or pass in the module name as part of the existing data structure, I don't care so much. A few extra calls to state() are fine with me if we want to avoid having to dump this in the container.
- Check all calls to system_get_info(), update them to use this API when possible (when it just needs the name)
- Add a @todo for #1744302: [meta] Resolve known performance regressions in Drupal 8.
Also switching the meta issue.
Comment #12
BerdirComment #13
stefan.r CreditAttribution: stefan.r commentedThis patch simply saves the module info into state with key
system.module.data
while we're rebuilding module info anyway insystem_rebuild_module_data()
.It also adds a
getInfo($module)
method on theModuleHandler
which does exactly whatsystem_get_info('module', $module)
used to do, except it gets the module info from state. This method is now used bygetName()
as well.It also adds a fallback to the machine name of the module in
ModuleHandler::getName()
in case the name can't be found, as @Berdir had already implemented inMenuLinkDefaultForm::getModuleName()
.It seems there are already calls to
system_rebuild_module_data()
everywhere where state would need to be reset.Comment #15
dawehnerGiven that this is just a temporary solution until #2208429: Extension System, Part III: ExtensionList, ModuleExtensionList and ProfileExtensionList is done, it seems more sane to not add getInfo() itself to the interface.
Comment #16
stefan.r CreditAttribution: stefan.r commentedComment #18
stefan.r CreditAttribution: stefan.r commented@daweher I think we can change the issue title then? :)
Comment #19
stefan.r CreditAttribution: stefan.r commentedDisregard #18, that was a patch for another issue...
Comment #20
stefan.r CreditAttribution: stefan.r commentedThis replaces another few calls to
system_get_info()
by calls togetName()
Comment #22
stefan.r CreditAttribution: stefan.r commentedComment #23
BerdirThis still calls directly into system_rebuild_module_data(), so that will still trigger the rebuild every time?
Comment #25
stefan.r CreditAttribution: stefan.r commentedYes that should be calling
system_get_info()
.Comment #26
stefan.r CreditAttribution: stefan.r commentedCan we get rid of
system_get_module_admin_tasks()
still? It is only being called fromAdminController
andHelpController
, it needs an info file in the second argument only to take the name out.Comment #27
BerdirNot sure. We could offer a new version, for example as a trait that both controllers can use and deprecate it for removal in 9.0. And if the core committers are OK with removing it, then we can still do that too, would be easy then.
Comment #28
stefan.r CreditAttribution: stefan.r commentedCool, we do so in this issue or in a followup?
Comment #29
dawehnerThank you for keeping the amount of changes limited.
That sounds certainly like follow up work, but looking at that method, extracting that into somehow its own thing seems doable.
Comment #30
stefan.r CreditAttribution: stefan.r commentedOK I have created a followup issue over at #2466933: Change $info array argument to system_get_module_admin_tasks() to $name
Comment #31
dawehnerLooks ready for me. I hope we can get done with proper reorganization of all that various mess around.
Comment #32
stefan.r CreditAttribution: stefan.r commentedUpdated issue summary
Comment #33
stefan.r CreditAttribution: stefan.r commentedComment #34
catchWhy state and not just a cache entry?
State should really be used for information that can't be rebuilt if it goes missing, which isn't the case here.
Comment #35
BerdirSorry, but this doesn't quite work yet :)
Preparing a longer response, just want to make sure this is not committed yet.
Ah, @catch already commented.
Comment #36
Berdiradmin/people is now actually slower than before. This because we call PermissionHandler::getPermissions(), which calls sortPermissions(), which calls ::getModuleNames(), which calls systemRebuildModuleData(), which just a wrapper for system_rebuild_module_data().
So that means we do both now. We rebuild and we load and unserialize the data from state. And it's a lot of data to unserialize.
Which shows the flaw the current approach. cache get/set must happen in the same function. Otherwise direct calls to the other function, which we still have a lot, constantly re-write that data, and that's not cheap.
Doing it in system_get_info() means we can optimize it to only cache enabled modules. That can make quite a difference.
I was even thinking about doing the cache/state for just the module names and not the whole info. There's hardly a case where we need anything but the name for any page were performance matters? Then we'd do that directly in ModuleHandler imho, load from cache on the first call and then return from there. And don't touch system_get_info().
So, before we can RTBC this again, we definitely need to do profiling to make sure were actually improving the situation :)
Comment #37
stefan.r CreditAttribution: stefan.r commentedgetModuleNames should actually be calling system_get_info(). No need to rebuild.
Comment #38
stefan.r CreditAttribution: stefan.r commentedWe actually don't have that much direct calls to system_rebuild_module_data, it's only the one you pointed out in #36 that seems misplaced, the other ones actually need to trigger a cache reset (on update, on module install/uninstall etc)
I could also see the point of having other data than just the name available (ie. configure page, description), but when we get the patch right we can profile whether there's a difference between storing the name vs the full processed info.
Comment #39
stefan.r CreditAttribution: stefan.r commentedThis should solve the gist of the performance issue, anything beyond this we can micro-optimize if needed. The info files by themselves are not that big/complicated so I can't imagine the unserialization to be that expensive. But we can profile whether making it smaller makes a difference.
@Berdir when you say "Otherwise direct calls to the other function, which we still have a lot, constantly re-write that data, and that's not cheap.", you are referring to calls to
system_rebuild_module_data()
? I really don't see that many of them (where we should be using cached data).Comment #41
stefan.r CreditAttribution: stefan.r commentedComment #42
stefan.r CreditAttribution: stefan.r commentedIt's a 150ms improvement here but there seems to be some room for shaving off a few ms, it's still spending too much time on unserializing/cache_gets when it only needs a couple of names.
Comment #43
stefan.r CreditAttribution: stefan.r commentedComment #44
stefan.r CreditAttribution: stefan.r commentedCan we profile this to see how much better this is and whether we need to optimize this further?
Comment #46
stefan.r CreditAttribution: stefan.r commentedComment #47
stefan.r CreditAttribution: stefan.r commentedComment #48
stefan.r CreditAttribution: stefan.r commentedComment #51
stefan.r CreditAttribution: stefan.r commentedSo I have looked at this in xhprof and on admin/people the execution time for
system_get_info()
goes from taking up 21% of the load time to down to about a millisecond.Comment #53
dawehnerI really like the patch in general because it outlines where we really want to rebuild the module data and where we just use it to
gain some information which is out in the system.
#2208429: Extension System, Part III: ExtensionList, ModuleExtensionList and ProfileExtensionList will introduce some proper listing service to be able to retrieve the information
I'm curious, do we call system_get_info() so often that its worth to use the faster drupal_static pattern?
Nitpick: The ModuleHandler has a different namespace
Comment #54
stefan.r CreditAttribution: stefan.r commentedOn admin/people we get the full list of modules several times and then get the name of each one separately from
system_get_info()
, so we end up doing >100 calls to that function. In any case I assume that static will be turned into a protected property anyway as soon as we implement that listing service.Comment #55
stefan.r CreditAttribution: stefan.r commentedComment #56
stefan.r CreditAttribution: stefan.r commentedComment #57
BerdirStrange, what page did you test this on, admin/people? Do you have a lot of users there? What is taking a lot of time? Make sure that you don't have xdebug and xhprof/uprofiler enabled at the same time. I can see that you're not using the CPU flag, that's also a big overhead.
I'm asking because the difference for system_rebuild_module_data() is about the same for me... but the total wall time is 367ms before and 199ms after, making this a 45% improvement and 108k instead of 260k function calls, a 60% improvement. I don't have any users, though. but if that would be the reason, I'd expect to see more function calls in your run.
Anyway, this looks good to me. Great to see this resolved.
Comment #58
alexpottI don't understand why we need the word
consecutive
. Could this comment be:Comment needs re-flowing.
Comment #59
stefan.r CreditAttribution: stefan.r commentedYes the revised comment makes more sense. "consecutive" was from when that comment was elsewhere in a previous version of this patch.
Comment #60
pwolanin CreditAttribution: pwolanin commentedDo we need the extra code complexity of using this?
Comment #61
stefan.r CreditAttribution: stefan.r commentedYou mean do we need to use the drupal_static_fast pattern at all? Or that the way it's implemented is unclear?
It prevents over 100 calls to drupal_static on select pages but it's not thousands either, so it's just shaving off a couple hundred microseconds -- if we rather have clearer code I am happy to take it out?
Comment #62
pwolanin CreditAttribution: pwolanin commentedYes, I mean do we need to use the drupal_static_fast pattern at all?
In general, I'd look to use that only in things used heavily every page. I think the speedup here for admin pages will already be significant, so simpler code is better.
Comment #63
stefan.r CreditAttribution: stefan.r commentedYes, we actually don't really need it, it's likely not even a millisecond difference...
Comment #64
Wim LeersBack to RTBC since it was RTBC and fixed only the recent nitpicks :)
Comment #65
pwolanin CreditAttribution: pwolanin commentedAre we worried about adding a new/heavy dependency to Permissions, FieldPluginBase, etc?
Comment #67
effulgentsia CreditAttribution: effulgentsia at Acquia commentedThis looks like a step in the right direction, so committed and pushed to 8.0.x. Thanks!
Some thoughts for optional follow-ups if anyone's inspired to pursue them:
I think this is a good candidate for the 'discovery' bin rather than the default bin. That would let it benefit from APCu.
Sad that there's a little circularity here: the module handler calls this function, which calls back to the module handler. Maybe that can be improved in #2208429: Extension System, Part III: ExtensionList, ModuleExtensionList and ProfileExtensionList.
Shame that we need a system_get_info() here, but #2466933: Change $info array argument to system_get_module_admin_tasks() to $name can help with that.
I don't think it's a problem for the plugins changed by this patch. I think it would be a problem if we needed it on FieldPluginBase, but I don't see why we do.