Split off from #281405: Drupal 7.x can't be installed with memory_limit=32M
While installing the default profile, Drupal generates around 3000 calls to drupal_parse_info_file() (total time 10s, memory copied >1.3GB), 90000 calls to file_uri_scheme(), and 125000 calls to readdir().
The reason for the first: Whenever a module is installed, the module data is rebuilt. This causes every single info file to be re-parsed more than 30 times. "Hey, this module's .info data has been the same the last twenty times; maybe after enabling *this* module, it might change!" ;) The nasty part is that drupal_parse_info_format also calls get_defined_constants(), a particularly expensive function that allocates quite a bit of memory.
The reason for the second is that file_stream_wrapper_normalize() is called for every single directory entry discovered with file_scan_directory() during module discovery (which happens every time a module is enabled, which is the reason for the third). f_s_w_n() calls file_uri_scheme(). file_uri_scheme() attempts to split the protocol off a file name (a file name retrieved from readdir(), mark). The result is 90000 calls to explode() on strings that by definition don't contain the protocol delimiter.
It's pretty awe-inspiring. :P
I'd diagnose this as a bug in the module installation system in that it fails to adequately handle batch installation. Resetting the module registry may make sense when enabling a single module, but there has to be a way to avoid scanning the same file system and parsing the same info files so many times.
Comment | File | Size | Author |
---|---|---|---|
#48 | registry-do-not-rebuild-module-data-661420-48.patch | 5.45 KB | David_Rothstein |
#40 | registry_mode.patch | 16.08 KB | Anonymous (not verified) |
#37 | registry_mode.patch | 16.12 KB | Anonymous (not verified) |
#32 | registry_mode.patch | 12.22 KB | Anonymous (not verified) |
#23 | registry-do-not-rebuild-module-data-661420-23.patch | 962 bytes | David_Rothstein |
Comments
Comment #1
Anonymous (not verified) CreditAttribution: Anonymous commentedouch! investigationing...
Comment #2
cburschkaI hacked it out trivially, expecting everything to break:
Surprisingly, the site was actually installed, and works without major breakage (some caching issues, but that's to be expected with this hack). It cut 10 seconds off the 16 second installation, and even shaved 3MB off the peak memory (bringing it to 37MB).
Comment #3
catchI think we'd all like to be able to run the installer in 6 seconds instead of 16 seconds, and cutting 3MB of memory usage helps us get back under 32MB which is really shocking, so bumping status to critical.
Comment #4
Anonymous (not verified) CreditAttribution: Anonymous commentedi get a fatal error on install with the snippet above, which is not surprising. but i got to the fatal error very, very quickly :-)
Fatal error: Class 'DrupalQueue' not found in /var/www/drupal/7/modules/modules/update/update.install on line 13
Comment #5
mcrittenden CreditAttribution: mcrittenden commentedSub.
Comment #6
cburschkaI don't think the web installer will work at all with a hack this simple. You'll need to install non-interactively...
Comment #7
brianV CreditAttribution: brianV commentedsub
Comment #10
Anonymous (not verified) CreditAttribution: Anonymous commentedhmm, this kinda sucks. we have no access to a real cache_get/cache_set during install, so we can't do the obvious thing and just cache the .info file data and make sure it gets invalidated at the right time.
attached proof-of-concept patch makes use of the hack in DrupalFakeCache to see if the db is available + suppression of errors to make cache_set and cache_get available in drupal_parse_info_file(). no invalidation of those cache entries yet, just wanted to see if it would run and pass some tests.
Comment #11
Dries CreditAttribution: Dries commentedComment #12
carlos8f CreditAttribution: carlos8f commentedThis problem has been around a while. A while back I dissected the function calls in D6 that parse info files during install.php (since it was consistently hogging enormous resources), and I was very sorry I did. It's a real mess :P The problem is when you have lots of directories in your install profile, i.e. tinymce, things kind of blow up.
I would love to see progress on this, not just for shared hosts with ~ 32 MB memory. Since with the introduction of
drush installsite
Drupal sites can be set up programmatically--I know a lot of software-as-a-service applications that would benefit from the installer being 10 seconds faster and not devouring the poor server's resources.Comment #13
moshe weitzman CreditAttribution: moshe weitzman commentedI don't know why we lost that fakeCache during install. seems quite a good idea to me ... there some ninja debugging left in the patch.
Comment #14
Anonymous (not verified) CreditAttribution: Anonymous commentedtook out the 'ninja' debug stuff.
on the file_scan_directory stuff, that's very much by design - a registry rebuild needs to scan all files it knows about, and parse news ones/reparse ones that have changed.
don't think we can fix that without introducing another mode, where we distinguish between adding/removing modules and rebuilding the whole registry. that will undoubtedly be more complex, but would be a huge performance win.
pseudocode in module_enable:
and in includes/common.inc:
Comment #15
Anonymous (not verified) CreditAttribution: Anonymous commentedok, attached patch has a proof of concept registry_add_modules(). its a hack, but it gets drupal installed, and its *fast*. tests don't run just yet...
some feedback from Dries/moshe etc would be good - if this approach will be considered, i'll open a separate ticket to make it robust.
Comment #16
moshe weitzman CreditAttribution: moshe weitzman commentedlooks interesting to me. i think we need some registry experts in here like chx, crell, tha_sun, .... i don't see handling when disabling modules which maybe problematic.
Comment #17
moshe weitzman CreditAttribution: moshe weitzman commentedRemove misleading title (not about batch api)
Comment #18
Anonymous (not verified) CreditAttribution: Anonymous commented@moshe: yep, totally proof-of-concept, right now i'm concentrating on getting tests to run...
Comment #19
Anonymous (not verified) CreditAttribution: Anonymous commentedok, tests seem to at least run now, so lets see what the bot thinks. still a hack, but we can remove another old hack, preloadRegistry() in DrupalWebTestCase.
Comment #20
Anonymous (not verified) CreditAttribution: Anonymous commentedupdate patch, slightly different approach, little bit less hacky, but still really just want to see what the testbot thinks and get feedback.
Comment #21
Anonymous (not verified) CreditAttribution: Anonymous commentedand again, this time including the death of the $disable_modules_installed_hook hack.
Comment #22
Anonymous (not verified) CreditAttribution: Anonymous commentedahem, set the ctime and mtime of files to 0 so the registry thinks they've changed, and make sure system module is added. makes the tests that failed for me pass, so lets hope the bot comes back up soon...
Comment #23
David_Rothstein CreditAttribution: David_Rothstein commentedAny reason we can't just do something like the attached patch? It seems to provide the same speedup for me, and no obvious ill effects.
Comment #24
Anonymous (not verified) CreditAttribution: Anonymous commentednice idea, but i think it falls into the too simple part of 'as simple as possible, but no simpler'.
when we do a full rebuild of the registry, we reparse everything to handle cases like a developer adding another file to the list in module.info. your patch would never pick that up (or a bunch of such changes generally, like moving a module folder etc), so its not acceptable for the 'i really want a full rebuild case'. i considered the idea of a flag to _registry_rebuild that meant something like 'just assume things are all the same', but i thought the straight addition idea was more obvious from an API point of view.
Comment #25
Anonymous (not verified) CreditAttribution: Anonymous commentedopened another issue #664436: files listed in .info that don't contain classes or interfaces to deal with the fact that we still parse a bunch of files for no reason, given they don't contain any interfaces or classes.
Comment #26
ShutterFreak CreditAttribution: ShutterFreak commentedI do not consider myself being a Drupal core expert, but I decided to do a small research on general PHP memory gotchas - it appears that we could gain some memory if we consider the following items:
__unset()
in order to allowunset()
to properly clear resources.StdClass
objects as they are known to leak memory (they're flexible objects, and it's not straightforward for the PHP interpreter to release their memory in use).gc_enable()
).include()
method.Hope this helps.
Olivier
Comment #27
David_Rothstein CreditAttribution: David_Rothstein commented@justinrandell (#24): That's certainly true, but also true of dozens of other places that query the {system} table. I think in general we don't expect that changes to the filesystem are immediately picked up by Drupal, and that it's up to the calling code to call system_rebuild_module_data() if it cares about the difference. So for example, one thing my patch should have done (but didn't do) is add an explicit call to system_rebuild_module_data() inside drupal_flush_all_caches(), since I removed the implicit one.... but that's a somewhat rare case.
I guess I can see why the registry might be an exception to this rule, though. And registry_add_modules() is a nice new function - just seems like it might be introducing a fair amount of duplicate code (at least at the moment).
Comment #28
Crell CreditAttribution: Crell commented@ShutterFreak: Can you post links for the first three points you mention? I'm familiar with 4, but 2 and 3 in particular surprise me. (And #5 doesn't surprise me at all, given that it is PHP...)
Drupal uses stdClass objects, um, *everywhere*. :-)
Comment #29
cburschkaYeah, this is pretty much a standard pattern:
Of course, we might be able to switch to this pattern:
Assuming that is what fixes it. I'm not sure if I understood the bug fully.
The include() seems to be particularly focused on repetitive inclusions of the same file, which might be happening quite a bit in the theme layer with .tpl.php files.
Comment #30
Anonymous (not verified) CreditAttribution: Anonymous commentedi've created #664940: investigate general php memory consumption fixes to investigate the general memory issues, please continue that discussion there.
Comment #31
ShutterFreak CreditAttribution: ShutterFreak commented@Crell: here's one link explaining #1, #2 and #3: Why does this simple php script leak memory? | StackOverflow
In addition I came across another leaky PHP object pattern. The following will leak memory:
and the following apparently doesn't (or does to a lesser extent):
At least, on certain PHP versions.
Finally it appears the
($list as $element)
is also an interesting candidate for leaking memory as$element
is not unset after running the iteration. This can be costly in terms of memory if the iterator runs over objects (see the other object related remarks regarding destructors).Comment #32
Anonymous (not verified) CreditAttribution: Anonymous commentednew patch:
* factored out the db code loading so its not repeated
* made the registry preloading in simpletest a bit smarter, so we can stop faking the ctime/mtime values in registry_add_modules(), which makes tests faster
* grabbed the real module weights instead of faking it
is the testbot ever coming back?
a known problem to tackle next is any hook_registry_files_alter() implementations, notably simpletests.
Comment #33
Dries CreditAttribution: Dries commentedWow, that looks like a much more complicated patch than David's. I wonder why you considered David's approach to be "too simple".
Comment #34
Anonymous (not verified) CreditAttribution: Anonymous commentedDries: see #24 - it breaks anything that relies on a full registry_rebuild() picking up any changes that aren't in the system table.
Comment #35
catchLooks like the $exclude_files patch got mixed in with the last one.
I'd be very wary of anything which reduces the change of picking up filesystem changes, we already have outstanding bugs like #605374: Dont add nonexisting files to the registry
Comment #36
moshe weitzman CreditAttribution: moshe weitzman commentedI ran the drush installsite command while tracing was enabled in xdebug. This installs Drupal in one php invocation - no batch API. Here is the ugliness, sorted by memory-own. As mentioned earlier, get_defined_constants() is killer. good news is that we have some optimization opportunities here. See http://drupal.org/node/659980 for the analysis tools.
Comment #37
Anonymous (not verified) CreditAttribution: Anonymous commentedmoshe: i'm assuming you ran the drush command on HEAD? could you try it with this patch and report the numbers?
Comment #38
int CreditAttribution: int commentedIs the 2818 calls to get_defined_constants in #36 that makes the memory increase?
Can we use one cached drupal_get_defined_constats function, will improve?
Comment #39
Juanlu001 CreditAttribution: Juanlu001 commentedSubscribing.
Comment #40
Anonymous (not verified) CreditAttribution: Anonymous commentedwas trying to figure out why some book tests where failing, and it turns out that we have a stale theme registry on the test side. so, not doing a full registry rebuild when installing modules causes the theme registry to go stale? or put another way, doing a registry rebuild stops the theme registry from going stale!?! holy "code-that-only-works-because-of-a-hidden-side-effect" batman!
updated patch does calls drupal_theme_rebuild() in DWTC->setUp() and the book test fails go away.
Comment #41
ShutterFreak CreditAttribution: ShutterFreak commented@justinrandell: If I am not mistaken the theme registry was required to be invoked since Drupal 6 (D6) when performing some routine administrative tasks, e.g., visit the module list (admin/build/modules). This is apparently one of the main reasons why Drupal 5 (D5) is so fast. I don't know however which 'fundamental' change happened between D5 and D6 to require the theme registry (and menu router) to be rebuilt so frequently since D6.
It would be great if we were able to defer rebuilding the theme registry (and probably the menu router as well) until the point where it only needs to be run once. Or twice at most. But I fear this won't be easy to fix without changing the Drupal core API.
I have the impression there are a couple hooks that interfere (e.g., run hook_FOO requires to check which other FOO hooks were called, or will be called) and as a side effect require triggering a theme (or menu router) rebuild. Maybe those hooks are doing 2 unrelated things hence they should be split?
Comment #42
Crell CreditAttribution: Crell commented@ShutterFreak: Drupal 5 had no theme registry and the menu system was completely different. Those systems as we know them now didn't exist in D5.
Do you know of any such hooks that indirectly trigger rebuilds? I can't think of any off hand that would do so at odd times, and even without that the memory usage and page runtime are simply unacceptable at the moment.
Comment #43
ShutterFreak CreditAttribution: ShutterFreak commented@Crell: Unfortunately no. I am not too acquainted with the entire Drupal core API, and it's starting to become rather difficult to still understand what is going on when registering a module nowadays.
The only hook-linked method call I know propagates into all modules implementing a certain hook is
module_invoke_all()
. This method happens to be triggered in quite some places, and even a lot more in D7 than in D6, at least according to the API documentation.Comment #44
Crell CreditAttribution: Crell commentedmodule_invoke_all() is called a ton throughout Drupal, as it is one of Drupal's main bend-points for extensibility. It does not itself trigger a menu rebuild or a theme rebuild, though, and if anything it should be cheaper in Drupal 7 than it was in Drupal 6.
Comment #48
David_Rothstein CreditAttribution: David_Rothstein commentedRe #34, #35 and others - if we want to preserve the current behavior of registry_rebuild(), there's still a much simpler way to do it. See the attached patch.
I do agree that the registry_add_files() approach is a bit cleaner in terms of API, but it's also adding a fair amount of complexity to the registry. And I'm not sure how that works with hook_registry_files_alter(); the latest patch in #40 just removes that hook altogether? ... But, any patch that gets rid of that crazy $disable_modules_installed_hook parameter can't be too bad :)
Comment #49
cburschkaMaybe we need a $rebuild_parts_of_registry_likely_to_have_changed parameter?
( :-P )
Comment #50
Anonymous (not verified) CreditAttribution: Anonymous commentedlooks good to me, lets just leave the mess of accidental dependencies covered up by a registry rebuild alone until D8.
Comment #51
carlos8f CreditAttribution: carlos8f commented+1 to #48, it makes a lot of sense to split that up into two functions. module_enable() is so often used in loops and doing system_rebuild_module_data() every time is just silly :P
Comment #52
carlos8f CreditAttribution: carlos8f commentedI merged David's patch from #358815: drupal_get_install_files() is slow with a large tree with #48, and wow. We have a very very much faster installer.
In my benchmark I used the default profile and sprinkled my module folders with tinymce to simulate a real-life module folder mess. Regardless of how many modules are in the profile, each folder takes extra resources to scan for .info files. I also used a macbook for my server which isn't incredibly fast.
So that's 115 seconds down to 33 seconds. The difference is exaggerated since I have so many useless folders in there, but there are lots of cases where you have a pre-packaged Drupal app with tons of folders.
drush installsite
would of course run faster still, since the batch API overhead is skipped.Comment #53
Anonymous (not verified) CreditAttribution: Anonymous commentedComment #54
cburschkaIt would be wise to fix the testbot and wait for it to confirm this patch before committing. Failing to wait for the testbot has now broken HEAD at least three times this week.
Comment #55
Dries CreditAttribution: Dries commentedCommitted to CVS HEAD. We can't keep waiting for the test bot to come back so I ran the tests locally. I got 3 failures but they seemed unrelated. Hopefully someone can kick the testbots and figure out those failures in parallel. If not, I can always stop committing patches later on. Thanks all.
Comment #57
MustangGB CreditAttribution: MustangGB commentedHope it's alright to get this ported to 6.x
Comment #58
moshe weitzman CreditAttribution: moshe weitzman commentedNot possible. D7 has changed too much.
Comment #59
David_Rothstein CreditAttribution: David_Rothstein commentedRight, I would suggest creating a different issue if there is a problem in D6 that can be fixed.
(The patch committed here involved fixes to the registry, which did not exist in D6 at all, so it would not be a matter of porting the patch but rather writing an entirely different, unrelated one.)
Comment #60
David_Rothstein CreditAttribution: David_Rothstein commented