Split off from #281405: Drupal 7.x can't be installed with memory_limit=32M

While installing the default profile, Drupal generates around 3000 calls to drupal_parse_info_file() (total time 10s, memory copied >1.3GB), 90000 calls to file_uri_scheme(), and 125000 calls to readdir().

The reason for the first: Whenever a module is installed, the module data is rebuilt. This causes every single info file to be re-parsed more than 30 times. "Hey, this module's .info data has been the same the last twenty times; maybe after enabling *this* module, it might change!" ;) The nasty part is that drupal_parse_info_format also calls get_defined_constants(), a particularly expensive function that allocates quite a bit of memory.

The reason for the second is that file_stream_wrapper_normalize() is called for every single directory entry discovered with file_scan_directory() during module discovery (which happens every time a module is enabled, which is the reason for the third). f_s_w_n() calls file_uri_scheme(). file_uri_scheme() attempts to split the protocol off a file name (a file name retrieved from readdir(), mark). The result is 90000 calls to explode() on strings that by definition don't contain the protocol delimiter.

It's pretty awe-inspiring. :P

I'd diagnose this as a bug in the module installation system in that it fails to adequately handle batch installation. Resetting the module registry may make sense when enabling a single module, but there has to be a way to avoid scanning the same file system and parsing the same info files so many times.

CommentFileSizeAuthor
#48 registry-do-not-rebuild-module-data-661420-48.patch5.45 KBDavid_Rothstein
#40 registry_mode.patch16.08 KBAnonymous (not verified)
#37 registry_mode.patch16.12 KBAnonymous (not verified)
#32 registry_mode.patch12.22 KBAnonymous (not verified)
#23 registry-do-not-rebuild-module-data-661420-23.patch962 bytesDavid_Rothstein
#22 registry_mode.patch7.95 KBAnonymous (not verified)
#21 registry_mode.patch7.61 KBAnonymous (not verified)
#20 registry_mode.patch5.38 KBAnonymous (not verified)
#19 registry_mode.patch9.63 KBAnonymous (not verified)
#15 registry_mode.patch7.98 KBAnonymous (not verified)
#10 cache_install.patch4.21 KBAnonymous (not verified)
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Anonymous’s picture

ouch! investigationing...

cburschka’s picture

I hacked it out trivially, expecting everything to break:

// In install.php
module_enable(array($module), FALSE, FALSE);
...

function module_enable($module_list, $disable_modules_installed_hook = FALSE, $registry_rebuild = TRUE) {
...

  if (!empty($invoke_modules) && $registry_rebuild) {
    // Refresh the module list to exclude the disabled modules.
    system_list_reset();
    module_list(TRUE);
    module_implements('', FALSE, TRUE);
    // Force to regenerate the stored list of hook implementations.
    registry_rebuild();

Surprisingly, the site was actually installed, and works without major breakage (some caching issues, but that's to be expected with this hack). It cut 10 seconds off the 16 second installation, and even shaved 3MB off the peak memory (bringing it to 37MB).

catch’s picture

Priority: Normal » Critical

I think we'd all like to be able to run the installer in 6 seconds instead of 16 seconds, and cutting 3MB of memory usage helps us get back under 32MB which is really shocking, so bumping status to critical.

Anonymous’s picture

i get a fatal error on install with the snippet above, which is not surprising. but i got to the fatal error very, very quickly :-)

Fatal error: Class 'DrupalQueue' not found in /var/www/drupal/7/modules/modules/update/update.install on line 13

mcrittenden’s picture

Sub.

cburschka’s picture

i get a fatal error on install with the snippet above,

I don't think the web installer will work at all with a hack this simple. You'll need to install non-interactively...

brianV’s picture

sub

Anonymous’s picture

Status: Active » Needs review
FileSize
4.21 KB

hmm, this kinda sucks. we have no access to a real cache_get/cache_set during install, so we can't do the obvious thing and just cache the .info file data and make sure it gets invalidated at the right time.

  // Load the cache infrastructure using a "fake" cache implementation that
  // does not attempt to write to the database. We need this during the initial
  // part of the installer because the database is not available yet. We
  // continue to use it even when the database does become available, in order
  // to preserve consistency between interactive and command-line installations
  // (the latter complete in one page request and therefore are forced to
  // continue using the cache implementation they started with) and also
  // because any data put in the cache during the installer is inherently
  // suspect, due to the fact that Drupal is not fully set up yet.
  require_once DRUPAL_ROOT . '/includes/cache.inc';
  require_once DRUPAL_ROOT . '/includes/cache-install.inc';
  $conf['cache_default_class'] = 'DrupalInstallCache';

attached proof-of-concept patch makes use of the hack in DrupalFakeCache to see if the db is available + suppression of errors to make cache_set and cache_get available in drupal_parse_info_file(). no invalidation of those cache entries yet, just wanted to see if it would run and pass some tests.

Dries’s picture

Issue tags: +Favorite-of-Dries
carlos8f’s picture

This problem has been around a while. A while back I dissected the function calls in D6 that parse info files during install.php (since it was consistently hogging enormous resources), and I was very sorry I did. It's a real mess :P The problem is when you have lots of directories in your install profile, i.e. tinymce, things kind of blow up.

I would love to see progress on this, not just for shared hosts with ~ 32 MB memory. Since with the introduction of drush installsite Drupal sites can be set up programmatically--I know a lot of software-as-a-service applications that would benefit from the installer being 10 seconds faster and not devouring the poor server's resources.

moshe weitzman’s picture

Status: Needs review » Needs work

I don't know why we lost that fakeCache during install. seems quite a good idea to me ... there some ninja debugging left in the patch.

Anonymous’s picture

took out the 'ninja' debug stuff.

on the file_scan_directory stuff, that's very much by design - a registry rebuild needs to scan all files it knows about, and parse news ones/reparse ones that have changed.

don't think we can fix that without introducing another mode, where we distinguish between adding/removing modules and rebuilding the whole registry. that will undoubtedly be more complex, but would be a huge performance win.

pseudocode in module_enable:

  if (!empty($invoke_modules)) {
    // Refresh the module list to exclude the disabled modules.
    system_list_reset();
    module_list(TRUE);
    module_implements('', FALSE, TRUE);
    // Add the new modules files to the registry.
    registry_add_modules($invoke_modules);
    // Refresh the schema to include the new enabled module.
    drupal_get_schema(NULL, TRUE);

    // If any modules were newly installed, execute the hook for them.
    if (!$disable_modules_installed_hook && !empty($modules_installed)) {
      module_invoke_all('modules_installed', $modules_installed);
    }
  }

and in includes/common.inc:

/**
 * Add the files for the given modules to the registry.
 *
 * @param $modules
 *   The list of modules being enabled.
 */
function registry_add_modules(array $modules) {
  // The hard bit goes here. Perhaps we make registry_rebuild accept a list of modules
  // instead of always building the list itself, or we refactor so we don't duplicate the 
  // shared stuff between this function and the larger rebuild.
}
Anonymous’s picture

FileSize
7.98 KB

ok, attached patch has a proof of concept registry_add_modules(). its a hack, but it gets drupal installed, and its *fast*. tests don't run just yet...

some feedback from Dries/moshe etc would be good - if this approach will be considered, i'll open a separate ticket to make it robust.

moshe weitzman’s picture

looks interesting to me. i think we need some registry experts in here like chx, crell, tha_sun, .... i don't see handling when disabling modules which maybe problematic.

moshe weitzman’s picture

Title: Batch installation of modules is hugely inefficient » Installation of modules is hugely inefficient

Remove misleading title (not about batch api)

Anonymous’s picture

@moshe: yep, totally proof-of-concept, right now i'm concentrating on getting tests to run...

Anonymous’s picture

Assigned: Unassigned »
Status: Needs work » Needs review
FileSize
9.63 KB

ok, tests seem to at least run now, so lets see what the bot thinks. still a hack, but we can remove another old hack, preloadRegistry() in DrupalWebTestCase.

Anonymous’s picture

FileSize
5.38 KB

update patch, slightly different approach, little bit less hacky, but still really just want to see what the testbot thinks and get feedback.

Anonymous’s picture

FileSize
7.61 KB

and again, this time including the death of the $disable_modules_installed_hook hack.

Anonymous’s picture

FileSize
7.95 KB

ahem, set the ctime and mtime of files to 0 so the registry thinks they've changed, and make sure system module is added. makes the tests that failed for me pass, so lets hope the bot comes back up soon...

David_Rothstein’s picture

Any reason we can't just do something like the attached patch? It seems to provide the same speedup for me, and no obvious ill effects.

Anonymous’s picture

nice idea, but i think it falls into the too simple part of 'as simple as possible, but no simpler'.

when we do a full rebuild of the registry, we reparse everything to handle cases like a developer adding another file to the list in module.info. your patch would never pick that up (or a bunch of such changes generally, like moving a module folder etc), so its not acceptable for the 'i really want a full rebuild case'. i considered the idea of a flag to _registry_rebuild that meant something like 'just assume things are all the same', but i thought the straight addition idea was more obvious from an API point of view.

Anonymous’s picture

opened another issue #664436: files listed in .info that don't contain classes or interfaces to deal with the fact that we still parse a bunch of files for no reason, given they don't contain any interfaces or classes.

ShutterFreak’s picture

I do not consider myself being a Drupal core expert, but I decided to do a small research on general PHP memory gotchas - it appears that we could gain some memory if we consider the following items:

  1. Reduce the scope of variables to the strict minimum required to function properly. After this, one may decide to trade memory for speed.
  2. When using classes in PHP, we need to make sure they implement __unset() in order to allow unset() to properly clear resources.
  3. If possible, we should avoid StdClass objects as they are known to leak memory (they're flexible objects, and it's not straightforward for the PHP interpreter to release their memory in use).
  4. Watch out for objects with recursive references as they're known to leak memory. There are workarounds but they must be implemented and called. Apparently these workarounds are no longer needed when you're running PHP 5.3 or newer; however it appears you then still need to manually enable the garbage collector once by calling gc_enable()).
  5. There's apparently a PHP memory leak in certain PHP versions when calling the include() method.

Hope this helps.

Olivier

David_Rothstein’s picture

@justinrandell (#24): That's certainly true, but also true of dozens of other places that query the {system} table. I think in general we don't expect that changes to the filesystem are immediately picked up by Drupal, and that it's up to the calling code to call system_rebuild_module_data() if it cares about the difference. So for example, one thing my patch should have done (but didn't do) is add an explicit call to system_rebuild_module_data() inside drupal_flush_all_caches(), since I removed the implicit one.... but that's a somewhat rare case.

I guess I can see why the registry might be an exception to this rule, though. And registry_add_modules() is a nice new function - just seems like it might be introducing a fair amount of duplicate code (at least at the moment).

Crell’s picture

@ShutterFreak: Can you post links for the first three points you mention? I'm familiar with 4, but 2 and 3 in particular surprise me. (And #5 doesn't surprise me at all, given that it is PHP...)

Drupal uses stdClass objects, um, *everywhere*. :-)

cburschka’s picture

Yeah, this is pretty much a standard pattern:

$object = (object) array(
  'property' => 'value',
);

Of course, we might be able to switch to this pattern:

$object = new DrupalObject(array(
  'property' => 'value'
));

class DrupalObject {
  function __construct($data) {
    foreach ($data as $key => $value) {
      $this->$key = $value;
    }
  }

  function __unset() {
    foreach ($this as $key => $value) {
      unset($this->$key);
    }
  }
}

Assuming that is what fixes it. I'm not sure if I understood the bug fully.

The include() seems to be particularly focused on repetitive inclusions of the same file, which might be happening quite a bit in the theme layer with .tpl.php files.

Anonymous’s picture

i've created #664940: investigate general php memory consumption fixes to investigate the general memory issues, please continue that discussion there.

ShutterFreak’s picture

@Crell: here's one link explaining #1, #2 and #3: Why does this simple php script leak memory? | StackOverflow

In addition I came across another leaky PHP object pattern. The following will leak memory:

 foreach ($rss->channel as $channel) {
  /* do something */
} 

and the following apparently doesn't (or does to a lesser extent):

 $foo = $rss->channel;
foreach ($foo as $channel) {
  /* do something */
} 

At least, on certain PHP versions.

Finally it appears the ($list as $element) is also an interesting candidate for leaking memory as $element is not unset after running the iteration. This can be costly in terms of memory if the iterator runs over objects (see the other object related remarks regarding destructors).

Anonymous’s picture

FileSize
12.22 KB

new patch:

* factored out the db code loading so its not repeated

* made the registry preloading in simpletest a bit smarter, so we can stop faking the ctime/mtime values in registry_add_modules(), which makes tests faster

* grabbed the real module weights instead of faking it

is the testbot ever coming back?

a known problem to tackle next is any hook_registry_files_alter() implementations, notably simpletests.

Dries’s picture

Wow, that looks like a much more complicated patch than David's. I wonder why you considered David's approach to be "too simple".

Anonymous’s picture

Dries: see #24 - it breaks anything that relies on a full registry_rebuild() picking up any changes that aren't in the system table.

catch’s picture

Looks like the $exclude_files patch got mixed in with the last one.

I'd be very wary of anything which reduces the change of picking up filesystem changes, we already have outstanding bugs like #605374: Dont add nonexisting files to the registry

moshe weitzman’s picture

I ran the drush installsite command while tracing was enabled in xdebug. This installs Drupal in one php invocation - no batch API. Here is the ugliness, sorted by memory-own. As mentioned earlier, get_defined_constants() is killer. good news is that we have some optimization opportunities here. See http://drupal.org/node/659980 for the analysis tools.


                                               Inclusive        Own
function                               #calls  time     memory  time     memory
-------------------------------------------------------------------------------
get_defined_constants                    2818  1.0079 720178048  1.0079 720178048
system_schema                             150  0.0636 31649840  0.0636 31649840
preg_match_all                           3001  0.4581 30317896  0.4581 30317896
_menu_router_cache                        936  0.1739 23520568  0.1100 22988616
array_merge                              6435  0.1375 22192248  0.1375 22192248
explode                                 46766  0.8950 17583112  0.8950 17583112
DatabaseCondition->compile               8211  1.4374 23838472  0.8333 16859728
get_defined_functions                      23  0.0280 14049456  0.0280 14049456
file_uri_scheme                         42960  3.3510 26801904  1.9458 11758448
preg_split                              26544  0.4198 10362776  0.4198 10362776
module_list                              2616  0.3920 10750568  0.0790  9639864
drupal_load                                88  0.1189 10191600  0.0817  8664672
PDOStatement->fetch                      2444  0.0598  7873608  0.0598  7873608
PDOStatement->execute                    5823  9.6628  7421088  9.6628  7421088
SelectQuery->addField                    5971  0.1583  7061728  0.1583  7061728
InsertQuery->values                      1247  0.2209  8977776  0.1680  6377624
DatabaseCondition->condition             5378  0.2392  5889272  0.1715  5889272
file_get_contents                        3001  0.2870  5344952  0.2870  5344952
_drupal_depth_first_search               3675  0.1899  5556824  0.1573  4777616
DatabaseConnection->defaultOptions       5823  0.1183  4638264  0.1183  4638264
_menu_link_build                          925  0.0391  4635352  0.0391  4635352
implode                                 21348  0.3053  4390320  0.3053  4390320
DatabaseCondition->mapConditionOperator  6185  0.1464  3708016  0.1460  3707536
install_begin_request                       1  0.1966  6992216  0.0267  3685472
DatabaseConnection->expandArguments      5823  1.1516  3447728  0.5928  3445192

Anonymous’s picture

FileSize
16.12 KB

moshe: i'm assuming you ran the drush command on HEAD? could you try it with this patch and report the numbers?

int’s picture

Is the 2818 calls to get_defined_constants in #36 that makes the memory increase?

Can we use one cached drupal_get_defined_constats function, will improve?

Juanlu001’s picture

Subscribing.

Anonymous’s picture

FileSize
16.08 KB

was trying to figure out why some book tests where failing, and it turns out that we have a stale theme registry on the test side. so, not doing a full registry rebuild when installing modules causes the theme registry to go stale? or put another way, doing a registry rebuild stops the theme registry from going stale!?! holy "code-that-only-works-because-of-a-hidden-side-effect" batman!

updated patch does calls drupal_theme_rebuild() in DWTC->setUp() and the book test fails go away.

ShutterFreak’s picture

@justinrandell: If I am not mistaken the theme registry was required to be invoked since Drupal 6 (D6) when performing some routine administrative tasks, e.g., visit the module list (admin/build/modules). This is apparently one of the main reasons why Drupal 5 (D5) is so fast. I don't know however which 'fundamental' change happened between D5 and D6 to require the theme registry (and menu router) to be rebuilt so frequently since D6.

It would be great if we were able to defer rebuilding the theme registry (and probably the menu router as well) until the point where it only needs to be run once. Or twice at most. But I fear this won't be easy to fix without changing the Drupal core API.

I have the impression there are a couple hooks that interfere (e.g., run hook_FOO requires to check which other FOO hooks were called, or will be called) and as a side effect require triggering a theme (or menu router) rebuild. Maybe those hooks are doing 2 unrelated things hence they should be split?

Crell’s picture

@ShutterFreak: Drupal 5 had no theme registry and the menu system was completely different. Those systems as we know them now didn't exist in D5.

Do you know of any such hooks that indirectly trigger rebuilds? I can't think of any off hand that would do so at odd times, and even without that the memory usage and page runtime are simply unacceptable at the moment.

ShutterFreak’s picture

@Crell: Unfortunately no. I am not too acquainted with the entire Drupal core API, and it's starting to become rather difficult to still understand what is going on when registering a module nowadays.

The only hook-linked method call I know propagates into all modules implementing a certain hook is module_invoke_all(). This method happens to be triggered in quite some places, and even a lot more in D7 than in D6, at least according to the API documentation.

Crell’s picture

module_invoke_all() is called a ton throughout Drupal, as it is one of Drupal's main bend-points for extensibility. It does not itself trigger a menu rebuild or a theme rebuild, though, and if anything it should be cheaper in Drupal 7 than it was in Drupal 6.

Status: Needs review » Needs work
Issue tags: -Performance, -Favorite-of-Dries

The last submitted patch, , failed testing.

Status: Needs work » Needs review

Re-test of from comment #2402200 was requested by @user.

Status: Needs review » Needs work
Issue tags: +Performance, +Favorite-of-Dries

The last submitted patch, , failed testing.

David_Rothstein’s picture

Re #34, #35 and others - if we want to preserve the current behavior of registry_rebuild(), there's still a much simpler way to do it. See the attached patch.

I do agree that the registry_add_files() approach is a bit cleaner in terms of API, but it's also adding a fair amount of complexity to the registry. And I'm not sure how that works with hook_registry_files_alter(); the latest patch in #40 just removes that hook altogether? ... But, any patch that gets rid of that crazy $disable_modules_installed_hook parameter can't be too bad :)

cburschka’s picture

Maybe we need a $rebuild_parts_of_registry_likely_to_have_changed parameter?

( :-P )

Anonymous’s picture

Assigned: » Unassigned
Status: Needs work » Needs review

looks good to me, lets just leave the mess of accidental dependencies covered up by a registry rebuild alone until D8.

carlos8f’s picture

+1 to #48, it makes a lot of sense to split that up into two functions. module_enable() is so often used in loops and doing system_rebuild_module_data() every time is just silly :P

carlos8f’s picture

I merged David's patch from #358815: drupal_get_install_files() is slow with a large tree with #48, and wow. We have a very very much faster installer.

In my benchmark I used the default profile and sprinkled my module folders with tinymce to simulate a real-life module folder mess. Regardless of how many modules are in the profile, each folder takes extra resources to scan for .info files. I also used a macbook for my server which isn't incredibly fast.

HEAD (I ran this 3x for reinforcement):

wait after selecting language: 20 sec
wait after entering db info: 30 sec
batch took: 1 min 5 sec

w/ David's merged patches:

wait after selecting language: 3 sec
wait after entering db info: 10 sec
batch took: 20 sec (!!)

So that's 115 seconds down to 33 seconds. The difference is exaggerated since I have so many useless folders in there, but there are lots of cases where you have a pre-packaged Drupal app with tons of folders.

drush installsite would of course run faster still, since the batch API overhead is skipped.

Anonymous’s picture

Status: Needs review » Reviewed & tested by the community
cburschka’s picture

It would be wise to fix the testbot and wait for it to confirm this patch before committing. Failing to wait for the testbot has now broken HEAD at least three times this week.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD. We can't keep waiting for the test bot to come back so I ran the tests locally. I got 3 failures but they seemed unrelated. Hopefully someone can kick the testbots and figure out those failures in parallel. If not, I can always stop committing patches later on. Thanks all.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

MustangGB’s picture

Version: 7.x-dev » 6.x-dev
Status: Closed (fixed) » Patch (to be ported)

Hope it's alright to get this ported to 6.x

moshe weitzman’s picture

Version: 6.x-dev » 7.x-dev
Status: Patch (to be ported) » Fixed

Not possible. D7 has changed too much.

David_Rothstein’s picture

Right, I would suggest creating a different issue if there is a problem in D6 that can be fixed.

(The patch committed here involved fixes to the registry, which did not exist in D6 at all, so it would not be a matter of porting the patch but rather writing an entirely different, unrelated one.)

David_Rothstein’s picture

Status: Fixed » Closed (fixed)