I'm doing a high-volume migration (2 million nodes, 650 thousand users, almost a million subscriptions, ...) into Commons (which has lots of modules hooking in to do things like assigning points, notifying, counting etc. that we don't necessarily need or want done during migration). So, the first point makes optimizing migration speed critical, while the second makes it difficult. We could disable offending modules during migration, but that is problematic for a few reasons - we don't really want modules disabled site-wide (we just want them to shut up within the migration process), other modules (or the install profile) may have dependencies on them, etc. Some modules offer a means to disable their actions - for example, setting 'pathauto' to 0 during migration prevents generation of aliases - but this is far too rare.

An alternative is to disable the hooks at the core, using hook_module_implements_alter(). The following function in my project-specific migration module instantly improved my migration speed by 30% or so:

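/**
 * Implements hook_module_implements_alter().
 *
 * Note: this runs for every hook, so the listed modules lose all of their
 * hook implementations, not just the insert/update ones.
 */
function example_migrate_module_implements_alter(&$implementations, $hook) {
  unset($implementations['commons_follow_node']);
  unset($implementations['commons_activity_streams']);
  unset($implementations['commons_radioactivity_groups']);
  unset($implementations['xmlsitemap_node']);
}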

So, to support this generally, we could pass a 'disabled_hooks' array in the registration arguments, and Migrate could implement hook_module_implements_alter to retrieve the disabled_hooks list from the currently-running migration (if any) and remove the hooks for the listed modules.
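As a rough illustration of that proposal, the filtering step could look something like the sketch below. It is plain PHP with no Drupal calls, so it can be run on its own; the helper name example_strip_modules() and the sample module lists are made up for illustration, and how Migrate would actually fetch the 'disabled_hooks' argument from the running migration is left out.

```php
<?php
/**
 * Removes every listed module from one hook's implementation list.
 *
 * $implementations is keyed by module name, the shape that
 * hook_module_implements_alter() receives in Drupal 7. $disabled_modules
 * stands in for the proposed 'disabled_hooks' registration argument
 * (an assumption: a flat list of module names).
 */
function example_strip_modules(array $implementations, array $disabled_modules) {
  // array_flip() turns the module names into keys so they can be
  // subtracted from the implementation list by key.
  return array_diff_key($implementations, array_flip($disabled_modules));
}

// Hypothetical implementations of hook_node_insert before filtering.
$implementations = array(
  'comment' => FALSE,
  'commons_follow_node' => FALSE,
  'xmlsitemap_node' => FALSE,
);
$disabled_modules = array('commons_follow_node', 'xmlsitemap_node');

$remaining = example_strip_modules($implementations, $disabled_modules);
print implode(', ', array_keys($remaining)) . "\n"; // prints: comment
```

Inside a real hook_module_implements_alter() implementation the result would be assigned back to the by-reference $implementations argument.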

In discussing this with @cthos, the caching of the module_implements list does present some challenges, however. hook_module_implements_alter is only called when the hook being sought is not in the cache, so in normal usage the hooks will probably already be cached and nothing will happen. Well, that's fine, we can clear the module_implements cache in preImport. But, what will happen then is that the modified module_implements list will be cached, and the hooks thus disabled on the site itself until the next cache clear. Well, we can clear that cache again on postImport so it will get properly rebuilt by the next normal page view. Still, though, there is a risk of race conditions - one possibility is presented by the migration I'm working on now, whose base query takes 30 seconds to execute. Thus, if in preImport we clear the module_implements cache, normal site usage might rebuild the cache before we actually start importing, so we'll still hit the hooks we're trying to avoid.... But, as I carefully read through module_implements(), it appears that if we follow a reset call immediately with a normal module_implements call, cache_get() won't get called again (we'll just have the static cache in play), so it may just work out...
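The cache behaviour described above can be modelled with a small toy class. This is only a simplified stand-in, not Drupal's module_implements() code: ToyImplementsCache, its methods, and the two closures are invented for the illustration, and the static per-request cache is ignored.

```php
<?php
// Toy model: $persistent stands in for the cached module_implements list,
// and the $alter callback for hook_module_implements_alter().
class ToyImplementsCache {
  public $persistent = array();

  public function get($hook, array $all, callable $alter) {
    if (!isset($this->persistent[$hook])) {
      // Cache miss: the alter step runs and its result is stored.
      $this->persistent[$hook] = $alter($all);
    }
    // Cache hit: the alter step is skipped entirely.
    return $this->persistent[$hook];
  }

  public function reset() {
    $this->persistent = array();
  }
}

$all = array('comment' => FALSE, 'xmlsitemap_node' => FALSE);
// A normal page view leaves the list alone; the migration strips a module.
$normal = function (array $list) { return $list; };
$migrating = function (array $list) {
  unset($list['xmlsitemap_node']);
  return $list;
};

$cache = new ToyImplementsCache();

// 1. The site has already warmed the cache, so the migration's alter
//    never runs and the unwanted hook is still present.
$cache->get('node_insert', $all, $normal);
$stale = $cache->get('node_insert', $all, $migrating);

// 2. Reset first (the preImport step): now the filtered list is built,
//    but it is also what normal page views get from the cache.
$cache->reset();
$filtered = $cache->get('node_insert', $all, $migrating);
$leaked = $cache->get('node_insert', $all, $normal);

// 3. Reset again (the postImport step): the next normal request
//    rebuilds the full list.
$cache->reset();
$restored = $cache->get('node_insert', $all, $normal);
```

In this model, step 1 shows why nothing happens without a reset, step 2 shows the disabled hooks leaking to the live site, and step 3 shows the second clear undoing it. The race described above is the window between the reset in step 2 and the migration's first lookup.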

So, I'm going to proceed along these lines. This feature will have to come with plenty of caveats - it will kill all hooks offered by a named module, and in some cases you won't want that (for example, if in addition to having annoying update/insert hooks it has some info hooks important to the site structure) - but in performance-critical applications I think it's well worth having the option.

Comments

mikeryan’s picture

Well, I've run into the caveat - I disabled the registration module due to annoyances from its entity_insert/entity_update hooks, and thus lost the entity_info hook and messed up all sorts of stuff. So, this needs to be more precisely targeted - specific hooks for specific modules. Something like:

  $api = array(
    'api' => 2,
    'migrations' => array(
      'Discussions' => array(
        'class_name' => 'MyDiscussionMigration',
        'disable_hooks' => array(
          'entity_insert' => array(
            'metatag', 
            'registration',
          ),
          'entity_update' => array(
            'metatag', 
            'registration',
          ),
          'node_insert' => array(
            'commons_follow_group',
            'commons_follow_node',
            'xmlsitemap_node',
          ),
          'node_update' => array(
            'commons_follow_group',
            'commons_follow_node',
            'xmlsitemap_node',
          ),
        ),
      ),
    ),
  );
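To show how a per-hook list of this shape might be applied, here is a minimal, standalone sketch. The function name example_filter_hook() and the sample implementation lists are hypothetical; the committed patch may well work differently.

```php
<?php
/**
 * Removes only the modules listed for this particular hook.
 *
 * $disable_hooks has the shape of the 'disable_hooks' argument:
 * hook name => list of module names. Hooks that are not listed keep
 * all of their implementations.
 */
function example_filter_hook(array $implementations, $hook, array $disable_hooks) {
  if (empty($disable_hooks[$hook])) {
    return $implementations;
  }
  return array_diff_key($implementations, array_flip($disable_hooks[$hook]));
}

$disable_hooks = array(
  'entity_insert' => array('metatag', 'registration'),
  'entity_update' => array('metatag', 'registration'),
);

// Hypothetical implementation lists, keyed by module name.
$entity_insert = array('metatag' => FALSE, 'registration' => FALSE, 'search_api' => FALSE);
$entity_info = array('node' => FALSE, 'registration' => FALSE);

$insert_after = example_filter_hook($entity_insert, 'entity_insert', $disable_hooks);
$info_after = example_filter_hook($entity_info, 'entity_info', $disable_hooks);

print implode(', ', array_keys($insert_after)) . "\n"; // prints: search_api
print implode(', ', array_keys($info_after)) . "\n";   // prints: node, registration
```

The point of keying by hook is visible in the second call: registration keeps its entity_info implementation even though its entity_insert implementation is removed.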

This will definitely be a power tool - you'll have to be very careful what hooks you disable, and to be sure to disable all relevant hooks (e.g., presave and insert hooks might be working in tandem). But in some cases it is critical, both for performance and to prevent unwanted actions. To identify what hooks are being called (and what they're costing you), you can use the --instrument=timer switch in drush in conjunction with the patch at https://drupal.org/node/1336588 (which I need to update).

mikeryan’s picture

Status: Active » Needs review
FileSize: 2.5 KB

I believe I've got it, not so bad after all, and by manipulating the module_implements static cache directly I think I can avoid any race conditions. I'll need to live with it for a few days on my current project before committing, but if anyone's curious here's the patch.

mikeryan’s picture

Status: Needs review » Fixed

Working well on my project, no naysayers, so committed.

JvE’s picture

Nice, got me a cool 25% performance increase there by disabling search_api and cer hook_entity_insert processing.

I could not get it to work at first and then I noticed the naming discrepancy between 'disabled_hooks' in your comment and 'disable_hooks' in the actual code.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Jehu’s picture

I can't get this to work with migrate-7.x-2.6-rc1+22-dev.
"wpk_2sage" is a custom module with hook_insert and hook_update implementations. Both contain just a watchdog call, to test whether they are called during migration.

My migrate_api looks like this:

...
    $api = array(
      'api'        => 2,
      'groups'     => array(
        'default' => array(
          'title'         => 'Import from Sage Office Line',
          'disable_hooks' => array(
            'node_update' => array(
              'wpk_2sage',
            ),
          ),
        ),
      ),
      'migrations' => array(
        'Lieferant'     => array(
          'class_name' => 'LieferantMigration',
          'group_name' => 'default',
        ),
        'Artikelgruppe' => array(
          'class_name' => 'ArtikelgruppeMigration',
          'group_name' => 'default',
        ),
        'Artikel'       => array(
          'class_name'    => 'ArtikelMigration',
          'group_name'    => 'default',
          'disable_hooks' => array(
            'node_update' => array(
              'wpk_2sage',
            ),
          ),
        ),
      ),
    );
...

Need help here...