Migrate API overview

Last updated on 24 January 2024

The Migrate API provides services for migrating data from a source system to Drupal 8. This guide focuses on the technical documentation of the Migrate API. For instructions on upgrading a Drupal 6 or Drupal 7 site to Drupal 8, refer to the Upgrade to Drupal 8 guide.

Migrate modules and executing migrations

  • The Drupal 8 core Migrate module implements the general-purpose framework.
  • The Drupal 8 core Migrate Drupal module builds on that foundation to provide an upgrade path from Drupal 6 and Drupal 7 to Drupal 8.
  • The Drupal 8 core Migrate Drupal UI module provides a browser user interface for Migrate Drupal.

Migrations can be executed with different tools:

  • Refer to the Upgrading to Drupal 8 handbook for how to execute Drupal 6/7 to Drupal 8 migrations.
  • Executing migrations from non-Drupal sources requires contributed modules that work together with the core Migrate API. Refer to the Executing migrations page of this handbook to learn more about this topic.

Migrations are Extract - Transform - Load (ETL) processes

Migration is an Extract, Transform, Load (ETL) process. In the Drupal Migrate API:

  • the extract phase is called source
  • the transform phase is called process
  • the load phase is called destination

It is important to understand that the term load in ETL means to load data into the storage, while in a typical Drupal context the term load refers to loading data from storage.

In the source phase, a set of data, called the row, is retrieved from the data source. The data can be migrated from a database, loaded from a file (for example CSV, JSON or XML) or fetched from a web service (for example RSS or REST). The row is sent to the process phase where it is transformed as needed or marked to be skipped. After processing, the transformed row is passed to the destination phase where it is loaded (saved) into the target Drupal site.
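
The three phases correspond to the three top-level sections of a migration definition. The following is a minimal sketch rather than a complete migration: the migration ID, the row values and the source field names (src_id, src_name) are hypothetical, and it assumes a vocabulary named tags exists on the target site.

```yaml
id: example_etl_skeleton
label: 'Example ETL skeleton'
# Extract: the source plugin produces the rows.
source:
  plugin: embedded_data
  data_rows:
    - src_id: 1
      src_name: 'First term'
  ids:
    src_id:
      type: integer
# Transform: each destination property is built from source values.
process:
  name: src_name
# Load: the destination plugin saves the result into Drupal.
destination:
  plugin: 'entity:taxonomy_term'
  default_bundle: tags
```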

Migrations can depend on one another. For example, nodes are typically migrated after users so that author references can be resolved. See the migration_dependencies example.
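
As a rough illustration, a dependency is declared with the migration_dependencies key in the migration definition. The migration IDs below (example_articles, example_users, example_tags) are hypothetical placeholders.

```yaml
id: example_articles
label: 'Example articles'
migration_dependencies:
  # Migrations that must be imported before this one can run.
  required:
    - example_users
  # Migrations that should run first when available, but are not enforced.
  optional:
    - example_tags
```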

Migrate API plugins

Migration plugins specify individual ETL migrations, such as node, user or taxonomy term migration.

  • Migration plugins are defined in YAML format.
  • Examples for migrating nodes, users and other entities from non-Drupal sources.
  • Reading the migration plugins defined by Drupal core and contributed modules is also a very good way to learn about migration plugins. These migrations mainly extract data from a Drupal 6 or Drupal 7 database, and can be found in a module's 'migrations' directory.

Source plugins extract the data from the source.

Process plugins transform the data.

Destination plugins save the data to Drupal 8.

Stubs

Taxonomy terms are an example of a data structure where an entity can reference a parent. When a term is being migrated, it is possible that its parent term has not yet been migrated. Migrate API addresses this 'chicken or the egg' dilemma by creating a stub term for the parent so that the child term can establish a reference to it. When the parent term is eventually migrated, Migrate API updates the previously created stub with the actual content.
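
In practice, stubs are typically created by the core migration_lookup process plugin when the referenced row has not been migrated yet. The fragment below is a sketch under assumptions: the migration ID example_terms and the source field src_parent_id are hypothetical, and terms without a parent would need extra handling that is not shown here.

```yaml
process:
  parent:
    # Look up the destination ID of the parent term in the map of the
    # same (hypothetical) terms migration. If the parent has not been
    # migrated yet, a stub term is created and its ID is returned.
    plugin: migration_lookup
    migration: example_terms
    source: src_parent_id
    # Stub creation is the default; set this to true to disable it.
    no_stub: false
```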

Map tables

Once a migrated row is saved and the destination ID is known, Migrate API saves the source ID, destination ID and the row hash into a map table (see Track Changes option). The source ID and the hash in the map allow for tracking changes for continuous migrations. The map between source and destination ID also allows for looking up values during other migrations that are executed later.
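
The lookup described above might be configured as in the following sketch, which translates a source author ID into the ID of the user created by an earlier migration. The migration ID example_users and the source field src_author_id are hypothetical.

```yaml
process:
  uid:
    # Read the map table of the users migration to find the destination
    # user ID that corresponds to the source author ID.
    plugin: migration_lookup
    migration: example_users
    source: src_author_id
    # Do not create placeholder users for authors that were not migrated.
    no_stub: true
```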

Highwater marks

Highwater marks allow Migrate API to track changes so that we can migrate only content that has been created or updated in the source since the migration was previously executed. This requires the source plugin to have a special high_water_property property. This can be any property that indicates the highest (or most recent) value migrated so far.

Let's use nodes as an example. If we used nid as the high_water_property, the migration system would keep track of the highest nid migrated so far. When the migration is executed again, only nodes with a higher nid would be migrated. In other words, only nodes that were created in the source system since the previous migration. Nodes also have a timestamp in the changed property. This timestamp is populated when a node is created or an existing node is updated. If this property is used as the highwater property, the next migration would include the nodes that have a higher value in the changed property. In other words, the nodes that have been created or updated in the source system since the previous migration.

Example using the node entity ID as a highwater mark for a Drupal upgrade migration:

source:
  plugin: d7_node_complete
  node_type: article
  high_water_property:
    name: nid
    alias: n

Note: when using highwater marks, it is critical that your source data be sorted by the highwater field (for example, the timestamp). If the data is out of order, some changed rows might be skipped, and other rows might be unnecessarily remigrated.

Note: when using a timestamp property as a highwater mark, the value must be unique, as multiple records with the same timestamp will cause unpredictable results.
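
For comparison with the nid example, the following sketch uses the changed timestamp as the highwater mark so that updated nodes are picked up as well as new ones. It assumes the core d7_node source plugin and that changed is selected from the table aliased as n in its query; verify the alias against the source plugin you actually use.

```yaml
source:
  plugin: d7_node
  node_type: article
  high_water_property:
    # The changed timestamp is set on both creation and update.
    name: changed
    # Table alias in the source query (assumed to be n here).
    alias: n
```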

A slower alternative to using highwater marks would be to use the track_changes property instead.
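
A sketch of that alternative is shown below. It assumes the same d7_node source as in the earlier examples and leaves out high_water_property, since the two options are meant as alternatives rather than to be combined.

```yaml
source:
  plugin: d7_node
  node_type: article
  # Hash each source row and store the hash in the map table. On later
  # runs, rows whose hash has changed are imported again.
  track_changes: true
```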

Events and Hooks

Often there is a need to run custom actions at different points during a migration. Traditionally, Drupal has used hooks for this. However, since Drupal 8 there has been a move towards events over hooks, and the Migrate module is no exception. Currently, only two types of hooks are available for the Migrate module: hook_migrate_prepare_row and hook_migration_plugins_alter. While these are useful, they are less flexible than events.

A detailed list of migration events can be found here.

For quick reference, these are the events, which are defined as constants on the \Drupal\migrate\Event\MigrateEvents class:

  1. MAP_SAVE - This event allows modules to perform an action whenever the disposition of an item being migrated is saved to the map table.
  2. MAP_DELETE - This event allows modules to perform an action whenever a row is deleted from a migration's map table (implying it has been rolled back).
  3. PRE_IMPORT - This event allows modules to perform an action whenever a migration import operation is about to begin.
  4. POST_IMPORT - This event allows modules to perform an action whenever a migration import operation is completing.
  5. PRE_ROW_SAVE - This event allows modules to perform an action whenever a specific item is about to be saved by the destination plugin.
  6. POST_ROW_SAVE - This event allows modules to perform an action whenever a specific item has been saved by the destination plugin.
  7. PRE_ROLLBACK - This event allows modules to perform an action whenever a migration rollback operation is about to begin.
  8. POST_ROLLBACK - This event allows modules to perform an action whenever a migration rollback operation is completing.
  9. PRE_ROW_DELETE - This event allows modules to perform an action whenever a specific item is about to be deleted by the destination plugin.
  10. POST_ROW_DELETE - This event allows modules to perform an action whenever a specific item has been deleted by the destination plugin.
  11. IDMAP_MESSAGE - This event allows modules to perform an action whenever a message is being logged by the ID map.

Creating an Event Subscriber

To create an event subscriber, you will create both an EventSubscriber class and an entry in your module's services.yml file.

In your module, create the directory structure "src/EventSubscriber". Then create a file for the subscriber with a name that describes its purpose. Keep in mind that a subscriber can subscribe to one or more events: the same class can handle the POST_IMPORT event and the PRE_ROLLBACK event, or all of the events if you wish. A common preference is to have one class per event, which keeps the logic isolated and makes problems easier to find and troubleshoot.

In this case, create a file called "MyModuleMigrationSubscriber.php" containing a class of the same name (see example below). Within the class, implement getSubscribedEvents() to return an array that associates each event your class acts on with one of its methods. Then declare each method and take the needed actions. The example below is based on code used on a production website (though names have been changed to protect witnesses).

After you have created your subscriber class, add a reference to it in your module's services.yml file (see example below). Finally, clear the cache so that Drupal recognizes it.

The MyModuleMigrationSubscriber.php file:

<?php

namespace Drupal\MY_MODULE\EventSubscriber;

use Drupal\migrate\Event\MigrateEvents;
use Drupal\migrate\Event\MigrateImportEvent;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;

/**
 * Checks that the image server is available before product migrations run.
 *
 * @package Drupal\MY_MODULE
 */
class MyModuleMigrationSubscriber implements EventSubscriberInterface {

  /**
   * {@inheritdoc}
   */
  public static function getSubscribedEvents() {
    // Initialize the array before adding to it, then map each event to
    // the method of this class that handles it.
    $events = [];
    $events[MigrateEvents::PRE_IMPORT][] = ['onMigratePreImport'];
    $events[MigrateEvents::POST_IMPORT][] = ['onMigratePostImport'];
    return $events;
  }

  /**
   * Check for the image server status just once to avoid thousands of requests.
   *
   * @param \Drupal\migrate\Event\MigrateImportEvent $event
   *   The import event object.
   */
  public function onMigratePreImport(MigrateImportEvent $event) {
    $migration_id = $event->getMigration()->getBaseId();

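    // Only act on migrations whose base ID ends with "_products". Unlike
    // strpos() with a negative offset, substr() is safe for short IDs.
    if (substr($migration_id, -9) === '_products') {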
      $store = \Drupal::service('tempstore.private')->get('my_module_migrations');

      if ($this->checkImageServerStatus('https://www.TESTDOMAIN.com')) {
        $store->set('server_available', TRUE);
      }
      else {
        $store->set('server_available', FALSE);
        $event->logMessage('The server is unreachable.');
      }
    }
  }

  /**
   * Checks the status of the image server.
   *
   * @param string $url
   *   The URL to check.
   *
   * @return bool
   *   TRUE if the image server is available, FALSE otherwise.
   */
  private function checkImageServerStatus($url) {
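    // Suppress warnings; get_headers() returns FALSE if the request fails.
    $headers = @get_headers($url);

    // The first header is the status line, for example "HTTP/1.1 200 OK".
    // Compare against FALSE so that a match at position 0 still counts.
    return $headers && strpos($headers[0], '200') !== FALSE;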
  }

  /**
   * Cleans up the stored server status once the import has finished.
   *
   * @param \Drupal\migrate\Event\MigrateImportEvent $event
   *   The import event object.
   */
  public function onMigratePostImport(MigrateImportEvent $event) {
    $migration_id = $event->getMigration()->getBaseId();

    // Do a little bit of cleanup, but only for migrations whose base ID
    // ends with "_products".
    if (substr($migration_id, -9) === '_products') {
      $store = \Drupal::service('tempstore.private')->get('my_module_migrations');
      $store->delete('server_available');
    }
  }

}

The my_module.services.yml file:

services:
  my_module.migration_subscriber:
    class: Drupal\MY_MODULE\EventSubscriber\MyModuleMigrationSubscriber
    tags:
      - { name: 'event_subscriber' }

Rollbacks

It is quite typical that when developing a migration, the first version does not provide correct results for all scenarios. Rollbacks allow you to undo a migration, adjust the migration and then execute it again.

Glossary

Migration as configuration: Migrations written in YAML format that require the contributed Migrate Plus module to work. They are placed in a location where the Configuration Management system can detect new configuration. This can be your site's /config/sync directory or a module's /config/install directory. The file naming convention is migrate_plus.migration.[migration_id].yml. Adding, removing, or changing these files requires syncing the configuration to Drupal's active storage. This can be done using drush config:import if you modify the files directly in your site's /config/sync directory. If the files are placed in a custom module, you can execute drush config:import --partial --source=modules/custom/my_module/config/install/ for the changes to be detected.

Migration as plugins: Migrations written in YAML format that work with Drupal core's Migrate module. They are placed in a /migrations directory and follow the naming convention [migration_id].yml. Adding, removing, or changing these files requires rebuilding Drupal's plugin cache. This can be done using drush cache:rebuild.

Migration runner: Once the migration YAML files have been created, there are many options to execute them. Migrations as plugins can be executed from the command line with the Drush commands provided by the Migrate Tools module. Migrations as configuration can also be executed from the command line or from the user interface provided by the Migrate Tools module at /admin/structure/migrate. Other migration runners include Migrate Upgrade, Migrate Manifest, Migrate Scheduler, Migrate Cron, and Migrate Source UI.

Process pipeline: A sequence of process plugins that applies multiple transformations to a source value. They act like Unix pipelines in that the output of one process plugin becomes the input of the next one in the pipeline. The return value of the last plugin is assigned to the field mapping. Read this article and see the example below for more information.

Pseudofield: A placeholder value to use later in the process pipeline. Pseudofields are defined in the process section. The name can be arbitrary as long as it is ignored by the destination plugin. If you are doing content migrations, using or extending the EntityContentBase plugin, the name should not conflict with a property or field name attached to the target entity. For example, a node migration into Drupal's Article content type that uses the entity:node plugin cannot have a pseudofield named title (entity property) or field_tags (field name). To use the value, you refer to it with '@name'. To set the value of the pseudofield, it is possible to use any process pipeline. See the example below.

Source constants: Placeholder values to use later in the process pipeline. They are defined in the source section under a constants key (by convention). The constants themselves are defined in name: value format. To use the value, you refer to it with constants/name. Read this article and see the example below for more information.

Subfield: A field might store complex data. This data is stored in multiple subfields. For example, a rich text field has one subfield to store the text value and another for the text format. To set the value of a subfield, you follow this pattern: [field_name]/[subfield_name]: [source_value]. See the example below.

The example below defines two source constants: title_suffix and text_format. It also defines a pseudofield named pseudo_title, which uses a process pipeline consisting of two transformations by the callback plugin. Note that only the first plugin in the pipeline requires a source configuration. The pseudofield is later used when setting the title property for the node. There are mappings for two subfields of the body field: value and format.

id: example_migration
label: 'Example migration'
source:
  constants:
    title_suffix: ' (example)'
    text_format: plain_text
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      src_title: 'DRUPAL MIGRATIONS'
      src_content: 'Example content'
    - unique_id: 2
      src_title: 'DRUPAL UPGRADES'
      src_content: 'Example content'
  ids:
    unique_id:
      type: integer
process:
  pseudo_title:
    - plugin: callback
      source: src_title
      callable: mb_strtolower
    - plugin: callback
      callable: ucwords
  title:
    plugin: concat
    source:
      - '@pseudo_title'
      - constants/title_suffix
  body/value: src_content
  body/format: constants/text_format
destination:
  plugin: 'entity:node'
  default_bundle: page

Further reading

If you're looking for practical instructions on upgrading your Drupal 6 or Drupal 7 site to Drupal 8, please refer to the Upgrade to Drupal 8 documentation guide.

The Migrate Plus module comes with several helpful sub-modules, a couple of which provide example code for migrations from both Drupal and non-Drupal sources.
