Migrating files and images

Last updated on
16 February 2023

This page currently describes the migration of File entities. Since Drupal 8, core also includes Media entities, which require their own migrations. Feel free to extend this documentation with examples of Media migrations.

Drupal content types can have image or file fields where image files or attachments such as PDF files can be added. The file migration can be done in two separate migrations as follows:

  • First, migrate the files using the entity:file destination plugin.
  • Then use the migration_lookup process plugin in the node migration to associate the previously migrated files with the nodes.
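
The two steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes core's file_copy process plugin, and the migration id example_files, the source properties fid, source_path and filename, and the destination directory are all hypothetical names chosen for this sketch. Each YAML document would live in its own migration file.

```yaml
# File 1 (hypothetical id): creates one file entity per source row.
id: example_files
label: 'Example file migration'
source:
  plugin: embedded_data
  data_rows:
    -
      fid: 1
      source_path: '/path/to/source/foo.txt'
      filename: 'foo.txt'
  ids:
    fid:
      type: integer
  constants:
    destination_dir: 'public://attachments/'
process:
  # Pseudo-field holding the full destination URI.
  _destination_uri:
    plugin: concat
    source:
      - constants/destination_dir
      - filename
  # file_copy copies the file and returns the destination URI.
  uri:
    plugin: file_copy
    source:
      - source_path
      - '@_destination_uri'
  filename: filename
destination:
  plugin: entity:file
---
# File 2: fragment of the node migration.
process:
  field_attachment/target_id:
    plugin: migration_lookup
    migration: example_files
    # Assumes the node source rows carry the same fid value.
    source: fid
migration_dependencies:
  required:
    - example_files
```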

An alternative and much simpler approach is to use the contributed Migrate Files (extended) module. It provides process plugins for migrating files and images from an internal or external source, and a third process plugin for migrating remote files so that they are used from the remote location. Note that with this approach some operations, such as rolling back a migration, have no effect on the files imported this way.

The rest of this tutorial describes how you can use the Migrate Files (extended) module plugins in some specific scenarios:

Example: migrating files by copying them from an external source

  • The example below assumes that the Article content type has a field field_attachment which accepts txt files.
  • The example uses the embedded_data source plugin for the sake of simplicity. We have two rows of data in this example.

id: custom_article_migration_with_external_files
label: 'Custom article migration with external files'
source:
  plugin: embedded_data
  data_rows:
    -
      id: 1
      title: 'Page 1 title'
      file: 'https://www.drupal.org/files/issues/2018-06-23/interdiff-2944846-2-5.txt'
    -
      id: 2
      title: 'Page 2 title'
      file: 'https://www.drupal.org/files/issues/interdiff-2933620-38-47.txt'
  ids:
    id:
      type: integer
process:
  nid: id
  title: title
  field_attachment:
    plugin: file_import
    source: file
destination:
  plugin: entity:node
  default_bundle: article

The important part of the example migration above is the file_import process plugin.

  • The source configuration key is required and it contains the source path or URL for the file to be migrated. 
  • In this example our source plugin provides a full URL for the file to be downloaded but the value could also be /path/to/foo.txt or public://bar.txt if the file is already present in your file system.
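
As an illustration, the data rows below show the three kinds of source value described above. The local path and the public:// URI are the placeholder values from the text, not real files, and the third row is added only for this sketch.

```yaml
source:
  plugin: embedded_data
  data_rows:
    -
      id: 1
      # Remote URL: the file is downloaded.
      file: 'https://www.drupal.org/files/issues/interdiff-2933620-38-47.txt'
    -
      id: 2
      # Absolute path on the local file system (placeholder).
      file: '/path/to/foo.txt'
    -
      id: 3
      # Stream wrapper URI of a file already present in the public files directory (placeholder).
      file: 'public://bar.txt'
  ids:
    id:
      type: integer
```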

Optional configuration keys:

  • destination: (recommended) The destination path or URI to import the file to. If no destination is set, it will default to "public://". The destination property works like the source in that you can reference source or destination properties for its value. This allows you to build dynamic destination paths based on source or destination values (see the "Dynamic File Path Destinations" section below for an example). However, this means if you want to assign a static destination value in your migration, you will need to use a constant. To provide a directory path (to which the file is saved using its original name), a trailing slash must be used to differentiate it from being a filename. If no trailing slash is provided the path will be assumed to be the destination filename.
  • uid: The user ID (uid) to attribute the file entity to. Defaults to 0.
  • move: Boolean. If set to TRUE, move the file, otherwise copy the file. Only applicable if the source file is local. If the source file is remote it will be always copied. Defaults to FALSE.
  • file_exists: Action to perform if the destination file already exists.
    • 'replace' - (default) Replace the existing file.
    • 'rename' - Ensure the destination filename is unique by appending '_{incrementing number}'.
    • 'use existing' - The existing destination file is used.
  • skip_on_missing_source: Boolean. If set to TRUE, this field will be skipped if the source file is missing (either not available locally or the remote server returns HTTP 404 'file not found'). Otherwise, the row will fail with an error. Note that if you are importing a lot of remote files, this check will greatly reduce the speed of your import as it requires an HTTP request per file to check if the file exists. Defaults to FALSE.
  • skip_on_error: Boolean. If set to TRUE, this field will be skipped if any error occurs during the file import (including missing source files). Otherwise, the row will fail with an error. Defaults to FALSE.
  • id_only: Boolean. If set to TRUE, the process will return just the id instead of an entity reference array. Useful if you want to manage other sub-fields in your migration (see other example below).

The 'destination' and 'uid' configuration fields support copying destination values using the @ character. Values using @ must be wrapped in quotes.
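
A sketch of how several of these keys might be combined. It assumes that uid is resolved the same way as destination (as a source or destination property, hence the constant), that the source provides a file_description property, and that the file field has its description sub-field enabled; the constant names file_destination and file_owner are illustrative only.

```yaml
source:
  # ...
  constants:
    # Trailing slash: treated as a directory, original file names are kept.
    file_destination: 'public://attachments/'
    # Hypothetical owner of the imported files.
    file_owner: 1
process:
  # ...
  field_attachment/target_id:
    plugin: file_import
    source: file
    destination: constants/file_destination
    uid: constants/file_owner
    file_exists: rename
    skip_on_missing_source: true
    # Return only the file ID so that other sub-fields can be set separately.
    id_only: true
  # Assumes the source provides a file_description property.
  field_attachment/description: file_description
```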

Dynamic File Path Destinations

Because the destination property can reference a destination value, you can build dynamic file paths. First, create a temporary field (you can name it whatever you want, as long as it isn't the name of a field on the migrate destination entity/object):

process:
  # ...
  _file_destination:
    plugin: concat
    source:
      - constants/file_destination
      - constants/directory_separator
      - '@text_field_1'
      - constants/directory_separator
      - '@text_field_2'
      - constants/directory_separator

Now you can use the temporary field as the destination value:

process:
  # ...
  field_file:
    plugin: file_import
    source: file
    destination: '@_file_destination'
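
The two fragments above rely on constants and destination values that are not shown. A minimal sketch of what they might look like follows; the constant values and the source properties category and year are assumptions made for illustration. The text_field_1 and text_field_2 mappings must appear before _file_destination in the process section so that their values exist when concat runs.

```yaml
source:
  # ...
  constants:
    # No trailing slash here; the separators are added by concat.
    file_destination: 'public://attachments'
    directory_separator: '/'
process:
  # Hypothetical mappings; category and year are assumed source properties.
  text_field_1: category
  text_field_2: year
  # With category "reports" and year "2023", the concat step yields
  # public://attachments/reports/2023/
  # The trailing separator marks the value as a directory, so the file
  # keeps its original name.
```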

Example: migrating images by copying them from an external source

  • The example below assumes that the Article content type has a field field_image which accepts PNG files and that image alt and title fields are enabled. 
  • This example demonstrates how to define the destination directory where the image files will be downloaded.
  • The example uses the embedded_data source plugin for the sake of simplicity. We have two rows of data in this example.

id: custom_article_migration_with_external_images
label: 'Custom article migration with external image files'
source:
  plugin: embedded_data
  data_rows:
    -
      id: 1
      title: 'Page 1 title'
      file: 'https://www.drupal.org/files/druplicon-small.png'
      file_title: 'Druplicon logo'  
    -
      id: 2
      title: 'Page 2 title'
      file: 'https://www.drupal.org/files/drupal_logo-blue.png'
      file_title: 'Drupal logo'
  ids:
    id:
      type: integer
  constants:
    file_destination: 'public://images/'
process:
  nid: id
  title: title
  field_image:
    plugin: image_import
    source: file
    destination: 'constants/file_destination'
    title: file_title
    alt: '!file'
destination:
  plugin: entity:node
  default_bundle: article

The image_import process plugin extends the file_import plugin. In addition to the configuration keys inherited from file_import, it accepts the following optional configuration keys:

  • alt: The alt attribute for the image.
  • title: The title attribute for the image.
  • width: The width of the image.
  • height: The height of the image.

All of the above keys support copying destination values by prefixing the property name with the @ sign. Values using @ must be wrapped in quotes.

Additionally, the special value '!file' is available, as demonstrated in the example above. It populates a key such as alt or title with the name of the imported file.
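
For instance, the two mechanisms could be combined as in the sketch below, which assumes the same file source property and file_destination constant as the example above.

```yaml
process:
  title: title
  field_image:
    plugin: image_import
    source: file
    destination: constants/file_destination
    # Copy the already-processed node title (a destination value).
    alt: '@title'
    # Use the name of the imported file.
    title: '!file'
```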

Example: migrating file entities using remote stream wrapper

  • The contributed Remote Stream Wrapper module provides a capability to use files directly from a remote location instead of the file system of your Drupal installation.
  • The file_remote_url process plugin can be used to migrate the remote URL to the field that uses the Remote Stream Wrapper.
  • The example below assumes that the Article content type has a field field_remote_file which uses the contributed Remote Stream Wrapper Widget.

id: custom_article_migration_with_remote_file
label: 'Custom article migration with remote file'
source:
  plugin: embedded_data
  data_rows:
    -
      id: 1
      title: 'Page 1 title'
      file: 'https://www.drupal.org/files/druplicon-small.png'
    -
      id: 2
      title: 'Page 2 title'
      file: 'https://www.drupal.org/files/drupal_logo-blue.png'
  ids:
    id:
      type: integer
process:
  nid: id
  title: title
  field_remote_file:
    plugin: file_remote_url
    source: file
destination:
  plugin: entity:node
  default_bundle: article

Example: migrating media entities using remote stream wrapper

  • The contributed Remote Stream Wrapper module provides a capability to use files directly from a remote location instead of the file system of your Drupal installation.
  • The file_remote_url process plugin can be used to migrate the remote URL to the field that uses the Remote Stream Wrapper. This may be configured within your media entity bundle.
  • The example below assumes that the Article content type has a field field_picture that references a media bundle which uses an image field that is also configured to use the contributed Remote Stream Wrapper Widget.

id: custom_article_migration_with_remote_media
label: 'Custom article migration with remote media'
source:
  plugin: embedded_data
  data_rows:
    -
      id: 1
      title: 'Page 1 title'
      file: 'https://www.drupal.org/files/druplicon-small.png'
    -
      id: 2
      title: 'Page 2 title'
      file: 'https://www.drupal.org/files/drupal_logo-blue.png'
  ids:
    id:
      type: integer
process:
  nid: id
  title: title
  field_picture:
    plugin: migration_lookup
    migration: my_migration_id_media
    source: id
  field_remote_file:
    plugin: file_remote_url
    source: file
destination:
  plugin: entity:node
  default_bundle: article
migration_dependencies:
  required:
    - my_migration_id_media

Notice the use of the migration_lookup plugin. It uses the id shared by both migrations to link each node to the media entity created for the same source row. Also notice that the media migration is declared as a required dependency, so it has to run before this migration.

You will need to create a separate migration to import the images as media entities. It will look something like this:

id: my_migration_id_media
label: 'Remote Media import for Custom article'
source:
  plugin: embedded_data
  data_rows:
    -
      id: 1
      title: 'Page 1 title'
      file: 'https://www.drupal.org/files/druplicon-small.png'
    -
      id: 2
      title: 'Page 2 title'
      file: 'https://www.drupal.org/files/drupal_logo-blue.png'
  ids:
    id:
      type: integer
process:
  mid: id
  field_media_image:
    plugin: file_remote_url
    source: file
  field_title: title
destination:
  plugin: 'entity:media'
  default_bundle: remote_image

It's worth noting that we are populating field_media_image, the actual image field that we configured to use the Remote Stream Wrapper widget mentioned earlier, and not field_picture, which is just an entity reference to the media entity that holds this field and any others you may have added.

Also note that we are specifying that the destination uses the 'entity:media' plugin.

Other source plugins

The examples on this page use the embedded_data source plugin for the sake of simplicity, so that each one can be copied and pasted as a working example. In the real world, you would most probably migrate the nodes using, for example, the CSV source plugin, or write a simple SQL source plugin to read the source data from another database.
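
As a rough sketch, the first example could read its rows from a CSV file instead. This assumes the contributed Migrate Source CSV module with its 3.x configuration keys (older releases use different key names), a hypothetical file path, and a CSV file whose header row contains the columns id, title and file.

```yaml
source:
  plugin: csv
  # Hypothetical path to the CSV file.
  path: 'public://import/articles.csv'
  # The first row (offset 0) holds the column names.
  header_offset: 0
  ids:
    - id
  constants:
    file_destination: 'public://attachments/'
process:
  title: title
  field_attachment:
    plugin: file_import
    source: file
    destination: constants/file_destination
destination:
  plugin: entity:node
  default_bundle: article
```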

Executing the migrations

Read more on how to execute migrations.
