This project is not covered by Drupal’s security advisory policy.

This module provides a Migrate process plugin that checks whether a migrated remote image url (or string) is in fact a valid address for a media asset such as an image. It does this by requesting each image and checking whether it returns a valid response.

If this fails, it will remove any query string that has been appended to the image url and retry, to see whether the url without the query string returns a successful response.

If the response is still not OK, it will fall back to an empty string, so as to avoid exceptions when migrating the remote image asset.
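The request, strip-and-retry, and fallback steps above can be sketched in Python as follows. This is not the module's code (the module is a PHP Migrate plugin); `http_probe`, `strip_query` and `check_remote_image` are hypothetical names, and treating any 2xx status (after following redirects) as a "valid response" is an assumption. The probe is passed in as a function so the logic can be exercised without network access.

```python
import urllib.request
from typing import Callable
from urllib.parse import urlsplit

# Hypothetical probe type: returns True when a request for the url succeeds.
Probe = Callable[[str], bool]


def http_probe(url: str, timeout: float = 10.0) -> bool:
    """Assumed probe: GET the url and treat any 2xx status as a valid response."""
    try:
        if urlsplit(url).scheme not in ("http", "https"):
            return False  # only absolute http(s) urls are requested
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError):
        # HTTPError and URLError are OSError subclasses; timeouts and
        # unparseable urls also end up here.
        return False


def strip_query(url: str) -> str:
    """Return the url with its query string removed (other parts kept)."""
    return urlsplit(url)._replace(query="").geturl()


def check_remote_image(url: str, probe: Probe) -> str:
    """Return a url that responds OK, or "" when none does."""
    # 1. Try the url exactly as migrated.
    if probe(url):
        return url
    # 2. If it carries a query string, retry without it.
    stripped = strip_query(url)
    if stripped != url and probe(stripped):
        return stripped
    # 3. Fall back to an empty string so later steps do not throw.
    return ""
```

In a real run `http_probe` would be passed as the probe; the design choice of injecting it keeps the fallback order testable in isolation.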

Additionally, a migration can configure this plugin to always strip the query string from asset urls, if desired.

Background

There are a number of gotchas when handling remote assets, in particular remote images. I have identified the following scenarios:

  1. Remote image is not a valid url as defined by FILTER_VALIDATE_URL
  2. Remote image url is relative, not absolute
  3. Remote image references localhost, e.g. http:///
  4. Some remote images with query strings render while others don't (see examples below)
  5. On occasion, the url is simply malformed, which is beyond our control
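Scenarios 1 to 3 can be detected before any request is made. The Python sketch below is a rough approximation only: PHP's FILTER_VALIDATE_URL has no exact Python equivalent, so "absolute http(s) url with a host" stands in for it, and the `classify_image_url` name, its labels and the `LOCAL_HOSTS` set are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical set of host names treated as "localhost" (scenario 3).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def classify_image_url(url: str) -> str:
    """Return "ok", or a label for which scenario the url falls into."""
    try:
        parts = urlsplit(url.strip())
        host = parts.hostname or ""
    except ValueError:
        return "invalid"  # cannot even be parsed (scenarios 1 and 5)
    if not parts.scheme:
        return "relative"  # scenario 2, e.g. /images/605914
    if parts.scheme not in ("http", "https"):
        return "invalid"  # scenario 1, rough stand-in for FILTER_VALIDATE_URL
    if host == "" or host in LOCAL_HOSTS:
        return "localhost"  # scenario 3, e.g. http:/// or http://localhost/
    return "ok"
```

Scenarios 4 and 5 cannot be decided from the url's shape alone, which is why the plugin actually requests each image.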

Example 1
https://images.lbc.co.uk/images/605914?crop=16_9&width=660&relax=1&format=webp&signature=NMDC95Yh7RrJFuycLD04j52xUeA=

This works, but this does not:

https://images.lbc.co.uk/images/605914

Example 2
https://www.telegraph.co.uk/content/dam/news/2023/02/03/TELEMMGLPICT000324170962_trans_NvBQzQNjv4Bq1V8_3oXt_XBWwkgI1jrKEeDSV_dXcWbrTlT5gho2zKg.jpeg?impolicy=logo-overlay

This does not work, but this does:

https://www.telegraph.co.uk/content/dam/news/2023/02/03/TELEMMGLPICT000324170962_trans_NvBQzQNjv4Bq1V8_3oXt_XBWwkgI1jrKEeDSV_dXcWbrTlT5gho2zKg.jpeg

Example 3
https://images.lbc.co.uk/images/605919?width=1200&crop=16_9&signature=35vrQw9Zg953mYC5R8m4MbaUjhs=

This does not work and looks malformed; however, this does:

https://images.lbc.co.uk/images/605919?width=1200&crop=16_9&signature=35vrQw9Zg953mYC5R8m4MbaUjhs=

This module depends on drupal:migrate.

Features

By default, this plugin accepts urls with query strings. However, you can configure it to always remove the query string. In that case it validates the url without the query string: if the request for the modified url succeeds, it uses that value; if not, it defaults to an empty string.
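The difference between the default behaviour and remove_query_string: true can be sketched in Python, replaying the outcomes reported in Examples 1 and 2 above through a simulated probe (no real requests are made). `resolve_image_url`, `strip_query` and `fake_probe` are hypothetical names; the branch for the option follows the description above, where only the stripped url is tried.

```python
from typing import Callable
from urllib.parse import urlsplit


def strip_query(url: str) -> str:
    """Drop the query string, keeping scheme, host, path and fragment."""
    return urlsplit(url)._replace(query="").geturl()


def resolve_image_url(url: str, probe: Callable[[str], bool],
                      remove_query_string: bool = False) -> str:
    if remove_query_string:
        # Option on: only the stripped url is validated.
        candidate = strip_query(url)
        return candidate if probe(candidate) else ""
    # Default: original url first, stripped url as a fallback.
    if probe(url):
        return url
    candidate = strip_query(url)
    return candidate if candidate != url and probe(candidate) else ""


# Simulated outcomes copied from Examples 1 and 2 (not live requests).
EX1 = ("https://images.lbc.co.uk/images/605914?crop=16_9&width=660"
       "&relax=1&format=webp&signature=NMDC95Yh7RrJFuycLD04j52xUeA=")
EX2 = ("https://www.telegraph.co.uk/content/dam/news/2023/02/03/"
       "TELEMMGLPICT000324170962_trans_NvBQzQNjv4Bq1V8_3oXt_XBWwkgI1jrKEeDSV_dXcWbrTlT5gho2zKg.jpeg"
       "?impolicy=logo-overlay")
WORKS = {EX1, strip_query(EX2)}  # Example 1 needs its query; Example 2 must lose it


def fake_probe(url: str) -> bool:
    return url in WORKS
```

Under these simulated outcomes, turning the option on rescues Example 2 either way but turns Example 1 into an empty string, because its signed query string is required. That trade-off is worth weighing before enabling remove_query_string for a whole source.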

To enable this option, set remove_query_string: true in your migration config. See below for a working example.

Example Usage

process:
  'body/value':
   -
     plugin: migrate_process_html
     source: link
     enablejs: false # optional, defaults to true
   -
     plugin: dom
     method: import
   -
     plugin: dom_select
     selector: //meta[@property="og:image"]/@content
   -
     plugin: skip_on_empty
     method: row
     message: 'Field image is missing'
   -
     plugin: extract
     index:
       - 0
   -
     plugin: migrate_process_remote_image_check
     remove_query_string: true # default false
   -
     plugin: skip_on_condition
     method: row
     condition:
       plugin: not:matches
       regex: /^(https?:\/\/)[\w\d]/i
     message: 'We only want a string if it starts with http(s)://[\w\d]'
   -
     plugin: file_remote_url
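The skip_on_condition step is what turns the plugin's empty-string fallback into a skipped row: an empty string (or a url such as http:///) does not match the regex, so file_remote_url never sees it. The Python sketch below mirrors that regex; `keep_row` is a hypothetical name, and re.ASCII is an assumption chosen to match PCRE, where \w is ASCII-only without the /u modifier.

```python
import re

# Same pattern as the migration's /^(https?:\/\/)[\w\d]/i condition.
STARTS_WITH_HTTP = re.compile(r"^(https?://)[\w\d]", re.IGNORECASE | re.ASCII)


def keep_row(value: str) -> bool:
    """True if the value passes the check, i.e. the row is not skipped."""
    return STARTS_WITH_HTTP.match(value) is not None
```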
