This project is not covered by Drupal’s security advisory policy.
This module provides a Migrate process plugin that checks whether a migrated remote image URL (a string) is in fact a valid address for a media asset such as an image. It does this by requesting each image and checking whether it returns a valid response.
If this fails, it attempts to remove any query string that has been appended to the image URL, to see whether removing it produces a successful response.
If the response is still not OK, it falls back to an empty string so as to avoid exceptions when trying to migrate the remote image asset.
Additionally, this plugin allows a migration to strip the query string from asset URLs by default, if desired.
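The flow described above can be sketched roughly as follows. This is an illustration in Python, not the module's PHP implementation: the function names are hypothetical, and `is_ok` is an assumed stand-in for whatever HTTP request the plugin makes, passed in so the logic can be followed (and tried) without network access.

```python
from urllib.parse import urlsplit


def strip_query(url: str) -> str:
    """Return the URL with its query string removed."""
    return urlsplit(url)._replace(query="").geturl()


def check_remote_image(url, is_ok, remove_query_string=False):
    """Return a usable image URL, or "" when no variant gets an OK response.

    is_ok is a hypothetical callable (url -> bool) standing in for the
    request the plugin performs; it is an assumption of this sketch.
    """
    bare = strip_query(url)
    if remove_query_string:
        # Option enabled: only the stripped URL is considered.
        return bare if is_ok(bare) else ""
    if is_ok(url):
        return url
    # Retry without the query string before giving up.
    if bare != url and is_ok(bare):
        return bare
    # Fall back to an empty string so later steps can skip the row.
    return ""


# Pretend only the URL without a query string is reachable.
reachable = {"https://example.com/a.jpeg"}
print(check_remote_image("https://example.com/a.jpeg?impolicy=x",
                         reachable.__contains__))
# prints "https://example.com/a.jpeg"
```

Returning an empty string rather than raising is what lets a later step in the pipeline, such as skip_on_empty, decide what to do with the row.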
Background
There are a number of gotchas when handling remote assets, in particular remote images. I have identified the following scenarios:
- Remote image is not a valid URL as defined by FILTER_VALIDATE_URL
- Remote image is relative, not absolute
- Remote image is referencing localhost e.g. http:///
- Some remote images with query strings render while others don't (see examples below)
- On occasion, the URL is simply malformed, which is beyond our control
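To make these scenarios concrete, here is a rough Python sketch that sorts a URL string into similar buckets before any request is made. It is only an analogue: PHP's FILTER_VALIDATE_URL applies its own rules, and the function name and category labels below are hypothetical, chosen for this illustration.

```python
from urllib.parse import urlsplit


def classify_image_url(url: str) -> str:
    """Roughly classify a remote image URL (labels are illustrative only)."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit rejects some inputs outright, e.g. a broken IPv6 host.
        return "malformed"
    if not parts.scheme:
        return "relative"
    if parts.scheme not in ("http", "https"):
        return "unsupported_scheme"
    host = parts.hostname
    # "http:///" has no host at all; it is grouped with localhost here.
    if not host or host in ("localhost", "127.0.0.1", "::1"):
        return "local_or_missing_host"
    return "ok"


for candidate in ("/images/605914", "http:///",
                  "https://images.lbc.co.uk/images/605914"):
    print(candidate, "->", classify_image_url(candidate))
```

A check like this only catches structural problems. Whether a well-formed URL actually serves an image, with or without its query string, can only be learned by requesting it.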
Example 1
https://images.lbc.co.uk/images/605914?crop=16_9&width=660&relax=1&format=webp&signature=NMDC95Yh7RrJFuycLD04j52xUeA=
This works, but this does not:
https://images.lbc.co.uk/images/605914
Example 2
https://www.telegraph.co.uk/content/dam/news/2023/02/03/TELEMMGLPICT000324170962_trans_NvBQzQNjv4Bq1V8_3oXt_XBWwkgI1jrKEeDSV_dXcWbrTlT5gho2zKg.jpeg?impolicy=logo-overlay
This does not work, but this does:
https://www.telegraph.co.uk/content/dam/news/2023/02/03/TELEMMGLPICT000324170962_trans_NvBQzQNjv4Bq1V8_3oXt_XBWwkgI1jrKEeDSV_dXcWbrTlT5gho2zKg.jpeg
Example 3
https://images.lbc.co.uk/images/605919?width=1200&crop=16_9&signature=35vrQw9Zg953mYC5R8m4MbaUjhs=
This does not work and looks malformed; however, this does:
https://images.lbc.co.uk/images/605919?width=1200&crop=16_9&signature=35vrQw9Zg953mYC5R8m4MbaUjhs=
Recommended modules/libraries
drupal:migrate
Features
By default, this plugin is configured to accept URLs with query strings. However, you can change the configuration to always remove the query string. In that case the plugin validates the URL without the query string: if the request for the modified URL succeeds, it uses that value; if not, it defaults to an empty string.
To enable this option, set remove_query_string: true in your migration config. Please see below for a working example.
Example Usage
process:
  'body/value':
    -
      plugin: migrate_process_html
      source: link
      enablejs: false # optional, defaults to true
    -
      plugin: dom
      method: import
    -
      plugin: dom_select
      selector: //meta[@property="og:image"]/@content
    -
      plugin: skip_on_empty
      method: row
      message: 'Field image is missing'
    -
      plugin: extract
      index:
        - 0
    -
      plugin: migrate_process_remote_image_check
      remove_query_string: true # default false
    -
      plugin: skip_on_condition
      method: row
      condition:
        plugin: not:matches
        regex: /^(https?:\/\/)[\w\d]/i
      message: 'We only want a string if it starts with http(s)://[\w\d]'
    -
      plugin: file_remote_url
Project information
- Project categories: Import and export
- Ecosystem: Migrate
- Created by 2dareis2do
