Problem/Motivation
The file_copy migrate process plugin does not validate the
integrity of a file after copying or downloading it. When the source is a
remote URL and a network failure occurs (e.g. timeout, connection reset) during
the download, the plugin creates an empty (zero-byte) file at the destination
path and registers it in file_managed with the expected filesize
from the source, rather than the actual size on disk.
This leads to the following issues:
- A file entity is persisted in the database with an incorrect filesize value, making it impossible to detect the corruption via the API.
- Modules relying on the file dimensions or content (e.g. responsive_image) throw unrecoverable exceptions such as LogicException: Could not determine image width, which can cause the entire page to return a 500 error.
- The migration reports the row as successfully processed, hiding the failure from operators.
Steps to reproduce
- Set up a migration using the file_copy process plugin with a remote URL as source (e.g. via data_fetcher_plugin: file).
- Simulate a network failure or timeout during the migration (e.g. by interrupting connectivity to the remote host mid-transfer).
- Observe that a zero-byte file is created at the destination path, and a file entity is created in the database pointing to it.
Proposed resolution
After the copy/download operation, file_copy should verify that
the destination file exists and has a non-zero size. If the file is empty, it
should be deleted and a MigrateSkipRowException should be thrown
so the row is flagged as failed rather than silently processed.
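The check described above could look roughly like the following standalone sketch. It is not the actual patch: assertNonEmptyDestination() and SkipRowException are hypothetical names introduced here so the example runs without a Drupal bootstrap, and SkipRowException only stands in for MigrateSkipRowException. It also assumes the destination is a plain local path.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for Drupal's MigrateSkipRowException, so that this
 * sketch can run outside Drupal.
 */
class SkipRowException extends \RuntimeException {}

/**
 * Rejects a destination file that is missing or empty after a copy/download.
 *
 * Hypothetical helper, not part of the file_copy plugin.
 *
 * @throws SkipRowException
 *   When the destination is missing or has zero bytes.
 */
function assertNonEmptyDestination(string $destination): string {
  // PHP caches stat results per path; clear the cache so filesize() reports
  // what is on disk now rather than a value from before the copy.
  clearstatcache(true, $destination);

  if (!is_file($destination)) {
    throw new SkipRowException("Destination file $destination does not exist.");
  }

  if (filesize($destination) === 0) {
    // Remove the empty file so it cannot be picked up later.
    unlink($destination);
    throw new SkipRowException("Destination file $destination is empty; it has been deleted.");
  }

  return $destination;
}

// Usage: tempnam() creates a zero-byte file, which mimics a failed download.
$destination = tempnam(sys_get_temp_dir(), 'fc_');
try {
  assertNonEmptyDestination($destination);
}
catch (SkipRowException $e) {
  echo 'Skipped: ' . $e->getMessage() . PHP_EOL;
}
```

Checking the size on disk, rather than trusting the size reported by the source, is what lets the check catch a truncated download.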
Remaining tasks
- Patch file_copy to add post-copy integrity validation.
- Add a test covering the zero-byte file scenario.
- Update documentation.
User interface changes
None.
Introduced terminology
None.
API changes
FileCopy::transform() will throw a
MigrateSkipRowException when the destination file is empty after
the copy/download operation.
Data model changes
None.
Release notes snippet
The file_copy migrate process plugin now validates that the
destination file is non-empty after a copy or download operation. If an empty
file is detected (e.g. due to a network failure), the file is deleted and the
migration row is skipped with an explicit error, preventing corrupt file
entities from being created silently.
Issue fork drupal-3587036
Comments
Comment #3
macsim commented
The remote URL dataset (https://www.drupal.org/favicon.ico) has been removed from providerSuccessfulReuse(). doTransform() stubs the HTTP client via createStub(Client::class), which causes the download plugin to produce a zero-byte file at the destination.
Before this patch, that empty file was silently accepted. Now that FileCopy::transform() performs a post-copy integrity check, the zero-byte file is correctly rejected and the dataset would always fail.
The "use existing" behavior for remote sources is not retested here because it would require a download plugin mock returning a non-empty file; the remote download path is already exercised by testDownloadRemoteUri() and testZeroByteRemoteDownload().