Problem/Motivation

The file_copy migrate process plugin does not validate the
integrity of a file after copying or downloading it. When the source is a
remote URL and a network failure occurs (e.g. timeout, connection reset) during
the download, the plugin creates an empty (zero-byte) file at the destination
path and registers it in file_managed with the expected filesize
from the source, rather than the actual size on disk.

This leads to the following issues:

  • A file entity is persisted in the database with an incorrect
    filesize value, so the corruption cannot be detected by reading
    the entity through the API.
  • Modules relying on the file dimensions or content (e.g.
    responsive_image) throw unrecoverable exceptions such as
    LogicException: Could not determine image width, which can
    cause the entire page to return a 500 error.
  • The migration reports the row as successfully processed, hiding the failure
    from operators.

Steps to reproduce

  1. Set up a migration using the file_copy process plugin with a
    remote URL as source (e.g. via data_fetcher_plugin: file).
  2. Simulate a network failure or timeout during the migration (e.g. by
    interrupting connectivity to the remote host mid-transfer).
  3. Observe that a zero-byte file is created at the destination path, and a
    file entity is created in the database pointing to it.

Proposed resolution

After the copy/download operation, file_copy should verify that
the destination file exists and has a non-zero size. If the file is missing or
empty, any empty file should be deleted and a MigrateSkipRowException should be
thrown with an explicit message, so the row is skipped and reported rather than
silently recorded as successfully processed.
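
The check described above could look roughly like the sketch below. It is not the actual patch: assert_non_empty_destination() and SkipRowException are hypothetical names used so the snippet runs outside Drupal, with SkipRowException standing in for MigrateSkipRowException. It also assumes a plain local path; a real implementation inside FileCopy would have to handle stream wrapper URIs such as public://.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for Drupal\migrate\MigrateSkipRowException, so the
 * sketch can run without a Drupal bootstrap.
 */
class SkipRowException extends \Exception {}

/**
 * Hypothetical helper: rejects a missing or zero-byte destination file.
 *
 * Assumes $destination is a local filesystem path.
 */
function assert_non_empty_destination(string $destination): void {
  // Drop PHP's cached stat data so filesize() reflects the file on disk.
  clearstatcache(TRUE, $destination);

  if (is_file($destination) && filesize($destination) > 0) {
    return;
  }

  // Remove the empty file so no file entity can point at it later.
  if (is_file($destination)) {
    unlink($destination);
  }

  throw new SkipRowException(sprintf('File "%s" is missing or empty after the copy/download operation.', $destination));
}

// Usage: tempnam() creates a zero-byte file, which mimics the failed download.
$empty = tempnam(sys_get_temp_dir(), 'fc_');
try {
  assert_non_empty_destination($empty);
}
catch (SkipRowException $e) {
  // The empty file has already been deleted when the exception is caught.
  $message = $e->getMessage();
}
```

Running the check after the copy, rather than trusting the return value of the copy/download call, is what catches the case where the transfer "succeeds" but writes nothing.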

Remaining tasks

  • Patch file_copy to add post-copy integrity validation.
  • Add a test covering the zero-byte file scenario.
  • Update documentation.

User interface changes

None.

Introduced terminology

None.

API changes

FileCopy::transform() will throw a
MigrateSkipRowException when the destination file is empty after
the copy/download operation.

Data model changes

None.

Release notes snippet

The file_copy migrate process plugin now validates that the
destination file is non-empty after a copy or download operation. If an empty
file is detected (e.g. due to a network failure), the file is deleted and the
migration row is skipped with an explicit error, preventing corrupt file
entities from being created silently.

Issue fork drupal-3587036

Comments

macsim created an issue.

macsim commented:

Status: Active » Needs review

The remote URL dataset (https://www.drupal.org/favicon.ico) has been removed from providerSuccessfulReuse().
doTransform() stubs the HTTP client via createStub(Client::class), which causes the download plugin to produce a zero-byte file at the destination.
Before this patch, that empty file was silently accepted.
Now that FileCopy::transform() performs a post-copy integrity check, the zero-byte file is correctly rejected, so that dataset would always fail.
The "use existing" behavior for remote sources is not retested here because it would require a download plugin mock returning a non-empty file; the remote download path is already exercised by testDownloadRemoteUri() and testZeroByteRemoteDownload().