Problem/Motivation

CSV files can have column names with spaces. This produces unintended effects when using the default ID map plugin: \Drupal\migrate\Plugin\migrate\id_map\Sql

On PHP 8.1, you get the following warning when rolling back a migration that has a column name with a space.
[warning] Undefined array key "Some Header with Space" Sql.php:232

That happens in \Drupal\migrate\Plugin\migrate\id_map\Sql::getSourceIdsHash

This is because the column is used an alias for the field to be added in the SELECT statement. And aliases are spaced in `\Drupal\Core\Database\Connection::escapeAlias`. Per the method description: “Escapes an alias name string. Force all alias names to be strictly alphanumeric-plus-underscore”.

Steps to reproduce

  1. In a site running PHP 8.1 or higher, create a CSV migration that has a column name with a space used as ID. For example: "Project ID".
  2. Import the migration.
  3. Rollback the migration.

Proposed resolution

Document that column names have to be alphanumeric characters or underscores.

Remaining tasks

  • Create and review patch.
  • Evaluate if we need to mention that this happens when the ID map uses or extends \Drupal\migrate\Plugin\migrate\id_map\Sql.
  • Shall we file a similar issue in https://www.drupal.org/project/migrate_plus since this could also happen in JSON files which that module supports?
  • Shall we create an issue in Drupal core to document this as part of `\Drupal\migrate\Plugin\MigrateSourceInterface::getIds`?

User interface changes

None.

API changes

None.

Data model changes

None.

Notes

This issue was discussed in slack https://drupal.slack.com/archives/C226VLXBP/p1689078305424729

For reference, below is a comment with some of the research into the issue.

  1. In \Drupal\migrate\Plugin\migrate\id_map\Sql::deleteDestination there is a call to \Drupal\migrate\Plugin\migrate\id_map\Sql::lookupSourceId
  2. In there, Each source key is added to the query via $query->addField('map', $id_map_field_name, $source_field_name);
  3. Later in the same method, there is a $query->execute() which eventually reaches \Drupal\Core\Database\Query\Select::__toString
  4. When the query is assembling the list of (query) fields to be added in the SELECT statement via the foreach ($this->fields as $field) loop, there is a call to \Drupal\Core\Database\Connection::escapeAlias
  5. And in there, the alias is escaped using preg_replace('/[^A-Za-z0-9_]+/', '', $field). Per the method description: “Escapes an alias name string. Force all alias names to be strictly alphanumeric-plus-underscore”. In there Some Value gets transformed to SomeValue
  6. Back in deleteDestination , there is a call to $this->getSourceIdsHash($source_id_values)
  7. In \Drupal\migrate\Plugin\migrate\id_map\Sql::getSourceIdsHash, the $source_id_values argument is keyed by the escaped field alias. But the call to $this->sourceIdFields() returns the unescaped raw source key.
  8. So, in $source_id_values[$field_name] , the raw Some Value does not exists as a key. By then, the key has changed to SomeValue. And this is what triggers the error.
  9. And this probably has other unintended consequences. Also in getSourceIdsHash , there is a call to hash which receives $source_id_values as an argument. At this point, I stopped investigating.

Comments

dinarcon created an issue. See original summary.

dinarcon’s picture

Status: Active » Needs review
StatusFileSize
new863 bytes

Attaching documentation patch for review. Shall we also mention this in the README.md file?

mlncn’s picture

Very impressive investigation and documentation! Would it be possible to throw a clear error when encountering a space… since someone (probably me) won't read the documentation and still be confused when the "inexplicable" error happens?

  • heddn committed 91b1488b on 8.x-3.x authored by dinarcon
    Issue #3379184 by dinarcon, heddn: Document that source IDs can only...
heddn’s picture

Status: Needs review » Fixed

Added some validation and tests and landed this. Thanks for the documentation and heads up that this is a problem.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

b2f’s picture

We should have address this problem to the root, we don't need to remove such important existing functionalities.

I patched lookupSourceId Sql.php method in Core so that source keys spaces (or any other special characters for that matter?) are not removed anymore:

https://www.drupal.org/project/drupal/issues/3422603

caesius’s picture

This update should be reverted per the above comment and issue #3413894: Revert requirement that IDs not include spaces or special characters

I'll also note that per the issue title and description, the resolution should have been to "Document that column names have to be alphanumeric characters or underscores" and yet the actual implementation prevents the use of spaces in column names entirely. This breaks backward compatibility with migrations that were created before this change was implemented and does not document an upgrade path.