All my migrations that read XML files encoded in UTF-16LE worked previously, but they suddenly broke after upgrading to Migrate Plus 4.2.

Drupal\migrate\MigrateException: Fatal Error 73: expected '>'
Line: 542
Column: 20
File:  in Drupal\migrate_plus\Plugin\migrate_plus\data_parser\SimpleXml->openSourceUrl() (line 51 of modules/contrib/migrate_plus/src/Plugin/migrate_plus/data_parser/SimpleXml.php).

It turns out that issue #3046753 (Make XML parser more resilient) introduced a trim() call before simplexml_load_string():

  protected function openSourceUrl($url) {
    // Clear XML error buffer. Other Drupal code that executed during the
    // migration may have polluted the error buffer and could create false
    // positives in our error check below. We are only concerned with errors
    // that occur from attempting to load the XML string into an object here.
    libxml_clear_errors();

    $xml_data = $this->getDataFetcherPlugin()->getResponseContent($url);
    $xml = simplexml_load_string(trim($xml_data));
    foreach (libxml_get_errors() as $error) {
      $error_string = self::parseLibXmlError($error);
      throw new MigrateException($error_string);
    }
    $this->registerNamespaces($xml);
    $xpath = $this->configuration['item_selector'];
    $this->matches = $xml->xpath($xpath);
    return TRUE;
  }

The function trim() is not safe on strings in a multibyte encoding such as UTF-16, whereas SimpleXML handles such data perfectly well. trim() works on single bytes and strips NUL bytes by default; in UTF-16LE every ASCII character is followed by a NUL byte, so trim() can cut the final character of the document in half. I don't think it is necessary to call trim() before simplexml_load_string(). If your XML has an empty line before the opening tag, it is not well-formed and requires special treatment. Adding trim() to the generic parser prevents it from working properly with UTF-16 data.
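
To illustrate the point about trim(), here is a minimal sketch that can be run outside Drupal. The helper ascii_to_utf16le() is hypothetical and exists only for this example (it is not part of Migrate Plus); it builds a UTF-16LE byte string by hand so that no extension is needed. The sample document is made up as well.

```php
<?php
// Hypothetical helper, not part of Migrate Plus: encodes an ASCII-only
// string as UTF-16LE by appending a NUL byte to every character.
function ascii_to_utf16le(string $ascii): string {
  $out = '';
  foreach (str_split($ascii) as $char) {
    $out .= $char . "\0";
  }
  return $out;
}

// A made-up document that ends with a newline, as most files do.
$xml_data = ascii_to_utf16le("<root><item>1</item></root>\n");
$trimmed = trim($xml_data);

// UTF-16LE uses two bytes per code unit, so the byte length must be even.
var_dump(strlen($xml_data) % 2 === 0);
// trim() removed the trailing NUL, the newline, and then the NUL that
// belongs to the final '>', leaving an odd number of bytes.
var_dump(strlen($trimmed) % 2 === 1);
// The last remaining byte is only the low half of the closing '>'.
var_dump(substr($trimmed, -1) === '>');
```

All three checks should print bool(true). The parser then sees an incomplete last character, which would be consistent with the "expected '>'" error above, although I have only checked this on a hand-built string and not on the real migration source.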

Comments

sonnykt created an issue. See original summary.

sonnykt commented:

Status: Active » Needs review
Attached file: status new, size 676 bytes

Patch to remove the trim() call.

3li commented:

Removing the trim() call, as in the patch from #2, solved my issue.

nadim hossain commented:

Attached file: status new, size 676 bytes

Re-rolled the patch against 6.x.

nadim hossain commented:

Version: 8.x-4.x-dev » 6.0.x-dev

  • heddn committed ed7dab98 on 6.0.x
    Issue #3051858 by sonnykt, nadim hossain, 3li, heddn: Simple XML broken...
heddn commented:

Status: Needs review » Fixed

Thanks for your contributions.

  • heddn committed 6edde178 on 6.0.x
    Revert "Issue #3051858 by sonnykt, nadim hossain, 3li, heddn: Simple XML...
heddn commented:

Status: Fixed » Needs work

Reverted the commit as it broke tests. More work needed.