Problem/Motivation

When migrating into formatted text fields (text_long, text_with_summary), Drupal requires each field delta to include both a value and a format. The documented approach uses sub-property mapping:

field_body/value: source_body
field_body/format:
  plugin: default_value
  default_value: full_html

This works for single-value fields (cardinality = 1), but breaks for multi-value fields (cardinality > 1). When the source is an array of strings — common with XML sources like itemList/item/html — there is no clean way to set the format per-delta using existing plugins:

  • The /format + /value sub-property syntax sets one format for the whole field, not per-delta
  • sub_process requires arrays of associative arrays, but XML parsers return arrays of plain strings
  • array_chunk + sub_process fails when the XML parser returns a scalar string (single child element) instead of a 1-element array
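To make the shape mismatch concrete, here is a small Python model (not Drupal code) of the constraint in the second and third bullets. The function name is_sub_process_input and the sample values are hypothetical. The function only encodes the stated rule that sub_process expects an array of associative arrays, and it shows that neither shape an XML parser returns satisfies it.

```python
from typing import Any


def is_sub_process_input(source: Any) -> bool:
    """Hypothetical model of sub_process's input rule: a list of dicts
    (PHP associative arrays). Not Drupal's actual validation code."""
    return isinstance(source, list) and all(isinstance(item, dict) for item in source)


# Hypothetical parsed values for an itemList/item/html selector.
many_items = ["<p>One</p>", "<p>Two</p>"]  # several <item> children: list of plain strings
one_item = "<p>Only</p>"                   # a single <item> child: bare scalar string

print(is_sub_process_input(many_items))                 # False
print(is_sub_process_input(one_item))                   # False
print(is_sub_process_input([{"value": "<p>One</p>"}]))  # True
```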

The result: format is stored as NULL in the database, and the HTML renders as raw escaped text instead of being processed by the text format filter pipeline.

This is a longstanding gap. Core issue #2632814 documented the single-value workaround in 2015 but was closed without addressing multi-value fields.

Steps to reproduce

  1. Create a content type with a multi-value text_long field (cardinality = unlimited)
  2. Create a migration that maps an XML source with repeated child elements directly to that field, with no process plugin:
    field_features: field_features

  3. Run the migration
  4. Check the database: SELECT field_features_format FROM node__field_features LIMIT 5
  5. Result: NULL in every row, and the HTML renders as raw text on the page

Proposed resolution

A text_format process plugin that wraps each value with the specified format, handling both scalar strings and arrays:

process:
  field_features:
    plugin: text_format
    source: field_features
    format: full_html

The plugin sets handle_multiples = TRUE in its annotation, and its multiple() method returns TRUE. As a result, it receives the raw source value (scalar or array) in a single call instead of once per element, wraps each value into ['value' => $v, 'format' => $format], and returns the structured array of deltas that Drupal's entity field system requires.
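The wrapping step can be sketched outside Drupal as a plain function. This is a minimal Python illustration, not the patch's PHP code. The name wrap_text_format is hypothetical, and returning no deltas for NULL or empty input is an assumption: the summary lists those cases as tested but does not state the expected result.

```python
from typing import Any, Dict, List

DEFAULT_FORMAT = "basic_html"  # default stated for the "format" configuration key


def wrap_text_format(source: Any, fmt: str = DEFAULT_FORMAT) -> List[Dict[str, Any]]:
    """Turn a scalar string or a list of strings into per-delta
    {"value": ..., "format": ...} items (hypothetical sketch)."""
    # Assumption: NULL and empty-string input produce no deltas.
    if source is None or source == "":
        return []
    # A single XML child often arrives as a bare string; promote it to a list.
    values = list(source) if isinstance(source, (list, tuple)) else [source]
    # Each value passes through unchanged (HTML entities included) and is
    # paired with the same format; None items are skipped (assumption).
    return [{"value": v, "format": fmt} for v in values if v is not None]


print(wrap_text_format(["<p>One</p>", "<p>Two</p>"], "full_html"))
# [{'value': '<p>One</p>', 'format': 'full_html'}, {'value': '<p>Two</p>', 'format': 'full_html'}]
```

Promoting a scalar to a one-element list first means one code path serves both the single-child and multi-child XML cases, which is exactly where array_chunk + sub_process breaks. Always returning a list also matches multiple() returning TRUE.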

Configuration keys:

  • format: (optional) The text format machine name. Defaults to basic_html.

The attached patch includes:

  • src/Plugin/migrate/process/TextFormat.php — the process plugin (~70 lines)
  • tests/src/Unit/process/TextFormatTest.php — 11 unit tests covering scalar, multi-value, NULL, empty, HTML entities, and format configuration
  • tests/src/Kernel/Plugin/migrate/process/TextFormatTest.php — 3 kernel tests that demonstrate the actual problem (bare strings → NULL format) and the fix (plugin output → correct format) against real entity field storage

Remaining tasks

  • Review the patch
  • Consider whether the default format should be basic_html, or whether format should be required (no default)
  • Add change record documentation

User interface changes

None.

API changes

New process plugin text_format added. No changes to existing APIs.

Data model changes

None. The plugin produces the standard ['value' => ..., 'format' => ...] array structure that Drupal's entity field system already expects.


Comments

diamondsea created an issue. See original summary.


Status: Active » Needs review