Problem/Motivation

Migration process plugins currently manipulate HTML through plain text operations, typically regular expressions. Manipulating HTML with a parser, for example PHP's DOMDocument and related classes, is more reliable.

Some manipulation examples, along with further reasons for this approach, can be found in #2958281: Allow manipulating html from process plugins.

Proposed resolution

One way to do it is to have one plugin that imports the HTML string into an object representing the HTML, then several plugins that manipulate that object, and finally a plugin that exports the object back into an HTML string.
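
A minimal sketch of that import, manipulate, export flow. It is written in Python with the standard library's xml.dom.minidom purely as a stand-in for PHP's DOMDocument. The function names (import_html, replace_in_attribute, export_html, rewrite_links) and the example URLs are hypothetical and are not the plugin API. It also assumes the input is well-formed XHTML, which real migrated content often is not.

```python
from xml.dom import minidom


def import_html(fragment):
    """Import step: parse the string once into a DOM object."""
    # Assumption: the fragment is well-formed XHTML. A wrapper element
    # gives the parser the single root it requires.
    return minidom.parseString("<body>" + fragment + "</body>")


def replace_in_attribute(doc, tag, attr, search, replace):
    """Manipulation step: rewrite one attribute on every matching element."""
    for node in doc.getElementsByTagName(tag):
        if node.hasAttribute(attr):
            value = node.getAttribute(attr)
            node.setAttribute(attr, value.replace(search, replace))
    return doc


def export_html(doc):
    """Export step: serialize the wrapper's children back into a string."""
    return "".join(child.toxml() for child in doc.documentElement.childNodes)


def rewrite_links(fragment):
    """Chain the three steps, the way a process pipeline would."""
    doc = import_html(fragment)
    doc = replace_in_attribute(
        doc, "a", "href", "http://old.example", "https://new.example"
    )
    return export_html(doc)


if __name__ == "__main__":
    html = '<p>See <a href="http://old.example/page">http://old.example/page</a>.</p>'
    print(rewrite_links(html))
```

Because the manipulation step works on the parsed object, only the href attribute is rewritten. The same URL appearing as link text is left alone, which a plain string or regular expression replacement would not guarantee.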

Remaining tasks

Current child tickets:

User interface changes

N.A.

API changes

None for now, unless we want to generalize this so that migrate process plugins can accept only a subset of values (e.g. those implementing an interface) and handle incompatibilities.
If that is the case, I guess this would become more of a Drupal core ticket than a migrate_plus one, since it is a major change in how process plugins work.
I would suggest avoiding this for now if possible; we can always try it once we have enough experience with the approach.

Data model changes

N.A.

Comments

marvil07 created an issue. See original summary.

marvil07

Adding one more child: dom_migration_lookup_str_replace

marvil07

We may want to use andypost's approach, based on https://wiki.php.net/rfc/opt_in_dom_spec_compliance, in the dom plugin.

See the tangentially related MR https://git.drupalcode.org/project/drupal/-/merge_requests/8916, which does something similar for help topics.

The change would go in https://git.drupalcode.org/project/migrate_plus/-/blob/6.0.x/src/Plugin/..., likely as a new case for the import method, if we want to be backward compatible.