Problem/Motivation
Migration process plugins currently manipulate HTML as plain text, using regular expressions. Manipulating HTML with a parser, for example PHP's DOMDocument and related classes, is more reliable.
Some manipulation examples, and more reasons for this approach, can be found in #2958281: Allow manipulating html from process plugins.
Proposed resolution
One way to do it is to have a plugin that imports the HTML string into an object representing the HTML, then several plugins that manipulate that object, and finally a plugin that exports the object back into an HTML string.
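The import, manipulate, export flow described above can be sketched in plain PHP. This is only an illustration under stated assumptions, not the migrate_plus implementation: the function names html_import, html_replace_attribute and html_export are hypothetical, as are the sample markup and URLs. Only DOMDocument, DOMXPath and the libxml error functions are real PHP APIs.

```php
<?php

// Hypothetical helpers illustrating the three steps; not the actual plugins.

/**
 * Import: parse an HTML fragment into a DOMDocument.
 */
function html_import(string $html): DOMDocument {
  $document = new DOMDocument();
  // Wrap the fragment so it has a known container and an explicit UTF-8
  // declaration (an assumption of this sketch).
  $wrapped = '<!DOCTYPE html><html><head>'
    . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    . '</head><body>' . $html . '</body></html>';
  // Collect parser warnings internally instead of emitting them.
  $previous = libxml_use_internal_errors(TRUE);
  $document->loadHTML($wrapped);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);
  return $document;
}

/**
 * Manipulate: replace a substring in one attribute of every matching element.
 */
function html_replace_attribute(DOMDocument $document, string $xpath, string $attribute, string $search, string $replace): DOMDocument {
  $finder = new DOMXPath($document);
  $nodes = $finder->query($xpath);
  if ($nodes === FALSE) {
    throw new InvalidArgumentException("Invalid XPath expression: $xpath");
  }
  foreach ($nodes as $node) {
    if ($node instanceof DOMElement && $node->hasAttribute($attribute)) {
      $value = str_replace($search, $replace, $node->getAttribute($attribute));
      $node->setAttribute($attribute, $value);
    }
  }
  return $document;
}

/**
 * Export: serialize the children of <body> back into an HTML string.
 */
function html_export(DOMDocument $document): string {
  $html = '';
  $body = $document->getElementsByTagName('body')->item(0);
  if ($body !== NULL) {
    foreach ($body->childNodes as $child) {
      $html .= $document->saveHTML($child);
    }
  }
  return $html;
}

// Usage: rewrite link targets without touching the visible text.
$source = '<p>See <a href="http://old.example.com/node/1">old.example.com</a>.</p>';
$document = html_import($source);
$document = html_replace_attribute($document, '//a', 'href', 'http://old.example.com', 'https://new.example.com');
$result = html_export($document);
```

Because the replacement targets the href attribute of the nodes matched by the XPath expression, the link text that happens to contain the same domain is left alone, which is hard to guarantee with a regular expression over the raw string.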
Remaining tasks
Current child tickets:
- #2958281: Allow manipulating html from process plugins: Contains the dom migrate process plugin, which both imports and exports HTML/DOMObject.
- #2958285: Allow replacing based on a xpath expression: Contains an analogue of str_replace, but using a DOMObject and xpath to find the target.
- #2958672: Use migration lookup on text fields: Contains a child of dom_str_replace, that uses migration_lookup process plugin for replacing ids on a DOMObject.
- #3042539: Apply styles configured for CKEditor: Contains the dom_apply_styles migrate process plugin, which applies styles configured for the Editor module based on xpath expressions in the plugin configuration.
User interface changes
N.A.
API changes
None for now, unless we want to generalize this so that migrate process plugins can declare the subset of values they accept (e.g. via an interface) and handle incompatibilities.
If that is the case, I guess this would become more of a Drupal core ticket than a migrate_plus one, since it is a major change in how process plugins work.
I would suggest avoiding this for now if possible; we can always try it once we have enough experience with the approach.
Data model changes
N.A.
Comments
Comment #2
marvil07 commented
Adding one more child: dom_migration_lookup_str_replace
Comment #3
benjifisher
Comment #4
benjifisher
(comment moved to #2958281: Allow manipulating html from process plugins)
Comment #5
benjifisher
Comment #6
marvil07 commented
We may want to use andypost's approach with https://wiki.php.net/rfc/opt_in_dom_spec_compliance on the dom plugin.
See tangentially related MR https://git.drupalcode.org/project/drupal/-/merge_requests/8916 doing something similar for help topics.
Namely at https://git.drupalcode.org/project/migrate_plus/-/blob/6.0.x/src/Plugin/..., likely a new case for the import method, if we want to be backward compatible.