Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Problem/Motivation
When migrating data into Drupal, especially ongoing migrations from a feed, we want to style the imported data to be consistent with the rest of the site. One way of doing this is to use the markup (HTML elements and CSS classes) configured for CKEditor.
Proposed resolution
Add a process plugin that can be configured to apply specific styles defined for the Styles menu in CKEditor.
Here is the sample usage from the doc block in the proposed plugin:
process:
'body/value':
-
plugin: dom
method: import
source: 'body/0/value'
-
plugin: dom_apply_styles
format: full_html
rules:
-
xpath: '//b'
style: Bold
-
xpath: '//span/i'
style: Italic
depth: 1
-
plugin: dom
method: export
Remaining tasks
User interface changes
None.
API changes
None.
Data model changes
None.
Comment | File | Size | Author |
---|---|---|---|
#4 | 3042539-apply-styles-4.patch | 11.41 KB | benjifisher |
#4 | interdiff-3042539-2-4.txt | 9.99 KB | benjifisher |
Comments
Comment #2
benjifisherHere is a patch that adds the process plugin I have been using. I modified the namespace, and I updated it to use the base class introduced in #2958285: Allow replacing based on a xpath expression. Because it uses that base class, I am marking this issue postponed.
Comment #3
marvil07 CreditAttribution: marvil07 as a volunteer commented#2958285: Allow replacing based on a xpath expression was accepted \o/, back to NW.
Comment #4
benjifisherI updated the plugin class to be compatible with the base class we added in #2958285: Allow replacing based on a xpath expression and I added test coverage.
I started with the 8.x-4.x branch, but I see that tests are configured to run against the 8.x-5.x branch. I hope it does not make much difference.
This is the first time I have use mock objects in a unit test. I hope I used them appropriately. In particular, I used the Prophecy approach, although I see that MigrateProcessTestCase uses
getMockBuilder()
. I do not think this causes any technical problems, but perhaps for consistency I should have used the same approach as the base class.Writing the tests forced me to look at the validation again, and I noticed that I was testing for problems that would already have thrown exceptions in the
setStyles()
method. So I moved some of the validation into that method and renamed the originalvalidateConfiguration()
tovalidateRules()
.This patch could use some manual testing, but the unit test is passing locally, so I am optimistic that it will work.
Comment #5
benjifisherUpdating issue tags and summary.
Comment #6
alisonWorks for me, with migrate_plus 4.2! Very cool, thank you!
Maybe also would be cool -- a little more "forgiving" matching? The "styles" in D7 (on the site I'm working on, anyway) just put a
style
attribute right on the element -- an unfortunate situation I just ran into while testing this plugin was that in a couple cases, there was a trailing semicolon at the end of the style attribute, and in other cases, there wasn't -- i.e.xpath: '//div[@style="background:#eeeeee;border:1px solid #cccccc;padding:5px 10px;min-height:350px"]'
^^ didn't match when there was a semicolon after 350px--
style="background:#eeeeee;border:1px solid #cccccc;padding:5px 10px;min-height:350px;"
(I know it's not necessarily simple to have partial matching with xpath -- I just had to learn to do a whole xpath->registerPHPFunctions + preg_match thing for a custom plugin -- just figured I'd share the idea, maybe y'all know how to do it :) )
Anyway, totally wonderful plugin, works great, thanks!
Comment #7
benjifisher@alisonjo2786:
Thanks for testing this plugin.
I do not think we want to do custom search-and-replace at the level of the process plugin. You have a lot of flexibility when specifying the
xpath
parameter, and I think it makes sense to leave that part of the job (i.e., figuring out the right XPath expression to use) to the person writing the migration.One option is to specify a few rules with the same
style
parameter. If you need to be as specific with your selectors as your examples, then something like this should work:Comment #8
benjifisherAdd sample usage to the issue summary.
Comment #9
alison@benjifisher cool cool, and good point, I can definitely do that.
If it matters, the kind of searching I mean is to overcome the lack of regex support in xpath 1.0 (another thing I just learned about in the last week) -- example of the kind of usage I mean:
(Not that that means it's critical for this process plugin -- just sharing/explaining!)
Thank you!
Comment #10
alisonOk one more "dream world" idea -- could there be access to other things like, just, the bold/italic button behavior, and format > h2-3-4-5-6......?
I might write a custom thing for it, but, just wanted to put it out there.
(i.e. the D7 source site I'm working with has a whole bunch of unfortunate instances of
<h4><strong><u>
going on, that I'd love to just replace with<h4>
-- also,<span style="font-weight:bold">
that I'd like to replace with<strong>
(without having to add "Bold" or "Heading 4" to my Styles dropdown -- though, I might do that just for during our migration process, as a workaround!)Comment #11
alison#6, me:
OH MY GOD "contains" -- that's what I can do, i.e.
xpath: '//div[contains(@style,"background:#eeeeee;border:1px solid #cccccc;padding:5px 10px;min-height:350px")]'
AGH of course, duh, #facepalm, etc.
(It's not an actual regex, of course, but it takes care of many of my situations, including, quite nicely-- ) ...all our cases of
<img style=" etc etc">
where they've manually set width, height, and floating (among other things), and I want to get all cases of img tags withfloat:left/right
and put a Style on them, instead of the inline style nonsense -- but, the manually-set width/height are different on every image, soooo I can do...Comment #12
marvil07 CreditAttribution: marvil07 at Isovera for Pegasystems commented@benjifisher, thanks for the adaptation of the class after the base plugin got in, and for adding test coverage.
Tests look good, and run OK.
@alisonjo2786, I am glad the plugin is already useful to you.
Re #10 "dream world" idea: Sounds like a new process plugin, maybe an extension to
Drupal\migrate_plus\Plugin\migrate\process\DomStrReplace
, but since it is changing html structure, probably extendingDomProcessBase
makes more sense, maybe a "dom_element_replace".After taking a look at the latest patch, and considering @alisonjo2786 is also using it, I am marking it as RTBC.
On tests, I think using prophecy is OK, let us see what maintainer thinks.
Comment #13
benjifisher@marvil07: Thanks for the review.
@alisonjo2786: I suppose that the plugin could be extended: if the
format
key is supplied, then each rule should specifyxpath
andstyle
, the way it works now. Otherwise, each rule should specifyxpath
,element
, and (optional)classes
. Also,depth
would be optional in either case.We might also allow a different format, or none, for each rule.
Despite the overhead of creating another plugin, there are some advantages of doing so, as @marvil07 suggested. Interacting with the configuration system complicates the code, including the test class. I prefer to avoid that complication when we do not need it (because we are supplying the HTML elements and CSS classes in the configuration).
If we do decide to have separate plugins, I think that
If we do that, then it will be a lot easier if we do it before this patch is accepted. So let's decide now.
Since this issue is already RTBC, I will wait for input from one of the maintainers before making any further changes.
Comment #14
benjifisherI thought of another option. If you do not want to clutter up Basic HTML (or whatever text format your content editors actually use) then you can easily add another text format. Perhaps call it
migrate_html
. Then add style rules to this extra text format and use it with thedom_apply_styles
plugin.Comment #15
alison@benjifisher Fantastic idea, thank you, I will be doing that right away :)
I'm just catching up on this thread, your ideas in #13 are really interesting ideas, I need to re-read a couple more times to have it sink in and be able offer feedback (while we wait for input from a maintainer anyway).
Comment #16
alisonOk I encountered an issue -- maybe it's impractical to address, in which case we could just add a node in the docblock or something? Or maybe I just need to adjust my xpath/style config values. Anyway--
Goal: replace
with
Attempt:
Outcome:
because!!... the xpath result nodes share a parentNode! #facepalm
.....................
Again, I realize this might not be a practical thing to deal with -- if that's the case, I'd be happy to tweak the patch with a note in the docblock, just lmk.
Comment #17
alisonOof, well, I guess I have other instances... it's really just some unfortunate content. Like, I was all excited about using this to turn...
<h4><strong>Some heading</strong></h4>
into
<h4>Some heading</h4>
via
but the thing is, there are many cases of...
<h4><strong>About the strategy map</strong><strong>:</strong></h4>
...at which point, I'm back to, multiple xpath result nodes with the same parent node. And actually, this heading situation is even less reasonable to try to deal with in a repeatable way, compared to the unordered list example.
All that's to say, I welcome any advice, but, I'll probably give up on cleaning up this particular type of messy markup using this particular method :) (I still get a lot out of this plugin -- like all of my uses that don't require
depth
manipulation.)As the plugin docblock says (line 56):
Comment #18
benjifisherRight, the
depth
parameter is a bit of a kluge. If it works, that's nice.I am not sure how strict the DOMDocument class is about nesting rules, and I am also not sure whether
<ul><div>foo</div></ul>
is valid HTML. But I would try something like this for your first example:The idea is to replace the
<li>
elements with<p>
elements in the first pass, and then to remove the<ul>
(or replace it with a<div>
) in the second pass.In the second example, you might simply replace the
<strong>
elements (inside the<h4>
) with<span>
s.Another possibility is that you write some custom process plugins that do exactly what you want. If we are lucky, that experience will lead you to some common set of parameters, and then you can propose a new process plugin and I can review it. In fact, maybe it is simply a plugin that replaces an HTML element (specified by an xpath expression) and replaces it with its inner HTML.
Comment #20
heddnReally like the work here. Nicely documented and tested piece of usefulness.
Comment #21
alison@benjifisher funny, I did almost exactly that:
(The second "paragraph" thing ends up being "dissolved" along the way, b/c it won't actually allow
<p><p>things things</p><p>more thing things</p></p>
.)(Indeed, the example source markup I'm dealing with is, at best, not great, at worst, not valid :) )
..................
For the headings, I ended up putting things into my existing custom process plugin for rich text things -- sharing here in case it's helpful for other people: (excerpts)
At the very least, I hope to get to take the time to function-ize a lot of what I have going on in this custom process plugin (including these heading markup fixes), and then while I do that, I'll have a better sense of patterns/common params/etc., for a potential plugin to contribute back. *fingers crossed*
..................
Thanks again for all the help, and for the great plugin!
Comment #22
alisonCrap, sorry, didn't mean to change the status.......
Comment #23
alisonBlahhh, I'm so sorry... I think some fields were auto-filled to old values b/c I had this tab open for too long. (And actually, the comment preview doesn't include issue metadata, as it turns out.)