Problem/Motivation

When migrating data into Drupal, especially ongoing migrations from a feed, we want to style the imported data to be consistent with the rest of the site. One way of doing this is to use the markup (HTML elements and CSS classes) configured for CKEditor.

Proposed resolution

Add a process plugin that can be configured to apply specific styles defined for the Styles menu in CKEditor.

Here is the sample usage from the doc block in the proposed plugin:

 process:
   'body/value':
     -
       plugin: dom
       method: import
       source: 'body/0/value'
     -
       plugin: dom_apply_styles
       format: full_html
       rules:
         -
           xpath: '//b'
           style: Bold
         -
           xpath: '//span/i'
           style: Italic
           depth: 1
     -
       plugin: dom
       method: export

Remaining tasks

User interface changes

None.

API changes

None.

Data model changes

None.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

benjifisher created an issue. See original summary.

benjifisher’s picture

Assigned: benjifisher » Unassigned
Issue summary: View changes
Status: Active » Postponed
Issue tags: +Needs tests
FileSize
6.65 KB

Here is a patch that adds the process plugin I have been using. I modified the namespace, and I updated it to use the base class introduced in #2958285: Allow replacing based on a xpath expression. Because it uses that base class, I am marking this issue postponed.

marvil07’s picture

Status: Postponed » Needs work
benjifisher’s picture

Status: Needs work » Needs review
FileSize
9.99 KB
11.41 KB

I updated the plugin class to be compatible with the base class we added in #2958285: Allow replacing based on a xpath expression and I added test coverage.

I started with the 8.x-4.x branch, but I see that tests are configured to run against the 8.x-5.x branch. I hope it does not make much difference.

This is the first time I have use mock objects in a unit test. I hope I used them appropriately. In particular, I used the Prophecy approach, although I see that MigrateProcessTestCase uses getMockBuilder(). I do not think this causes any technical problems, but perhaps for consistency I should have used the same approach as the base class.

Writing the tests forced me to look at the validation again, and I noticed that I was testing for problems that would already have thrown exceptions in the setStyles() method. So I moved some of the validation into that method and renamed the original validateConfiguration() to validateRules().

This patch could use some manual testing, but the unit test is passing locally, so I am optimistic that it will work.

benjifisher’s picture

Issue summary: View changes
Issue tags: -Needs tests

Updating issue tags and summary.

alison’s picture

Works for me, with migrate_plus 4.2! Very cool, thank you!

Maybe also would be cool -- a little more "forgiving" matching? The "styles" in D7 (on the site I'm working on, anyway) just put a style attribute right on the element -- an unfortunate situation I just ran into while testing this plugin was that in a couple cases, there was a trailing semicolon at the end of the style attribute, and in other cases, there wasn't -- i.e.
xpath: '//div[@style="background:#eeeeee;border:1px solid #cccccc;padding:5px 10px;min-height:350px"]'
^^ didn't match when there was a semicolon after 350px--
style="background:#eeeeee;border:1px solid #cccccc;padding:5px 10px;min-height:350px;"

(I know it's not necessarily simple to have partial matching with xpath -- I just had to learn to do a whole xpath->registerPHPFunctions + preg_match thing for a custom plugin -- just figured I'd share the idea, maybe y'all know how to do it :) )

Anyway, totally wonderful plugin, works great, thanks!

benjifisher’s picture

@alisonjo2786:

Thanks for testing this plugin.

I do not think we want to do custom search-and-replace at the level of the process plugin. You have a lot of flexibility when specifying the xpath parameter, and I think it makes sense to leave that part of the job (i.e., figuring out the right XPath expression to use) to the person writing the migration.

One option is to specify a few rules with the same style parameter. If you need to be as specific with your selectors as your examples, then something like this should work:

    plugin: dom_apply_styles
    rules:
       - 
          xpath: '//div[@style="background:#eeeeee;border:1px solid #cccccc;padding:5px 10px;min-height:350px"]'
          style: 'Callout'
       - 
          xpath: '//div[@style="background:#eeeeee;border:1px solid #cccccc;padding:5px 10px;min-height:350px;"]'
          style: 'Callout'
benjifisher’s picture

Issue summary: View changes

Add sample usage to the issue summary.

alison’s picture

@benjifisher cool cool, and good point, I can definitely do that.

If it matters, the kind of searching I mean is to overcome the lack of regex support in xpath 1.0 (another thing I just learned about in the last week) -- example of the kind of usage I mean:

  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    $this->init($value, $destination_property);
    $this->xpath->registerNamespace("php", "http://php.net/xpath");
    $this->xpath->registerPHPFunctions("preg_match");

    foreach ($this->xpath->query('//div[@class="btgrid"]') as $html_node) {
      $rowNode = $this->xpath->query('div[php:function("preg_match", "/^row row-\d+$/", string(@class))]', $html_node);
      // etc etc
    }
  // etc etc
  }

(Not that that means it's critical for this process plugin -- just sharing/explaining!)

Thank you!

alison’s picture

Ok one more "dream world" idea -- could there be access to other things like, just, the bold/italic button behavior, and format > h2-3-4-5-6......?

I might write a custom thing for it, but, just wanted to put it out there.

(i.e. the D7 source site I'm working with has a whole bunch of unfortunate instances of <h4><strong><u> going on, that I'd love to just replace with <h4> -- also, <span style="font-weight:bold"> that I'd like to replace with <strong> (without having to add "Bold" or "Heading 4" to my Styles dropdown -- though, I might do that just for during our migration process, as a workaround!)

alison’s picture

#6, me:

xpath: '//div[@style="background:#eeeeee;border:1px solid #cccccc;padding:5px 10px;min-height:350px"]'
^^ didn't match when there was a semicolon after 350px--
style="background:#eeeeee;border:1px solid #cccccc;padding:5px 10px;min-height:350px;"

OH MY GOD "contains" -- that's what I can do, i.e.

xpath: '//div[contains(@style,"background:#eeeeee;border:1px solid #cccccc;padding:5px 10px;min-height:350px")]'

AGH of course, duh, #facepalm, etc.

(It's not an actual regex, of course, but it takes care of many of my situations, including, quite nicely-- ) ...all our cases of <img style=" etc etc"> where they've manually set width, height, and floating (among other things), and I want to get all cases of img tags with float:left/right and put a Style on them, instead of the inline style nonsense -- but, the manually-set width/height are different on every image, soooo I can do...

     -
       plugin: dom_apply_styles
       format: full_html
       rules:
         -
           xpath: '//img[contains(@style,"float:left")]'
           style: 'Image: Align left'
         -
           xpath: '//img[contains(@style,"float:right")]'
           style: 'Image: Align right'
marvil07’s picture

Status: Needs review » Reviewed & tested by the community

@benjifisher, thanks for the adaptation of the class after the base plugin got in, and for adding test coverage.
Tests look good, and run OK.

@alisonjo2786, I am glad the plugin is already useful to you.
Re #10 "dream world" idea: Sounds like a new process plugin, maybe an extension to Drupal\migrate_plus\Plugin\migrate\process\DomStrReplace, but since it is changing html structure, probably extending DomProcessBase makes more sense, maybe a "dom_element_replace".

After taking a look at the latest patch, and considering @alisonjo2786 is also using it, I am marking it as RTBC.

On tests, I think using prophecy is OK, let us see what maintainer thinks.

benjifisher’s picture

@marvil07: Thanks for the review.

@alisonjo2786: I suppose that the plugin could be extended: if the format key is supplied, then each rule should specify xpath and style, the way it works now. Otherwise, each rule should specify xpath, element, and (optional) classes. Also, depth would be optional in either case.

We might also allow a different format, or none, for each rule.

Despite the overhead of creating another plugin, there are some advantages of doing so, as @marvil07 suggested. Interacting with the configuration system complicates the code, including the test class. I prefer to avoid that complication when we do not need it (because we are supplying the HTML elements and CSS classes in the configuration).

If we do decide to have separate plugins, I think that

  1. The current DomApplyStyles plugin should be renamed to DomApplyEditorStyles
  2. The new DomApplyEditorStyles should extend DomApplyStyles

If we do that, then it will be a lot easier if we do it before this patch is accepted. So let's decide now.

Since this issue is already RTBC, I will wait for input from one of the maintainers before making any further changes.

benjifisher’s picture

I thought of another option. If you do not want to clutter up Basic HTML (or whatever text format your content editors actually use) then you can easily add another text format. Perhaps call it migrate_html. Then add style rules to this extra text format and use it with the dom_apply_styles plugin.

alison’s picture

I thought of another option. If you do not want to clutter up Basic HTML (or whatever text format your content editors actually use) then you can easily add another text format. Perhaps call it migrate_html. Then add style rules to this extra text format and use it with the dom_apply_styles plugin.

@benjifisher Fantastic idea, thank you, I will be doing that right away :)

I'm just catching up on this thread, your ideas in #13 are really interesting ideas, I need to re-read a couple more times to have it sink in and be able offer feedback (while we wait for input from a maintainer anyway).

alison’s picture

Ok I encountered an issue -- maybe it's impractical to address, in which case we could just add a node in the docblock or something? Or maybe I just need to adjust my xpath/style config values. Anyway--

Goal: replace

<ul class="list-unstyled">
  <li>Apples</li>
  <li>Bananas</li>
  <li>Oranges</li>
</ul>

with

<p>Apples</p>
<p>Bananas</p>
<p>Oranges</p>

Attempt:

         -
           xpath: '//ul[contains(@class,"list-unstyled")]/li'
           style: 'Paragraph'
           depth: 1

Outcome:

Couldn't fetch DOMElement. Node no longer exists DomApplyStyles.php:194    [warning]
Error: Call to a member function replaceChild() on null in Drupal\migrate_plus\Plugin\migrate\process\DomApplyStyles->apply() (line 197 of /path/to/modules/contrib/migrate_plus/src/Plugin/migrate/process/DomApplyStyles.php)    [error]

because!!... the xpath result nodes share a parentNode! #facepalm

.....................
Again, I realize this might not be a practical thing to deal with -- if that's the case, I'd be happy to tweak the patch with a note in the docblock, just lmk.

alison’s picture

Oof, well, I guess I have other instances... it's really just some unfortunate content. Like, I was all excited about using this to turn...
<h4><strong>Some heading</strong></h4>
into
<h4>Some heading</h4>
via

    -
      xpath: '//h4/strong'
      style: 'Heading 4'
      depth: 1

but the thing is, there are many cases of...
<h4><strong>About the strategy map</strong><strong>:</strong></h4>
...at which point, I'm back to, multiple xpath result nodes with the same parent node. And actually, this heading situation is even less reasonable to try to deal with in a repeatable way, compared to the unordered list example.

All that's to say, I welcome any advice, but, I'll probably give up on cleaning up this particular type of messy markup using this particular method :) (I still get a lot out of this plugin -- like all of my uses that don't require depth manipulation.)

As the plugin docblock says (line 56):

 * You may get unexpected results if there is anything between the two opening
 * tags or between the two closing tags. That is, the code assumes that
 * '<span><i>' is closed with '</i></span>' exactly.
benjifisher’s picture

Right, the depth parameter is a bit of a kluge. If it works, that's nice.

I am not sure how strict the DOMDocument class is about nesting rules, and I am also not sure whether <ul><div>foo</div></ul> is valid HTML. But I would try something like this for your first example:

       -
         dom_aply_styles:
           xpath: '//ul[contains(@class,"list-unstyled")]/li'
           style: 'Paragraph'
       -
         dom_aply_styles:
           xpath: '//ul[contains(@class,"list-unstyled")]'
           style: 'Div'

The idea is to replace the <li> elements with <p> elements in the first pass, and then to remove the <ul> (or replace it with a <div>) in the second pass.

In the second example, you might simply replace the <strong> elements (inside the <h4>) with <span>s.

Another possibility is that you write some custom process plugins that do exactly what you want. If we are lucky, that experience will lead you to some common set of parameters, and then you can propose a new process plugin and I can review it. In fact, maybe it is simply a plugin that replaces an HTML element (specified by an xpath expression) and replaces it with its inner HTML.

  • heddn committed 34c7429 on 8.x-5.x authored by benjifisher
    Issue #3042539 by benjifisher, alisonjo2786, marvil07, heddn: Apply...
heddn’s picture

Version: 8.x-4.x-dev » 8.x-5.x-dev
Status: Reviewed & tested by the community » Fixed

Really like the work here. Nicely documented and tested piece of usefulness.

alison’s picture

Version: 8.x-5.x-dev » 8.x-4.x-dev
Status: Fixed » Reviewed & tested by the community

@benjifisher funny, I did almost exactly that:

      -
        plugin: dom_apply_styles
        rules:
          -
            xpath: '//ul[contains(@class,"list-unstyled")]/li'
            style: 'Paragraph'
          -
            xpath: '//ul[contains(@class,"list-unstyled")]'
            style: 'Paragraph'

(The second "paragraph" thing ends up being "dissolved" along the way, b/c it won't actually allow <p><p>things things</p><p>more thing things</p></p>.)

(Indeed, the example source markup I'm dealing with is, at best, not great, at worst, not valid :) )

..................
For the headings, I ended up putting things into my existing custom process plugin for rich text things -- sharing here in case it's helpful for other people: (excerpts)

class MyThingDomStrReplaceBtgrid extends DomProcessBase {

  /**
   * {@inheritdoc}
   */
  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    $this->init($value, $destination_property);

    //
    // other stuff........
    //

    // remove <strong> elems inside of headings, ew
    $head3StrongNode = $this->xpath->query("//h3/strong");
    foreach ($head3StrongNode as $hd3StrElem) {
      $formattingElemsToRemove[] = $hd3StrElem;
      while ($hd3StrElem->hasChildNodes()) {
        $child = $hd3StrElem->removeChild($hd3StrElem->firstChild);
        $hd3StrElem->parentNode->insertBefore($child, $hd3StrElem);
      }
    }
    $head4StrongNode = $this->xpath->query("//h4/strong");
    foreach ($head4StrongNode as $hd4StrElem) {
      $formattingElemsToRemove[] = $hd4StrElem;
      while ($hd4StrElem->hasChildNodes()) {
        $child = $hd4StrElem->removeChild($hd4StrElem->firstChild);
        $hd4StrElem->parentNode->insertBefore($child, $hd4StrElem);
      }
    }
    foreach($formattingElemsToRemove as $formattingElem){
      $formattingElem->parentNode->removeChild($formattingElem);
    }

    //
    // other stuff........
    //

    return $this->document;
  }
}

At the very least, I hope to get to take the time to function-ize a lot of what I have going on in this custom process plugin (including these heading markup fixes), and then while I do that, I'll have a better sense of patterns/common params/etc., for a potential plugin to contribute back. *fingers crossed*

..................
Thanks again for all the help, and for the great plugin!

alison’s picture

Status: Reviewed & tested by the community » Fixed

Crap, sorry, didn't mean to change the status.......

alison’s picture

Version: 8.x-4.x-dev » 8.x-5.x-dev

Blahhh, I'm so sorry... I think some fields were auto-filled to old values b/c I had this tab open for too long. (And actually, the comment preview doesn't include issue metadata, as it turns out.)

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.