Problem/Motivation

CKEditor 5 changes the HTML structure almost immediately after markup is entered in "Source" mode. The existing HTML of other pages is not affected until the respective nodes are opened in /edit mode.

Steps to reproduce

  1. Set up a D10 site.
  2. Enable CKEditor 5
  3. Configure any text format to use "CKEditor 5" as the text editor in /admin/config/content/formats
  4. Input the following in "Source" -
    <div class="social-media">
    <span>Share</span> 
    <span class="icon">
    <a href="#" target="_blank" rel="noopener">
    <em class="fa-fw fa-twitter fab">&nbsp;</em>
    </a>
    </span>
    </div>
  5. The structure gets changed into -
    <div class="social-media">
    <span>Share</span>&nbsp;
    <a href="#" target="_blank" rel="noopener">
    <em class="fa-fw fa-twitter fab">
    <span class="icon">&nbsp;</span>
    </em>
    </a>
    </div>

Proposed resolution

Make sure that the HTML Structure doesn't get changed.

Comments

IGhosh created an issue. See original summary.

wim leers’s picture

Category: Bug report » Support request
Priority: Critical » Normal
Status: Needs work » Postponed (maintainer needs more info)
Related issues: +#3274028: CKEditor 5 compatibility
ighosh’s picture

Hi Wim, here is another example of the HTML DOM restructuring, which is not the output I need -
Input -
<div class="container"> <span class="icon"><a href="#" target="_blank"><em>SOMETHING</em></a></span></div>
Output -
<div class="container"><a href="#" target="_blank"><em><span class="icon">SOMETHING</span></em></a></div>
The main issue with this kind of restructuring is that it affects every existing node the moment it is opened in "/edit" and re-saved. I am currently tracing what happens when I click the "Source" button. Perhaps some .js file gets called which in turn filters and restructures the DOM(?).

wim leers’s picture

Title: CKEditor 5 Changes HTML Structure » [upstream] CKEditor 5 reorganizes inline HTML tags: <a> first, then <em>, then <span>
Category: Support request » Bug report
Issue tags: +Needs upstream bugfix, +Needs tests, +data loss

I can see how that's disruptive. But that sure looks like some pretty questionable HTML 😅 That makes this difficult to describe and report. Can you still reproduce this without the <em> too? Try to find the smallest possible pattern, and then verify that it works with multiple tag combinations. That'd help report this upstream, and would result in a higher priority upstream.

ighosh’s picture

@Wim, I removed <em> from my DOM -

<div class="container">
<span>
<a href="#">Test</a>
</span>
</div>

It is getting changed into -

<div class="container">
    <a href="#"><span>Test</span></a><span>&nbsp;</span>
</div>

However, upon further testing, this is not the only case where the DOM gets changed. I am checking a few more HTML structures where this happens and will keep this issue updated.
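
To get a rough idea of how much stored markup matches the reduced pattern above (a <span> that directly wraps an <a>), something like the following could be run over saved field values. This is only a sketch and not code from this issue: the function name count_spans_wrapping_links() is made up for illustration, and it uses plain PHP DOMDocument/DOMXPath rather than any CKEditor 5 or Drupal API.

```php
<?php

/**
 * Counts <span> elements that have an <a> element as a direct child.
 *
 * Hypothetical helper, for illustration only.
 */
function count_spans_wrapping_links(string $html): int {
  $dom = new DOMDocument();
  // Stored markup is often not strictly valid, so silence parser warnings.
  $previous = libxml_use_internal_errors(true);
  // Wrap the fragment in a full document that declares UTF-8.
  $dom->loadHTML('<!DOCTYPE html><html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>' . $html . '</body></html>');
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $xpath = new DOMXPath($dom);
  // "span[a]" selects spans with at least one direct <a> child.
  return $xpath->query('//span[a]')->length;
}

// The input from this comment matches the pattern; the editor's output does not.
var_dump(count_spans_wrapping_links('<div class="container"><span><a href="#">Test</a></span></div>'));
var_dump(count_spans_wrapping_links('<div class="container"><a href="#"><span>Test</span></a></div>'));
```

A non-zero count only says that the pattern is present, not that CKEditor 5 will definitely rewrite that value.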

wim leers’s picture

Thanks!

ighosh’s picture

Status: Postponed (maintainer needs more info) » Needs work

Tested for these cases -
Input -
<a href="#">&nbsp;</a>
Output - The entire element got removed. However, if I enter <a href="#">Lorem Ipsum</a>, it is preserved.
Another test case -
Input -

<a aria-label="Lorem Ipsum" class="lorem-ipsum-class" href="#" rel="noopener" target="_blank">
  <svg fill="none" height="18" viewbox="0 0 16 18" width="16" xmlns="http://www.w3.org/2000/svg">
    <path d="M1 2.6554C1 1.48814 2.27454 0.768165 3.27427 1.37068L13.8017 7.71531C14.7693 8.29848 14.7693 9.70157 13.8017 10.2847L3.27427 16.6294C2.27454 17.2319 1 16.5119 1 15.3447V2.6554Z"
      stroke="#14142B" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"></path>
  </svg>
  Watch
</a>

Output -

<a class="lorem-ipsum-class" href="#" aria-label="Lorem Ipsum" rel="noopener" target="_blank"><svg fill="none" height="18" viewBox="0 0 16 18" width="16" xmlns="http://www.w3.org/2000/svg">
    <path d="M1 2.6554C1 1.48814 2.27454 0.768165 3.27427 1.37068L13.8017 7.71531C14.7693 8.29848 14.7693 9.70157 13.8017 10.2847L3.27427 16.6294C2.27454 17.2319 1 16.5119 1 15.3447V2.6554Z" stroke="#14142B" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"></path>
  </svg></a>
<p>
    <a class="lorem-ipsum-class" href="#" aria-label="Lorem Ipsum" rel="noopener" target="_blank">&nbsp;Watch</a>
</p>

Here, the anchor tag is split in two: the first copy wraps only the <svg><path></path></svg>, and a second copy with the same attributes wraps the "Watch" text and is put inside a paragraph (<p></p>).

ighosh’s picture

Title: [upstream] CKEditor 5 reorganizes inline HTML tags: <a> first, then <em>, then <span> » [upstream] CKEditor 5 Restructures And Removes Inline HTML Tags
mvonfrie’s picture

That is related to the HTML normalization "feature" of CKEditor 5. See https://github.com/ckeditor/ckeditor5/issues/16203 for more examples.

quietone’s picture

Version: 10.1.x-dev » 11.x-dev
ighosh’s picture

Regarding this issue, I found that there was no easy way to "fix" the problem, because it is not really an issue in the first place: CKEditor 5 was altering the HTML because the markup itself was wrong (obviously). So I updated the structure of the DOM in code: an update hook queues all nodes where DOM processing is needed, and a QueueWorker then processes the DOM.
Here is the gist of how the work was done. Please note that I targeted only nodes that use some specific paragraph components, as the DOM alteration was taking place in nodes containing those components.
Update Hook -

use Drupal\field\Entity\FieldStorageConfig;

/**
 * Implements hook_update_N().
 *
 * CKEditor 5 Components Update: queues nodes that use the affected components.
 */
function ckeditor_5_update_9250() {
  $node_storage = \Drupal::entityTypeManager()->getStorage('node');
  $paragraph_storage = \Drupal::entityTypeManager()->getStorage('paragraph');
  // Paragraph bundles (components) whose markup needs DOM processing.
  $components = [
    'lorem_ipsum_component_name',
    'lorem_ipsum_component_name_1',
    'lorem_ipsum_component_name_2',
  ];
  // Get Field Map For Entity Reference Revisions, keyed by entity type.
  $field_map = \Drupal::service('entity_field.manager')->getFieldMapByFieldType('entity_reference_revisions');
  // Nodes per component, keyed by node ID so each node is queued only once.
  $nodes = [];
  foreach ($field_map['node'] ?? [] as $field_name => $field_info) {
    $field_storage = FieldStorageConfig::loadByName('node', $field_name);
    // Skip fields without config storage or whose target type is not paragraph.
    if (!$field_storage || $field_storage->getSetting('target_type') !== 'paragraph') {
      continue;
    }
    foreach ($components as $component) {
      foreach ($paragraph_storage->loadByProperties(['type' => $component]) as $paragraph_id => $paragraph) {
        // Collect every node that references this paragraph.
        foreach ($node_storage->loadByProperties([$field_name => $paragraph_id]) as $node_id => $node) {
          $nodes[$component][$node_id] = $node;
        }
      }
    }
  }
  /** @var \Drupal\Core\Queue\QueueInterface $queue */
  $queue = \Drupal::service('queue')->get('ckeditor5_components_update');
  foreach ($nodes as $component => $component_nodes) {
    foreach ($component_nodes as $node) {
      $item = new \stdClass();
      // The queue worker expects an array holding a single node.
      $item->nodes = [$node];
      $item->components = $component;
      $queue->createItem($item);
    }
  }
}

QueueWorker -

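// Requires "use Drupal\paragraphs\Entity\Paragraph;" at the top of the class file.
public function processItem($data) {
  // Each queue item carries an array holding a single node.
  $nodes = (array) $data->nodes;
  $node = reset($nodes);
  // Paragraph fields that may contain raw HTML.
  $html_fields = [
    'field_html_section',
    'field_html',
  ];
  foreach ($node->referencedEntities() as $entity) {
    if (!$entity instanceof Paragraph) {
      continue;
    }
    // The paragraph bundle decides which DOM processing applies.
    $component = $entity->bundle();
    foreach ($html_fields as $html_field) {
      if ($entity->hasField($html_field) && $entity->get($html_field)->value) {
        $html_value = $entity->get($html_field)->value;
        // Encode non-ASCII characters as entities so DOMDocument reads them
        // correctly. Note: 'HTML-ENTITIES' is deprecated as of PHP 8.2.
        $html_value = mb_convert_encoding($html_value, 'HTML-ENTITIES', 'UTF-8');
        $dom = new \DOMDocument();
        $dom->loadHTML($html_value, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
        // Component-specific DOM processing (switch on $component) goes here.
      }
    }
  }
}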

Then, in the QueueWorker, I used a switch-case to target each paragraph and apply its corresponding DOM processing.
Hope this helps someone :)
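
The per-component DOM processing itself is not shown above. Purely as an illustration of what one such step could look like, here is a minimal sketch that moves a wrapping <span> inside its <a>, which is the nesting order CKEditor 5 produced in the reduced example earlier in this issue. Everything here is an assumption, not the code that was actually used: the function name move_spans_inside_links() is hypothetical, and only spans whose sole content is a single link are touched.

```php
<?php

/**
 * Rewrites <span><a>...</a></span> as <a><span>...</span></a>.
 *
 * Hypothetical helper, for illustration only.
 */
function move_spans_inside_links(string $html): string {
  $dom = new DOMDocument();
  $previous = libxml_use_internal_errors(true);
  // Wrap the fragment in a full document that declares UTF-8.
  $dom->loadHTML('<!DOCTYPE html><html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>' . $html . '</body></html>');
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $xpath = new DOMXPath($dom);
  // Spans with exactly one element child, an <a>, and no non-blank text.
  $query = '//span[count(*) = 1 and a and not(text()[normalize-space()])]';
  // Copy the matches to an array first, because the loop modifies the tree.
  foreach (iterator_to_array($xpath->query($query)) as $span) {
    $link = $xpath->query('a', $span)->item(0);
    // Shallow clone keeps the span's attributes (for example class="icon").
    $inner = $span->cloneNode(false);
    while ($link->firstChild) {
      $inner->appendChild($link->firstChild);
    }
    $link->appendChild($inner);
    $span->parentNode->replaceChild($link, $span);
  }

  // Serialize only the children of <body>, dropping the wrapper document.
  $body = $dom->getElementsByTagName('body')->item(0);
  $output = '';
  foreach ($body->childNodes as $child) {
    $output .= $dom->saveHTML($child);
  }
  return $output;
}

echo move_spans_inside_links('<div class="container"><span class="icon"><a href="#">Test</a></span></div>');
```

Whether the rewritten markup then survives a round trip through the editor would still need to be checked by hand for each component.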

mvonfrie’s picture

Why is your first example <a href="#">&nbsp;</a> (obviously) wrong? Syntactically it is totally correct, but of course semantically it doesn't make sense, because the user will never be able to click this link. If for some reason it is used as a trap link (a kind of honeypot) with a special URL, so that you know the link has been "clicked"/followed by a robot and not a human, then it makes sense again.

It would be interesting to see what CKEditor 5 does with this: <a name="top">&nbsp;</a>. This is an invisible anchor which can be used as a jump target (for a "Top" button at the end of the page, or floating at the bottom, to jump back to the start of the page after the header, banner image, etc.).

In my opinion, CKEditor 5 should correct syntactically wrong markup but not interpret syntactically correct markup which maybe makes no sense, as it cannot know a developer's intentions.
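
Since links that contain only &nbsp; were reported above as being removed entirely, it may help to locate such links in existing content before it is opened in the editor. The following is a minimal sketch under that assumption; count_blank_links() is a hypothetical name, and it relies only on PHP's DOMDocument and DOMXPath.

```php
<?php

/**
 * Counts <a> elements that contain nothing but whitespace or &nbsp;.
 *
 * Hypothetical helper, for illustration only.
 */
function count_blank_links(string $html): int {
  $dom = new DOMDocument();
  $previous = libxml_use_internal_errors(true);
  // Wrap the fragment in a full document that declares UTF-8.
  $dom->loadHTML('<!DOCTYPE html><html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>' . $html . '</body></html>');
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  // The parser turns &nbsp; into U+00A0, which normalize-space() does not
  // treat as whitespace, so translate it to a regular space first.
  $nbsp = "\u{00A0}";
  $query = '//a[not(*) and normalize-space(translate(., "' . $nbsp . '", " ")) = ""]';
  $xpath = new DOMXPath($dom);
  return $xpath->query($query)->length;
}

var_dump(count_blank_links('<p><a href="#">&nbsp;</a> <a href="#">Lorem Ipsum</a></p>'));
var_dump(count_blank_links('<a name="top">&nbsp;</a>'));
```

Links that wrap only child elements (such as an icon) are deliberately not counted here, because of the not(*) condition.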

skowyra’s picture

We've also been running into this behavior; HTML tags and classes get stripped out in CKEditor 5. I can see where normalization could be the culprit, but in our case, we have a clunky workaround when the behavior occurs. If resaving doesn't work after several attempts, we copy the content (Source), paste it into a text editor, add the new class or HTML there, and copy the updated content back. That usually works.

The fact that we can eventually save it indicates that normalization would not be the root cause. Let me add, this behavior occurs in nodes and webforms, plus we use Site Studio where it occurs in our components.

This started happening when we upgraded to CKEditor 5. We're currently on Drupal 10.2.4, but will be going up to 10.3 very soon.

lisa.rae’s picture

Also encountered this issue on a site that was recently upgraded from CKEditor 4 to CKEditor 5. I edited a footer block that was created with CKEditor 4 to make a minor text edit. The block also contained FontAwesome social media icons, which were not touched by the text edits.

Saving the footer resulted in the FontAwesome social media icons getting wrapped in <em></em> tags.

Original content:

<div class="column medium-3"><img class="footer-logo" src="/themes/custom/usap_base/source/images/usap/svg/footer-logo-b.svg" alt="USAP Logo" width="201" height="57"></div><div class="column medium-3"><p><br>&nbsp;</p></div><div class="column medium-2"><div id="block-footersocial"><ul class="social"><li><a class="offsite" href="http://facebook.com/usanesthesiapartners" target="_blank"><i class="fa fa-facebook"><span class="visually-hidden">Facebook</span>&nbsp;</i></a></li><li><a class="offsite" href="http://twitter.com/USAP_Updates" target="_blank"><i class="fa fa-twitter"><span class="visually-hidden">Twitter</span>&nbsp;</i></a></li><li><a class="offsite" href="http://linkedin.com/company/us-anesthesia-partners/" target="_blank"><i class="fa fa-linkedin"><span class="visually-hidden">Linkedin</span>&nbsp;</i></a></li><li><a class="offsite" href="http://instagram.com/usanesthesiapartners" target="_blank"><i class="fa fa-instagram"><span class="visually-hidden">Instagram</span>&nbsp;</i></a></li></ul></div></div><div class="column small-3"><p class="copyright">?2021&nbsp;U.S. Anesthesia Partners. All rights reserved.</p><p><a href="/terms-and-conditions">Terms &amp; Conditions</a></p></div>

Changed to:

<div class="column medium-3"><img class="footer-logo" src="/themes/custom/usap_base/source/images/usap/svg/footer-logo-b.svg" alt="USAP Logo" width="201" height="57"></div><div class="column medium-3"><p><br>&nbsp;</p></div><div class="column medium-2"><div id="block-footersocial"><ul class="social"><li><a class="offsite" href="http://facebook.com/usanesthesiapartners" target="_blank"><em><i class="fa fa-facebook"><span class="visually-hidden">Facebook</span>&nbsp;</i></em></a></li><li><a class="offsite" href="http://twitter.com/USAP_Updates" target="_blank"><em><i class="fa fa-twitter"><span class="visually-hidden">Twitter</span>&nbsp;</i></em></a></li><li><a class="offsite" href="http://linkedin.com/company/us-anesthesia-partners/" target="_blank"><em><i class="fa fa-linkedin"><span class="visually-hidden">Linkedin</span>&nbsp;</i></em></a></li><li><a class="offsite" href="http://instagram.com/usanesthesiapartners" target="_blank"><em><i class="fa fa-instagram"><span class="visually-hidden">Instagram</span>&nbsp;</i></em></a></li></ul></div></div><div class="column small-3"><p class="copyright">?2021&nbsp;U.S. Anesthesia Partners. All rights reserved.</p><p><a href="/terms-and-conditions">Terms &amp; Conditions</a></p></div>

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.