I ran into a really interesting issue. Somehow the text string that gets processed in the \Drupal\gutenberg\Plugin\Filter\CommentDelimiterFilter process() function is losing its newline characters. I checked the DB and the node body has a value like this:

<!-- wp:paragraph -->\r\n<p>Lorem ipsum dolor sit amet</p>\r\n<!-- /wp:paragraph -->\r\n<!-- wp:custom/block -->\r\n<div>custom</div>\r\n<!-- /wp:custom/block-->

When it gets processed, it looks like this:

<!-- wp:paragraph --><p>Lorem ipsum dolor sit amet</p><!-- /wp:paragraph --><!-- wp:custom/block --><div>custom</div><!-- /wp:custom/block -->

Because the ".*" in the regex is greedy, it then matches through to the final "-->" on the line and causes the div to be stripped.
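To illustrate the greedy match, here is a minimal sketch using the one-line markup above and the pattern from the process() code quoted below. The lazy variant (".*?") is only an assumption about one possible alternative, not something the module is known to use.

```php
<?php
// One-line body, as it looks after the newlines are lost.
$text = '<!-- wp:paragraph --><p>Lorem ipsum dolor sit amet</p><!-- /wp:paragraph -->'
  . '<!-- wp:custom/block --><div>custom</div><!-- /wp:custom/block -->';

// Greedy: ".*" runs from the first "wp:" to the last " -->" on the line,
// so everything between the first and last comment is removed as well.
$greedy = preg_replace('#<!-- \/?wp:.* \/?-->#', '', $text);

// Lazy (hypothetical alternative): ".*?" stops at the first " -->" after
// each opening "<!-- wp:", so only the comments themselves are removed.
$lazy = preg_replace('#<!-- \/?wp:.*? \/?-->#', '', $text);

var_dump($greedy); // empty string
var_dump($lazy);   // "<p>Lorem ipsum dolor sit amet</p><div>custom</div>"
```

The lazy form would still misbehave if a block attribute itself contained " -->", so treat it as a sketch rather than a complete fix.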

This only started happening when I pushed to our production server. The production server is on PHP 7.3.8 and my local is on PHP 7.3.7 (I couldn't get 7.3.8 installed locally). I'm not sure if this is a PHP issue, an environment issue, or something else. My workaround was to do the following:

class CommentDelimiterFilter extends FilterBase {

  /**
   * Process each delimiter.
   */
  public function process($text, $langcode) {

    $text = str_replace("><", ">\n<", $text);
    
    $lines = explode("\n", $text);

    $lines = preg_replace_callback('#<!-- \/?wp:.* \/?-->#', [$this, 'renderContent'], $lines);

    $text = implode("\n", $lines);

    return new FilterProcessResult($text);
  }

  /**
   * Callback function to process each delimiter.
   */
  private function renderContent($match) {
    return '';
  }
}

Any ideas why this is happening? I did try disabling all filters but it seems to be an odd line ending issue. If it's possible for Gutenberg output to be combined onto a single line, the above approach might be necessary. Let me know if you think this is a potential issue and want me to submit a patch.

thanks!

Comments

onedotover created an issue. See original summary.

onedotover’s picture

Similar issue with the Reusable Blocks filter.

If a reusable block is at the end of the content, or at the end of a group it will not appear. The markup looks like this:

<!-- wp:group -->
<div class="wp-block-group"><div class="wp-block-group__inner-container"><!-- wp:paragraph -->
<p>Test</p>
<!-- /wp:paragraph -->

<!-- wp:block {"ref":8} /--></div></div>
<!-- /wp:group -->

The regex for the reusable block filter fails on this because the line doesn't end with -->

The filters all seem to rely on the comment tags being at the end of lines. Should Gutenberg be adding newlines before saving? If not, I believe the workaround above will work for these cases.
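A minimal sketch of the failure, using the markup above. The anchored pattern here is hypothetical (the module's actual regex is not quoted in this thread); it only assumes that the match is tied to the start and end of a line.

```php
<?php
$markup = implode("\n", [
  '<!-- wp:group -->',
  '<div class="wp-block-group"><div class="wp-block-group__inner-container"><!-- wp:paragraph -->',
  '<p>Test</p>',
  '<!-- /wp:paragraph -->',
  '',
  '<!-- wp:block {"ref":8} /--></div></div>',
  '<!-- /wp:group -->',
]);

// Hypothetical line-anchored pattern (an assumption, not the module's regex).
// It fails here because "</div></div>" follows the comment on the same line.
$anchored = '#^<!-- wp:block (\{.*\}) /-->$#m';

// Same pattern without the anchors and with a lazy quantifier.
$unanchored = '#<!-- wp:block (\{.*?\}) /-->#';

$anchored_hits = preg_match($anchored, $markup);           // 0
$unanchored_hits = preg_match($unanchored, $markup, $m);   // 1
$attributes = json_decode($m[1], TRUE);                    // ['ref' => 8]
```

The lazy "\{.*?\}" group is enough for flat attributes like {"ref":8}; nested JSON attributes would need more careful handling.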

sayco’s picture

Thank you for reporting the issue. I've checked on the latest dev branch and also on version 1.5. For both cases, I was using PHP 7.3.8 and everything was fine.

Could you be more specific about reproducing the issue?
How exactly are you adding the new lines and divs?
I was trying to put reusable blocks at the end of the content but also wasn't able to reproduce the issue.

Maybe it is your production environment?
Do you know the server OS? Linux and Windows usually use different end-of-line delimiters, so if you are using Linux for local development and Windows for production, you might face issues like this. You could, for example, check the value of the PHP_EOL constant on both.
I'm not an expert in DB engines, but maybe you also have some different configuration values which somehow impact newline characters?
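One way to run that check is sketched below. The helper name count_line_endings() is made up for illustration; it only uses standard PHP string functions, and the sample string stands in for a body value read from the database.

```php
<?php
/**
 * Counts each line-ending style in a string (hypothetical helper).
 */
function count_line_endings(string $text): array {
  $crlf = substr_count($text, "\r\n");
  return [
    'crlf' => $crlf,
    // Lone "\n" and lone "\r", excluding those that are part of "\r\n".
    'lf' => substr_count($text, "\n") - $crlf,
    'cr' => substr_count($text, "\r") - $crlf,
  ];
}

// PHP_EOL as hex: "0a" means LF, "0d0a" means CRLF.
$eol_hex = bin2hex(PHP_EOL);

// Sample value with Windows-style line endings, like the body in the report.
$counts = count_line_endings("<!-- wp:paragraph -->\r\n<p>Test</p>\r\n<!-- /wp:paragraph -->");
// $counts is ['crlf' => 2, 'lf' => 0, 'cr' => 0]
```

Running this against the stored body and against the $text argument received by process() on each environment would show whether the line endings are lost before the filter runs.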

onedotover’s picture

Thanks for responding! The second issue I mentioned occurs on both environments. To reproduce the second one you can do the following:
1. Create a Reusable Block (if one doesn't already exist)
2. Make a new page
3. Add the block to the content
4. Select the block and make it a group (this will wrap it in HTML and cause the issue)
5. Save

Dev Environment
Module: 8.1.6
Server: Linux (macOS 10.14.6)
DB: mysql 5.7.17
PHP_EOL is default

Prod Environment (Pantheon)
Module: 8.1.6
Server: Linux
DB: 10.0.23-MariaDB-log
PHP_EOL is default

I didn't realize it was MariaDB rather than MySQL. I'll see what I can find regarding line ending differences between the two.

sayco’s picture

Status: Active » Needs review
FileSize: 1002 bytes

I managed to reproduce the issue with reusable blocks. Basically, there was a bad assumption that the comment would always start and end the line. I removed the ^ and $ anchors, so now the pattern will match all blocks and transform them properly.

Here I provide a patch which fixes the issue. I did some testing, but it would still be good to review it first.

Nevertheless, I have no idea how to reproduce your original issue. I don't know where those prefixes are coming from:
<!-- wp:custom/block -->

We are using only #<!-- wp:drupalblock prefixes for Drupal blocks.

BTW, MariaDB and MySQL have a common origin, so they are quite similar. I doubt that there will be any issues related to them, but it's still worth a try.

onedotover’s picture

Thanks sayco. Maybe I should have created these as separate issues. I combined them into one issue because I think they are related.

The first issue is with any block and the way the CommentDelimiterFilter works. If two Gutenberg comments are on one line (for whatever reason), then the CommentDelimiterFilter processing can fail because it can greedily capture more than it should.

The second issue was with the ReusableBlockFilter. Its check was dependent on a line, so it failed if the closing comment was not at the end of a line. While your patch fixes the ReusableBlockFilter in this case, it makes it susceptible to the first issue.

I'm not sure we can trust that the comments are always on their own line (or can we?). Newline differences in operating systems, HTML editing, copying and pasting, etc. could all cause the comments to end up on the same line. I think there needs to be additional string processing before matching. This would apply to BlockFilter, ReusableBlockFilter and CommentDelimiterFilter.
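A rough sketch of what that pre-processing could look like. The function name isolate_block_comments() is hypothetical, and the approach (normalize line endings, then put every wp: comment on its own line) is only one assumption about how this might be done, not what any of the three filters currently do.

```php
<?php
/**
 * Puts every Gutenberg comment delimiter on its own line (hypothetical helper).
 */
function isolate_block_comments(string $text): string {
  // Normalize every newline sequence (\r\n, \r, \n) to "\n".
  $text = preg_replace('/\R/', "\n", $text);
  // Wrap each "<!-- wp:... -->" or "<!-- /wp:... -->" comment in newlines.
  // The lazy ".*?" keeps each match to a single comment.
  return preg_replace('#[ \t]*(<!-- /?wp:.*? /?-->)[ \t]*#', "\n" . '$1' . "\n", $text);
}

$input = "<div class=\"g\"><!-- wp:paragraph -->\r\n<p>Test</p>\r\n"
  . '<!-- /wp:paragraph --><!-- wp:block {"ref":8} /--></div>';

$lines = explode("\n", isolate_block_comments($input));
// Every line that mentions "wp:" is now exactly one comment delimiter.
```

This adds blank lines around some comments, which should be harmless in HTML output, but that is an assumption worth verifying against the rendered markup.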

  • sayco committed 5f690d6 on 8.x-1.x
    Issue #3073122 by sayco: Comment filter can strip out HTML - fix grouped...
marcofernandes’s picture

Status: Needs review » Fixed

Closing this one because on 2.x we have a new parser and on 1.x, IIRC we pretty much port the way WordPress parses the content. Let's say it works as expected.
For any problems related to this, it's best to create a new issue.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.